
China’s AI Hardware Products: The World’s Second Choice

At the heart of the global AI race, a profound transformation is reshaping the landscape. Faced with a complex external environment and strict technological barriers, China’s AI industry has not only continued to advance but has done so with unprecedented determination and speed, establishing a self-reliant and controllable industrial chain that spans the entire stack of core technologies. From underlying chips to operating systems, from precision cooling to intelligent management, a complete, efficient, and powerful purely domestic AI hardware ecosystem has emerged. This is not merely an “alternative solution,” but a highly competitive and trustworthy “second choice” offered to the global market.

Breaking through restrictions, Chinese chips lay the foundation for AI computing power. The core driving force behind AI development is computing power, and the heart of computing power lies in chips. U.S.-led sanctions once sparked concerns about the future of China’s AI industry, but the explosive growth of China’s chip industry has provided a strong response.

Ascend 910 and its series. As a benchmark for domestically produced AI chips, the Huawei Ascend 910’s performance metrics are on par with international mainstream products. Test results show that the Atlas series of AI training server clusters based on the Ascend platform demonstrate computational efficiency comparable to that of top-tier international GPU clusters in typical model training tasks, and even surpass them in specific optimized scenarios. The Ascend chip integrates the Da Vinci architecture, specifically designed for deep learning, and combines with Huawei’s full-stack AI software (CANN, MindSpore) to build a powerful competitive advantage through software-hardware co-optimization.

The advanced Ascend 920 is manufactured using SMIC’s 6nm (N+3 node) process, marking the first mass production of a domestically produced 6nm-class AI chip. Building on the architectural design of the Ascend 910C, it achieves a 30%-40% improvement in training efficiency through optimized tensor accelerators and memory subsystems. It delivers 900 TFLOPS of BF16 computing power, paired with HBM3 memory offering a bandwidth of up to 4000 GB/s, significantly surpassing the Ascend 910C’s 320 TFLOPS (FP16). It supports PCIe 5.0 and next-generation interconnect protocols, optimizing multi-node collaboration, making it suitable for training large models with trillions of parameters. It is positioned as a direct replacement for the NVIDIA H20 (customized for the Chinese market), with the goal of matching H100 performance. According to tests, the Ascend 920 delivers roughly 1.5 times the H20’s BF16 computing power, and its compatibility design allows seamless integration into existing AI infrastructure.
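The compute and bandwidth figures quoted above can be put into perspective with a standard roofline calculation. The sketch below is plain Python, not vendor tooling; it uses the 900 TFLOPS and 4000 GB/s numbers from the text, while the 0.25 and 300 FLOPs/byte arithmetic intensities are illustrative assumptions for a memory-bound elementwise op and a large matmul tile.

```python
# Roofline sketch using the Ascend 920 figures quoted in the text:
# 900 TFLOPS BF16 peak compute and 4000 GB/s HBM3 bandwidth.
peak_flops = 900e12   # FLOP/s (BF16, per the text)
peak_bw = 4000e9      # bytes/s (HBM3, per the text)

# Ridge point: arithmetic intensity (FLOPs per byte moved) at which
# a kernel stops being memory-bound and saturates compute.
ridge = peak_flops / peak_bw
print(f"ridge point: {ridge:.0f} FLOPs/byte")  # 225

def attainable_tflops(intensity):
    """Attainable throughput for a kernel of given FLOPs/byte."""
    return min(peak_flops, intensity * peak_bw) / 1e12

# A memory-bound elementwise op vs. a compute-heavy matmul tile
# (intensities are illustrative assumptions, not measurements).
print(f"elementwise (0.25 FLOPs/byte): {attainable_tflops(0.25):.1f} TFLOPS")  # 1.0
print(f"large matmul (300 FLOPs/byte): {attainable_tflops(300):.1f} TFLOPS")   # 900.0
```

The takeaway is that the quoted 4000 GB/s matters as much as the headline TFLOPS: any operator moving more than about one byte per 225 FLOPs will be bandwidth-limited on such a part.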

Cambricon MLU series. As a pioneer in Chinese AI chips, Cambricon’s high-end chips such as the MLU370-X8 have secured a significant position in the cloud inference market thanks to their unique instruction set architecture and on-chip high-speed interconnect technology. Especially in scenarios such as computer vision and speech recognition, their high energy efficiency advantage is prominent, with measured performance-to-power ratios ranking among the global leaders. The MLU590 is specifically designed for edge computing, supporting low-power, high-concurrency inference, and is applied in smart security and autonomous driving, with INT8 computing power of 512 TOPS.
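The INT8 throughput cited for the MLU590 rests on quantized inference: weights and activations are scaled into 8-bit integers before the low-precision compute units can be used. The pure-Python sketch below illustrates the basic symmetric INT8 scheme; it is a generic illustration, not Cambricon’s toolchain, and the sample activation values are made up.

```python
# Minimal symmetric INT8 quantization sketch (generic technique,
# not Cambricon's actual quantization toolchain).
def quantize_int8(xs):
    """Scale floats into [-127, 127] and round to integers."""
    scale = max(abs(x) for x in xs) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]

# Made-up activation values for illustration.
acts = [0.02, -1.5, 0.73, 3.1, -2.2]
q, s = quantize_int8(acts)
print(q)  # [1, -61, 30, 127, -90]

restored = dequantize(q, s)
err = max(abs(a, ) if False else abs(a - b) for a, b in zip(acts, restored))
print(f"max abs error: {err:.4f}")
```

Running INT8 instead of FP16 roughly halves memory traffic and lets the chip use its densest integer units, which is why edge-inference parts quote INT8 TOPS as the headline number.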

WallRise BR100 series. WallRise Technology has emerged as a new force in the industry. Its BR104 GPU delivers remarkable single-chip peak computing power, performing exceptionally well in specific benchmark tests and surpassing international peers, and it achieves 85% of mainstream platform efficiency in LLaMA large-model training. Directly targeting the high-end AI training and HPC markets, it has become an important entrant among domestic high-performance GPUs.

TianShu SmartChip ZhiKai/ZhiGai series. TianShu SmartChip focuses on the general-purpose GPGPU market. Its latest products achieve breakthroughs in key AI computing precisions such as FP32, FP16, and BF16, exceeding 20 TFLOPS at FP32 precision, and are compatible with mainstream AI development frameworks (such as PyTorch and TensorFlow). Servers based on TianShu chips have been deployed in multiple national-level supercomputing centers and data centers of large internet companies, supporting large-scale AI model training and inference.

A powerful heart: domestically produced CPUs support full-stack autonomy.

A stable and powerful CPU is the cornerstone of an AI server system. Domestically produced CPUs continue to climb new heights in general computing performance and reliability.

Kunpeng 920 series. Huawei’s Kunpeng 920 processor, based on the ARMv8 architecture, features a 64-core design and supports 8-channel DDR4. With its multi-core, high-throughput, low-power characteristics, it is widely deployed in the data center market. Its high-performance version demonstrates outstanding performance in general-purpose computing tasks, providing a solid foundation of general-purpose computing power for the stable operation of AI training and inference platforms.

Feiteng Tengyun S series. The Feiteng FT-2000+/64 and the new-generation Tengyun S series CPUs are rooted in the ARM instruction set ecosystem and are widely applied in information systems across critical industries such as government, finance, and telecommunications. Their performance meets the demands of mainstream AI platforms for central processing units. The S5000 integrates a secure encryption module and is compatible with domestic operating systems, enabling million-level concurrent processing on government cloud platforms.

Loongson 3A5000/3C5000/3C6000 series. Loongson CPUs based on the fully independent LoongArch instruction set have achieved significant performance improvements in the 3C5000 series server-level chips, particularly in scenarios with extremely high security and controllability requirements, providing a solid foundation for the core security and trustworthiness of China’s AI hardware. The Loongson 3C6000, based on the independent LoongArch instruction set, achieves security and controllability at the fourth level of the National Information Security Standard, supporting defense AI systems.

High-performance GPUs are indispensable for AI training, serving as accelerators for both visual processing and computation, and domestic GPUs are on the rise. The Moore Threads MTT S series, powered by the unified MUSA system architecture, sees its MTT S4000/S3000 GPUs advancing in both graphics rendering and AI computing; with continuous optimization of driver and framework compatibility, performance is improving rapidly, making Moore Threads a key player in the domestic GPU market. The MTT S4000 supports DX12/Vulkan/OpenGL and carries 16GB of VRAM to power AI image generation, achieving a 300% increase in inference speed at 1024×1024 resolution compared to the previous generation.

Jingjiawei JM9 Series: Jingjiawei, which has been dedicated to professional fields for a long time, continues to enhance its GPU’s computational capabilities and adapt to the domestic software and hardware ecosystem, demonstrating potential in specific AI application scenarios. Transitioning from military to civilian use, it is compatible with the Kirin/UOS systems and supports industrial simulation visualization.

High-speed data pathways: domestic memory and storage solutions handle the data deluge. Changxin Memory Technology (CXMT) DDR4/DDR5 memory. CXMT is a leader in China’s DRAM industry, with its DDR4 products widely adopted in both consumer and industrial markets, offering stable and reliable performance. Its DDR5 chips reach a data rate of 6400 MT/s, support ECC error correction, and reduce power consumption by 20%, providing future AI servers with higher-bandwidth, larger-capacity memory support.
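As a sanity check on the memory figures above, per-channel bandwidth follows directly from the quoted transfer rate. The sketch below uses the 6400 MT/s number from the text and the standard 64-bit (8-byte) DDR data path; the 8-channel scaling mirrors the Kunpeng 920 configuration mentioned earlier, though that platform is described with DDR4.

```python
# DDR5-6400 theoretical bandwidth from the figures quoted in the text.
transfers_per_s = 6400e6   # 6400 MT/s, per the text
bytes_per_transfer = 8     # standard 64-bit DDR data bus

per_channel_gbps = transfers_per_s * bytes_per_transfer / 1e9
print(f"per-channel: {per_channel_gbps:.1f} GB/s")   # 51.2

# An 8-channel server configuration scales this linearly.
print(f"8 channels: {8 * per_channel_gbps:.1f} GB/s")  # 409.6
```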

Yangtze Memory Technologies Co., Ltd. (YMTC) Xtacking® 3D NAND. YMTC’s innovative Xtacking® technology enables its 3D NAND flash memory to deliver high performance and high density. Its enterprise-grade PCIe 4.0 SSD offers a read speed of 7 GB/s, a lifespan of 1.5 DWPD, and has passed a 2,000-hour high-temperature aging test. It is being gradually deployed in data centers, offering high-speed, durable storage solutions comparable to those of international original manufacturers, meeting the stringent requirements for high-speed loading of large model parameters and checkpoint saving in AI training.

The Jingzao Kirin series, equipped with LianYun controllers and Changxin flash memory, achieves 800K IOPS for 4K random read/write operations.

In real-world testing with models featuring hundreds of billions of parameters, the domestic storage solution achieved a 3x improvement in checkpoint saving speed and a 60% reduction in data preheating time.
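The checkpoint claims above are easy to reason about with simple arithmetic. The sketch below assumes a 100-billion-parameter model stored in FP16 (both assumptions for illustration; the text says only “hundreds of billions of parameters”) and reuses the 7 GB/s read speed quoted for the YMTC enterprise SSD.

```python
# Back-of-envelope checkpoint arithmetic. Model size and FP16 storage
# are illustrative assumptions; 7 GB/s is the SSD figure from the text.
params = 100e9          # assumed parameter count
bytes_per_param = 2     # FP16 weights

ckpt_gb = params * bytes_per_param / 1e9
print(f"checkpoint size: {ckpt_gb:.0f} GB")        # 200

read_gbps = 7.0         # GB/s sequential read, per the SSD spec above
load_s = ckpt_gb / read_gbps
print(f"single-drive load time: {load_s:.0f} s")   # 29

# A 3x save-speed improvement, as claimed, shortens an equivalent
# save window proportionally.
print(f"at 3x: {load_s / 3:.1f} s")
```

Even at 7 GB/s, a single drive needs about half a minute per 200 GB checkpoint, which is why checkpoint frequency and striping across drives matter at this scale.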

A robust chassis: domestic motherboards and server system integration. Huawei Atlas AI servers: the Huawei Atlas series is a full-stack server solution optimized for AI scenarios, deeply integrating Ascend AI processors to provide high-performance, high-efficiency AI computing units. A liquid-cooled cluster integrating 4,096 Ascend 910B processors achieves 256 PFLOPS of computing power with a PUE as low as 1.08.

Inspur Information, a leader in global server shipments, has launched multiple AI servers based on domestically produced chips such as Ascend, Cambricon, and Feiteng (e.g., NF5468M6, NX7800A5). These products have undergone rigorous design and testing to provide excellent thermal management, power supply, and stability, supporting large-scale GPU/accelerator card cluster deployments. Among them, the Inspur NF5468M6 supports 8 Cambricon MLU370-X8 accelerator cards, enabling seamless scalability for thousand-card clusters.

Lenovo’s AskAI servers leverage the company’s robust supply chain and design capabilities to offer an AI server product line supporting multiple domestic CPUs and accelerator cards, providing flexible configuration options.

Domestic server manufacturers like Baode and Huakun Zhenyu focus on servers based on domestic platforms, offering a full range of servers powered by Kunpeng, Feiteng, and Haihuang CPUs, equipped with domestic AI accelerator cards, to meet the needs of AI customers of various scales.

Mars stands out as a comprehensive integrator among these leading AI hardware manufacturers. In today’s global AI competition, intensified by technological barriers, China’s AI industry, and integrators like Mars in particular, has demonstrated remarkable resilience in building a 100% domestically controlled technology chain spanning chips, servers, operating systems, and thermal management. This end-to-end solution not only breaks through “sanctions constraints” but also offers exceptional performance, ultra-high energy efficiency, and an open ecosystem, providing a reliable “second choice” for the global market.

Efficient interconnection: domestic network equipment weaves the computing power network. High-speed networks such as 10 Gigabit Ethernet, RoCE, and InfiniBand serve as the nervous system of distributed AI training clusters.

Huawei CloudEngine data center switches are flagship-level data center switches. For example, the CE16800 series supports 400G ports, and RoCEv2 lossless networking reduces AI training latency by 50%. They offer ultra-high performance, ultra-low latency, and lossless Ethernet (with iLossless algorithm) capabilities, meeting the extreme requirements for network bandwidth and latency in large-scale AI training clusters where massive parameter synchronization occurs between nodes.
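The bandwidth pressure described above comes from gradient synchronization. A bandwidth-optimal ring all-reduce moves 2·(N−1)/N times the gradient buffer size per node, so the fabric sets a hard floor on step time. The sketch below uses the 400G port speed quoted for the CE16800; the node count and the 100B-parameter FP16 gradient buffer are illustrative assumptions.

```python
# Lower bound on per-step all-reduce time over a 400G fabric.
# 400 Gb/s is the CE16800 port speed from the text; node count and
# gradient size are assumptions for illustration.
n_nodes = 64
grad_bytes = 100e9 * 2           # assumed: 100B params in FP16
link_bytes_per_s = 400e9 / 8     # 400 Gb/s -> 50 GB/s

# Ring all-reduce traffic per node: 2 * (N-1)/N * S bytes.
traffic = 2 * (n_nodes - 1) / n_nodes * grad_bytes
t = traffic / link_bytes_per_s
print(f"per-step all-reduce lower bound: {t:.1f} s")  # 7.9
```

This is an idealized floor with no protocol overhead; lossless operation (RoCEv2 with congestion control, as described above) matters because any packet loss and retransmission pushes real synchronization time well above it.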

Ruijie Networks’ data center switches provide high-performance, high-density data center core and TOR switches, supporting large caches and low latency, and have been specifically optimized and validated for AI scenarios.

H3C’s flagship data center switch products (such as the S12500R series) have powerful processing capabilities and rich features, making them a reliable network foundation for building AI computing centers.

The smart hub: domestic operating systems and basic software. Secure, stable operating systems and efficient AI frameworks are key to unleashing the potential of the hardware.

OpenEuler: Huawei’s open-source enterprise-grade Linux distribution has become the core ecosystem in China’s server operating system field. Its high performance, high reliability, high security, and broad support for diverse computing power (Kunpeng, Feiteng, Ascend, Loongson, etc.) make it the preferred operating system for domestic AI servers. Vendors such as Kylin Software and Tongxin UOS have launched commercial distributions based on the OpenEuler kernel.

UOS Server: a general-purpose server operating system developed by UnionTech Software (Tongxin), compatible with mainstream domestic hardware platforms, providing comprehensive management tools and security features, and widely applied in government, finance, and other fields.

KYLINSEC OS: An operating system that emphasizes security and trusted computing, meeting the requirements of high-level protection and critical information infrastructure.

MindSpore (Shengsi): An open-source AI framework developed by Huawei, natively supporting Ascend chips, with unique advantages such as automatic parallelization and full-scenario (edge-cloud) deployment. It achieves significant performance improvements through deep integration with Ascend hardware.

PaddlePaddle: Baidu’s open-source deep learning platform, China’s first independently developed, feature-rich framework, with a powerful ecosystem and complete toolchain, widely supporting domestic and international hardware platforms.

OneFlow: A deep learning framework focused on distributed training, demonstrating excellent scalability in data parallel/model parallel tasks represented by large models.

Domestic power supplies and cooling systems are the cornerstone of system stability. High-power, high-reliability power supplies and efficient cooling are essential for the continuous and stable operation of AI servers.

Great Wall Power Supply: Great Wall’s high-performance server power supplies (such as the CRPS series) are certified by 80 PLUS Titanium, offering extremely high conversion efficiency, providing stable, clean, and efficient power supply for energy-intensive AI training clusters.

Delta Electronics (Delta) Made in China: High-quality server power supplies produced by Delta, a global power supply giant, in its Chinese factories provide ultimate reliability and high efficiency.

Super Liquid Cooling Technology: Facing the kilowatt-level thermal power density of AI chips and GPU clusters, Chinese manufacturers are at the forefront of liquid cooling (cold plate-based, immersion-based) technology. Huawei, Inspur, Lenovo, Alibaba Cloud, and others have all launched mature liquid cooling server solutions. Test results show that cold plate liquid cooling can reduce chip junction temperature by 10-20°C compared to air cooling, significantly improving chip performance stability and reducing system energy consumption; fully immersion liquid cooling can achieve extreme energy savings with a PUE approaching 1.05.
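The PUE figures above translate directly into facility overhead: PUE is total facility power divided by IT power, so overhead power is IT load × (PUE − 1). The sketch below assumes a 1 MW IT load for illustration and a typical air-cooled PUE of 1.5 for comparison (an assumption; the 1.08 and 1.05 figures are from the text).

```python
# Facility overhead implied by the PUE figures in the text, for an
# assumed 1 MW IT load. PUE = total facility power / IT power.
it_kw = 1000.0  # assumed 1 MW of IT equipment

for label, pue in [("typical air cooling (assumed)", 1.5),
                   ("liquid-cooled Atlas figure", 1.08),
                   ("immersion liquid cooling", 1.05)]:
    overhead_kw = it_kw * (pue - 1)
    print(f"{label}: PUE {pue} -> {overhead_kw:.0f} kW overhead")
```

Under these assumptions, moving from PUE 1.5 to 1.05 cuts non-IT power for the same load from 500 kW to 50 kW, which is the economic case behind the liquid-cooling push described above.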

Insightful Control: Domestic Management and Monitoring Software

Intelligent operations and maintenance management is an essential capability for managing large AI computing clusters.

Inspur CloudSea OS (InCloud OpenStack Manager / InCloud Manager): Inspur CloudSea OS provides unified intelligent management for large-scale cloud data centers and AI computing platforms, supporting the management, monitoring, scheduling, and automated operations of heterogeneous resources (including domestic CPUs and accelerator cards), serving as the robust “operating system” for large-scale AI clusters.

Huawei iManager Platform: Huawei’s data center management solution enables unified management and intelligent monitoring and operations for computing, storage, networking, virtualization, and other resources.

Lenovo XClarity / Huawei eSight: powerful server hardware management suites offering out-of-band management, firmware updates, hardware monitoring, alerting, and log analysis, serving as the “eyes and ears” ensuring the healthy operation of AI server hardware.

Qingchuang Sherlock AIOps: A domestically developed software solution focused on intelligent operations and maintenance (AIOps), leveraging big data analysis and machine learning to achieve intelligent monitoring, fault prediction, root cause analysis, and automated response for complex IT environments in AI computing centers, significantly enhancing operational efficiency and system reliability.

Empirical strength: the trusted choice of global customers. China-made AI hardware solutions have won widespread recognition worldwide.

Cambridge High-Performance Computing Laboratory: A leading European university’s high-performance computing laboratory has deployed an Atlas cluster based on Ascend 910 chips in its interdisciplinary research platform. The cluster handles important AI inference tasks in astrophysical simulations and biomolecular dynamics research. The laboratory’s technical director reported: “The cluster’s single-precision floating-point peak performance reached 92% of the expected target, with energy efficiency significantly outperforming our existing aging GPU nodes, demonstrating satisfactory competitiveness in specific computationally intensive tasks.”

Southeast Asian Fintech Platform: A leading Southeast Asian fintech company adopted domestically produced AI servers integrated with Cambricon MLU chips for model inference when building its next-generation real-time anti-fraud engine. The company’s CTO stated: “The selection was primarily based on three factors: the outstanding single-card inference throughput met our low-latency requirements in high-speed trading scenarios; the highly attractive total cost of ownership (TCO) optimized our infrastructure investments; and the responsiveness of the localized technical team far exceeded that of our previous suppliers.” After the system went live, it successfully increased fraud transaction detection rates by 15% and reduced false positive rates by 8%.

National-level research institute: A national-level artificial intelligence research institute successfully built a fully domesticated thousand-card-level large-scale model training platform based on Kunpeng 920 CPU + Ascend 910 NPU + high-speed domestic Ethernet. The platform successfully completed the full training cycle of a Chinese large-scale model with hundreds of billions of parameters. The project leader noted: “The feasibility of this purely domestically produced technology stack—from chips, servers, and networks to operating systems and AI frameworks—has been fully validated. While there are still minor gaps compared to international top-tier solutions in terms of extreme performance optimization and the completeness of specific operator libraries, the overall synergistic effect exceeded expectations, with the average completion time for training tasks reaching 85% of the efficiency of international mainstream platforms. More importantly, it provides unprecedented autonomy and control.”

The future is here, and China’s AI hardware holds global value.

The thriving development of China’s AI hardware industry stems from the nation’s strategic resolve for technological self-reliance, the pull effect of its vast domestic market, and the relentless efforts of countless enterprises and researchers. Today, we proudly present to the world a fully autonomous, high-performance, and secure AI hardware product chain. This is not a reluctant “backup plan” under blockade, but a “second choice” that shines brightly after being forged through adversity—it provides the global digitalization process with diverse, powerful, and trustworthy new options. Embrace computing power equality and build a diverse future together. Choosing Chinese AI hardware means choosing an open, innovative, and shared intelligent new path!

