SpaceX is building an AI training system in C, and claims it could be 10x faster than Google's JAX

By: Anton Kratiuk | today, 09:36

SpaceX has nearly completed version 1.0 of a custom AI training stack written entirely in C, a low-level programming language that sits far closer to raw hardware than the Python-based frameworks most AI labs rely on. The system is designed to run across a cluster of roughly 220,000 Nvidia GB300 accelerators connected via 800G networking. If the claimed performance holds up in real-world conditions, it could reshape how competitors think about AI training infrastructure.

The bare-metal bet

Most AI training today runs on frameworks like Google's JAX or Meta's PyTorch, which expose a Python interface and hand the heavy computation to lower-level compiled code. Those abstraction layers add convenience but can introduce overhead. SpaceX's approach strips them away. Writing in C gives engineers direct control over memory, scheduling, and communication, with fewer intermediary software layers between the training code and the hardware, although the code still has to go through Nvidia's drivers to reach the GPUs.

The hardware underneath is substantial. Each Nvidia GB300 NVL72 rack packs 72 Blackwell Ultra GPUs and 36 Grace CPUs, with 800 Gb/s networking per GPU via ConnectX-8. At 220,000 accelerators total, the cluster is one of the largest AI training installations announced to date.
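To put those figures in proportion, here is a rough back-of-the-envelope sketch in C. It assumes the cluster is built entirely from full 72-GPU NVL72 racks and that every GPU has its own 800 Gb/s link, as the numbers above suggest. The function names are illustrative and have nothing to do with SpaceX's code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of racks needed to hold `accelerators` GPUs at `per_rack`
 * GPUs per rack, rounded up to whole racks. */
uint64_t racks_needed(uint64_t accelerators, uint64_t per_rack)
{
    assert(per_rack > 0);
    return (accelerators + per_rack - 1) / per_rack;
}

/* Sum of all per-GPU link rates in Gb/s. This is a simple total of
 * NIC capacity, not a measure of usable or bisection bandwidth. */
uint64_t aggregate_gbps(uint64_t accelerators, uint64_t gbps_per_gpu)
{
    return accelerators * gbps_per_gpu;
}
```

Under those assumptions, racks_needed(220000, 72) comes to 3,056 racks, and aggregate_gbps(220000, 800) comes to 176,000,000 Gb/s, or about 176 Pb/s of combined link capacity. How much of that a training job can actually use depends on the network topology, which has not been disclosed.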

The stack places particular emphasis on pipeline parallelism, a technique that splits a model's layers into sequential stages, assigns each stage to a different group of chips, and streams small batches of data through them so the stages work at the same time. In clusters of this scale, minimizing communication latency between chips is where performance is won or lost.
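Why pipeline scheduling matters can be shown with a textbook formula. In a simple GPipe-style schedule, each stage sits idle while the pipeline fills and drains, and the idle share of time is (stages - 1) / (microbatches + stages - 1). The sketch below computes that fraction. It is a generic illustration of the technique, not SpaceX's scheduler, whose design has not been published, and the function name is made up for this example.

```c
#include <assert.h>

/* Idle ("bubble") fraction of a GPipe-style pipeline schedule, assuming
 * every stage takes the same time per microbatch and communication is free:
 *   (stages - 1) / (microbatches + stages - 1)                              */
double pipeline_bubble_fraction(unsigned stages, unsigned microbatches)
{
    assert(stages > 0 && microbatches > 0);
    return (double)(stages - 1) / (double)(microbatches + stages - 1);
}
```

With 4 stages and 4 microbatches, the chips are idle for 3/7 of the time, roughly 43%. Raising the count to 32 microbatches cuts that to 3/35, under 9%. Real schedulers interleave work more aggressively to shrink the bubble further, and communication delays between stages add to it, which is why latency between chips carries so much weight.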

Musk vs. Google — and the xAI angle

Elon Musk claims the new stack could be more than 10 times faster than JAX for large training runs, per IBTimes AU (May 28, 2026). That figure needs context: no independent benchmarks have been published yet, and JAX itself holds multiple MLPerf records. The 10x claim is SpaceX's own, not a third-party result.

The system's first major workload will be Grok, the AI model developed by xAI, which merged into SpaceX in February 2026. That consolidation puts Musk in control of the hardware cluster, the training software, and the model deployment in a single entity, a pattern that mirrors how Tesla handles its Full Self-Driving stack.

What it means for the AI race

SpaceX is entering a field where Google, Microsoft (with OpenAI on Azure GB300 clusters), and Meta have spent years optimizing their training infrastructure. A proprietary C-based stack could offer real efficiency gains at scale, but it also creates lock-in: unlike open frameworks, a closed system is harder for outside researchers to audit, benchmark, or build on.

For now, the 10x claim remains just that — a claim. Independent validation will determine whether SpaceX's bare-metal gamble pays off, or whether the abstraction layers everyone else uses exist for good reason.