Oracle’s RoCE networking links GPUs through isolated planes to cut latency
Oracle claims its Zettascale10 system can hit 16 zettaFLOPS peak
The project uses around 800,000 Nvidia GPUs spread across data centers
OpenAI’s Stargate cluster in Texas runs on Oracle’s new infrastructure
Oracle has announced what it calls the largest AI supercomputer in the cloud, the OCI Zettascale10.
The company claims the system can deliver 16 zettaFLOPS of peak performance across 800,000 Nvidia GPUs.
That output, when divided, equals about 20 petaflops per GPU, roughly matching the Grace Blackwell GB300 Ultra chip used in high-end desktop AI systems.Network design for large-scale AI workloads
Oracle says the platform is the foundation for OpenAI’s Stargate cluster in Abilene, Texas, built to handle some of the most demanding AI workloads now emerging in research and commercial use.
“The highly scalable custom RoCE design maximizes fabric-wide performance at gigawatt scale while keeping most of the power focused on compute…,” said Peter Hoeschele, vice president, Infrastructure and Industrial Compute, OpenAI.
At the core of the Zettascale10 system is Oracle Acceleron RoCE networking, designed to increase scalability and reliability for data-heavy AI operations.
This architecture uses network interface cards as mini switches, linking GPUs across several isolated network planes.
Home
United States
USA — software Oracle claims to have the largest AI supercomputer in the cloud with...