Cerebras Systems, the maker of an AI “chip” the size of a pizza box, is making some impressive claims about its AI processing performance.
At the recent Supercomputing 24 show, Cerebras announced a breakthrough in molecular dynamics simulations: a single Cerebras CS-2 system with one Wafer Scale Engine-2 (WSE-2) chip achieved over 1.1 million simulation steps per second, 748x faster than what is possible on the Frontier supercomputer (which just lost its world’s fastest supercomputer title to newcomer El Capitan).
Not only that, but the result came from a single WSE-2 chip in a single CS-2 server unit occupying about 16U of rack space and consuming 27 kilowatts of power. Frontier spans rows of cabinets holding some 37,000 GPUs and CPUs and consumes 21 megawatts of power.
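To put those numbers in perspective, here is a quick back-of-the-envelope sketch using only the figures quoted above; the implied Frontier step rate and the energy-per-step ratio are derived here for illustration, not separately published benchmarks:

```python
# Back-of-the-envelope check on the figures quoted above. The implied
# Frontier step rate and the energy ratio are derived values, not
# numbers published by Cerebras or Oak Ridge.

cs2_steps_per_sec = 1.1e6          # CS-2 molecular dynamics result
speedup = 748                      # claimed advantage over Frontier

frontier_steps_per_sec = cs2_steps_per_sec / speedup
print(f"Implied Frontier rate: ~{frontier_steps_per_sec:,.0f} steps/sec")

cs2_power_w = 27e3                 # 27 kilowatts
frontier_power_w = 21e6            # 21 megawatts

# Energy spent per simulation step (joules/step) on each machine
cs2_j_per_step = cs2_power_w / cs2_steps_per_sec
frontier_j_per_step = frontier_power_w / frontier_steps_per_sec
print(f"Energy-per-step advantage: ~{frontier_j_per_step / cs2_j_per_step:,.0f}x")

# Cross-check against the 'two years of GPU work per day' quote below
print(f"748 days is about {748 / 365:.1f} years")
```

The 748x figure also lines up with the “two years’ worth of GPU-based simulation work every single day” claim quoted below, since 748 days is almost exactly two years.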
Even more impressive, this benchmark was run on the CS-2, an older Cerebras model. The CS-3 and its third-generation WSE chip are at least twice as fast as the CS-2.
The experiment was conducted in partnership with Sandia National Laboratories, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory.
“This new world record means that scientists can now complete two years’ worth of GPU-based simulation work every single day. This will greatly accelerate the rate of innovation derived from molecular simulations,” said Michael James, chief architect of advanced technologies and co-founder of Cerebras Systems, in a statement. “This critical breakthrough is ready to provide insights on material structure and function. When we extend our work to biomolecules, it will unlock new capabilities in protein folding, medicine, and drug development.”
The Wafer Scale Engine measures 8 inches by 8 inches, considerably larger than a typical GPU die of 1 to 1.5 inches on a side. Whereas a GPU has about 5,000 cores, the WSE has 850,000 cores and 40 GB of on-chip SRAM, which is 10 times faster than the HBM memory used with GPUs. That works out to 20 PB/sec of memory bandwidth and 6.25 petaflops of processing power on dense matrices (62.5 petaflops on sparse matrices).
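A few ratios fall straight out of that spec sheet; the bytes-per-flop figure below is computed here for illustration and is not a number Cerebras publishes:

```python
# Derived ratios from the WSE specs quoted above.

wse_cores = 850_000
gpu_cores = 5_000                # rough per-GPU figure cited in the article
print(f"Core count ratio: {wse_cores / gpu_cores:.0f}x")        # ~170x

mem_bw_bytes_per_sec = 20e15     # 20 PB/sec of on-chip SRAM bandwidth
dense_flops = 6.25e15            # 6.25 petaflops on dense matrices
sparse_flops = 62.5e15           # 62.5 petaflops on sparse matrices

# Memory traffic available per dense floating-point operation
print(f"Bytes per dense flop: {mem_bw_bytes_per_sec / dense_flops:.1f}")  # 3.2
print(f"Sparse vs. dense: {sparse_flops / dense_flops:.0f}x")             # 10x
```

That roughly 3.2 bytes of bandwidth per dense flop is the kind of headroom that keeps 850,000 cores fed; GPUs relying on off-chip HBM typically operate with far less memory bandwidth per unit of compute.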
In another benchmark, running inference on the Meta Llama 3.1-405B generative AI model, Cerebras produced 969 tokens per second, according to data from third-party benchmark firm Artificial Analysis, far outpacing the number-two performer, SambaNova, which generated 164 tokens per second. That makes Cerebras’s throughput roughly six times that of its closest competitor and 12 times faster than AWS’s AI instance.
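Checking the arithmetic on those claims (the AWS figure is implied by the 12x claim rather than stated outright):

```python
# Cross-checking the throughput ratios in the Llama 3.1 405B benchmark.
# The AWS number is inferred from the '12x' claim, not reported directly.

cerebras_tps = 969
sambanova_tps = 164

print(f"vs. SambaNova: {cerebras_tps / sambanova_tps:.1f}x")    # ~5.9x
print(f"Implied AWS rate: ~{cerebras_tps / 12:.0f} tokens/sec") # ~81
```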
Cerebras isn’t shy about the secret to its success. According to James Wang, director of product marketing at Cerebras, it’s the giant Wafer Scale Engine with its 850,000 cores that can all talk to each other at high speeds.
“Supercomputers today are great for weak scaling,” said Wang. “You can do more work, more volume of work, but you can’t make the same work go faster. Typically it tapers out at the max number of GPUs you have per node, which is around eight or 16, depending on configuration. Beyond that, you can do more volume, but you can’t go faster. And we don’t have this problem. We literally, because our chip itself is so large, move the strong scaling curve up by one or two orders of magnitude.”
Inside a single server with eight GPUs, the GPUs use NVLink to share data and communicate, so they can be programmed to look roughly like a single processor, Wang adds. But beyond eight GPUs, in any supercomputer configuration, the interconnect changes from NVLink to InfiniBand or Ethernet, and at that point, “they can’t be programmed like a single unit,” Wang says.
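Wang’s strong-versus-weak-scaling argument can be sketched with a toy Amdahl-style model; the cost constants here are illustrative assumptions, not measurements from any of the systems discussed:

```python
# A toy model of the weak- vs. strong-scaling distinction Wang describes.
# All constants are illustrative assumptions, not measured values.

def job_time(parallel_work, n_gpus, comm_overhead=1.0):
    """Runtime = the perfectly parallel part, which shrinks as GPUs are
    added, plus a communication/serial part that does not shrink."""
    return parallel_work / n_gpus + comm_overhead

# Strong scaling: the SAME job on more GPUs. Gains taper toward the
# fixed communication floor -- 'you can't make the same work go faster.'
for n in (1, 8, 16, 64, 256):
    print(f"{n:>4} GPUs -> time {job_time(100.0, n):6.2f}")

# Weak scaling: grow the job with the GPU count. Total volume rises,
# but time per job never drops below that same floor.
for n in (8, 64, 256):
    print(f"{n:>4} GPUs -> time {job_time(100.0 * n / 8, n):6.2f}")
```

In this model, a single chip with fast on-die communication effectively lowers the overhead floor, which is how Wang frames the WSE moving the strong-scaling curve up by one or two orders of magnitude.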
Earlier this month, Cerebras announced that Sandia National Laboratories is deploying a Cerebras CS-3 testbed for AI workloads.
The system, nicknamed Kingfisher, will start out as a cluster of four CS-3 systems and eventually expand to eight. The Kingfisher cluster will be used for both traditional HPC simulation work and generative AI for the U.S. Department of Energy.
Source: Network World