AI appliance vendor NeuReality has announced that its NR1-S solution significantly boosts the output of CPU- and GPU-based systems, reducing costs and improving energy efficiency, by taking over the work normally done by the CPU.
The news came in a blog post with the results of multiple tests. NeuReality compared its NR1-S inference appliance paired with Qualcomm Cloud AI 100 Ultra and Pro accelerators against traditional CPU-centric inference servers with Nvidia H100 or L40S GPU cards. The NR1-S demonstrated significantly improved cost savings and energy efficiency compared to the standard CPU-centric systems running standard AI apps. The tests were done using real-world scenarios involving natural language processing (NLP), automatic speech recognition (ASR), and computer vision (CV) commonly used in medical imaging, fraud detection, customer call centers, online assistants and more.
The NR1-S takes over the work normally done by the CPUs in the system, because the CPUs aren’t fast enough to handle all the data movement generated by the GPUs, said Iddo Kadim, CTO for NeuReality.
“Systems built with CPUs simply can’t support and scale the number of accelerators that are put in the system,” he said. “The CPU becomes a data-moving machine, and unfortunately, CPUs were built to compute, not to move a ton of data back and forth. There are a number of reasons why the CPU architecture basically becomes a bottleneck.”
The appliance takes over for the CPU, greatly enhancing the throughput and scalability of the attached accelerators. This allows them to run much faster and at greater utilization than they do in CPU-centric systems. When paired with Qualcomm’s AI 100 Ultra, NR1-S achieves up to 90% cost savings across various AI data types, such as image, audio and text. Along with the cost savings, the NR1-S shows up to 15 times better energy efficiency compared to traditional CPU-centric systems. NR1-S can also ensure 100% utilization of the integrated AI accelerators without the performance drop-offs or delays observed in today’s CPU-reliant systems, the vendor claims.
The tests also measured the energy efficiency of audio processing. In a voice-to-text test, the NR1-S converted seven seconds of audio using the same amount of energy that traditional CPU-centric systems needed for 0.7 seconds. This translates to a 10-fold increase in performance for the energy used, according to NeuReality. Another audio test showed NR1-S cutting the cost of processing 1 million audio seconds from 43 cents to only 5 cents.
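The arithmetic behind those two audio figures can be checked with a short Python sketch. It uses only the numbers quoted above; the function and variable names are illustrative assumptions, not part of NeuReality's published methodology.

```python
def efficiency_gain(baseline_audio_s: float, new_audio_s: float) -> float:
    """Ratio of audio seconds processed for the same amount of energy."""
    return new_audio_s / baseline_audio_s


def cost_reduction_pct(baseline_cents: float, new_cents: float) -> float:
    """Percentage drop in cost relative to the baseline."""
    return (baseline_cents - new_cents) / baseline_cents * 100


# Figures quoted in the article (hypothetical variable names).
gain = efficiency_gain(baseline_audio_s=0.7, new_audio_s=7.0)
saving = cost_reduction_pct(baseline_cents=43, new_cents=5)

print(f"Energy efficiency gain: {gain:.1f}x")                # 10.0x
print(f"Cost reduction per 1M audio seconds: {saving:.1f}%")  # 88.4%
```

The drop from 43 cents to 5 cents works out to a reduction of roughly 88%, which is close to the "up to 90%" savings figure the vendor cites, although the article does not say the two numbers come from the same test.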
The NR1-S works with existing accelerators, GPUs or otherwise, as long as they are PCI Express-based. It is a heterogeneous compute device that combines network and data movement optimization with a set of compute engines, which together take over the functions the CPU would otherwise handle.
The appliance comes with an SDK that converts the processing pipeline automatically, making it a plug-and-play deployment that requires no modifications to the hardware or software environment.
The NR1-S appliance is available now.
Source: Network World