VMware demos 'bare-metal' performance from virtualized GPUs

207
0
ПОДЕЛИТЬСЯ

Is... is that why Broadcom wants to buy it?

The future of high-performance computing will be virtualized, VMware’s Uday Kurkure has told The Register.

Kurkure, the lead engineer for VMware’s performance engineering team, has spent the past five years working on ways to virtualize machine-learning workloads running on accelerators. Earlier this month his team reported “near or better than bare-metal performance” for Bidirectional Encoder Representations from Transformers (BERT) and Mask R-CNN, two popular machine-learning workloads, running on virtualized GPUs (vGPU) connected using Nvidia’s NVLink interconnect.

NVLink enables compute and memory resources to be shared across up to four GPUs over a high-bandwidth mesh fabric operating at 6.25GB/s per lane, compared to roughly 2GB/s per lane for PCIe 4.0. The interconnect enabled Kurkure’s team to pool 160GB of GPU memory from the Dell PowerEdge system’s four 40GB Nvidia A100 SXM GPUs.

“As the machine learning models get bigger and bigger, they don’t fit into the graphics memory of a single chip, so you need to use multiple GPUs,” he explained.

Support for NVLink in VMware’s vSphere is a relatively new addition. By toggling NVLink on and off in vSphere between tests, Kurkure was able to determine how large an impact the interconnect had on performance.

And in what should be a surprise to no one, the large ML workloads ran faster, scaling linearly with additional GPUs, when NVLink was enabled. Testing showed Mask R-CNN training running 15 percent faster in a twin-GPU NVLink configuration, and 18 percent faster when using all four A100s. The performance delta was even greater in the BERT natural language processing model, where the NVLink-enabled system performed 243 percent faster when running on all four GPUs.

What’s more, Kurkure says the virtualized GPUs were able to achieve the same or better performance compared to running the same workloads on bare metal.
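
For readers who want to put the quoted percentages side by side, here is a rough Python sketch that converts each "percent faster" figure into a throughput multiplier and the matching share of baseline runtime. It assumes "N percent faster" means throughput rose by N percent with NVLink enabled, which the article does not spell out. The helper names `speedup_factor` and `relative_runtime` are made up for illustration and do not come from VMware's tests.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert 'N percent faster' into a throughput multiplier.

    Assumption: 'faster' refers to throughput, so 15 percent -> 1.15x.
    """
    return 1.0 + percent_faster / 100.0


def relative_runtime(percent_faster: float) -> float:
    """Fraction of the baseline runtime that remains after the speedup."""
    return 1.0 / speedup_factor(percent_faster)


# Figures quoted in the article (NVLink enabled vs. disabled).
reported = {
    "Mask R-CNN, 2 GPUs": 15.0,
    "Mask R-CNN, 4 GPUs": 18.0,
    "BERT, 4 GPUs": 243.0,
}

for name, pct in reported.items():
    print(
        f"{name}: {speedup_factor(pct):.2f}x throughput, "
        f"{relative_runtime(pct):.0%} of baseline runtime"
    )
```

Under that reading, the BERT result is the standout: a 243 percent gain works out to well over three times the throughput, while the Mask R-CNN gains stay below 1.2x.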
