Arm’s new Lumex compute subsystem brings a big boost in on-device AI, graphics, and general compute performance to flagship smartphones
As the smartphone market has matured, the workloads that consumers expect their tiny in-pocket computers to handle have increased drastically. Fortunately, chip designers continue to build faster processors that perform well across varied workloads without completely tanking battery life. Tonight, Arm introduced its Lumex Compute Subsystem (CSS) platform, which drives big improvements not only in general CPU workloads but also in on-device artificial intelligence and gaming.

Arm Lumex CPU Enhancements
The big news is that every part of the Lumex CSS has been purpose-built to make on-device AI better. The Lumex CPU cores implement Scalable Matrix Extension v2 (SME2) instructions that are built for the matrix operations that modern AI models need. While we believe that AI-specific neural coprocessors are going to remain a vital part of any mobile SoC, adding these instructions essentially turns the CPU block into an AI coprocessor on its own. Arm says that these new accelerated instructions will enable Arm licensees to bring AI devices to market faster with performance more akin to their desktop brethren.
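Arm's Scalable Matrix Extension is built around an outer-product-and-accumulate pattern: a matrix multiply is computed as a sum of rank-1 outer products added into a dedicated two-dimensional accumulator (called ZA in Arm's documentation). The plain-Python sketch below illustrates only that arithmetic pattern. It is not SME2 code or intrinsics, and the function name is hypothetical.

```python
def matmul_outer_product(a, b):
    """Multiply an m x k matrix `a` by a k x n matrix `b` (lists of lists)
    by summing k rank-1 outer products into an m x n accumulator.

    Illustrative only: hypothetical helper, not an Arm or SME2 API.
    """
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    if k != k_b:
        raise ValueError("inner dimensions must match")
    # The accumulator conceptually plays the role of SME's ZA tile storage.
    acc = [[0.0] * n for _ in range(m)]
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # column p of a
        row = b[p]                          # row p of b
        # Add one outer product (col x row) into the accumulator.
        for i in range(m):
            for j in range(n):
                acc[i][j] += col[i] * row[j]
    return acc


print(matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

Each pass of the outer loop touches the whole accumulator with one column and one row, which is why hardware that performs an entire outer product per instruction can speed up the dense matrix math common in AI models.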
Arm says Lumex CSS-based CPU architectures should be available up and down a customer’s product stack, from flagships to low-power, efficiency-focused devices. The various designs can power anything from a PC to a wearable with the smallest form factor. To handle all of that, Lumex CSS designs include four different core types.
At the top is C1-Ultra, the highest-performing core design, with a 25% single-thread performance increase over the previous-generation Cortex-X925. These are what you might think of as "prime" and "performance" cores, with the highest clock rates, the best performance, and the highest power usage. They are suited to large-model inference, AI-fueled photography features, and generative AI content.
Below that is C1-Premium, which Arm says packs C1-Ultra-class performance into a 35% smaller area. Most likely this comes at some cost to energy efficiency, which would limit clock speeds and therefore somewhat reduce performance.
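To put the 35% area figure in perspective, the short calculation below computes performance per unit of silicon area relative to C1-Ultra. It assumes, taking Arm's claim at face value, that C1-Premium matches C1-Ultra's performance; the helper name and the 90% example are hypothetical, included only to show how lower clocks would shrink the advantage.

```python
def perf_per_area_gain(area_reduction: float, relative_perf: float = 1.0) -> float:
    """Performance per area of a smaller core relative to a baseline core.

    area_reduction: fractional area saving (0.35 means 35% smaller).
    relative_perf: the smaller core's performance as a fraction of the
    baseline (1.0 assumes it fully matches the baseline).
    """
    if not 0.0 <= area_reduction < 1.0:
        raise ValueError("area_reduction must be in [0, 1)")
    return relative_perf / (1.0 - area_reduction)


# Equal performance in 65% of the area:
print(round(perf_per_area_gain(0.35), 2))       # 1.54
# Hypothetical case: 90% of C1-Ultra's performance in 65% of the area:
print(round(perf_per_area_gain(0.35, 0.9), 2))  # 1.38
```

Even if reduced clocks cost some performance, the smaller core can still deliver noticeably more performance per square millimeter, which is the trade-off a chip designer weighs when choosing between the two cores.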