AMD RX 7900 XTX

December 12, 2022

AMD’s fastest graphics card, powered by chiplets.
The AMD Radeon RX 7900 XTX has a lot going for it. We’re used to seeing GPU generations that arrive on smaller process nodes, redesigned architectures, larger caches, reworked shaders, more memory—the list goes on. But all of that, all at once? That’s what RDNA 3 delivers: the whole lot in one fell swoop.
The RX 7900 XTX is the best example of the all-encompassing upgrade to the Radeon DNA, and it’s a mighty 4K card for those improvements. One of the new things introduced with this graphics card may be among the most significant changes in manufacturing and design that we’ve seen in a very long time: a chiplet-based GPU. I feel I’ve been waiting for years to say that in reference to a gaming card, and yet this past week I’ve actually been playing games on a chip that’s made up of interconnected silicon dies all working together seamlessly as one.
The RX 7900 XTX does parachute in as a prohibitively expensive card at $999, though. It’s certainly not for everyone at that price. The $899 RX 7900 XT sits below this card and offers a pared-back spec for less, but not by much. At the very least you are getting the best of the best for your money with the RX 7900 XTX, and sort-of price parity with the previous generation, which can’t be said for its fiercest competition from team green.
With AMD and Nvidia focusing on the high end with their first cards of a new generation, it comes down to where best to spend that $1,000 or more you have lying around. Is it AMD’s new chiplet GPU that will win you over, or will you find the allure of the more stable performance of Nvidia’s Ada Lovelace to be worth the extra cash? The answer, as ever, is complicated.
AMD RX 7900 XTX architecture
(Image credit: Future)
What’s new in the RDNA 3 architecture?
The Navi 31 GPU powering the RX 7900 XTX is pretty special. Rather than stuffing ever-growing amounts of circuitry onto a single slab of silicon—the go-to approach for, well, every modern gaming GPU thus far—the Navi 31 GPU is actually seven discrete, smaller slabs of silicon.
That’s seven individual chips all working together as one to provide a high-end ultra-enthusiast gaming experience: one Graphics Compute Die (GCD) and six Memory Cache Dies (MCDs).
Think of RDNA 3 as a translation of what AMD’s achieved with chiplets in the CPU world with its Ryzen chips: two chiplets connected by a high-bandwidth interconnect. The thing is, when it comes to the graphics world you need an even bigger interconnect and even more chips chock-full of transistors to get the job done.
Let’s start with the biggest chip of the lot, the GCD. The GCD is the largest component inside the Navi 31 GPU at 300mm². Largest is relative, though: it is tiny next to a monolithic GPU of a similar calibre today. Consider the RTX 4090’s gargantuan GPU at 608mm² and AMD’s new GCD sounds paltry by comparison.
The GCD is the only 5nm component in the Navi 31 GPU, built using TSMC’s N5 process node.
(Image credit: AMD)
Within the GCD lie the foundational computational cores that power the AMD RDNA 3 gaming experience, the Compute Units (CUs). Much like in RDNA 2, these are divided up into Dual Compute Units, which means they share access to cache and memory, though most of their constituent parts are discrete to each CU.
Each CU contains 64 Stream Processors (SPs). These are the driving force behind rasterised gaming performance on any AMD GPU. From the AMD Dual CU diagram you can see a basic outline of the upgraded CU in all its glory.
(Image credit: AMD)
With RDNA 3, AMD boasts a 2.7x increase in shader FLOPS (floating point operations per second) versus RDNA 2. That’s despite fitting the RX 7900 XTX with only 20% more cores than the RX 6950 XT. ‘How’s that achieved?’ I hear you ask. Well, it comes down to greater power and silicon management, via clock and utilisation improvements, but also a significant change to the RDNA 3 shader core.
With RDNA 3, each SP has seen a significant overhaul, leading to much enhanced instruction throughput.
Within a CU there are two important blocks for chomping through maths to render a frame: Float / INT / Matrix SIMD32 and Float / Matrix SIMD32. There are other new things stuffed in there, like the AI Matrix Accelerator, but it’s these blocks, essentially huddles of Arithmetic Logic Units (ALUs), that make this RDNA 3 graphics card tick. These chomp through two types of number format: floating point and integer.
One new thing within the Navi 31 GPU is the inclusion of the second block, the one named Float / Matrix SIMD32, which enables double the floating point performance where applicable. That’s pretty important, as our much beloved videogames require shifting a lot of these numbers all at once to manifest a frame.
That means two things. One: there are actually double the number of shaders within an RDNA 3 CU compared to RDNA 2. AMD has done something similar here to what Nvidia did with its Ampere architecture. Two: we can’t compare SP counts as we’d like to between the RX 7900-series cards and the RX 6000-series cards, because what constitutes an SP no longer matches up gen-on-gen.
But you might remember that when Nvidia doubled its FP32 capabilities with Ampere it decided to double the listed CUDA cores for all of its Ampere and more recent GPUs. However, AMD has not done the equivalent for RDNA 3. I’ve tried asking AMD why it made the decision to stick with its definition of a core with RDNA 3, but I didn’t receive a straightforward answer. My only assumption is that AMD wishes to stick to its definition of a core being a single SP, rather than being a count of specific FP32 units. Fair enough—I’ve just seen marketing departments make more of a lot less.
The key thing to note is that RDNA 3 is actually capable of a lot more than its core count lets on. Both in terms of FP32 throughput and AI acceleration—remember those extra ALUs are also handy for Matrix operations.
AMD says all-in, the new CU design leads to “approximately 17.4% architectural improvement clock for clock.”
In addition to the redesigned CU, the new discrete GCD in Navi 31 has received increased cache sizes (240% more L0, 300% more L1, and 50% more L2) and higher clock speeds to deliver further improvements to the RDNA 3 shader pipeline.
Actually, clock speeds are specifically noteworthy. AMD has splintered its internal clocks with RDNA 3. There’s now a shader clock and a front-end clock. According to AMD, RDNA was more front-end limited in gaming workloads, so the change benefits us gamers by allowing the front-end clock to run a little faster at 2.5GHz than the shader clock at 2.3GHz.
(Image credit: AMD)
Now let’s talk about ray tracing. AMD has a bead on what is required to build an efficient accelerator for this purpose with its own aptly-named Ray Tracing Accelerators introduced with RDNA 2. These dedicated RT blocks are back and better than ever with RDNA 3. Included with them is new hardware for specialised box sorting, aimed at pushing efficiency up for ray-traced workloads. Along with that come better utilisation techniques to extract more out of the improved RDNA 3 GCD hardware.
It’s all for a pretty serious finessing of the entire RT pipeline in RDNA 3, and I have to say it’s clearly working for AMD. We’ll get to the performance, but AMD’s focus on reducing necessary bandwidth requirements, and the pressure on ALUs to perform RT tasks, has put it on a better footing versus the competition in ray-traced games.
Right, that’s the GCD, or some of it, but there are also six MCDs in the Navi 31 package.
(Image credit: AMD)
Each MCD is a portion of the memory subsystem of the RDNA 3 GPU. It’s actually a rather simple looking component next to the more complex GCD. Each MCD contains a 16MB slice of the Infinity Cache (96MB total), along with a 4 x 16-bit memory interface (384-bit total) to hook into the 24GB of GDDR6 memory close by on the PCB.
Considering the RX 6950 XT’s slimmer 256-bit wide memory bus, that’s a big leap in actual memory bandwidth. Effective memory bandwidth is also up with Infinity Cache factored in.
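To put rough numbers on that leap, here is a minimal sketch of the arithmetic. The bus widths, MCD count and cache slice size come from the text above; the per-pin GDDR6 data rates (20Gbps for the RX 7900 XTX, 18Gbps for the RX 6950 XT) are assumptions taken from AMD’s public spec sheets rather than from this section, and the function names are purely illustrative.

```python
# Figures stated in the article.
MCD_COUNT = 6
CHANNELS_PER_MCD = 4        # "4 x 16-bit memory interface" per MCD
BITS_PER_CHANNEL = 16
CACHE_MB_PER_MCD = 16       # Infinity Cache slice per MCD
RX_6950_XT_BUS_BITS = 256

# Assumptions, not stated in this section: per-pin GDDR6 data rates in Gbps.
RX_7900_XTX_GBPS_PER_PIN = 20.0
RX_6950_XT_GBPS_PER_PIN = 18.0


def bus_width_bits(mcds: int, channels: int, bits: int) -> int:
    """Total memory bus width across all MCDs."""
    return mcds * channels * bits


def bandwidth_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak raw memory bandwidth: one pin per bus bit, 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


bus_bits = bus_width_bits(MCD_COUNT, CHANNELS_PER_MCD, BITS_PER_CHANNEL)
cache_mb = MCD_COUNT * CACHE_MB_PER_MCD
new_bw = bandwidth_gb_per_s(bus_bits, RX_7900_XTX_GBPS_PER_PIN)
old_bw = bandwidth_gb_per_s(RX_6950_XT_BUS_BITS, RX_6950_XT_GBPS_PER_PIN)

print(f"Bus width: {bus_bits}-bit, Infinity Cache: {cache_mb}MB")
print(f"RX 7900 XTX: {new_bw:.0f}GB/s vs RX 6950 XT: {old_bw:.0f}GB/s")
print(f"Raw bandwidth uplift: {new_bw / old_bw - 1:.0%}")
```

This only covers raw GDDR6 bandwidth. The effective figure with Infinity Cache hits factored in depends on hit rates, which this sketch does not attempt to model.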
Each MCD is built using TSMC’s 6nm process node, a decision aimed at reducing costs while maintaining performance. The cutting-edge 5nm node might be less power hungry or allow you to stuff more transistors into any given space, but it’s expensive and in high demand. Taking non-critical components off the expensive process node and dividing them between tiny 37mm² chips makes for a much more scalable and affordable option. This scalable chiplet-based approach worked for Ryzen, so I’m hopeful that the same sort of success will drive down costs here, too. That’s provided the lower cost translates to more in my pocket and not just more for AMD and its stakeholders, of course.
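As a back-of-the-envelope illustration of how the silicon splits between the two nodes, the sketch below adds up the die areas quoted in this article (a 300mm² GCD, six 37mm² MCDs, and the RTX 4090’s 608mm² die). The variable names are illustrative, and the figures are the rounded ones used here rather than exact die measurements.

```python
# Die areas as quoted in the article, in square millimetres.
GCD_AREA_MM2 = 300       # single 5nm Graphics Compute Die
MCD_AREA_MM2 = 37        # each 6nm Memory Cache Die
MCD_COUNT = 6
RTX_4090_DIE_MM2 = 608   # monolithic comparison point

mcd_total = MCD_COUNT * MCD_AREA_MM2
package_silicon = GCD_AREA_MM2 + mcd_total
share_on_6nm = mcd_total / package_silicon

print(f"Total MCD area: {mcd_total}mm2")
print(f"Total Navi 31 silicon: {package_silicon}mm2 "
      f"(RTX 4090 die: {RTX_4090_DIE_MM2}mm2)")
print(f"Share of silicon on the cheaper 6nm node: {share_on_6nm:.1%}")
```

Area is only a proxy for cost. Yield, wafer pricing and packaging overhead all matter too, and none of those are covered by the numbers in this review.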
But you can’t just slice off components and stick them next to each other and hope they magically communicate. Just like a Ryzen CPU requires Infinity Fabric, a Radeon GPU requires an interconnect. And one with heaps of bandwidth at its disposal.
(Image credit: AMD)
Sticking to its theme, AMD is introducing Infinity Links with RDNA 3. Infinity Links are a way to transfer heaps of data at a rapid pace between chiplets on the Navi 31 GPU.
Infinity Links is the part of RDNA 3 that AMD’s engineers seemed the most excited to talk about during our briefings ahead of launch. And I get why. The problem was the routes of tiny wires hooking up parts of the chip package were, by comparison to everything around them, far too large to allow for effective communication across chiplets within the GPU. To solve this, AMD’s engineers tasked themselves with the job of creating a new fanout technology, and that’s Infinity Links.
“The bandwidth density that we achieve is almost 10x,” AMD’s Sam Naffziger tells us in a briefing. “And that’s with the bit rates that you can see here, 9.2 gigabits per second signalling across these interfaces. And with the finer pitches of the bumps, and single routes, we get a dramatic increase in bandwidth density, which is exactly what the GPU needs.”
(Image credit: AMD)
Above: The traditional organic fanout is on the left, taken from one of AMD’s server products. AMD didn’t specify which, but presumably an Epyc processor. The new Infinity Link fanout is on the right. The images are roughly to scale according to AMD.
The result is peak bandwidth of 5.3TB/s across the new interconnect, which AMD says it can offer for less than 5% of GPU power consumption.
Infinity Links completes the trifecta of parts that ultimately make up the Navi 31 GPU. Of course, I’m grossly oversimplifying swathes of components of the architecture, but I listened to one of AMD’s best brains, Mike Mantor, talk RDNA 3 for an hour and it felt like we hardly scratched the surface. I never really stood a chance of explaining it all here. Though what I can do is actually play games on this thing, and faster frame rates in-game are ultimately what it’s all leading up to.
AMD RX 7900 XTX specs
(Image credit: Future)
What’s inside the RX 7900 XTX?
The RX 7900 XTX is the most powerful graphics card AMD has ever produced. It’s no surprise that it brings with it an almighty spec sheet to drive significant gains in gaming over the previous generation. From 24GB of memory squeezed onto its surprisingly slim hull, to the 61 TFLOPs performance it’s said to deliver in raw compute, the RX 7900 XTX is a beast by all measures.
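For anyone wondering where a figure like 61 TFLOPs comes from, the sketch below reconstructs it from the shader layout described in the architecture section. The 64 SPs per CU and the doubled FP32 issue per SP come from this article. The 96 CU count, the 2.5GHz boost clock and the convention of counting a fused multiply-add as two operations are assumptions drawn from AMD’s public spec sheet and common practice, not from this section.

```python
# From the article: 64 Stream Processors per CU, and a second SIMD32 block
# that doubles FP32 throughput per SP "where applicable".
SP_PER_CU = 64
FP32_ISSUE_PER_SP = 2

# Assumptions, not stated in this section.
COMPUTE_UNITS = 96        # RX 7900 XTX CU count per AMD's public spec
BOOST_CLOCK_GHZ = 2.5     # boost clock used for peak throughput
FLOPS_PER_FMA = 2         # a fused multiply-add counts as two operations

stream_processors = COMPUTE_UNITS * SP_PER_CU
fp32_lanes = stream_processors * FP32_ISSUE_PER_SP

# lanes * ops per lane per clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
peak_tflops = fp32_lanes * FLOPS_PER_FMA * BOOST_CLOCK_GHZ / 1000

print(f"Listed stream processors: {stream_processors}")
print(f"Effective FP32 lanes: {fp32_lanes}")
print(f"Peak FP32 throughput: {peak_tflops:.2f} TFLOPS")
```

This is a theoretical peak that assumes every SP dual-issues on every clock, which real game workloads will not sustain. It also shows why the listed core count undersells the card: the spec sheet counts stream processors, not FP32 lanes.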