
Nvidia Turing GPU deep dive: What's inside the radical GeForce RTX 2080 Ti


Nvidia’s radical Turing GPU brings RT and tensor cores to consumer graphics cards along with numerous other architectural changes. We dig into the TU102 GPU inside the GeForce RTX 2080 Ti.
It’s time to pull back the curtain on the Turing GPU inside Nvidia’s radical new GeForce RTX 20-series, the first-ever graphics cards designed to handle real-time ray tracing thanks to the inclusion of dedicated tensor and RT cores. But the GeForce RTX 2080 and RTX 2080 Ti were also designed to significantly improve performance in traditionally rendered games, with enough power to feed those blazing-fast 4K, 144Hz G-Sync HDR gaming monitors.
Nvidia revealed plenty of numbers during the GeForce RTX 2080 Ti announcement. Clock speeds, memory bandwidth, CUDA core counts—it was all there. This deeper dive explains the underlying architectural changes that make Nvidia’s Turing GPU more potent than its Pascal predecessor. We’ll also highlight some new Nvidia tools that developers can embrace to speed up performance even more, or bring the AI-boosted power of Nvidia’s Saturn V supercomputer into your graphics card.
Before we dig in, here’s a high-level specifications overview for the Turing TU102 GPU inside the flagship GeForce RTX 2080 Ti.
Nvidia’s TU102 GPU, found inside the GeForce RTX 2080 Ti.
Here’s Nvidia’s high-level overview, in case what you’re looking at isn’t clear:
“The TU102 GPU includes six Graphics Processing Clusters (GPCs), 36 Texture Processing Clusters (TPCs), and 72 Streaming Multiprocessors (SMs). Each GPC includes a dedicated raster engine and six TPCs, with each TPC including two SMs. Each SM contains 64 CUDA Cores, eight Tensor Cores, a 256 KB register file, four texture units, and 96 KB of L1/shared memory which can be configured for various capacities depending on the compute or graphics workloads… Tied to each memory controller are eight ROP units and 512 KB of L2 cache.”
You’ll also find a single RT processing core within each SM, so a full TU102 contains 72 of them (the GeForce RTX 2080 Ti ships with 68 of those SMs enabled). Because the RT and tensor cores are baked right into each streaming multiprocessor, the lower you go in the GeForce RTX 20-series lineup, the fewer you’ll find of each. The RTX 2080 has 46 RT cores and 368 tensor cores, for example, and the RTX 2070 will have 36 RT cores and 288 tensor cores.
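To see how those per-SM figures scale up, here’s a quick Python tally based on the numbers quoted above and the announced SM counts for each card (72 SMs for a full TU102, 68 for the RTX 2080 Ti, 46 for the RTX 2080, 36 for the RTX 2070). It’s simple arithmetic on the published figures, nothing more:

```python
# Per-SM resources as described in Nvidia's TU102 overview above.
PER_SM = {
    "cuda_cores": 64,
    "tensor_cores": 8,
    "rt_cores": 1,
    "texture_units": 4,
    "l1_shared_kb": 96,
}

# Announced SM counts for the full chip and the retail configurations.
CONFIGS = {
    "Full TU102": 72,
    "RTX 2080 Ti": 68,
    "RTX 2080": 46,
    "RTX 2070": 36,
}

for name, sm_count in CONFIGS.items():
    totals = {resource: sm_count * amount for resource, amount in PER_SM.items()}
    print(f"{name}: {sm_count} SMs -> "
          f"{totals['cuda_cores']} CUDA cores, "
          f"{totals['tensor_cores']} tensor cores, "
          f"{totals['rt_cores']} RT cores")
```

Running the tally reproduces the counts in the lineup above: 368 tensor cores and 46 RT cores for the RTX 2080, 288 and 36 for the RTX 2070.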
Let’s explain the improvements to the long-established stuff before digging into the exotic new tensor and RT cores.
Nvidia says the GeForce RTX 2080 can be roughly 50 percent faster than the GTX 1080 in traditional games. Many of the comparisons occur in games with HDR enabled, which take a performance hit on current GTX 10-series cards. The GeForce RTX 2080 can be more than twice as fast as the GTX 1080 in games that support Nvidia’s DLSS technology, Nvidia claims (we’ll talk more about DLSS later), and surpass 60 frames per second in several triple-A games at 4K resolution with HDR visuals enabled.
Nvidia also rejiggered how the memory caches inside its streaming multiprocessors work. Now, smaller SMs each feed into a unified pool of L1 and shared memory, which in turn feeds into an L2 cache that’s twice as large as before. The shake-up means Turing has almost three times more L1 memory available than the Pascal GPUs in the GTX 10-series, with twice as much bandwidth and lower latency.
But games aren’t bound by shading performance alone. Memory bandwidth can directly affect how well your games play. Turing improves upon Pascal’s superb memory compression technology, and the GeForce RTX 2080 and 2080 Ti build atop that with the introduction of Micron’s next-gen GDDR6 memory—the first time it’s appeared in a GPU. GDDR6 blazes along at 14Gbps despite being 20 percent more power-efficient than GDDR5X, and Nvidia optimized Turing’s RAM for 40 percent lower crosstalk than in its predecessor.
The grab-bag of improvements gives the RTX 2080 Ti a 50-percent increase in effective memory bandwidth over the GTX 1080 Ti, Nvidia says. In real-world terms, the GeForce RTX 2080 Ti hits a total memory bandwidth of 616GBps, versus the GTX 1080 Ti’s 484GBps, even though both cards offer identical memory capacities and bus sizes. That’s the power of GDDR6.
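Those peak bandwidth figures fall straight out of the per-pin data rate and the 352-bit memory bus both cards share. A quick back-of-the-envelope check in Python:

```python
# Peak memory bandwidth = per-pin data rate (Gbps) * bus width (bits) / 8 bits per byte.
def peak_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    return data_rate_gbps * bus_width_bits / 8

# GeForce RTX 2080 Ti: 14 Gbps GDDR6 on a 352-bit bus.
print(peak_bandwidth_gb_per_s(14, 352))  # 616.0 GB/s
# GeForce GTX 1080 Ti: 11 Gbps GDDR5X on the same 352-bit bus.
print(peak_bandwidth_gb_per_s(11, 352))  # 484.0 GB/s
```

The raw jump from 484GBps to 616GBps is about 27 percent; the 50-percent figure Nvidia quotes is for effective bandwidth, which also folds in the improved memory compression.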
As with most major GPU architecture launches, Nvidia also introduced some new shading technologies that developers can take advantage of to improve performance, visuals, or both.
Mesh shading helps take some of the burden off your CPU during very visually complex scenes, with tens or hundreds of thousands of objects. It consists of two new shader stages. Task shaders perform object culling to determine which elements of a scene need to be rendered. Once that’s decided, mesh shaders determine the level of detail at which the visible objects should be rendered. Ones that are farther away need a much lower level of detail, while closer objects need to look as sharp as possible.
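The task/mesh split is easiest to picture as a two-stage pipeline. The Python below is purely a conceptual sketch, not real shader code: a hypothetical cull step plays the role of the task shader, and a hypothetical LOD-selection step stands in for the mesh shader, with the distance thresholds chosen only for illustration.

```python
# Conceptual sketch of the two new shader stages (illustrative only, not the real API).
from dataclasses import dataclass

@dataclass
class SceneObject:
    distance: float    # distance from the camera
    in_frustum: bool   # does the object fall inside the view frustum?

def task_stage(objects):
    """Task-shader analogue: cull objects that don't need rendering at all."""
    return [obj for obj in objects if obj.in_frustum]

def mesh_stage(obj):
    """Mesh-shader analogue: pick a level of detail based on distance."""
    if obj.distance < 50:
        return "high-detail mesh"
    elif obj.distance < 500:
        return "medium-detail mesh"
    return "low-detail mesh"

scene = [
    SceneObject(distance=10, in_frustum=True),    # close and visible
    SceneObject(distance=800, in_frustum=True),   # distant and visible
    SceneObject(distance=30, in_frustum=False),   # behind the camera, culled
]

for visible in task_stage(scene):
    print(mesh_stage(visible))
```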
Nvidia showed off mesh shading with an impressive, playable demo where you flew a spaceship through a massive field of 300,000 asteroids. The demo ran around 50 frames per second despite that gargantuan object count because mesh shading reduced the number of drawn triangles at any given point down to around 13,000, from a maximum of 3 trillion potential drawn triangles. Intriguing stuff.
Variable rate shading is sort of like a supercharged version of the multi-resolution shading that Nvidia’s supported for years now. Human eyes only see the focal points of what’s in their vision at full detail; objects at the periphery or in motion aren’t as sharp. Variable rate shading takes advantage of that to shade primary objects at full resolution, but secondary objects at a lower rate, which can improve performance.
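A rough way to picture that idea, as a conceptual Python sketch (the tile classification and rate labels are illustrative, not Nvidia's API): each region of the screen gets a shading rate depending on whether it holds a focal object, something in fast motion, or peripheral detail.

```python
# Illustrative mapping of screen tiles to shading rates (conceptual only).
# "1x1" = every pixel is shaded individually; "2x2" and "4x4" = one shading
# result is shared across a block of pixels.
def shading_rate(tile):
    if tile["is_focal"]:
        return "1x1"   # full-rate shading where the player is looking
    if tile["is_fast_moving"]:
        return "4x4"   # motion hides detail, so shade very coarsely
    return "2x2"       # peripheral, static content gets an intermediate rate

tiles = [
    {"name": "crosshair area", "is_focal": True,  "is_fast_moving": False},
    {"name": "roadside blur",  "is_focal": False, "is_fast_moving": True},
    {"name": "sky at edge",    "is_focal": False, "is_fast_moving": False},
]

for tile in tiles:
    print(tile["name"], "->", shading_rate(tile))
```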
Variable rate shading can also help in virtual reality workloads by tailoring the level of detail to where you’re looking. Another new VR tech, Multi-View Rendering, expands upon the Simultaneous Multi-Projection technology introduced with the GTX 10-series to allow “developers to efficiently draw a scene from multiple viewpoints or even draw multiple instances of a character in varying poses, all in a single pass.”
Finally, Nvidia also introduced Texture Space Shading, which shades an area around an object rather than a single scene to let developers reuse shading in multiple perspectives and frames.
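One way to think about texture space shading is as a cache of shading results keyed by texel rather than by screen pixel, so that two views or two frames touching the same texel reuse the work instead of repeating it. The Python below is a loose conceptual sketch under that assumption, not the actual rendering path:

```python
# Conceptual cache of shading results keyed by texel coordinate (illustrative only).
shaded_texels = {}
shading_invocations = 0

def shade_texel(u, v):
    global shading_invocations
    if (u, v) not in shaded_texels:
        shading_invocations += 1                 # expensive work happens once per texel
        shaded_texels[(u, v)] = f"color({u},{v})"
    return shaded_texels[(u, v)]

# Two "views" (say, left and right eye) sampling overlapping texels.
left_eye  = [(0, 0), (0, 1), (1, 1)]
right_eye = [(0, 1), (1, 1), (2, 2)]

for u, v in left_eye + right_eye:
    shade_texel(u, v)

print("texels sampled:", len(left_eye + right_eye))    # 6
print("texels actually shaded:", shading_invocations)  # 4
```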
For a standard GPU architecture, that’d be all you need to know.