Nvidia Lovelace
NVIDIA’S ADA LOVELACE ARCHITECTURE
Potentially the biggest jump in generational GPU performance ever
Partners including Asus, Colorful, Gainward, Galaxy, GIGABYTE, INNO3D, MSI, Palit, PNY, and ZOTAC will soon sell custom cards using their own unique designs and cooling solutions.
© NVIDIA
IN A TEASER leading up to the Ada Lovelace and RTX 40-series announcement, Nvidia had a phone number asking, “Tell us, how fast do you want to go?” Perhaps KITT—that’s the ‘Knight Industries Two Thousand’, if you weren’t around in the 1980s—has finally arrived in the real world, except now he’s a GPU instead of a souped-up Pontiac Trans Am. If you want to take Nvidia at its word, the apparent answer is two to four times faster than the RTX 3090 Ti.
However, there are a lot of caveats with those claims. In some scenarios, like with the new DLSS 3 algorithm and running extreme levels of ray tracing, that four-times difference in performance between the current generation RTX 3090 Ti and the RTX 4090 might materialize. In more typical gaming scenarios, especially without DLSS 3, we expect the gains to be smaller, but still substantial.
We’re going to take a deep dive into everything that makes Nvidia’s new RTX 40-series graphics cards tick. We have specifications on the first three announced models and their respective GPUs, access to the RTX 4090, and a whole load of technical documents. We also need to discuss pricing, the current GPU oversupply situation, and the end of Moore’s Law. So get comfortable, grab your favorite beverage, and let’s meet Ada Lovelace.
–JARRED WALTON
The Ada Streaming Multiprocessor packs third-generation RT cores and fourth-gen Tensor cores with enhanced GPU shaders.
© NVIDIA
DIALING UP CORE COUNTS AND CLOCK SPEEDS
Fundamentally, a big part of what makes a GPU fast comes from its core counts and clock speeds. With CPUs, there are lots of other factors to consider, and scaling to higher core counts quickly starts to deliver diminishing returns. Graphics workloads are another matter: scaling from dozens to hundreds to thousands of processing pipelines over the past two decades has generally resulted in directly proportional boosts in performance.
Check our Speeds and Feeds sidebar (below) for the raw specs of the three announced Ada Lovelace GPUs, though bear in mind that the announced cards have slightly different specs than we’re showing. We have a full review of the RTX 4090 on page 74 (spoiler: it’s good), and reviews of the RTX 4080 16GB and RTX 4080 12GB models will hopefully follow next month. Yes, there are two almost completely different 4080 models coming, but for now, let’s just talk theoretical specs.
The biggest and fastest of the new Ada Lovelace GPUs, AD102 increases raw core counts by 71 percent compared with the existing Ampere GA102. That alone should provide a huge jump in generational performance, but Nvidia has also tuned critical paths in the architecture to hit substantially higher clocks than previous generation architectures. All the graphics cards announced so far have reference clock speeds of 2.5GHz or more, and Nvidia has reached speeds of 2.85GHz and even 3.0GHz. Combine the two and you get a theoretical performance uplift of 130 percent.
Let that sink in for a moment. Assuming relatively similar architectures (and everything Nvidia has said so far indicates that most of the base functionality in Ada isn’t radically different from Ampere), the fastest RTX 40-series card could double what we’ve seen from the RTX 3090 Ti. Or stepping down to the new RTX 4080 12GB, it still has 35 percent more theoretical performance than the existing RTX 3080 10GB, while using 11 percent less power. There will of course be custom cards from Nvidia’s add-in board partners that come overclocked and use more power in pursuit of higher performance, but the baseline specs look very promising indeed.
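For readers who want to check the arithmetic behind those percentages, here is a rough sketch. It assumes theoretical throughput scales as shader cores times two operations per clock times boost clock. The core counts, the 1.86GHz and 1.71GHz Ampere boost clocks, the 2.61GHz RTX 4080 12GB clock, and the board power figures are publicly listed specs rather than numbers taken from this passage, so treat them as assumptions.

```python
def fp32_tflops(shader_cores: int, boost_ghz: float) -> float:
    """Theoretical FP32 throughput in TFLOPS.

    Assumes each shader core retires one fused multiply-add
    (counted as 2 operations) per clock cycle.
    """
    return shader_cores * 2 * boost_ghz / 1000.0


def percent_change(new: float, old: float) -> float:
    """Percentage difference of `new` relative to `old`."""
    return (new / old - 1.0) * 100.0


# Full-chip comparison. Assumed specs: 18,432 cores for AD102 and
# 10,752 for GA102; 2.5GHz is the article's reference-clock floor,
# 1.86GHz is the RTX 3090 Ti's listed boost clock.
core_uplift = percent_change(18432, 10752)
full_chip_uplift = percent_change(
    fp32_tflops(18432, 2.5), fp32_tflops(10752, 1.86)
)

# RTX 4080 12GB versus RTX 3080 10GB (assumed listed specs).
uplift_4080_12gb = percent_change(
    fp32_tflops(7680, 2.61), fp32_tflops(8704, 1.71)
)
power_change_4080_12gb = percent_change(285, 320)  # board power in watts

print(f"AD102 vs GA102 cores:       {core_uplift:+.0f}%")
print(f"AD102 vs GA102 throughput:  {full_chip_uplift:+.0f}%")
print(f"4080 12GB vs 3080 10GB:     {uplift_4080_12gb:+.0f}%")
print(f"4080 12GB vs 3080 power:    {power_change_4080_12gb:+.0f}%")
```

Under those assumptions, the numbers land on roughly 71 percent more cores, a 130 percent theoretical uplift for the full chip, and 35 percent more theoretical performance for the RTX 4080 12GB at about 11 percent less power. None of this says anything about real-world frame rates, which depend on far more than peak shader throughput.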
CASHING IN ON CACHE
The memory subsystem is the one area that might hold Ada Lovelace back. Nvidia says it has worked with Micron, the exclusive manufacturer of GDDR6X memory, to move to a new process node and reduce the power consumption of the memory. Clocks are also increasing in some cases, but not by much. So far, the RTX 4080 16GB has been announced with 22.4 Gbps GDDR6X, while the 4090 and 4080 12GB both stick with 21 Gbps memory, the same as the RTX 3090 Ti. But the memory interface width either stays the same, in the case of the 4090 versus the 3090/3090 Ti, or shrinks with the 4080 models.
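To see why interface width matters as much as memory speed, a quick sketch of the bandwidth math may help. Peak bandwidth is the per-pin data rate multiplied by the bus width, divided by eight to convert bits to bytes. The data rates for the new cards come from the text above; the bus widths and the RTX 3080 10GB's 19 Gbps figure are publicly listed specs that this passage does not state, so they are assumptions here.

```python
def bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s.

    data_rate_gbps: per-pin data rate in gigabits per second.
    bus_width_bits: total memory interface width in bits.
    """
    return data_rate_gbps * bus_width_bits / 8


# (per-pin data rate in Gbps, bus width in bits)
# Bus widths are assumed from public spec listings, not from the article text.
cards = {
    "RTX 4090": (21.0, 384),
    "RTX 4080 16GB": (22.4, 256),
    "RTX 4080 12GB": (21.0, 192),
    "RTX 3090 Ti": (21.0, 384),
    "RTX 3080 10GB": (19.0, 320),
}

bandwidth = {
    name: bandwidth_gbs(rate, width) for name, (rate, width) in cards.items()
}

for name, gbs in bandwidth.items():
    print(f"{name:<14} {gbs:7.1f} GB/s")
```

With those assumed widths, the RTX 4090 matches the RTX 3090 Ti on raw bandwidth, while both 4080 models come in below the RTX 3080 10GB despite equal or faster memory chips.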