NVIDIA BLACKWELL RTX DEEP DIVE
The biggest changes all revolve around AI
The RTX 5090 Founders Edition packs a whopping amount of compute and power into an impressively thin 2-slot card.
It took a bit longer than we were expecting, but Nvidia’s Blackwell RTX architecture and GPU family have finally arrived. We reviewed the RTX 5090 last issue, and test the RTX 5080 in this issue, but there’s a lot more to say about everything that’s going on under the hood.
Blackwell RTX differs in several key areas from the data center Blackwell architecture, just as the Hopper architecture differed from the Ada Lovelace design. As expected, AI and machine learning land front and center for all the new GPUs coming out this year, which can be both good and bad.
Besides the core architectural changes relative to the prior generation, Nvidia has overhauled its Founders Edition design, created new software tools and features, doubled down on frame generation as a way to ‘boost’ performance, and added new functionality that can help with professional workloads.
The RTX 50-series family of GPUs will carry the torch for the next two years of Nvidia graphics cards, give or take. Some of the technology is forward-looking and could take at least that long to see significant uptake by games and applications, while other features will have an almost immediate impact. So put on your thinking cap, and let’s discuss the ups and downs of the new architecture.
BLACKWELL RTX SPECIFICATIONS
The core specifications for the Blackwell RTX GPUs look much like those of the prior-generation Ada Lovelace architecture, outside of the top RTX 5090. GB202 represents a sizeable upgrade from AD102, literally: it’s 750 mm² compared to 608 mm², a 23 percent increase in die area. It also has up to 192 SMs, compared to a maximum of 144 SMs, a 33 percent increase that is largely accounted for by the larger die. Below GB202, however, everything seems far less impressive in terms of gen-on-gen upgrades.
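As a quick sanity check on the GB202 versus AD102 figures quoted above, the percentages can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `percent_increase` is our own, and the inputs are the die areas and maximum SM counts given in the text.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100.0

# Figures quoted in the article: AD102 -> GB202
die_area_growth = percent_increase(608, 750)   # die area in mm^2
sm_count_growth = percent_increase(144, 192)   # maximum SM count

# SMs per mm^2 of die, a rough density figure (illustrative only)
ad102_density = 144 / 608
gb202_density = 192 / 750

print(f"Die area growth: {die_area_growth:.1f}%")
print(f"SM count growth: {sm_count_growth:.1f}%")
print(f"SMs per mm^2: AD102 {ad102_density:.3f}, GB202 {gb202_density:.3f}")
```

The SM count grows faster than the die area, so the number of SMs per square millimeter rises slightly on the larger chip.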
GB203 tops out at 84 SMs (slightly more than the 80 in AD103), with the RTX 5080 using the fully enabled chip, while the RTX 5070 Ti trims things down to 70 active SMs. GB205 drops down to just 50 SMs at most, 48 of which are enabled on the RTX 5070. That is only two more than the 46 used in the RTX 4070, and AD104 offered up to 60 SMs.
There are supposed to be two more chips: GB206 for the RTX 5060 and 5060 Ti (including laptop variants), with up to 36 SMs, and GB207 for the RTX 5050 (desktop and laptop), with up to 20 SMs. It will be interesting to see if Nvidia actually launches a desktop 5050, considering it skipped a desktop RTX 4050 in the previous generation.
But across the GPU stack, excepting the biggest chip, the new Blackwell chips are roughly equal to or smaller than their predecessors. Does that mean we shouldn’t expect much in the way of performance improvements? The answer ends up being quite nebulous, as it depends heavily on how you want to quantify performance. Let’s delve deeper into what makes the Blackwell RTX architecture tick.
BLACKWELL’S AI ASPIRATIONS
At the heart of everything Blackwell, Nvidia has made some key changes. Superficially, it’s easy to look at the performance and specifications, and conclude that Blackwell is merely Ada version 2.0, or not even that (see TSMC 4N sidebar). But despite a lot of similarities, the two architectures are definitely not the same. Blackwell builds on Ada, just as Ada built on Ampere, and so on back to Nvidia’s first GPUs. Here’s an overview of the changes.
AI enhancements are a big deal, with the chief new addition being native support for FP4 and FP6 number formats on the tensor cores. FP4 requires half the memory space, and offers double the computational throughput compared to FP8. 4-bit floating point versus 8-bit floating point will do that, just as FP8 used half the storage and offered twice the throughput of FP16.
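To put the halving and doubling in concrete terms, here is a minimal sketch of how weight storage and idealized throughput scale with bit width. The 8-billion-parameter model size is a hypothetical example, not a figure from Nvidia, and the function names are our own. The throughput scaling is the idealized "half the bits, twice the rate" relationship described above; it ignores any per-block scaling metadata that real low-precision formats may carry.

```python
# Bit widths of the number formats discussed in the text
BITS = {"FP16": 16, "FP8": 8, "FP4": 4}

def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params values at the given bit width."""
    return num_params * bits // 8

def relative_throughput(bits: int, baseline_bits: int = 16) -> float:
    """Idealized throughput relative to the baseline format.

    Assumes throughput doubles each time the bit width halves.
    """
    return baseline_bits / bits

NUM_PARAMS = 8_000_000_000  # hypothetical model size, for illustration only

for name, bits in BITS.items():
    gib = weight_bytes(NUM_PARAMS, bits) / 2**30
    speedup = relative_throughput(bits)
    print(f"{name}: {gib:.1f} GiB of weights, {speedup:.0f}x FP16 throughput (idealized)")
```

Under these assumptions, each step down the list halves the memory footprint and doubles the peak math rate, which is why FP4 matters so much on cards with a fixed amount of VRAM.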