Data, not guessing: Looking at Nvidia's past GPUs to predict the specs for its future RTX 60-series graphics cards

Pcgamer.com

This year will be the tenth anniversary of the GeForce GTX 10-series, and since then Nvidia's gaming GPUs have undergone some fundamental changes to bring ray tracing and AI to the PC gaming masses. While compute performance, cache levels, and VRAM bandwidth are still key to getting high frame rates in games, today's GeForce graphics cards are far more versatile, capable, and complex than those from 2016.

But what of the future? What will Nvidia's next generation of gaming GPUs look like? With the chance of a Super refresh of Blackwell chips looking increasingly less likely, due to supply pressures on affordable VRAM, I've been spending some time mulling over what's next for Team Green.

To that end, I've looked back over 10 years' worth of GeForce cards, collated all the key information, and compared four tiers of models: 60-class, 70-class, 80-class, and the one at the very top of the chain. The latter is currently the 90-class, but with the GTX 10-series and RTX 20-series, it was known as the Titan.

I've got a few charts for you to peruse, and I'll discuss what each one can potentially tell us about the future. And then from all of this, a table of specs for the four primary tiers of RTX 60-series graphics cards I think we'll see in the near future.

Die size and process node

(Image credit: Taiwan Semiconductor Manufacturing Co., Ltd.)

As I'm sure you already know, Nvidia doesn't manufacture the GPUs and other products that it designs. For that, it hires the services of TSMC (Taiwan Semiconductor Manufacturing Company), partly because it has a long history with this company, but mostly because it's the world's biggest and best when it comes to churning out hulking chips at the cutting edge of processor technology.

Just like all chip makers do, TSMC uses the simple phrase 'process node' to describe the hugely complex sequence of steps it carries out to manufacture hundreds of thousands of silicon wafers, coated with all kinds of materials and etched with light, that eventually get sliced up into individual processor dies.

Nvidia employs one of TSMC's most advanced process nodes, N3, to make its Rubin AI behemoths, but the Blackwell gaming chips that power the RTX 50-series graphics cards are made via a custom version of the previous N5 node, called 4N. The RTX 40-series GPUs were also made on this node, but for the 30-series, Nvidia used Samsung's 8LPH.

(Image credit: Locuza / Fritzchens Fritz)

That was a bit of a surprise when it was announced, because before the Ampere generation, Nvidia's GTX 10- and RTX 20-series were both made on a custom TSMC N16 node, and the three GTX generations before those used TSMC's N28 process.

I mention all of this because the process node is the first port of call when making an educated guess as to what RTX 60-series chips will be like. With Nvidia so heavily invested in AI now, I suspect that it won't use TSMC's most cutting-edge node, N2, but will stick with N3 for cost reasons.

This is important to understand because it will determine the approximate die density of these future GPUs, i.e. the number of transistors per square millimetre of die area. Nvidia's Blackwell and Ada Lovelace gaming chips have roughly the same density, as they're made on the same node: around 120 million transistors/mm2.

Relative die density for the last five generations of Nvidia gaming GPUs

(Image credit: Future)

TSMC's N3 is reported to be in the region of 200 million transistors/mm² or so (higher with certain variants), and if that's what we can expect for the next round of RTX GPUs, then we're looking at a 66% increase in density. However, this doesn't automatically mean we'll see chips with 66% more shaders and cache.

That's because the die density figure is for logic only, the stuff that makes up the shader cores and other processing elements. For GPU parts such as cache and PCIe/VRAM circuitry, the increase in density is much smaller, around 5% at best. So while Nvidia can jam lots more CUDA cores into its next-gen GPUs, it's quite limited as to what it can do with cache and analogue systems.
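To put rough numbers on that, here's a back-of-envelope sketch. The 70% logic fraction and all three scaling figures are my own loose assumptions, taken from the estimates above rather than anything Nvidia or TSMC has published:

```python
# Back-of-envelope transistor budget for a same-sized die moving from 4N to N3.
# Logic scales well; SRAM (cache) and analogue (PCIe/VRAM PHY) barely shrink.
def projected_transistors(die_mm2, logic_frac, n4_density=120e6,
                          n3_logic_gain=1.66, sram_gain=1.05):
    """Estimate the transistor count of an N3 die the same size as a 4N one.

    logic_frac is the fraction of the die that is logic (shaders, etc.);
    the remainder is SRAM/analogue, which scales far more poorly.
    """
    n4_total = die_mm2 * n4_density
    logic = n4_total * logic_frac * n3_logic_gain
    other = n4_total * (1 - logic_frac) * sram_gain
    return logic + other

# A hypothetical 750 mm² die that is 70% logic:
n4 = 750 * 120e6
n3 = projected_transistors(750, 0.70)
print(f"4N: {n4/1e9:.0f}B transistors -> N3: {n3/1e9:.0f}B ({n3/n4 - 1:+.0%})")
```

Whatever the real logic fraction turns out to be, the blended whole-die gain always lands below the headline logic-density figure, which is exactly the constraint on cache and analogue systems described above.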

Nvidia has also steadily favoured using small dies for the majority of its gaming products, which helps to improve wafer yields (the percentage of dies from a wafer that can be used) and profit margins (smaller dies mean more dies per wafer).

The exception to this has been at the very top end of the GPU scale, with the RTX 5090's chip being fairly close to the maximum size that TSMC's equipment can make. The reason for this isn't about making the 'ultimate' gaming GPU: it's all about having a product for the prosumer AI market.

Relative die sizes for a selection of Nvidia GeForce RTX graphics cards

(Image credit: Future)

If you're wondering why the GTX 10-series chips were so small, it's because they were made on a heavily refined TSMC N16 process node that Nvidia had been working with for quite some time.

Anyway, as so much has changed since the GTX 10-series era of GPUs, it's hard to ascertain whether this trend will continue or if Nvidia's next chips will be substantially larger or smaller. My gut feeling is that, having refined its designs with Blackwell and Ada Lovelace, Nvidia will probably stick to using similar-sized dies for the RTX 60-series.

Putting all of this together suggests that we're going to see RTX 60-series GPUs with around 60-70% more transistors than in Blackwell chips, but still the same size. The next question to ponder is how Nvidia will spend that transistor budget.

CUDA cores and cache

With a stack more transistors to play around with, you'd think the first thing Nvidia would do would be to ramp up the number of CUDA cores (i.e. the 'shader' units), but historically that's not always been the case.

Relative number of CUDA cores in selected tiers of Nvidia GeForce GTX/RTX graphics cards

(Image credit: Future)

In the above chart, you can see that the CUDA count jumped up significantly with the switch to Samsung for the RTX 30-series, but other than the top-end models, the number of shaders in 60-, 70-, and 80-class graphics cards has barely changed. However, this chart is rather misleading, for two reasons.

First, not all shaders are equal, and second, not all shaders are clocked the same. Since the Pascal era of the GTX 10-series, CUDA cores have become increasingly more capable and flexible. Better process nodes and chip designs have substantially lifted clock speeds.

A more appropriate chart, though still a touch limited, is one that shows the peak FP32 throughput for each GPU. This is a measure of how many 32-bit floating-point operations the chip is capable of handling per second, one of the most common routines that takes place in 3D rendering.
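That peak figure follows from a simple formula: each CUDA core can retire one fused multiply-add (counted as two floating-point operations) per clock cycle. A quick sketch, using the RTX 5090's public specs as a check:

```python
# Peak FP32 throughput = 2 ops (one fused multiply-add) * CUDA cores * boost clock.
def peak_fp32_tflops(cuda_cores, boost_ghz):
    return 2 * cuda_cores * boost_ghz / 1000  # (ops/clock * cores * GHz) -> TFLOPS

# RTX 5090: 21,760 CUDA cores at a ~2.41 GHz boost clock.
print(f"{peak_fp32_tflops(21760, 2.407):.1f} TFLOPS")  # ~104.8, matching Nvidia's quoted spec
```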

Relative peak FP32 throughput for classes of GeForce graphics cards

(Image credit: Future)

At first glance, this chart might seem no different to the previous one, but if you look closely, you can see that there is a noticeable gap between the RTX 20 and 30-series, and again between the RTX 30 and 40/50-series, all because of big increases in clock speeds and changes to the CUDA cores themselves.

Generally speaking, Nvidia has squeezed out roughly similar levels of FP32 performance for a given die density across the past five generations of GPUs, with the exception of the RTX 30-series, which was noticeably higher. If you're wondering why the switch to TSMC N5 didn't make much difference in that respect, it's because RTX 40/50-series GPUs have vastly more L2 cache than all previous chips.

So much so that, for the chart below, I've had to use a logarithmic y-axis scale (base 2) in order to separate out the various GPUs enough for viewing.

Relative L2 cache levels in various GeForce RTX graphics cards

(Image credit: Future)

Where Pascal, Turing, and Ampere GPUs had to make do with a handful of megabytes of Level 2 cache, Nvidia took a leaf from AMD's RDNA 2 book and significantly increased the amount of last-level cache. Such large amounts of cache can be tricky to get right, in terms of capacity versus latencies, but the huge slices of SRAM go a long way in reducing the pressure on the VRAM bandwidth, as well as helping overall compute and ray tracing performance.

As already mentioned, due to how poorly SRAM scales with process node shrinks, Nvidia can't lob in a pile more cache without significantly increasing the die size. So it will probably stick to very similar amounts of L2 cache as used in Blackwell.

We should still see a healthy jump in the number of shaders, and thus FP32 throughput, but it's unlikely to be on the same scale as the increase in die density. For example, Ampere chips have an average density 81% higher than Turing chips, and on average, 180% more CUDA cores per unit of die area.

However, while Ada Lovelace GPUs are 173% denser than Ampere, in terms of transistors per square millimetre, the shader-units-per-die-area figure is only 57% larger on average.

I feel that Nvidia will err on the side of caution with its RTX 60-series, motivated by a desire to keep profit margins as high as possible, and that we'll see something like a 30 to 50% increase in the shader count, compared to Blackwell. Before I go all crystal-ball and attempt to predict the specs of the main RTX 60-series cards, though, there are a couple more factors to consider: ray tracing and AI.
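That estimate can be sketched out numerically. The scaling fractions here are purely illustrative brackets, loosely informed by the Ampere and Ada cases above, not measured values:

```python
# If the shader count grows at only a fraction of the raw density gain,
# a 66% density jump implies a much smaller bump in CUDA cores.
def projected_shader_gain(density_gain, scaling):
    """scaling = hypothetical fraction of the density gain realised as shaders."""
    return density_gain * scaling

for scaling in (0.5, 0.8):
    gain = projected_shader_gain(0.66, scaling)
    print(f"scaling {scaling:.1f}: ~{gain:+.0%} more CUDA cores")  # +33% and +53%
```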

Tensor and Ray Tracing cores

In the chart below, I've plotted Nvidia's quoted 'AI TOPS' figures for each GPU. This is a measure of the absolute peak throughput for the GPU's Tensor cores, as measured in trillions of operations per second, and on first impression, it would seem that RTX GPUs (GTX chips don't have these matrix/tensor units) are almost nothing but Tensor cores.

Relative peak AI TOPS for specific classes of GeForce graphics cards

(Image credit: Future)

However, the chart is rather misleading, because with each successive generation of Tensor cores, Nvidia has upgraded them not only to carry out more operations per second, but also to support more data formats. The respective AI TOPS figures are for the smallest, and thus quickest, format each GPU can handle. So the RTX 20-series figures are all for INT4, whereas for the RTX 50-series, it's FP4 with FP32 accumulate, using sparsity.

To address this, I spent some time calculating the relative figures for when INT8 is used, if only to have an even playing field. This particular data format isn't used in DLSS or gaming in general, but it's the one I can get data for most easily.

Relative peak INT8 performance of the tensor cores in Nvidia GPUs

(Image credit: Future)

Yes, the number of Tensor cores has increased in the top-end chips, but the units themselves have also become more capable, especially after the RTX 30-series. How far Nvidia will be able to push this with the RTX 60-series is anyone's guess, but I suspect the cores won't be any better in terms of operations per cycle; there will simply be more of them, thanks to the increased die density.

Not that this is a problem, as such. Believe it or not, the tiny RTX 5060 has an INT8 figure that's only 6% lower than an RTX 2080's, so getting more of those cores will certainly benefit the performance of DLSS, even though it mostly uses FP16 and FP8 for upscaling (the data format used in frame generation isn't clear, unfortunately).

(Image credit: Nvidia)

The one thing Nvidia can't do is throw a massive pile of Tensor cores into the RTX 60-series GPUs, even though they take up relatively little die space by themselves, compared to the entire Streaming Multiprocessor (SM) structure that houses the CUDA cores.

At least, not without increasing the size of the register file in the SM. From Pascal through to Blackwell, each one has a 256 kB register file (64 kB of SRAM per SM partition) that stores the data the CUDA cores process while grinding through an operation. The Tensor cores also use that register file, so adding more of them to an SM could potentially cause problems with running out of registers.

For its AI data center Blackwell chips, Nvidia solved this by adding a dedicated 256 kB cache in the SM, purely for the Tensor cores, and it's possible that it could do something similar for its RTX 60-series chips. DLSS 4.5 doesn't really load up the Tensor cores all that much, but DLSS 5 (whatever its final form looks like) may well be the polar opposite.

This annotated die shot of the RTX 4090's AD102 GPU shows just how much space the L2 cache takes up (Image credit: Nemez / Fritzchens Fritz)

And it's a similar story when it comes to ray tracing. Each SM in every RTX GPU is home to a single 'RT core', but what's there and what it's capable of has significantly changed over the generations. I don't have any charts to show you for this one, but one only needs to compare the RT cores in Blackwell to the first iteration units in Turing to see that there will be more to come.

Fortunately, all that kind of stuff is pure logic, rather than a big slab of SRAM, so although the next-gen RT cores will be even more potent, I don't expect them to take up any more space than they currently do, relative to the rest of the SM.

Putting it all together

                            RTX 6090    RTX 6080    RTX 6070    RTX 6060
Die size                    750 mm²     370 mm²     260 mm²     180 mm²
CUDA cores                  32,768      14,734      7,896       5,346
FP32 TFLOPS                 157         74          39          26
% increase to RTX 50 FP32   +50%        +31%        +25%        +37%
L2 cache                    120 MB      72 MB       48 MB       32 MB
VRAM                        32 GB       24 GB       16 GB       12 GB

Taking everything into account, and churning through various calculations in Excel, I've put down some ballpark figures for die size, CUDA count, L2 cache size, and VRAM capacity for the primary RTX 60-series models. I'm not suggesting that these are absolute figures or even targets; just think of them as 'zones' where I suspect the cards will fall.
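For what it's worth, the TFLOPS row is consistent with the usual peak-FP32 arithmetic (two FP32 operations per CUDA core per clock), if you assume boost clocks of roughly 2.4 to 2.5 GHz, i.e. RTX 40-series territory. Those clocks are my assumption, nothing more:

```python
# Sanity check on the table: peak FP32 = 2 * cores * boost clock.
# The boost clocks here are assumed, not leaked or confirmed figures.
cards = {
    "RTX 6090": (32768, 2.40),
    "RTX 6080": (14734, 2.50),
    "RTX 6070": (7896, 2.47),
    "RTX 6060": (5346, 2.43),
}
for name, (cores, ghz) in cards.items():
    print(f"{name}: {2 * cores * ghz / 1000:.0f} TFLOPS")  # 157, 74, 39, 26
```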

I think Nvidia will want to stick with die sizes similar to those it already uses with Blackwell, but there's a good chance it could go smaller, especially if TSMC ends up charging a small fortune to use its N3 process node. If that turns out to be the case, then the above CUDA core counts are clearly going to be maximums, and the final numbers could be a good deal lower.

Due to the SRAM's poor scaling with node shrinks, I don't think next-gen Nvidia GPUs for consumers will be packing much more, if any, L2 cache than RTX 50-series chips currently do. Perhaps a little more for the higher-class models, but nothing outlandishly big.

(Image credit: Future)

So far, I've not said a word about VRAM, and that's because the global memory supply crisis has made that very difficult to judge. I do think that Nvidia will want to raise the capacities in some tiers, by using 3 GB GDDR7 modules instead of the usual 2 GB chips, if only to stave off some of the flak it got with the RTX 5060 and 5060 Ti.

However, I don't foresee any changes of note to the aggregated memory bus widths; i.e. the RTX 6060 will still be 128 bits, the RTX 6070 will be 192 bits, and so on. The reason for this is that it helps to keep the number of VRAM modules required to the bare minimum, which in turn helps profit margins.
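The arithmetic behind that: GDDR7 modules have a 32-bit interface, so the bus width fixes the module count, and the module count times the per-module capacity gives the VRAM options. The 28 Gbps per-pin speed below is an assumption for illustration; shipping speeds may well differ:

```python
# Module count, capacity options, and bandwidth from the memory bus width.
def vram_options(bus_bits, gbps_per_pin=28):
    modules = bus_bits // 32                 # one GDDR7 module per 32 bits of bus
    return {
        "modules": modules,
        "capacity_2GB": modules * 2,         # GB, using 2 GB modules
        "capacity_3GB": modules * 3,         # GB, using 3 GB modules
        "bandwidth_GB_s": bus_bits * gbps_per_pin / 8,
    }

print(vram_options(128))  # 128-bit bus: 4 modules, 8 or 12 GB, 448 GB/s
print(vram_options(192))  # 192-bit bus: 6 modules, 12 or 18 GB, 672 GB/s
```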

With memory prices still sky-high and showing no signs of drastically reducing anytime soon, we could even have another generation of RTX graphics cards where there's no increase in VRAM capacities at all.

(Image credit: Micron)

The biggest unknown, though, is how high the GPUs will be clocked. Although the RTX 40 and 50-series graphics cards enjoyed a substantial boost over the previous generations, I'm not certain that TSMC's N3 will afford the same luxury, so I wouldn't be surprised if the RTX 60-series launches with clocks similar to those in the RTX 40-series.

If that's the case, then we're potentially looking at a 25 to 50% increase in compute performance, compared to the 50-series. That might seem like an outlandish suggestion, especially given that the RTX 40-series enjoyed far bigger increases over the RTX 30-series (apart from the RTX 4060, which was merely 19% higher in terms of peak FP32 performance).

Of course, I could be wildly wrong here, and the bump in the number of CUDA cores could be much bigger than I think it's going to be. But I'm not totally convinced it will be an almighty jump. The only product that Nvidia really needs to push the boat out with is the RTX 6090, to keep on top of prosumer AI demand. For the rest of the range, Team Green has little in the way of competition snapping at its ankles, forcing it to stay well ahead.

(Image credit: Nvidia)

That said, Nvidia can't rely on another DLSS moment to gift the RTX 60-series with seemingly outrageous levels of performance. Multi Frame Generation appears to be as performant as it's going to get, and Super Resolution upscaling certainly is. DLSS 5 is about changing the appearance of graphics, not outright boosting frame rates, so it's unlikely that Nvidia can lean on it to help out.

To push neural rendering and path tracing into the gaming masses, Nvidia will need to raise every aspect of its RTX GPUs, from CUDA count and cache levels to Tensor core performance and data bandwidth. Countering this are factors such as profit margin targets, process node costs, supply constraints for GDDR7, and a near-total lack of competition.

I've based my predictions on past data and trends, but with the semiconductor and PC markets being somewhat uncertain right now, I've also been somewhat cautious with my figures. All we need to do now is just wait for the inevitable 'leaks' and 'rumours' to see just how close to the mark I've been.


