
Discussion RDNA 5 / UDNA (CDNA Next) speculation

RDNA4 (and RDNA3) already have 128 ALUs per CU; it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.
RDNA5 will have the same ALU count per WGP/CU, only with much better utilisation.
They still advertise dual-issue FLOPs; what they don't do is advertise 2x the number of "streaming processors", because they don't want another Bulldozer lawsuit. I expect that will change with RDNA5.
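To put numbers on the dual-issue accounting being discussed, here is an illustrative sketch (the CU count, lane count, and clock are assumptions for the example, not official specs):

```python
# Illustrative FLOPs accounting for RDNA3/4-style dual-issue.
# Each CU exposes 64 "advertised" SIMD lanes; under VOPD/WMMA
# constraints a second set of 64 ALUs can co-issue, doubling peak.

def tflops(cus, lanes_per_cu, clock_ghz, dual_issue=False):
    ops_per_lane = 2  # an FMA counts as 2 FLOPs
    lanes = lanes_per_cu * (2 if dual_issue else 1)
    return cus * lanes * ops_per_lane * clock_ghz / 1000

# Hypothetical 64-CU part at 2.9 GHz:
base = tflops(64, 64, 2.9)                    # single-issue figure
peak = tflops(64, 64, 2.9, dual_issue=True)   # dual-issue marketing figure
print(round(base, 1), round(peak, 1))
```

The gap between those two numbers is exactly the "advertised vs. achievable" tension the post describes: the 2x figure is only reached when the compiler can actually pair instructions.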
 
Updated RDNA5-AT2 Lineup Speculation

[Attachment 133874: RDNA5-AT2 lineup speculation table]

  • As explained in the SWV thread, RDNA5 will get double the SPs per CU. Thus AT2 with a max of 70 CUs is equivalent to 140 CUs in the old counting. That explains the ~20% faster performance than the RTX 4080. It also means the AT2 GPU is severely bound by memory bandwidth.
  • Therefore, AMD does not need to clock as high as RDNA4; I am expecting 2 GHz+, not ~3 GHz. It also means AT2 has headroom to grow. That's why I suspect AMD is reserving an XTX model with the full 70-CU die for when 40 Gbps, 4 GB GDDR7 dies appear. That also explains the cancellation of AT1, since an AT2 XTX is good enough to compete with the upcoming Rubin-70Ti with 24GB on a 256-bit memory bus.
  • There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of the RTX 50 Super makes sense, because it would be a bloodbath for NV 😛: no amount of overclocking will save an RTX 50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
  • That's why I am predicting AMD will set a higher price point for the AT2-70XT and AT2-70. AMD will keep selling the RX 9070 XT until NV is able to launch Rubin-60.
  • AMD will most likely keep selling N48 in the form of a 9070 GRE by then. And no, AT3 and AT4 are NOT for a cheap dGPU lineup, period. Now that we know Medusa will have XDNA3, where do you think the NPU will reside in AT3, huh? 😉
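The bandwidth-bound claim in the bullets above can be sanity-checked with a rough arithmetic-intensity calculation (every spec here is an assumption for illustration, not a leak):

```python
# Rough arithmetic-intensity check for "AT2 is bandwidth-bound".
# Assumed: 70 CUs with 128 FP32 ALUs each, ~2.2 GHz clock,
# 256-bit GDDR7 at 32 Gbps.

cus, alus_per_cu, clock_ghz = 70, 128, 2.2
tflops = cus * alus_per_cu * 2 * clock_ghz / 1000   # FMA = 2 FLOPs
bus_bits, gbps = 256, 32
bandwidth_gbs = bus_bits / 8 * gbps                 # GB/s

# FLOPs available per byte of DRAM traffic; the higher this is,
# the more the compute can only be fed by cache hits or math-heavy kernels.
flops_per_byte = tflops * 1000 / bandwidth_gbs
print(f"{tflops:.1f} TFLOPs, {bandwidth_gbs:.0f} GB/s, "
      f"{flops_per_byte:.1f} FLOPs/byte")
```

Tens of FLOPs per DRAM byte is a high ratio, which is consistent with the idea that faster GDDR7 dies (e.g. 40 Gbps) would unlock a fully enabled part later.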
Are we sure AT2 is 70 CUs? It feels like if AT0 is 192 CUs then AT2 should be 72, since that's the multiplier scaling with the memory bus width (512/192 ≈ 2.67, and 2.67 × 72 = 192 as well). Unless AMD has a reason to cut 2 CUs out there.
 
Unless AMD has a reason to cut 2 CUs out there.
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs would possibly have required a different layout with more whitespace and led to worse PPA.
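Both rumored configs are easy to sanity-check numerically (a quick sketch using the figures from the posts above; 2 CUs per WGP is the usual RDNA convention):

```python
# Sanity-check the two rumored AT2 configs.

# MLID-style config: 4 Shader Engines in an asymmetric 9-9-9-8 WGP layout.
wgp_per_se = [9, 9, 9, 8]
cus = 2 * sum(wgp_per_se)   # 2 CUs per WGP
assert cus == 70            # matches the 70-CU full-chip figure

# The 72-CU guess instead scales CUs with memory-bus width:
# AT0 = 512-bit / 192 CUs, AT2 = 192-bit, so 192 * (192/512) = 72.
assert 192 * (192 / 512) == 72.0
print(cus, "CUs from 9-9-9-8, vs", int(192 * 192 / 512), "CUs from bus scaling")
```

So the two rumors differ by exactly the 1 WGP (2 CUs) that the asymmetric layout drops.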
 
There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of the RTX 50 Super makes sense, because it would be a bloodbath for NV 😛: no amount of overclocking will save an RTX 50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
At the Financial Analyst Day 2025 AMD talked about the line up for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest because RDNA5 is not in the 2026 line up.
 
At the Financial Analyst Day 2025 AMD talked about the line up for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest because RDNA5 is not in the 2026 line up.
Still doesn't rule out xbox

I believe xbox has a great opportunity to release RDNA 5 hardware (magnus) along with GTO (provided the software guys are ready for it)

Valve is coming for Microsoft’s lunch. The Xbox team needs to communicate their strategy better to the public. There is now a decent window to strike if they launch next gen in another year's time (the 25th anniversary) & capitalize on the GTO delay. Xbox next could be the best way to play GTO — if they manage to fix the Windows-Xbox software merge in time by then.

Any next-gen Xbox demands a UI that works. In that way, Valve is already far ahead of whatever is happening at Microsoft. Valve’s Linux-based SteamOS software is easier to navigate on handhelds, and it could be coming to more devices like a VR headset or a PC-like console. Microsoft “has always been chasing Valve,” video game researcher and NYU Stern School of Business professor Joost van Dreunen told Gizmodo. It’s a smaller, more agile company run by ex-Microsoft programmer Gabe Newell. Steam—which most developers think is a monopoly—makes so much money, you could consider it a yacht factory for the Valve CEO. It’s not likely to sell out to Microsoft or anybody else any time soon.
 
I believe xbox has a great opportunity to release RDNA 5 hardware (magnus)
No chance with current RAM/NAND pricing trends for the next couple of years. Frankly, they won't sell many even at old pricing, and even if they could match whatever Sony is cooking: they've lost the war. Sony themselves recently said that console cycles are getting longer, so it's a no-brainer for them to wait till 2029 and get a proper upgrade on N2.
 
When's the last time DRAM prices remained stupid for 2 years?

We are in a bubble. The last bubble (crypto) absolutely wrecked GPU affordability. We will never recover from that. The same could happen for DRAM pricing, or the AI bubble could pop and prices will drop due to a large amount of product being on the market.

The future is unpredictable.

My biggest regret is not getting a high speed 96gb kit. Instead, I got a DDR5-6000 64gb kit.
 
My biggest regret is not getting a high speed 96gb kit. Instead, I got a DDR5-6000 64gb kit.
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
 
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
These days, investing in gold is almost as profitable as investing in RAM... go for it 😀
 
When's the last time DRAM prices remained stupid for 2 years?
This time it's different ™️

A console release needs at least 12-18 months of fixed pricing, but 2026 supply is already sold out, and no doubt 2027 will sell out very soon; then a year of slow price decreases at best, so we are now looking at 2028 at the earliest.

Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
 
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs would possibly have required a different layout with more whitespace and led to worse PPA.
I thought Kepler L2 speculated it was 72 CUs?
Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
Better hope Sony can get GTA VI at 60 FPS or so on the PS5 Pro then...
 
I thought Kepler L2 speculated it was 72 CUs?
It sounded like kepler is only sure about the SE count, not the CU count, while MLID's info says 70 CUs for the full chip, of which 68 will be enabled on Xbox Next and possibly as few as 64 enabled for desktop cards (may depend on yields).

MLID's info could be wrong, of course, but some of the info in his leaked slide on ATx was already confirmed by others later, so there's a chance the 70-CU info could turn out to be correct, too.
 
Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS, but I guess we'll see; the incentive from Sony to help make it happen will be very high.
CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
 
CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
It can be, but that will be the standard Perf mode that works on PS5 too. That's assuming it's still a CPU bottleneck for their game logic; chances are they shifted a bunch of it to the GPU, if that's humanly possible.
 
Anything is pretty suboptimal compared to work graphs, especially if the underlying workload is branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG, etc.



No, it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely change how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or programmable-shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high-level APIs, not completely reinventing the API, as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps further harness the benefits of work graphs.
One example is WGP-local self-launch, where the HW matches the SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine, instead of relying on the command processor.

RDNA 3: everything is orchestrated through the L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache), and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: much lower scheduling latency and more fine-grained scheduling and dispatch, benefiting most tasks, though branchy, recursive, and/or data-dependent tasks should see the biggest benefits.
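As a toy illustration of why that locality matters (all latency numbers below are invented for the sketch, nothing from AMD): if every node launch has to round-trip through the command processor and L2, launch cost scales with the worst-case path, whereas resolving most launches at the WGP or SE level collapses the aggregate cost.

```python
# Toy model of node-launch scheduling cost. Cycle costs and the
# 80/15/5 locality split are arbitrary assumptions for illustration.
LATENCY = {"wgp_l0": 1, "se_l1": 5, "cp_l2": 50}

def launch_cost(n_launches, scheme):
    if scheme == "centralized":    # RDNA 3-style: everything via CP + L2
        return n_launches * LATENCY["cp_l2"]
    # "localized": assume 80% of launches stay WGP-local, 15% SE-local,
    # and only 5% still escalate to the command processor.
    return n_launches * (0.80 * LATENCY["wgp_l0"]
                         + 0.15 * LATENCY["se_l1"]
                         + 0.05 * LATENCY["cp_l2"])

print(launch_cost(1000, "centralized"), launch_cost(1000, "localized"))
```

The absolute numbers are meaningless, but the shape of the result is the point: the more launches that resolve locally, the cheaper fine-grained, data-dependent task graphs become.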

This probably only scratches the surface, and with GFX13 I would expect many more changes benefiting work graphs. The schedulers will probably be upgraded to better handle the complexity of work graphs.

As for SWC, that is probably most beneficial for coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
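A sketch of what that coalescing knob means in practice (the function and parameter names here are made up for illustration, not the D3D12 API):

```python
import math

def coalesce(num_records, max_records_per_group, threads_per_group):
    """Return (groups launched, total threads) for one node's input queue.

    Records destined for the same node are batched up to
    max_records_per_group per group; each group then runs with a
    fixed thread count, per the course-notes description above.
    """
    groups = math.ceil(num_records / max_records_per_group)
    return groups, groups * threads_per_group

# 1000 pending records, batched 32 at a time, 64 threads per group:
print(coalesce(1000, 32, 64))  # -> (32, 2048)
```

Tuning the batch size trades launch overhead against occupancy, which is presumably where HW-side coalescing support would pay off most.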

RDNA introduced Wave-32; I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's no patent for this yet.

@Kepler_L2 is this accurate or am I missing some important details?
The developers also noted that this version brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing, their approach of converting work graphs to normal compute shaders already runs faster than the native implementation.

Source: VKD3D-Proton

 
The developers also noted that this version brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing, their approach of converting work graphs to normal compute shaders already runs faster than the native implementation.

Source: VKD3D-Proton

Didn't know emulating Work Graphs was even possible. Compute shaders FTW!

Also look at the performance characteristics of the vkd3d-proton / RADV-based emulation vs native (supported on RDNA 3+ and Ampere+) below. I didn't expect this, but maybe WG support is just that early in development:


SimpleClassify

| GPU | FPS (native) | FPS (emulated) |
| --- | --- | --- |
| RX 7600 | 80 | ~500 |
| RX 6800 | n/a | 1339 |
| RTX 4070 | 400 | 650 |


AMD's compute shader rasterizer

| GPU | ms (WG-native) | ms (WG-emulated) | ms (EI-native) | ms (EI-emulated) |
| --- | --- | --- | --- | --- |
| RX 7600 | 1.7 | 0.9 | 2.9 | 1 |
| RTX 4070 | 0.5 | 5 | n/a | 3.1 |


NVIDIA's Work Graph demo

| GPU | ms (native-WG) | ms (native-compute dispatch/EI) | ms (proton-WG) | ms (proton-compute dispatch/EI) |
| --- | --- | --- | --- | --- |
| RTX 4070 | 3.2 | 3.1 | 5.5 | 3.9 |
| RX 7600 | 6.8 | 5.8 | 5.1 | 5.8 |

Source: VKD3D-Proton/docs/workgraphs.md


Maybe there's a chance we'll see work graphs adoption a lot sooner than post-crossgen?
 