
Discussion RDNA 5 / UDNA (CDNA Next) speculation

RDNA4 (and RDNA3) already have 128 ALUs per CU; it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.
RDNA5 will have the same ALU count per WGP/CU, only with much better utilisation.
They still advertise dual-issue FLOPs; what they don't do is advertise 2x the number of "streaming processors", because they don't want another Bulldozer lawsuit. I expect that will change with RDNA5.
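To put numbers on the dual-issue accounting being discussed, here is an illustrative sketch (the CU count, lane count, and clock are assumptions for the example, not official specs):

```python
# Illustrative FLOPs accounting for RDNA3/4-style dual-issue.
# Each CU exposes 64 "advertised" SIMD lanes; under VOPD/WMMA
# constraints a second set of 64 ALUs can co-issue, doubling peak.

def tflops(cus, lanes_per_cu, clock_ghz, dual_issue=False):
    ops_per_lane = 2  # an FMA counts as 2 FLOPs
    lanes = lanes_per_cu * (2 if dual_issue else 1)
    return cus * lanes * ops_per_lane * clock_ghz / 1000

# Hypothetical 64-CU part at 2.9 GHz:
base = tflops(64, 64, 2.9)                    # single-issue figure
peak = tflops(64, 64, 2.9, dual_issue=True)   # dual-issue marketing figure
print(round(base, 1), round(peak, 1))
```

The gap between those two numbers is exactly the "advertised vs. achievable" tension the post describes: the 2x figure is only reached when the compiler can actually pair instructions.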
 
Updated RDNA5-AT2 Lineup Speculation

[Attachment 133874: RDNA5-AT2 lineup speculation table]

  • As explained in the SWV thread, RDNA5 will get double the SPs per CU. Thus AT2 with a max of 70 CUs is equivalent to 140 CUs in the old counting. That explains the ~20% faster performance than the RTX 4080. It also means the AT2 GPU is severely bound by memory bandwidth.
  • Therefore, AMD does not need to clock as high as RDNA4; I am expecting 2 GHz+, not ~3 GHz. It also means AT2 has headroom to grow. That's why I suspect AMD is reserving an XTX model with the full 70-CU die for when 40 Gbps, 4 GB GDDR7 dies appear. That also explains the cancellation of AT1, since an AT2 XTX is good enough to compete with the upcoming Rubin-70Ti with 24GB on a 256-bit memory bus.
  • There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of the RTX 50 Super makes sense, because it would be a bloodbath for NV 😛: no amount of overclocking will save an RTX 50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
  • That's why I am predicting AMD will set a higher price point for the AT2-70XT and AT2-70. AMD will keep selling the RX 9070 XT until NV is able to launch Rubin-60.
  • AMD will most likely keep selling N48 in the form of a 9070 GRE by then. And no, AT3 and AT4 are NOT for a cheap dGPU lineup, period. Now that we know Medusa will have XDNA3, where do you think the NPU will reside in AT3, huh? 😉
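The bandwidth-bound claim in the bullets above can be sanity-checked with a rough arithmetic-intensity calculation (every spec here is an assumption for illustration, not a leak):

```python
# Rough arithmetic-intensity check for "AT2 is bandwidth-bound".
# Assumed: 70 CUs with 128 FP32 ALUs each, ~2.2 GHz clock,
# 256-bit GDDR7 at 32 Gbps.

cus, alus_per_cu, clock_ghz = 70, 128, 2.2
tflops = cus * alus_per_cu * 2 * clock_ghz / 1000   # FMA = 2 FLOPs
bus_bits, gbps = 256, 32
bandwidth_gbs = bus_bits / 8 * gbps                 # GB/s

# FLOPs available per byte of DRAM traffic; the higher this is,
# the more the compute can only be fed by cache hits or math-heavy kernels.
flops_per_byte = tflops * 1000 / bandwidth_gbs
print(f"{tflops:.1f} TFLOPs, {bandwidth_gbs:.0f} GB/s, "
      f"{flops_per_byte:.1f} FLOPs/byte")
```

Tens of FLOPs per DRAM byte is a high ratio, which is consistent with the idea that faster GDDR7 dies (e.g. 40 Gbps) would unlock a fully enabled part later.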
Are we sure AT2 is 70 CUs? It feels like if AT0 is 192 CUs then AT2 should be 72, since that's the multiplier scaling with the memory bus width (512/192 ≈ 2.67, and 2.67 × 72 = 192 as well). Unless AMD has a reason to cut 2 CUs out there.
 
Unless AMD has a reason to cut 2 CUs out there.
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs would possibly have required a different layout with more whitespace and led to worse PPA.
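Both rumored configs are easy to sanity-check numerically (a quick sketch using the figures from the posts above; 2 CUs per WGP is the usual RDNA convention):

```python
# Sanity-check the two rumored AT2 configs.

# MLID-style config: 4 Shader Engines in an asymmetric 9-9-9-8 WGP layout.
wgp_per_se = [9, 9, 9, 8]
cus = 2 * sum(wgp_per_se)   # 2 CUs per WGP
assert cus == 70            # matches the 70-CU full-chip figure

# The 72-CU guess instead scales CUs with memory-bus width:
# AT0 = 512-bit / 192 CUs, AT2 = 192-bit, so 192 * (192/512) = 72.
assert 192 * (192 / 512) == 72.0
print(cus, "CUs from 9-9-9-8, vs", int(192 * 192 / 512), "CUs from bus scaling")
```

So the two rumors differ by exactly the 1 WGP (2 CUs) that the asymmetric layout drops.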
 
There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. Thus, the cancellation of the RTX 50 Super makes sense, because it would be a bloodbath for NV 😛: no amount of overclocking will save an RTX 50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
At the Financial Analyst Day 2025 AMD talked about the line up for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest because RDNA5 is not in the 2026 line up.
 
At the Financial Analyst Day 2025 AMD talked about the line up for 2026. Medusa was mentioned because it is based on Zen 6.

Why do you think AMD has not mentioned RDNA5? I would suggest because RDNA5 is not in the 2026 line up.
Still doesn't rule out xbox

I believe xbox has a great opportunity to release RDNA 5 hardware (magnus) along with GTO (provided the software guys are ready for it)

Valve is coming for Microsoft’s lunch. The Xbox team needs to communicate their strategy better to the public. There is now a decent window to strike if they launch next gen in another year's time (the 25th anniversary) & capitalize on the GTO delay. Xbox next could be the best way to play GTO — if they manage to fix the Windows-Xbox software merge in time by then.

Any next-gen Xbox demands a UI that works. In that way, Valve is already far ahead of whatever is happening at Microsoft. Valve’s Linux-based SteamOS software is easier to navigate on handhelds, and it could be coming to more devices like a VR headset or a PC-like console. Microsoft “has always been chasing Valve,” video game researcher and NYU Stern School of Business professor Joost van Dreunen told Gizmodo. It’s a smaller, more agile company run by ex-Microsoft programmer Gabe Newell. Steam—which most developers think is a monopoly—makes so much money, you could consider it a yacht factory for the Valve CEO. It’s not likely to sell out to Microsoft or anybody else any time soon.
 
I believe xbox has a great opportunity to release RDNA 5 hardware (magnus)
No chance with current RAM/NAND pricing trends for the next couple of years. Frankly, they won't sell many even at old pricing, and even if they could match whatever Sony is cooking: they've lost the war. Sony themselves recently said that console cycles are getting longer, so it's a no-brainer for them to wait till 2029 and get a proper upgrade on N2.
 
When's the last time DRAM prices remained stupid for 2 years?

We are in a bubble. The last bubble (crypto) absolutely wrecked GPU affordability. We will never recover from that. The same could happen for DRAM pricing, or the AI bubble could pop and prices will drop due to a large amount of product being on the market.

The future is unpredictable.

My biggest regret is not getting a high speed 96gb kit. Instead, I got a DDR5-6000 64gb kit.
 
My biggest regret is not getting a high speed 96gb kit. Instead, I got a DDR5-6000 64gb kit.
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
 
I was on the verge of getting an X870E + 9950X3D + 4x48GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD.


I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.
These days, investing in gold is almost as profitable as investing in RAM... go for it 😀
 
When's the last time DRAM prices remained stupid for 2 years?
This time it's different ™️

A console release needs at least 12-18 months of fixed pricing, but 2026 supply is already sold out, and no doubt 2027 will sell out very soon; then a year of slow price decreases at best, so we are now looking at 2028 at the earliest.

Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
 
If both kepler_l2 (4 SE) and MLID (70 CUs for full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
Cutting 1 WGP/2 CUs from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout.
72 CUs would possibly have required a different layout with more whitespace and led to worse PPA.
I thought Kepler L2 speculated it was 72 CUs?
Sony is under zero pressure now, if anything they are better off maxing out PS5 sales, GTA6 will help push PS5 Pro nicely too.
Better hope Sony can get GTA VI at 60 FPS or so on the PS5 Pro then...
 
I thought Kepler L2 speculated it was 72 CUs?
It sounded like kepler is only sure about the SE count, not the CU count, while MLID's info says 70 CUs for the full chip, of which 68 will be enabled on Xbox Next and possibly as few as 64 enabled for desktop cards (may depend on yields).

MLID's info could be wrong, of course, but some of the info in his leaked slide on ATx was already confirmed by others later, so there's a chance the 70-CU info could turn out to be correct, too.
 
Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS, but I guess we'll see; the incentive from Sony to help make it happen will be very high.
CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
 
CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
It can be, but that will be the standard Perf mode that works on PS5 too. That's assuming it's still a CPU bottleneck for their game logic; chances are they shifted a bunch of it to the GPU, if that's humanly possible.
 
Anything is pretty suboptimal compared to work graphs, especially if the underlying workload is branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG, etc.



No, it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely change how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or programmable-shader moment.

One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."

This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high-level APIs, not completely reinventing the API, as is the case with GPU work graphs.

Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps further harness the benefits of work graphs.
One example is WGP-local self-launch, where the HW matches the SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine, instead of relying on the command processor.

RDNA 3: everything is orchestrated through the L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache), and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: much lower scheduling latency and more fine-grained scheduling and dispatch, benefiting most tasks, though branchy, recursive, and/or data-dependent tasks should see the biggest benefits.
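As a toy illustration of why that locality matters (all latency numbers below are invented for the sketch, nothing from AMD): if every node launch has to round-trip through the command processor and L2, launch cost scales with the worst-case path, whereas resolving most launches at the WGP or SE level collapses the aggregate cost.

```python
# Toy model of node-launch scheduling cost. Cycle costs and the
# 80/15/5 locality split are arbitrary assumptions for illustration.
LATENCY = {"wgp_l0": 1, "se_l1": 5, "cp_l2": 50}

def launch_cost(n_launches, scheme):
    if scheme == "centralized":    # RDNA 3-style: everything via CP + L2
        return n_launches * LATENCY["cp_l2"]
    # "localized": assume 80% of launches stay WGP-local, 15% SE-local,
    # and only 5% still escalate to the command processor.
    return n_launches * (0.80 * LATENCY["wgp_l0"]
                         + 0.15 * LATENCY["se_l1"]
                         + 0.05 * LATENCY["cp_l2"])

print(launch_cost(1000, "centralized"), launch_cost(1000, "localized"))
```

The absolute numbers are meaningless, but the shape of the result is the point: the more launches that resolve locally, the cheaper fine-grained, data-dependent task graphs become.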

This probably only scratches the surface, and with GFX13 I would expect many more changes benefiting work graphs. The schedulers will probably be upgraded to better handle the complexity of work graphs.

As for SWC, that is probably most beneficial for coalescing launches.

Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
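A sketch of what that coalescing knob means in practice (the function and parameter names here are made up for illustration, not the D3D12 API):

```python
import math

def coalesce(num_records, max_records_per_group, threads_per_group):
    """Return (groups launched, total threads) for one node's input queue.

    Records destined for the same node are batched up to
    max_records_per_group per group; each group then runs with a
    fixed thread count, per the course-notes description above.
    """
    groups = math.ceil(num_records / max_records_per_group)
    return groups, groups * threads_per_group

# 1000 pending records, batched 32 at a time, 64 threads per group:
print(coalesce(1000, 32, 64))  # -> (32, 2048)
```

Tuning the batch size trades launch overhead against occupancy, which is presumably where HW-side coalescing support would pay off most.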

RDNA introduced Wave-32; I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's no patent for this yet.

@Kepler_L2 is this accurate or am I missing some important details?
The developers also noted that this version brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing, their approach of converting work graphs to normal compute shaders already runs faster than the native implementation.

Source: VKD3D-Proton

 
The developers also noted that this version brings experimental support for D3D12 Work Graphs. They note that there are no games that use this technology yet, but based on their testing, their approach of converting work graphs to normal compute shaders already runs faster than the native implementation.

Source: VKD3D-Proton

Didn't know emulating Work Graphs was even possible. Compute shaders FTW!

Also look at the performance characteristics of the vkd3d-proton / RADV-based emulation vs native (supported on RDNA 3+ and Ampere+) below. I didn't expect this, but maybe WG support is just that early in development:


SimpleClassify

| GPU | FPS (native) | FPS (emulated) |
| --- | --- | --- |
| RX 7600 | 80 | ~500 |
| RX 6800 | n/a | 1339 |
| RTX 4070 | 400 | 650 |


AMD's compute shader rasterizer

| GPU | ms (WG-native) | ms (WG-emulated) | ms (EI-native) | ms (EI-emulated) |
| --- | --- | --- | --- | --- |
| RX 7600 | 1.7 | 0.9 | 2.9 | 1 |
| RTX 4070 | 0.5 | 5 | n/a | 3.1 |


NVIDIA's Work Graph demo

| GPU | ms (native-WG) | ms (native-compute dispatch/EI) | ms (proton-WG) | ms (proton-compute dispatch/EI) |
| --- | --- | --- | --- | --- |
| RTX 4070 | 3.2 | 3.1 | 5.5 | 3.9 |
| RX 7600 | 6.8 | 5.8 | 5.1 | 5.8 |

Source: VKD3D-Proton/docs/workgraphs.md


Maybe there's a chance we'll see work graphs adoption a lot sooner than post-crossgen?
 