
Discussion RDNA 5 / UDNA (CDNA Next) speculation

Page 117
Looks like this extends the concept from the 2020 Shared L1 paper to register files.
The shared-L1 paper's Work Distribution Crossbar implementation (section 5.3) provided a 140% perf uplift for P-GEMM. That's a much larger uplift than 16X private L1, or than the theoretical perf of zero-latency replication over a slow mesh interconnect.

Making L2 bigger doesn't help, because "performance is limited by the L2 reply bandwidth bottleneck [49, 73, 74]. Such a bottleneck is relieved with Shared++ and DynEB as the shared L1 organization utilizes the remote cores as an additional source of bandwidth."
Put differently, it's basically free performance without exploding cache sizes for sharing-friendly workloads. Plus, section 5.5 shows progressive IPC scaling with more CUs, because remote-core bandwidth scales accordingly.
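To see why remote cores act as extra bandwidth, here's a toy model (the numbers below are illustrative, my own, not figures from the paper):

```python
# Toy L1/L2 bandwidth model. With private L1s, shared data is replicated
# and every miss serializes on L2 replies, so throughput is capped by L2
# reply bandwidth. With a shared L1 over a crossbar, hits in remote
# cores' L1 slices add their bandwidth on top.

def effective_bw(n_cores, l1_bw_per_core, l2_reply_bw, remote_hit_rate):
    """Upper-bound read bandwidth seen by the cores (GB/s)."""
    # Private L1s: the L2 reply path is the ceiling.
    private = l2_reply_bw
    # Shared L1: remote_hit_rate of requests are served out of other
    # cores' L1 slices; only the remainder falls through to L2.
    shared = (remote_hit_rate * n_cores * l1_bw_per_core
              + (1 - remote_hit_rate) * l2_reply_bw)
    return private, shared

priv, shared = effective_bw(n_cores=32, l1_bw_per_core=100,
                            l2_reply_bw=1000, remote_hit_rate=0.5)
print(priv, shared)  # 1000 2100.0
```

Note the shared term scales with n_cores, which is exactly the progressive IPC scaling behavior from section 5.5.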

For other ML, the mediocre mesh baseline already gives huge gains for U-Nets such as AN, RN and SN. With a NoC + higher clocks the gains would be larger; for example, the decoupled-L1 follow-up paper beefed things up with a NoC + higher clocks: 8X vs ~2X for AN, and 5X vs 4X for SN.
If AMD are smart, they'll also use the Work Distribution Crossbar here for inter-SE communication to bypass the L2 completely. It's good for workload redistribution (load balancing) too; work graphs and many other workloads would benefit.
There's way lower bandwidth pressure on the L2, since the distributed L0 acts as a quasi-L2. Considering everything is SE-local, scratchpad spill-over to L2 is probably handled too. Tiny L2s for ATx make more sense if this is true, likely in a more passive (LLC-like) role than the existing L2. They might get away with a slower, denser cache.

NVIDIA have had shared DSMEM since Hopper on the DC side, and since Blackwell on the consumer side. Probably over a ring bus within each GPC.
Beefing up that ring bus and letting the GPU send the instruction and transfer register data over it, in addition to LDS data, would enhance the VGPRs with remote-core bandwidth and bypass the existing cache/memory bottlenecks discussed previously. That addresses the memory-wall problem on three fronts.

Unconfirmed, but we'll know in ~3 months whether CDNA5 exceeds the Blackwell baseline for cache/memory. Regardless, these kinds of insane perf figures from simulations are too good to ignore. I hope RDNA 5 goes for all three (L0s, LDS and VGPRs), because it's basically free IPC.
 
The ML perf and feature-set delta between GFX12 and GFX13 is too big. Like Adroc has already said, FSR Diamond will be RDNA 5+ exclusive.
Hunyh also said "natively optimized for Project Helix". Translation: FSR porting ain't gonna happen.
Best case scenario: exclusive to RDNA5+ cards
Worst case scenario: exclusive to Helix family
 
Worst case scenario: exclusive to Helix family
Helix uses off-the-shelf AT2.
AMD should really make FSR4 work on RDNA3 and FSR5 on RDNA4. Even running very slowly is much better than not running at all; it just gives a much better impression, even if RDNA3/4 owners might not like the performance hit.
The install base is so tiny they don't have to port anything.
 
It looks like there is sufficient capacity for bdie (SF4X?)
It seems the Samsung Foundry team is firmly convinced that AMD won't be giving them any orders
 
It looks like there is sufficient capacity for bdie (SF4X?)
It seems the Samsung Foundry team is firmly convinced that AMD won't be giving them any orders
sf4 was supposed to be sonoma valley (mendocino replacement), right?

i hope one or two low end zen 7 socs (especially the 15 watt grimlock point 4) find their way to samsung 2nm
 
I suspect it was cancelled, as it seems to have been replaced by Shockwave and Bumblebee
both soundwave & bumblebee on tsmc rather than samsung

plus in leaked amd roadmaps bumblebee doesn’t replace mendocino but sits a tier above that

mendocino lives until 2029 / zen 7 grimlock point 4
 
On these AMD uses Samsung rumors:

Whenever AMD executives are asked about their collaboration with TSMC, they speak very highly of their partnership with TSMC.

Whenever Lisa Su is asked whether AMD will have its chips manufactured by Intel or Samsung, or is considering doing so, she offers a few polite platitudes and sings the praises of the excellent working relationship with TSMC.

i hope one or two low end zen 7 socs (especially the 15 watt grimlock point 4) find their way to samsung 2nm
Why would AMD do that?

Is it because Samsung’s wafers are supposed to be so cheap? Why do you think they’re so cheap?

AMD was prepared to pay GF for every wafer it had manufactured by TSMC using the 7 nm process. I wonder why?

A nice video; it gives a sense that the manufacture of semiconductor chips involves far more than just processes and fabs:
 
It seems like even the 4nm FinFET process doesn't have many customers. They still seem to think that C-BaseDIE is out of the question.
 
"According to an industry source, Samsung and AMD are reportedly discussing a joint announcement regarding their partnership. The source noted that this move is expected to reaffirm the strategic significance of Samsung's semiconductor division."
 
I'm not sure if I should say this here, but the Korean memory guys will likely be shocked by the news of Micron supplying HBM4.
 
I'm not sure if I should say this here, but the Korean memory guys will likely be shocked by the news of Micron supplying HBM4.
If by "Korean memory guys" you mean SK Hynix and Samsung, then they probably weren't shocked. I think SK Hynix and Samsung are very well positioned to assess Micron's capabilities.

If you mean the guys who have been spreading the word for months that Micron can't deliver... Who takes these guys seriously?
 
Wonder if consoles with RDNA5 will have enough AI power for AI slop like DLSS5
Hope they ignore this BS. Even if it becomes flawless, I don't want every game to look like the same photorealistic game (UE5 bland look 2.0). Per-game training is possible but ain't gonna happen.

Do we have any idea about roughly what kind of GEMM throughput RDNA 5 will achieve compared to RDNA 4 and 50 series? Not talking about on paper TFLOPs here but actual perf differences assuming same number of CUs/SMs and clocks.
 
Wonder if consoles with RDNA5 will have enough AI power for AI slop like DLSS5
"Power" is only a problem if they want to offer something similar.
I suspect FSR Diamond will be somewhere between DLSS 4.5 and 5:
More like 4.5 for stuff like faces (no AI "upgrades" that make characters look different), but with some RT-related stuff similar to DLSS 5, so they can offer lighting/shading quality equivalent to full RT/PT at a fraction of the performance cost.

Do we have any idea about roughly what kind of GEMM throughput RDNA 5 will achieve compared to RDNA 4 and 50 series? Not talking about on paper TFLOPs here but actual perf differences assuming same number of CUs/SMs and clocks.
I'm not sure we even have on-paper TFLOP numbers yet (at least not for lower-precision GEMM formats), and the new CUs have likely seen some significant changes to the VGPRs and/or caches as well, so I doubt anyone can give even a rough estimate beyond "higher than RDNA4".

I'd expect substantial improvements in any case, since RDNA5 is AMD's first architecture whose dev cycle fell square into the AI craze, and I doubt MI4xx would've gotten as many early orders if the GFX1250/13xx CU design wasn't a substantial improvement for some AI workloads.
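For anyone who wants to play with the matched-CU/clock comparison above once real numbers leak, the back-of-envelope math is just this (the FLOPs/CU/clk figure below is a hypothetical placeholder, not a leak):

```python
def peak_tflops(n_cus, clock_ghz, flops_per_cu_per_clk):
    """Peak dense matrix throughput in TFLOPS."""
    # GFLOPS = CUs * GHz * FLOPs per CU per clock; /1000 gives TFLOPS.
    return n_cus * clock_ghz * flops_per_cu_per_clk / 1000.0

# 64 CUs at 2.5 GHz with a hypothetical 1024 FP16 FLOPs/CU/clk
result = peak_tflops(64, 2.5, 1024)
print(result)  # 163.84 TFLOPS
```

Swap in whatever per-CU matrix rate the real parts turn out to have; at fixed CU count and clocks, the whole comparison collapses to that one number.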
 
I'm not an expert, but I wanna share this so more knowledgeable people here can talk about it. 😉

Techpowerup article -> AMD "RDNA 5" to Heavily Boost Shader Performance in Games with New Dual-Issue Pipeline.

Source -> RDNA 5 may allow dual issues to work in more cases, making it easier to achieve peak FP32 performance.
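The uplift from pairing more instructions is easy to model. A quick sketch (my own toy model, not anything from AMD): if only a fraction of FP32 instructions can actually be co-issued, the achieved fraction of dual-issue peak is:

```python
def achieved_fp32_fraction(pairable):
    """Fraction of peak dual-issue FP32 throughput actually achieved."""
    # Normalize to 1 instruction: the pairable fraction issues two per
    # cycle (half a cycle each), the rest issue one per cycle.
    cycles = pairable / 2 + (1 - pairable)
    rate = 1.0 / cycles  # instructions per cycle, capped at 2 by construction
    return rate / 2      # as a fraction of the 2-per-cycle peak

print(f"{achieved_fp32_fraction(0.3):.2f}")  # 0.59, lots of pairing restrictions
print(f"{achieved_fp32_fraction(0.9):.2f}")  # 0.91, restrictions mostly lifted
```

So "dual issue works in more cases" moves the pairable fraction up, which is exactly what "easier to achieve peak FP32" means here.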


 