Discussion RDNA 5 / UDNA (CDNA Next) speculation

jpiniero · Jan 31, 2026

reaperrr3 said:
LP6-10667@288bit would still give 480 GB/s

I think that's at 14.4 and not 10.7. (12 channel LPDDR6)

basix · Jan 31, 2026

reaperrr3 said:
LP6-10667 offers higher bandwidth per channel (equal to hypothetical 16 GT/s LP5X), probably needs less voltage, and will be produced by all 3 memory manufacturers, so probably be cheaper per GB.
So even cutting the interface of AT3 to 75% width and go with 24GB LP6-10667@288bit might still be a better overall solution than 16GB of this "Ultra-Pro" (probably also Ultra-expensive) Samsung-only LP5X-12700.

LP6-10667@288bit would still give 480 GB/s, and 24GB has less risk of the PCIe-interface ever becoming a bottleneck.
Full config AT3 will likely perform around 9070 and has only 8x PCIe, so putting only 16GB on it may actually be risky.

Good point. With LPDDR6 you could use a narrower than full-spec memory bus width.

Anther option could be 12.8 Gbps at 240bit. Same bandwidth, only 20 GByte VRAM.

reaperrr3 said:
Are we sure it works that way?
We've had DeltaColorCompression and internal compression on GPUs with ongoing improvements for over a decade, but it never really reduced VRAM capacity requirements in any noticeable way, only bandwidth efficiency.
The only way it could reduce capacity needs would be if data is stored compressed even in VRAM.

If you do not reduce the asset size in DRAM, you do not save DRAM bandwidth.

Win2012R2 said:
That's what AMD implies will happen - decompression is a lot quicker than (good) compression, so doing compression once when placing asset in memory makes sense.

You also need to have a HW accelerated compressor. If you modify data and want to write it back to higher level caches or DRAM you need to compress it.

jpiniero said:
I think that's at 14.4 and not 10.7. (12 channel LPDDR6)

10.7 Gbps are correct. LPDDR6 is 1.5x wider than LPDDR5(X). With 14.4 Gbps at 384bit (256bit with LPDDR5) you would get 864 GB/s brutto (net is less due to encoding overheads).

jpiniero · Jan 31, 2026

basix said:
10.7 Gbps are correct. LPDDR6 is 1.5x wider than LPDDR5(X).

https://www.jedec.org/sites/default/files/Brett%20Murdock_FINAL_Mobile_2024.pdf

This says 10.7 is 28 GB/sec effective per channel. So 12 channel would be 342 GB/sec.

Regardless it's clear that if any of these LPDDR6 dGPUs end up shipping, the marketing is going to be entirely around VRAM. Which could blow up in AMD's face if it ends up being more expensive than anticipated. Maybe AMD would just cancel if that ends up happening and ship the GDDR7 parts only.

Could still work out. I am not expecting much from the Rubin lower end cards and those will have less VRAM for sure.

adroc_thurston · Jan 31, 2026

basix said:
With LPDDR6 you could use a narrower than full-spec memory bus width.

No?
The comboPHY is very much fixed-width.

jpiniero said:
Maybe AMD would just cancel if that ends up happening and ship the GDDR7 parts only.

those are mobile APUs.

branch_suggestion · Jan 31, 2026

reaperrr3 said:
We don't know yet if desktop AT2 will get more than the 64 active CUs the leaked slide from MLID suggested, in that case it'd only be 33% more CUs.

AT2 is 70CU/35WGP(old).
4SE/8SA configured in 4/5, 4/5, 4/5, 4/4 (USR PHY is in the way presumably).
Config triggers me very much but BOM savings are vital in high volume parts.

basix · Feb 1, 2026

adroc_thurston said:
The comboPHY is very much fixed-width.

Sure you can. Not by reducing the width of a channel, but by simply not using all channels. The same thing gets done on salvaged GPUs since ages 😉

Edit:

branch_suggestion said:
AT2 is 70CU/35WGP(old).
4SE/8SA configured in 4/5, 4/5, 4/5, 4/4 (USR PHY is in the way presumably).
Config triggers me very much but BOM savings are vital in high volume parts.

72 CU makes more sense to me. 24 per SE. Or more accuratly 36 / 12 CU in RDNA5 terms.

AT0 = 8 SE, 96 CU, 512bit GDDR7 @ 32 Gbps, ~2048 GB/s, 32 GByte or more
AT2 = 3 SE, 36 CU, 192bit GDDR7 @ 32 Gbps, ~768 GB/s, 18...24 GByte
AT3 = 2 SE, 24 CU, 288bit LPDDR6 @ 10.7 Gbps // 240bit LPDDR6 @ 12.8 Gbps, ~480 GB/s, 20...24 GByte
- Mobile: 256bit LPDDR5X @ 8.5...9.6 Gbps, ~340...384 GB/s
AT4 = 1 SE, 12 CU, 128bit LPDDR5X @ 9.6...10.7 Gbps, ~192...210 GB/s, 12...16 GByte
- Mobile: 128bit LPDDR5X @ 8.5...9.6 Gbps, ~170...192 GB/s

AT3 and AT4 are difficult to pin down regarding DRAM interface. Fewer DRAM packages due to LPDDR6 might be nice (smaller PCB area occupied). But you could partly solve that by using x64 instead of x32 packages.

adroc_thurston · Feb 1, 2026

basix said:
Sure you can. Not by reducing the width of a channel, but by simply not using all channels. The same thing gets done on salvaged GPUs since ages 😉

GPUs use a heavily segmented memory shoreline.
An LPDDR comboPHY is a one thingy. It either is or isn't.

reaperrr3 · Feb 1, 2026

MrMPFR said:
Assumed it had an impact due to poor 9070 -> 9070XT perf scaling is in raster games. ~12% at 4K according to TPU. That's only half of ~25% compute gain (based on TPU avg. game clock).
RT games and blender show bigger increases but still only ~15% avg and ~18% respectively.
Guess the issue is somewhere else.

Can be all kinds of things:
- slight CPU / driver overhead limitations
- slight command processor limitations
- Primitive / geometry throughput (tied to SE count)
- L1/L2/L3/mem bw/capacity holding back the XT a bit more, the latter might've needed GDDR7 to fully stretch its legs
- TPU's game selection is a bit meh

Keep in mind that some things like the driver/CPU, cache and mem bw are soft limits. They don't hard-cap your FPS, rather some frames are just computed slightly slower than they'd be under ideal circumstances.

Perf scaling of adding CUs/SMs has been at only ~60-75% since forever.

Also, according to computerbase.de, the 9070XT is actually 14% faster than the 9070 at 4K (even 16% in RT), with only ~14.3% more CUs.
For comparison, the 5080 is only 15% faster than the 5070Ti in the same test (only 12% in RT), despite 20% more SM, 33% more L2 and slightly faster VRAM vs the latter.

MrMPFR · Feb 1, 2026

reaperrr3 said:
Can be all kinds of things:
- slight CPU / driver overhead limitations
- slight command processor limitations
- Primitive / geometry throughput (tied to SE count)
- L1/L2/L3/mem bw/capacity holding back the XT a bit more, the latter might've needed GDDR7 to fully stretch its legs
- TPU's game selection is a bit meh

- 5090 scaled far higher
- Possible but 9060XT -> 9070XT showed better perf scaling. Maybe 14 -> 16 CUs + higher clocks where issue arises.
- IDK
- Possible.
- If you know someone else that compiles entire stack for averages let me know.

reaperrr3 said:
Perf scaling of adding CUs/SMs has been at only ~60-75% since forever.

Clock adjusted 9060XT -> 9070XT falls short by only ~5%. +90% TFLOP/FPS scaling at 4K.

Localizing basically everything to SEs in RDNA5 should solve that.

reaperrr3 said:
Also, according to computerbase.de, the 9070XT is actually 14% faster than the 9070 at 4K (even 16% in RT), with only ~14.3% more CUs.

Factor in clockspeed as well. Still nowhere near 25%.TPU had higher averages for RT as well.

reaperrr3 said:
For comparison, the 5080 is only 15% faster than the 5070Ti in the same test (only 12% in RT), despite 20% more SM, 33% more L2 and slightly faster VRAM vs the latter.

Yeah NVIDIA has serious problems with core scaling.

reaperrr3 · Feb 2, 2026

MrMPFR said:
- 5090 scaled far higher

But below 4K it scaled worse than any GPU before in terms of real-world uplift vs. paper advantage, and even at 4K the scaling isn't always as good as it should be in theory.

Not every game or even every scene within a game is uniformly CPU-/driver-heavy.
Anyway, just a guess, probably a minor factor with the XT, if any, I'll give you that.

MrMPFR said:
Clock adjusted 9060XT -> 9070XT falls short by only ~5%. +90% TFLOP/FPS scaling at 4K.

That's not really an apples-to-apples comparison at 4K.

Due to how LLC influences bandwidth scaling, at 4K a 32MB IF$ will have far less than half the hitrate of a 64MB IF$ at that resolution, same for the L2 fwiw, so the 9060XT might be (more) bandwidth-limited in some games at 4K than the 9070XT, skewing the result in the latters' favor.

The other issue is: I was talking about scaling with CUs/SMs in isolation in otherwise identical configs, aka largely same number of SEs/GPCs/ROPs etc.
The full dies have never lived up to their theoretical TFLOPs advantage over their salvage parts... unless the latter were also crippled in cache and bandwidth, like the 7900GRE (which btw has terrible scaling over the 7800 XT too, despite 67% more SEs and ROPs).

Also, kepler said ROPs are rarely a bottleneck in modern games. That doesn't mean they have zero influence, it just means they aren't THE reason the 9070XT isn't that far ahead of the 9070.

I think it's ultimately more a "death by a thousand cuts" thing:
No cache advantage, no mem bw advantage, no frontend advantage (except from clocks), no backend advantage (except from clocks).
CUs and SM simply aren't the only factor in the performance equation, they probably make up exactly what I suggested, 60-75% in most games, maybe a little more in some, maybe even less in others, but not 95%+.

RDNA5 might improve the scaling, but I think it'll only be a little better, it won't magically result in 10% more perf/clk with 10% more CUs.

MrMPFR · Feb 3, 2026

Yes you're right I'll admit 9060XT -> 9070XT is a bad comparison at 4K + SE/GPC scaling is different from SM/CU scaling within a GPC/SE.
Most extreme example I can think of is 3070 TI -> 3080.
5070 -> 5070 TI as well. Compute =/= FPS.

reaperrr3 said:
RDNA5 might improve the scaling, but I think it'll only be a little better, it won't magically result in 10% more perf/clk with 10% more CUs.

I was referring to SE scaling (4090/5090 perf/SM deterioration), but yes assuming same frontend/backend and clocks scaling won't be perfect.

Extrapolating from your 60-75% estimate RDNA5 needs a massive compute increase per CU to make up for -50% SE with AT4 vs Navi 44, unless the redesigned frontend/backend has massively boosted throughput through brute force and/or efficiency. Maybe this is an indication:

Kepler_L2 said:
256 for both but RDNA5 has much fewer dual-issue restrictions.

With +50% CUs/SE (except for AT2) and unleashed dual-issue maybe RDNA5 is AMD's Ampere moment for compute?

marees · Feb 3, 2026

Digital Foundry's take on is RDNA 5 being overhyped

One of the best signs is that major console platforms have promised to use RDNA5 for their next-generation hardware. That choice shows a lot of faith in the underlying infrastructure and is a big commitment. From our point of view, this alone suggests that RDNA 5 will be able to handle tough long-term needs.

AMD RDNA 5 Architecture: What the Current Information Really Suggests | NoobFeed

There has been talk about whether AMD's RDNA5 design is being praised too much. People are often afraid of what might happen if new AMD GPUs get too much att...

www.noobfeed.com

adroc_thurston · Feb 3, 2026

marees said:
Digital Foundry's take on is RDNA 5 being overhyped

One of the best signs is that major console platforms have promised to use RDNA5 for their next-generation hardware. That choice shows a lot of faith in the underlying infrastructure and is a big commitment. From our point of view, this alone suggests that RDNA 5 will be able to handle tough long-term needs.

AMD RDNA 5 Architecture: What the Current Information Really Suggests | NoobFeed

There has been talk about whether AMD's RDNA5 design is being praised too much. People are often afraid of what might happen if new AMD GPUs get too much att...

www.noobfeed.com

who cares about NV shill opinion on gfx13

MrMPFR · Feb 3, 2026

adroc_thurston said:
who cares about NV shill opinion on gfx13

Yeah they've done problematic coverage before but recent 70 series vid is practically a MFG + 5070 advert.

Heartbreaker · Feb 3, 2026

marees said:
Digital Foundry's take on is RDNA 5 being overhyped

One of the best signs is that major console platforms have promised to use RDNA5 for their next-generation hardware. That choice shows a lot of faith in the underlying infrastructure and is a big commitment. From our point of view, this alone suggests that RDNA 5 will be able to handle tough long-term needs.

AMD RDNA 5 Architecture: What the Current Information Really Suggests | NoobFeed

There has been talk about whether AMD's RDNA5 design is being praised too much. People are often afraid of what might happen if new AMD GPUs get too much att...

www.noobfeed.com

It wouldn't be an AMD discussion if fans weren't overhyping the next generation.

marees · Feb 3, 2026

Heartbreaker said:
It wouldn't be an AMD discussion if fans weren't overhyping the next generation.

I think both consoles using RDNA 5 is exciting. this is as big as RDNA 2

But late 2027 / early 2028 launch is a big downer

Bigos · Feb 3, 2026

Heartbreaker said:
It wouldn't be an AMD discussion if fans weren't overhyping the next generation.

What fans? DF is as far away from "AMD fan" as humanly possible.

gdansk · Feb 3, 2026

Heartbreaker said:
It wouldn't be an AMD discussion if fans weren't overhyping the next generation.

Even Blackwell and Arrow Lake managed to not meet expectations. It's not just an AMD "fan" feature anymore. But they're still the best at it here.

Panino Manino · Feb 3, 2026

Heartbreaker said:
It wouldn't be an AMD discussion if fans weren't overhyping the next generation.

Digital Foundry is right now hyping Intel's current GPU tech and chip.
In a video that will release soon they can Panther Lake at 30w is better than the Series S.
AMD better watch out.

poke01 · Feb 3, 2026

Panino Manino said:
Digital Foundry is right now hyping Intel's current GPU tech and chip.
In a video that will release soon they can Panther Lake at 30w is better than the Series S.
AMD better watch out.

Everybody is shilling for Intels Panther lake It’s such an average product.

adroc_thurston · Feb 3, 2026

Panino Manino said:
In a video that will release soon they can Panther Lake at 30w is better than the Series S.

80W stxH was better than PS5 so whatever.

poke01 said:
Everybody is shilling for Intels Panther lake It’s such an average product.

Good BL.
The rest is yeah.

Thunder 57 · Feb 3, 2026

DF hates AMD. Well known Nvidia shills.

adroc_thurston · Feb 3, 2026

Thunder 57 said:
DF hates AMD. Well known Nvidia shills.

It's not that they hate AMD; it's just that them (especially Battaglia) love NV so much they just do infinite NV shill pieces every (even the smallest) product launch.

Thunder 57 · Feb 3, 2026

adroc_thurston said:
It's not that they hate AMD; it's just that them (especially Battaglia) love NV so much they just do infinite NV shill pieces every (even the smallest) product launch.

It was abundantly clear when I watched their retrospect video on HL2 (I think) where the lowest end Radeon 9200 (I think) beat the highest end FX 5800 in DirectX 9 and they didn't bother testing any ATI cards.

Thunder 57 · Feb 3, 2026

Well the AT article is long gone as I can't get it on the Wayback Machine I see them coping hard here:

Halflife 2 Benchmarks

After the surprisingly hard critics from Valve's Gabe Newell on NVIDIA yesterday, the press today got the chance to get our own performance numbers in Halflife 2. Within the scope of ATI's Shader Day

www.tomshardware.com

Discussion RDNA 5 / UDNA (CDNA Next) speculation

Lifer

Senior member

Lifer

Diamond Member

Senior member

Senior member

Diamond Member

Member

Senior member

Member

Senior member

Platinum Member

Diamond Member

Senior member

Diamond Member

Platinum Member

Senior member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member