Discussion RDNA 5 / UDNA (CDNA Next) speculation

Additional revenue implies there's a market for $2k gaming Radeons which is just not true.
I'll buy it if it's even 30% faster than 5090, with ideally a bit more sane TDP.

There are a few million gamers who have 4090 and most were not convinced to buy crappy 5090 (crap availability too), so if next Radeon is close to or over double 4090 then yes, I'll buy it - unless it's so late that I'll judge N2 GPUs coming soon enough, AMD needs to intro it this year, don't wait for Nvidia.
 
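so if next Radeon is close to or over double 4090 then yes, I'll buy it - unless it's so late that I'll judge N2 GPUs coming soon enough, AMD needs to intro it this year, don't wait for Nvidia.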
Too optimistic, but it’s obvious that RDNA5’s flagship card is going to blow past the 5090 in terms of performance.
 
Unless AT0 is cancelled entirely it will 100% release to consumers, even if it's a token release with no real volume like the Radeon VII.
I mean early 2027 was that 36GB SKU, but if it goes to 2H of 2027, maybe even Q4, I do wonder if they'll change plans, like 4GB modules instead of 3GB, or even make AT0 on N3X? Or maybe save stuff like 4GB modules and N3X for an RDNA 5 refresh launching with Zen 7, since IIRC Lisa Su noted that GPU and CPU sales go hand in hand when new GPUs launch.
Too optimistic, but it’s obvious that RDNA5’s flagship card is going to blow past the 5090 in terms of performance.
Yeah, AMD is going all in with RDNA 5 from a technical perspective. I think an AT0 cut-down SKU (say 154 old CU/77 new CU) could hit like 2x the raster performance of the RTX 4090 and at least 2x the ray and path tracing performance, which could IMV convert discontented RTX 40 buyers and give the 7900 XTX gang an upgrade path. They could maybe even offer a slightly less cut SKU (say 184-188/92-94 but everything else intact) or even a full-fat-die-but-not-as-efficient-as-full-fat-pro-SKUs card as a Halo tier. Speaking of which, I think the AT0 intended for xCloud may change now:
Six months after Bond’s promotion, Xbox executive Kareem Choudhry, who reported directly to Bond, departed Microsoft and triggered another shake-up of some teams inside Xbox. Choudhry was key to Xbox’s backward compatibility support and helped bring Xbox Cloud Gaming to life as xCloud in 2019.

Just weeks after Choudhry departed, former Xbox chief marketing officer Jerret West also left Microsoft in June 2024. That meant the Xbox marketing team was now reporting directly to Bond. A month later, Microsoft delivered a marketing campaign that signalled people didn’t need to buy an Xbox console anymore. The message was that “you don’t need an Xbox to play Xbox,” because games were available through Xbox Cloud Gaming on TVs.

I understand that Bond’s strategy had been failing internally and been questioned multiple times. Bond had tried to push mobile and cloud over console, to reach potentially millions more Xbox customers, but the result has been a classic case of chasing tomorrow’s customers by neglecting today’s.

Very possible that AT0 allocation could change, because with what is going on at Xbox we may see AT0 being moved away from xCloud.

Again all of this is speculation on my part 😛.
 
This is just a delusion.(mi455x meh)
Please let me know if I'm too far off the mark.
It seems Samsung has been chosen as the foundry for custom B-die. The package deal may include LPDDR5X.
 
This is just a delusion.(mi455x meh)
According to the Korean article, AMD's qual is finished (Samsung). Does this mean the S-HBM that will go into the 440x or 430x? Does it include C-HBM?
What he's looking at might also be a document related to qual. BTW, he doesn't know much about mi455x.
 
Hmm good point. MI440X cannot feature LPDDR5X the same way because of OAM form factor. So S-HBM could be sufficient there.

For MI430X I am wondering, if an additional LPDDR5X memory pool (and rack scale as well) wouldn't be helpful as well?
 
For MI430X I am wondering, if an additional LPDDR5X memory pool (and rack scale as well) wouldn't be helpful as well?
Imho it would be a waste, most scientific computing workloads don't need much VRAM, and even "big" bio/chem/material ML models are far smaller than current LLMs.
 
Hmm good point. MI440X cannot feature LPDDR5X the same way because of OAM form factor. So S-HBM could be sufficient there.

For MI430X I am wondering, if an additional LPDDR5X memory pool (and rack scale as well) wouldn't be helpful as well?
HPC workloads don't need that much memory
 
[Image: 1000002536.jpg]
There seems to be a lot of white space, and it fits the TSVs of the three companies.
For some reason, if I put the ones that are separated together, there'll be room.
P.s hynix base die problem rumor, amd well?
 
From what info, leaks and rumors are suggesting, RDNA5's HW capabilities look on par with or even above Blackwell.
Exceeding Blackwell HW capabilities should be easy. On the NVIDIA side the RT foundation is unchanged since Turing (spamming ray/tri + adding low-hanging fruit), and so are the SM foundation and cachemem, since Ampere is just concurrent execution, a beefed-up L1 and 2X FP32. Lovelace is Ampere+ with ML and RT low-hanging fruit, Blackwell = Ampere++.
Based on everything we know it's beyond or well beyond 50 series. With clean sheet redesign across everything (Conroe-esque reset), IIRC the largest since GCN, anything less would be a major disappointment.

With probably two or three especially advanced HW accelerated things (DGF, work graphs, universal compression).
Hopefully much more than just that. Universal compression = mem BW multiplier, DGF is just better BLAS compression. But Work Graphs HW accel is interesting.

it's far far far far far weirder than that.
far weirder than anything NV will ever ship.
Larrabee-esque?
OoO scheduling like @basix said can't explain this alone, so it will be interesting to see what this peculiar design is about.
 
Out-of-order scheduling on GPUs is special, for sure.
My memory is failing me so has this OoO thing been confirmed yet?

Never heard of such specific numbers about RDNA5. Leak or just speculation?
Sounds like genAI nonsense. IIRC MLID gave +5-10% IPC figure for RDNA5.

Gaming AT0 was hatcheted off the roadmap.
When did this happen?
Was the MLID ATx Summer leak referencing old documents?
 
Yeah that's a bit too optimistic
You said 24CU AT0 dGPU = 9060XT should be doable and as I understood it much expanded dual issue use plays a large part here:
Yeah 9060 XT perf at desktop power levels should be possible.
AT2 has 4X the Shader Engines + SE-tied logic (4 SEs vs 1) AND 2.9X the CUs (70 vs 24). With reduced clocks, matching 4090 raster perf doesn't sound outlandish, and that is a ~35% raster gain vs 9070XT.

But if they only ship 64 CU scraps with low power budget then it makes sense. Maybe 5080 perf is more realistic here?
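
The ratios being argued about here are easy to sanity check. A minimal sketch, using only the CU and SE counts quoted in this post (70 vs 24 CUs, 4 vs 1 SEs, and the 64 CU salvage case). The helper names and the single "scaling efficiency" number are illustrative assumptions, not anything from a leak, and the 2x target and 0.9 clock ratio in the last line are made-up inputs:

```python
def unit_ratio(big: int, small: int) -> float:
    """How many times more units the bigger config has."""
    return big / small


def required_scaling_efficiency(target_speedup: float,
                                cu_ratio: float,
                                clock_ratio: float = 1.0) -> float:
    """Fraction of ideal (CU count x clock) scaling needed to hit a target speedup.

    Assumes performance ~ CUs x clock x efficiency, which is a crude model.
    """
    return target_speedup / (cu_ratio * clock_ratio)


full_cu = unit_ratio(70, 24)     # full AT2 config vs the 24 CU part
salvage_cu = unit_ratio(64, 24)  # the "64 CU scraps" case
se = unit_ratio(4, 1)            # Shader Engines

print(f"CUs: {full_cu:.2f}x full, {salvage_cu:.2f}x salvage, SEs: {se:.0f}x")
# Hypothetical inputs: efficiency needed for a 2x speedup at 90% of the clocks
print(f"needed efficiency: {required_scaling_efficiency(2.0, full_cu, 0.9):.2f}")
```

The point of the second helper is just that the wider the config, the less perfect the scaling has to be to reach a given target, which is why the salvage config changes the picture.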
 
Yes, if there are issues with clock rates and power efficiency (RDNA3 ahoi) or CU count scaling (5090 ahoi), then it will not happen.
Hopefully they learned their lesson with RDNA3.
A massive engineering failure if scheduling (decentralized scheduling and execution) and cachemem reset (data locality) doesn't fix this. Should scale to arbitrarily wide designs.

If RDNA5 is 50% quicker than 5090 (which will be then close to double of 4090) then I'll be very interested, as long as ray tracing does not totally suck. I find it hard to believe it will be though, maybe if it was on N2 but it ain't.
Easily doable if the revamped scheduling works as detailed in the patents.

PT better exceed 50 series at iso-raster. That is really just 20 series RT + SM cachemem with extensions (low hanging fruits) and beefed up logic and cache. No clean slate innovative redesign.

If NVIDIA wanted they could do superior SM and RT core cachemem redesign and turbocharge PT. This is what AMD should target with clean slate GFX13. If they don't they're fools.

I'll buy it if it's even 30% faster than 5090, with ideally a bit more sane TDP.
That shouldn't be hard to accomplish. N3P, cachemem + scheduling clean slate, and CU spam = 5090 obliterator guaranteed.

I think an AT0 cut-down SKU (say 154 old CU/77 new CU) could hit like 2x the raster performance of the RTX 4090 and at least 2x the ray and path tracing
Sounds reasonable. 2X 4090 is ~50% ahead of 5090.
Hopefully far more than 2X with the cachemem and RT core redesign. Considering RT cachemem efficiency of 50 series is atrocious, even with SER, >3X doesn't sound impossible and maybe even 4-5X could happen with the big config (+180CUs) and a work graphs rewrite for PT code.
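
For anyone juggling the "X% over 5090" and "Nx 4090" figures in this thread, here is a minimal conversion sketch. The only input is the ratio implied by the line above (2X 4090 being ~50% ahead of 5090, so 5090 is roughly 1.33x a 4090). That ratio is taken from this post, not from a benchmark, it is a raster figure only, and the function names are made up for illustration:

```python
# Implied by "2X 4090 is ~50% ahead of 5090": 5090 ~= 2.0 / 1.5 ~= 1.33x a 4090.
# This is the thread's own rough raster figure, not a measured number.
RTX_5090_VS_4090 = 2.0 / 1.5


def multiple_of_4090(uplift_over_5090: float) -> float:
    """Convert 'X faster than a 5090' (0.30 = 30%) into a multiple of a 4090."""
    return RTX_5090_VS_4090 * (1.0 + uplift_over_5090)


def uplift_over_5090(multiple_of_4090_perf: float) -> float:
    """Convert 'N times a 4090' into a fractional uplift over a 5090."""
    return multiple_of_4090_perf / RTX_5090_VS_4090 - 1.0


for pct in (0.30, 0.50):
    print(f"+{pct:.0%} over 5090 = {multiple_of_4090(pct):.2f}x 4090")
for mult in (2.0, 3.0):
    print(f"{mult:.0f}x 4090 = +{uplift_over_5090(mult):.0%} over 5090")
```

So under that assumed ratio, the "30% faster than 5090" and "50% quicker than 5090" wishes from earlier posts land at roughly 1.7x and 2x a 4090.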
 
But isn't that due to inherent bad cacheability of RT stuff? It's just too random, at least for puny caches
No, the NVIDIA design is just very inefficient. It does thread coherency sorting (SER) and OMM, but other than that there haven't really been any major changes since Turing.
Some examples of things that could be done differently:
  • Data coherency sorting
    - Wide partial sorting (SER but for data)
  • Ray coherency sorting
    - Allows MIMD -> SIMD = data reuse, latency tolerant and lower cachemem pressure
  • Parallel fixed point prefiltering
    - Low precision INT parallel tests dominate. FP32 fallback.
  • Parallel BVH traversal with sophisticated stack management
    - Too complex to explain, but you can find the implementation at patent US20250173812.
  • Ray stack culling
    - Precomputed and at traversal reduces cachemem and BVH traversal circuitry overhead and allows shallower and wider BVH.
  • DGF BLAS HW compression + fallback
  • Other compression
  • Flexcache
    - No fixed caches, borrowing and leasing depending on the workload. For Ray traversal could be huge LDS and/or L0.
  • M5 like Dynamic caching (2nd gen).
    - e.g. extending Dynamic VGPR to the traversal stack and other stuff. No more static worst-case allocations, and it plays a big role in A19 Pro's >2X RT perf gain. Geekerwan's testing shows A19 Pro hit 67% RT unit utilization and overall kernel occupancy. For A18 Pro those are 52% and 44%. By comparison Chips and Cheese showed 35-45% kernel occupancy with SER enabled in their SER testing. That comparison is misleading though: occupancy =/= coherence, and Chips and Cheese tested coherence. But the Apple results are still impressive.
  • Mapping producer-consumer to HW
    - Proper work graphs accel that is decentralized scheduling and execution, data locality focused cachemem, and payload sorting ASIC etc... Large speedups from early to mid 2030s.
  • Selective instruction issue throttling
    - Avoid cache thrashing
  • More stuff I forgot
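
To make the "ray coherency sorting" bullet a bit more concrete, here is a toy software sketch of the general idea: bin rays by direction octant so that each batch handed to a SIMD-style unit walks similar parts of the BVH. This is a generic textbook-style illustration, not how any AMD or NVIDIA hardware does it. The `Ray` type, the octant key and the batch width of 2 are all assumptions picked for the example:

```python
from typing import NamedTuple


class Ray(NamedTuple):
    origin: tuple[float, float, float]
    direction: tuple[float, float, float]


def octant_key(ray: Ray) -> int:
    """3-bit key built from the sign of each direction component (0..7)."""
    dx, dy, dz = ray.direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)


def sort_by_coherence(rays: list[Ray]) -> list[Ray]:
    """Stable sort so rays heading the same way end up next to each other."""
    return sorted(rays, key=octant_key)


def batches(rays: list[Ray], width: int) -> list[list[Ray]]:
    """Split rays into fixed-width groups, standing in for SIMD waves."""
    return [rays[i:i + width] for i in range(0, len(rays), width)]


def distinct_octants(batch: list[Ray]) -> int:
    """Crude divergence metric: how many direction octants one batch spans."""
    return len({octant_key(r) for r in batch})


# Incoherent input: directions alternate between two octants.
rays = [
    Ray((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    Ray((0.0, 0.0, 0.0), (-1.0, 1.0, 1.0)),
    Ray((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    Ray((0.0, 0.0, 0.0), (-1.0, 1.0, 1.0)),
]

print([distinct_octants(b) for b in batches(rays, 2)])
print([distinct_octants(b) for b in batches(sort_by_coherence(rays), 2)])
```

In the unsorted case every batch mixes two octants, while after sorting each batch covers one. Fewer octants per batch is the MIMD -> SIMD style win the bullet is talking about: neighbouring lanes touch the same nodes, so data gets reused instead of thrashing the cache.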

Can't know how much of it RDNA5 has, but as I said AMD could easily destroy 50 series in RT if they bothered. It's not like this isn't described in detail + in many instances implemented in shipping products.
 
So they have the silicon developed, in manufacturing, but deliberately don't try to get additional revenue for it to help recoup the design cost.
I mean, I could see it turning into an Arc B770 situation (where it likely would have happened but memory prices doomed it) in 2027. But it's not H2 2027 today.

About you knowing what will happen for sure 2 years ahead, I have my doubts.
You're thinking with the mentality of the market from 10, 15 years ago. Back then the high end "Pro" cards (like Quadro, FirePro or Tesla) were niche things that sold into a very limited market. AMD/Nvidia would sell as many of them as they could (because high margins), but it wasn't a big market. So they needed the volume of gaming to make up the numbers and make the R&D worthwhile.

Unfortunately the AI bubble is not like that. AMD and Nvidia are selling every single AI chip they can manufacture (or rather selling out the entire capacity years ahead of time), and the only factor stopping them selling even more is fab throughput. They don't need gaming, when capitalism appears happy to pour infinite money into the black hole of AI capex.

We'll be lucky if we can even buy GPUs by next Christmas, if this AI bubble doesn't pop soon.
 
Apparently Dylan Patel partied too hard with the hockey team and forgot to close Twitter
This is just racist or ignorant.

That is Kash Patel (the head of the FBI), not Dylan Patel. Patel is a super common east Indian name.

If you think it is funny to insert one guy in as a joke just because his skin color and last name is the same as another guy, that is pretty racist.
 
You're thinking with the mentality of the market from 10, 15 years ago. Back then the high end "Pro" cards (like Quadro, FirePro or Tesla) were niche things that sold into a very limited market. AMD/Nvidia would sell as many of them as they could (because high margins), but it wasn't a big market. So they needed the volume of gaming to make up the numbers and make the R&D worthwhile.

Unfortunately the AI bubble is not like that. AMD and Nvidia are selling every single AI chip they can manufacture (or rather selling out the entire capacity years ahead of time), and the only factor stopping them selling even more is fab throughput. They don't need gaming, when capitalism appears happy to pour infinite money into the black hole of AI capex.

We'll be lucky if we can even buy GPUs by next Christmas, if this AI bubble doesn't pop soon.
APUs.

Get used to this reality, that only hardware that will be available for consumers will be: mini PCs, AIOs, laptops, Mobile-on-Desktop hardware, and that DIY is going to be HEDT hardware. Extremely expensive, extremely scarce.
 