
Discussion: RDNA 5 / UDNA (CDNA Next) speculation

You said there was no market for it and now it's a winner? 🙄



Prove it. Here is my proof that you are wrong.

5090 chip size: 24x31 = 744 sq mm; ideal candidate dies per wafer: 73. Also assume a worst case where 50% get thrown away completely (which seems way too high even for a chip this big):

Cost per good chip, by good chips per wafer:

Process       Wafer cost   73 chips   36 chips
N3            $20,000      $274       $556
N2 (maybe)    $25,000      $342       $694
N2 (max)      $30,000      $411       $833

That's a 5090-sized chip, not AT0, which is smaller. Even with high memory prices, selling the GPU for $1500 will make very good money, as long as it sells in volume, which it will IF it is substantially quicker than the 5090 (which in the UK is currently sold for ~£3000). Sell it for $2000 at the start and discount later to $1500. Very clear business case to make good money and have a halo part that will rub off nicely on the rest of the range: easily marketed as having long-term support because RDNA5 is in the PS6.

And this is a 5090-sized chip, which AT0 ain't. If it is reported at 600 sq mm then we are looking at a total of 88 chips per wafer (in the ideal case). What would be a realistic total write-off from defects, max of 25%? Certainly not the 50% I assumed above.

Here is the same calculation for AT0 (22x28 = 616 sq mm): 88 total chips; 66 with a 25% total write-off; 44 with 50%.

Process       Wafer cost   88 chips   66 chips   44 chips
N3            $20,000      $227       $303       $455
N2 (maybe)    $25,000      $284       $379       $568
N2 (max)      $30,000      $341       $455       $682
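For anyone who wants to sanity-check those tables, here's a quick sketch of the arithmetic using the usual gross-dies-per-wafer approximation (illustrative only: the wafer prices and write-off fractions are the assumptions above, not real TSMC numbers):

```python
import math

WAFER_DIAMETER_MM = 300  # standard 300 mm wafer

def gross_dies_per_wafer(die_area_mm2, diameter_mm=WAFER_DIAMETER_MM):
    # Common edge-loss-only approximation: pi*r^2/A - pi*d/sqrt(2*A)
    r = diameter_mm / 2
    return math.floor(math.pi * r * r / die_area_mm2
                      - math.pi * diameter_mm / math.sqrt(2 * die_area_mm2))

def cost_per_good_die(wafer_cost, die_area_mm2, write_off_fraction):
    # Wafer cost divided by the dies that survive the assumed total write-off
    good = gross_dies_per_wafer(die_area_mm2) * (1 - write_off_fraction)
    return wafer_cost / good

for area in (24 * 31, 22 * 28):  # ~744 sq mm (5090-class) and ~616 sq mm (rumoured AT0)
    print(area, gross_dies_per_wafer(area))             # -> roughly 70 and 87 candidates
print(round(cost_per_good_die(20_000, 22 * 28, 0.25)))  # -> ~$307 on N3, vs ~$303 in the table
```

The approximation lands at roughly 70 and 87 candidate dies instead of the 73 and 88 used above, so the per-die costs shift by a few percent, but the conclusion doesn't change.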

And AMD won't make a bucket load of money if the core GPU silicon costs $303 per chip???

Since memory costs are high, make it on N2 and sell it for $2k; heck, I'd pay $2500 for a good N2 GPU like that, it will be good enough for a fair few years.


Where are the other costs?
 
There aren't any, and even if there were, $1500 isn't economical given the units and the general N3 product mix.
Well maybe you should have put it like that in the first place, such as perhaps: "Due to high demand from Nvidia for N3 there is simply not enough capacity in the near term, and therefore high-silicon/lower-margin AMD products can't get greenlit as they won't recover their costs due to low likely production volume, even if they would have been welcomed by all the poor gamers tired of being ****ed by Nvidia".

Nice and easy, isn't it? Much better than "no market!" or "Lisa hates gamers!", plus it does not make you look like a total ( o )*

* that's a TIE Fighter
 
And backwards compatibility? Any new card will be expected to run old stuff faster, otherwise it will be a bust. It will take at least a few years after the PS6 launch before work graphs are used by some seriously popular game.
Answer here:
If devs bother to rewrite RTGI pipelines on PC ports (unlikely) there's potential for massive perf gains in the short term, otherwise unfortunately this is post-crossgen and no earlier.
I talked about DXR 1.3 being rebuilt for work graphs, not the new HW abandoning EI or not supporting DXR 1.2. It's really more about the API than the HW.

Also never claimed it's gonna be relevant anytime soon.
- SER + OMM = short term
- BVH redesign (DGF+DMM+PTLAS) and work graphs = long term
- Crackpot design could extend the BVH side even further by adding a GPU BVH builder into the overall RT work graph + implementing reduce and sort in HW.

if it is reported at 600 sq mm
Prob a bit above that:
- 2.7X mem PHY shoreline
- 2.7X CUs
- 2.7X L2
- 2X frontend and backend
AT2 is already 264mm^2, so prob 650-675mm^2 + MID.

Overall silicon cost is gonna be well above GB202's with the current rate of node hikes.
 
And there will always be people not buying Nvidia cards because they are a bit unlikeable as a company.
Is that market segment big enough to justify bringing an expensive die on N3? Doubt it.

Most gamers are fed up with Nvidia and their GPU prices
Nah, AMD is doing more to harm themselves, and that directly benefits NVIDIA. Eroding their mindshare and tarnishing their reputation by the minute thanks to their recent antics: FSR4 INT8 ignored for RDNA 2-3, not opening FSR4 up to RDNA 3 with an FP16 emulation path, the FSR Redstone token release, FSR 4's inferior adoption, AMD pausing RDNA 1+2 game drivers, no pre-FSR 3.1 upgrade path for older games, no support for SER and OMM + the rest of the SM6.9 situation, etc...

Go to the r/Radeon subreddit and see the kinds of comments people leave there and you'll know what I mean. If it wasn't for Optiscaler people would be even more pissed. It's a loud minority, but enthusiasts are mostly the ones normies ask when doing a DIY build. A halo AMD card where you can't count on support will sell very poorly.

Seems like Radeon owners are now considering NVIDIA the lesser of two evils. Radeon buyers have realized the emperor has no clothes and are livid. AMD has become 14900K-esque unpredictable on the SW front + known for lagging behind in HW. It will only get worse. NVIDIA is the scummy monopoly, but you at least know what you pay for. They have S-tier driver support, extend DLSS SR across the entire stack, etc... NVIDIA is the expensive but safe choice, AMD is a gamble. It'll take multiple generations to build back that trust.

Only the people interested in Battlematrix would probably buy this product, and that's if AMD even bothers with such a thing. Tiny TAM. It's a dumping ground for Xcloud scraps. If Xcloud plans are cancelled, AT0 is 100% done. The TAM for a cheapo inference card + gaming doesn't justify it.

If AMD drops a $2k GPU, they'll just wait for the next Nvidia.
Considering AMD's latest moves yes.
 
Eroding their mindshare and tarnishing their reputation by the minute thanks to their recent antics: FSR4 INT8 ignored for RDNA 2-3, not opening FSR4 up to RDNA 3 with an FP16 emulation path, the FSR Redstone token release, FSR 4's inferior adoption, AMD pausing RDNA 1+2 game drivers, no pre-FSR 3.1 upgrade path for older games, no support for SER and OMM + the rest of the SM6.9 situation, etc...
None of that matters if you have the fastest part. They don't.
They have S-tier driver support
Bruh the last driver release was catastrophic.
Considering AMD's latest moves yes.
No, it's the market perception.
Shaped by over a decade of NV always eking out a win on the top end.
 
Bruh the last driver release was catastrophic.
I was referring to how quickly AMD ends driver support, unlike NVIDIA, which supported Maxwell until recently.

No, it's the market perception.
Indeed, but this current mess should end all discussion for good. Even if AMD takes the crown it won't matter. Gonna take years to build up brand reputation again. No one is getting an AMD card unless it's >15% faster than a 6090.
 
I was referring to how quickly AMD ends driver support, unlike NVIDIA, which supported Maxwell until recently
Everything pre-Turing has been maintenance mode for years.
NV understands the optics a lot better, though.
Indeed, but this current mess should end all discussion for good. Even if AMD takes the crown it won't matter. Gonna take years to build up brand reputation again. No one is getting an AMD card unless it's >15% faster than a 6090.
See?
All they gotta do is win.
 
No one is getting an AMD card unless it's >15% faster than a 6090.
The top card will need to be at least 25-30% faster, and I'd say anything above the 4090's 450W level is unacceptable insanity.

The only way to achieve this easily is to release BEFORE Nvidia, and not by a few months - that won't hack it; Nvidia will just leak some specs (like Ampere's dual-FP32 con job) to kill sales.
 
Very interesting RT research paper from last year:

Highlights and thoughts:
  • INT8 (quantized) box and triangle coordinates consumed directly by box evaluators and triangle intersection units without decompression.
  • Possible because INT avoids FP rounding errors enabling lower precision.
  • Seems like INT12 for ray directions is OK.
  • Compressed BVH8 ray-stream traversal has 18% of the memory traffic of uncompressed BVH8 single-ray traversal.
I assume ray stream tracing is widely used on current GPUs, so mem traffic reductions will be more moderate. Can anyone confirm or deny this? For example the reduction from BVH8-RS uncompressed to BVH8-RS compressed is ~20-40%. This isn't the most significant bottleneck and I would've liked to see numbers for traffic across the cache hierarchy and hit rates instead.

This aggressive quantization enables a very large reduction in BVH footprint and in traffic across the cache hierarchy. Traffic is reduced because data requests are smaller and, for memory traffic specifically, because more of the BVH can remain cached in L2. In addition, it acts as a multiplier on the effective size of the L0 cache, which helps reduce cache thrashing and lowers memory access latencies.
If we ignore the control/logic overhead + changes to computational throughput, then on the same cache architecture, especially one that is already overburdened (it is xD), this can only result in a very large speedup for ray traversal.
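To put a rough number on the footprint side, here's a back-of-the-envelope comparison of a plain FP32 BVH8 node against an INT8-quantized one. The layouts are my own illustrative assumptions, not the paper's or AMD's exact encodings:

```python
FP32, INT8 = 4, 1  # bytes per value

# Assumed uncompressed BVH8 node: 8 child AABBs at 6 x FP32 each, plus 8 x 4-byte child refs
uncompressed = 8 * 6 * FP32 + 8 * 4                 # 192 + 32 = 224 bytes

# Assumed quantized BVH8 node: one FP32 origin + per-axis scale shared by all children,
# then 6 x INT8 bounds per child relative to that origin, plus the same 8 x 4-byte refs
quantized = (3 + 3) * FP32 + 8 * 6 * INT8 + 8 * 4   # 24 + 48 + 32 = 104 bytes

print(uncompressed, quantized, round(uncompressed / quantized, 2))  # 224 104 2.15
```

Even with the child pointers left untouched, the node shrinks by roughly 2x under these assumptions; real encodings pack the metadata tighter, so the actual savings will differ.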


Related RDNA 5 patents and analysis
Now if you remember the prefiltering, DGF and low-precision patents I shared months ago (Kepler was first), this could be a sneak peek into one aspect of RDNA 5's RT traversal pipeline. RDNA 5 could adopt platonic-solid OBBs (quantized), prefiltering, and DGF nodes across ray-box and ray-triangle intersection.
Then INT8 precision for the entire BVH stack would be used by default to speed up the prefiltering pipeline. For ray directions and calculations it could use INT12, unless this produces too many inconclusive tests (speculative IDK). If it does then the pipeline could opt for INT16 for rays and calculations instead.

For inconclusive tests that require high precision we would keep a scaled-back version of the current FP32 pipeline as a fallback. For distant objects and effects that don't need high precision, we could just avoid this step entirely.
To perform calculations the FP32 units would need to decompress the quantized BVH data first.

For reference, RDNA 4 already has 12-bit primitive nodes (triangles) acting as leaf-node compression, while the rest remains uncompressed. In order to perform FP32 ray-triangle tests, this data must first be decompressed.
Also, in the case of the DGF format, which uses 24-bit quantization, RDNA 5 would need to use it to construct DGF and prefiltering nodes (a primitive-node replacement) at INT8 precision.

Summarizing, this is what switching from serial FP32 to parallel INT12 prefiltering with FP32 as a fallback achieves (see the sketch after this list):
- Superior compression, which enables a large reduction in cache/mem traffic
- Parallel execution, which means less control overhead
- INT12 is the default. It is far simpler than an FP32 pipeline, permitting a massive increase in perf/area and perf/watt and lower latency, even with FP32 as a fallback.
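
Here is the sketch mentioned above: a minimal two-stage ray/box test illustrating the general prefiltering idea (my own illustration, not AMD's actual pipeline). The key property is that the low-precision stage rounds the box outward, so it can only produce false positives, never false misses, and anything it cannot reject falls back to full precision:

```python
import math

def quantize_box(lo, hi, origin, scale, bits=8):
    """Snap an AABB onto an integer grid, rounding outward so the quantized
    box always encloses the original one (conservative: no false misses)."""
    qmax = (1 << bits) - 1
    qlo = [max(0,    math.floor((lo[i] - origin[i]) / scale[i])) for i in range(3)]
    qhi = [min(qmax, math.ceil((hi[i] - origin[i]) / scale[i])) for i in range(3)]
    return qlo, qhi

def slab_test(ro, inv_rd, lo, hi):
    """Plain slab test: True if the ray (origin ro, reciprocal direction inv_rd) hits the box."""
    tmin, tmax = 0.0, math.inf
    for i in range(3):
        t0 = (lo[i] - ro[i]) * inv_rd[i]
        t1 = (hi[i] - ro[i]) * inv_rd[i]
        tmin = max(tmin, min(t0, t1))
        tmax = min(tmax, max(t0, t1))
    return tmin <= tmax

def prefiltered_hit(ro, rd, box_lo, box_hi, origin, scale):
    # Sketch-level handling of axis-parallel rays to avoid division by zero
    inv_rd = [1.0 / d if abs(d) > 1e-12 else 1e12 for d in rd]
    # Stage 1: cheap conservative test against the outward-rounded quantized box
    qlo, qhi = quantize_box(box_lo, box_hi, origin, scale)
    grid_lo = [origin[i] + qlo[i] * scale[i] for i in range(3)]
    grid_hi = [origin[i] + qhi[i] * scale[i] for i in range(3)]
    if not slab_test(ro, inv_rd, grid_lo, grid_hi):
        return False                      # conservative reject: definitely a miss
    # Stage 2: full-precision fallback, only for the survivors
    return slab_test(ro, inv_rd, box_lo, box_hi)
```

In real HW the first stage would itself run in narrow integer arithmetic on a quantized ray; the sketch only shows the conservative-reject-then-full-precision-fallback structure.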

Based on this one aspect of RDNA 5's RT traversal implementation, it does sound like a monster in ray traversal throughput. That of course assumes AMD doesn't just aim for an iso-intersection rate and cut die area.
 
If we take into account the rest of gfx13's patent-derived changes for RT traversal, ignoring everything else, then I can see only one scenario: one where RDNA 5 dominates. In ray traversal throughput (at the same raster) it should be able to destroy ALL existing generations, including current NVIDIA ones. In addition, factoring in the rest of the potential architectural changes should only extend the lead.
- I'll prob need to redo the October analysis because it ain't complete at all, since many new patents have surfaced in the last 5 months.

As for the RT shading portion, many things will likely change, cache/mem and scheduling to name a few. In addition, it seems like SWC is a dedicated block, unlike NVIDIA's SER (which repurposes existing HW). As a result, in existing PT games (DXR 1.2) RDNA 5 shouldn't have any problems winning at iso-raster. When it comes to a potential work graphs pivot for DXR 1.3, AMD and everyone else has untapped fine wine, but RDNA 5 much more so, for reasons explained too many times to repeat.

Remember that RT is just one aspect of an architecture, and we need to consider all the potential changes coming with the next-gen Radeon. With this in mind, architecturally it looks like RDNA 5 could be their most impressive product family since the R300 series (9700 Pro). I'm talking about architectural sophistication and superiority over existing products here, not that they're gonna take the performance crown.

As for the 60 series, I have no idea and wonder what kind of HW changes it introduces to counter this. Another Ada rebrand or Turing resource spam (Ampere) definitely isn't going to cut it, and it would be asinine for NVIDIA to keep bolting things onto Ampere for the third time. Hopefully we see a major architectural change on NVIDIA's side.

It will be interesting to see if any of the PT stuff will even matter long term considering neural shading is on track to replace most if not all of the shading pipeline.
 
No one is getting an AMD card unless it's >15% faster than a 6090.
I disagree.

I'm currently on a 7900XTX (RDNA3).

If RDNA 5 comes out and is equal to a 5090 but the MSRP is $1500 vs the launch MSRP of the 5090, which was $2000, I would buy it. Current prices on a 5090 right now in CAD are like $4500-$5500, with some versions hitting $7,000 lol.

A 5090 is something like 70% faster than a 7900XTX.

It doesn't need to equal a 6090 or be faster for it to be an option for me; price is a big factor, not just performance. Power draw also matters to me; I don't want a 600 watt beast of a GPU, I'm good in the 350W to 450W range. If RDNA 5 fits in that power range and has a 3x8-pin offering from an AIB, that means I don't need to upgrade my PSU.
 
I disagree.

I'm currently on a 7900XTX (RDNA3).

If RDNA 5 comes out and is equal to a 5090 but the MSRP is $1500 vs the launch MSRP of the 5090, which was $2000, I would buy it. Current prices on a 5090 right now in CAD are like $4500-$5500, with some versions hitting $7,000 lol.

A 5090 is something like 70% faster than a 7900XTX.

It doesn't need to equal a 6090 or be faster for it to be an option for me; price is a big factor, not just performance. Power draw also matters to me; I don't want a 600 watt beast of a GPU, I'm good in the 350W to 450W range. If RDNA 5 fits in that power range and has a 3x8-pin offering from an AIB, that means I don't need to upgrade my PSU.
my best guess is that AT0 starts from $1399 & goes all the way up to $2399

all versions should comfortably beat a 5090

bus widths could be 320-bit GDDR7 & 384-bit GDDR7

320-bit should fit comfortably in the power envelope that you are looking for.
 
my best guess is that AT0 starts from $1399 & goes all the way up to $2399

all versions should comfortably beat a 5090

bus widths could be 320-bit GDDR7 & 384-bit GDDR7

320-bit should fit comfortably in the power envelope that you are looking for.
Nice, I'm already saving the money for that purchase, and since we will not see it until 2027 I have plenty of time.

If it does comfortably beat a 5090 I won't have a problem spending $2,000, but that would be the upper limit for me.

As for power requirements, we shall see. I do need to move to an ATX 3.1 PSU at some point, but if I can avoid that for a bit, that works.
 
I disagree.

I'm currently on a 7900XTX (RDNA3).

If RDNA 5 comes out and is equal to a 5090 but the MSRP is $1500 vs the launch MSRP of the 5090, which was $2000, I would buy it. Current prices on a 5090 right now in CAD are like $4500-$5500, with some versions hitting $7,000 lol.

A 5090 is something like 70% faster than a 7900XTX.

It doesn't need to equal a 6090 or be faster for it to be an option for me; price is a big factor, not just performance. Power draw also matters to me; I don't want a 600 watt beast of a GPU, I'm good in the 350W to 450W range. If RDNA 5 fits in that power range and has a 3x8-pin offering from an AIB, that means I don't need to upgrade my PSU.
This is what I've been saying as well. Not just for gaming but proviz. AMD has the biggest opportunity ever to get market share in gaming and proviz and solve a lot of their software issues. Most people are going to prefer an AMD card that's between 5090 and 6090 performance but with a lot of memory over a 6090. Add xGMI or UALink, which Nvidia took out of its consumer and workstation cards, and AMD becomes arguably a better choice for consumer, workstation, and entry-level datacenter cards. Unless Nvidia adds back NVLink, but that would undercut some of their more expensive datacenter cards!!
 
This is what I've been saying as well. Not just for gaming but proviz. AMD has the biggest opportunity ever to get market share in gaming and proviz and solve a lot of their software issues. Most people are going to prefer an AMD card that's between 5090 and 6090 performance but with a lot of memory over a 6090. Add xGMI or UALink, which Nvidia took out of its consumer and workstation cards, and AMD becomes arguably a better choice for consumer, workstation, and entry-level datacenter cards. Unless Nvidia adds back NVLink, but that would undercut some of their more expensive datacenter cards!!
No worries buddy, it doesn't have it.
 
Tell that to Sony, AMD and NVIDIA.
For AAA, it looks like that's where things are going. MLP spam replaces PT.

GEMM slop is very very very expensive.
Any way to solve that without TMEM and a bigger VRF? Can't just spam more GEMM compute with the same VRF, right?
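
Rough intuition for why more GEMM compute with the same VRF doesn't help: with a tiled GEMM the operand traffic per FLOP shrinks as the output tile you can keep on chip grows, so reuse is gated by register/TMEM capacity rather than ALU count. Generic roofline-style numbers, nothing vendor-specific:

```python
BYTES = 2  # FP16 operands assumed

def flops_per_operand_byte(m, n):
    # Keep an MxN output tile on chip and stream the K dimension:
    # each K-step fetches an Mx1 strip of A and a 1xN strip of B,
    # and performs 2*M*N FLOPs (one multiply-add per output element).
    return (2 * m * n) / ((m + n) * BYTES)

for tile in (4, 8, 16, 32, 64):
    print(f"{tile}x{tile} tile: {flops_per_operand_byte(tile, tile):.1f} FLOPs per operand byte")
# 4x4 -> 2.0, 8x8 -> 4.0, 16x16 -> 8.0, 32x32 -> 16.0, 64x64 -> 32.0
```

Doubling the tile side doubles the reuse but quadruples the on-chip storage needed for the accumulator tile, which is exactly the VRF/TMEM pressure in question.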

AMD's been filing patents for data reuse cache + global VGPR-to-VGPR network but not sure if it's gonna solve the bottleneck. Not confirmed yet.

Maybe related to Neural Arrays, again not confirmed.

Crackpot design would be if flexcache allows some of LDS to be TMEM. Prob not enough BW though.

But boring AF if RDNA 5 is just Hopper DSMEM+TBC and GFX12.5 cache design.

It's only cheaper than RTRT actual (nightmare workloads for everything but GPUs in particular).
Yep only relevant for PT successor.
 