
Discussion Beyond Zen 6

Page 14 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
He clearly said 15-25% but you do you.
MLID also clearly said, what, 20-30% for ARL? So clearly taking his numbers at face value isn't smart.
A couple more points:
[image]
This is so hard to believe. LNC had massive, massive core hierarchy changes and got almost that much IPC gain in total.
Doubling the L2 cache cannot possibly help that much, and interestingly enough, the only cache hierarchy change in the "armchair QB GLC" Chips and Cheese article that got such a large IPC uplift in SPECint2017 alone was switching to Apple's cache hierarchy lol.
[image]
I believe MLID is once again confusing perf uplifts at different power levels with IPC. Just like what he did with RWC, claiming that it would see a 20% IPC uplift because he saw a document claiming RWC would get 20% "power-perf" improvements for server.
Problem is, just like with RWC over RPC, Zen 7 over Zen 6 will have perf improvements attributed to the node uplift as well, at least at mid/low power.
In fact, this document MLID also showed:
[image]
Reveals at near Fmax (22 watts per core), the perf uplift is apparently only 13-17%. That is prob the targeted IPC improvement, if Fmax doesn't really improve as well.
And really they're gonna be targeting 20% bare minimum, OG Z5 on N3B was kinda that but they had to roll back and neuter a bunch of core resources.
Then you have Zen 6 to fix that? If it's just about rightsizing structures and not actually implementing new ideas, that's very plausible even on what should be a "tick" core. Never mind the fact that AMD actually gets good IPC uplifts on "tick" cores as well because of their longer release schedule.
Also Zen 5 already blew up a lot of core resources compared to how lean previous Zen versions were. I shudder to think what "OG Z5" then would have looked like.
Forgot to add this, but 20% "bare minimum" would make it the largest IPC uplift since OG Zen. Press x to doubt.
2MB is plenty, ARM basically tops out there and M5 Super is 1MB.
How is 2MB plenty? ARM already implements more L2 cache in their P-cores on the Exynos 2600 (3MB). And that's the current gen; Zen 7 is 2 years down the line.
Plus, there's the issue of AMD slowly increasing CCX core count, esp in server. This will not help the L3 latency situation, which was a huge factor in how performant AMD cores are.
Yessir, H2 2028, always 2 years.
N3 from N5 took 3 years, N3 to N2 took 3 years.
A14 taking 2 years from N2 would be them getting back on schedule.
 
MLID also clearly said, what, 20-30% for ARL? So clearly taking his numbers at face value isn't smart.
A couple more points:
[image]
This is so hard to believe. LNC had massive, massive core hierarchy changes and got almost that much IPC gain in total.
Z7 is gonna have L1/opcache changes as well.
More important overall than just double L2.
Z3 got 19% with fairly modest changes overall vs Z2, Z7 is a bigger departure from Z6 relatively speaking.
IDC just plain suck, look at the ARM bros for meaningful uplifts.
Problem is, just like with RWC over RPC, Zen 7 over Zen 6 will have perf improvements attributed to the node uplift as well, at least at mid/low power.
Same for every core on a new node ever.
In fact, this document MLID also showed:
[image]
Reveals at near Fmax (22 watts per core), the perf uplift is apparently only 13-17%. That is prob the targeted IPC improvement, if Fmax doesn't really improve as well.
Z7 is a fatter core, v/f curve at the top of the power range is gonna favour the narrower core.
Either Z7 plateaus in freq earlier or the last few hundred MHz take exponentially more power than Z6 does.
In fact, 20-25% @12w looks like clocks are probably similar to Z6 in that range, then either falls behind or you start getting excessive power consumption from the clock network being overmatched across a bigger core.
Considering Florence TDP is the same as Venice with a core count bump, this is the kinda power range you want to dial in for.
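The "plateaus earlier or the last few hundred MHz cost more" argument can be put into numbers with the usual dynamic-power relation P = Cdyn · V² · f. A rough sketch, assuming (crudely) that voltage rises linearly with frequency near the top of the curve, so P scales like Cdyn · f³, and using a hypothetical 20% Cdyn increase for the wider core (the figure one poster later gives as the budgeted Cdyn bump). The 6.5 and 6.8 GHz clocks are hypothetical too; the model ignores the node shrink, leakage and Vmin, and real curves usually get steeper than cubic near Vmax:

```python
def relative_power(cdyn_ratio: float, f_ratio: float) -> float:
    """Per-core power relative to a baseline under the crude model P ~ Cdyn * f**3.

    Starts from P = Cdyn * V**2 * f and assumes V rises linearly with f near Fmax.
    """
    return cdyn_ratio * f_ratio ** 3


def iso_power_clock_ratio(cdyn_ratio: float) -> float:
    """Clock ratio reachable at equal power by a core with cdyn_ratio x the switched capacitance."""
    # Solve cdyn_ratio * f_ratio**3 == 1 for f_ratio.
    return (1.0 / cdyn_ratio) ** (1.0 / 3.0)


CDYN_BUMP = 1.20  # hypothetical: the wider core switches 20% more capacitance per cycle

# Same per-core power budget: the wider core lands about 6% lower in clock in this model.
clock_ratio = iso_power_clock_ratio(CDYN_BUMP)

# Same core pushed from a hypothetical 6.5 GHz to 6.8 GHz: ~4.6% more clock for ~14.5% more power.
last_300mhz_cost = relative_power(1.0, 6.8 / 6.5)

print(f"iso-power clock ratio: {clock_ratio:.3f}")
print(f"power for 6.5 -> 6.8 GHz: {last_300mhz_cost:.3f}x")
```

Under these assumptions the wider core gives up a few percent of clock at equal power, which is the shape of the argument; whether a node shrink cancels it out is exactly what the replies below dispute.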
Then you have Zen 6 to fix that? If it's just about rightsizing structures and not actually implementing new ideas, that's very plausible even on what should be a "tick" core. Never mind the fact that AMD actually gets good IPC uplifts on "tick" cores as well because of their longer release schedule.
Well Z6 does have a bunch of changes like going back to several INT schedulers vs the unified one on Z5.
Though that is probably just for clock scaling.
Also Zen 5 already blew up a lot of core resources compared to how lean previous Zen versions were. I shudder to think what "OG Z5" then would have looked like.
Z5 FP is indeed chungus, but INT is laughably small vs basically all other cores in this class.
INT PRF and ROB are too small, simple as.
Main Z5 issue really is front end latency, will be nice to see how they tighten that up.
Forgot to add this, but 20% "bare minimum" would make it the largest IPC uplift since OG Zen. Press x to doubt.
Since clocks are probably identical you need to target higher IPC.
Z3 scaled both IPC and clocks. Z5 is the outlier that failed in client to get to the usual bump.
In server though Z5 is a monster, funny how that works.
How is 2MB plenty? ARM already implements more L2 cache in their P-cores on the Exynos 2600 (3MB). And that's the current gen; Zen 7 is 2 years down the line.
Outlier, nearly all implementations are 2MB max.
AMD has the best L3 on the planet, that is why they can go leaner on L2 size and focus more on latency and bandwidth/assoc.
Plus, there's the issue of AMD slowly increasing CCX core count, esp in server. This will not help the L3 latency situation, which was a huge factor in how performant AMD cores are.
Yes but the nodes offset any latency regression, plus clever design choices like abusing SoIC and such.
And you know more L3 means less traffic to the much higher latency IOD and main memory.
N3 from N5 took 3 years, N3 to N2 took 3 years.
A14 taking 2 years from N2 would be them getting back on schedule.
N3 was a misadventure trying to push SRAM too hard, had to be redefined.
N2 is GAAFET, naturally a big departure that took a while, remember how long 28nm>16nm took?
A14 is 2nd gen GAAFET, so less changes to design for.
 
Z7 is gonna have L1/opcache changes as well.
AMD separates the L1i and uop cache (front end/fetch, or labeled separately) from the L1D + L2 (L/S, data cache, or labeled separately).
They release IPC breakdowns of their new cores usually, so one can cross reference AMD's own documents to verify this.
Z3 got 19% with fairly modest changes overall vs Z2, Z7 is a bigger departure from Z6 relatively speaking.
We used this same reasoning with Zen 5 and Zen 5 did not deliver.
Z7 is a fatter core, v/f curve at the top of the power range is gonna favour the narrower core.
Perf/watt is going to favor the fatter core at the top of the power range. It's going to favor the skinnier core at the bottom of the power range.
Either Z7 plateaus in freq earlier
It's going to have to be a regression for this to make sense (higher IPC uplift than what the perf uplift at high power suggests).
In fact, 20-25% @12w looks like clocks are probably similar to Z6 in that range, then either falls behind or you start getting excessive power consumption from the clock network being overmatched across a bigger core.
If Zen 7 is able to match Zen 6 in freq/power at 12 watts, why won't it be able to match it at increasing frequencies/power?
Don't forget Zen 7 is also a node shrink, so it should be safe to assume any increased power draw from a larger arch will be compensated by the ability of a node shrink to lower power draw at the top of the curve.
Or perhaps a better question would be: do you think Zen 7 won't be hitting its Fmax, or near it, at 22 watts per core?
but INT is laughably small vs basically all other cores in this class.
LNC has a ~15% larger int reg file. That's not such a large difference.
What was insane was when Zen 3 matched SNC IPC while SNC had a ~45% larger int reg file, and even GLC vs Zen 4 was worse (a 25% difference).
Apple and Qualcomm gap everyone though.
In server though Z5 is a monster, funny how that works.
AVX-512 is really carrying their numbers for their performance gain claims (37% for HPC and AI), and prob the increased server Fmax.
AMD only claims a 17% IPC uplift for "enterprise and cloud" workloads.
Outlier, nearly all implementations are 2MB max.
Most implementations are 2MB per core... in '26... for client.
2MB for server 2 years later is not "plenty".
AMD has the best L3 on the planet, that is why they can go leaner on L2 size and focus more on latency and bandwidth/assoc.
The problem is that they are deteriorating the latency of the L3, which is what makes it so good.
Yes but the nodes offset any latency regression,
I doubt they will be able to increase mesh frequency enough to prevent an L3 latency regression.
 
AMD separates the L1i and uop cache (front end/fetch, or labeled separately) from the L1D + L2 (L/S, data cache, or labeled separately).
They release IPC breakdowns of their new cores usually, so one can cross reference AMD's own documents to verify this.
Sure, but the balance of L1i and uOP$ are interconnected.
We used this same reasoning with Zen 5 and Zen 5 did not deliver.
Because FE latency and anemic integer resources.
Perf/watt is going to favor the fatter core at the top of the power range. It's going to favor the skinnier core at the bottom of the power range.
Depends on the design goal, for a high performance core skinnier cores tend to scale a bit more at higher voltages.
It's going to have to be a regression for this to make sense (higher IPC uplift than what the perf uplift at high power suggests).
Of course we are both speculating, final v/f curve will reveal all.
If Zen 7 is able to match Zen 6 in freq/power at 12 watts, why won't it be able to match it at increasing frequencies/power?
Just compare high performance ARM and x86 cores up to this point.
Don't forget Zen 7 is also a node shrink, so it should be safe to assume any increased power draw from a larger arch will be compensated by the ability of a node shrink to lower power draw at the top of the curve.
Or perhaps a better question would be: do you think Zen 7 won't be hitting its Fmax, or near it, at 22 watts per core?
I think Z7 will be at Fmax at or even below 22w, Z6 likely hits Fmax at more like 25w.
My speculation, but it would explain that behaviour.
LNC has a ~15% larger int reg file. That's not such a large difference.
What was insane was when Zen 3 matched SNC IPC while SNC had a ~45% larger int reg file, and even GLC vs Zen 4 was worse (a 25% difference).
Apple and Qualcomm gap everyone though.
Which is why comparing to Intel is a waste of time until UC. Really, the only comparison should be Apple; in server nobody is close.
AVX-512 is really carrying their numbers for their performance gain claims (37% for HPC and AI), and prob the increased server Fmax.
AMD only claims a 17% IPC uplift for "enterprise and cloud" workloads.
At the end of the day, we should be thankful Mr Norrod lets us have anything at all.
Most implementations are 2MB per core... in '26... for client.
2MB for server 2 years later is not "plenty".
It is enough considering the cachemem model AMD uses and what they target.
Big L2 works so well for Intel, right?
The problem is that they are deteriorating the latency of the L3, which is what makes it so good.
They wouldn't make the CCD bigger without keeping latency tight.
I doubt they will be able to increase mesh frequency enough to prevent an L3 latency regression.
Please remember that it is also moar L3, quite a lot more which can offset a small latency regression overall.
Bigger private caches and a larger execution window also shield the core from some latency regressions.
Not a worry at all.
 
A14 taking 2 years from N2 would be them getting back on schedule.
A14 to N2 is like 3 years (H2 '25 volume production for N2, H2 '28 volume production for A14).
N2 is GAAFET, naturally a big departure that took a while, remember how long 28nm>16nm took?
meh Intel did that in 2 years.
Perf/watt is going to favor the fatter core at the top of the power range. It's going to favor the skinnier core at the bottom of the power range.
Not really, cause you gotta burn more power to increase frequency on the fatter one, and it also depends on what the designers are trying to do.
Which is why comparing to Intel is a waste of time until UC. Really, the only comparison should be Apple; in server nobody is close.
Why? lmfao, this year or next is the perfect opportunity, cause everyone and their family is on N2.
I think Z7 will be at Fmax at or even below 22w, Z6 likely hits Fmax at more like 25w.
Have to see; can't say without knowing the targets.
 
Because FE latency and anemic integer resources.
Backend cpu bound is just not a major bottleneck.
[image]
Just compare high performance ARM and x86 cores up to this point.
ARM and x86 cores clock completely differently though.
I mean I agree, if Zen 7 has a large clock speed regression vs Zen 6, then it would make sense how at 22 watts, Zen 7 only being 13-17% better can align with Zen 7 having more than that range of an IPC uplift. But I don't think that's the case.
Depends on the design goal, for a high performance core skinnier cores tend to scale a bit more at higher voltages.
I think Z7 will be at Fmax at or even below 22w, Z6 likely hits Fmax at more like 25w.
My speculation, but it would explain that behaviour.
This wouldn't explain that behavior, if anything that would make the situation worse.
MLID claims at 22 watts per core, Zen 7 has a 13-17%, let's say 15%, perf uplift over Zen 6. Let's arbitrarily pretend that both cores hit 6.7GHz, and that Zen 6 scores 1000 points, while Zen 7 scores 1150.
If Zen 7 is hitting Fmax (6.7GHz) at 22 watts, and scoring 1150 points, and Zen 6 is only hitting 6.5GHz at 22 watts (hitting 6.7GHz at 25 watts), then the IPC uplift is only ~12%.
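The ~12% figure follows from treating performance as IPC × clock, so the IPC ratio is the perf ratio scaled by the inverse clock ratio. A minimal sketch, assuming that simple model (it ignores memory-bound effects) and reusing the arbitrary 1000/1150 points and 6.7/6.5 GHz clocks from the post; `implied_ipc_uplift` is just an illustrative helper:

```python
def implied_ipc_uplift(perf_ratio: float, f_new_ghz: float, f_old_ghz: float) -> float:
    """Return the IPC uplift implied by a performance ratio at given clocks.

    Assumes perf = IPC * frequency, so IPC_new / IPC_old = perf_ratio * f_old / f_new.
    """
    return perf_ratio * f_old_ghz / f_new_ghz - 1.0


# Arbitrary scores from the post: Zen 6 = 1000 points, Zen 7 = 1150 points at 22 W per core.
perf_ratio = 1150 / 1000

# Same clock for both cores: the whole 15% is IPC.
iso_clock = implied_ipc_uplift(perf_ratio, 6.7, 6.7)

# Zen 7 at its 6.7 GHz Fmax vs Zen 6 at only 6.5 GHz for the same 22 W.
clock_gap = implied_ipc_uplift(perf_ratio, 6.7, 6.5)

print(f"iso-clock IPC uplift: {iso_clock:.1%}")
print(f"with the clock gap:   {clock_gap:.1%}")
```

The second case works out to about 11.6%, which is the ~12% above: if Zen 7 reaches a given clock more cheaply, the same 15% perf gap implies less IPC, not more.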
Which is why comparing to Intel is a waste of time until UC,
Well my point with that comment is that either Intel got a lot better at utilizing their INT reg file capacity for IPC with LNC... or AMD got much worse at utilizing theirs.
Stock ARM cores also have a comparable-capacity int reg file, in between AMD and Intel. All 3 get similar perf.
It is enough considering the cachemem model AMD uses and what they target.
Please remember that it is also moar L3, quite a lot more which can offset a small latency regression overall.
Bigger private caches and a larger execution window also shield the core from some latency regressions.
AMD has traditionally only been allowed to have very small L2s because of their low latency L3, but now with the L3 latency going up, they have to have higher L2 capacity to compensate.
2MB of L2 by Zen 7 might be "enough", but it's not "plenty".
tbf Lion Cove is like 14% perf/clock; it's the fabric/L3 that neutralized the gains. Look at RWC vs LNC.
David Huang has it at 13% while ARL gained a nice mem latency increase over MTL.
A14 to N2 is like 3 years (H2 '25 volume production for N2, H2 '28 volume production for A14).
H2 25 volume production for what product?
Looking at first products (2H 2026 for N2, rumored 2H 2028 for A14), it would be 2 years.
Not really, cause you gotta burn more power to increase frequency on the fatter one
You are also going to have to do that at lower power levels too though. That's why Vmin shifts up.
[image]
and also depends on what designers are trying to do.
Oh c'mon, name one tock core in even recent-ish history where perf/watt didn't increase anywhere. Even RKL vs SKL had a perf/watt increase at the very top of the curve.
[image]
If it doesn't, then it's just an extremely poorly designed core.
 
David Huang has it at 13% while ARL gained a nice mem latency increase over MTL.
pretty close
H2 25 volume production for what product?
Looking at first products (2H 2026 for N2, rumored 2H 2028 for A14), it would be 2 years.
TSMC's official roadmap looks like A14 is 2028 production, not H2, but even then it's more than a 2-year gap.
[image]
[image]
You are also going to have to do that at lower power levels too though. That's why Vmin shifts up.
[image]

Oh c'mon, name one tock core in even recent-ish history where perf/watt didn't increase anywhere. Even RKL vs SKL had a perf/watt increase at the very top of the curve.
[image]
If it doesn't, then it's just an extremely poorly designed core.
But it can happen, like you said, if someone screws up the design.
 
meh Intel did that in 2 years.
What volume though?
If Zen 7 is hitting Fmax (6.7GHz) at 22 watts, and scoring 1150 points, and Zen 6 is only hitting 6.5GHz at 22 watts (hitting 6.7GHz at 25 watts), then the IPC uplift is only ~12%.
Or for example both are ~6GHz at 12W and Z7 is like 6.5GHz at 22W vs like 6.8GHz for Z6.
That would explain it.
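Plugging the hypothetical clocks from this post into the same perf = IPC × clock relation shows what the scenario would imply, using the 20-25% (12 W) and 13-17% (22 W) perf figures quoted earlier in the thread. A sketch under those assumptions only; the clocks are the poster's guesses, not known values, and `implied_ipc_uplift` is an illustrative helper:

```python
def implied_ipc_uplift(perf_ratio: float, f_z7_ghz: float, f_z6_ghz: float) -> float:
    """IPC uplift implied by a perf ratio when Zen 7 runs at f_z7 and Zen 6 at f_z6.

    Assumes perf = IPC * frequency.
    """
    return perf_ratio * f_z6_ghz / f_z7_ghz - 1.0


# (perf ratio range, Zen 7 clock, Zen 6 clock); clocks are the hypothetical ones from the post.
SCENARIOS = {
    "12 W, both ~6.0 GHz": ((1.20, 1.25), 6.0, 6.0),
    "22 W, Z7 6.5 GHz vs Z6 6.8 GHz": ((1.13, 1.17), 6.5, 6.8),
}

results = {
    label: (implied_ipc_uplift(lo, f7, f6), implied_ipc_uplift(hi, f7, f6))
    for label, ((lo, hi), f7, f6) in SCENARIOS.items()
}

for label, (ipc_lo, ipc_hi) in results.items():
    print(f"{label}: implied IPC uplift {ipc_lo:.0%} to {ipc_hi:.0%}")
```

If those clocks were right, both power points would line up with roughly 18-25% IPC; whether Zen 7 really clocks lower at the top of the curve is what the next reply disputes.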
AMD has traditionally only been allowed to have very small L2s because of their low latency L3, but now with the L3 latency going up, they have to have higher L2 capacity to compensate.
There was nothing stopping AMD from putting 2MB in, even back in the Z3 days, other than a thing called area.
L3 latency going up is speculation, compared to what exactly?
Logically it will probably regress by a handful of cycles, sure, but you are getting 50-100% more L3, which should be a net win overall.
And don't forget the dense Z7 has 7MB/core 3D stacked which should be lower latency than 252MB monolithic.
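The "net win" argument is the standard average-memory-access-time trade-off: AMAT = L3 hit time + L3 miss rate × miss penalty. A sketch with purely hypothetical numbers (none of these latencies or miss rates come from AMD or from the thread), only to show how a slower but larger L3 can still come out ahead:

```python
def l3_amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average cycles per L3 lookup: hit time plus miss rate times the trip to IOD/DRAM."""
    return hit_cycles + miss_rate * miss_penalty_cycles


MISS_PENALTY = 400  # hypothetical cycles to go out through the IOD to main memory

# Hypothetical baseline L3: 50-cycle hit, 20% of lookups miss.
baseline = l3_amat(hit_cycles=50, miss_rate=0.20, miss_penalty_cycles=MISS_PENALTY)

# Hypothetical bigger L3: 5 cycles slower, extra capacity cuts misses to 16%.
bigger = l3_amat(hit_cycles=55, miss_rate=0.16, miss_penalty_cycles=MISS_PENALTY)

# Break-even: the miss rate the bigger L3 must reach to pay for its extra 5 cycles.
EXTRA_CYCLES = 5
break_even_miss_rate = 0.20 - EXTRA_CYCLES / MISS_PENALTY

print(f"baseline {baseline:.0f} cycles, bigger L3 {bigger:.0f} cycles")
print(f"break-even miss rate: {break_even_miss_rate:.4f}")
```

With these made-up inputs the larger L3 wins as long as it removes more than about 1 in 16 of the baseline misses; real workloads with working sets that already fit in L3 would see only the extra cycles.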
2MB of L2 by Zen 7 might be "enough", but it's not "plenty".
It is perfectly fine for a 16c cluster.
ARM designs give at most a few cores that much L2 or more, outside of server parts, which have far less L3/core.
 
Or for example both are ~6GHz at 12W and Z7 is like 6.5GHz at 22W vs like 6.8GHz for Z6.
That would explain it.
Zen 7 being a wider arch explains why maybe it is going to consume more power than Zen 6 to hit the same frequency at 22 watts... but in that case, what's the justification for how the wider arch isn't consuming more power at a lower frequency?

When's the last time you saw a next-gen core that took less power to hit the same frequency at low power, but then took more power than its predecessor at high power, and then hit the same Fmax?

Frankly it's just a lot simpler to accept the explanation that Zen 7 is going to be an ~15% IPC increase than it is to accept the explanation that somehow AMD is doing something completely unprecedented in CPU core power characteristics lol.
There was nothing stopping AMD from putting 2MB in, even back in the Z3 days, other than a thing called area.
Which they got away with because of their very fast L3.
L3 latency going up is speculation, compared to what exactly?
Compared to their standard 8 or even now 12 core CCX.
Logically it will probably regress by a handful of cycles, sure, but you are getting 50-100% more L3, which should be a net win overall.
I never said it won't be a net win. What I did say is that AMD's hand was forced into increasing their L2 by their design change. And even then, AMD is still at best comparable to other designs in L2 capacity.
 
Again, Intel does not exist in AMD roadmaps.
They do not care about Intel. It's not a thing.
LOL. Yea, you are seriously incorrect. Remind me again, who is AMD's biggest competitor?
He clearly said 15-25% but you do you.
And really they're gonna be targeting 20% bare minimum, OG Z5 on N3B was kinda that but they had to roll back and neuter a bunch of core resources.
You correctly point out that I mis-quoted him. My OPINION is that we will likely see about 15% .... if that.
N3 from N5 took 3 years, N3 to N2 took 3 years.
A14 taking 2 years from N2 would be them getting back on schedule.
I think that die shrinks are getting exponentially more difficult to achieve (and more costly). I think "on schedule" is going to get redefined and that 3 years is likely the new norm. In 10 more years, it may well be 5 years.
Frankly it's just a lot simpler to accept the explanation that Zen 7 is going to be an ~15% IPC increase than it is to accept the explanation that somehow AMD is doing something completely unprecedented in CPU core power characteristics lol.
Agree.

The fact is that Z5->Z6 is a BIG change with a fairly large transistor density increase (60%-80%). That's a BIG deal. Moving from N2 to A14 is more like 15%. Don't ever expect BIG gains without BIG lithography improvements (or packaging improvements).

Z7 having more cores per CCD will eat up all that density and then some leaving any increased transistor count to simply make a bigger die and thus lower margins for AMD.
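
The transistor-budget argument is mostly a ratio of core-count growth to density gain. A back-of-the-envelope sketch, assuming the ~15% N2-to-A14 density figure from the post, a hypothetical move from 12 to 16 cores per CCD (the 12-core CCX and 16c cluster mentioned in the thread), and that CCD area scales linearly with core count. It ignores L3, uncore and SRAM/analog scaling, so treat it as an illustration of the ratio, not a die-size estimate:

```python
def relative_ccd_area(core_ratio: float, density_gain: float, per_core_growth: float = 1.0) -> float:
    """CCD area relative to the previous generation.

    Assumes area scales linearly with core count and per-core transistor count,
    and inversely with a single uniform density factor.
    """
    return core_ratio * per_core_growth / density_gain


CORES_RATIO = 16 / 12  # hypothetical 12 -> 16 cores per CCD
DENSITY_GAIN = 1.15    # ~15% N2 -> A14 density gain quoted in the post

# Extra cores only, no per-core growth.
same_core = relative_ccd_area(CORES_RATIO, DENSITY_GAIN)

# Hypothetical 20% more transistors per core on top of the extra cores.
fatter_core = relative_ccd_area(CORES_RATIO, DENSITY_GAIN, per_core_growth=1.20)

print(f"same per-core budget: {same_core:.2f}x area")
print(f"20% fatter cores:     {fatter_core:.2f}x area")
```

Under these assumptions the extra cores alone already outgrow the density gain (about 1.16x area), and fatter cores push it to roughly 1.39x, which is the margin concern in the post; moving L3 to stacked SRAM, as suggested further down, would change the picture.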

I can easily see both AMD and Intel leaving client on older nodes as time goes on and the newest nodes become cost prohibitive.

The biggest changes for AMD in the future will come from BSPDN and High NA .... and more the latter for density, where BSPDN will likely affect power usage more than density.

Anyway, that's just my crystal ball reading. YMMV.
 
Remind me again, who is AMD's biggest competitor?
Apple.
My OPINION is that we will likely see about 15% .... if that.
It's >20% at 20% budgeted Cdyn bump.
The fact is that Z5->Z6 is a BIG change with a fairly large transistor density increase (60%-80%). That's a BIG deal. Moving from N2 to A14 is more like 15%. Don't ever expect BIG gains without BIG lithography improvements (or packaging improvements).
Zen6 is a relatively incremental core redesign.
 
Zen6 is a relatively incremental core redesign.
Core redesigns are bound by transistor count increases..... OR by increased die sizes / decreased profits.

The transistor budget is simply not there for Z7 to improve much ESPECIALLY if they are adding cores to the CCD. I expect the increased L2 is about all they have in them.

DDR8000 will likely also be all the desktop gets with Z7 so no increase over Z6 there.

I applaud your enthusiasm, but it isn't founded on the data present.
 
Core redesigns are bound by transistor count increases.....
This hasn't been true for a long while.
You know, it's the multiprocessor era, where raw xtor counts haven't been the limiter of raw 1t perf.
The transistor budget is simply not there for Z7 to improve much ESPECIALLY if they are adding cores to the CCD. I expect the increased L2 is about all they have in them.
Man, you really are not that smart.
 
Well, if they make all L3 vertical then that would create room for new logic transistors, maybe they can do that to L2 as well.
That is definitely a possibility. Still, it isn't free. It does get the L3 off of the expensive process node though. Moving it to 3D cache does add a good number of process steps (i.e. more expensive). Not sure if this is a wash or a win in cost though.
This hasn't been true for a long while.
You know, it's the multiprocessor era, where raw xtor counts haven't been the limiter of raw 1t perf.
Agree to disagree.
Man, you really are not that smart.
Yep. I knew it was a mistake to reply to your post. Won't be making that mistake again soon.
 