
Discussion Intel’s Unified Core: Hammer Lake

Zen 5 doesn't have four identical FPU pipes, so Unified Core's FPU doesn't need to be 4x Skymont's, since Skymont does have four identical FPU pipes. Lion Cove is similar to Zen 5 in this regard.

Zen 5:
-FMA
-FMA
-FADD
-FADD

Lion Cove:
-FMA
-FMA
-FADD
-FADD

Skymont:
-FMA
-FMA
-FMA
-FMA

So if you 4x Skymont, it would be way fatter than Zen 5's AVX implementation. Doubling the width of all four units to 256-bit would already make it significantly beefier than Lion Cove's FPUs. Even just doubling it might be enough for all applications, not just PC.
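To put rough numbers on that, here's a quick back-of-the-envelope sketch in Python. The port lists are the ones above; the per-pipe widths (512-bit Zen 5, 256-bit Lion Cove, 128-bit Skymont) and counting an FMA as 2 FLOPs per lane are my own simplifying assumptions, so treat the output as illustrative only:

```python
# Rough peak-FP32-FLOPs-per-cycle sketch based on the port lists above.
# Per-pipe widths are assumptions for illustration: Zen 5 with the full
# 512-bit datapath, Lion Cove at 256-bit, Skymont at 128-bit per pipe.
# An FMA is counted as 2 FLOPs per lane, an FADD as 1.

def flops_per_cycle(pipes, width_bits):
    lanes = width_bits // 32          # FP32 lanes per pipe
    return sum(lanes * (2 if p == "FMA" else 1) for p in pipes)

configs = {
    "Zen 5 (512b)":     (["FMA", "FMA", "FADD", "FADD"], 512),
    "Lion Cove (256b)": (["FMA", "FMA", "FADD", "FADD"], 256),
    "Skymont (128b)":   (["FMA", "FMA", "FMA", "FMA"], 128),
    "2x-width Skymont": (["FMA", "FMA", "FMA", "FMA"], 256),
    "4x-width Skymont": (["FMA", "FMA", "FMA", "FMA"], 512),
}

for name, (pipes, width) in configs.items():
    print(f"{name:20s} {flops_per_cycle(pipes, width):3d} FP32 FLOPs/cycle")
```

Under those assumptions a 4x-width Skymont (~128 FLOPs/cycle) comes out fatter than Zen 5 (~96), and even a plain 2x widening (~64) already passes Lion Cove (~48).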
Honestly it looks like the P-cores have a bunch of extra space to grow because of how big the E-core clusters are getting lol.
There's also the problem of P-cores having to be used in DC, but I still think area is the least important factor in PPA, and Intel especially seems to give very little weight to area, adding massive core-private caches and AMX extensions per core.
I think that waste may come from the same team that caused the bloat in the P-cores. A different one is in charge now.

Caches are misleading in terms of transistor count and area. They are not only super-dense and error-resistant thanks to their extreme redundancy, they also lower the overall power of the SoC, so adding cache is often worth it versus adding logic, while the reverse isn't always true. So this isn't necessarily an accurate analysis either. There's a saying: "before you expand your uarch, consider whether it'd be worth adding the equivalent amount of cache instead." When they used to break down the area, transistor count and power of the large server cores at Hot Chips years ago, L3 caches would take up 60% of the space but use 10% of the whole power or less. And that's ignoring the elephant in the room: a cache hit means one less access to DRAM, meaning overall you LOWER power. I don't know where people got the idea caches are power or area inefficient. It is a "dumb" way of adding performance.
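As a toy illustration of the power point, here's a minimal average-access-energy model. The per-access energies and hit rates are made-up round numbers, not measurements; the only point is that because a DRAM access costs an order of magnitude more than an L3 hit, even a modest hit-rate gain from extra cache cuts total memory energy:

```python
# Toy model: average energy per memory access with and without extra L3.
# Energy figures and hit rates below are illustrative assumptions only.

L3_ENERGY_PJ   = 50      # assumed energy per L3 access
DRAM_ENERGY_PJ = 1000    # assumed energy per DRAM access (~20x the L3 hit)

def avg_energy(l3_hit_rate):
    # Every access probes the L3; misses additionally go out to DRAM.
    return L3_ENERGY_PJ + (1.0 - l3_hit_rate) * DRAM_ENERGY_PJ

small_cache = avg_energy(0.70)   # assumed hit rate with a smaller L3
big_cache   = avg_energy(0.85)   # assumed hit rate after spending area on L3

print(f"smaller L3: {small_cache:.0f} pJ per access")
print(f"larger  L3: {big_cache:.0f} pJ per access")
print(f"saving: {(1 - big_cache / small_cache) * 100:.0f}%")
```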

Also, the decision makers who put AMX in were working under a different Intel, and they might have wanted to push it into all segments to "grab them all", whereas a 4-year delay makes it look like a waste. You must also consider, when comparing chips like Clearwater Forest, that there's a high chance the project fell short of its original expectations, which is typical when delays happen. Delays happen not to "make it better" but because they mis-fired and needed extra time to fix things.

Look at projects that went really well - they were ahead of schedule and expectations. Core 2, for example, was delivered weeks ahead of its original schedule. The impact of a delay is not something we can ever put on a sheet of paper and compare; we can't "ISO" such things. Nvidia's NV30 got delayed and also disappointed. Intel's Knights Landing was 9 months delayed, and compared to plan it used 10% more power and performed 10% worse. So a hypothetical CWF last year could have been 15% better perf/watt while arriving more than 6 months earlier. The impact of a delay hitting not only the schedule but also the performance of the product is profound - both were in new categories where it's a make-or-break difference. Xeon Phi might have had a longer lifespan and a far greater impact if it had been 9 months earlier with higher perf and lower power. The same with Clearwater Forest arriving 3-5 months ago and performing 10-15% better. It would also have been a big impact.
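For the Knights Landing case specifically, the two misses compound. A quick sketch of the arithmetic (the 10%/10% figures are the ones quoted above; carrying the same kind of estimate over to CWF is my own extrapolation, not a measured number):

```python
# Perf/watt impact of a slipped target: the quoted Knights Landing numbers
# (10% more power, 10% less performance than planned) compound like this.

planned_perf, planned_power = 1.00, 1.00
actual_perf  = planned_perf  * 0.90   # 10% less performance than planned
actual_power = planned_power * 1.10   # 10% more power than planned

planned_ppw = planned_perf / planned_power
actual_ppw  = actual_perf  / actual_power

deficit = 1 - actual_ppw / planned_ppw
print(f"actual perf/watt {actual_ppw:.3f} vs planned {planned_ppw:.3f}")
print(f"i.e. about {deficit * 100:.0f}% worse than the original target")
# Conversely, hitting the plan would have been ~1/0.818 - 1, i.e. roughly
# 20% better than what shipped; the ~15% CWF figure above is the same kind
# of back-of-the-envelope estimate, not a measured number.
```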
 
I don't know where people got the idea caches are power or area inefficient. It is a "dumb" way of adding performance.
I'm not claiming that caches are power inefficient, but they definitely can be less area efficient than adding more cores. That's why Zen5c halves the L3 cache.
 
I'm not claiming that caches are power inefficient, but they definitely can be less area efficient than adding more cores. That's why Zen5c halves the L3 cache.
I'm responding to @Geddagod's claims that having large private caches is somehow a bad thing. That's why we DON'T include caches in our calculation: they're the easiest thing to add (except L1). It's as close to copy-paste as you can get.

If you're going to add anything, then first you have to consider whether it'll be better than adding cache. But the area-efficiency argument isn't straightforward either, because cache is extremely redundant. It's a sea of redundancy; that's why the cache portion of die shots is so smooth.
 
I'm responding to @Geddagod's claims that having large private caches is somehow a bad thing.
It eats up a bunch of area, and Intel needs it to compensate for a bad uncore. This isn't an added bonus.
That's why we DON'T include caches in our calculation: they're the easiest thing to add (except L1). It's as close to copy-paste as you can get.
Ur right, lets not include caches. It doesn't take up space on the die, those giant cache blobs are actually just figments of our imaginations.
But the area-efficiency argument isn't straightforward either, because cache is extremely redundant. It's a sea of redundancy; that's why the cache portion of die shots is so smooth.
No, it's pretty straightforward. Even if it doesn't hurt yields as much, that space is still being given up instead of going to stuff like wider cores or just outright more cores. And it only gets worse as logic scales better than SRAM on newer nodes.
There's a reason entire cache layers (Infinity Cache on MI300, L3 on Clearwater Forest) are being moved off the compute tile: cache eats up a bunch of space on the compute tile that's needed for other stuff.
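A toy sketch of that scaling argument: if logic density improves faster than SRAM density each node, the cache fraction of a compute tile keeps growing even though the design doesn't change. The shrink factors below are placeholder assumptions, not foundry numbers:

```python
# Toy scaling model: the same core design ported across nodes, where logic
# shrinks faster than SRAM. Shrink factors are illustrative assumptions.

logic_area, sram_area = 60.0, 40.0   # arbitrary starting split, in mm^2
LOGIC_SHRINK = 1 / 1.5               # assume ~1.5x logic density gain per node
SRAM_SHRINK  = 1 / 1.1               # assume ~1.1x SRAM density gain per node

for node in range(4):
    total = logic_area + sram_area
    print(f"node {node}: cache is {sram_area / total * 100:.0f}% of the tile "
          f"({total:.1f} mm^2 total)")
    logic_area *= LOGIC_SHRINK
    sram_area  *= SRAM_SHRINK
```

Under those assumptions the cache share of the tile climbs from 40% to over 60% in three node hops, which is exactly the incentive to move the L3 off the compute tile: the SRAM stops competing for the area that actually benefits from the newest node.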
So this isn't necessarily an accurate analysis either. There's a saying: "before you expand your uarch, consider whether it'd be worth adding the equivalent amount of cache instead."
The leading-edge ARM core designs have very area-hungry cores in terms of logic, with smaller shared caches. The leading-edge x86 design has "skinny" cores with a very fast L3.
In Intel's case, maybe LNC needs those extremely large private caches, but that's just a result of the L3 being bad and memory latency being even worse relative to the competition.
I'm not claiming that caches are power inefficient,
Yeah, no one has ever claimed that lol, idk where that strawman is coming from.
Delays happen not to "make it better" but because they mis-fired and needed extra time to fix things.
Not when things get redefined. Like GNR.
So a hypothetical CWF last year could have been 15% better perf/watt while arriving more than 6 months earlier.
How?
The same with Clearwater Forest arriving 3-5 months ago and performing 10-15% better. It would also have been a big impact.
Sure lol? Idk what your point with this is.
 
Not when things get redefined. Like GNR.
redefinition adds delay
No, it's pretty straightforward. Even if it doesn't hurt yields as much, that space is still being given up instead of going to stuff like wider cores or just outright more cores. And it only gets worse as logic scales better than SRAM on newer nodes.
There's a reason entire cache layers (Infinity Cache on MI300, L3 on Clearwater Forest) are being moved off the compute tile: cache eats up a bunch of space on the compute tile that's needed for other stuff.
Also cost, because SRAM density (cache per mm²) is not improving much compared to logic on newer nodes.
 