
Discussion Intel’s Unified Core: Hammer Lake

Copper Shark. I guess I am an insider now wohoo.
Complete with a lack of bold claims regarding how great or terrible it is based on said inside information, hence all the more believable 😉

So will Intel be 'jumping the shark' with CSK? Or is it an amusing name for a return to form? Will be fun to see which one as it plays out.
 
Agreed. No idea why you're bringing up 5W though as lower power, including what I was referencing, is less than 2W per core.
Given how CLF perf projections look, I have my doubts on the E-cores being competitive with AMD in perf/watt even vs Turin Dense.
Though maybe the terrible L3 bandwidth in that product just fcks the e-cores over super hard. Idk.
 
Given how CLF perf projections look, I have my doubts on the E-cores being competitive with AMD in perf/watt even vs Turin Dense.
Though maybe the terrible L3 bandwidth in that product just fcks the e-cores over super hard. Idk.
Well, you said it yourself: Intel keeps botching the L3 design. Look at the E-core in PTL; it's at the level of ARL-S in perf/clock.
 
So essentially with UC Intel’s creating a core that’s area efficient, low power when it needs to be, clocks high and has high 1T.

That looks to be an achievable goal
 
What perf at 2W though? Any ballpark, or what are the targets?
Higher than Zen8 😛 Sorry for the silly response, but those targets will likely bear little resemblance to actual performance when eventually released. It's the same with AMD and elsewhere - early architecture modeling rarely pans out to the promised extent. It's certainly amusing when competition forecasts end up being more accurate than internal projections.

Another interesting historical+forecast comparison that came up during lunch today was perf/process-normalized core area trends. Will Intel continue improving performance with roughly flat gate counts while AMD shows similar performance gains with markedly increased area?
 
Higher than Zen8 😛 Sorry for the silly response, but those targets will likely bear little resemblance to actual performance when eventually released. It's the same with AMD and elsewhere - early architecture modeling rarely pans out to the promised extent. It's certainly amusing when competition forecasts end up being more accurate than internal projections.
Well yeah, but it still would have been nice to know what SIR2K17 score they are targeting at 2W per core. Is it more than 9 or less? 🤔
Another interesting historical+forecast comparison that came up during lunch today was perf/process-normalized core area trends. Will Intel continue improving performance with roughly flat gate counts while AMD shows similar performance gains with markedly increased area?
Well, that remains to be seen. We don't even know the gate count for SKT, LNC, GLC, or any other modern core.
 
This job listing "news" has led to so many misleading pieces and takes esp. around "Hybrid era" or how 'P' and 'E' combos are dead, and some even managed to put "Unified Core" as a Royal Core tech.

 
This job listing "news" has led to so many misleading pieces and takes esp. around "Hybrid era" or how 'P' and 'E' combos are dead, and some even managed to put "Unified Core" as a Royal Core tech.

Kind of expected. Most modern tech journalists don't know what they are talking about.
 
This job listing "news" has led to so many misleading pieces and takes esp. around "Hybrid era" or how 'P' and 'E' combos are dead, and some even managed to put "Unified Core" as a Royal Core tech.


Well, all of this Unified Core information is rumor, and Intel doesn't dare say anything about the future, so calling it 'misleading' is more or less unfair. But I agree that 'ditching hybrid cores' might be incorrect, though 'ditching hybrid architecture' is assured to be real.
 
Well, all of this Unified Core information is rumor, and Intel doesn't dare say anything about the future, so calling it 'misleading' is more or less unfair. But I agree that 'ditching hybrid cores' might be incorrect, though 'ditching hybrid architecture' is assured to be real.
They are not ditching hybrid cores lmao. HX 370 is a hybrid design with both cores derived from the same uArch.
 
So since Unified Core will be a derivative of their E core, it makes me wonder how much bigger it would be if they were to modify it to be at feature parity with their P core lineup. For example, Skymont and Darkmont's FPUs are 4x128-bit. To reach parity with their P cores, they'll have to double the width of the FPUs and associated pathways.

For reference, for Zen 5 you can see how much area going from a double-pumped FPU to a true full-width 4x512-bit FPU required:
1771973914991.png
Currently, E cores are ~1/3 the area of their P cores. I wonder if Unified Core would be like 1/2 the area...
 
Well, you said it yourself: Intel keeps botching the L3 design. Look at the E-core in PTL; it's at the level of ARL-S in perf/clock.
The question is how hard the fabric screws the E-cores here.
The 17% IPC gain from Sierra Glen to Darkmont is far behind the 34% IPC gain Huang saw in ARL-H E-cores vs MTL-H iso clock.

However, the perf/watt situation looks really bad here, given that a 288-core Clearwater Forest part matches a 192-core Turin Dense part at ~iso power. Assuming an optimistic 1.3x SMT gain iso power (SMT gains iso power are pretty bad for AMD at low frequencies/power), you would get each AMD dense core being ~15% higher perf/watt than a Darkmont core.
1771989843192.png
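
A minimal sketch of the arithmetic behind the ~15% figure above, assuming (as the post does) that the 288-core and 192-core parts deliver the same throughput at the same package power, and that SMT contributes a flat 1.3x on the AMD side. The function name and the normalization to 1.0 are illustrative choices, not anything published by Intel or AMD.

```python
def per_core_perf_per_watt_ratio(intel_cores, amd_cores, smt_gain):
    """Return AMD per-core perf/watt divided by Intel per-core perf/watt.

    Assumes both parts reach equal total throughput at equal package power,
    so total perf and total power can both be normalized to 1.0.
    """
    # Intel side: one thread per core, no SMT to strip out.
    intel_per_core = 1.0 / intel_cores
    # AMD side: divide out the assumed SMT uplift to compare one thread per core.
    amd_per_core = 1.0 / (amd_cores * smt_gain)
    return amd_per_core / intel_per_core


ratio = per_core_perf_per_watt_ratio(intel_cores=288, amd_cores=192, smt_gain=1.3)
print(f"AMD per-core lead: {(ratio - 1) * 100:.1f}%")
```

Setting `smt_gain` to 1.0 gives the raw 1.5x per-core ratio, so the conclusion depends heavily on how much of AMD's throughput is credited to SMT.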

Maybe if we attribute all of the "missing" IPC gain that Darkmont/Skymont sees in client but not in server, and pretend it costs no power whatsoever, we would see a Darkmont core be ~ Zen 5 dense on N3E in perf/watt. However, that starts to look a lot less impressive again when we note that Turin Dense has less advanced packaging and also blows a bunch of area on full-width AVX-512.
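
A rough check of the "missing IPC" argument, using only numbers already quoted in the thread: the 34% client gain, the 17% server gain, and the 288 vs 192 core comparison with an assumed 1.3x SMT uplift. Treating the gap as free performance is the post's own hypothetical, and the helper names are made up for illustration.

```python
def missing_ipc_factor(client_gain, server_gain):
    """Ratio of the client IPC uplift to the server IPC uplift."""
    return (1.0 + client_gain) / (1.0 + server_gain)


def amd_per_core_lead(intel_cores, amd_cores, smt_gain):
    """AMD per-core perf/watt lead at equal throughput and power, SMT removed."""
    return intel_cores / (amd_cores * smt_gain)


factor = missing_ipc_factor(client_gain=0.34, server_gain=0.17)
lead = amd_per_core_lead(intel_cores=288, amd_cores=192, smt_gain=1.3)
# Hypothetical: hand Darkmont the whole missing IPC at zero power cost.
residual = lead / factor

print(f"missing IPC factor: {factor:.3f}")
print(f"AMD per-core lead:  {lead:.3f}")
print(f"residual AMD lead:  {residual:.3f}")
```

Both factors land near 1.15, so the residual comes out close to 1.0, which is the rough parity the post describes.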

Quite frankly, I don't think the E-cores are great in power as a whole. Even in Huang's data, where the shared L2 power is also being omitted from the power graphs, the E-cores don't seem amazing vs AMD's classic cores in power either.

Another interesting historical+forecast comparison that came up during lunch today was perf/process-normalized core area trends. Will Intel continue improving performance with roughly flat gate counts while AMD shows similar performance gains with markedly increased area?
Looks at SNC and GLC
Intel has had plenty of experience improving performance while blowing up area. LNC doesn't look like it dramatically increased core buffers/capacity vs SNC and GLC, but it also didn't see nearly as strong an IPC uplift as those cores. And it blew up the core private caches instead.

For reference, for Zen 5 you can see how much area going from a double-pumped FPU to a true full-width 4x512-bit FPU required:
Yup, and Zen 5 mobile, which still has AVX-512 mind you, has half the FPU area of desktop/server Zen 5.

Currently, E cores are ~1/3 the area of their P cores. I wonder if Unified Core would be like 1/2 the area...
I think you overestimate how small unified core will be. Current E-cores are already like ~40% the area of the P-cores once you take into account the fact that the P-cores spend a bunch of area on huge core private caches.
And as ....
Intel's E cores also lack the "slack" space and extra transistors that their P cores require to achieve their target Fmax. That has a marked effect on the area of the E cores.
... states, you also save a bunch of area targeting lower Fmax. There are already tons of graphs showing how poorly area scales when you try to hit extremely high clocks.
1771989854828.png
To reach parity with their P cores, they'll have to double the width of the FPUs and associated pathways.
I think this is actually going to play a big part in either Arctic Wolf not improving IPC a bunch, or having to increase area (relative to the p cores) a good bit tbh. We will see how this ages in a couple of months ig lol.

There's also an interesting case study out there with the C1 ultra vs C1 premium. Both archs are very similar except for the FPU, where the C1 premium FPU is cut down significantly.
1771990356694.png
And the C1 ultra can also clock ~1.2x faster than the C1 premium as well. However, the C1 ultra is significantly larger than the C1 premium. One can prob look at the mediatek 9500 die shot and get an exact percentage, but it's large.
 
The question is how hard the fabric screws the E-cores here.
The 17% IPC gain from Sierra Glen to Darkmont is far behind the 34% IPC gain Huang saw in ARL-H E-cores vs MTL-H iso clock.

However, the perf/watt situation looks really bad here, given that a 288-core Clearwater Forest part matches a 192-core Turin Dense part at ~iso power. Assuming an optimistic 1.3x SMT gain iso power (SMT gains iso power are pretty bad for AMD at low frequencies/power), you would get each AMD dense core being ~15% higher perf/watt than a Darkmont core.
View attachment 138788
Just spam the cores for MT; that's what the area efficiency is for.
Maybe if we attribute all of the "missing" IPC gain that Darkmont/Skymont sees in client but not in server, and pretend it costs no power whatsoever, we would see a Darkmont core be ~ Zen 5 dense on N3E in perf/watt. However, that starts to look a lot less impressive again when we note that Turin Dense has less advanced packaging and also blows a bunch of area on full-width AVX-512.
Advanced doesn't mean anything if you botch the design. Look at PVC: it had the most advanced packaging of its time but fell flat. The core team relies on the fabric team, but they fall flat on their face each time in DC; it's only with PTL that they didn't.
Quite frankly, I don't think the E-cores are great in power as a whole. Even in Huang's data, where the shared L2 power is also being omitted from the power graphs, the E-cores don't seem amazing vs AMD's classic cores in power either.
Their design target is perf/mm2. Even then, they are close to or better than the P-core parts by a decent bit, not that it is something to write home about.
Looks at SNC and GLC
Intel has had plenty of experience improving performance while blowing up area. LNC doesn't look like it dramatically increased core buffers/capacity vs SNC and GLC, but it also didn't see nearly as strong an IPC uplift as those cores. And it blew up the core private caches instead.

Yup, and Zen 5 mobile, which still has AVX-512 mind you, has half the FPU area of desktop/server Zen 5.
I don't remember SIR2K17 relying so heavily on the FPU.
I think you overestimate how small unified core will be. Current E-cores are already like ~40% the area of the P-cores once you take into account the fact that the P-cores spend a bunch of area on huge core private caches.
And as ....
Well, it's 1.12 mm2 (core + L1) for SKT vs 2.51 mm2 for Lion Cove (core + L0 + L1), so yeah the
... states, you also save a bunch of area targeting lower Fmax. There are already tons of graphs showing how poorly area scales when you try to hit extremely high clocks.
View attachment 138789

I think this is actually going to play a big part in either Arctic Wolf not improving IPC a bunch, or having to increase area (relative to the p cores) a good bit tbh. We will see how this ages in a couple of months ig lol.
Well, we'll have to see with Arctic Wolf, since it has 256b pipes, up from 128b.
There's also an interesting case study out there with the C1 ultra vs C1 premium. Both archs are very similar except for the FPU, where the C1 premium FPU is cut down significantly.
View attachment 138790
And the C1 ultra can also clock ~1.2x faster than the C1 premium as well. However, the C1 ultra is significantly larger than the C1 premium. One can prob look at the mediatek 9500 die shot and get an exact percentage, but it's large.
Someone needs to measure it.
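
As a quick sanity check on the ~40% claim, here is the ratio of the two area figures given in this post. It assumes the 1.12 mm2 and 2.51 mm2 measurements are comparable, which is not guaranteed since the two figures cover slightly different cache levels and neither includes L2. The function and variable names are hypothetical.

```python
def area_ratio(small_mm2, large_mm2):
    """Fraction of the larger core's area taken up by the smaller core."""
    if large_mm2 <= 0:
        raise ValueError("reference area must be positive")
    return small_mm2 / large_mm2


skymont_mm2 = 1.12    # core + L1, as quoted in the post
lion_cove_mm2 = 2.51  # core + L0 + L1, as quoted in the post

ratio = area_ratio(skymont_mm2, lion_cove_mm2)
print(f"E-core area as a share of P-core area: {ratio:.1%}")
```

With those inputs the share comes out at roughly 45%, a bit above the ~40% estimate and well above the 1/3 figure from earlier in the thread.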
 
Advanced doesn't mean anything if you botch the design
Yes, but it does mean essentially free power advantages compared to iFOP that AMD server currently uses.
Their design target is perf/mm2. Even then, they are close to or better than the P-core parts by a decent bit, not that it is something to write home about.
Maybe, but given the E-cores currently aren't great at power, it's not a given that unified core would be either. In fact, power seems to be the biggest weakness of both the E and P-core team compared to the competition. The E-core teams are pretty good at area, and the P-core team is hanging on against the x86 competition in ST perf at least...

Not saying @adroc_thurston is right about unified core power, god knows how much he has gotten wrong before lol, but it's not hard to believe...
I don't remember SIR2K17 relying so heavily on the FPU.
The last line you quoted was in response to how area increases when you have to widen the FPU (which unified core would have to do vs current E-cores), not specint2017 perf or anything.

My point with that was if simply reducing how effective your avx-512 implementation is saves you half of your FPU area, how much area would the E-cores have to add in order to get a P-core class AVX-512 capable FPU for unified core?
Even with Arctic Wolf the E-core team can prob get away with "double pumped" AVX-512 like Zen 4, but with unified core I'm expecting that they have to go a full 512 bit data path with their FPU.
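
To make the "double pumped" idea concrete, here is a toy model of how many passes one 512-bit operation needs through a narrower datapath. It is a deliberate simplification (real cores split and schedule ops in more nuanced ways), the function name is hypothetical, and the widths are simply the ones mentioned in the thread.

```python
import math


def passes_per_op(vector_bits, datapath_bits):
    """Passes needed to push one vector op through a datapath of the given width."""
    if datapath_bits <= 0:
        raise ValueError("datapath width must be positive")
    return math.ceil(vector_bits / datapath_bits)


# 128b: current E-core pipes; 256b: double-pumped style; 512b: full width.
for datapath_bits in (128, 256, 512):
    passes = passes_per_op(512, datapath_bits)
    print(f"{datapath_bits}-bit datapath: {passes} pass(es) per 512-bit op")
```

In this model a 256-bit datapath needs two passes per 512-bit op, which is the trade the post describes: AVX-512 support at lower area cost than a full 512-bit data path.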
Someone needs to measure it.
I would lol but I unfortunately don't have my mouse rn :/ Kurnal has the die shots on his twitter.
 
Maybe, but given the E-cores currently aren't great at power, it's not a given that unified core would be either. In fact, power seems to be the biggest weakness of both the E and P-core team compared to the competition. The E-core teams are pretty good at area, and the P-core team is hanging on against the x86 competition in ST perf at least...
Uh, they are decent for power. Why do you think PTL or ARL gets a decent bit higher MT score without SMT? Mostly due to the E-cores. As for ST perf, the E-core team didn't get the area budget.
The last line you quoted was in response to how area increases when you have to widen the FPU (which unified core would have to do vs current E-cores), not specint2017 perf or anything.

My point with that was if simply reducing how effective your avx-512 implementation is saves you half of your FPU area, how much area would the E-cores have to add in order to get a P-core class AVX-512 capable FPU for unified core?
Even with Arctic Wolf the E-core team can prob get away with "double pumped" AVX-512 like Zen 4, but with unified core I'm expecting that they have to go a full 512 bit data path with their FPU.
Full-fat AVX-512 is expensive, yes, but what if even after adding AVX-512 you get a core that is better in PPA than the current P-core implementation?
 