
Discussion Intel’s Unified Core: Hammer Lake

Page 10 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
CLIENT and mobile unified cores don't need full-fat AVX-512. They typically have neither the power budget nor the memory bandwidth required to fully feed that monster.
 
Uh, they are decent for power. Why do you think PTL or ARL gets a decent bit higher MT score without SMT? It's mostly due to the E-cores.
Well, look at PTL like this.
PTL has essentially the same perf/watt as Strix Point in Cinebench 2024 for most of the curve, with a marginal lead past 60 or so watts, while in Cinebench R23 it's outright worse than the competition.
Strix Point uses a 4+8 setup, and PTL also has a 4+8 setup. AMD has SMT, sure, but Intel has a node advantage, also tacks on a 4-core LPE island, and has better memory support and a lower-power uncore...

Nothing from Intel looks good on the power front so far, at least nothing that relates to the cores. Even PTL, which outright has an uncore advantage over AMD that is not seen in DT or server, does not perform as well as it should across the power curve, for what it is.

The LP island is the only thing that seems impressive, but we have not yet seen AMD's answer to it to compare against.
Full-fat AVX-512 is expensive, yes, but what if even after adding AVX-512 you get a core that is better in PPA than the current P-core implementation?
Well, sure, I just don't think the core will be half the area. I think Unified Core can cut a marginal amount of area for sure, but frankly I think area is the least important aspect of PPA, too.
Also, in terms of area, based on what we see from the X925 vs LNC area estimates, I think cutting 20-30% area iso-perf will put them on par with where stock ARM cores are on the P-core side.
We will probably get another benchmark based on where stock ARM cores are in 2026 on N2 vs PTC as well. LNC vs whatever ARM has right now might be slightly more comparable, though, because the FPU for PTC is going to have to be beefed up again for AVX-512, something the ARM gang does not care about. And that by itself costs considerable area.
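For a sense of what a 20-30% area cut buys: at exactly iso performance, shrinking core area by a fraction c raises perf/area by a factor of 1/(1 - c). A minimal sketch of that arithmetic; the helper name is hypothetical, and it assumes performance is held perfectly constant while only area changes:

```python
def perf_per_area_gain(area_cut: float) -> float:
    """Relative perf/area after cutting core area by `area_cut` (0..1) at iso performance."""
    if not 0.0 <= area_cut < 1.0:
        raise ValueError("area_cut must be in [0, 1)")
    # Performance is assumed unchanged, so the whole gain comes from the smaller area.
    return 1.0 / (1.0 - area_cut)


for cut in (0.20, 0.30):
    print(f"{cut:.0%} less area -> {perf_per_area_gain(cut):.2f}x perf/area")
```

So a 20-30% cut works out to roughly 1.25-1.43x perf/area under that assumption.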
They don't need to add full AVX-512 to all E-cores, e.g. LPE and small E.
You won't need full-width AVX-512 in any E-cores until Unified Core, IMO, where technically it won't be an E-core anymore lol.
CLIENT and mobile unified cores don't need full-fat AVX-512. They typically have neither the power budget nor the memory bandwidth required to fully feed that monster.
AMD offers full-fat AVX-512 in their client desktop SKUs. Maybe mobile doesn't need it. Interestingly enough, though, AMD thought full-fat AVX-512 was important enough for Zen 5C in Turin Dense as well, despite power per core being low and L3 per core also being low, just like mobile.
 
Well, look at PTL like this.
PTL has essentially the same perf/watt as Strix Point in Cinebench 2024 for most of the curve, with a marginal lead past 60 or so watts, while in Cinebench R23 it's outright worse than the competition.
Strix Point uses a 4+8 setup, and PTL also has a 4+8 setup. AMD has SMT, sure, but Intel has a node advantage, also tacks on a 4-core LPE island, and has better memory support and a lower-power uncore...
Well, SMT is very helpful in Cinebench; it's something like a 20% gain in R23, maybe a bit higher from what I remember, so the lack of SMT will hurt here, and the node advantage just got cancelled out in MT.

Well, sure, I just don't think the core will be half the area. I think Unified Core can cut a marginal amount of area for sure, but frankly I think area is the least important aspect of PPA, too.
Also, in terms of area, based on what we see from the X925 vs LNC area estimates, I think cutting 20-30% area iso-perf will put them on par with where stock ARM cores are on the P-core side.
Same, but the P-core still has the full hardware for AVX-512; it's just disabled. As for where Unified Core puts them, we don't know, because we don't have the leaked targets.
 
So since Unified Core will be a derivative of their E-core, it makes me wonder how much bigger it would be if they modified it to reach feature parity with their P-core lineup. For example, Skymont's and Darkmont's FPUs are 4x128-bit. To reach parity with their P-cores, they'll have to double the width of the FPUs and the associated pathways.
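For scale, here is the aggregate vector width per cycle for the 4-pipe FPU layouts mentioned in this thread (the 4x512-bit figure is how one reply describes full Zen 5). It only multiplies pipe count by pipe width and ignores which pipes can actually do FMA, so treat it as a rough upper bound on width, not a FLOPs figure:

```python
# Aggregate vector datapath width for the 4-pipe FPU layouts discussed in the thread.
FPU_CONFIGS = {
    "Skymont/Darkmont, 4x128-bit": (4, 128),
    "doubled, 4x256-bit": (4, 256),
    "full Zen 5, 4x512-bit": (4, 512),
}


def total_bits(pipes: int, width_bits: int) -> int:
    """Total vector bits handled per cycle if every pipe is busy."""
    return pipes * width_bits


def fp64_lanes(pipes: int, width_bits: int) -> int:
    """How many 64-bit (double-precision) lanes those pipes add up to."""
    return total_bits(pipes, width_bits) // 64


for name, (pipes, width) in FPU_CONFIGS.items():
    print(f"{name}: {total_bits(pipes, width)} bits/cycle, {fp64_lanes(pipes, width)} FP64 lanes")
```

Each step doubles the datapaths, register file width, and load/store paths that have to feed them, which is where the area cost comes from.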

For reference, with Zen 5 you can see how much area going from a half-pumped to a true full-width 4x256-bit FPU required:
[Attached image: Zen 5 FPU area comparison]
Currently, E-cores are ~1/3 the area of their P-cores. I wonder if Unified Core would be something like 1/2 the area...
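A back-of-the-envelope version of the 1/3 vs 1/2 question: measure core area in units of one P-core and count how many unified cores would fit in the core area of an 8+16 layout. The 8+16 split and both ratios are assumptions for illustration, and the model ignores L2/L3, fabric, and cluster overhead entirely:

```python
from fractions import Fraction


def core_area_in_p_units(p_cores: int, e_cores: int, e_to_p: Fraction) -> Fraction:
    """Core area of a hybrid layout, measured in P-core areas (cores only, no caches)."""
    return p_cores + e_cores * e_to_p


def unified_cores_that_fit(budget: Fraction, unified_to_p: Fraction) -> int:
    """Whole unified cores that fit in the same core-area budget (rounded down)."""
    return budget // unified_to_p


# Assumed layout: 8 P-cores + 16 E-cores, each E-core ~1/3 of a P-core's area.
budget = core_area_in_p_units(8, 16, Fraction(1, 3))  # 40/3 P-core areas

for ratio in (Fraction(1, 3), Fraction(1, 2)):
    print(f"unified core at {ratio} of a P-core: {unified_cores_that_fit(budget, ratio)} fit")
```

Exact fractions avoid floating-point rounding in the floor division. At half a P-core each, about 26 unified cores fit in that budget, before any cache or fabric changes.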
I think that's mislabeled. Full Zen5 has four 512-bit datapaths.

AVX-512 doesn't have to be a huge area cost. Turin Dense is able to squeeze 4x512-bit datapaths into 1.6 mm². The FP/AVX unit is 0.57 mm² (die shot from Twitter).
[Image: Turin Dense die shot]

I expect the full-size Unified Core will be closer to a P-core than an E-core in size. But that'll stem from raising the clock speed target, improving branch prediction, and otherwise widening the core.

One point I think some people miss is that Lion Cove is intentionally big. Intel uses the area savings from the E-cores to give a bigger area budget to the P-cores (it would be a decent plan if the P-cores were actually good). With Hammer Lake, Intel will no longer have P-cores, so they will allot that area budget to the unified cores.
 
I think they will combine 2-4 E-cores into one super-large core, similar to the Royal project that Pat rejected, but that one was based on ARL and did not require a four-part design.
 
Well, SMT is very helpful in Cinebench; it's something like a 20% gain in R23, maybe a bit higher from what I remember, so the lack of SMT will hurt here, and the node advantage just got cancelled out in MT.
And then you have to consider that PTL also has the 4-core LPE island helping its performance, and just a stronger uncore (lower power, faster memory speeds, lower memory latency).
PTL truthfully should be further ahead, and it kinda isn't that far ahead of Strix Point in perf/watt testing, though the only benchmarks I have seen for it are Cinebench R23 and 2024.
Same, but the P-core still has the full hardware for AVX-512; it's just disabled.
Source?
AVX-512 doesn't have to be a huge area cost. Turin Dense is able to squeeze 4x512-bit datapaths into 1.6 mm². The FP/AVX unit is 0.57 mm².
This actually is a lot of area. That alone is about a third of the core area excluding the L2. Also, for reference, the Zen 5 dense N3E FPU alone is >50% of the area of a Skymont core. The FPU in LNC is actually a similar size, but keep in mind it clocks, what, almost 60% higher?
AVX-512 maybe doesn't haaaaveee to be a huge area cost, but the way it's implemented in Zen 5C on Turin Dense is pretty expensive, especially relative to the rest of the Zen 5C core.
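The "about a third" estimate follows directly from the die-shot figures quoted above. A quick check, taking those numbers at face value (they are forum estimates from a die shot, not official figures):

```python
# Zen 5c area figures as quoted in the thread (die-shot estimates, not official numbers).
FPU_AREA_MM2 = 0.57     # FP/AVX unit
CORE_NO_L2_MM2 = 1.6    # core excluding L2

fpu_share = FPU_AREA_MM2 / CORE_NO_L2_MM2
rest_of_core_mm2 = CORE_NO_L2_MM2 - FPU_AREA_MM2

print(f"FPU share of core (excl. L2): {fpu_share:.1%}")
print(f"Everything else in the core: {rest_of_core_mm2:.2f} mm^2")
```

That comes out to roughly 36%, so "like a third" is a fair reading if the quoted figures hold.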
One point I think some people miss is that Lion Cove is intentionally big. Intel uses the area savings from the E-cores to give a bigger area budget to the P-cores (it would be a decent plan if the P-cores were actually good).
Honestly it looks like the P-cores have a bunch of extra space to grow because of how big the E-core clusters are getting lol.
There's also the problem of P-cores having to be used in DC, but I still think area is the least important factor in PPA, and Intel especially seems to be giving very little concern to area by adding massive core-private caches and AMX extensions per core.
I think they will combine 2-4 E-cores into one super-large core, similar to the Royal project that Pat rejected, but that one was based on ARL and did not require a four-part design.
Doubt it. Each new core rework Intel pursues after cancelling the last one seems to be less and less risky, tbh. I forget who claimed this (probably exist50 or @adroc_thurston), but apparently Ocean Cove was a very large overhaul of the CPU architecture, more radical than even Royal Core. Then Royal Core, which seemed pretty innovative (if not successful) too, got canned. Now we have Unified Core, and if we are lucky, I think we'll see them adopt an ARM-style clock speed reset, where they dramatically increase IPC and go wide to catch up to the ARM competition. Unfortunately, I think it's more likely we just see a more area/power-efficient version of the current P-cores.
 
AMD offers full-fat AVX-512 in their client desktop SKUs. Maybe mobile doesn't need it. Interestingly enough, though, AMD thought full-fat AVX-512 was important enough for Zen 5C in Turin Dense as well, despite power per core being low and L3 per core also being low, just like mobile.
Client desktop has it because of how AMD does their CCDs. In ST, they can achieve their expected throughput, but the CCD doesn't have enough IOD link bandwidth to support all 8 cores running at full tilt. Server is a different beast, with lower clocks and fatter memory.
 
Well, it's been there since GLC (Golden Cove).
I think that's mislabeled. Full Zen5 has four 512-bit datapaths.

AVX-512 doesn't have to be a huge area cost. Turin Dense is able to squeeze 4x512-bit datapaths into 1.6 mm². The FP/AVX unit is 0.57 mm² (die shot from Twitter).
[Image: Turin Dense die shot]

I expect the full-size Unified Core will be closer to a P-core than an E-core in size. But that'll stem from raising the clock speed target, improving branch prediction, and otherwise widening the core.

One point I think some people miss is that Lion Cove is intentionally big. Intel uses the area savings from the E-cores to give a bigger area budget to the P-cores (it would be a decent plan if the P-cores were actually good). With Hammer Lake, Intel will no longer have P-cores, so they will allot that area budget to the unified cores.
I think the dimensions are a bit wrong; I clearly remember Zen 5C being 2.98 mm² with L2.
 
Honestly it looks like the P-cores have a bunch of extra space to grow because of how big the E-core clusters are getting lol.
There's also the problem of P-cores having to be used in DC, but I still think area is the least important factor in PPA, and Intel especially seems to be giving very little concern to area by adding massive core-private caches and AMX extensions per core.
Yeah, I could see the P-cores getting even bigger with Nova Lake. Since they're already doing heterogeneous cores and there's little point to more than 8 big cores in a big.LITTLE CPU, might as well go for the single thread performance crown.

I think cache is a huge part of this discussion, more important than the size of the cores themselves. If Skymont/Darkmont used a whole L3 stop per core, they would look really bad. Conversely, Intel could've made a dense variant of the P-core (à la Zen 5c) and put it on a shared 4 MB L2 cluster. It would have worse performance than the full P-cores and worse perf/area than the E-cores, but better perf/area than a full P-core. I wonder if Intel will change up the L1.5 and L2 caches for server.

The unified core could use L2 to differentiate between dense and full size cores. That would allow them to keep a lot of the perf/area savings of the current e-cores.
  • full size unified core with 2 cores on a shared L2 cluster and L3 stop.
  • dense unified core with 4 cores on a shared L2 cluster and L3 stop.
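A rough way to see what the two proposed cluster sizes imply: if each shared-L2 cluster gets one L3 stop, the cluster size sets how many stops the ring needs and how much L2 each core gets. The 16-core count and the 4 MB L2 per cluster (borrowed from the shared 4 MB L2 idea above and applied to both variants) are illustrative assumptions, not leaked configurations:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    cores_per_cluster: int
    l2_per_cluster_mb: float  # hypothetical shared L2 size per cluster

    def l3_stops(self, total_cores: int) -> int:
        # One L3 stop per shared-L2 cluster; round up if the last cluster is partial.
        return math.ceil(total_cores / self.cores_per_cluster)

    def l2_per_core_mb(self) -> float:
        # Shared L2 capacity divided evenly among the cores in the cluster.
        return self.l2_per_cluster_mb / self.cores_per_cluster


full = ClusterConfig("full-size", cores_per_cluster=2, l2_per_cluster_mb=4.0)
dense = ClusterConfig("dense", cores_per_cluster=4, l2_per_cluster_mb=4.0)

for cfg in (full, dense):
    print(f"{cfg.name}: 16 cores -> {cfg.l3_stops(16)} L3 stops, "
          f"{cfg.l2_per_core_mb():.1f} MB L2/core")
```

The dense variant halves both the ring stops and the L2 per core, which is where most of its perf/area saving would come from under this model.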
 
Yeah, I could see the P-cores getting even bigger with Nova Lake. Since they're already doing heterogeneous cores and there's little point to more than 8 big cores in a big.LITTLE CPU, might as well go for the single thread performance crown.
Sorry to say, but it's the opposite: Coyote Cove is smaller than Lion Cove in terms of area on their respective nodes. Iso-node, sure, Coyote Cove is bigger. This is further supported by the fact that NVL 8+16 is 98 mm² and ARL is 117 mm².
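For scale, the die sizes quoted above work out as follows. This is only arithmetic on the posted numbers, which are unverified:

```python
NVL_8P16E_MM2 = 98.0   # NVL 8+16 die, as posted
ARL_MM2 = 117.0        # ARL die, as posted

ratio = NVL_8P16E_MM2 / ARL_MM2
print(f"NVL is {ratio:.1%} of ARL's area, i.e. {1 - ratio:.1%} smaller")
```

That is roughly a 16% smaller die for the same 8+16 core count.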
I think cache is a huge part of this discussion, more important than the size of the cores themselves. If Skymont/Darkmont used a whole L3 stop per core, they would look really bad. Conversely, Intel could've made a dense variant of the P-core (à la Zen 5c) and put it on a shared 4 MB L2 cluster. It would have worse performance than the full P-cores and worse perf/area than the E-cores, but better perf/area than a full P-core. I wonder if Intel will change up the L1.5 and L2 caches for server.

The unified core could use L2 to differentiate between dense and full size cores. That would allow them to keep a lot of the perf/area savings of the current e-cores.
  • full size unified core with 2 cores on a shared L2 cluster and L3 stop.
  • dense unified core with 4 cores on a shared L2 cluster and L3 stop.
Yeah, most likely.
 
Just a note here, the A19 P-core is smaller than the A18 P-core.

The 5% shrink of A19P core looks about bang on the mark for what you'd get from the optical shrink of N3E to N3P. There may not have been all that much changed, the performance gains above what they got from frequency were probably of a more "housekeeping" nature.
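To connect a 5% area shrink to a density figure: if a design is ported unchanged and everything scales uniformly, area scales as 1/(1 + density gain), so a 5% smaller core corresponds to roughly a 5.3% density improvement. A minimal sketch; the helper names are hypothetical, and real cores rarely scale uniformly because SRAM and logic shrink by different amounts:

```python
def implied_density_gain(area_shrink: float) -> float:
    """Density improvement implied by a pure optical shrink of `area_shrink` (e.g. 0.05)."""
    return 1.0 / (1.0 - area_shrink) - 1.0


def area_after_density_gain(area: float, density_gain: float) -> float:
    """Area of an unchanged design after the node's density improves by `density_gain`."""
    return area / (1.0 + density_gain)


gain = implied_density_gain(0.05)
print(f"5% smaller core -> {gain:.1%} density gain")
print(f"round trip: {area_after_density_gain(1.0, gain):.3f} of the original area")
```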

N2 delivers a bigger shrink so we may see bigger core changes that utilize the additional transistors/mm^2 the process provides.

That's kind of what Intel did with tick/tock back in the day: they'd have a big shrink node, then roughly Intel's equivalent of TSMC's "P" node the following year, and they adjusted their core design cadence to match.
 