dttprofessor
They don't need to add full AVX-512 to all E-cores, e.g. the LPE and small E-cores.
Alder Lake flashback.
Quoting: They don't need to add full AVX-512 to all E-cores, e.g. the LPE and small E-cores.
What?
Quoting: Alder Lake flashback.
Well, port size can be optional.
Quoting: CLIENT and mobile unified cores don't need full-fat AVX-512. They typically have neither the power budget nor the memory bandwidth required to fully feed that monster.
Well, look at PTL like this.
Quoting: Uh, they are decent for power. Why do you think PTL or ARL gets a decent bit higher MT score without SMT? Mostly due to the E-cores.
Well sure, I just don't think the core will be half the area. I think unified core can cut a marginal amount of area for sure, but frankly I think area is the least important aspect of PPA, too.
Quoting: Full-fat AVX-512 is expensive, yes, but what if even after adding AVX-512 you get a core that is better in PPA than the current P-core implementation?
You won't need full-width AVX-512 in any E-cores until unified core IMO, at which point it technically won't be an E-core anymore, lol.
Quoting: They don't need to add full AVX-512 to all E-cores, e.g. the LPE and small E-cores.
AMD offers full-fat AVX-512 in their client desktop SKUs. Maybe mobile doesn't need it. Interestingly enough, though, AMD thought full-fat AVX-512 was important enough to include in their Zen 5c Turin Dense products as well, despite power per core being low and L3 per core also being low, just like mobile.
Quoting: CLIENT and mobile unified cores don't need full-fat AVX-512. They typically have neither the power budget nor the memory bandwidth required to fully feed that monster.
Well, SMT is very helpful in Cinebench: it's roughly a 20% gain in R23, a bit higher than I remembered, so the lack of SMT will hurt here, and the node advantage just got cancelled out in MT.
Quoting: Well, look at PTL like this.
PTL has essentially the same perf/watt as Strix Point in Cinebench 2024 for most of the curve, with a marginal lead past 60 W or so, while it's outright worse than the competition in Cinebench R23.
Strix Point uses a 4+8 setup, and PTL also has a 4+8 setup. AMD has SMT, sure, but Intel has a node advantage, also tacks on a 4-core LPE island, has better memory support, and has a lower-power uncore...
Same, but the P-core still has the full hardware for AVX-512; it's just disabled. As for where Unified Core puts them, we don't know, because we don't have the leaked targets.
Quoting: Well sure, I just don't think the core will be half the area. I think unified core can cut a marginal amount of area for sure, but frankly I think area is the least important aspect of PPA, too.
Also, in terms of area, based on what we see from the X925 vs. LNC area estimates, I think cutting 20-30% of area at iso-perf would put their P-core on par with where stock ARM cores are.
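For scale, a 20-30% area cut at the same performance translates into a perf/area gain of 1/(1 - cut). A minimal Python sketch of just that arithmetic. The function name is illustrative, and the sketch says nothing about the X925 or LNC estimates themselves:

```python
def perf_per_area_gain(area_cut: float) -> float:
    """Perf/area multiplier when a core shrinks by `area_cut` (0 to 1) at the same performance."""
    if not 0.0 <= area_cut < 1.0:
        raise ValueError("area_cut must be in [0, 1)")
    # Performance is unchanged, so perf/area scales with the inverse of the remaining area.
    return 1.0 / (1.0 - area_cut)


for cut in (0.20, 0.30):
    print(f"{cut:.0%} smaller at iso-perf -> {perf_per_area_gain(cut):.2f}x perf/area")
```

The gain grows faster than the cut. A 20% cut is worth 1.25x, and a 30% cut is worth about 1.43x rather than 1.3x.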
I think that's mislabeled. Full Zen 5 has four 512-bit datapaths.
Quoting: So since Unified Core will be a derivative of their E-core, it makes me wonder how much bigger it would be if they were to modify it to reach feature parity with their P-core lineup. For example, Skymont's and Darkmont's FPUs are 4x128-bit. To reach parity with their P-cores, they'll have to double the width of the FPUs and associated pathways.
For reference, for Zen 5 you can see how much area going from half-pumped to a true full 4x256-bit FPU required:
[Attachment 138773: Zen 5 FPU area, half-pumped vs. full-width]
Currently, E-cores are ~1/3 the area of their P-cores. I wonder if Unified Core would be around 1/2 the area...
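To make the width comparison concrete, here is a simplified Python model. It counts how many passes one 512-bit operation needs on 128-, 256- and 512-bit datapaths, and computes the peak aggregate width of a 4-pipe FPU. It assumes every pipe can execute the op and ignores port asymmetry, so treat it as a back-of-the-envelope sketch. The 128-bit row shows what cracking a 512-bit op would cost on such pipes, not what any shipping E-core does.

```python
import math

VECTOR_BITS = 512  # width of one AVX-512 register
PIPES = 4          # the thread compares 4-pipe FPUs (4x128, 4x256, 4x512)


def passes_per_op(vector_bits: int, datapath_bits: int) -> int:
    """Passes a single pipe needs for one vector op (1 = native, 2 = double-pumped, ...)."""
    return math.ceil(vector_bits / datapath_bits)


def peak_bits_per_cycle(pipes: int, datapath_bits: int) -> int:
    """Peak aggregate datapath width, assuming every pipe is busy every cycle."""
    return pipes * datapath_bits


for width in (128, 256, 512):
    print(
        f"{PIPES}x{width}-bit: {passes_per_op(VECTOR_BITS, width)} pass(es) per 512-bit op, "
        f"{peak_bits_per_cycle(PIPES, width)} bits/cycle peak"
    )
```

Doubling the datapath from 128 to 256 bits halves the passes per 512-bit op and doubles peak width. That is the parity step described above, and its area cost is what the Zen 5 comparison is meant to show.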

And then you have to consider that PTL also has the 4-core LPE island helping its performance, plus a stronger uncore (lower power, faster memory speeds, lower memory latency).
Quoting: Well, SMT is very helpful in Cinebench: it's roughly a 20% gain in R23, a bit higher than I remembered, so the lack of SMT will hurt here, and the node advantage just got cancelled out in MT.
Source?
Quoting: Same, but the P-core still has the full hardware for AVX-512; it's just disabled.
This actually is a lot of area. That alone is like a third of the core area excluding the L2. Also, for reference, the Zen 5 dense N3E FPU alone is more than 50% of the area of a Skymont core. The FPU in LNC is actually a similar size, but keep in mind it clocks, what, almost 60% higher?
Quoting: AVX-512 doesn't have to be a huge area cost. Turin Dense is able to squeeze 4x512 datapaths into 1.6 mm². The FP/AVX unit is 0.57 mm².
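The "like a third" figure and the Skymont comparison can be checked with simple arithmetic. A minimal Python sketch that uses only the two die-shot numbers quoted above. The Skymont bound is derived purely from the ">50%" claim, not from a measurement:

```python
TURIN_DENSE_CORE_MM2 = 1.6  # core area quoted in the thread (Zen 5c, N3E die shot, excluding L2)
TURIN_DENSE_FPU_MM2 = 0.57  # FP/AVX unit area quoted in the thread

fpu_share = TURIN_DENSE_FPU_MM2 / TURIN_DENSE_CORE_MM2
print(f"FPU share of core area: {fpu_share:.0%}")  # roughly a third

# "FPU > 50% of a Skymont core" implies Skymont core area < 2 * FPU area.
implied_skymont_max_mm2 = 2 * TURIN_DENSE_FPU_MM2
print(f"Implied Skymont core area: < {implied_skymont_max_mm2:.2f} mm²")
```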
Honestly, it looks like the P-cores have a bunch of extra room to grow because of how big the E-core clusters are getting, lol.
Quoting: One point I think some people miss is that Lion Cove is intentionally big. Intel uses the area savings from the E-cores to give a bigger area budget to the P-cores (it would be a decent plan if the P-cores were actually good).
Doubt it. Every new core rework Intel cancels seems to be less and less risky, tbh. I forget who claimed this (probably exist50 or @adroc_thurston), but apparently Ocean Cove was a very large overhaul of CPU architecture, more radical than even Royal Core. Then Royal Core, which seemed pretty innovative (if not successful) too, got canned. Now we have Unified Core, and if we're lucky, I think we'll see them adopt an ARM-style clock speed reset where they dramatically increase IPC and go wide to catch up to the ARM competition. Unfortunately, I think it's more likely we just see a more area/power-efficient version of the current P-cores.
Quoting: I think 2-4 E-cores will be combined to make a super large core, similar to the Royal project that was rejected by PAT, but that one is based on ARL and does not require a four-part design.
Client desktop has it because of how AMD does its CCDs. In ST, they can achieve their expected throughput, but the CCD doesn't have enough IOD link bandwidth to support all 8 cores running full song. Server is a different beast, with lower clocks and fatter memory.
Quoting: AMD offers full-fat AVX-512 in their client desktop SKUs. Maybe mobile doesn't need it. Interestingly enough, though, AMD thought full-fat AVX-512 was important enough to include in their Zen 5c Turin Dense products as well, despite power per core being low and L3 per core also being low, just like mobile.
Well, it's been there since GLC.
Quoting: Source?
I think the dimensions are a bit wrong; I clearly remember Zen 5c being 2.98 mm² with L2.
Quoting: I think that's mislabeled. Full Zen 5 has four 512-bit datapaths.
AVX-512 doesn't have to be a huge area cost. Turin Dense is able to squeeze 4x512 datapaths into 1.6 mm². The FP/AVX unit is 0.57 mm² (die shot from Twitter).
[Attachment 138813: Turin Dense (Zen 5c, N3E) die shot]
I expect the full-size unified core will be closer to a P-core than an E-core in size. But that'll stem from increasing the clock speed target, improving branch prediction, and otherwise widening the core.
One point I think some people miss is that Lion Cove is intentionally big. Intel uses the area savings from the E-cores to give a bigger area budget to the P-cores (it would be a decent plan if the P-cores were actually good). With Hammer Lake, Intel will no longer have P-cores, so it will allot that area budget to the unified cores.
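The area-budget idea can be put into rough numbers. Below is a minimal Python sketch. It assumes the ~1/3 E-core ratio mentioned earlier in the thread and the speculative 1/2 ratio floated for Unified Core, and it counts core area only (no L2, L3 or uncore). The 8P+16E layout is just an example budget:

```python
from fractions import Fraction

# Areas normalized to one P-core. Ratios are the thread's estimates, not measurements.
P_CORE = Fraction(1)
E_CORE = Fraction(1, 3)        # "E cores are ~1/3 the area of their P cores"
UNIFIED_CORE = Fraction(1, 2)  # speculative: "like 1/2 the area"


def core_area(p_cores: int, e_cores: int) -> Fraction:
    """Total core area (excluding caches and uncore) in P-core units."""
    return p_cores * P_CORE + e_cores * E_CORE


budget = core_area(8, 16)              # example: an 8P+16E layout
unified_count = budget / UNIFIED_CORE  # unified cores that fit in the same core area
print(f"8P+16E core area: {float(budget):.2f} P-cores")
print(f"Unified cores in that area: {float(unified_count):.1f}")
```

Under these assumptions, the core area of an 8P+16E part would hold roughly 26 half-size unified cores. The real split depends on the cache and clock targets, which the thread can only guess at.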
That's Zen 5c mobile on N4P. This is Zen 5c server on N3E. They're not at all the same and should probably have two different names.
Quoting: I think the dimensions are a bit wrong; I clearly remember Zen 5c being 2.98 mm² with L2.
No, I meant this particular die shot; I measured it.
Quoting: That's Zen 5c mobile on N4P. This is Zen 5c server on N3E. They're not at all the same and should probably have two different names.
I mean, they are the same core; that's the whole point.
Quoting: That's Zen 5c mobile on N4P. This is Zen 5c server on N3E. They're not at all the same and should probably have two different names.
You're right. I was mistakenly counting the pixels of the part of the image that I had cropped out (thanks, GIMP). Corrected image below, though your numbers are probably more accurate.

Turin Dense has full AVX-512, not double-pumped like Zen 5c mobile.
Quoting: I mean, they are the same core; that's the whole point.
I will update that, thanks.
Quoting: Just a note here, the A19 P-core is smaller than the A18 P-core.
[Attachment 138823: A19 vs. A18 P-core area]
(I know it's not the Apple thread, but I was looking at the spreadsheet and noticed it, so it seems pertinent.)
Yeah, I could see the P-cores getting even bigger with Nova Lake. Since they're already doing heterogeneous cores and there's little point to more than 8 big cores in a big.LITTLE CPU, they might as well go for the single-thread performance crown.
Quoting: Honestly, it looks like the P-cores have a bunch of extra room to grow because of how big the E-core clusters are getting, lol.
There's also the problem of P-cores having to be used in DC, but I still think area is the least important factor in PPA, and Intel in particular seems to give very little concern to area, adding massive core-private caches and per-core AMX extensions.
Sorry to say, but it's the opposite: Coyote Cove is smaller than Lion Cove in area on their respective nodes. At iso-node, sure, Coyote is bigger. This is further solidified by the fact that NVL is 98 mm² for 8+16 and ARL is 117 mm².
Quoting: Yeah, I could see the P-cores getting even bigger with Nova Lake. Since they're already doing heterogeneous cores and there's little point to more than 8 big cores in a big.LITTLE CPU, they might as well go for the single-thread performance crown.
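The 98 mm² vs. 117 mm² comparison works out as follows. This minimal Python sketch uses only the two die areas quoted in the thread:

```python
NVL_8P16E_MM2 = 98.0   # NVL 8+16 die area quoted in the thread
ARL_8P16E_MM2 = 117.0  # ARL 8+16 die area quoted in the thread

ratio = NVL_8P16E_MM2 / ARL_8P16E_MM2
print(f"NVL/ARL die area: {ratio:.3f} ({1 - ratio:.1%} smaller)")
```

These are whole-die figures that include caches and uncore, so they don't isolate core size. A later reply in the thread attributes part of the gap to the shared L2.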
Yeah, most likely.
Quoting: I think cache is a huge part of this discussion, more important than the size of the cores themselves. If Skymont/Darkmont used a whole L3 stop per core, they would look really bad. Conversely, Intel could've made a dense variant of the P-core (à la Zen 5c) and put it on a shared 4 MB L2 cluster. It would have worse performance than the full P-cores and worse perf/area than the E-cores, but better perf/area than a full P-core. I wonder if Intel will change up the L1.5 and L2 caches for server.
The unified core could use L2 to differentiate between dense and full size cores. That would allow them to keep a lot of the perf/area savings of the current e-cores.
- full size unified core with 2 cores on a shared L2 cluster and L3 stop.
- dense unified core with 4 cores on a shared L2 cluster and L3 stop.
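The two cluster layouts above trade L2 capacity per core against the number of L3 stops. A minimal Python sketch follows. It assumes, purely for illustration, that both layouts reuse the 4 MB shared-L2 cluster size mentioned above and that a chip has a hypothetical 16 cores. `layout_stats` is a made-up helper name:

```python
import math

L2_CLUSTER_MB = 4  # assumption: the 4 MB shared-L2 cluster size, applied to both layouts
TOTAL_CORES = 16   # hypothetical core count, for illustration only


def layout_stats(cores_per_cluster: int, total_cores: int = TOTAL_CORES) -> tuple[float, int]:
    """Return (L2 MB per core, number of L3 stops) when each cluster shares one L2 and one L3 stop."""
    l2_per_core = L2_CLUSTER_MB / cores_per_cluster
    l3_stops = math.ceil(total_cores / cores_per_cluster)
    return l2_per_core, l3_stops


for name, cores in {"full-size unified": 2, "dense unified": 4}.items():
    l2, stops = layout_stats(cores)
    print(f"{name}: {l2:.1f} MB L2 per core, {stops} L3 stops for {TOTAL_CORES} cores")
```

Packing 4 cores per cluster halves both the L2 per core and the number of ring stops relative to 2 per cluster. That halving is where the perf/area savings of the dense variant would come from.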
Just a note here, the A19 P-core is smaller than the A18 P-core.
Some of that is also due to savings from the shared L2 for P-cores.
Quoting: This is further solidified by the fact that NVL is 98 mm² for 8+16 and ARL is 117 mm².