DisEnchantment
> CloudFlare and they don't work like that.
> They just pick a h/w option for each server generation.
> It can be AMD, Intel, QCOM (they ran Centriq 2400), whatever.

CloudFlare ran centriq? Which % of all the CPUs? Maybe they bought 80% of what QCOM sold?
> CloudFlare ran centriq?

Yes.

> Which % of all the CPUs

I don't remember. 2017 and all. eons ago.

> Maybe they bought 80% of what QCOM sold?

Yup.
> AMD states that Zen4 has 3x FP Pipes and Zen5 has 4x FP Pipes.

AMD loves to have typos in official materials.
> Vector instructions and floating point instructions can execute at a rate of two vector additions, two vector multiplications, and two vector read or write instructions simultaneously per clock cycle. All vector units have full 512 bits capabilities except for memory writes. A 512-bit vector write instruction is executed as two 256-bit writes.
> Integer memory operations can execute at a rate of four reads per clock cycle or two reads and two writes. Floating point and vector memory operations can execute at the rate of two reads or writes per clock cycle, except for 512-bit writes.

Assuming AMD is correct on this slide, it stands to reason that even if the three FP pipelines in Zen4 were 256-bit, they wouldn't be able to be combined into more than 1x512.
> Which AMD materials clearly show/state that Zen4 has a physical 4x256 bit?

Software Optimization Guide for the AMD Zen4 Microarchitecture, publication number 57647
Figure 6 below shows a basic diagram of the floating point unit and how it interfaces with the other units in the processor. Notice that there are four execution pipes (0 through 3) that can execute an operation every cycle. (...) Because the data paths are 256 bits wide, the scheduler uses two consecutive cycles to issue a 512-bit operation.
> And which AMD materials confirm that Zen5 has a physical 4x512 bit?

Software Optimization Guide for the AMD Zen5 Microarchitecture, publication number 58455
Notice that there are four execution pipes (0 through 3) that can execute an operation every cycle. (...) The floating-point unit supports AVX-512 with full 512-bit data paths and operations.
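For anyone who wants the issue-cycle arithmetic from the two guide excerpts spelled out, here is a minimal sketch. It only encodes what the quoted text says (256-bit data paths on Zen4, full 512-bit data paths on Zen5); the function name `issue_cycles` is made up for illustration, and it says nothing about how many pipes can be active at once, which is the part people are arguing about.

```python
import math

def issue_cycles(op_bits: int, datapath_bits: int) -> int:
    """Consecutive cycles a scheduler needs to issue one operation
    when the operation is wider than the data path (hypothetical helper)."""
    return math.ceil(op_bits / datapath_bits)

# Zen4 per the quoted guide: 256-bit data paths, so a 512-bit op takes two cycles.
zen4_512 = issue_cycles(512, 256)

# Zen5 per the quoted guide: full 512-bit data paths, so one cycle.
zen5_512 = issue_cycles(512, 512)

# A 256-bit op fits a 256-bit data path in a single cycle.
zen4_256 = issue_cycles(256, 256)

print(zen4_512, zen5_512, zen4_256)
```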
Just because Zen5 can execute two 512-bit operations of one type and two 512-bit operations of another type, but only 2x512 at a time, doesn't mean it has a physical FP of 4x512 bits.
Similarly, just because Zen4 can execute 2x256 bits of one type and 2x256 bits of another type, but only 2x256 at a time, doesn't mean it has a physical FP of 4x256 bits.
> So it's possible that these 3FP Pipes for Zen4 and 4FP Pipes for Zen5 are a typo?

Someone let the intern make the slides.

> So it's possible that these 3FP Pipes for Zen4 and 4FP Pipes for Zen5 are a typo?

You can tell the real AMD marketing materials from the fakes by the fact that the authentic AMD materials are always completely full of errors.
> When taking the geometric mean of 555 benchmarks in total, the AMD EPYC 9745 was delivering about 90% the performance of its 128 core EPYC 9755 sibling. Considering that the EPYC 9745 has just 80% the TDP rating of the EPYC 9755, that's a nice showing for the EPYC 9745 with its dense cores.

Is it just me, or is this not a nice showing? Perf doesn't scale linearly with power, and 80% the power draw for 90% the perf sounds achievable for a product on the same node.
> Scaling down the 9755 likely has some other issues that we just aren't seeing.

Like what? We know that you can scale Zen 5 standard cores to even shy of 1 watt per core if need be.
> The two products serve two completely different markets. One is pushing maximum per-thread performance achievable with the socket power budget. The other is trying to wring every last drop of performance out of lower power budgets, especially for legacy platforms that are limited to lower TDPs. It's that simple.

The problem is that it seems that if the 9755 were limited to the 9745's TDP, you would get very similar perf.
> I'm just going to copy paste my reddit comment here lol but:
> Is it just me, or is this not a nice showing? Perf doesn't scale linearly with power, and 80% the power draw for 90% the perf sounds achievable for a product on the same node.
> Even for the more embarrassingly multithreaded benchmarks, such as Blender, you see the 9745 getting 20-25% better perf/watt, however a good chunk of that can also easily be attributed to simply consuming way less power and thus getting better perf/watt from that.
> For example, assuming each core in the 9745 is maybe getting ~2 watts, while each core in the 9755 is getting ~3 watts, that alone would provide an almost 30% perf/watt advantage to the 9745... if it was using the exact same N4P CCDs as the 9755.
> It seems to me as if, per core, Zen 5C on N3E in Turin Dense is not any more efficient than Zen 5 on N4P in regular Turin.
> Ofc, the only way to really confirm this would be to have someone test the two chips at the same power draw.
> From: https://www.phoronix.com/review/amd-epyc-9745-9755/9

it haez half the cache and also a lot less CCDs in case you've ever wondered.
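To put numbers on the perf/watt argument in the quoted comment, here is a rough sketch. It treats the TDP rating as a stand-in for actual power draw, which is an assumption (the thread itself notes that nobody has tested the two chips at equal power), and the helper name `perf_per_watt_ratio` is invented for illustration.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/watt of chip A vs chip B, given A's performance and
    power as fractions of B's (hypothetical helper)."""
    return perf_ratio / power_ratio

# Phoronix geomean: 9745 at ~90% of the 9755's perf with 80% of its TDP.
# Assumption: TDP rating approximates real power draw.
chip_level_gain = perf_per_watt_ratio(0.90, 0.80)

# The ~2 W vs ~3 W per-core example: an "almost 30%" perf/watt advantage at
# two thirds of the power means each core keeps roughly this share of its perf.
per_core_power_ratio = 2.0 / 3.0
perf_retained = 1.30 * per_core_power_ratio

print(f"chip-level perf/watt gain: {chip_level_gain:.3f}x")
print(f"per-core perf retained at 2 W vs 3 W: {perf_retained:.3f}")
```

So the headline 90%-for-80% result works out to about a 12.5% perf/watt edge, which is smaller than the roughly 30% the comment estimates you would get from the lower per-core power alone on identical silicon.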
> it haez half the cache and also a lot less CCDs in case you've ever wondered

Half the cache would be the explanation that's believable: that PPC is getting negatively affected too much in some workloads and that the frequency/power benefit isn't enough to compensate.
> Fewer CCDs seems like an advantage though, since you worry less about interconnect power

It's an advantage for cost.
> Nope the 400W is just at the high end of power consumption. Dropping to 320 W TDP has a 2% impact on performance but saves power. The dense cores are being pushed to the edge of their VF curve, while the classic ones are not.
> More typical usage, +50% power for +10% performance.

AFAICT, several of these benchmarks have poor performance-over-power scaling on the 9745 because of its halved level 3 cache.
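One way to compare the two scaling figures in the quote above is to fit each to a simple power law, perf ∝ power^k, and look at the exponent. The power-law model is my assumption and not something anyone in the thread claimed, and `implied_exponent` is a made-up helper name; treat the output as a rough illustration only.

```python
import math

def implied_exponent(power_ratio: float, perf_ratio: float) -> float:
    """Exponent k such that perf_ratio == power_ratio ** k
    (assumes a power-law perf/power relationship; hypothetical helper)."""
    return math.log(perf_ratio) / math.log(power_ratio)

# Quoted claim: dropping from 400 W to 320 W TDP costs about 2% performance.
# Going the other way, 320 W -> 400 W is +25% power for 1/0.98 of the perf.
k_edge_of_curve = implied_exponent(400 / 320, 1 / 0.98)

# Quoted "more typical" case: +50% power for +10% performance.
k_typical = implied_exponent(1.5, 1.1)

print(f"k near the edge of the VF curve: {k_edge_of_curve:.3f}")
print(f"k for the 'typical' case:        {k_typical:.3f}")
```

A smaller exponent means extra watts buy less performance, so under this model the 320 W to 400 W step is far deeper into diminishing returns than the "typical" case.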
Wouldn't be an AMD marketing item without at least one mistake. Calling the 9950X3D both Ryzen 7 and Ryzen 9 in the same video!