> That's the throughput part.
Nope! Most server workloads are NOT that. At all.
> Skymont is a decent bit behind on node refinement and has a horrible uncore. We don't have an apples-to-apples comparison (pun not intended) of the core on Windows vs the core on macOS.
This was a Linux vs macOS SPEC17 comparison; macOS is 4% faster in SPEC.

Well, the issue is he's using -O3 with clang, vs David Huang using GCC 12 with -O2, and the flag matters. And we don't know how the power measurements were taken; Apple's powermetrics shouldn't be taken at face value.
[attachment: SPEC results chart]
But the main point is that the M core uses only 2.5 watts to reach 10.5 points in SPECint. I don't think there's any competing core that comes close to that perf/W yet.
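For reference, that perf/W works out to 10.5 / 2.5 = 4.2 SPECint points per watt.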
Very symmetric across all resources, which allows them to implement a very basic form of SMT by simply statically partitioning everything in half.
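To make "statically partitioning everything in half" concrete, here is a toy sketch (my own illustration, not any vendor's actual design): each thread gets a fixed half of a shared structure and stalls when its half is full, even if the other half sits empty.

```python
# Toy sketch of static SMT partitioning (illustration only, not any
# vendor's actual design): a 64-entry shared structure split 50/50.

class StaticallyPartitionedQueue:
    def __init__(self, total_entries=64, num_threads=2):
        self.per_thread = total_entries // num_threads  # fixed half each
        self.used = [0] * num_threads

    def try_allocate(self, tid):
        # Dynamic sharing would check total occupancy; static partitioning
        # only ever checks this thread's own slice.
        if self.used[tid] < self.per_thread:
            self.used[tid] += 1
            return True
        return False  # the thread stalls here

q = StaticallyPartitionedQueue()
# Thread 1 is completely idle, yet thread 0 still stalls after 32 entries:
print(sum(q.try_allocate(0) for _ in range(40)))  # -> 32
```

The upside is that it is trivial to implement and threads can't starve each other; the downside is exactly that wasted capacity when one thread is idle.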
He also tested gaming, and Strix Halo won there. That's embarrassing for Apple considering RDNA3.5 is one generation behind.
Halo is using a slightly tweaked 2022 architecture with about half the memory bandwidth of the 32-GPU-core M5 Max. And it's on 4nm instead of 3nm. But I guess ~40W can solve lots of problems.
Doesn't Apple Silicon being TBDR mean that benchmarking the same titles that have a native Mac port would be better? Also, Apple Silicon relies on on-chip bandwidth for GPU bandwidth rather than DRAM, doesn't it?
> Also, Apple Silicon relies on on-chip bandwidth for GPU bandwidth rather than DRAM, doesn't it?
What? No.
Yeah, TBDR is a problem. Almost all AAA games are made for IMR.
> What? No.
What I'm saying is that it's a different rendering technique: TBDR relies on fast on-chip cache bandwidth and efficiency, rendering tiles and passes on chip before the results are read from or written to DRAM, unlike the IMR approach AMD uses for Strix Halo.
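A toy model of why that matters for bandwidth (my sketch, not either GPU's real pipeline): with 4x overdraw, an IMR-style loop touches the DRAM framebuffer for every shaded pixel, while a TBDR-style loop resolves the overdraw in on-chip tile memory and writes each finished tile out once.

```python
# Toy DRAM-traffic model (illustration only, not either GPU's real pipeline).

WIDTH, HEIGHT, LAYERS = 64, 64, 4  # 4 fullscreen overlapping layers

def imr_dram_writes():
    framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]  # lives in DRAM
    writes = 0
    for layer in range(LAYERS):            # every draw hits DRAM directly
        for y in range(HEIGHT):
            for x in range(WIDTH):
                framebuffer[y][x] = layer
                writes += 1
    return writes

def tbdr_dram_writes(tile=16):
    writes = 0
    for ty in range(0, HEIGHT, tile):
        for tx in range(0, WIDTH, tile):
            on_chip = [[0] * tile for _ in range(tile)]  # fast tile memory
            for layer in range(LAYERS):    # overdraw resolved on chip
                for y in range(tile):
                    for x in range(tile):
                        on_chip[y][x] = layer
            writes += tile * tile          # flush finished tile to DRAM once
    return writes

print(imr_dram_writes())   # 16384 framebuffer writes to DRAM
print(tbdr_dram_writes())  # 4096, i.e. 4x less with 4x overdraw
```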
In Apple Silicon the DRAM is on-package, not on-chip. It is technically no different from DRAM soldered to the motherboard, as in most laptops, except that you get some power and space savings.
Didn't Apple themselves already promote the M5's gaming performance gains in Cyberpunk? They should just benchmark the native Metal 3 version of the port vs Strix Halo, until the old game hopefully gets updated to Metal 4.
> What I'm saying is that it's a different rendering technique: TBDR relies on fast on-chip cache bandwidth and efficiency, rendering tiles and passes on chip before the results are read from or written to DRAM, unlike the IMR approach AMD uses for Strix Halo.
IMRs haven't been IMRs for like 10+ years; they have binned things for ages.
True, but it's still not what Apple uses, so a direct comparison is still literally Apple to oranges unless it's native-port versions of the same title vs PC in a benchmark, imo.
> He also tested gaming, and Strix Halo won there. That's embarrassing for Apple considering RDNA3.5 is one generation behind.
In Blender, the basic M5 outperforms the Strix Halo. There are still plenty of ways to optimise performance for gaming, but unfortunately the developers aren’t making full use of them.
> Idk how accurate this guy is, but it suggests that games need a good port or the games will only use half of the GPU's power.
Even PC doesn't get good game ports 🤣 🤣 🤣
I know it does in Blender, but the 3nm M5 shader core is weaker than an RDNA 3.5 core on 4nm in pure raster, though this only matters for old games. I bet if RT was enabled here, the M5 Max would be much faster.
The 8060s performs on par with the M5 Pro in Steel Nomad. I think the issue is not with the M5’s shader cores (they’re not weak) but with game optimisation. The reason Blender works so well is not only down to Apple’s superior RT implementation, but also to excellent optimisation and a well-implemented Metal backend. Apple has assigned a dedicated team of engineers to work on optimising Blender for Mac. If Apple had formed a separate engineering team to optimise game ports for Mac, the results would have been better.
All AAA games released this year use RT and the PS6 AAA games will use even more demanding RT.

If only Apple had acquired a game studio back then lol
[3DMark result: www.3dmark.com]
> Apple M5 GPU Roofline Analysis: "In this deep dive, we will examine M5's GPU performance across various workloads. TLDR: It can be very powerful if the programmer knows how to use it correctly." (www.michaelstinkerings.org)
> Idk how accurate this guy is, but it suggests that games need a good port or the games will only use half of the GPU's power.
That is an impressive and detailed analysis. I am puzzled, given how much he assesses LLM performance implications, that he doesn't even mention the neural accelerators, let alone test them.
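For context on what the article measures: the roofline model caps a kernel's attainable throughput at min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch with made-up numbers (not the author's code, and not real M5 specs):

```python
# Generic roofline formula: throughput is capped by either peak compute
# or by memory bandwidth times arithmetic intensity, whichever is lower.

def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Illustrative numbers only: a kernel doing 0.25 FLOP per byte on a
# hypothetical 150 GB/s part is memory-bound at 150 * 0.25 = 37.5 GFLOP/s,
# so raising peak compute alone wouldn't speed it up.
print(attainable_gflops(peak_gflops=4000, bandwidth_gb_s=150, flops_per_byte=0.25))
```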
If you mean the ANE, it's not part of the GPU die; that's why.