
Discussion Apple Silicon SoC thread


Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
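A quick sanity check of the headline 2.6 Teraflops figure. The 8 x 16 x 8 ALU breakdown and the clock are widely reported estimates, not Apple-published specs:

```python
# Assumed breakdown: 8 GPU cores x 16 EUs x 8 ALUs = 1024 FP32 lanes.
alus = 8 * 16 * 8
clock_ghz = 1.278                      # estimated GPU clock, not official
tflops = alus * 2 * clock_ghz / 1000   # 2 ops/cycle via fused multiply-add
print(round(tflops, 1))                # ~2.6
```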

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
Was that on an iGPU or a dGPU? When you have a single memory pool, your RAM is also your VRAM, and VRAM matters now even when you browse the web or open Chromium-based apps.
That was a dGPU, but I myself played on an 8GB single-channel system, running SC2 with a mining app in the background. I am an enthusiast, but I also remember when everyone thought 1280x1024 at 30 fps was fine. We are actually very adaptable, as long as we are willing to be. Because there might be a day when we have no more new computers, and by that time any computer would be valued.
Presumably, someone bought that about 15 years ago and wisely avoided the cheapo 4GB SB laptops to allow such longevity. Maybe not.
Desktop. I know from personal experience enabling all C and P states that a significant (but not all) part of the responsiveness difference between laptop and desktop is due to those states. It was noticeable enough that in a desktop form factor it was not acceptable to save maybe 5W.

If it wasn't for the laptop breaking due to a fall my mother would still be using an AMD A10 laptop.

The biggest point is that this is an Apple device going for new Core i3 Windows prices. Unless they mess this up badly, they will take a nice amount of share. Between Apple taking share and RAM/SSD pricing likely outright bankrupting a few vendors, they are going to have a hard time.
 
The SoC is used in phones; it probably doesn't have the PC-class storage performance of today's NVMe SSDs (or of the complicated thing Apple uses in M SoCs for their petty reasons). Various latencies and IOPS limits likely add up when running a heavy OS.
It's the same thing Apple uses in M series SoCs, because why would it be any different? As far as I know the only difference is that M series chips have more flash memory channels than A series.

Your perceptions / prejudices about Apple's SSD tech are a little strange. Do you not know that it's literally NVMe plus a couple Apple extensions? The NVMe controller here isn't a separate chip made by Phison or whomever, instead it's an Apple designed controller integrated into the main SoC. That's why the SoC only provides for connecting flash memory to the SoC, and why you don't see a M.2 socket for connecting a standalone drive with its own NVMe controller.

Their reasons for doing this aren't "petty". The big headline feature they gain is tight integration of their NVMe controller with the Secure Enclave inside the same SoC, which results in a much better encryption architecture than is possible with generic NVMe's TCG Opal. (This is the reason why Apple had to extend the NVMe protocol a bit.) There's probably other improvements, too.

I stand corrected.

Technically it's 4 LPDDR channels. But why does the number of channels matter? What matters is the bandwidth at the end, right? Not the means used to achieve it.
Bandwidth is not the only thing that matters. Or, to be more precise, you can lose theoretical bandwidth to other limitations, which I'm going to explain...

Let's say the reason why you have high bandwidth demands is that there's a really complicated SoC with a lot of memory agents, each generating its own memory request stream. Under these circumstances, memory controller channel count is a very important scaling resource. More channels equals more open pages, and more commands in flight. The more things you have generating memory requests, the more memory subsystem parallelism you need.

This is pretty much the entire reason why LPDDR4 and LPDDR5, the JEDEC standards designed primarily for phones, have narrower channels than desktop RAM standards. Phone SoCs are all unified memory systems, with a ton of bus agents. It's very common to have more agents generating memory requests than you'd see in a traditional performance desktop PC. Hence LPDDR5's 16-bit wide channels - even phones really need four channels. If you built the same thing but with a single 64-bit wide channel, real-world bandwidth would drop relative to 4x16, just because you'd be much more likely to stall waiting for pages to close and open, and there'd be less queue capacity.
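A back-of-envelope sketch of the point above: 4x16-bit and 1x64-bit configurations have identical *peak* bandwidth at the same data rate, so channel count buys parallelism, not headline numbers. The data rate here is an illustrative LPDDR5 figure, not a claim about any specific chip:

```python
LPDDR5_RATE_MTPS = 6400          # mega-transfers per second (illustrative)

def peak_gbps(channels: int, width_bits: int,
              rate_mtps: int = LPDDR5_RATE_MTPS) -> float:
    """Peak bandwidth in GB/s = channels * channel width in bytes * rate."""
    return channels * (width_bits / 8) * rate_mtps / 1000

# Same headline number either way...
assert peak_gbps(4, 16) == peak_gbps(1, 64)
# ...but 4x16 gives four independent command/page queues instead of one.
print(peak_gbps(4, 16))  # 51.2
```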
 
There's probably other improvements, too.
High-speed storage controllers can throw off a lot of heat. Apple already has all the thermal management hanging off the SoC, so that's the best place to put that heat.

Bandwidth is not the only thing that matters. Or, to be more precise, you can lose theoretical bandwidth to other limitations, which I'm going to explain...

Let's say the reason why you have high bandwidth demands is that there's a really complicated SoC with a lot of memory agents, each generating its own memory request stream. Under these circumstances, memory controller channel count is a very important scaling resource. More channels equals more open pages, and more commands in flight. The more things you have generating memory requests, the more memory subsystem parallelism you need.

This is pretty much the entire reason why LPDDR4 and LPDDR5, the JEDEC standards designed primarily for phones, have narrower channels than desktop RAM standards. Phone SoCs are all unified memory systems, with a ton of bus agents. It's very common to have more agents generating memory requests than you'd see in a traditional performance desktop PC. Hence LPDDR5's 16-bit wide channels - even phones really need four channels. If you built the same thing but with a single 64-bit wide channel, real-world bandwidth would drop relative to 4x16, just because you'd be much more likely to stall waiting for pages to close and open, and there'd be less queue capacity.
It's worth noting that the camera is the driving factor for why Apple put NVMe storage in the iPhone; the main task on the device is streaming ~10 Gb/s off the camera sensors to compute, and about ⅓ of that to storage. I think if there's any opportunity to make that process lower power, Apple will go to pretty great lengths to take it, as there are a lot of users doing that level of recording for hours on end.
 
It's the same thing Apple uses in M series SoCs, because why would it be any different? As far as I know the only difference is that M series chips have more flash memory channels than A series.

Your perceptions / prejudices about Apple's SSD tech are a little strange. Do you not know that it's literally NVMe plus a couple Apple extensions? The NVMe controller here isn't a separate chip made by Phison or whomever, instead it's an Apple designed controller integrated into the main SoC. That's why the SoC only provides for connecting flash memory to the SoC, and why you don't see a M.2 socket for connecting a standalone drive with its own NVMe controller.

Their reasons for doing this aren't "petty". The big headline feature they gain is tight integration of their NVMe controller with the Secure Enclave inside the same SoC, which results in a much better encryption architecture than is possible with generic NVMe's TCG Opal. (This is the reason why Apple had to extend the NVMe protocol a bit.) There's probably other improvements, too.
The only actual reason is to prevent user upgrades of SSDs, which would cut into the huge profit margins Apple has on extra NAND. The security part does not really require this.
Petty.

(BTW, imagine a situation where the security aspect required using a standard, commonly available drive, and the proprietary way was less secure but allowed those NAND upcharges. Guess which one Apple would pick.)

Bandwidth is not the only thing that matters. Or, to be more precise, you can lose theoretical bandwidth to other limitations, which I'm going to explain...

Let's say the reason why you have high bandwidth demands is that there's a really complicated SoC with a lot of memory agents, each generating its own memory request stream. Under these circumstances, memory controller channel count is a very important scaling resource. More channels equals more open pages, and more commands in flight. The more things you have generating memory requests, the more memory subsystem parallelism you need.

This is pretty much the entire reason why LPDDR4 and LPDDR5, the JEDEC standards designed primarily for phones, have narrower channels than desktop RAM standards. Phone SoCs are all unified memory systems, with a ton of bus agents. It's very common to have more agents generating memory requests than you'd see in a traditional performance desktop PC. Hence LPDDR5's 16-bit wide channels - even phones really need four channels. If you built the same thing but with a single 64-bit wide channel, real-world bandwidth would drop relative to 4x16, just because you'd be much more likely to stall waiting for pages to close and open, and there'd be less queue capacity.
 
The only actual reason is to prevent user upgrades of SSDs, which would cut into the huge profit margins Apple has on extra NAND. The security part does not really require this.
Petty.

(BTW, imagine a situation where the security aspect required using a standard, commonly available drive, and the proprietary way was less secure but allowed those NAND upcharges. Guess which one Apple would pick.)
Just a note: There are third party storage modules available for the Mac mini, and by most accounts the install process is not very difficult.

I do not use one though, for a few reasons: It may void the warranty, and for a desktop it’s much easier to just use a Thunderbolt 4 drive.
 
Why are we expecting higher ST for M5 Max vs M5? The clock speeds are the same according to Apple. The new cache hierarchy?
I'm not sure if we should expect that this time around tbh.

There's already speculation in this thread that the memory controllers are on the GPU dies, so if that is true, I expect the spread of results to be just *slightly* lower than M5 results on a MacBook Pro once everything comes in.
 
You can have an M.2 socket for a NAND board without any NVMe controller, as seen with the Mac mini.
Indeed.

The thing that is most telling to me is this:

Apple's SSD architecture should be perfectly possible to do with standard raw NAND dies: controller is in SoC, encryption is in SoC, at that point there is no harm with communicating with the NAND dies the same way standard SSD controllers do it.
Yet Apple reinvented wheels and uses those proprietary NVMe-but-not-quite interfaces instead of the standard NAND device interface. That complexity adds nothing; its only benefit is that commoners can't bypass Apple's money-making machine and upgrade or fix the storage by simply resoldering standard NAND dies, because the modified Apple dies can't be bought on the open market (perhaps sourced from recycled notebooks, but getting used NAND sucks due to the limited rewrite durability, and Apple does have those "must shred devices, no parts salvaging" deals with recycling centers for this purpose).
It's kind of crazy, because that would be no easy task, and personally I would not do it due to the risk. Yet Apple felt the need to cut that path off.

If the secure storage model (or really, user benefit) was what Apple cared about, you would see common NAND dies soldered on the motherboard (it would still be a downside for serviceability and useful life of the machine).
 
Oh yeah. My point was that Geekerwan and a couple other Chinese microbenchers did the A19s in September and then Geekerwan did the M5 on October 21. Why didn’t he reveal the new cache hierarchy then? Did he miss it? Or did he just not do a latency graph? The video was out fast, he may have rushed.

The fact that they microbenchmarked and uploaded videos on the M5 was the basis of my doubt about it being in the M5. We should have known 4 1/2 months ago about a private L2 cache in the M5. He (and the others) really dropped the ball there.

EDIT: I knew the A19s didn't have it. We have reasonably good die shots of both A19s. That was the other reason I doubted it for the M5, because Apple had laid out the core without (at least) the private L2 for the iPhones.
He didn't reveal it because it doesn't exist!

A 1MB private L2 makes no sense given the rest of Apple's design.
Is the SME unit going to replicate its very wide connections to all 6 of those different L2's?
Is a single pool of 1MB (total 6MB) going to perform better than a shared 12 or 16MB?

Why are you guys so eager to fall for nonsense?
Where did the claim come from? WTF knows? The world is full of clueless idiots who range from making stuff up to simply not understanding what their tech friends tell them. Chinese writing is the new English accent -- Americans think anyone using it has to be smart!
 
Indeed.

The thing that is most telling to me is this:

Apple's SSD architecture should be perfectly possible to do with standard raw NAND dies: controller is in SoC, encryption is in SoC, at that point there is no harm with communicating with the NAND dies the same way standard SSD controllers do it.
Yet Apple reinvented wheels and uses those proprietary NVMe-but-not-quite interfaces instead of the standard NAND device interface. That complexity adds nothing; its only benefit is that commoners can't bypass Apple's money-making machine and upgrade or fix the storage by simply resoldering standard NAND dies, because the modified Apple dies can't be bought on the open market (perhaps sourced from recycled notebooks, but getting used NAND sucks due to the limited rewrite durability, and Apple does have those "must shred devices, no parts salvaging" deals with recycling centers for this purpose).
It's kind of crazy, because that would be no easy task, and personally I would not do it due to the risk. Yet Apple felt the need to cut that path off.

If the secure storage model (or really, user benefit) was what Apple cared about, you would see common NAND dies soldered on the motherboard (it would still be a downside for serviceability and useful life of the machine).


AFAIK, nobody sells raw NAND in m.2 format or otherwise to end users. Second, there are some pretty significant "under the hood" differences in NAND from different vendors and different product generations from the same vendor. All that stuff about number of dies per deck, number of decks, size of erase blocks and so forth MATTER.

NAND chips don't "advertise" that stuff the way DIMMs do with XMP and so forth, because there is no need: NAND chips are always paired with controllers that have been preprogrammed to handle the NAND chips they're using. They must have some way for the controller to interrogate them, but it probably gives some sort of PCI-ID-like identifier that the controller can match against its list of preprogrammed settings, rather than an exhaustive list of everything the controller needs, the way DIMMs do.

Theoretically it could work if Apple produced a list of manufacturer/part number/spec combos to ensure people bought one of the NAND chips the controller knows how to deal with, but putting that onus on the consumer is pretty clunky and un-Apple-like.

So what they'd really need would be for a market for raw NAND to appear, along with a protocol that advertised its full specs so that controllers wouldn't be forced to only work with NAND they'd been previously programmed to work with. That's probably not going to happen unless Intel or AMD decides to integrate an SSD controller on their CPU dies.
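The "preprogrammed controller" idea described above can be sketched as a simple lookup: the controller reads a short ID from the NAND die and matches it against a built-in table of geometry parameters, rejecting anything it wasn't programmed for. All IDs and numbers here are made up purely for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NandGeometry:
    dies_per_package: int
    planes_per_die: int
    erase_block_kb: int
    page_kb: int

# Table baked into the controller firmware; keys are (vendor, device) IDs.
# Both entries are fictional.
KNOWN_NAND = {
    (0x2C, 0xA4): NandGeometry(4, 2, 1024, 16),
    (0x98, 0x3B): NandGeometry(8, 4, 2048, 16),
}

def lookup(vendor_id: int, device_id: int) -> NandGeometry:
    """Match an interrogated NAND ID to preprogrammed settings."""
    geom = KNOWN_NAND.get((vendor_id, device_id))
    if geom is None:
        # An unlisted part simply can't be used - the crux of the
        # upgrade problem described above.
        raise ValueError("NAND not in controller's preprogrammed list")
    return geom
```

This is also why a self-describing protocol (the SPD/XMP analogue the post wishes for) would change things: the table, and the rejection path, would disappear.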
 
Normally the community would take that as proof that a new Ultra chip release was imminent.

Because in the past we didn't have craziness happening with the memory market.

But I think that's the reasonable conclusion, since it isn't as if Apple is losing money on those 512 GB Studios no matter how much the price of RAM goes up. They might be "sold out" of that config (due to customers buying them to use to create little AI clusters) and don't want to make more if the M5 Ultra Studio is around the corner and risk getting stuck with inventory they'd have to heavily discount to sell (because who is going to want a top config M3 Ultra Studio when M5 Ultra Studios arrive?)
 
Because in the past we didn't have craziness happening with the memory market.

But I think that's the reasonable conclusion, since it isn't as if Apple is losing money on those 512 GB Studios no matter how much the price of RAM goes up. They might be "sold out" of that config (due to customers buying them to use to create little AI clusters) and don't want to make more if the M5 Ultra Studio is around the corner and risk getting stuck with inventory they'd have to heavily discount to sell (because who is going to want a top config M3 Ultra Studio when M5 Ultra Studios arrive?)
The removal of the 512 GB option for the old Mac Studio coincides with the introduction of the new M5 Pro/Max MacBook Pros. Those start at 1 TB now (at a higher base price too).
 
He didn't reveal it because it doesn't exist!

A 1MB private L2 makes no sense given the rest of Apple's design.
Is the SME unit going to replicate its very wide connections to all 6 of those different L2's?
Is a single pool of 1MB (total 6MB) going to perform better than a shared 12 or 16MB?

Why are you guys so eager to fall for nonsense?
Where did the claim come from? WTF knows? The world is full of clueless idiots who range from making stuff up to simply not understanding what their tech friends tell them. Chinese writing is the new English accent -- Americans think anyone using it has to be smart!
A die shot of the M5 series would clear it up.
 
He didn't reveal it because it doesn't exist!
Yeah, it does.
A 1MB private L2 makes no sense given the rest of Apple's design.
Sure does. They want better nT scaling.
Is the SME unit going to replicate its very wide connections to all 6 of those different L2's?
The SME unit isn't a unit, but a separate coprocessor core with its own things.
Why are you guys so eager to fall for nonsense?
Where did the claim come from? WTF knows? The world is full of clueless idiots who range from making stuff up to simply not understanding what their tech friends tell them. Chinese writing is the new English accent -- Americans think anyone using it has to be smart!
meds
 
He didn't reveal it because it doesn't exist!

A 1MB private L2 makes no sense given the rest of Apple's design.
Is the SME unit going to replicate its very wide connections to all 6 of those different L2's?
Is a single pool of 1MB (total 6MB) going to perform better than a shared 12 or 16MB?

Why are you guys so eager to fall for nonsense?
Where did the claim come from? WTF knows? The world is full of clueless idiots who range from making stuff up to simply not understanding what their tech friends tell them. Chinese writing is the new English accent -- Americans think anyone using it has to be smart!
Apple's very own article on the M5 Pro/M5 Max says this:

"The industry-leading super core was first introduced as performance cores in M5, which also adopts the super core name for all M5-based products — MacBook Air, the 14-inch MacBook Pro, iPad Pro, and Apple Vision Pro. This core is the highest-performance core design with the world’s fastest single-threaded performance, driven in part by increased front-end bandwidth, a new cache hierarchy, and enhanced branch prediction."

(emphasis mine)

 
Apple's very own article on the M5 Pro/M5 Max says this:

"The industry-leading super core was first introduced as performance cores in M5, which also adopts the super core name for all M5-based products — MacBook Air, the 14-inch MacBook Pro, iPad Pro, and Apple Vision Pro. This core is the highest-performance core design with the world’s fastest single-threaded performance, driven in part by increased front-end bandwidth, a new cache hierarchy, and enhanced branch prediction."

(emphasis mine)

MY EYES! OWWWWW.

Further results are trickling in for the M5 Max.
Looking like the average ST will be a bit over 4300. Seeing some 29500s in MT too.
 