
Discussion: Apple Silicon SoC thread

Page 489

Eug

Lifer
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).
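As a sanity check on the GPU figures above: 8 cores × 16 execution units gives the 128 EUs listed, and with the commonly reported 8 FP32 ALUs per EU and a ~1.278 GHz clock (both of which are assumptions, not stated in the post), FMA throughput lands right at Apple's 2.6 teraflops:

```python
# Back-of-envelope FP32 throughput for the 8-core M1 GPU.
# ALUs-per-EU and the ~1.278 GHz clock are commonly reported
# community figures, not official Apple specs.
cores = 8
eus_per_core = 16       # 8 * 16 = 128 EUs, matching the list above
alus_per_eu = 8         # FP32 lanes per execution unit
clock_ghz = 1.278
flops_per_cycle = 2     # one fused multiply-add counts as 2 FLOPs

tflops = cores * eus_per_core * alus_per_eu * flops_per_cycle * clock_ghz / 1000
print(f"{tflops:.2f} TFLOPS")  # prints 2.62 TFLOPS
```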

EDIT:


M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes
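On the "up to 100 GB/s" figure above: assuming the commonly reported 128-bit LPDDR5-6400 interface (bus width and transfer rate are not stated in the post), the raw bandwidth works out as follows:

```python
# Where the M2's "100 GB/s" memory bandwidth figure comes from.
# The 128-bit bus width and LPDDR5-6400 transfer rate are commonly
# reported figures, assumed here rather than taken from the post.
bus_width_bits = 128
transfers_per_sec = 6400e6   # LPDDR5-6400: 6400 MT/s

bytes_per_transfer = bus_width_bits / 8
gb_per_sec = bytes_per_transfer * transfers_per_sec / 1e9
print(f"{gb_per_sec:.1f} GB/s")  # prints 102.4 GB/s, marketed as "100 GB/s"
```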

M3 Family discussion here:


M4 Family discussion here:


M5 Family discussion here:

 
Apple co-designs whole families of chips from common building blocks. Yes, some of the chips tape out first, but that doesn't mean all work is equally staggered.

But more importantly, planning for all of them starts many years before. Assume the idea you're arguing for is true, and put yourself in an Apple SoC architect's shoes several years ago, before any of the engineers under you have started work on the A19/M5 generation P/M/E cores. You know you're going to have the team design an M core cluster which can replace the E core cluster across the whole product line with no downsides. Why wouldn't you immediately cancel development of the A19/M5 E core, and make sure the M core schedule was in line with the projected A19 tapeout date?

At the end of the day I obviously don't have insider info, but to me it's not at all obvious that Apple will just want to drop the E core. It still has a place.


Apple's primary resource for on-device inference isn't CPU cores of any type, it's the neural engine. The ANE is profoundly more efficient (both power and area) for this specialized task than general purpose latency-optimized CPU cores can ever be.
We know of at least two cases where staggered execution seems to have actually happened.

1. I believe it's the M3 where the E-core used by the Max is slightly different from that used by the base A/Pro/M (it has 3 rather than 2 NEON/FP units).

2. I believe that the base M1 (and matching A) do not provide Guarded Execution (and thus SPTM)

The first seems to be an example where an updated design was ready just in time to catch one manufacturing sendoff (the M3 Max, but not the Pro), and Apple was willing to do that.

The second seems to be a case where a complex feature (Guarded Execution) was probably shipped in the M1 on the expectation that it probably wouldn't work 100%, but the M1 implementation could be used for testing and debugging, fast enough to get a working version available in the Pro and Max a few months later.

I think what we have seen of Apple suggests that they are somewhat more flexible (in the service of rapid advancement) than the internet seems to imagine. They seem to have enough confidence in their simulators and design abstractions to be willing to make what are superficially last minute changes to the design if the timing lines up that way.
The converse of this is that the iPhone deadline (for now anyway) *cannot* ever be missed, meaning that even if M cores are on the schedule, if we're not confident they will be ready to make September, then iPhone gets an E-core (the tried and true known quantity) whereas the flexibly-dated Pro and Max have the freedom to try to get the M-core.
 
No, axiomatically it's not in the same category. We know this because the Neo uses failed yield chips. Not only is the Neo processor not a consideration in the design of the A series cores, the processor isn't even good enough to use in an iPhone. The Neo processor to Apple is literal garbage. That's how they're hitting the price point. Their BOM lists the processor at $0.

People are misunderstanding how Apple is doing their binning. They are not binning on "failed" cores. The reports that Apple had a stockpile of ~6 million A18Ps they could use for the Neo makes it clear for anyone who missed it - there is NO possible universe in which they could have that many chips with defects in two GPU cores.

What is happening is that most of these failing chips are failing on POWER. For example, the CPU cores can't reach target frequency without bumping voltage higher than spec. You can use those "failed" chips in one of three ways: 1) accept a higher power draw, 2) reduce the frequency to reduce overall demand for power, or 3) reduce the overall demand for power by having the chip do less. You accomplish #3 by disabling GPU cores - now that power budget can be used elsewhere if needed, e.g. if your SLC is drawing more power than spec, or if you need to bump the P core voltage by 0.1 V to get the cores to run at the desired frequency.
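The pass/fail-then-salvage structure described above can be sketched as a simple triage function. This is a purely illustrative model: the power budget, per-core power savings, and core counts are my assumptions, not Apple's figures.

```python
# Hypothetical sketch of the parametric (power) binning described above.
# All numbers are illustrative assumptions, not Apple's actual spec.
POWER_BUDGET_W = 8.0   # assumed package power limit at spec frequency
FULL_GPU_CORES = 6     # assumed full GPU configuration of the die
PER_GPU_CORE_W = 0.9   # assumed power reclaimed per disabled GPU core

def bin_die(power_at_spec_w: float) -> dict:
    """Pass/fail on power; salvage over-budget dies by disabling GPU cores."""
    if power_at_spec_w <= POWER_BUDGET_W:
        return {"bin": "iPhone", "gpu_cores": FULL_GPU_CORES}
    # Option 3 from the post: shed GPU cores until the reclaimed power
    # covers the overdraw, freeing budget for e.g. higher P-core voltage.
    excess_w = power_at_spec_w - POWER_BUDGET_W
    disabled = int(min(FULL_GPU_CORES, -(-excess_w // PER_GPU_CORE_W)))  # ceil
    return {"bin": "salvage", "gpu_cores": FULL_GPU_CORES - disabled}

print(bin_die(7.5))  # within budget -> full-spec iPhone part
print(bin_die(9.4))  # over budget   -> two GPU cores disabled
```

Real binning would measure per-rail current at several voltage/frequency points rather than a single number, but the point is the structure: a pass/fail gate on power, with a fixed salvage configuration for the dies that miss it.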

This is likely to be part of an overall plan. Since Apple's binning for the iPhone has always been pass/fail, they've had to set the bar in a fairly conservative manner, because they don't have a use for millions of chips unable to run at the designated frequency without being fed more power than they'd like. As processes get finer, parametric yield has become more of an issue, so they either have to set the power bar higher, set the frequency bar lower, or accept a higher number of failed chips and find something to do with them.
 