Samsung outs Exynos 9 Series 9810

Page 3 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
Uhhh, what exactly made you come to the conclusion they're in order? They are OOO.
Oh, this quote from your article "At heart the Exynos M1 and M2 microarchitectures are based on a 4-wide in-order stage for decode and dispatch"

I took that as at least partially in order?
 
That's how all OOO processors work, the front-end is always in order whilst execution is out of order.
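A toy model of that split may help. This is a minimal sketch with a made-up four-instruction program and made-up latencies; it is not a model of the M1 or M3 pipeline. Instructions are dispatched in program order up to `width` per cycle (4 here, mirroring the "4-wide in-order stage for decode and dispatch" quote), issue as soon as their inputs are ready, and retire in program order, as a reorder buffer normally enforces. The names `Instr` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int       # cycles until the result can be consumed (made-up values)
    deps: tuple = ()   # names of earlier instructions whose results this one reads


def simulate(program, width=4):
    """Toy pipeline: in-order dispatch, out-of-order issue, in-order retire.

    Returns (issue_order, retire_order, cycles).
    """
    window = []        # dispatched but not yet retired, kept in program order
    ready_at = {}      # name -> cycle at which its result becomes available
    issue_order, retire_order = [], []
    pc = cycle = 0
    while len(retire_order) < len(program):
        # In-order front end: dispatch up to `width` instructions per cycle.
        for _ in range(width):
            if pc < len(program):
                window.append(program[pc])
                pc += 1
        # Out-of-order back end: any waiting instruction whose inputs are ready
        # may issue regardless of program order (unlimited execution units here).
        for ins in window:
            if ins.name in ready_at:
                continue  # already issued
            if all(d in ready_at and ready_at[d] <= cycle for d in ins.deps):
                ready_at[ins.name] = cycle + ins.latency
                issue_order.append(ins.name)
        # In-order retire: only the oldest completed instructions leave the window.
        while window and ready_at.get(window[0].name, cycle + 1) <= cycle:
            retire_order.append(window.pop(0).name)
        cycle += 1
    return issue_order, retire_order, cycle


if __name__ == "__main__":
    program = [
        Instr("load", 4),            # long-latency load
        Instr("add", 1, ("load",)),  # needs the load's result
        Instr("mul", 3),             # independent
        Instr("sub", 1),             # independent
    ]
    issued, retired, cycles = simulate(program)
    print("issue: ", issued)   # mul and sub overtake the stalled add
    print("retire:", retired)  # still program order
    print("cycles:", cycles)
```

The independent `mul` and `sub` execute while `add` waits on the load, yet everything still leaves the machine in program order.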
 
I would have liked a new A73 implementation on second-gen 10nm. Seems more sensible to me outside of bragging rights.
As if people need to have an AI server farm in their pocket.
 
What is all the FP performance needed for?
I mean, is this more of a server CPU tacked onto an A55 mobile CPU?
90% of what you do with your phone is web browsing, and the JavaScript Number type is 64-bit floating point. Apple's cores are wider than this, and nobody seems to bat an eye at them.
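For reference, ECMAScript specifies the Number type as IEEE 754 double precision, and Python's `float` uses the same binary64 format on mainstream platforms, so a few lines of Python show the semantics involved. This is only an illustration of the number format: real JavaScript engines use integer fast paths where they can (V8's small integers, for example), so not every JS arithmetic operation ends up on the FP units.

```python
import sys

# Python's float is IEEE 754 binary64 on mainstream platforms, the same
# format ECMAScript specifies for the JavaScript Number type.
SIGNIFICAND_BITS = sys.float_info.mant_dig   # 53 on IEEE 754 platforms

big = float(2**53)
# Above 2**53 not every integer is representable: 2**53 + 1 rounds back down.
collapses = (big + 1 == big)

# Decimal fractions such as 0.1 have no exact binary representation.
tenths_exact = (0.1 + 0.2 == 0.3)

if __name__ == "__main__":
    print(SIGNIFICAND_BITS, collapses, tenths_exact)
```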
 
Performance is dead clear. Now we have to see the efficiency difference.

https://www.anandtech.com/show/12361/samsung-exynos-m3-architecture
Nice article Andrei

Good find, the link really contains a lot of info about the M3 :^)
https://reviews.llvm.org/D42387
Two small remarks:
- The 2 complex ALUs also handle simple integer ops, so simple integer throughput is now 4-wide.
- The branch units calculate branch addresses; the branch prediction unit sits in front of them and steers the fetch unit.
 
The A75 is as efficient as the A73, i.e. the same power at iso-performance. In addition, the DSU only supports the A75 and A55, not the A73. No need to extend the lifetime of the A73.
 
I wouldn't be surprised if they boosted single-core performance close to 2x in Geekbench. I also wouldn't be surprised to find out that the various ARM SoC vendors are now optimising their designs around getting high Geekbench scores at the expense of actual performance. They'd almost be insane not to when this single metric has somehow become the de facto benchmark of processor performance. Imagine what would inevitably happen if the GPU-buying market assessed the speed of GPUs almost entirely based on their 3DMark scores. GPUs would get very good at 3DMark in very short order.

We desperately need a wider variety of benchmarks for mobile that reliably isolate processor performance.

As much crap as people give Geekbench, it uses open source libraries to test performance across a variety of workloads. I would LOVE to see a CPU/SoC that performs great at Geekbench and has shit performance at other benchmarks. zlib, JPEG compression, etc.: that's all based on standard libraries.
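To make the "standard libraries" point concrete, here is a minimal compression micro-benchmark built on Python's built-in `zlib` module. It is not Geekbench's workload or methodology; the sample text, compression level 6 and best-of-5 timing are arbitrary choices for illustration, and `zlib_throughput_mb_s` is a hypothetical helper name.

```python
import time
import zlib


def zlib_throughput_mb_s(data: bytes, level: int = 6, repeats: int = 5) -> float:
    """Best-of-N zlib compression throughput in MB/s (toy micro-benchmark).

    Not Geekbench's workload: just the same kind of standard-library call.
    """
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    # Sanity check: the compressed stream must round-trip to the input.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("zlib round-trip failed")
    return len(data) / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    # Hypothetical input: repetitive text, so it compresses very well.
    sample = b"The Exynos M3 is a wide out-of-order core. " * 50_000
    print(f"{zlib_throughput_mb_s(sample):.1f} MB/s")
```

Taking the best of several runs is a common way to reduce noise from scheduling and frequency ramp-up, although a phone's DVFS behaviour can still skew short runs like this.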
 
That's nigh-impossible to pull off for them.
Really? Apple apparently did so without buying any major IP or graphics company. Whether you believe they genuinely designed their own GPU from scratch that runs all the software and games designed for Imagination cores (without a massive driver overhaul or issues) is up to you.
If Imagination doesn't bring litigation, then I suppose this must be the case, however suspicious it seems to me.
Nevertheless, they still have a GPU product they call their own.

Intel did something similar, to less success.

Samsung has been rumoured to be working on GPU designs for many years, and after internal projects and millions of dollars it is quite possible that Samsung could have its own design out next year. Whether it would be competitive is guesswork, but it is possible.

They have motive: they design their own SoCs, custom processors and fabric, their own CPUs and modems.
Then they fab them on their own process in their own foundry.
It stands to reason they would pine for their own GPU uarch, mated to either a heavily skinned/adjusted Android built to their specifications or even Tizen. That would give them the holy grail of top-to-bottom vertical integration of their own hardware and software, i.e. Apple.

Samsung has the motive, engineering talent and finances to do it. I for sure wouldn't bet against it happening.
 
@Andrei. As written on RWT:
I might have been missing something, but what makes Andrei and others think M3 has 6 decoders?

The IssueWidth from the machine description is not the number of decoders. For instance, the Skylake machine description also sets IssueWidth to 6, though Skylake doesn't have 6 decoders. See this.

So it's possible M3 has fewer than 6 decoders but has a uop cache that can issue up to 6 uops each cycle.
Did you pick that 6-decode information from IssueWidth? If so then I guess this might be wrong.
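A back-of-the-envelope model of that argument: in LLVM's scheduling models, IssueWidth is a cap on how many micro-ops can be scheduled per cycle, so a width of 6 says nothing by itself about how those uops are produced. The sketch below assumes one uop per decoder per cycle and a uop cache with a fixed hit rate; all numbers are hypothetical and not claims about M3 or Skylake.

```python
def uops_per_cycle(issue_width: int, decoders: int,
                   uop_cache_width: int = 0, uop_cache_hit_rate: float = 0.0) -> float:
    """Average uops a toy front end can hand to the scheduler each cycle.

    issue_width plays the role of LLVM's IssueWidth: a cap on what can be
    scheduled per cycle, not a count of decoders. Assumes one uop per decoder
    per cycle; delivery is limited by the decoders on a uop-cache miss and by
    the uop cache on a hit.
    """
    decode_path = min(issue_width, decoders)
    cache_path = min(issue_width, uop_cache_width)
    return uop_cache_hit_rate * cache_path + (1.0 - uop_cache_hit_rate) * decode_path


if __name__ == "__main__":
    # Hypothetical configurations, all with IssueWidth = 6.
    print(uops_per_cycle(6, decoders=6))                  # 6 decoders, no uop cache
    print(uops_per_cycle(6, decoders=4))                  # 4 decoders, no uop cache
    print(uops_per_cycle(6, decoders=4, uop_cache_width=6,
                         uop_cache_hit_rate=0.5))         # 4 decoders + 6-wide uop cache
```

All three configurations would carry the same IssueWidth, which is why the machine description's explicit "6-wide in-order stage for decode and dispatch" comment is the stronger evidence.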
 
It's literally written there in cleartext.

// The Exynos-M1 is a traditional superscalar microprocessor with a
// 4-wide in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.

// The Exynos-M3 is an advanced superscalar microprocessor with a 6-wide
// in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.
 
Ha, thanks for clarifying. So I guess this means either no uop cache or a uop cache with the same width as the decode stage.

I wonder if these decoders are completely identical or if some of them only support AArch64 for instance.
 
I thought their claim of "around two-fold" single-core performance might translate to ×1.8-1.95 in reality, but in this new marketing image on Twitter they state a solid ×2.

 
You need to understand that SLSI's marketing department is really small and essentially has no reason to exist, as it doesn't *really* benefit the business in any way. When they do make claims or cite marketing figures, these are generally sourced from the technical teams.
 
Andrei, is there a reason you can think of why Apple was able to get such massive multithreaded performance out of the A11 with just 2 ultra-wide cores and 4 small/medium cores, vs Samsung with improved A55s and similarly wide M3 cores, but more of them? (Approx. 25% lower projected Geekbench MT score?)

I would have thought having more cores would have allowed the Exynos 9810 to spread the load at lower clock speeds, thereby getting higher performance per watt.
Is a superior memory controller or fabric from Apple at play?
 
Didn't Apple get two more small cores in the A11?
Besides, those small cores are probably fairly beefy vs the A55, but I can't find specs (we need Hans/Andrei for this), so maybe more like half a big core. In a power-constrained situation like MT perf, aren't they effective vs a very wide design?
 

I'm guessing the performance per watt of the big cores is not as good as that of the Monsoon cores. Given that the SoCs are power-limited in multi-core mode, the big cores have to be clocked way down from their 2.9 GHz max. On the A11, the big cores can run closer to their peak performance.

If this is the case, it makes you wonder why they even bothered with 4 big cores. Perhaps it can stretch its legs more in a laptop form factor.