Samsung outs Exynos 9 Series 9810

Lodix · Jan 23, 2018

It seems their performance targets were not exaggerated after all, contrary to what some people implied.

Andrei. · Jan 23, 2018

french toast said:
https://www.anandtech.com/show/12361/samsung-exynos-m3-architecture

Wow, Samsung is getting serious. I also did not know M1 & M2 are in-order cores... but 6-wide at 2.9 GHz...
Power consumption is going to be fruity.

Uhhh what exactly made you come to the conclusion they're in order? They are OOO..

french toast · Jan 23, 2018

Andrei. said:
Uhhh what exactly made you come to the conclusion they're in order? They are OOO..

Oh, this quote from your article "At heart the Exynos M1 and M2 microarchitectures are based on a 4-wide in-order stage for decode and dispatch"

I took that as at least partially in order?

Andrei. · Jan 23, 2018

french toast said:
Oh, this quote from your article "At heart the Exynos M1 and M2 microarchitectures are based on a 4-wide in-order stage for decode and dispatch"

I took that as at least partially in order?

That's how all OOO processors work, the front-end is always in order whilst execution is out of order.
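To make that concrete, here is a minimal toy sketch, not a model of any real core. It assumes unlimited execution units, unique destination registers (so no renaming is needed), and made-up latencies; `Instr` and `simulate` are hypothetical names. Instructions enter the window strictly in program order, but they complete whenever their inputs are ready.

```python
import math
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str      # register written (unique per instruction in this toy)
    srcs: tuple    # registers read
    latency: int   # cycles from issue to result, must be >= 1


def simulate(program, dispatch_width=4):
    """Toy model: in-order dispatch, out-of-order issue, unlimited execution units."""
    ready_at = {}        # register -> cycle its value becomes available
    window = []          # dispatched but not yet issued
    dispatch_order = []
    finish = {}          # instruction name -> completion cycle
    pc = 0
    cycle = 0
    while pc < len(program) or window:
        # Front end: take the next instructions strictly in program order.
        for instr in program[pc:pc + dispatch_width]:
            ready_at[instr.dest] = math.inf   # pending until the producer issues
            window.append(instr)
            dispatch_order.append(instr.name)
        pc = min(pc + dispatch_width, len(program))

        # Back end: issue anything whose sources are ready, regardless of age.
        for instr in list(window):
            if all(ready_at.get(src, 0) <= cycle for src in instr.srcs):
                done = cycle + instr.latency
                ready_at[instr.dest] = done
                finish[instr.name] = done
                window.remove(instr)
        cycle += 1

    completion_order = sorted(finish, key=finish.get)
    return dispatch_order, completion_order


program = [
    Instr("load", "r1", (), 4),        # slow memory access (latency is made up)
    Instr("add", "r2", ("r1",), 1),    # has to wait for the load
    Instr("mul", "r3", ("r0",), 1),    # independent, r0 is ready from the start
    Instr("sub", "r4", ("r3",), 1),    # depends only on mul
]
dispatch_order, completion_order = simulate(program)
print(dispatch_order)     # ['load', 'add', 'mul', 'sub']
print(completion_order)   # ['mul', 'sub', 'load', 'add']
```

The dispatch order matches the program, while `mul` and `sub` finish ahead of the stalled `load`/`add` pair. That is all "in-order front end, out-of-order execution" means.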

french toast · Jan 23, 2018

Andrei. said:
That's how all OOO processors work, the front-end is always in order whilst execution is out of order.

Oh OK, thanks for the correction. What's your take on M3 vs Cortex-A75?

Andrei. · Jan 23, 2018

Performance is dead clear. Now we have to see the efficiency difference.

french toast · Jan 23, 2018

Andrei. said:
Performance is dead clear. Now we have to see the efficiency difference.

Indeed.
Samsung really needs a new GPU uarch. Mali is not a good choice judging by your efficiency investigation... Adreno kills it.

krumme · Jan 23, 2018

Andrei. said:
Performance is dead clear. Now we have to see the efficiency difference.

What is all the FP performance needed for?
I mean, is this more of a server CPU tacked onto an A55 mobile CPU?

krumme · Jan 23, 2018

I would have liked a new A73 implementation on second-gen 10nm. Seems more sensible to me outside of bragging rights.
As if people need to have an AI server farm in their pocket.

Andrei. · Jan 23, 2018

krumme said:
What is all the FP performance needed for?
I mean, is this more of a server CPU tacked onto an A55 mobile CPU?

90% of what you do with your phone: web browsing. JavaScript is completely 64-bit floating point. Apple's cores are wider than this, nobody seems to bat an eye at them..
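For anyone who wants to poke at this: the ECMAScript Number type is an IEEE 754 binary64 double. The sketch below uses Python rather than JavaScript, on the assumption (true on mainstream platforms) that Python's `float` is the same binary64 format; `is_exact_integer` is a hypothetical helper name. It shows the 53-bit mantissa limit that JavaScript numbers hit as well.

```python
import struct
import sys


def is_exact_integer(n: int) -> bool:
    """True if the integer n survives a round trip through a 64-bit double."""
    return int(float(n)) == n


# A double occupies 8 bytes and carries a 53-bit significand.
print(len(struct.pack("<d", 1.5)))
print(sys.float_info.mant_dig)

# Integers are exact up to 2**53; beyond that, neighbouring values collapse.
print(is_exact_integer(2**53))        # True
print(is_exact_integer(2**53 + 1))    # False, rounds back to 2**53
```

JavaScript engines do use integer fast paths internally where they can, so this illustrates the number format, not how every operation is actually executed.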

krumme · Jan 23, 2018

Andrei. said:
90% of what you do with your phone: web browsing. JavaScript is completely 64-bit floating point. Apple's cores are wider than this, nobody seems to bat an eye at them..

Damn. I didn't know that about JavaScript. Okay... lol

Hans de Vries · Jan 23, 2018

Andrei. said:
Performance is dead clear. Now we have to see the efficiency difference.

https://www.anandtech.com/show/12361/samsung-exynos-m3-architecture
Nice article Andrei

Good find, the link really contains a lot of info about the M3 :^)
https://reviews.llvm.org/D42387
Two small remarks:
- The 2 complex ALUs also handle simple integer stuff, so the simple integer handling is now 4 wide.
- The Branch units calculate branch addresses, the branch prediction unit is in front, it steers the fetch unit.

Thala · Jan 24, 2018

krumme said:
I would have liked a new A73 implementation on second-gen 10nm. Seems more sensible to me outside of bragging rights.
As if people need to have an AI server farm in their pocket.

The A75 is as efficient as the A73, i.e. same power at iso-performance. In addition, the DSU only supports A75 and A55 but not A73. No need to extend the lifetime of A73.

eek2121 · Jan 25, 2018

HurleyBird said:
I wouldn't be surprised if they boosted single core performance close to 2x in geekbench. I also wouldn't be surprised to find out that the various ARM SOC vendors are now optimising their designs around getting high geek bench scores at the expense of actual performance. They'd almost be insane not to when this single metric has somehow become the defacto benchmark of processor performance. Imagine what would inevitably happen if the GPU-buying market assessed the speed of GPUs almost entirely based on their 3DMark scores. GPUs would get very good at 3DMark in very short order.

We desperately need a wider variety of benchmarks for mobile that reliably isolate processor performance.

As much crap as people give Geekbench, Geekbench uses open source libraries in order to test performance across a variety of workloads. I would LOVE to see a CPU/SoC that performs great at Geekbench and has shit performance at other benchmarks. Zlib, Jpeg compression, etc...that's all stuff based on standard libraries.

Yotsugi · Jan 25, 2018

french toast said:
Samsung really needs a new GPU uarch

That's nigh-impossible to pull off for them.

french toast · Jan 25, 2018

Bondrewd said:
That's nigh-impossible to pull off for them.

Really? Apple apparently did so without buying any major IP or graphics company. Whether you believe they genuinely designed their own GPU from scratch, one that appears to run all software and games designed for Imagination cores (without a massive driver overhaul or issues), is up to you.
If Imagination doesn't bring litigation then I suppose this must be the case, no matter how suspicious it seems to me.
Nevertheless, they still have a GPU product they call their own.

Intel did something similar, to less success.

Samsung has been rumoured to be working on GPU designs for many years, it is quite possible that after internal projects and millions of dollars that Samsung could have their own design out next year..whether it's competitive or not should it arrive is guess work, but it is possible.

They have motive... they design their own SoCs, custom processors and fabric, their own CPUs, modems.
Then they fab them on their own process from their own foundry.
It stands to reason they would pine for their own GPU uarch... mated to either Android heavily skinned/adjusted to their specifications or even Tizen? That would give them the holy grail of top-to-bottom vertical integration of their own hardware and software, i.e. Apple.

Samsung has the motive, engineering talent and the finances to do it... I for sure wouldn't bet against it happening.

Qwertilot · Jan 25, 2018

Not quite guesswork - if it arrives then it'll be competitive 🙂 They seem to be far too pragmatic to forcibly use a substandard in house design.

Nothingness · Jan 25, 2018

@Andrei. As written on RWT:

I might have been missing something but what makes Andrei and others think M3 has 6 decoders?

The IssueWidth from the machine description is not the number of decoders. For instance the Skylake machine description also sets IssueWidth to 6 though Skylake doesn't have 6 decoders. See this.

So it's possible M3 has less than 6 decoders but has a uop cache that can issue up to 6 uops each cycle.

Did you pick that 6-decode information from IssueWidth? If so then I guess this might be wrong.

Andrei. · Jan 25, 2018

Nothingness said:
@Andrei. As written on RWT:

Did you pick that 6-decode information from IssueWidth? If so then I guess this might be wrong.

It's literally written there in cleartext.

// The Exynos-M1 is a traditional superscalar microprocessor with a
// 4-wide in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.

// The Exynos-M3 is an advanced superscalar microprocessor with a 6-wide
// in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.

Nothingness · Jan 25, 2018

Andrei. said:
It's literally written there in cleartext.

// The Exynos-M1 is a traditional superscalar microprocessor with a
// 4-wide in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.

// The Exynos-M3 is an advanced superscalar microprocessor with a 6-wide
// in-order stage for decode and dispatch and a wider issue stage.
// The execution units and loads and stores are out-of-order.

Ha thanks for clarifying. So I guess this means either no uop cache or a uop cache with the same width as the decode stage.

I wonder if these decoders are completely identical or if some of them only support AArch64 for instance.

Lodix · Jan 25, 2018

I thought maybe their claim about "around two-fold" single core performance would translate to ×1.8-1.95 in reality, but in this new marketing image on Twitter they state a solid ×2 performance.

Andrei. · Jan 25, 2018

You need to understand that SLSI's marketing department is really small and essentially has no reason to exist, as it doesn't *really* benefit the business in any way. When they do talk about any claims or marketing figures, they are generally sourced from the technical teams.

french toast · Jan 26, 2018

Andrei. said:
You need to understand that SLSI's marketing department is really small and essentially has no reason to exist, as it doesn't *really* benefit the business in any way. When they do talk about any claims or marketing figures, they are generally sourced from the technical teams.

Andrei, Is there a reason you can think of why apple was able to get to such a massive multithreaded performance with A11..with just 2 ultra wide cores and 4 small/medium cores...Vs Samsung with improved A55s and similar width M3 cores..but more of them? (Approx 25% lower MT score projected geekbench?)

I would have thought having more cores would have allowed Exynos 9810 to spread the load at lower clock speeds... thereby getting higher performance per watt..
Is this a superior memory controller or fabric from apple at play..

krumme · Jan 26, 2018

french toast said:
Andrei, Is there a reason you can think of why apple was able to get to such a massive multithreaded performance with A11..with just 2 ultra wide cores and 4 small/medium cores...Vs Samsung with improved A55s and similar width M3 cores..but more of them? (Approx 25% lower MT score projected geekbench?)

I would have thought having more cores would have allowed Exynos 9810 to spread the load at lower clock speeds... thereby getting higher performance per watt..
Is this a superior memory controller or fabric from apple at play..

Didn't Apple get two more small cores in A11?
Besides, those small cores are probably fairly beefy vs the A55, but I can't find specs (we need Hans/Andrei for this). So they are more like half a big core, and in a power-constrained situation like MT perf they are effective vs a very wide design?

jt7 · Jan 26, 2018

french toast said:
Andrei, Is there a reason you can think of why apple was able to get to such a massive multithreaded performance with A11..with just 2 ultra wide cores and 4 small/medium cores...Vs Samsung with improved A55s and similar width M3 cores..but more of them? (Approx 25% lower MT score projected geekbench?)

I would have thought having more cores would have allowed Exynos 9810 to spread the load at lower clock speeds... thereby getting higher performance per watt..
Is this a superior memory controller or fabric from apple at play..

I'm guessing the performance per watt on the big cores is not as good as the Monsoon cores. Given that the SoCs are power limited in multi-core mode, the big cores have to be clocked way down compared to the 2.9 GHz max. On the A11 the big cores can run closer to their peak performance.

If this is the case, it makes you wonder why they even bothered with 4 big cores. Perhaps it can stretch its legs more in a laptop form factor.
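A back-of-the-envelope sketch of that argument, under loud assumptions: dynamic power is taken to scale with the cube of frequency (the textbook C·V²·f rule with voltage tracking frequency), and the per-core peak power and SoC budget below are invented numbers, not measurements of any chip. `sustained_freq_ghz` is a hypothetical helper.

```python
def sustained_freq_ghz(n_cores, f_max_ghz, p_max_w, budget_w):
    """Per-core frequency under a shared power budget, assuming P ~ f**3."""
    per_core_w = budget_w / n_cores
    # A core never runs above its peak clock even if budget is left over.
    scale = min(1.0, (per_core_w / p_max_w) ** (1 / 3))
    return f_max_ghz * scale


# Hypothetical: 2.9 GHz peak, 3 W per core at peak, 4 W sustained SoC budget.
for n in (1, 2, 4):
    print(n, "cores:", round(sustained_freq_ghz(n, 2.9, 3.0, 4.0), 2), "GHz")
```

With these made-up figures a single core holds its peak clock, while four active cores each drop well below it. That is the "clocked way down" effect in multi-core loads.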
