I just don't understand why ARM doesn't at least come out with a design that can match the Monsoon cores of an A11, or even the power of what will likely be the next A12 cores. It seems like ARM is eternally 2-3 steps behind Apple on this and they need to catch up.
Probably their power/efficiency constraints. They manage to get the same performance as an M3 core with a 4-wide instead of a 6-wide decoder and half the power usage. The A11 cores are absolute monsters at power draw at max performance, but Apple is able to tweak the hell out of the rest of the device and OS to get the battery life in check. Android OEMs don't have that much control.
And I could understand the power issues for phones, but not all ARM chips are destined for phones. Some can go into cars or gaming consoles that are always plugged in and well ventilated.
I just think they should come out with another tier (a Cortex A9X series, say) that can go toe-to-toe with Apple's best even if it is too power hungry for phones. Just come up with a design and see where we're at.
Using a much larger core to get modest extra performance wouldn't make sense even in less power-constrained cases. Not every market is happy with just 2 huge cores, so power and area efficiency remain important. For laptops, binning for frequency and adding turbo modes would make far more sense.
>Using a much larger core to get modest extra performance wouldn't make sense even in less power constrained cases.
It makes perfect sense if you don't care that your core is large, because you aren't just selling an SoC. For Qualcomm, increased die size means reduced profit. For Apple, it does not.
For instance, Apple's Cyclone core from 2013:
>With six decoders and nine ports to execution units, Cyclone is big. As I mentioned before, it's bigger than anything else that goes in a phone. Apple didn't build a Krait/Silvermont competitor, it built something much closer to Intel's big cores. At the launch of the iPhone 5s, Apple referred to the A7 as being "desktop class" - it turns out that wasn't an exaggeration.
Apple has so many built-in advantages: huge R&D, excellent engineering, a closed system... Android manufacturers are disadvantaged relative to Apple in so many ways.
ARM has to build a "one size fits all" kind of solution. Unlike Apple they are not catering for a single customer with full control over every aspect of HW and SW development, and the profits associated with that.
Plus, achieving the power that the Apple cores bring doesn't come cheap. Samsung's Exynos is still lagging behind and it's not like Samsung doesn't have expertise or deep pockets.
Yeah, but when you have a big.LITTLE architecture, OEMs could choose the most efficient combination to meet their needs. There needs to be a powerful single-core option available for the ARM platform. Until ARM goes there, the rest of the ARM community will be behind Apple. Remember, not all workloads can take advantage of multiple cores. At best ARM will be approaching 2016-level Apple A-series core performance.
They're at the mercy of chipmakers. The only companies that would buy such a core have already left reference designs behind. Everyone else wants small, cheap chips, so much so that we've had A53-only designs in the entire middle-and-lower range. Will anyone even use the A76? I don't know if that's guaranteed.
Read the last part of the article: it's almost guaranteed the next Kirin is skipping the A75 and going directly to the A76. I think Huawei is done playing catch-up.
It's so frustrating how even you people who are into SoCs already forget that Apple was basically cheating the customers with secret huge compromises, just to be able to put unbalanced and overpowered cores in the iPhones.
Wow, I've seen some seriously bad anti-Apple comments over the last 30 years, but this is probably the best one yet. The A10 and A11 are not unbalanced and not 'cheating' customers. Anyone with half a brain can see the history of this advantage: Apple started with the A7, which was the first 64-bit ARM-based SoC in phones. Ever since then, they've been consistently 2 generations ahead of the competition, and that gap shows no sign of closing.
The comments below claiming that 'at 3GHz' the (still unreleased) A76 would 'only need a 20% boost' to match last year's A11 are pretty funny. So a chip already at its thermal and power limit "only" needs to be overclocked by 20% to match a chip designed two years ago running 40% slower.
Actual device performance easily disproves your claim. Your comment isn't helpful and you would have more credibility if you at least attempted to justify the claim you made.
In reality Apple is the one behind, but I won't bother to explain; just remember the core count for Apple's solutions.
Anyway, A76 at 3GHz and 750mW per core would require less than a 20% boost in clocks to match Apple's A11 in Geekbench. Apple has only 2 large cores, and when including the 4 small ones, the SoC throttles hard under 100% load. If that is what you want, the A76 should be able to deliver something close enough when configured as a dual core with stupidly high clocks. A SoC vendor could push the A76 to 2-3W per core instead of 0.75W and get clocks as high as possible. But maybe it's better to have more than enough performance, 4 cores, and sane efficiency.
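As a back-of-the-envelope check, the arithmetic behind this claim and the "~60% of A11" counter-claim later in the thread is the same one-liner. A minimal sketch, assuming GB4 scores scale roughly linearly with clock for a fixed core (a simplification; the score ratios below are the thread's two positions, not measurements):

```python
# Sketch: clock boost needed to match a target Geekbench score, assuming
# score scales ~linearly with frequency (memory-bound subtests scale worse).

def boost_needed(score_ratio):
    """score_ratio: projected A76@3GHz score divided by the A11 score."""
    return 1.0 / score_ratio - 1.0

# A '<20% boost' claim implies the A76 lands at ~84%+ of the A11;
# the later '~60% of A11' figure implies a much larger gap.
for ratio in (0.84, 0.60):
    print(f"at {ratio:.0%} of A11: needs a {boost_needed(ratio):+.0%} clock boost")
```

Whether 20% is plausible therefore hinges entirely on which projected score you believe, not on the clock math itself.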
Because, crazy as it sounds, most of ARM's customers don't want fast off-the-shelf designs, at least at first. ARM's whole business model is rather simple: they sell simple, affordable, efficient, and feature-rich reference designs as a "gateway drug". Once you get hooked on their ecosystem, they charge a lot for nontrivial customization.
This is like asking "why doesn't Apple come out with a 5GHz A12?" It's not so much that they can't as that this does not make sense for their business model.
You can SAY that you want (and would be willing to pay for) Apple levels of performance, but is that really true? God knows Android people complain all the time about how expensive Apple products are, and they MOSTLY buy midrange phones, not high-end phones. Essential just went bust assuming that more people want to pay for high-end Android phones than actually exist.
We know from TSMC that moving the A12 to their 7nm process will give Apple some significant improvements before we even consider this year's improvements to their design.
>Compared to its 10nm FinFET process, TSMC's 7nm FinFET features 1.6X logic density, ~20% speed improvement, and ~40% power reduction.
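Taken at face value, those foundry numbers bound what a straight port buys you. A minimal sketch, assuming the quoted factors apply directly and using a made-up 10nm operating point (real designs rebalance between the two endpoints):

```python
# Sketch: applying TSMC's quoted 10nm -> 7nm factors (~20% speed at
# iso-power, ~40% power at iso-performance) to a hypothetical core.

base_clock_ghz, base_power_w = 2.4, 1.0          # invented 10nm baseline

iso_perf_power = base_power_w * (1 - 0.40)       # same clock, less power
iso_power_clock = base_clock_ghz * (1 + 0.20)    # same power, more speed

print(f"iso-performance: {base_clock_ghz} GHz at {iso_perf_power:.2f} W")
print(f"iso-power:       {iso_power_clock:.2f} GHz at {base_power_w:.2f} W")
```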
Nonsense, GB was designed to interpolate scores from worthless tasks that don't map to real-world scenarios. Even the iPhone with the A11 is no match in speed and performance versus SD845- and Exynos-powered Android phones today. You only use GB to compare two similar platforms, period. There is no way in hell the A11 is faster than Skylake or Ryzen. Only ret@rded people will believe that.
>GB was designed to interpolate scores from worthless tasks
Geekbench borrows code from popular open source projects and benchmarks that code running against the same workload on multiple platforms.
For instance, Google's open sourced code to render HTML and DOM in Chrome, Google's code to render PDFs in Chrome, the open sourced LLVM compiler Google now builds the Windows version of Chrome with.
Oh yeah, rendering HTML: see any speed tests, an SD845 phone loads faster than an iPhone 8/X. The SD845 exports video faster than the A11. See my point: just because the A9 loads one piece of the whole feature faster doesn't mean it has a faster single core than even the SD820, which used to have a higher score than the A9 on the previous GB version. See the point? They rigged the scoring system so it would appear the Apple SoC is much faster than even Intel or AMD processors LOL.
Comparing Apple cores to Intel for browsers gives the same results as GeekBench. My personal comparisons of Wolfram Player on iPad Pro vs MacBook Pro again confirm GB4 results. Back when we had SPEC2006 numbers for Apple cores, YET AGAIN they confirmed the GB4 results...
If you don't believe the browser results, nothing is stopping you from running something like jetstream on your own devices (and borrowing someone's iPad Pro or iPhone X, since I assume you wouldn't be caught dead owning one). https://www.browserbench.org/JetStream/
Good, now load a similar app and website at the same time on an iPhone X and an AMD or Intel desktop and come back here. Even the SD835 slaughtered the A11 in side-by-side comparisons. Again, the point is, they multiplied the scores where the Apple SoC is faster by nanoseconds and call it twice as fast as the competition LOL. As if a nanosecond is noticeable in a real-life scenario =D
You are being unreasonable. Run JetStream on any of your SD8XX devices, and let parent run it on A11, and let's see. Nobody's going to trust your word here against a reasonable suggestion.
This assumption (that a 4-wide front end can't compete) makes two mistakes. The first is to assume that ONE metric (in this case the 4-wide front end) is the PRIMARY determinant of performance. Even Apple's (A11) IPC (over a wide range of code) is maybe about 2.7. This means on average fewer than 3 of those 6 execution units are being used per cycle. IF the other parts of the core and uncore could be PERFECTED on a 4-wide design so that EVERY cycle 4 instructions executed, it would clearly surpass the A11 in IPC. The problem, of course, is just how hard it is to prevent cycles where NOTHING executes and cycles where only a few (one or two) instructions execute. Reducing these is where most of the magic is --- and you won't see details of that in an article like this; rather it's in the painstaking rooting out of hundreds of small inefficiencies that the article talked about.
To give just one example -- no-one is talking about the clustered page tables. This is a very cool idea which relies on the fact that most of the time the OS page allocator allocates a number of pages contiguously in virtual AND physical space, and with the same permissions. If that is so, the same page entry can correspond to multiple contiguous pages (in the academic literature, usually up to 8). This gives you a substantial increase in TLB reach at a very minor increase in TLB bits. (I can find no info as to whether Intel does this. I SUSPECT Apple used to do this in their earlier [and probably even A11] cores. There are very recent even better ideas in the academic literature that might, perhaps, have made it to the A12 core.)
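To make the clustering idea concrete, here is a toy model (my own sketch of the generic academic technique, not Apple's or anyone's actual hardware): one TLB entry covers a naturally aligned cluster of 8 pages, with a bitmap marking which slots are populated.

```python
# Toy clustered TLB: one entry maps up to 8 virtually- and physically-
# contiguous 4 KiB pages sharing permissions. Tag = VPN // 8; the entry
# stores the cluster's base physical page plus a validity bitmap.

PAGE, CLUSTER = 4096, 8

class ClusteredTLB:
    def __init__(self):
        self.entries = {}   # cluster_tag -> (base_ppn, valid_bitmap, perms)

    def insert(self, vpn, ppn, perms, valid_bitmap=0xFF):
        # assumes the physical pages mirror the virtual alignment
        self.entries[vpn // CLUSTER] = (ppn - vpn % CLUSTER, valid_bitmap, perms)

    def lookup(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE)
        tag, slot = divmod(vpn, CLUSTER)
        entry = self.entries.get(tag)
        if entry is None:
            return None                      # miss -> walk the page tables
        base_ppn, valid, _perms = entry
        if not (valid >> slot) & 1:
            return None                      # slot not populated -> miss
        return (base_ppn + slot) * PAGE + offset

tlb = ClusteredTLB()
tlb.insert(vpn=0x400, ppn=0x900, perms="rw")
print(hex(tlb.lookup(0x400 * PAGE + 0x123)))     # 0x900123, same entry
# Reach with 48 entries: 48 * 8 * 4 KiB = 1.5 MiB, vs 192 KiB unclustered,
# for roughly one extra byte (the bitmap) per entry.
```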
Second mistake you make is to ignore frequency. A9 ran at 1.85 GHz, A10 at 2.35 GHz. The A76 will likely run at 3GHz.
And yet, here we are with single-core results at around 60% that of the A11. Taking their own numbers at face value, a 56% increase over the A73 in GB4 results in a score of about 2800.
Yes, I used a very simplistic one dimensional comparison, and there's a whole lot more to it. However, core complexity does go up almost exponentially with width, and so it does point to what ballpark they were aiming at. A76 was never going to beat the A11 per core because it was never aimed at it.
I agree with your point (ARM will release a server chip, essentially based on this core). Remember that GB4 is scaled so that 4000 represents an i7-6600U (Skylake, 3.4GHz, 4MiB L3, 15W). So the A76 is essentially at that level (slightly worse FP, but many server tasks will not care). To the extent that Skylake at 3.4GHz is an acceptable server-class core, ARM could dump some large number of A76 cores on a die and be in the same sort of space as the dearly departed Centriq and ThunderX2.
They likely would have to beef up their NoC one way or another, and tweak the caching and memory systems, the usual server additions. But I assume they didn't put all that effort into "lowest possible latency for hypervisor activity" on the theory that hypervisor performance on smartphones is THE next big thing...
I am sure people will disagree with me because people love to argue. Andrei, I like your writing and overall thoroughness, but I have a few critiques. The charts you make are extremely unpleasant to look at and do not lend themselves to a quick assessment of the data.
First of all, the color-coded stripes in the legend for the A76 projections are not even decipherable, and the actual bars on the chart are difficult to see. Secondly, why are you color-coding them at all? Just put processor names to the left of the bars and the benchmark name above the bars.
Additionally, are the bars in any particular order? If they are, I certainly can't tell. They should be ordered by performance OR efficiency.
Other constructive criticism would be that adding some additional subheadings within your articles would make it feel like a more solid piece.
Loving to argue or not, the performance vs efficiency graphs are rather unique and very clever; I have not seen any other design that so clearly shows how different SoCs compare at both. Yes, it takes a few minutes before you can read them. I am sorry that the world is so complicated, but they work very well if you just use that gray matter a bit.
Apple has the manpower and funds to spend extensively on a huge chip, for they are free to do things for their own glory. QCOMM designs chips for others to use and must design for price points. They must do so efficiently and maximize yields. ARM provides base designs that others can use outright or customize; you can't really blame ARM here. NVIDIA has no modem.
I get that, but you're missing the point. Sure, the budget phones have a strict budget. An $800 Android flagship should not be tight on the SoC budget. If QCOMM sold an ultra-high-end Snapdragon that could compete with the A12, you had better believe the Galaxy S10 and Pixel 3 phones would pay to use it.
Apple has the convenience of designing to a very specific application. Qualcomm ultimately has to create something that can go into many platforms that are defined no more completely than 'high-end mobile'. That's like asking why the 3.5L V6 that's in most of Toyota's vehicles only makes 268 hp, but in the Lotus Evora 400, which uses the same engine, makes 400 hp. It's because it has been tweaked for a very specific application.
Then they should sell supercharger kits for Qualcomm chips ;-)
That is how the Evora V6 gets 400 hp compared to 280-300 hp on the latest non-turbo versions of that V6. The twin turbo version of that engine on the Lexus LS makes over 450 hp too.
That's essentially what I'm getting at. Qualcomm makes the generic version of the engine that can go into a sedan, an SUV, a coupe, and a convertible and adequately power all of them. Apple says, we're only going to make 1 model of sports car and one large luxury sedan, and because we know exactly what our constraints are on these two platforms, we can add a turbo or a supercharger, we can tweak the timing, we can put a high-flow exhaust on it, etc.
Your point would make sense if the 845 were being used in low-end Androids. Since it's pretty much only being used in high-end designs, all-out performance should've been the goal.
None of what you're saying makes sense. I simply think QCOMM and the rest are behind Apple because they can't do as good a job as Apple. It isn't because the market doesn't exist or because they need to build flexible designs.
Oh but many of us reading this conversation think all that really makes sense, and it really is because the market doesn't exist or because they need to build flexible designs.
The car engine analogy was pretty great. It's exactly like that.
True. What Apple is good at is showing the potential for what's possible. Others may have their reasons for not reaching it, but at least no one can say it's not possible.
A lot of it comes down to power consumption. Samsung managed to get close to A10 performance but at the cost of much higher power draw. With a 4-wide decoder instead of 6-wide, ARM is able to keep power usage in check, and if their claims are to be believed, A10 performance at half the power is probably more desirable to the average consumer than A11/A12 performance with Snapdragon 810 levels of thermal throttling.
Does anyone actually use the full performance of the A11 or A12 in daily tasks? To me, it's pointless to have a power hungry and fast core just for benchmarks. Just make a slightly slower core with less power usage for quick bursts like app loading or Web page rendering, while much slower and more efficient cores handle the usual workload.
ARM's objective is to make CPUs that go into a cluster of 4 big cores plus another 4 small ones. What Apple does is make bigger cores, more than 2x the size of an ARM core, with 1.5x the performance of said core. That same CPU is designed for very high power consumption at maximum load, and Apple tweaks the amount of time it stays at those high clocks. That makes it relatively easy to build laptop, tablet, and phone chips: you just reuse the same CPU for all of them, maybe add a few cores to the laptop version, and tweak the power settings.
Forgot to mention that Apple goes for 2-core clusters, not 4. So they must have significantly better single-core performance to match up against the competition.
Just to correct that you are somewhat living in the past with your numbers.
ARM no longer cares about 4-sized clusters; that was an artifact of big.LITTLE (and one of the constraints that limited that architecture's performance). The successor to big.LITTLE, brand-named DynamIQ, does not do things in blocks of 4 anymore.
Likewise Apple first released 3 CPUs in a SoC with the A8X. The A10X likewise has three CPUs. It's entirely likely (though no-one knows for sure) that the A11X (or A12X if the 11X is skipped) will have 4 large cores.
Due to ARM's licensing model, they have every incentive to push designs that cater to more cores. They have little incentive to push single-threaded performance any further than necessary, as this would result in fewer cores being licensed due to space and power constraints. I'm not fully convinced that the whole big.LITTLE (and derivative) philosophy was the best way to go either. It could be that it got close enough to what advanced power management could do, with the benefit of providing a convincing case for ARM CPU designers to use double the cores or more. When Intel was still in the market, they demonstrated that a dual-core chip with clock gating, power gating, power monitoring, dynamic voltage and frequency scaling, and other advanced power management features could provide superior single-thread performance and comparable multithread performance in a similar power envelope to competing ARM designs with double the cores (all while burdened with the inefficient x86 decoder). Apple also had good success employing a similar philosophy until their A10 design. Though it is not necessarily causal, it is interesting to note that they've had more trouble keeping within their thermal and power constraints on their latest A11 big.LITTLE design.
Note: I don't have any issue with asymmetric / heterogeneous CPUs. I'm just not convinced that they are adequate replacements for good power management built into the cores. DynamIQ does seem to be a push in the right direction allowing simultaneous usage of all cores, providing hooks for accelerators, and providing fine grained dynamic voltage and frequency scaling. This makes a lot of sense when you can assign tasks to processors (or accelerators) with significantly better proficiency for the task in question. Switching processors for no other reason than it is lower power, however, just sounds like the design team had no incentive to further optimize their power management on the high performance core.
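For reference, the per-core power management being contrasted here reduces to a control loop along these lines. A deliberately simplified sketch, not any vendor's governor; the OPP table and constants are invented:

```python
# Sketch of a DVFS governor: pick the lowest operating point whose
# capacity covers recent demand, with power ~ C*V^2*f plus leakage.

OPP = [(0.6, 0.60), (1.0, 0.70), (1.8, 0.85), (2.8, 1.05)]  # (GHz, V), ascending

def next_opp(utilization, current_ghz, headroom=1.25):
    demand = utilization * current_ghz        # GHz of work actually needed
    for ghz, volt in OPP:
        if ghz >= demand * headroom:          # keep 25% slack before upclocking
            return ghz, volt
    return OPP[-1]

def core_power_w(ghz, volt, c_eff=0.9, leakage_w=0.05):
    return c_eff * volt**2 * ghz + leakage_w  # power gating removes both terms

ghz, volt = next_opp(utilization=0.3, current_ghz=2.8)
print(f"settle at {ghz} GHz / {volt} V, ~{core_power_w(ghz, volt):.2f} W")
```

The big.LITTLE alternative replaces much of this loop with a core migration: instead of dropping the big core to a low OPP, the task moves to a physically smaller core.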
Again a correction. Apple's problems with the A10 and A11 are NOT problems of power management; they are problems of CURRENT DRAW. Power management on the chips works just fine (and better than ever; high performance throttling tends to occur less with each successive generation, and it used to be possible to force reboot an iPhone if you got it hot enough, now that seems impossible because of better power management).
Current draw, on the other hand, is not something the SoCs were designed to track. And so when an aging battery is no longer able to provide max current draw (when everything on the SoC is lined up just wrong) then not enough current IS provided, and the system reboots. This is definitely a flaw in the phone as a whole, but it's a system-wide flaw, and you can imagine how it happened. The SoC was designed assuming a certain current drive because no-one thought about aging batteries, because no-one (in Apple or outside) had hit the problem before.
I expect the A12 will have the same PMU that, today, monitors temperatures everywhere to make sure they remain within bounds, ALSO tracking a variety of proxies for current draw, and will be capable of throttling performance gradually in the face of extreme current draw, just like performance is throttled gradually in the face of extreme temperature.
Different design and use philosophies. Apple's mobile chips are designed to deliver short bursts of very high processing power (opening a complex webpage, switching between apps) and throttle back to okay-fast during the remainder. That requires apps and OS to be tightly controlled and behave really well - one bad app that doesn't behave and keeps driving the CPU hard for longer periods and your phone would get hot (thermal throttling), plus your battery would run down in a jiffy. For ARM & Co on Android/Linux, it makes more sense to have smaller, less powerful cores, manage energy consumption through other means (big.LITTLE etc.), and increase performance by increasing the number of cores/threads. Basically, if you really want to upscale the performance of stock ARM designs for a laptop or similar, you could dump the "little" cores and go for an octa-core or deca-core big configuration, so all A76 cores. Might be interesting if somebody tries it.
It's not so simple - small A55 cores seem to work better in a quad or hexacore config, whereas A75s are best left in a dual core config at most because their perf/watt is poor. No point having a phone that's crazy fast but overheats and runs out of battery quickly.
Apple's use of powerful but power-hungry cores could also affect the longevity of older phones. Older batteries might not be able to supply enough power for a big core running at full speed.
The fact that Apple is able to use an even larger and more power hungry core and a (marginally?) smaller battery should tell you that it is doable. Though, you are correct in saying it's not simple. The fact of the matter is, Apple has implemented much better power management features than ARM to allow for their cores to run at higher peak loads while needed and then being able to throttle down to lower power draw very quickly. ARM simply didn't design the A75 to do low power processing. The A75 is designed to rely on the A55 for low power processing as this provides an incentive to sell more core licenses.
Traditionally, Apple builds a big ass core and clocks it low.
It wasn't until FinFET made it to mobile chips that they started clocking higher.
>Apple has always played it conservative with clockspeeds in their CPU designs – favoring wide CPUs that don’t need to (or don’t like to) clock higher – so an increase like this is a notable event given the power costs that traditionally come with higher clockspeeds. Based on the underlying manufacturing technology this looks like Apple is cashing in their FinFET dividend, taking advantage of the reduction in operating voltages in order to ratchet up the CPU frequency. This makes a great deal of sense for Apple (architectural improvements only get harder), but at the same time given that Apple is reaching the far edge of the performance curve I suspect this may be the last time we see a 25%+ clockspeed increase in a single generation with an Apple SoC.
FFS. the issue is NOT "Older batteries might not be able to supply enough power for a big core", it is that the battery cannot supply enough CURRENT.
If you can't be bothered to understand the underlying engineering issue and why the difference between current and power matters, then your opinions on this issue are worthless.
"Does anyone actually use the full performance of the A11 or A12 in daily tasks? " Absolutely. I've updated iPhones every two years, and every update brings a substantial boost in "fluidity" and just general not having to wait. I can definitely feel the difference between my iPhone 7 and my friend's iPhone X; and I expect I will likewise feel the difference when I get my iPhone 2018 edition (whatever they are naming them this year...)
Now if you want to be a tool, you can argue "that's because Apple's software sux. Bloat, useless animations, last good version of iOS was version 4, blah blah". Whatever. MOST people find more functionality distributed throughout the dozens of little changes of each new version of the OS, and MOST people find the "texture" of the OS (colors, animations, etc.) more pleasant than having some sort of text-only Apple II UI, though doubtless that could run at 10,000 fps.
So point is, yeah, you DO notice the difference on phones. Likewise on iPads. I use my iPad to read technical PDFs, and again, each two year update provides a REALLY obvious jump in how quickly complicated PDF pages render. With my very first iPad 1 there was a noticeable wait almost every page (only hidden, usually, because of page caching). By the A10X iPad Pro it's rare to encounter a PDF page that ever makes you wait, cached or not.
I've also talked about in the past about Wolfram Player, a subset of Mathematica for iPad. This allows you to interact with Mathematica "animations" (actually they're 3D interactive objects you construct that change what is displayed depending on how you move sliders or otherwise tweak parameters). These are calculating what's to be displayed (which might be something like numerically solving a partial differential equation, then displaying the result as a 3D object) in realtime as you move a slider. Now this is (for now) pretty specialized stuff. But Wolfram's goal, as they fix the various bugs in the app and implement the bits of Mathematica that don't yet work well (or at all), is for these things to be the equivalent of video today. We used to put up with explanations (in books, or newspapers) that were just words. Then we got BW diagrams. Then we got color diagrams. Then we got video. Now we have web sites like NYT and Vox putting up dynamic explainers where you can move sliders --- BUT they are limited to the (slow) performance of browsers, and are a pain to construct (both the UI, and the underlying mathematical simulation). Something like Mathematica's animations are vastly more powerful, and vastly easier to create. One day these will be as ubiquitous as video is today, just one more datatype that gets passed around. But for them to work well requires a CPU that can numerically solve PDEs in real time on your mobile device...
The benefits of having a fast single core are seen on most common operations, including UI and scrolling, etc. Moreover, Apple has demonstrated that a powerful core can in fact be more efficient in race to sleep conditions whereby it completes the work more quickly then sleeps. The overall effect is a more responsive system that is just as efficient overall.
Well, let's put it this way: the A73, which is two instructions wide, had no problems on 14nm FinFET. The A76 is 4 instructions wide and, for the sake of argument, let's say 2x the size. Switching from 14nm to 7nm (a ~60% reduction in power) covers that: the A76 is approximately 65% faster than the A73 MHz for MHz, so it's able to deliver approximately 1.8x the performance at the same TDP.

The second part is the manufacturing process in relation to core size. FinFET transistors leak like hell once the 2.1-2.2 GHz limit is reached, regardless of OEM or vendor/foundry. So employing a 50% wider core (6 instructions wide) that won't cross the 2.1-2.2 GHz limit is not the same as pushing the 4-wide one to 3GHz: there, power consumption doubles compared to the same core operating at 2.1-2.2 GHz, and in the end you lose on both theoretical throughput (performance) and power consumption, but it still costs you 33% less. In reality it's much harder to feed the wider core optimally (especially on a mobile OS). ARM did great work optimizing instruction latency and cache latency/throughput, which will both increase real instruction output per clock and help the predictor without a significant increase in needed resources (cost, i.e. size); the A76 is the first CPU of its kind regarding the implemented solution for this.

However, the thing ARM didn't deliver is a better primary workhorse that could make a difference in base user experience. The A55 isn't exactly a powerhouse performance-wise, and there is now more power headroom when scaling down to 7nm: enough for, say, an A73 at slightly lower clocks to replace the A55 (the A73 has 1.6x the integer performance of the A55 MHz for MHz, so an A73 @ 1.7GHz = an A55 @ 2.7GHz, while switching from 14nm to 7nm would make the TDP of the A73 match the A55's). But the A73 doesn't work in a DynamIQ cluster. So there is a need for a new two-wide OoO core with the merged architectural advancements (front end, predictor, cache, ASIMD...), as the in-order ones hit a brick wall a long time ago.
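The two rules of thumb that comment leans on can be sanity-checked in a few lines. A crude sketch: the voltage curve below is invented purely to reproduce the quoted "power doubles at 3GHz" behavior, and the A73/A55 figures are the ones quoted above:

```python
# Sketch: dynamic power scales ~ f * V^2, and past the efficient knee
# (~2.2 GHz here) voltage must rise with frequency, so power balloons.

def rel_power(f_ghz, knee=2.2, v_knee=0.80, dv_per_ghz=0.20):
    v = v_knee + max(0.0, f_ghz - knee) * dv_per_ghz   # invented V(f) curve
    return f_ghz * v**2

print(f"3.0 GHz vs 2.2 GHz power: {rel_power(3.0) / rel_power(2.2):.1f}x")  # ~2x

a73_ipc_vs_a55 = 1.6                                    # per-MHz integer ratio
print(f"A73 @ 1.7 GHz ~ A55 @ {1.7 * a73_ipc_vs_a55:.1f} GHz")
```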
The thing is that CPU power use might already be a small part of phone power use, the display usually being the main consumer; and when the display isn't running, most likely the big core will also not be running. Saving 40% power sounds great on paper, but in real designs the share will already be smaller and the total impact on phone battery life will be much, much smaller: single-digit percentage, probably, depending on how much you use it. The more the big core is idle, the less its efficiency matters.
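Rough numbers for that argument; the power-budget shares below are illustrative guesses, not measurements:

```python
# Sketch: why a 40% more efficient big core moves battery life by only
# single digits when the big core is a small slice of average power.

shares = {"display": 0.45, "radios": 0.20, "big_core": 0.12,
          "little_cores": 0.08, "rest": 0.15}        # invented averages

saving = 0.40 * shares["big_core"]                   # 40% off big core only
print(f"total average power saved: {saving:.1%}")    # ~4.8%
print(f"battery life gained:       {1 / (1 - saving) - 1:.1%}")  # ~5.0%
```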
I get this, but when comparing an iPhone X and, say, an Android flagship next to each other in pretty much every day-to-day task, they appear evenly matched. There are some good comparisons on YouTube. There are definitely strengths to each platform, but it's not clear cut at all.
Yup, that totally has to do with hardware and nothing to do with UI optimizations and design choices = Apple sux lolol
But hey, somebody in this world needs to get their daily dose of retarded Youtube clickbait shit videos. Bonus originality points when it's Apple-hating.
Yeah, app developers make sure to keep the Android versions lighter so performance doesn't completely fall apart on all the many cheap 8x A53 SoCs out there... find a real equivalent app that does heavy stuff like rendering a large PDF or generating videos out of a bunch of pics and clips. I mean, really the same. I'm not so sure a faster CPU and storage wouldn't make a huge difference then...
Do Anandtech writers know the meaning of the word "several"? I keep seeing it said on here, when you actually mean a "couple" or a "few" because you mean 2 or 3.
In American English, it's colloquially taken as meaning more than a few ('several' sounds like 'seven', and this progression is correct: couple < few < several). But it's not universally understood that way. Several is perfectly understandable imo.
If you want to quibble, the phrase 'my projects' in the third to last paragraph should be 'my projections'.
Nope. Still don't agree that efficiency at max clocks is an indicator of anything in a smartphone form factor. Total power draw per workload is where it's at.
This announcement shouldn't be compared with the current 9810. I don't believe the A76 would fare well on 10nm, and definitely not in a quad-core, high-cache configuration. Samsung rushed the design; neither the OS/firmware nor the process were ready for that chip, but it still was competitive, despite not being to my liking. It would have kicked butt running Windows on ARM though.
The A76 and A12 at 7nm will be competing against the M4 at 7nm EUV. I don't want to get ahead of myself, but I have my money on Samsung. Here's to a GS10 released in one SoC configuration without holding any of its features or performance back.
What does sustainable even mean anymore? Assuming you have proper power delivery, you can sustain a 5-7W TDP in a smartphone. Doesn't mean it's efficient.
I suspect you have to amend that with "with reasonable performance". Something that takes all day but sips negligible power (so the total draw for that workload is small) isn't helpful if what it's doing is loading a web page.
Sounds like a brilliant core! If it comes out as fast as projected, it MAY even drive Cortex-A75-based devices' prices down. It is incredible that even now they are still making great performance gains. Seems to me that between the A72 and A73 they got the "hard to get" performance and power metrics nailed down and are now enjoying the low-hanging fruit (i.e. going wider, increasing memory performance, etc.), which is easier to attain with process node advantages.
I thought the A73 had a different architecture than the A72 and it was slower on some tasks. The A76 is supposed to be a completely new design.
I'm happy with my old 4x A53 + 2x A72 device. I would be happier with a 6x A55 + 2x A80, if ARM could come up with a hypothetical big and wide core that's similar to Monsoon but without the ridiculous power issues.
Exactly. The A72 was wider than the A73 in the front and back ends, but performance was relatively the same. They got better power and about the same performance from a slimmer, leaner core... again, seems to me like they went after the "hard to get" performance and power metrics.
We might see higher boosts/TDPs soon; there is a significant market for gaming-branded smartphones. I believe this will be larger than expected, as it finally allows differentiation in a saturated market.
I like this niche as it gives the designers more freedom for thicker phones.
Seems to be a large increase in area (based on how very little is shared), so it's not quite easy to keep power down. Then, what kind of clocks and TDP do they target? Does power scale well above 3GHz? Or maybe that was the plan and they did not quite get there. Also, did they mention some other core targeted at servers?
"Arm also had a slide demonstrating absolute peak performance at frequencies of 3.3GHz. The important thing to note here was that this scenario exceeded 5W and the performance would be reduced to get under that TDP target"
This might be the wrong interpretation of the slide, as the 1.9x at 5W includes the little cores, and so also the transition from A53 to A55.
Even if Apple moved the A11 from 10nm to 7nm and ran it at 3GHz, there would still be a huge gap in performance. Let alone that they will have the A12 on 7nm shipping in a few months' time. Compare this to the A76, which I don't think will ship in devices in 2018.
So there is still roughly a 3-year gap between ARM and Apple in IPC or single-thread performance.
And why do you care about IPC, when 99.99% of all smartphone users:
- Use the phone as a glorified clock
- A tool for showing off (even with the cancerous "dynamic" profile on Samsung AMOLED-powered devices, they don't know the "basic" calibrated profile exists)
- Twitter, Facebook, Instagram, WhatsApp
Where is your need for performance? Unless you buy a phone to run AnTuTu/Geekbench, you won't notice it any time you pick the phone out of your pocket.
The biggest improvement in phone performance was the jump from slow/high-latency eMMC to NVMe-like NAND (Apple), and UFS for Samsung and the others.
Spot on. I've got an SD650 and an SD625 phone, one with A72 big cores and the other with only A53 cores, and for web browsing and chatting they're almost indistinguishable. The 625 device also has much better battery life.
Of course a faster device can accomplish a task faster and drop back to idle power efficiency to aid battery life. It depends on many factors, but running at (hypothetically) 20 units of performance per second over 5 seconds (total 100) and then dropping back to idle might be preferable to 10 units of performance per second over 10 seconds. Also, remember Apple's devices do much on-device: the Kinect-like FaceID for one, and unlike Google Photos, where images are scanned for content in the cloud (this picture contains a bridge, and a dog), iOS devices scan their libraries on-device while charging.
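Plugging the hypothetical 20-units vs 10-units numbers from that comment into an energy calculation (with equally hypothetical power draws) shows how race-to-sleep can come out ahead:

```python
# Sketch: race-to-sleep. Same 100 units of work over a 10 s window; the
# fast core burns more power but finishes early and idles. All invented.

WORK, WINDOW, IDLE_W = 100, 10.0, 0.05

def energy_j(rate_units_per_s, active_w):
    t_active = WORK / rate_units_per_s
    return active_w * t_active + IDLE_W * (WINDOW - t_active)

print(f"fast core: {energy_j(20, 3.0):.2f} J")   # 5 s busy -> 15.25 J
print(f"slow core: {energy_j(10, 1.8):.2f} J")   # 10 s busy -> 18.00 J
```

With these made-up figures the fast core wins despite a 67% higher active power, because it spends half the window asleep; whether it wins in practice depends on the real active/idle power ratio.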
That's like saying Intel shouldn't bother with performance any more because 99.99% of PCs run Facebook in the web browser, email, and Word.
(a) Apple sells delight, and part of delight in your phone is NEVER waiting. If you want to save money, buy a cheaper phone and wait, but part of Apple's value proposition is that, for the money you spend, you reduce the friction of constant short waits. (Compare, eg, how much faster the phone felt when 1st gen TouchID was replaced with the faster 2nd TouchID. Same thing now with FaceID; it works and works well. But it will feel even smoother when the current half second delay is dropped to a tenth of a second [or whatever].)
(b) Apple chips also go into iPads. And people use iPads (and sometimes iPhones) for more than you claim --- for various artistic tasks (manipulating video and photos, drawing with very fancy [ie high CPU] "brushes" and effects, creating music, etc). One of the reasons these jobs are done on iPads (and sometimes Surfaces) and not Android is because they need a decent CPU.
(c) Ambition. BECAUSE Apple has a decent CPU, they can put that CPU into their desktops. And, soon enough, also into their data centers...
I'm curious about all this because I'm an iPad user. No iPhones though. Even an old iPad Mini is smoother than top Android tablets today.
Does the CPU spike up to maximum speed quickly when loading apps or PDFs, then very quickly throttle down to minimum? I don't know how Apple make their UI so smooth while also having good battery life.
When you touch the screen, touch tracking boosts to 120Hz, even though they can only run the OLED screen at 60Hz.
As for PDFs, macOS (and as a consequence iOS) uses non-computational PostScript as its graphics framework... and PDF is essentially journaled PostScript (like a PICT was journaled QuickDraw).
As for throttling down: yeah, when you've completed your computationally expensive task you throttle down to save power.
Skylake's latency increased to 4 cycles, probably to achieve a higher clock, but if the A76 can do it in 3, then Skylake should also be able to do it (3 cycles / 4.3 GHz = 0.70 ns). How did ARM do this?
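Spelling out the cycles-versus-wall-clock comparison being made (the clock speeds and cycle counts are the ones quoted in the comment):

```python
# Sketch: load-use latency in wall-clock terms at each chip's clock.

def latency_ns(cycles, ghz):
    return cycles / ghz

print(f"A76:     3 cycles @ 3.0 GHz = {latency_ns(3, 3.0):.2f} ns")
print(f"Skylake: 4 cycles @ 4.3 GHz = {latency_ns(4, 4.3):.2f} ns")
print(f"3 cycles @ 4.3 GHz would be {latency_ns(3, 4.3):.2f} ns of wire+SRAM")
# Similar absolute times; hitting 3 cycles at 4.3 GHz is the hard part.
```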
Hilarious commenters. Apple's SoC? Again? I guess people need to think about how bad its power envelope is. Its A11 gets beaten by the 835 in consistency, dropping to 60% of its clocks lol. And the battery-killing SoC: yes, the battery capacity is lower on iPhones. But Apple's R&D and chip costs are very high vs ARM's. Not to forget how the 845's GPU performance slaps and drowns that custom *cough cough* Imagination-derived GPU core.
They rely on single-thread performance because of power and optimization; it targets one OS and one HW ecosystem, ruled and locked by Apple only, whereas ARM-derived designs or Qcomm's are robust, supporting a wider hardware pool, and can even run Windows.
Lots of Apple hate in these comments. Which is fine, nothing wrong with having your own opinion. Performance is important to me - I edit 4K video (not for business purposes) from a few Fuji mirrorless bodies quite happily on iOS, on an iPhone X and iPad Pro. The fastest desktop and notebook machines I currently own are both Core 2 Duo; they simply cannot do it. I'm not a typical use case. I did have a quad i7, but I sold that machine (a MacBook Pro) while I could still get a stupidly high amount of money back for it used. Don't assume that no one on mobile wants high-performance ARM cores; not everyone is just using Facebook Messenger and taking the occasional selfie all day. Also, I remember when AMD smoked Intel at times in the past. People argued, but there was never the "you don't need that performance" type of argument.
That's what Apple is actually doing: a single TDP-configurable SoC for both their phones and pads (and laptops, if the rumors come true).
The argument is not "you don't need that performance", but "most people don't need that performance". You are one of the few in the performance-needy pool. I know you exist, just not many, and that's what many manufacturers are aware of, so they don't take the Apple route.
But this is AnandTech. We want the best. We want to push the envelope. I don't want to read about ho-hum performance at a good price. That's for Consumer Reports or a myriad of other yawn sites.
4k editing on an iPad probably won't be using the CPU completely for processing though. There's a lot of stuff that can be passed on to faster and more efficient DSP and IP blocks. I've also run Quicksync encoding on Atom tablets running Windows, it's much faster than using the puny Atom cores directly.
Afaik, only playback will use dedicated blocks like Quicksync; the editing itself, the rendering of new effects, would be heavily assisted by the GPU and partly by the CPU.
The "most people don't need that performance" argument may sound nice to say, but why do you think people buy new phones? They do it when their old phone feels slow,etc. A higher performing phone has a longer effective life span.
Using phones as an example, Android has about 85% of the market share for devices sold. Yet, when Apple and Google report their active user bases, Android barely maintains a 2:1 ratio over iOS devices. Why? The majority of Android devices sold are low-end devices that have a much shorter effective life span.
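Those two ratios imply the lifespan gap directly. A crude steady-state sketch (active base ~ sales rate x average lifespan; the ratios are the ones quoted above):

```python
# Sketch: implied device-lifespan ratio from sales share vs active base.

android_sales, ios_sales = 85, 15      # share of devices sold
android_active, ios_active = 2, 1      # reported active-base ratio

lifespan_ratio = (ios_active / android_active) / (ios_sales / android_sales)
print(f"implied iOS:Android average lifespan ~ {lifespan_ratio:.1f}x")  # ~2.8x
```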
I kinda want to buy a new phone, but my Nexus 5X simply doesn't feel slow. So I haven't. And it must have less than half the performance of modern high-end phones.
"The branch prediction unit is what Arm calls a first in the industry in adopting a hybrid indirect predictor. "
This is somewhat misleading. The fetch unit is very interesting (and Andrei did not spend enough time praising it), but to say that it is a first in the industry seems unreasonable. The idea of decoupling the stream of fetch addresses from actual I-cache access dates from a thesis in 2001. Implementations I know about include Zen and Exynos M1 (2016) and IBM z14 (2017). Apple probably got there even earlier.
So there may be some very specific detail in how ARM is implementing this that is a first, but the overall idea has been around for 17 years.
(The reason why it's taken so long to be implemented is that, first, it needs lots of transistors to store all the predictor state and, second, it requires some rethinking of how your branch predictors are indexed and updated. Think about it. What you want is machinery that, EVERY CYCLE, when given a PC, will spit out two addresses -- where the current run of straight-line fetching must end, i.e. the next TAKEN branch, and where the PC must be directed when it hits the end of this basic block. And it has to do this "in isolation", without looking at the instructions that are going to be loaded from the I$, because the whole point is that this is happening decoupled from, and in advance of, access to the I$. It's not trivial to think of a set of data structures that can do that. I'm still not at all convinced my understanding of exactly how this works is correct, even though I've been trying to understand it for some time now.)
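A skeleton of the general decoupled-fetch technique being described, reconstructed from the academic idea rather than Arm's actual A76 implementation; the BTB contents and structure names are invented:

```python
# Toy decoupled front end: the prediction side walks basic blocks ahead
# of the I-cache, pushing (start, end) fetch ranges into a fetch target
# queue (FTQ) that the I-cache side drains at its own pace.

from collections import deque

class DecoupledFetch:
    def __init__(self, btb, reset_pc, ftq_depth=16):
        self.btb = btb           # block start PC -> (taken-branch end, target)
        self.pc = reset_pc
        self.ftq = deque()
        self.ftq_depth = ftq_depth

    def predict_cycle(self):
        """Emit one fetch range per cycle, with no I-cache access at all."""
        if len(self.ftq) >= self.ftq_depth:
            return                                # FTQ full: predictor stalls
        end, target = self.btb.get(self.pc, (self.pc + 64, self.pc + 64))
        self.ftq.append((self.pc, end))           # straight-line run to fetch
        self.pc = target                          # redirect to predicted target

    def fetch_cycle(self):
        """I-cache side: consume a range (would index/prefetch the I$)."""
        return self.ftq.popleft() if self.ftq else None

btb = {0x1000: (0x1020, 0x2000), 0x2000: (0x2010, 0x1000)}   # a 2-block loop
fe = DecoupledFetch(btb, reset_pc=0x1000)
for _ in range(4):
    fe.predict_cycle()           # runs ahead even while the I$ would miss
print([(hex(s), hex(e)) for s, e in fe.ftq])
```

The hard part the comment describes is exactly the `btb.get` line: building real hardware that answers "where does this straight-line run end, and where does it go next" every cycle, for every PC, without seeing the instructions.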
tipoo - Thursday, May 31, 2018 - link
Still a 4-wide front end, I don't imagine it'll catch A10, maybe A9 per core then eh.wicketr - Thursday, May 31, 2018 - link
I just don't understand why ARM doesn't at least come out with a design that can match the Monsoon cores of an A11, or even the power of what will likely be the next A12 cores. It seems like ARM is eternally 2-3 steps behind Apple on this and they need to catch up.shadowx360 - Thursday, May 31, 2018 - link
Probably their power/efficiency constraints. They manage to get the same performance as a M3 core with a 4 wide instead of 6 wide decoder and half the power usage. The A11 cores are absolute monsters at power draw at max performance but Apple is able to tweak the hell out of the rest of the device and OS to get the battery life in check. Android OEMs don't have that much control.wicketr - Thursday, May 31, 2018 - link
And I could understand the power issues for phones, but not all ARM chips are destined for phones. Some can go into cars or gaming consoles that are always plugged in and well ventilated.I just think they should come out with another tier ( Cortex A9X series) that can go toe-to-toe with Apple's best even if it is too power hungry for phones. Just come up with a design and see where we're at.
Wilco1 - Thursday, May 31, 2018 - link
Using a much larger core to get modest extra performance wouldn't make sense even in less power constrained cases. Not every market is happy with just 2 huge cores, so power and area efficiency remain important. For laptops binning for frequency and adding turbo modes would make far more sense.BillBear - Friday, June 1, 2018 - link
>Using a much larger core to get modest extra performance wouldn't make sense even in less power constrained cases.It makes perfect sense if you don't care that your core is large, because you aren't just selling a SOC. For Qualcomm, increased die size means reduced profit. For Apple, it does not.
For instance, Apple's Cyclone core from 2013:
>With six decoders and nine ports to execution units, Cyclone is big. As I mentioned before, it's bigger than anything else that goes in a phone. Apple didn't build a Krait/Silvermont competitor, it built something much closer to Intel's big cores. At the launch of the iPhone 5s, Apple referred to the A7 as being "desktop class" - it turns out that wasn't an exaggeration.
https://www.anandtech.com/show/7910/apples-cyclone...
Matthmaroo - Monday, June 4, 2018 - link
Apple has so many built in advantages - huge RD , excellent engineering, closed system ... android manufacturers are disadvantaged to Apple inso manu waysclose - Tuesday, June 5, 2018 - link
ARM has to build a "one size fits all" kind of solution. Unlike Apple they are not catering for a single customer with full control over every aspect of HW and SW development, and the profits associated with that.Plus, achieving the power that the Apple cores bring doesn't come cheap. Samsung's Exynos is still lagging behind and it's not like Samsung doesn't have expertise or deep pockets.
techconc - Tuesday, June 5, 2018 - link
Yeah, but when you have a big little architecture, OEMs could choose the most efficient combination to meet their needs. There needs to be a powerful single core option that's available for the ARM platform. Until ARM goes there, the rest of the ARM community will be behind Apple. Remember, not all workloads can take advantage of multiple cores. At best ARM will be approaching 2016 level Apple A series core performance.bananaforscale - Saturday, June 9, 2018 - link
Excellent engineering? Like the bendgate, touch screen problems etc. that were *engineering screwups*?lmcd - Friday, June 1, 2018 - link
They're at the mercy of chipmakers. The only companies that would buy such a core have already left reference designs behind. Everyone else wants small, cheap chips, so much so that we've had A53-only designs in the entire middle-and-lower range. Will anyone even use the A76? I don't know if that's guaranteed.vladx - Friday, June 1, 2018 - link
Read the last part of the article, it's almost guaranteed next Kirin is skipping the A75.and going directly to A76. I think Huawei is done playing catch-up.darkich - Friday, June 1, 2018 - link
It's so frustrating how even you people who are into SoC's already forget that Apple was basically cheating the customers with secret huge compromises, just to be able to put unbalanced and owerpowered cores in the iPhones.Zoolookuk - Friday, June 1, 2018 - link
Wow, I've seen some seriously bad anti-Apple comments over the last 30 years, but this is probably the best one yet. A10 and A11 are not unbalanced and not 'cheating' customers. Anyone with half a brain can the history of this advantage Apple has started with A7, which was the first 64-bit ARM-based SoC in phones. Ever since they, they've been consistently 2 generations ahead of the competition, and that gap shows no sign of closing.The comments below this that 'at 3ghz' the (still unreleased) A76 would 'only need a 20% boost' to match last year's A11 is pretty funny. So a chip already at its thermal and power limit "only" needs to be overclocked by 20% to match a chip designed two years ago running 40% slower.
techconc - Tuesday, June 5, 2018 - link
Actual device performance easily disproves your claim. Your comment isn't helpful and you would have more credibility if you at least attempted to justify the claim you made.jjj - Friday, June 1, 2018 - link
In reality Apple is the one behind but won't bother to explain, just remember the core count for Apple's solutions.Anyway, A76 at 3GHz and 750mW per core would require less than a 20% boost in clocks to match Apple's A11 in Geekbench.
Apple has only 2 large cores and when including the 4 small ones, the SoC throttles hard under 100% load
If that is what you want, A76 should be able to deliver something close enough when configured as dual core with stupid high clocks. A SoC vendor could push A76 to 2W-3W per core instead of 0.75W and get clocks as high as possible.
But maybe it's better to have more than enough perf, 4 cores and sane efficiency.
Lolimaster - Friday, June 1, 2018 - link
Please stop using geekbench as a comparison tool specially between 2 different ARM ecosystems.jjj - Friday, June 1, 2018 - link
You have results for Apple on anything else?Elstar - Friday, June 1, 2018 - link
Because, crazy as it sounds, most ARM's customers don't want fast off the shelf designs, at least at first. ARM's whole business model is rather simple: they sell simple, affordable, efficient, and feature rich reference designs as a "gateway drug". Once you get hooked on their ecosystem, then they charge a lot for nontrivial customization.name99 - Friday, June 1, 2018 - link
This is like asking "why doesn't Apple come out with a 5GHz A12?" It's not so much that they can't as that this does not make sense for their business model.You can SAY that you want (and would be willing to pay for) Apple level's of performance, but is that really true? God knows Android people complain all the time about how expensive Apple products are, and they MOSTLY buy the midrange phones, not the high end phones. Essential just went bust assuming that more people want to pay for high end Android phones than actually exist.
leo_sk - Sunday, June 3, 2018 - link
They had many problems other than target audience. I can give one plus as a counter exampleMatthmaroo - Monday, June 4, 2018 - link
Also Apple has gigantic RD budgetsThink Intel / Qualcomm and AMD are a faction of Apple is size
Matthmaroo - Monday, June 4, 2018 - link
Edit “is” should be ApplesAnd add ARM holding to the list of cpu developers
BillBear - Friday, June 1, 2018 - link
We know from TSMC that moving the A12 to their 7nm process will give Apple some significant improvements before we even consider this year's improvements to their design.>Compared to its 10nm FinFET process, TSMC's 7nm FinFET features 1.6X logic density, ~20% speed improvement, and ~40% power reduction.
http://www.tsmc.com/english/dedicatedFoundry/techn...
joms_us - Friday, June 1, 2018 - link
Nonsense, GB was designed to interpolate scores from worthless tasks that don't mean to real-world scenarios. Even iPhone with A11 is no match in speed and performance versus SD845 and Exynos powered Android phones today. You only use GB to compare two similar platform period. There is no way in hell A11 is faster than Skylake or Ryzen. Only ret@rded people will believe on that.BillBear - Friday, June 1, 2018 - link
>GB was designed to interpolate scores from worthless tasksGeekbench borrows code from popular open source projects and benchmarks that code running against the same workload on multiple platforms.
For instance, Google's open sourced code to render HTML and DOM in Chrome, Google's code to render PDFs in Chrome, the open sourced LLVM compiler Google now builds the Windows version of Chrome with.
Hardly worthless.
joms_us - Friday, June 1, 2018 - link
Oh yeah rendering html, see any speed tests, SD845 phone loads faster than iPhone 8/X. SD845 exports video faster than A11. See my point, just because A9 loads one piece of the whole feature, doesn't mean it it has faster single core than even SD820 which used to have higher score than A9 using prev GB version. See the point? They rigged the scoring system so it would appear Apple SoC is much faster than even Intel or AMD processors LOL.name99 - Friday, June 1, 2018 - link
Comparing Apple cores to Intel for browsers gives the same results as GeekBench.My personal comparisons of Wolfram Player on iPad Pro vs MacBook Pro again confirm GB4 results.
Back when we had SPEC2006 numbers for Apple cores, YET AGAIN they confirmed the GB4 results...
If you don't believe the browser results, nothing is stopping you from running something like jetstream on your own devices (and borrowing someone's iPad Pro or iPhone X, since I assume you wouldn't be caught dead owning one).
https://www.browserbench.org/JetStream/
joms_us - Friday, June 1, 2018 - link
Good now load a similar app and website at the same time on iPhone X and AMD or Intel desktop and come back here. Even SD835 slaughtered the A11 on a side by side comparisons. Again the point is, they multiplied the scores on where Apple SoC is faster by nanosec and call it twice faster than competiton LOL. As if nanosec is noticeable in real life scenario =Dlostmsu - Tuesday, June 5, 2018 - link
You are being unreasonable. Run JetStream on any of your SD8XX devices, and let parent run it on A11, and let's see. Nobody's going to trust your word here against a reasonable suggestion.techconc - Tuesday, June 5, 2018 - link
" A11 is no match in speed and performance versus SD845 and Exynos powered Android phones today"Huh? Benchmarks do not support your claim.
name99 - Friday, June 1, 2018 - link
This assumption makes two mistakes.The first is to assume that ONE metric (in this case 4-wide front end) is the PRIMARY determinant of performance. Even Apple's (A11) IPC (over a wide range of code) is about maybe 2.7. This means on average less than 3 of those 6 execution units are being used per cycle. IF other parts of the core uncache could be PERFECTED on a 4 wide design so that EVERY cycle 4 instructions executed, it would clearly surpass the A11 in IPC.
The problem, of course, is just how hard it is to prevent cycles where NOTHING executes and cycles where only a few (one or two) instructions execute. Reducing these are where most of the magic is --- and you won't see details of that it in an article like this; rather it's in that painstaking rooting out hundreds of small inefficiencies that the article talked about.
To give just one example -- no-one is talking about the clustered page tables. This is a very cool idea which relies on the fact that most of the time the OS page allocator allocates a number of pages contiguously in virtual AND physical space, and with the same permissions. If that is so, the same page entry can correspond to multiple contiguous pages (in the academic literature, usually up to 8). This gives you a substantial increase in TLB reach at a very minor increase in TLB bits.
(I can find no info as to whether Intel does this. I SUSPECT Apple used to do this in their earlier [and probably even A11] cores. There are very recent even better ideas in the academic literature that might, perhaps, have made it to the A12 core.)
Second mistake you make is to ignore frequency. A9 ran at 1.85 GHz, A10 at 2.35 GHz. The A76 will likely run at 3GHz.
tipoo - Tuesday, September 4, 2018 - link
And yet, here we are with single core results at around 60% that of the A11. Taking their own numbers at face value, a 56% increase over the A73 in GB4 results in 2800.Yes, I used a very simplistic one dimensional comparison, and there's a whole lot more to it. However, core complexity does go up almost exponentially with width, and so it does point to what ballpark they were aiming at. A76 was never going to beat the A11 per core because it was never aimed at it.
colinisation - Thursday, May 31, 2018 - link
Hi Andrei,Is this core the one referred to as Ares on roadmaps?
Been waiting years for this one if it is.
Andrei Frumusanu - Thursday, May 31, 2018 - link
Yes in practical terms - no in actual terms. You'll likely hear more about this in the future.tuxRoller - Friday, June 1, 2018 - link
Ok, now you've incepted the idea that ARM is going to announce a dedicated server-class chip (maybe even a tease of SVE.....)name99 - Friday, June 1, 2018 - link
I agree with your point (ARM will release a server chip, essentially based on this core). Remember that GB4 is scaled so that 4000 represents an i7-6600U (Skylake, 3.4GHz, 4MiB L3, 15W).
So A76 is essentially at that level (slightly worse FP, but many server tasks will not care).
To the extent that a Skylake at 3.4GHz is an acceptable server-class core, ARM could dump some large number of A76s on a die and be in the same sort of space as the dearly departed Centriq and ThunderX2.
They likely would have to beef up their NoC one way or another, and tweak the caching and memory systems, the usual server additions.
But I assume they didn't put all that effort into "lowest possible latency for hypervisor activity" on the theory that hypervisor performance on smartphones is THE next big thing...
joe_85 - Thursday, May 31, 2018 - link
I am sure people will disagree with me because people love to argue. Andrei, I like your writing and overall thoroughness, but a few critiques here. The charts you make are extremely unpleasant to look at and do not lend themselves to a quick assessment of the data. First of all, the color-coded stripes in the legend for the A76 projections are not even decipherable, and the actual bars on the chart are difficult to see. Secondly, why are you color coding them at all? Just put processor names to the left of the bars and the benchmark name above the bars.
Additionally, are the bars in any particular order? If they are, I certainly can't tell; they should be ordered relative to the performance OR the efficiency.
Other constructive criticism: adding some additional subheadings within your articles would make them feel like more solid pieces.
Keep up the good work.
jospoortvliet - Wednesday, June 6, 2018 - link
Loving to argue or not, the performance vs efficiency graphs are rather unique and very clever; I have not seen any design that so clearly shows how different SoCs compare at both. Yes, it takes a few minutes before you can read them; I am sorry that the world is so complicated. But they work very well if you just use that gray matter a bit.
syxbit - Thursday, May 31, 2018 - link
As an Android user, I continue to be disappointed with QCOMM, Arm (and Nvidia for dropping out) at how far ahead Apple is in single-threaded perf.
id4andrei - Thursday, May 31, 2018 - link
Apple has the manpower and funds to spend extensively on a huge chip, for they are free to do things for their own glory. QCOMM designs chips for others to use and must design for price points. They must do so efficiently and maximize yields. ARM provides base designs that others can outright use or customize; you can't really blame ARM here. NVIDIA has no modem.
syxbit - Thursday, May 31, 2018 - link
I get that, but you're missing the point. Sure, the budget phones have a strict budget. An $800 Android flagship should not be tight on the SoC budget. If QCOMM sold an ultra-high-end Snapdragon that could compete with the A12, you had better believe the Galaxy S10 and Pixel 3 phones would pay to use it.
truckasaurus - Thursday, May 31, 2018 - link
Apple has the convenience of designing to a very specific application. Qualcomm ultimately has to create something that can go into many platforms that are defined no more completely than 'high-end mobile'. That's like asking why the 3.5L V6 that's in most of Toyota's vehicles only makes 268 hp, but in the Lotus Evora 400, which uses the same engine, makes 400 hp. It's because it has been tweaked for a very specific application.
serendip - Thursday, May 31, 2018 - link
Then they should sell supercharger kits for Qualcomm chips ;-)
That is how the Evora V6 gets 400 hp compared to 280-300 hp on the latest non-turbo versions of that V6. The twin-turbo version of that engine on the Lexus LS makes over 450 hp too.
truckasaurus - Thursday, May 31, 2018 - link
That's essentially what I'm getting at. Qualcomm makes the generic version of the engine that can go into a sedan, an SUV, a coupe, and a convertible and adequately power all of them. Apple says: we're only going to make one model of sports car and one large luxury sedan, and because we know exactly what our constraints are on these two platforms, we can add a turbo or a supercharger, we can tweak the timing, we can put a high-flow exhaust on it, etc.
Pneumothorax - Friday, June 1, 2018 - link
Your point would make sense if the 845 were being used in low-end Androids. Since it's pretty much only being used in high-end designs, all-out performance should've been the goal.
syxbit - Thursday, May 31, 2018 - link
None of what you're saying makes sense. I simply think QCOMM and the rest are behind Apple because they can't do as good a job as Apple. It isn't because the market doesn't exist or because they need to build flexible designs.
SirPerro - Friday, June 1, 2018 - link
Oh, but many of us reading this conversation think all that really makes sense, and it really is because the market doesn't exist or because they need to build flexible designs. The car engine analogy was pretty great. It's exactly like that.
Threska - Thursday, May 31, 2018 - link
True. What Apple is good at is showing the potential for what's possible. Others may have their reasons for not reaching it, but at least none can say it's not possible.
shadowx360 - Thursday, May 31, 2018 - link
A lot of it comes down to power consumption. Samsung managed to get close to A10 performance but at the cost of much higher power draw. With a 4-wide decoder instead of a 6-wide one, ARM is able to keep power usage in check, and if their claims are to be believed, A10 performance at half the power is probably more desirable to the average consumer than A11/A12 performance at Snapdragon 810 levels of thermal throttling.
serendip - Thursday, May 31, 2018 - link
Does anyone actually use the full performance of the A11 or A12 in daily tasks? To me, it's pointless to have a power-hungry and fast core just for benchmarks. Just make a slightly slower core with less power usage for quick bursts like app loading or web page rendering, while much slower and more efficient cores handle the usual workload.
jOHEI - Thursday, May 31, 2018 - link
ARM's objective is to make CPUs that go into a cluster of 4 big cores plus another 4 small ones. What Apple does is make bigger cores, more than 2 times the size of an ARM core, with 1.5x the performance of said core. That same CPU is made for very high power consumption at maximum load, and Apple tweaks the amount of time it stays at those high clocks. That makes it easier to build a chip for laptops, tablets, and phones: you just reuse the same CPU for all of them, maybe add a few cores to the laptop version and tweak the power settings, and it's relatively easy.
jOHEI - Thursday, May 31, 2018 - link
Forgot to mention that Apple goes for 2-core clusters, not 4. So they must have significantly better single-core performance to match up against the competition.
name99 - Friday, June 1, 2018 - link
Just to correct that: you are somewhat living in the past with your numbers. ARM no longer cares about 4-sized clusters; that was an artifact of big.LITTLE (and one of the constraints that limited that architecture's performance). The successor to big.LITTLE, brand-named DynamIQ, does not do things in blocks of 4 anymore.
Apple, likewise, first released 3 CPUs in an SoC with the A8X, and the A10X also has three CPUs. It's entirely likely (though no one knows for sure) that the A11X (or A12X if the 11X is skipped) will have 4 large cores.
BurntMyBacon - Friday, June 1, 2018 - link
Due to ARM's licensing model, they have every incentive to push designs that cater to more cores. They have little incentive to push single-threaded performance any more than necessary, as this would result in fewer cores being licensed due to space and power constraints. I'm not fully convinced that the whole big.LITTLE (and derivative) philosophy was the best way to go either. It could be that it got close enough to what advanced power management could do, with the benefit of providing a convincing case for ARM CPU designers to use double the cores or more. When Intel was still in the market, they demonstrated that a dual-core chip with clock gating, power gating, power monitoring, dynamic voltage and frequency scaling, and other advanced power management features could provide superior single-thread performance and comparable multithread performance in a similar power envelope to competing ARM designs with double the cores (all while burdened with the inefficient x86 decoder). Apple also had good success employing a similar philosophy until their A10 design. Though it is not necessarily causal, it is interesting to note that they've had more trouble keeping within their thermal and power constraints on their latest A11 big.LITTLE design.
Note: I don't have any issue with asymmetric / heterogeneous CPUs. I'm just not convinced that they are adequate replacements for good power management built into the cores. DynamIQ does seem to be a push in the right direction, allowing simultaneous usage of all cores, providing hooks for accelerators, and providing fine-grained dynamic voltage and frequency scaling. This makes a lot of sense when you can assign tasks to processors (or accelerators) with significantly better proficiency for the task in question. Switching processors for no other reason than that it is lower power, however, just sounds like the design team had no incentive to further optimize their power management on the high-performance core.
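As a rough illustration of the alternative preferred above - keeping one big core and managing its power directly - here is a toy DVFS governor in C; the operating-point table and thresholds are invented for the sketch:

#include <stddef.h>

static const unsigned opp_mhz[] = { 300, 800, 1400, 2000, 2800 };  /* invented operating points */
#define N_OPP (sizeof opp_mhz / sizeof opp_mhz[0])

/* Pick the next operating point from recent utilization: ramp up fast
 * when busy, step down when mostly idle, otherwise hold. Returns an
 * index into opp_mhz. */
unsigned pick_opp(unsigned cur_idx, unsigned util_pct)
{
    if (util_pct > 80 && cur_idx + 1 < N_OPP)
        return cur_idx + 1;
    if (util_pct < 30 && cur_idx > 0)
        return cur_idx - 1;
    return cur_idx;
}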
name99 - Friday, June 1, 2018 - link
Again, a correction. Apple's problems with the A10 and A11 are NOT problems of power management; they are problems of CURRENT DRAW. Power management on the chips works just fine (and better than ever; high-performance throttling tends to occur less with each successive generation, and it used to be possible to force-reboot an iPhone if you got it hot enough; now that seems impossible because of better power management).
Current draw, on the other hand, is not something the SoCs were designed to track. And so when an aging battery is no longer able to provide max current draw (when everything on the SoC is lined up just wrong) then not enough current IS provided, and the system reboots.
This is definitely a flaw in the phone as a whole, but it's a system-wide flaw, and you can imagine how it happened. The SoC was designed assuming a certain current drive because no-one thought about aging batteries, because no-one (in Apple or outside) had hit the problem before.
I expect the A12's PMU, which today monitors temperatures everywhere to make sure they remain within bounds, will ALSO track a variety of proxies for current draw, and will be capable of throttling performance gradually in the face of extreme current draw, just like performance is throttled gradually in the face of extreme temperature.
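A minimal C sketch of the kind of gradual current-limit throttling anticipated above; the current proxy, limit, and clamped proportional back-off are all assumptions for illustration:

/* Treat a proxy for instantaneous current like a thermal sensor and
 * back the clock off gradually before the rail sags. */
unsigned throttle_for_current(unsigned freq_mhz, unsigned current_proxy, unsigned current_limit)
{
    if (current_proxy <= current_limit)
        return freq_mhz;                              /* within budget: no action */
    unsigned overshoot_pct = (current_proxy - current_limit) * 100 / current_limit;
    if (overshoot_pct > 50)
        overshoot_pct = 50;                           /* clamp the step size */
    return freq_mhz - freq_mhz * overshoot_pct / 100; /* gradual, not a hard cut or brownout */
}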
eastcoast_pete - Friday, June 1, 2018 - link
Different design and use philosophies. Apple's mobile chips are designed to deliver short bursts of very high processing power (opening a complex webpage, switching between apps) and throttle back to okay-fast for the remainder. That requires apps and OS to be tightly controlled and to behave really well - one bad app that keeps driving the CPU hard for longer periods and your phone would get hot (thermal throttling), plus your battery would run down in a jiffy. For ARM & Co on Android/Linux, it makes more sense to have smaller, less powerful cores, manage energy consumption through other means (big.LITTLE etc.), and increase performance by increasing the number of cores/threads. Basically, if you really want to upscale the performance of stock ARM designs for a laptop or similar, you could dump the "little" cores and go for an octa-core or deca-core big design, all A76 cores. Might be interesting if somebody tries it.
serendip - Friday, June 1, 2018 - link
It's not so simple - small A55 cores seem to work better in a quad- or hexa-core config, whereas A75s are best left in a dual-core config at most because their perf/watt is poor. No point having a phone that's crazy fast but overheats and runs out of battery quickly.
Apple's use of powerful but power-hungry cores could also affect the longevity of older phones. Older batteries might not be able to supply enough power for a big core running at full speed.
BurntMyBacon - Friday, June 1, 2018 - link
The fact that Apple is able to use an even larger and more power-hungry core and a (marginally?) smaller battery should tell you that it is doable. Though you are correct in saying it's not simple. The fact of the matter is, Apple has implemented much better power management features than ARM, allowing their cores to run at higher peak loads when needed and then throttle down to lower power draw very quickly. ARM simply didn't design the A75 to do low-power processing. The A75 is designed to rely on the A55 for low-power processing, as this provides an incentive to sell more core licenses.
BillBear - Friday, June 1, 2018 - link
Traditionally, Apple builds a big-ass core and clocks it low. It wasn't until FinFET made it to mobile chips that they started clocking higher.
>Apple has always played it conservative with clockspeeds in their CPU designs – favoring wide CPUs that don’t need to (or don’t like to) clock higher – so an increase like this is a notable event given the power costs that traditionally come with higher clockspeeds. Based on the underlying manufacturing technology this looks like Apple is cashing in their FinFET dividend, taking advantage of the reduction in operating voltages in order to ratchet up the CPU frequency. This makes a great deal of sense for Apple (architectural improvements only get harder), but at the same time given that Apple is reaching the far edge of the performance curve I suspect this may be the last time we see a 25%+ clockspeed increase in a single generation with an Apple SoC.
https://www.anandtech.com/show/9686/the-apple-ipho...
Qualcomm has been building small cores and vendors have been clocking them (with corresponding voltage increases) high.
Remember all the Android vendors getting caught red-handed changing clockspeeds when they detected benchmarks running?
name99 - Friday, June 1, 2018 - link
FFS, the issue is NOT "older batteries might not be able to supply enough power for a big core"; it is that the battery cannot supply enough CURRENT. If you can't be bothered to understand the underlying engineering issue and why the difference between current and power matters, then your opinions on this issue are worthless.
serendip - Friday, June 1, 2018 - link
Whoa, chill there buddy, I'm not an electrical engineer.
name99 - Friday, June 1, 2018 - link
"Does anyone actually use the full performance of the A11 or A12 in daily tasks? "Absolutely. I've updated iPhones every two years, and every update brings a substantial boost in "fluidity" and just general not having to wait. I can definitely feel the difference between my iPhone 7 and my friend's iPhone X; and I expect I will likewise feel the difference when I get my iPhone 2018 edition (whatever they are naming them this year...)
Now if you want to be a tool, you can argue "that's because Apple's software sux. Bloat, useless animations, last good version of iOS was version 4, blah blah". Whatever.
MOST people find more functionality distributed throughout the dozens of little changes of each new version of the OS, and MOST people find the "texture" of the OS (colors, animations, etc) more pleasant than having some sort of text-only Apple II UI, though doubtless that could run at 10,000 fps.
So the point is, yeah, you DO notice the difference on phones. Likewise on iPads. I use my iPad to read technical PDFs, and again, each two-year update provides a REALLY obvious jump in how quickly complicated PDF pages render. With my very first iPad 1 there was a noticeable wait on almost every page (only hidden, usually, because of page caching). By the A10X iPad Pro, it's rare to encounter a PDF page that ever makes you wait, cached or not.
I've also talked in the past about Wolfram Player, a subset of Mathematica for iPad. This allows you to interact with Mathematica "animations" (actually they're 3D interactive objects you construct that change what is displayed depending on how you move sliders or otherwise tweak parameters). These are calculating what's to be displayed (which might be something like numerically solving a partial differential equation, then displaying the result as a 3D object) in realtime as you move a slider.
Now this is (for now) pretty specialized stuff. But Wolfram's goal, as they fix the various bugs in the app and implement the bits of Mathematica that don't yet work well (or at all), is for these things to be the equivalent of video today. We used to put up with explanations (in books, or newspapers) that were just words. Then we got BW diagrams. Then we got color diagrams. Then we got video. Now we have web sites like NYT and Vox putting up dynamic explainers where you can move sliders --- BUT they are limited to the (slow) performance of browsers, and are a pain to construct (both the UI, and the underlying mathematical simulation). Something like Mathematica's animations are vastly more powerful, and vastly easier to create. One day these will be as ubiquitous as video is today, just one more datatype that gets passed around. But for them to work well requires a CPU that can numerically solve PDEs in real time on your mobile device...
techconc - Tuesday, June 5, 2018 - link
The benefits of having a fast single core are seen in most common operations, including UI and scrolling, etc. Moreover, Apple has demonstrated that a powerful core can in fact be more efficient in race-to-sleep conditions, whereby it completes the work more quickly and then sleeps. The overall effect is a more responsive system that is just as efficient overall.
tipoo - Tuesday, September 4, 2018 - link
Sure, every time I render a webpage.
ZolaIII - Friday, June 1, 2018 - link
Well, let's put it this way: the A73, which is two instructions wide, had no problems on 14nm FinFET; the A76 is 4 instructions wide and, for the sake of argument, let's say 2x the size. Switching from 14nm to 7nm (a 60% reduction in power) covers that, and since the A76 is approximately 65% faster than the A73 MHz per MHz, it's able to deliver approximately 1.8x the performance in the same TDP. The second part is the manufacturing process in relation to core size. FinFET transistors leak like hell once the 2.1-2.2 GHz limit is reached, regardless of OEM or vendor/foundry. So employing 50% wider cores (6 instructions wide) that won't cross the 2.1-2.2 GHz limit is not the same as pushing a 4-wide core to 3 GHz: the latter's power consumption will be doubled compared to the same core operating at 2.1-2.2 GHz, and in the end you lose on both theoretical throughput (performance) and the power consumption metric, though it still costs you 33% less area. In reality it's much harder to feed a wider core optimally (especially on a mobile OS). ARM did great work optimizing instruction latency and cache latency/throughput, which will both increase real instruction output per clock and help the predictor without a significant increase in needed resources (cost - size); the A76 is the first of its kind (of any CPU) regarding the implemented solution for this. However, the thing ARM didn't deliver is a better primary workhorse that could make a difference in base user experience. The A55s aren't exactly powerhouses regarding performance, and there is now more power headroom when scaled down to 7nm, enough for, say, an A73 at slightly lower clocks to replace the A55 (the A73 has 1.6x the integer performance of the A55 MHz per MHz, so an A73 @ 1.7 GHz = an A55 @ 2.7 GHz, while switching from 14 to 7nm would make the TDP of the A73 the same as the A55's). But the A73 doesn't work in a DynamIQ cluster. So there is a need for a new two-instruction-wide OoO core with the merged architectural advancements (front end, predictor, cache, ASIMD...), as in-order cores hit the brick wall a long time ago.
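A back-of-envelope check of that wide-and-slow versus narrow-and-fast argument, using the standard dynamic-power relation P ~ C * V^2 * f; the capacitance ratio and voltages below are assumed numbers for illustration, not vendor data:

#include <stdio.h>

static double dyn_power(double c_rel, double volt, double freq_ghz)
{
    return c_rel * volt * volt * freq_ghz;   /* arbitrary units */
}

int main(void)
{
    /* 6-wide core: ~1.5x the switched capacitance, held at 2.2 GHz
     * near the efficient point of the assumed V/f curve. */
    double wide   = dyn_power(1.5, 0.75, 2.2);
    /* 4-wide core pushed to 3.0 GHz: past the knee, needing much
     * higher voltage, which enters quadratically. */
    double narrow = dyn_power(1.0, 1.00, 3.0);
    printf("wide/slow: %.2f  narrow/fast: %.2f  ratio: %.2f\n",
           wide, narrow, narrow / wide);     /* ~1.6x with these assumptions */
    return 0;
}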
vladx - Friday, June 1, 2018 - link
> So switching from 14 nm to 7nm (60% reduction on power)
That might've been true if both the 14nm and 7nm fab processes were actually the real deal. But alas, they are not.
ZolaIII - Saturday, June 2, 2018 - link
Based on TSMC's projections, a 60% power reduction.
beginner99 - Monday, June 4, 2018 - link
The thing is that CPU power use might already be a small part of phone power use, the display usually being the main consumer; and when the display isn't running, most likely the big core will also not be running. Saving 40% power sounds great on paper, but in a real design the share is already smaller, and the total impact on phone battery life will be much, much smaller - single-digit percentage, probably, depending on how much you use it. The more it is idle, the less the big-core efficiency matters.
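The arithmetic behind that point, with an assumed share: if the big cores account for roughly 10% of total device energy, a 40% core-level saving shrinks to about 4% at the battery:

#include <stdio.h>

int main(void)
{
    double big_core_share = 0.10;  /* assumed: big cores ~10% of device energy */
    double core_saving    = 0.40;  /* the headline 40% power reduction */
    printf("device-level saving: %.0f%%\n", big_core_share * core_saving * 100.0);  /* 4% */
    return 0;
}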
Dazedconfused - Thursday, May 31, 2018 - link
I get this, but when comparing an iPhone X and, say, an Android flagship next to each other in pretty much every day-to-day task, they appear evenly matched. There are some good comparisons on YouTube. There are definitely strengths to each platform, but it's not clear cut at all.
vladx - Friday, June 1, 2018 - link
That's weird; in most speed tests I've seen, Android flagships beat iPhones in all but graphics-intensive apps like games and video editing.
varase - Saturday, June 2, 2018 - link
Those tests are usually serially launching a bunch of apps, then round-robining. It's probably more a demonstration that Android phones have more RAM.
StrangerGuy - Sunday, June 3, 2018 - link
Yup, that totally has to do with hardware and nothing to do with UI optimizations and design choices = Apple sux lolol
But hey, somebody in this world needs to get their daily dose of retarded YouTube clickbait shit videos. Bonus originality points when it's Apple-hating.
jospoortvliet - Wednesday, June 6, 2018 - link
Yeah, app developers make sure to keep the Android versions lighter so performance doesn't completely fall apart on all the many cheap 8x A53 SoCs out there... find a real equivalent app that does heavy stuff like render a large PDF or generate videos out of a bunch of pics and videos. I mean, really the same. Not sure a faster CPU and storage wouldn't make a huge diff then...
B3an - Thursday, May 31, 2018 - link
Do Anandtech writers know the meaning of the word "several"? I keep seeing it said on here when you actually mean a "couple" or a "few", because you mean 2 or 3.
StormyParis - Friday, June 1, 2018 - link
"several" means "more than one" so I'm not sure who's having a vocabulary issue. Or rather, I am.nico_mach - Friday, June 1, 2018 - link
In American English, it's colloquially taken as meaning more than a few (it sounds like seven, and his progression is correct: couple < few < several). But it's not universally understood that way. Several is perfectly understandable imo. If you want to quibble, the phrase 'my projects' in the third-to-last paragraph should be 'my projections'.
Gunbuster - Thursday, May 31, 2018 - link
Someone forgot to update the PowerPoint deck full of "laptop class performance" after the big nothing-burger that is Win 10 on ARM?
nico_mach - Friday, June 1, 2018 - link
That was such a disappointing set of products. Just the price of them for the performance and spec.
lilmoe - Thursday, May 31, 2018 - link
Nope. Still don't agree that efficiency at max clocks is an indicator of anything in a smartphone form factor. Total power draw per workload is where it's at. This announcement shouldn't be compared with the current 9810. I don't believe the A76 would fare well on 10nm, and definitely not in a quad-core, high-cache configuration. Samsung rushed the design; neither the OS/firmware nor the process were ready for that chip, but it still was competitive, despite not being to my liking. It would have kicked butt running Windows on ARM though.
The A76 and A12 at 7nm will be competing against the M4 at 7nm EUV. I don't want to get ahead of myself, but I have my money on Samsung. Here's to a GS10 released in one SoC configuration without holding any of its features or performance back.
lilmoe - Thursday, May 31, 2018 - link
Total power draw per workload is where it's at.
Wardrive86 - Thursday, May 31, 2018 - link
Sustained performance is where it is at.
lilmoe - Friday, June 1, 2018 - link
What does sustainable even mean anymore? Assuming you have proper power delivery, you can sustain a 5-7W TDP on a smartphone. Doesn't mean it's efficient. Total power draw...... over time. All day.
erple2 - Friday, June 1, 2018 - link
I suspect you have to amend that with "with reasonable performance". Taking all day to do something while sipping negligible power (so the power draw for that workload is small) isn't helpful if it's loading a web page.
Wardrive86 - Thursday, May 31, 2018 - link
Sounds like a brilliant core! If it comes as fast as projected, it MAY even drive Cortex-A75-based devices' prices down. It is incredible that even now they are still making great performance gains. Seems to me that between the A72 and A73 they got the "hard to get" performance and power metrics nailed down and are now enjoying the low-hanging fruit (i.e. going wider, increasing memory performance, etc.), which is easier to attain just with process node advantages.
serendip - Friday, June 1, 2018 - link
I thought the A73 had a different architecture than the A72 and was slower on some tasks. The A76 is supposed to be a completely new design. I'm happy with my old 4x A53 + 2x A72 device. I would be happier with a 6x A55 + 2x A80, if ARM could come up with a hypothetical big and wide core that's similar to Monsoon but without the ridiculous power issues.
Wardrive86 - Friday, June 1, 2018 - link
Exactly. The A72 was wider than the A73 in the front and back ends, but performance was relatively the same. They got better power and about the same performance from a slimmer, leaner core... again, seems to me like they went after the "hard to get" performance and power metrics.
vladx - Friday, June 1, 2018 - link
Sustained performance is loads better on the A73 versus the A72.
Wardrive86 - Saturday, June 2, 2018 - link
Agreed! Paid dividends on the A75 and surely the A76. Now they can enjoy the fruit of their labor.
porcupineLTD - Sunday, June 3, 2018 - link
This article seems to show exactly the opposite: https://www.anandtech.com/show/11088/hisilicon-kir...
vladx - Wednesday, June 6, 2018 - link
@porcupineLTD: You clearly don't understand the meaning of "sustained performance".
Wardrive86 - Thursday, May 31, 2018 - link
Dual 128-bit NEON SIMDs with FMA: is this the first "theorectical" 16 FLOPs/clock ARM CPU?
Wardrive86 - Thursday, May 31, 2018 - link
*Theoretical, obviously
StormyParis - Friday, June 1, 2018 - link
Tell me more about that theorectal stuff ;-p
Wardrive86 - Friday, June 1, 2018 - link
Well beyond the scope of this article ;)
Wardrive86 - Thursday, June 28, 2018 - link
NEON is a 128-bit SIMD structured into 2 x 64-bit execution ALUs (pipelines). Reads like they have doubled the width of the ALUs (2 x 128-bit).
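For reference, the arithmetic behind the 16-FLOPs-per-clock figure discussed above, as a small C calculation; the 3 GHz clock is an assumed example frequency:

#include <stdio.h>

int main(void)
{
    int pipes         = 2;   /* dual 128-bit SIMD pipelines */
    int fp32_lanes    = 4;   /* 128 bits / 32-bit floats */
    int ops_per_fma   = 2;   /* fused multiply-add counts as 2 FLOPs */
    int flops_per_clk = pipes * fp32_lanes * ops_per_fma;   /* 16 */
    double ghz = 3.0;        /* assumed clock */
    printf("%d FLOPs/clock -> %.0f GFLOPS at %.1f GHz\n",
           flops_per_clk, flops_per_clk * ghz, ghz);        /* 48 GFLOPS */
    return 0;
}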
zodiacfml - Friday, June 1, 2018 - link
We might see higher boosts/TDPs soon. There is a significant market for gaming-branded smartphones. I believe this will be larger than expected, as it finally allows differentiation in a saturated market. I like this niche as it gives the designers more freedom for thicker phones.
jjj - Friday, June 1, 2018 - link
Would be better if they would just tell us the area; right now it sounds like the core+L1 might be towards 2mm2 on 7nm, so rather massive.
jjj - Friday, June 1, 2018 - link
Seems to be a large increase in area (based on the very little that was shared), so not quite easy to keep power down. Then, what kind of clocks and TDP do they target? Does power scale well above 3GHz? Or maybe that was the plan and they did not quite get there.
Also, did they mention some other core targeted at server?
jjj - Friday, June 1, 2018 - link
"Arm also had a slide demonstrating absolute peak performance at frequencies of 3.3GHz. The important thing to note here was that this scenario exceeded 5W and the performance would be reduced to get under that TDP target"This might be the wrong interpretation of the slide, as x1.9 at 5W includes the little cores so the transition from A53 to A55.
iwod - Friday, June 1, 2018 - link
Even if Apple moved the A11 from 10nm to 7nm and ran it at 3 GHz, there would still be a huge gap in performance. Let alone that they will have the A12 and 7nm shipping in a few months' time. Compare this to the A76, which I don't think will arrive in 2018. So there is still roughly a 3-year gap between ARM and Apple in IPC or single-thread performance.
Lolimaster - Friday, June 1, 2018 - link
And why do you care about IPC, when 99.99% of all smartphone users:
-Use the phone as a glorified clock
-A tool for showing off (even with the cancer "dynamic" profile on Samsung AMOLED powered devices, they don't know the "basic" calibrated profile exists)
-Twitter, Facebook, Instagram, WhatsApp
Where is your need for performance? Unless you buy a phone to run AnTuTu/Geekbench every time you pick the phone out of your pocket.
The biggest improvement in phone performance was the jump from slow/high-latency eMMC to NVMe-like NAND (Apple) and UFS (Samsung and the others).
serendip - Friday, June 1, 2018 - link
Spot on. I've got an SD650 and an SD625 phone, one with A72 big cores and the other with only A53 cores, and for web browsing and chatting they're almost indistinguishable. The 625 device also has much better battery life.
darwiniandude - Friday, June 1, 2018 - link
Of course a faster device can accomplish a task faster and drop back to idle power efficiency to aid battery life. It depends on many factors, but running at (hypothetically) 20 units of performance per second over 5 seconds (total 100) then dropping back to idle might be preferable to 10 units of performance per second over 10 seconds.
Also, remember Apple's devices do much on-device: the Kinect-like FaceID for one, and unlike Google Photos, where images are scanned for content in the cloud (this picture contains a bridge, and a dog), iOS devices scan their libraries on-device when on charge.
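A worked version of that race-to-sleep math in C, using the 20-units-for-5s versus 10-units-for-10s framing above; both power figures are assumptions, and flipping them can flip the result:

#include <stdio.h>

int main(void)
{
    double idle_w   = 0.1;   /* assumed idle power */
    double window_s = 10.0;  /* the 10-second window from the example */

    /* Fast core: 20 units/s, done in 5 s, assumed to draw 2.0 W while active. */
    double fast_j = 2.0 * 5.0 + idle_w * (window_s - 5.0);
    /* Slow core: 10 units/s, busy the whole 10 s, assumed to draw 1.2 W. */
    double slow_j = 1.2 * window_s;

    /* With these assumptions the racing core wins: 10.5 J vs 12.0 J. */
    printf("fast: %.1f J, slow: %.1f J\n", fast_j, slow_j);
    return 0;
}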
name99 - Friday, June 1, 2018 - link
That's like saying Intel shouldn't bother with performance any more because 99.99% of PCs run Facebook in the web browser, email, and Word.
(a) Apple sells delight, and part of delight in your phone is NEVER waiting. If you want to save money, buy a cheaper phone and wait, but part of Apple's value proposition is that, for the money you spend, you reduce the friction of constant short waits. (Compare, eg, how much faster the phone felt when 1st gen TouchID was replaced with the faster 2nd TouchID. Same thing now with FaceID; it works and works well. But it will feel even smoother when the current half-second delay is dropped to a tenth of a second [or whatever].)
(b) Apple chips also go into iPads. And people use iPads (and sometimes iPhones) for more than you claim --- for various artistic tasks (manipulating video and photos, drawing with very fancy [ie high CPU] "brushes" and effects, creating music, etc). One of the reasons these jobs are done on iPads (and sometimes Surfaces) and not Android is because they need a decent CPU.
(c) Ambition. BECAUSE Apple has a decent CPU, they can put that CPU into their desktops. And, soon enough, also into their data centers...
serendip - Friday, June 1, 2018 - link
I'm curious about all this because I'm an iPad user. No iPhones though. Even an old iPad Mini is smoother than top Android tablets today.
Does the CPU spike up to maximum speed quickly when loading apps or PDFs, then very quickly throttle down to minimum? I don't know how Apple makes their UI so smooth while also having good battery life.
varase - Saturday, June 2, 2018 - link
Smooth is the iPhone X. When you touch the screen, touch tracking boosts to 120Hz, even though they can only run the OLED screen at 60Hz.
As for PDFs, macOS (and as a consequence iOS) uses non-computational PostScript as its graphics framework ... and PDF is essentially journaled PostScript (like a PICT was journaled QuickDraw).
As for throttling down: yeah, when you've completed your computationally expensive task you throttle down to save power.
YaleZhang - Friday, June 1, 2018 - link
Reducing the latency of floating-point instructions from 3 cycles to 2 seems quite an accomplishment. For Intel, it's been >= 3 cycles (http://www.agner.org/optimize/instruction_tables.p...)
Skylake: 4 cycles / 4.3 GHz = 0.93 ns
A76: 2 cycles / 3 GHz = 0.66 ns
Skylake's latency probably increased to 4 to achieve a higher clock, but if the A76 can do it in 0.66 ns, then Skylake should also be able to do it in 3 cycles (3 cycles / 4.3 GHz = 0.70 ns).
How did ARM do this?
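One rough way to probe FP latency like this from software is to time a long chain of dependent multiply-adds, so each operation must wait for the previous result; this is only a sketch (real measurements pin the clock and read cycle counters), and the compiler may or may not fuse the multiply-add into an FMA:

#include <stdio.h>
#include <time.h>

int main(void)
{
    const long iters = 100000000L;       /* 1e8 dependent operations */
    volatile double seed = 1.000000001;  /* volatile so the loop isn't folded away */
    double x = seed, a = seed;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        x = x * a + 1e-9;                /* each iteration depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns per dependent op (x=%g)\n", ns / iters, x);
    return 0;
}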
tipoo - Tuesday, September 4, 2018 - link
Lower max clocks, shorter pipeline maybe?
Quantumz0d - Friday, June 1, 2018 - link
Hilarious commenters. Apple's SoC? Again? I guess people need to think about how bad its power envelope is. Their A11 gets beaten by the 835 in consistency, dropping to 60% of clocks lol. And it's a battery-killing SoC; yes, the battery capacity is less on iPhones. But Apple's R&D and chip costs are very high versus the ARM ones. Not to forget how the 845's GPU performance slaps and drowns that custom *cough cough* Imagination-IP-derived GPU core. They rely on single-thread performance because the power and optimization go toward one OS and one HW ecosystem ruled and locked by Apple only, whereas ARM-derived designs or Qcomm are robust, supporting a wider hardware pool, and can even run a Windows OS.
darwiniandude - Friday, June 1, 2018 - link
Lots of Apple hate in these comments. Which is fine, nothing wrong with having your own opinion. Performance is important to me - I edit 4K (not for business purposes) from a few Fuji mirrorless bodies quite happily on iOS - on an iPhone X and iPad Pro. The fastest desktop and notebook machines I currently own are both Core 2 Duo. They simply cannot do it. I'm not a typical use case. I did have a quad i7, but I sold that machine (MacBook Pro) while I could still get a stupidly high amount of money back for it used. Don't assume that no one on mobile wants high-performance ARM cores - not everyone is just using Facebook Messenger and taking the occasional selfie all day.
Also, I remember when AMD smoked Intel at times in the past. People argued, but there was never the "you don't need that performance" type of argument.
leledumbo - Friday, June 1, 2018 - link
That's what Apple is actually doing: a single TDP-configurable SoC for both their phones and pads (and laptops, if the rumor comes true). The argument is not "you don't need that performance", but "most people don't need that performance". You are one of the few in the performance-needy pool. I know you exist, just not many of you, and that's what many manufacturers are aware of, so they don't take the Apple route.
hlovatt - Friday, June 1, 2018 - link
But this is Anandtech. We want the best. We want to push the envelope. I don't want to read about ho-hum performance at a good price. That's for Consumer Reports or a myriad of other yawn sites.
serendip - Friday, June 1, 2018 - link
4K editing on an iPad probably won't be using the CPU completely for processing though. There's a lot of stuff that can be passed on to faster and more efficient DSP and IP blocks. I've also run QuickSync encoding on Atom tablets running Windows; it's much faster than using the puny Atom cores directly.
darwiniandude - Saturday, June 2, 2018 - link
https://forums.luma-touch.com/viewtopic.php?t=6493 Some discussion of performance on iPad at 4K. It really does work very well. Must be using the GPU also.
tipoo - Tuesday, September 4, 2018 - link
Afaik, only playback will use dedicated blocks like QuickSync; editing itself, the rendering of new effects, would be heavily assisted by the GPU and partly run on the CPU.
techconc - Tuesday, June 5, 2018 - link
The "most people don't need that performance" argument may sound nice to say, but why do you think people buy new phones? They do it when their old phone feels slow,etc. A higher performing phone has a longer effective life span.Using phones as an example, Android has about 85% of the market share for devices sold. Yet, when Apple and Google report their active user base, Android barely maintains a 2:1 ratio over iOS devices. Why? The majority of Android devices sold are low end devices that have a much shorter effective life span.
Meteor2 - Monday, July 2, 2018 - link
I kinda want to buy a new phone, but my Nexus 5X simply doesn't feel slow. So I haven't. And it must have less than half the performance of modern high-end phones.
Maxiking - Sunday, June 3, 2018 - link
So another paper dragon, YAY. They promise the same every year, so statistically, if they keep repeating the lie every year, they will get there eventually!
Herkko - Wednesday, June 6, 2018 - link
Tell me how much the Nintendo Switch's power and energy efficiency would grow if they changed the old Cortex-A57/A53 CPUs for new Cortex-A76 ones.
jospoortvliet - Wednesday, June 6, 2018 - link
Twice as fast at half the power should not be hard. Of course the process has changed since those chips were baked; it isn't all in the architecture.
tipoo - Tuesday, September 4, 2018 - link
Yeah, on 7nm they should easily be able to make portable mode do what docked mode did, and add a new higher-performance docked mode. Easy transition.
name99 - Friday, December 18, 2020 - link
"The branch prediction unit is what Arm calls a first in the industry in adopting a hybrid indirect predictor. "This is somewhat misleading. The fetch unit is very interesting (and Andrei did not spend enough time praising it) but to say that it is first in the industry seems unreasonable.
The idea of decoupling the stream of fetch addresses from actual I-cache access dates from a thesis in 2001. Implementations I know about include Zen and Exynos M1 (2016) and IBM z14 (2017). Apple probably got in there even earlier.
So there may be some very specific detail in how ARM is implementing this that is a first, but the overall idea has been around for 17 years.
(The reason why it's taken so long to be implemented is that, first, it needs lots of transistors to store all the predictor state and, second, it requires some rethinking of how your branch predictors are indexed and updated. Think about it. What you want is machinery that, EVERY CYCLE, when given a PC will spit out two addresses -- where the current run of straight-line fetching must end, i.e. the next TAKEN branch, and where the PC must be directed to when it hits the end of this basic block. And it has to do this "in isolation", without looking at the instructions that are going to be loaded from the I$, because the whole point is that this is happening decoupled from, and in advance of, access to the I$. It's not trivial to think of a set of data structures that can do that. I'm still not at all convinced my understanding of exactly how this works is correct, even though I've been trying to understand it for some time now.)
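A highly simplified C sketch of that decoupled arrangement: the prediction side pushes (start, end, next) fetch blocks into a queue, and I-cache access drains it some cycles later. The structures and names are invented for illustration; real hardware is far more involved:

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t start_pc;  /* where straight-line fetch begins */
    uint64_t end_pc;    /* the next predicted-taken branch */
    uint64_t next_pc;   /* predicted target: start of the next block */
} fetch_block;

#define FTQ_DEPTH 16
typedef struct {
    fetch_block q[FTQ_DEPTH];
    unsigned head, tail, count;
} fetch_target_queue;

/* Predict side: runs ahead of the I-cache, ideally one block per cycle. */
static bool ftq_push(fetch_target_queue *f, fetch_block b)
{
    if (f->count == FTQ_DEPTH) return false;  /* queue full: predictor stalls */
    f->q[f->tail] = b;
    f->tail = (f->tail + 1) % FTQ_DEPTH;
    f->count++;
    return true;
}

/* Fetch side: drains blocks into I-cache accesses, possibly cycles later. */
static bool ftq_pop(fetch_target_queue *f, fetch_block *out)
{
    if (f->count == 0) return false;          /* queue empty: fetch starves */
    *out = f->q[f->head];
    f->head = (f->head + 1) % FTQ_DEPTH;
    f->count--;
    return true;
}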