Yesterday AMD revealed that in 2014 it would begin production of its first ARMv8-based 64-bit Opteron CPUs. At the time we didn't know what core AMD would use. Today, however, ARM helped fill in that blank for us with two new 64-bit core announcements: the ARM Cortex-A57 and Cortex-A53.

You may have heard of ARM's Cortex-A57 under the codename Atlas, while A53 was referred to internally as Apollo. The two are 64-bit successors to the Cortex A15 and A7, respectively. Similar to their 32-bit counterparts, the A57 and A53 can be used independently or in a big.LITTLE configuration. As a recap, big.LITTLE uses a combination of big (read: power hungry, high performance) and little (read: low power, lower performance) ARM cores on a single SoC. 

By ensuring that both the big and little cores support the same ISA, the OS can dynamically swap the cores in and out of the scheduling pool depending on the workload. For example, when playing a game or browsing the web on a smartphone, a pair of A57s could be active, delivering great performance at a high power penalty. On the other hand, while just navigating through your phone's UI or checking email a pair of A53s could deliver adequate performance while saving a lot of power. A hypothetical SoC with two Cortex A57s and two Cortex A53s would still only appear to the OS as a dual-core system, but it would alternate between performance levels depending on workload.
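The switching behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the load metric and the two thresholds are hypothetical, and ARM has not described the actual switching policy here.

```python
class BigLittleSoC:
    """Toy model of a big.LITTLE SoC: one big and one little cluster,
    with only one cluster active at any given time."""

    def __init__(self, cores_per_cluster=2, up_threshold=0.8, down_threshold=0.3):
        self.cores_per_cluster = cores_per_cluster
        self.up_threshold = up_threshold      # hypothetical: switch to big at or above this load
        self.down_threshold = down_threshold  # hypothetical: switch to little at or below this load
        self.active_cluster = "little"

    @property
    def visible_cores(self):
        # The OS only ever sees one cluster's worth of cores,
        # no matter which physical cluster is doing the work.
        return self.cores_per_cluster

    def update(self, load):
        """Pick the active cluster for a load in [0, 1], with hysteresis
        so that mid-range loads do not cause constant switching."""
        if load >= self.up_threshold:
            self.active_cluster = "big"
        elif load <= self.down_threshold:
            self.active_cluster = "little"
        return self.active_cluster


# Two A57s plus two A53s: still a dual-core system as far as the OS can tell.
soc = BigLittleSoC()
for load in (0.1, 0.9, 0.5, 0.2):
    print(f"load={load:.1f} -> {soc.update(load)} cluster, {soc.visible_cores} cores visible")
```

The gap between the two thresholds is a design choice in this sketch: it keeps a moderate workload on whichever cluster is already active rather than bouncing between the two.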

ARM's Cortex A57

Architecturally, the Cortex A57 is much like a tweaked Cortex A15 with 64-bit support. The CPU is still a 3-wide/3-issue machine with a 15+ stage pipeline. ARM has increased the width of NEON execution units in the Cortex A57 (128 bits wide now?) as well as enabled support for IEEE-754 DP FP. There have been some other minor pipeline enhancements as well. The end result is up to a 20 - 30% increase in performance over the Cortex A15 while running 32-bit code. Running 64-bit code you'll see an additional performance advantage, as the 64-bit register file is far simpler than the 32-bit RF.

The Cortex A57 will support configurations of up to (and beyond) 16 cores for use in server environments. Based on ARM's presentation it looks like groups of four A57 cores will share a single L2 cache.

ARM's Cortex A53

Similarly, the Cortex A53 is a tweaked version of the Cortex A7 with 64-bit support. ARM didn't provide as many details here other than to confirm that we're still looking at a simple, in-order architecture with an 8 stage pipeline. The A53 can be used in server environments as well since it's ISA compatible with the A57.

ARM claims that on the same process node (32nm) the Cortex A53 is able to deliver the same performance as a Cortex A9 but at roughly 60% of the die area. The performance claims apply to both integer and floating point workloads. ARM tells me that it simply reduced a lot of the buffering and data structure sizes while finding more efficient ways to improve performance. From looking at Apple's Swift it's very obvious that a lot can be done simply by improving the memory interface of ARM's Cortex A9. It's possible that ARM addressed that shortcoming while balancing out the gains by removing other performance enhancing elements of the core.
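To put the die area claim in perspective, here is a quick back-of-the-envelope calculation. It takes ARM's figures at face value (equal performance, roughly 60% of the area, same process node); the function name is just for illustration.

```python
def perf_per_area_ratio(relative_perf, relative_area):
    """Performance per unit of die area, relative to a baseline core
    whose performance and area are both normalized to 1.0."""
    return relative_perf / relative_area


# Cortex A53 vs. Cortex A9, per ARM's claim: same performance, ~60% of the area.
ratio = perf_per_area_ratio(relative_perf=1.0, relative_area=0.6)
print(f"A53 performance per unit area vs. A9: {ratio:.2f}x")  # prints 1.67x
```

In other words, if the claim holds, the A53 delivers roughly two thirds more performance per unit of die area than the A9.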

Both CPU cores are able to run 32-bit and 64-bit ARM code, as well as a mix of both so long as the OS is 64-bit.

Completed Cortex A57 and A53 core designs will be delivered to partners (including AMD and Samsung) by the middle of next year. Silicon based on these cores should be ready by late 2013/early 2014, with production following 6 - 12 months after that. AMD claimed it would have an ARMv8-based Opteron in production in 2014, which seems possible (although aggressive) based on what ARM told me.

ARM expects the first designs to appear at 28nm and 20nm. There's an obvious path to 14nm as well.

It's interesting to note ARM's commitment to big.LITTLE as a strategy for pushing mobile SoC performance forward. I'm curious to see how the first A15/A7 designs work out. It's also good to see ARM not letting up on pushing its architectures forward.


  • A4i - Wednesday, October 31, 2012 - link

    "Today's high end" means APQ8064, Apple A6/A6x and Exynos 5250. APQ8064 SoC is in LG Optimus G and Nexus 4. Apple A6/A6x is in iPhone 5 and iPad 4. Exynos 5250 is in Nexus 10 and Chromebook 3. LG Optimus G score in Linpack benchmark is 608 MFLOPS and that is still without NEON optimisation. NEON is a 128-bit wide SIMD, roughly twice the size of a single Krait CPU core.
  • Wilco1 - Wednesday, October 31, 2012 - link

    The 3x performance gain is over current high-end mobiles (Galaxy Note 2), not tablets or laptops - I think it will take a few months before we'll see A15 based phones.

    The penultimate slide shows the A15 is going to give about 2x gain, and the A57 gives another 50% again (this includes frequency increases). A 3x gain in less than 24 months is amazing. It means phones are approaching Sandy Bridge levels of performance!
  • A4i - Wednesday, October 31, 2012 - link

    Yep, 20-30% faster than A15 in 32-bit code, presumably at the same frequency.
  • Charbax - Wednesday, October 31, 2012 - link
  • blanarahul - Thursday, November 1, 2012 - link

    I can't wait to see a server with PFLOP/s level of computing power using the ARM Cortex A57. Let's see if it can break the long standing record of Blue Gene/Q 16 core Power PC in perf/watt.
  • quirksNquarks - Friday, November 2, 2012 - link

    Imagine what ARM and Licensees could do with a 77w TDP like what Intel i7s have.

    A lot of people are confused.. it seems. Scaling is not per-core IPC based. It is system based. ARM IP designs are improving on a System Level and not Manufacturing (like Intel).

    ARM already owns the mobile market - FACT.
    Intel would like to be a player in the mobile market - FACT.

    But it is far easier to make technology perform faster when given higher Thermal Design Envelopes - than to remain Fast pushing down Thermal Design Envelopes.

    ARM are not worried about the mobile space/niche product markets... THEY OWN IT.
    ARM are looking to push their IPs into territory they haven't been yet. (Server - Desktop - Laptop etc).

    It's no coincidence Intel were pissed off when Apple (awhile ago) were talking to AMD. Hence, launching the Ultrabook initiative. It's no coincidence Nvidia and Intel haven't gotten along.

    Why hasn't anyone noticed that now - the Worlds 2 largest GPU companies are ARM licensees?
    An area Intel have always been behind the curve.
    AMD - Nvidia ..lets not forget the magnitude of other tech companies already onboard..
    Apple - Calxeda - Samsung - Qualcomm - etc

    Hello!! efficient processing Cores (ARM) with system TDP room for all out iGPUs ONCHIP!!!!!! memory controllers - I/O controllers etc. fiber connects new ethernet protocols. Beyond the AMD APU. Remembering ARM designs are Configurable to suit the need. Intel not so much. Intel is what you see is what you get. Fast processing with compromise.

    16 3 ghz A57 core laptops running High-end Geforce/Quadro Radeon/FirePro (thousands of stream processors) iGPUs anyone? in Thermals fit for 13" notebooks lol.. oh baby!

    Tablets and Phones are incredible gadgets to dick around with and do light workloads.BUT will NEVER be something tangible to do anything critical on. (Small Screens hinder any workflow - regardless of light or heavy based). BUT they PAY the BILLS because are sold in such HUGE volume and why they seem to be where technology is headed. It is - but as controllers for larger environments (cloud - applications etc).

    People want Smarter more Complex Applications (beyond what is offered today) - you won't get that on a Phone. Trickle down effect will grant you some really cool stuff you can do with them ...but. on 5" screens? Tablets same deal... 12" tablets are very constrictive UI wise. People only have 2 hands. One has to hold the device! or why bother having a tablet at all if not mobile.

    With the announcement of ARMv8 and 64-bit --- x86 and ARM -- are on a level playing field - instruction set wise (efficiency) and access to hardware based trickery.

    Picture - lower priced Ultrabooks/Sub-Notebooks/Full size Laptops/Low Watt Desktops/HTPCs/Workstations etc with processing powers of a High Dollar Workstation/Server at 1/50th the prices and power consumption. !!


    With Billions of devices on the same Architecture (ARM instruction). The Software Engineering possibilities are Endless. From your phone to your laptop to your desktop to your TV and your modems and your vehicle entertainment systems. hello!! ONE standard!!!! hasn't that been the DREAM all along?

    Linux/ARM and their Licensees are the future in making this all happen. Harmony between the End User and their Technology Environment. it will happen - biggest question is when.

    when will Dinosaurs like Intel and Microsoft - get out of the way and let it? probably never - they make good money in not doing so.

    anyone seen my meds?

  • Creig - Friday, November 2, 2012 - link

    Somebody please kill this spammer's account. He's posting the same message across multiple articles.
  • blanarahul - Friday, November 9, 2012 - link

    I am interested in how Cray and AMD are going to implement this. And how Intel is going to respond (an improved version of Knights Corner is the most likely option).
  • Achtung_BG - Thursday, November 8, 2012 - link

    cortex A53 - 2.3 DMIPS/MHz OK
    cortex A57 - 4.1 DMIPS/MHz?
  • Biscuit - Tuesday, November 13, 2012 - link

    I commend Intel for the work they have done with the x86 architecture. They have done an amazing job at making such an abortion of an instruction set execute incredibly quickly. The penalty of this legacy is large die size, and ultimately higher power consumption.

    AMD should be shot through the head for x64. They had a prime chance to clean up this awful chip. If they had made a decent (set sized) instruction stream, they would have been able to make much more power efficient chips in the future, when x86 support would have finally gone the way of the Dodo. But, no, they just tagged 64 bit on to the x86 instruction set and fucked up Intel's and their own power consumption future.

    Intel will improve process, but ARM will be just a single step behind them on that. As soon as the chip fabs switch to 20nm and lower, Intel's "power" advantage on process size reduction will be gone.

    ARM chips, from the get-go have been elegant and have been designed with power consumption in mind for years. Now we're getting some much higher order features such as out-of-order execution, multiple execution pipelines, etc.

    They have a good plan. The big.LITTLE concept is yet to play out, but I think it's a good path. They have two options: High performance/higher power or low performance/lower power. But the difference is that their high power modes consume less power than the lowest power mode an Intel chip can do.

    The ARM64 architecture is a clean break. They've taken this chance to see what made "out of order" more difficult on the ARM32 platform and improve upon it using all the knowledge gained over the last 20 years in CPU design, since they designed the original ARM.

    The key: Power consumption, power density. In the server space, this will be key. It will lead to processor densities like we haven't seen before with catastrophic drop off in power consumption in the data centers (ah, maybe that's just my naivety showing).

    But it's a good chip, and I can't wait to get working on it.

