Instruction Sets: Alder Lake Dumps AVX-512 in a BIG Way

One of the big questions we should address here is how the P-cores and E-cores have been adapted to work inside a hybrid design. One of the critical aspects in a hybrid design is if both cores support different levels of instructions. It is possible to build a processor with an unbalanced instruction support, however that requires hardware to trap unsupported instructions and do core migration mid-execution. The simple way to get around this is to ensure that both types of cores have the same level of instruction support. This is what Intel has done in Alder Lake.

In order to get to this point, Intel had to cut down some of the features of its P-core, and improve some features on the E-core. The biggest thing that gets the cut is that Intel is losing AVX-512 support inside Alder Lake. When we say losing support, we mean that the AVX-512 is going to be physically fused off, so even if you ran the processor with the E-cores disabled at boot time, AVX-512 is still disabled.

Intel’s journey with AVX-512 has been long and fragmented. Some workloads can be vectorised – multiple bits of consecutive data all require the same operation, so you can pack them into a single register and perform it all at once with a single instruction. Designed as its third generation of vector instructions (AVX is 128-bit, AVX2 is 256-bit, AVX512 is 512-bit), AVX-512 was initially found on server processors, then mobile, and we found it in the previous version of desktop processors. At the time, Intel stated that by enabling AVX-512 on its processor line from top to bottom, it would encourage greater adoption, and they were leaning hard into this missive.

But that all changes with Alder Lake. Both desktop processors and mobile processors will now have AVX-512 disabled in all scenarios. But the silicon will still be physically present in the core, only because Intel uses the same core in its next generation server processors called Sapphire Rapids. One could argue that if the AVX-512 unit was removed from the desktop cores that they would be a lot smaller, however Intel has disagreed on this point in previous launches. What it means is that for the consumer parts we have some extra dark silicon in the design, which ultimately might help thermals, or absorb defects.

But it does mean that AVX-512 is probably dead for consumers.

Intel isn’t even supporting AVX-512 with a dual-issue AVX2 mode over multiple operations - it simply won’t work on Alder Lake. If AMD’s Zen 4 processors plan to support some form of AVX-512 as has been theorized, even as dual-issue AVX2 operations, we might be in some dystopian processor environment where AMD is the only consumer processor on the market to support AVX-512.

On the E-core side, Gracemont will be Intel’s first Atom processor to support AVX2. In testing with the previous generation Tremont Atom core, at 2.9 GHz it performed similarly to a Haswell 2.9 GHz Celeron processor, i.e. identical in non-AVX2 situations. By adding AVX2, plus fundamental performance increases, we’re told to expect ‘Skylake-like performance’ from the new E-cores. Intel also stated that both the P-core and E-core will be at ‘Haswell-level’ AVX2 support.

By enabling AVX2  on the E-cores, Intel is also integrating support for VNNI instructions for neural network calculations. In the past VNNI (and VNNI2) were built for AVX-512, however this time around Intel has done a version of AVX2-VNNI for both the P-core and E-core designs in Alder Lake. So while AVX-512 might be dead here, at least some of those AI acceleration features are carrying over, albeit in AVX2 form.

For the data center versions of these big cores, Intel does have AVX-512 support and new features for matrix extensions, which we will cover in that section.

Gracemont Microarchitecture (E-Core) Examined Conclusions: Through The Cores and The Atoms
Comments Locked

223 Comments

View All Comments

  • WaltC - Thursday, August 19, 2021 - link

    Ditto. Wake me when the CPUs ship. Until then, ZZZZ-Z-Z-zzzz. You would think that most people would have grown tired by now of seeing advance info from Intel that somehow never accurate describes the products that do ship.

    Nice to see Anandtech using Intel PR marketing description instead of describing the process node in nm--just because Intel decides that accuracy in advertising really isn't important. Every time I see "Intel's process 7" I cringe...;) indicates the extent to which Intel is rattled these days, I guess.
  • SarahKerrigan - Thursday, August 19, 2021 - link

    "The process node in nm" - which structure size should determine this? What structure's geometry in TSMC is 7nm?
  • kwohlt - Thursday, August 19, 2021 - link

    There's nothing inaccurate at all, considering "TSMC 7nm" and "Intel 10nm" and it's extensions are product names and not measurements. If the next, yet to be released node known as Intel 7 offers a 20% performance/watt improvement over Intel 10nm SuperFin, then that's an improvement metric large enough to justify lowering the number in the name, just as all the other fabs do.
  • mode_13h - Thursday, August 19, 2021 - link

    > Nice to see Anandtech using Intel PR marketing description instead of
    > describing the process node in nm

    More goes into a fab process than just density. Also, because density can be computed different ways and Intel doesn't exactly release the raw data you'd need to properly compute, they have no real choice but to report the manufacturing process as Intel has named it.

    Of course, they should always do so with a link to their article describing what's known about Intel 7.
  • DannyH246 - Thursday, August 19, 2021 - link

    Completely agree. We've had so many articles like this over the last 5 years its not even funny. Never fear though www.IntelTech.com will be here to dutifully report on it as the next best thing.
  • MetaCube - Thursday, August 19, 2021 - link

    Cringe take
  • Wereweeb - Thursday, August 19, 2021 - link

    This will finally use 10nm, so I doubt it. I'm worried about the memory bandwidth and latency tho.

    I'm hopeful that IBM manages to improve upon MRAM until it's a suitable SRAM replacement. DRAM isn't keeping up, so we need more, cheaper L3 cache as a buffer.
  • TheinsanegamerN - Friday, August 20, 2021 - link

    Well it’s a good thing that ddr5 with clock speeds in excess of double what ddr4 can offer, with promising of results triple that of ddr4, are arriving as we speak.
  • mode_13h - Saturday, August 21, 2021 - link

    DDR5 will only help with bandwidth. Every time a new DDR standard comes along, latency (measured in ns) ends up being about the same or worse.

    Bigger L3 helps with both bandwidth and latency, but at a cost (in both $ and W).
  • Spunjji - Monday, August 23, 2021 - link

    If their claims about 15% better power characteristics for 7 are true - and they're not based on some cherry-picked measurements at some unspecified mid-power-level - then they might have the headroom to maintain clocks even with the expanded structures.

    With Ice Lake having been such a flop in this regard, though - and Tiger taking as much as it gave away, depending on power level - I'm with you on waiting to see what they deliver before I get excited. That's in shipping products, too - not some tweaked trial notebook with an unlocked TDP and 100% fan speeds...

Log in

Don't have an account? Sign up now