Intel's new Atom Microarchitecture: The Tremont Core in Lakefield
by Dr. Ian Cutress on October 24, 2019 1:30 PM EST
A Wider Back End
Moving beyond the micro-op queue, Tremont has 8 execution ports, fed from 7 reservation stations.
The only two ports sharing a combined reservation station are the address generation units (AGUs). This is in stark contrast to the Core design, which in Sunny Cove uses a unified reservation station for all integer and floating point calculations and three for the AGUs. Tremont uses a unified reservation station for its two AGUs, backed by extra memory for queued micro-ops, so that it can supply both AGUs with either 2x 16-byte stores, 2x 16-byte loads, or one of each. Intel clearly expects the AGUs on Tremont to be fairly active compared to the other execution ports.
On the integer side, aside from the two AGUs, Tremont has 3 ALUs, a jump port, and a store data port. Each ALU supports a different set of functions: one handles shifts, and another handles multiplication and division. Compared to Core, these ALUs are extremely lightweight, and Intel hasn't gone into specifics here.
On the floating point side, the arrangement is a little more varied: the three ports are split between two ALUs and a store port. Of the two ALUs, one handles floating point additions (FADD), while the other handles floating point multiplication and division (FMUL). Both ALUs support 128-bit SIMD and 128-bit AES instructions with a 4-cycle latency, as well as single-instruction SHA256, also at 4 cycles. There is no 256-bit vector support here. To help with certain calculations, GFNI (Galois Field New Instructions) support is also included.
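As a rough illustration of what GFNI accelerates, the sketch below models the byte-wise multiply performed by its GF2P8MULB instruction: each byte is treated as an element of GF(2^8), and products are reduced by the polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), the same field AES uses. This is a plain Python model under that assumption, not Intel's hardware algorithm; the function names are hypothetical, and the 16-byte width assumes the 128-bit form that matches Tremont's 128-bit SIMD units.

```python
def gf2p8_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        b >>= 1
        carry = a & 0x80     # will multiplying a by x overflow bit 7?
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B        # reduce: low 8 bits of the field polynomial 0x11B
    return result


def gf2p8mulb(xs: bytes, ys: bytes) -> bytes:
    """Byte-wise model of a 128-bit (16-byte) GF2P8MULB operation."""
    if len(xs) != 16 or len(ys) != 16:
        raise ValueError("expected two 16-byte operands")
    return bytes(gf2p8_mul(x, y) for x, y in zip(xs, ys))


print(hex(gf2p8_mul(0x57, 0x83)))  # prints 0xc1
```

The 0x57 × 0x83 = 0xC1 check is the worked multiplication example from the AES specification (FIPS-197), which makes it a convenient sanity test for any GF(2^8) multiply.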
There is also a larger 1024-entry L2 TLB, supporting 1024x 4K entries, 32x 2M entries, or 8x 1G entries. This is an upgrade from the 512-entry L2 TLB in Goldmont.
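TLB reach, the amount of memory the L2 TLB can map before a page walk is needed, is simply the number of entries multiplied by the page size. The sketch below applies that to the three configurations quoted above, treating each as if the TLB held only that page size (an assumption; real workloads mix page sizes). The helper name is hypothetical.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30

# Entry counts and page sizes quoted for Tremont's L2 TLB.
L2_TLB_CONFIGS = [
    ("4K", 1024, 4 * KiB),
    ("2M", 32, 2 * MiB),
    ("1G", 8, 1 * GiB),
]


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of virtual address space mapped by `entries` TLB entries of one page size."""
    return entries * page_size


for name, entries, page_size in L2_TLB_CONFIGS:
    reach = tlb_reach(entries, page_size)
    print(f"{entries} x {name} pages -> {reach // MiB} MiB")
```

This works out to 4 MiB, 64 MiB and 8 GiB of reach respectively.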
New Instructions
As with any generation, Intel adds new instructions, either to accelerate common calculations that would traditionally require many instructions or to add new functionality. Tremont is no different.
| AnandTech | Tremont | Goldmont Plus | Goldmont | Airmont | Silvermont |
|---|---|---|---|---|---|
| Process (nm) | 10+ | 14 | 14 | 14 | 22 |
| Release Year | 2019 | 2017 | 2016 | 2015 | 2013 |
| New Instructions | CLWB GFNI ENCLV CLDEMOTE MOVDIR* TPAUSE UMONITOR UMWAIT | SGX1 UMIP PTWRITE RDPID | RDSEED SMAP MPX XSAVEC XSAVES CLFLUSHOPT SHA | | SSE4.1 SSE4.2 MOVBE CRC32 POPCNT CLMUL AES RDRAND PREFETCHW |
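Software typically checks for these extensions at run time with the CPUID instruction before using them. The Python sketch below decodes the EBX and ECX registers of CPUID leaf 7, sub-leaf 0, for the subset of the table's instructions that are reported there, using bit positions from Intel's Software Developer's Manual. MOVDIR* covers MOVDIRI and MOVDIR64B, and TPAUSE/UMONITOR/UMWAIT share the single WAITPKG bit. ENCLV, PTWRITE and XSAVEC/XSAVES are enumerated through other CPUID leaves and are left out. The function name is hypothetical, and the register values in the example are made up for illustration, not read from a Tremont part.

```python
# Bit positions in CPUID.(EAX=07H, ECX=0), limited to instructions from the
# table that are enumerated in this leaf.
LEAF7_EBX = {
    2: "SGX",
    14: "MPX",
    18: "RDSEED",
    20: "SMAP",
    23: "CLFLUSHOPT",
    24: "CLWB",
    29: "SHA",
}
LEAF7_ECX = {
    2: "UMIP",
    5: "WAITPKG",  # TPAUSE, UMONITOR, UMWAIT
    8: "GFNI",
    22: "RDPID",
    25: "CLDEMOTE",
    27: "MOVDIRI",
    28: "MOVDIR64B",
}


def decode_leaf7(ebx: int, ecx: int) -> list:
    """Return the names of the features whose bits are set in EBX and ECX."""
    found = [name for bit, name in LEAF7_EBX.items() if (ebx >> bit) & 1]
    found += [name for bit, name in LEAF7_ECX.items() if (ecx >> bit) & 1]
    return found


# Hypothetical register values, made up for illustration only.
example_ebx = (1 << 24) | (1 << 29)            # CLWB, SHA
example_ecx = (1 << 5) | (1 << 8) | (1 << 25)  # WAITPKG, GFNI, CLDEMOTE
print(decode_leaf7(example_ebx, example_ecx))
# prints ['CLWB', 'SHA', 'WAITPKG', 'GFNI', 'CLDEMOTE']
```

From C, GCC and Clang expose `__get_cpuid_count()` in `<cpuid.h>` for reading the real register values that such a decoder would consume.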
(When asked what other new instructions are supported, Intel directed us to its published documents about future instructions. When we pointed out that those documents weren't exactly clear, and that in the past Intel hasn't spoken about future designs, we were not afforded any further comment.)
When we get hold of a Tremont device, we’ll do a full instruction breakdown.
101 Comments
Namisecond - Friday, November 1, 2019
Which will be far more important for devices that run Windows.
petr.koc - Friday, October 25, 2019
> "the enterprise side has been dealing with a clock degradation issue that ultimately leaves Atom systems built on C2000 processors unable to boot,"
This is unfortunately not precise, as all Atom Bay Trail processors (desktop, mobile, server), including the 14nm successors manufactured up to approximately 2018, are affected by an LPC circuitry degradation issue that will eventually kill them:
https://en.wikipedia.org/wiki/Silvermont#Erratum
https://en.wikipedia.org/wiki/Goldmont#Erratum
29a - Friday, October 25, 2019
Ugh, I just looked at your links, and I have a NAS box with a J1900. I wonder what can be done to replace it?
MASSAMKULABOX - Thursday, October 31, 2019
Yeah, I'm amazed this didn't bite Intel in the ass much harder; AFAIK Synology and Cisco were both victims, and I'm sure many others were too. So, start by making well-tested, reliable products... and there's no harm in boosting the GFX side of things (2x? 3x?). Give us desktop systems at 10W and lower.
Bigos - Friday, October 25, 2019
> (We therefore assume that a 3.0 MB L2 will be 15-way.)
That is very unlikely. 3.0MB (which is 3 * 1024 * 1024 bytes) is not divisible by 15. I'm sure the 3MB L2$ will be 12-way associative.
1.5MB = 12 * 128kB
3.0MB = 12 * 256kB
4.5MB = 18 * 256kB
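The commenter's check can be made explicit: a cache's capacity equals ways × sets × line size, so a given size and associativity only make sense if the set count comes out as a whole number, and designs usually want it to be a power of two so that the set index is just a bit field of the address. The sketch below assumes 64-byte lines (typical for Intel cores, but an assumption here); the helper names are hypothetical.

```python
KiB = 1024
LINE_BYTES = 64  # assumed cache line size


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES):
    """Number of sets for this size and associativity, or None if the size
    is not a whole multiple of ways * line_bytes."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    return sets if remainder == 0 else None


def is_power_of_two(n) -> bool:
    return n is not None and n > 0 and (n & (n - 1)) == 0


for size_kib, ways in [(1536, 12), (3072, 15), (3072, 12), (4608, 18)]:
    sets = num_sets(size_kib * KiB, ways)
    print(f"{size_kib / 1024:.1f} MiB, {ways}-way -> sets = {sets}, "
          f"power of two: {is_power_of_two(sets)}")
```

Under these assumptions, 1.5 MiB 12-way gives 2048 sets, and both 3.0 MiB 12-way and 4.5 MiB 18-way give 4096 sets, while 3.0 MiB 15-way would need 3276.8 sets, which is not a valid configuration.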
AntonErtl - Friday, October 25, 2019
It's clear that they drop products with low $/area when they do not have enough capacity, but AFAIK that's not the case at the moment for 10nm. On the contrary, they have 10nm capacity and not much demand for Ice Lake (because they cannot get its clock rates and efficiency competitive with the 14nm Skylake derivatives). So building Tremont-based successors to Gemini Lake (where performance is not as critical) would be a way for them to get more revenue out of their 10nm production line(s?); of course they have to design that first, and they may have failed to do so, expecting Ice Lake production to be in full swing by now.
Concerning sucking performance, here are some numbers (run time, lower is better) for our LaTeX benchmark http://www.complang.tuwien.ac.at/franz/latex-bench...
2.368 Intel Atom 330, 1.6GHz, 512K L2 Zotac ION A
1.052 Celeron J1900 (Silvermont) 2416MHz (Shuttle XS35V4)
0.712 Celeron J3455 (Goldmont) 2300MHz, ASRock J3455-ITX
0.540 Celeron J4105 (Goldmont+) 2500MHz
0.200 Core i7-6700K (Skylake), 4200MHz
Skylake has about 1.6 times the IPC of Goldmont+ and allows higher clock rates (at higher power consumption), resulting in significantly better overall performance, but whether that makes Goldmont+ suck depends on the application.
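The "factor 1.6" follows from the benchmark numbers: the run-time ratio gives the overall speedup, and dividing out the clock ratio leaves the per-clock (IPC) advantage. The sketch assumes both machines execute the same instruction count and run at the listed clock for the whole benchmark, which turbo behaviour may not honour exactly; the function name is hypothetical.

```python
# (run time, clock in MHz) from the LaTeX benchmark results listed above.
goldmont_plus = (0.540, 2500)  # Celeron J4105
skylake = (0.200, 4200)        # Core i7-6700K


def ipc_ratio(slow, fast):
    """IPC(fast) / IPC(slow), assuming equal instruction counts and that
    each chip ran at its listed clock for the whole benchmark."""
    t_slow, f_slow = slow
    t_fast, f_fast = fast
    speedup = t_slow / t_fast       # overall speedup from the time ratio
    clock_ratio = f_fast / f_slow   # how much of it is explained by clock
    return speedup / clock_ratio


print(f"{ipc_ratio(goldmont_plus, skylake):.2f}")  # prints 1.61
```

That is 2.7 / 1.68 ≈ 1.61, consistent with the estimate above.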
29a - Friday, October 25, 2019
Decoding video, that's what the other two Atoms I've owned sucked at.
PeachNCream - Friday, October 25, 2019
You keep harping on that, but other people with dissimilar experiences have made claims that run contrary to your statements. Which Atom models, and under what conditions, have you had this problem with? This isn't an issue for anyone else and, frankly, watching video isn't the only thing a computer does, so that complaint may have no impact on the wider range of use cases beyond watching YouTube and Netflix.
Jorgp2 - Friday, October 25, 2019
He probably has an in-order Atom. Pretty much all out-of-order Atoms have hardware decoding acceleration.
GreenReaper - Saturday, October 26, 2019
Or he's trying to decode a video that isn't supported by the hardware, like 10-bit anything until very recently. In fairness, my Bobcat cores struggle with 60FPS anything, and plain Full HD MP4 decode also bogs down if you add anything but the most minimal of shader filters. But they're from ~2011.