A Broadwell Retrospective Review in 2020: Is eDRAM Still Worth It?

Author: Dr. Ian Cutress

by Dr. Ian Cutress on November 2, 2020 11:00 AM EST

Intel’s first foray into 14nm was with its Broadwell product portfolio. It launched into the mobile market with a variety of products, however the desktop offering in 2015 was extremely limited: only two socketed desktop processors ever made it to retail, and in limited quantities. This was despite users waiting for a strong 14nm update to Haswell, and was partly a result of the way Intel built the chip. Alongside the processor was 128 MB of eDRAM, a sort of additional cache between the CPU and the main memory. It caused quite a stir, and we’re retesting the hardware in 2020 to see if the concept of eDRAM is still worth the effort.

eDRAM: The Savior

In recent years, Intel has pushed hard its infamous ‘Pyramid of Optane’, designed to showcase the tradeoff between small amounts of low-latency cache memory close to the CPU and the large offline storage that sits at a significant access time. When a processor requires data and instructions, it navigates this hierarchy, with the goal of having as much of what is required as close to the CPU (and therefore as fast) as possible.

Traditional modern x86 processors contain three levels of caches, each growing in size and latency, before reaching main memory, and then out to storage. What eDRAM does is add a fourth layer between the L3 cache on the processor and the main memory. Whereas the L3 is measured in single digit megabytes, the eDRAM is in the 10s-100s of megabytes, and DRAM measures in gigabytes. Whereas the L3 cache is located on the processor die and low latency, the eDRAM is slightly higher latency, and the main memory is on modules outside the processor socket at the highest latency. Intel enabled an ‘eDRAM’ layer as a separate piece of silicon in the processor package, up to 128 MiB, offering latency and bandwidth between the L3 and main memory.

This piece of silicon was built on Intel’s 22nm SoC manufacturing process rather than 14nm, due to Intel’s ability to drive higher frequencies on 22nm at the time.

By keeping the eDRAM as a separate piece of silicon, it allowed Intel to adjust stock levels based on demand – if the product failed, there would still be plenty of smaller CPU die for packaging. Even today, processors made with extra eDRAM use the same die as seen back in 2013-2015, showing the longevity of the product. The first eDRAM products were mobile under the 22nm Haswell microarchitecture, but Broadwell saw it come to desktop.

On the Broadwell processors, this resulted in a memory access layer with the following performance:

Broadwell Cache Structure
AnandTech	Size	Type	Latency	Bandwidth
L1 Cache	32 KiB / core	Private	4-cycle	880 GiB/s
L2 Cache	256 KiB / core	Private	12-cycle	350 GiB/s
L3 Cache	6 MiB	Shared	26-50 cycle	175 GiB/s
eDRAM	128 MiB	Shared	< 150 cycle	50 GiB/s
DDR3-1600	Up to 16 GiB	Shared	200+ cycle	25.6 GiB/s

The simplistic view of this eDRAM was as a ‘level 4’ cache layer – this is ultimately how it was described to us at the time, with the eDRAM layer acting as a victim cache accepting L3 evictions but enabled through a shadow tag system accessed through the L3. Data needed from the eDRAM would have to be moved back into L3 before going anywhere else, including the graphics or the other IO or main memory. In order to do this, these shadow tags required approximately 0.5 MiB/core of the L3 cache, reducing the L3 usefulness in exchange for lower latency extending out to 128 MiB. This is why Broadwell only had 1.5 MiB/core of L3 cache, rather than the full 2.0 MiB/core that the die shot suggested it should have.
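The shadow tag cost can be checked with a little arithmetic. The sketch below uses only the per-core figures quoted above; the function name and the assumption that every core gives up the same amount of L3 are ours, for illustration.

```python
# Sketch: usable L3 on a quad-core Broadwell with eDRAM, using figures from the text.
# Assumption: each core gives up the same share of its L3 slice for shadow tags.
CORES = 4
L3_PER_CORE_MIB = 2.0           # physical L3 per core suggested by the die shot
SHADOW_TAG_PER_CORE_MIB = 0.5   # approximate L3 reserved per core for eDRAM shadow tags


def usable_l3_mib(cores, l3_per_core, tag_per_core):
    """Return the L3 capacity left for data after reserving shadow-tag space."""
    return cores * (l3_per_core - tag_per_core)


physical = CORES * L3_PER_CORE_MIB
usable = usable_l3_mib(CORES, L3_PER_CORE_MIB, SHADOW_TAG_PER_CORE_MIB)
print(f"Physical L3: {physical:.1f} MiB, usable L3: {usable:.1f} MiB")
```

With four cores this works out to the 6 MiB of L3 listed in the cache table, against 8 MiB physically present on the die.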

Haswell/Broadwell eDRAM Layout

The eDRAM could be dynamically split on the fly for CPU or GPU requests, allowing it to be used in CPU-only mode when the integrated graphics are not in use, or entirely for the GPU when texture caching is required. The interface was described to us at the time as a narrow double-pumped serial interface capable of 50 GiB/s bi-directional bandwidth (100 GiB/s aggregate), running at a peak 1.6 GHz.

This configuration, in combination with the graphics drivers, allowed for more granular control of the eDRAM, suggesting that the system could pull from both the eDRAM and the DDR memory simultaneously, potentially giving a peak memory bandwidth of 75.6 GiB/s (50 GiB/s from the eDRAM plus 25.6 GiB/s from DDR3-1600), at a time when mid-range graphics cards such as the GT650M had a bandwidth around 80 GiB/s.

The second generation of the eDRAM design, as found in Skylake and future processors, moved the eDRAM out of the purview of the L3 cache, and enabled it as a purely transparent buffer between the system agent and the main DRAM memory controller, making it invisible to CPU/GPU accesses or IO accesses. This allows the cache to be accessed by all DRAM requests, enabling full coherency (although the drivers still allow it to be bypassed for textures larger than the eDRAM size), as well as removing the 0.5 MiB/core L3 cache reduction for shadow tags.

Skylake-and-beyond eDRAM Layout

There are arguments to be made about whether the eDRAM as an L4 victim cache or as a transparent buffer to DRAM is the correct direction to go. As a victim cache, Intel stated it allowed a cache hit rate over 95%; however, in a number of scenarios it required software intervention to get the best performance, and a lot of software was not aware of such a configuration. As a buffer, it enabled seamless integration that all software can take advantage of, but it is not necessarily as optimizable as an L4 victim cache.
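To put a rough number on what a 95% hit rate buys, here is a back-of-the-envelope sketch of the average latency for accesses that miss the L3. It is a simplified model of our own, not Intel's: it takes the table's "< 150 cycle" and "200+ cycle" figures as fixed values, and assumes an eDRAM miss costs only the DRAM latency (plausible because the tags live in the L3, but still an assumption).

```python
# Sketch: average latency of accesses that miss L3, as a function of eDRAM hit rate.
# Assumptions (ours): fixed latencies taken from the cache table's bounds, and a
# miss in eDRAM pays only the DRAM latency with no extra lookup penalty.
EDRAM_LATENCY_CYCLES = 150   # table lists "< 150 cycle"; used here as an upper bound
DRAM_LATENCY_CYCLES = 200    # table lists "200+ cycle"; used here as a lower bound


def avg_latency_past_l3(edram_hit_rate,
                        edram_cycles=EDRAM_LATENCY_CYCLES,
                        dram_cycles=DRAM_LATENCY_CYCLES):
    """Weighted average of eDRAM and DRAM latency for a given eDRAM hit rate."""
    if not 0.0 <= edram_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return edram_hit_rate * edram_cycles + (1.0 - edram_hit_rate) * dram_cycles


for rate in (0.0, 0.50, 0.95):
    cycles = avg_latency_past_l3(rate)
    print(f"eDRAM hit rate {rate:.0%}: ~{cycles:.1f} cycles on average")
```

Because both latencies are bounds rather than measurements, the real gap between the eDRAM and no-eDRAM cases is likely to be wider than this model shows.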

‘Go Big or Go Home’

For Broadwell’s eDRAM products, Intel enabled a 128 MiB implementation, quadruple that found on Xbox One silicon at the time. Intel said that a 32 MiB eDRAM L4 victim cache enabled substantial hit rates, but the company wanted the design to be futureproof as well as a long-term option in Intel’s product stack, so it was doubled, and doubled again just to be sure. The term was ‘go big or go home’, and in our initial review of the first eDRAM products, Anand noted that it was very rare to see Intel be so ‘liberal’ with die area.

The eDRAM silicon was built on the 22nm SoC process, one node behind Intel’s leading edge CPU designs. The 128 MiB design came in at a die size of ~77 mm², contributing just under 30% of the total silicon area in the 14nm Broadwell Iris Pro quad-core processor package (182 mm² + 77 mm² = 259 mm²).
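The share of package silicon taken by the eDRAM follows directly from the two die sizes quoted above. A minimal sketch, using only those two numbers (the variable names are ours):

```python
# Sketch: how much of the package silicon is eDRAM, from the die sizes in the text.
CPU_DIE_MM2 = 182     # 14nm Broadwell quad-core die with Iris Pro graphics
EDRAM_DIE_MM2 = 77    # approximate 128 MiB eDRAM die

total_mm2 = CPU_DIE_MM2 + EDRAM_DIE_MM2
share_of_total = EDRAM_DIE_MM2 / total_mm2       # eDRAM as a fraction of all silicon
relative_to_cpu = EDRAM_DIE_MM2 / CPU_DIE_MM2    # eDRAM relative to the CPU die alone

print(f"Total silicon: {total_mm2} mm^2")
print(f"eDRAM share of total: {share_of_total:.1%}")
print(f"eDRAM relative to CPU die: {relative_to_cpu:.1%}")
```

Either way of counting, the eDRAM is a large commitment of silicon for a part that holds no compute.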

In the subsequent Skylake generation, eDRAM models with 64 MiB were also offered.

Under certain constraints, the system could save power by disabling the main memory controller entirely if all the data required over a period of time is available in the eDRAM. As part of the initial Broadwell launch, Intel described the extra power consumption of the eDRAM as under 1 watt at idle, moving up to a peak of 5 watts when operating at full bandwidth. Ultimately this means that at a chip level, less power is available to the cores should it be needed, but the trade-off will be better performance in memory limited scenarios. The power is meant to be tracked by the on-die PCU, or Power Control Unit, that can shift power budget between the CPU, GPU, eDRAM, as needed by performance counters or thermals.

As part of this review, we are able to give at least some insights into this number. In our testing, we saw idle package power numbers for the following processors:

Core i7-4790S (22nm Haswell 4 core 6 MiB L3): 6.01 W
Core i7-5775C (14nm Broadwell 4 core 6 MiB L3 + 128 MiB eDRAM): 9.71 W
Core i7-6700K (14nm Skylake 4 core 8 MiB L3): 6.46 W

These numbers would suggest that the effect of the eDRAM, at idle, is more akin to 3.3-3.7 watts, not the sub 1-watt that Intel suggested. Perhaps that sub 1-watt value was more for mobile processors? When running at a steady-state full load, the processors reported power values of their TDP, which doesn’t enable any insight.
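The estimate above is simply the difference between the Broadwell idle figure and each of its neighbours. A minimal sketch using the three measurements listed; attributing the whole difference to the eDRAM is the same simplification made in the text, since the chips also differ in process and microarchitecture.

```python
# Sketch: idle package power deltas from the three measurements in the text.
# Assumption: the whole difference is attributed to the eDRAM, which ignores
# process-node and microarchitecture differences between the chips.
idle_package_w = {
    "Core i7-4790S": 6.01,   # 22nm Haswell, no eDRAM
    "Core i7-5775C": 9.71,   # 14nm Broadwell, 128 MiB eDRAM
    "Core i7-6700K": 6.46,   # 14nm Skylake, no eDRAM
}


def idle_delta_w(edram_part, baseline_part, readings=idle_package_w):
    """Return the idle power of edram_part minus baseline_part, in watts."""
    return round(readings[edram_part] - readings[baseline_part], 2)


for baseline in ("Core i7-4790S", "Core i7-6700K"):
    delta = idle_delta_w("Core i7-5775C", baseline)
    print(f"i7-5775C vs {baseline}: +{delta:.2f} W at idle")
```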

Broadwell’s eDRAM Flop?

Intel had somewhat backed itself into a corner with its Broadwell launch. Due to the delays of Intel’s 14nm process at the time, the company had decided to follow its popular Haswell-based 22nm Core i7-4770K high-end processor with the launch of a higher binned ‘Devil’s Canyon’ processor, the Core i7-4790K. This processor offered +500 MHz, which at the time was a substantial jump in performance, despite the processors being launched 12 months apart.

Devil’s Canyon Review: Intel Core i7-4790K and i5-4690K

Because Broadwell ‘wasn’t ready’, Devil’s Canyon was designed to be a stop-gap measure to appease Intel’s ever-hungry consumers and high-end enthusiasts. From the consumer point of view, Devil’s Canyon was at least a plus, but it gave Intel a significant headache.

By bumping the clock speed of its leading consumer processor by a significant margin, Intel now had a hill to climb – the goal of a new product generation is that it should be better than what came before. By boosting its previous best to be even better, it meant the next generation had to do even more. This is difficult to do when the upcoming process node isn’t working quite right. This meant that in the land of the desktop processor, Intel’s reluctance to launch Broadwell with eDRAM was painful to see, and the company had to change strategy.

Intel almost made Broadwell for desktops a silent launch, with very little fanfare. After the announcement, there was almost zero stock on shelves. At the time, Intel did not sample the processors for review – we were able to obtain units from other sources a few days in advance for our launch day coverage.

The Intel Broadwell Desktop Review: Core i7-5775C and Core i5-5675C Tested (Part 1)
The Intel Broadwell Review Part 2: Overclocking, IPC and Generational Analysis

By launching Broadwell Core i7 as a 65 W processor rather than an 84-88 W processor, it meant that the lower frequency Broadwell wasn’t necessarily a direct comparison to Devil’s Canyon. It came out of the gate with a frequency deficit, however the presence of the eDRAM would enable some very careful wins in memory limited scenarios, and perhaps most importantly, gaming.

Ultimately the stunted launch of desktop Broadwell on June 2nd 2015 was very quickly followed by the launch of Skylake on August 5th 2015, and the top Core i7 processor was once again an 88+ watt unit and a true like-for-like competitor to Devil’s Canyon. Skylake also enabled DDR4 in the market, which was a significant upgrade on the memory front.

Unfortunately Intel had another conundrum: the older Broadwell processors, due to the eDRAM, actually offered slightly better gaming performance than Skylake! It was title, resolution, and quality dependent, and some might argue there were only a few percentage points in it, but for those that wanted the best at gaming, Skylake wasn’t necessarily the answer. For pretty much all CPU tasks though, Skylake was the answer.

Broadwell Still Available Today

Ultimately, Intel’s foray into socketed Broadwell processors with eDRAM was a momentary blip in its line of consumer-focused Core products. At the time, the processors were hard to find for sale, and were quickly made old by the arrival of Skylake and DDR4. There were six different Broadwell processors that were socketable, two mainstream Core products and four Xeon E3 parts.

Intel Broadwell eDRAM Socketable CPUs
AnandTech	Cores / Threads	Base Freq (MHz)	Turbo Freq (MHz)	IGP	IGP Freq (MHz)	TDP
Consumer Core
i7-5775C	4C / 8T	3300	3700	48 EUs	1150	65 W
i5-5675C *	4C / 4T	3100	3600	48 EUs	1100	65 W
* Sometimes listed as Core i7-5675C as some ES had an incorrect CPUID string
Enterprise Xeon E3 v4
E3-1285 v4	4C / 8T	3500	3800	48 EUs	1150	95 W
E3-1285L v4	4C / 8T	3400	3800	48 EUs	1150	65 W
E3-1270L v4	4C / 8T	3000	3600	-	-	45 W
E3-1265L v4	4C / 8T	2300	3300	48 EUs	1050	35 W

We were able to also review three of the Xeons at the time.

The Intel Broadwell Xeon E3 v4 Review: 95W, 65W and 35W with eDRAM

Most of these processors are actually very easy to purchase today. The best places to find them are AliExpress or eBay, for as little as $104.

Broadwell in 2020

The main highlight of these processors was the high-speed eDRAM, coming up to 50 GiB/s bidirectional, at a time when the DDR3-1600 memory solution in dual channel could only offer 25.6 GiB/s. At some point in the future, it would be expected for the speed of normal DRAM to surpass this bandwidth offered, even if it can’t exactly match that latency.

We actually reached that mark very recently.

Intel’s best consumer-grade processor is the Intel Core i9-10900K, offering 10 cores up to a peak 5.3 GHz, but most importantly the memory side has official support for DDR4-2933, which in dual channel mode would enable 46.9 GiB/s.
Current AMD Zen 2 processors have a peak supported frequency of DDR4-3200, which in dual channel mode would enable 51.2 GiB/s bandwidth.
Intel’s mobile Tiger Lake processors support LPDDR4X-4266, which when fully populated would provide 68.2 GiB/s bandwidth.
With the introduction of DDR5 set to come in the next couple of years, we are expecting to see DDR5-4800 as a possible entry point. This would enable 38.4 GiB/s per 64-bit channel, or 76.8 GiB/s in a standard consumer configuration.
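The figures in this list all come from the same formula: transfer rate multiplied by bus width. The sketch below reproduces them; the function and the 128-bit bus widths are our assumptions (dual channel is taken as 2 x 64-bit, and "fully populated" LPDDR4X as 128 bits in total). Note that the results are in decimal gigabytes per second, which is what the numbers quoted in this article correspond to.

```python
# Sketch: theoretical peak DRAM bandwidth = transfers per second x bytes per transfer.
# Assumptions (ours): dual channel means 2 x 64-bit, and a fully populated
# LPDDR4X-4266 configuration is 128 bits wide in total.


def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits):
    """Return peak bandwidth in decimal GB/s for a given MT/s rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer / 1000


configs = {
    "DDR3-1600, dual channel": (1600, 128),
    "DDR4-2933, dual channel": (2933, 128),
    "DDR4-3200, dual channel": (3200, 128),
    "LPDDR4X-4266, 128-bit": (4266, 128),
    "DDR5-4800, single 64-bit channel": (4800, 64),
    "DDR5-4800, dual channel": (4800, 128),
}

EDRAM_GBS = 50.0  # per-direction eDRAM bandwidth quoted for Broadwell

for name, (rate, width) in configs.items():
    bandwidth = peak_bandwidth_gbs(rate, width)
    verdict = "above" if bandwidth > EDRAM_GBS else "below"
    print(f"{name}: {bandwidth:.2f} GB/s ({verdict} the eDRAM's 50)")
```

These are theoretical peaks; sustained bandwidth in real workloads is lower for both DRAM and eDRAM.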

Perhaps it is difficult to wrap your head around the fact that only in 2020 are we matching bandwidth levels that were enabled back in 2015 by the addition of a simple piece of silicon. It might make you question why Broadwell was the only family of Intel’s socketable processors to get this innovation: all future eDRAM products were for mobile devices that rely on integrated graphics, despite the benefits observed for discrete graphics configurations.

It should be noted that because eDRAM offers a latency benefit in memory accesses from 6 MiB to 128 MiB, then as we approach the situation where a single core has access to 128 MiB of L3 cache, this benefit would also disappear. For consumer processors, we’re not there quite yet – while Intel processors offer up to 20 MiB (or 24 MiB in upcoming Tiger Lake 8-core processors), AMD’s future Zen 3 processors will offer access to 32 MiB of L3 for each core within a CCX. By that metric, we’re still very far behind.

For this review, because we recently tested Intel’s Tiger Lake quad-core processors and graphics, I wanted to probe exactly where Broadwell will finally sit in the hierarchy of CPU performance and graphics performance. We recently announced a new benchmark and gaming suite, and Broadwell is always one of the interesting products to put on a new test suite.

All integrated gaming tests (as well as gaming tests with an RTX 2080 Ti) will be under the respective game pages.

Pages In This Review

Analysis and Competition
Test Setup and #CPUOverload Benchmarks
Power Consumption
CPU Tests: Office and Science
CPU Tests: Simulation
CPU Tests: Rendering
CPU Tests: Encoding
CPU Tests: Legacy and Web Tests
CPU Tests: Synthetics
CPU Tests: SPEC
CPU Tests: Microbenchmarks
Gaming: Chernobylite
Gaming: Civilization VI
Gaming: Deus Ex: MD
Gaming: Final Fantasy XIV
Gaming: Final Fantasy XV
Gaming: World of Tanks
Gaming: Borderlands 3
Gaming: F1 2019
Gaming: Far Cry 5
Gaming: Gears Tactics
Gaming: GTA 5
Gaming: Red Dead Redemption 2
Gaming: Strange Brigade
Conclusions and Final Words

120 Comments

dsplover - Tuesday, November 3, 2020 - link
For Digital Audio applications the i7-5775C @ 3.3GHz was incredible when disabling the Iris GFX, turning the cache over to audio, then running a discrete GFX card.

Bested my i7 4790k’s.
Tried OC’ing but even with the kick-butt Supermicro H70 it was unstable as the Ring Bus/L4 would also clock up and choked @ 2050MHz.

This rig allowed really tight low latency timings and I prayed they would release future designs with a larger cache.
AMD beat them to it w/Matisse which was good for 8 core only.

The new 5000s are going to be Digital Audio dreams @ low wattage.

Intel just keeps lagging behind.
ironicom - Tuesday, November 3, 2020 - link
fps is irrelevant in civ; turn time and load time are what matter.
vorsgren - Tuesday, November 3, 2020 - link
Thanks for using my benchmark! Hope it was useful!
Nictron - Wednesday, November 4, 2020 - link
Which benchmark was that?
erotomania - Wednesday, November 4, 2020 - link
Google the username.
vorsgren - Wednesday, November 4, 2020 - link
http://www.bay12forums.com/smf/index.php?topic=173...
Oxford Guy - Thursday, November 5, 2020 - link
"The Intel skew on this site is getting silly its becoming an Intel promo machine!"

Yes. An article that exposes how much Intel was able to get away with sandbagging because of our tech world's lack of adequate competition (seen in MANY tech areas to the point where it's more the norm than the exception) — clearly such an article is showing Intel in a good light.

If you were an Intel shareholder.

For everyone else (the majority of the readers), the article condemns Intel for intentionally hobbling Skylake's gaming performance. ArsTechnica produced an article about this five years ago when it became clear that Skylake wasn't going to have EDRAM.

The ridiculousness of the situation (how Intel got away with charging premium prices for horribly hobbled parts — $10 worth of EDRAM missing, no less) really shows the world's economic system particularly poorly. For all the alleged capitalism in tech, there certainly isn't much competition. That's why Intel didn't have to ship Skylake with EDRAM. Monopolization (and near-monopoly) enables companies to do what they want to do more than anything else: sell less for more. As long as regulators are toothless and/or incompetent the situation won't improve much.
erikvanvelzen - Saturday, November 7, 2020 - link
Ever since the Pentium 4 Extreme Edition I've wondered why intel does not permanently offer a top product with a large L3 or L4 cache.
abufrejoval - Monday, November 9, 2020 - link
Just picked up a NUC8i7BEH last week (quad i7, 48EU GT3e with 128MB eDRAM), because they dropped below €300 including VAT: A pretty incredible value at that price point and extremely compatible with just about any software you can throw at it.

Yes, Tiger Lake NUC11 would be better on paper and I have tried getting a Ryzen 7-4800U (as PN50-BBR748MD), but I've never heard of one actually shipped.

It's my second NUC8i7BEH, I had gotten another a month or two previously, while it was still at €450, but decided to swap that against a hexa-core NUC10i7FNH (24EU no eDRAM) at the same price, before the 14-days zero-cost return period was up. GT3e+quad-core vs. GT2+hexa-core was a tough call to make, but actually both run really mostly server loads anyway. But at €300/quad vs €450/hexa the GT3e is quite simply for free, when the silicon die area for the GT3e/quad is in all likelihood much greater than for the GT2/hexa, even without counting the eDRAM.

My Whiskey-lake has 200MHz less top clock than the Comet-lake, but that doesn't show in single core results, where the L4 seems to put Whiskey consistently into a small lead.

GT3e doesn't quite manage to double graphics performance over GT2, but I am not planning to use either for gaming. Both do fairly well at 4k on anything 2D, even Google Map's 3D renders do pretty well.

BTW: While Google Earth Pro's Flight simulator actually gives a fairly accurate representation of the area where I live, it doesn't do great on FPS, even with an Nvidia GPU. By contrast Microsoft latest and greatest is a huge disappointment when it comes to terrain accuracy (buildings are pure fantasy, not related at all to what's actually there), but delivers ok FPS on my RTX2080ti. No, I didn't try FlightSim on the NUCs...

However, the 3D rendering pipeline Google has put into the browser variant of Google Maps, beats the socks off both Google Earth Pro and Microsoft Flight: With Chrome leading over Firefox significantly, the 3D modelled environment is mind-boggling even on the GT2 at 4k resolutions, it's buttery smooth on GT3e. A browser based flight simulator might actually give the best experience overall, quite hard to believe in a way.

It has me appreciate how good even iGPU graphics could be, if code was properly tuned to make do with what's there.

And it exposes just how bad Microsoft Flight is with nothing but Bing map data underneath: Those €120 were a full waste of money, but I just saved those from buying the second NUC8 later.
