Blink and you miss it: AMD's keynote address this year was a whirlwind of primetime announcements for the company. The message is clear: AMD is committing itself to 7nm as the future process node that will drive the company's innovations starting in 2019. The first consumer products on 7nm will be the Ryzen 3rd Generation Desktop processors, using Zen 2 cores, offering more than competitive performance against Intel's best hardware. Also on the docket is a return to high-end graphics performance, with AMD set to release a 7nm graphics card that can spar blow-for-blow with the competition at the $700 price barrier.

AMD at CES 2019

One of the odd things about AMD’s announcements this show has been the tale of two halves. Normally a company will push out a single major press release with everything in it. This year AMD discussed its news around Ryzen 3000-series mobile parts and AMD Chromebooks just as the show started, and we were left wondering whether that was going to constitute what was in the keynote or not – it would seem odd, after all, for the company to pre-announce its keynote announcements. Luckily, AMD had plenty more to announce, and it’s all pretty juicy.

First up, CPUs. AMD presented its next generation 7nm desktop CPU, which is the 3rd Generation Ryzen.

Attacking the Mainstream CPU Market: Toe to Toe with Core i9-9900K

Ignore everything you might have heard about what AMD’s future desktop CPU is going to be. Here are most of the details you need to know.

The new parts, codenamed Matisse, will be coming to market in mid-2019 (sometime in Q2 or Q3). The processor the company had on display was made from two pieces of silicon on the package: one eight-core 7nm chiplet made at TSMC, and a 14nm input/output chiplet with the dual memory controllers and the PCIe lanes, made at GlobalFoundries.

The company did state that it is the world’s first 7nm gaming CPU, and that it will also be the world’s first mainstream CPU to support PCIe 4.0 x16. At this time the company is not commenting on whether the 3rd Gen family tops out at eight cores, or whether this chip represents the best processor of the whole family.

Because the processor is still a few months from launch, frequencies have not been finalized. However, the processor is destined for the AM4 socket, given that AMD has previously said that it intends to keep backwards compatibility for several generations. That means this CPU will work in current 300- and 400-series AMD motherboards.

What this means for PCIe 4.0 is actually fairly simple. We expect a new line of motherboards, presumably something like X570, that will be PCIe 4.0 compatible, ready for any new PCIe 4.0 graphics cards coming to market. One of the constraints with PCIe 4.0 is that it can only drive PCB traces up to about 7 inches before needing a redriver/retimer, so these extra ICs are required for slots and ports lower down the board. The first PCIe slot on most motherboards sits within that limit, however, so it would appear that a lot of current 300- and 400-series motherboards, assuming their traces adhere to the signal integrity specifications, could have their first PCIe slot rated at PCIe 4.0 with a firmware update.

Going For Die Size

As we can see in the die shot above, the 8-core chiplet is smaller than the IO die, similar to the 8+1 chiplet design on EPYC. The IO die is not exactly one quarter of the EPYC IO die, as I predicted might be the case back at the Rome server processor announcement, but somewhere between one quarter and one half.

Doing some measurements on our imagery of the processor, and knowing that an AM4 package measures 40mm on a side, we measure the chiplet at 10.53 x 7.67 mm = 80.80 mm², and the IO die at 13.16 x 9.32 mm = 122.63 mm².
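
For anyone wanting to check the arithmetic, here is a minimal C sketch that reproduces those area figures from the dimensions above, with the 40 mm package edge used purely as the scale reference (small rounding differences against our published numbers are expected):

```c
#include <stdio.h>

int main(void) {
    /* Die dimensions (mm) measured from our package photos, scaled against
       the known 40 mm x 40 mm AM4 package */
    const double chiplet_w = 10.53, chiplet_h = 7.67;
    const double io_w      = 13.16, io_h      = 9.32;

    double chiplet_area = chiplet_w * chiplet_h;   /* ~80.8 mm^2  */
    double io_area      = io_w * io_h;             /* ~122.6 mm^2 */

    printf("7nm CPU chiplet: %.2f mm^2\n", chiplet_area);
    printf("14nm IO die:     %.2f mm^2\n", io_area);
    printf("IO die / chiplet ratio: %.2f\n", io_area / chiplet_area);
    return 0;
}
```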

+15% Performance Generation on Generation, Minimum.

During the keynote, AMD showed some performance numbers using the new Ryzen 3rd Generation (Matisse) processor. The test in question was Cinebench R15.

Our internal numbers show the 2nd Generation Ryzen 7 2700X scores 1754.

This new 3rd Generation Ryzen processor scored 2023.

This would mean that, at current non-final clocks, the new part delivers a 15.3% increase in performance generation-on-generation. Cinebench is an idealized scenario for AMD, but these are also not final clocks. The uplift will depend on the workload, but it is an interesting data point to have.
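
As a quick sanity check, the generation-on-generation figure falls straight out of the two scores quoted above; a minimal sketch:

```c
#include <stdio.h>

int main(void) {
    /* Cinebench R15 multi-threaded scores quoted above */
    const double gen2_2700x = 1754.0;  /* our internal 2nd Gen Ryzen 7 2700X result */
    const double gen3_demo  = 2023.0;  /* AMD's 3rd Gen demo silicon, non-final clocks */

    printf("Gen-on-gen uplift: %.1f%%\n",
           (gen3_demo / gen2_2700x - 1.0) * 100.0);  /* ~15.3% */
    return 0;
}
```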

Identical Performance to the Core i9-9900K, Minimum.

Our internal benchmarks show the 9900K with a score of 2032.

In AMD's comparison, the 8-core AMD processor scored 2023, and the Intel Core i9-9900K scored 2042.

Both systems were running on strong air cooling, and we were told that the Core i9-9900K was allowed to run at its standard frequencies on an ASUS motherboard. The AMD chip, by contrast, was not running at final clocks. AMD said that both systems had identical power supplies, DRAM, SSDs, operating systems, and patches, and that both used a Vega 64 graphics card.

At Just Over Half The Power…?!

Also shown in that same test was system-level power, which includes the motherboard, DRAM, SSD, and so on. As the systems were supposedly otherwise identical, any difference comes down to the CPU. The Intel system ran at 180W during Cinebench, a result in line with what we’ve seen on our own systems, and one that sounds correct. The AMD system, on the other hand, was running at 130-132W.

If we take our average system idle power from our own reviews, which is around 55W, this would put the Intel CPU at around 125W and the AMD CPU at around 75W.
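
The estimate is simple subtraction; a minimal sketch below, where the ~55W idle baseline is an assumption carried over from our own review systems rather than a value measured on AMD's demo rigs:

```c
#include <stdio.h>

int main(void) {
    /* System-level power readings shown during the Cinebench demo */
    const double intel_system_w = 180.0;  /* Core i9-9900K system */
    const double amd_system_w   = 130.0;  /* lower end of the 130-132W reading */

    /* Assumed idle baseline (~55W), taken from our previous review testing */
    const double idle_w = 55.0;

    printf("Estimated Intel CPU load power: ~%.0f W\n", intel_system_w - idle_w);  /* ~125 W */
    printf("Estimated AMD CPU load power:   ~%.0f W\n", amd_system_w - idle_w);    /* ~75 W  */
    return 0;
}
```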

AMD Benchmarks at CES 2019
AnandTech        System Power   Idle Power*   Chip Power   CB15 MT Score   CB15 MT Score   All-Core
                                                           (pre-brief)     (on-stage)      Frequency
AMD Zen 2        130W           55W           75W          2023            2057            ?
Intel i9-9900K   180W           55W           125W         2042            2040            4.7 GHz

*A rough estimate given our previous review testing

This suggests that AMD’s new processor, with the same number of cores, is offering performance parity in select benchmarks with Intel’s highest-performing mainstream processor, while consuming a lot less power. Almost half as much power.

That is a powerful statement. (ed: pun not intended)

How has AMD done this? IPC or Frequency?

We know a few things about the new Zen 2 microarchitecture. We know it has an improved branch predictor, an improved prefetcher, better micro-op cache management, a larger micro-op cache, increased dispatch bandwidth, increased retire bandwidth, native support for 256-bit floating point math, double-width FMA units, and double-width load/store units. These last three are key for an FP-heavy benchmark like Cinebench, and work a lot in AMD’s favor.
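
To put the 256-bit floating point support in context: AVX2/FMA code issues 256-bit operations like the one in the sketch below. Zen and Zen+ crack each such instruction into two 128-bit micro-ops, whereas Zen 2’s widened FP units are set to execute them at full width. This is a minimal illustration of the instruction class an FP-heavy renderer leans on, not Cinebench’s actual code:

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Three 256-bit registers, each holding eight packed single-precision floats */
    __m256 a = _mm256_set1_ps(1.5f);
    __m256 b = _mm256_set1_ps(2.0f);
    __m256 c = _mm256_set1_ps(0.5f);

    /* One fused multiply-add across all eight lanes: r = a*b + c.
       Zen/Zen+ split a 256-bit FMA like this into two 128-bit micro-ops;
       Zen 2 is described as executing it at full 256-bit width. */
    __m256 r = _mm256_fmadd_ps(a, b, c);

    float out[8];
    _mm256_storeu_ps(out, r);
    printf("%f\n", out[0]);  /* 1.5 * 2.0 + 0.5 = 3.5 */
    return 0;
}
```

Built with -mavx2 -mfma on GCC or Clang; the takeaway is simply that workloads full of these wide FMAs stand to gain the most from the datapath changes.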

As the Intel CPU was allowed to run at stock, even on the ASUS board, it should reach around 4.7 GHz on an all-core turbo. AMD’s frequencies on the demo processor are unknown, but they are not final and we ‘should expect more’. Well, if the processor was only running at 75W, and AMD can push it another 20-30W, then there’s going to be more frequency and more performance to be had.

The one thing we don’t know is how well TSMC’s 7nm process performs with respect to voltage and frequency. The only chips currently on the process are smartphone SoCs that run under 3 GHz, so there is no comparable metric. One would assume that, in order to be competitive with the Core i9-9900K, the processor would have to match its all-core frequency (4.7 GHz) if it were at the same IPC.

If the CPU can't match both the 9900K's IPC and its frequency at once, then three things are possible (a rough back-of-the-envelope sketch of the trade-off follows this list):

  1. If the TSMC process can’t go that high on frequency, then AMD is ahead of Intel on IPC, which would be a massive change in the ranks of modern x86 hardware.
  2. If the TSMC process can clock above 5.0 GHz, AND there is room to spare in the power budget to go even higher, then it’s going to be really fun to see these processors compete.
  3. AMD's simultaneous multithreading (SMT, its equivalent of Hyper-Threading) for software such as Cinebench is out of this world.
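
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the Cinebench R15 MT score scales with IPC multiplied by all-core frequency for two 8C/16T parts, and asks what clock the demo chip would need at various hypothetical IPC ratios to hit its 2023 score against a 4.7 GHz 9900K at 2042:

```c
#include <stdio.h>

int main(void) {
    /* Assumption for illustration only: CB R15 MT score scales roughly as
       IPC x all-core frequency for two 8C/16T chips. Scores and the 4.7 GHz
       figure are the ones quoted above; the IPC ratios are hypothetical. */
    const double intel_score = 2042.0;  /* Core i9-9900K in AMD's comparison */
    const double intel_freq  = 4.7;     /* GHz, all-core turbo */
    const double amd_score   = 2023.0;  /* Zen 2 demo silicon */

    for (double ipc_ratio = 0.95; ipc_ratio < 1.16; ipc_ratio += 0.05) {
        double implied_freq = (amd_score / intel_score) * intel_freq / ipc_ratio;
        printf("Zen 2 IPC at %.2fx Coffee Lake -> implied all-core clock ~%.2f GHz\n",
               ipc_ratio, implied_freq);
    }
    return 0;
}
```

On that simplification, matching Intel's IPC implies an all-core clock of roughly 4.65 GHz, while a 10% IPC advantage would drop the requirement to around 4.2 GHz.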

TL;DR = AMD’s 3rd Gen Ryzen Processors Are Another Step Up

When speaking with AMD, their representative said that there will be more information to follow as we get closer to launch. They’re happy for users to discuss whether it is IPC or frequency that is making AMD the winner here, and they’ll disclose more closer to the time.

Ian, I Thought You Predicted Two Chiplets?

Naturally, I assumed that AMD would be presenting a Ryzen 3000-series desktop processor with sixteen cores. To me, and a lot of others, that felt like a natural progression, but here we are today with AMD only mentioning an eight-core chip.

I predicted wrong, and I've lost my money (ed: in Las Vegas no less). But if we look at the processor, there’s still room for a surprise.

There’s room for a little something extra in there. Not much room, but I’m sure that if AMD wanted to, there’s just enough space for another CPU chiplet (or a GPU chiplet) on this package. The question would then be around frequency and power, and both are valid concerns.

There's also the question of lower core count processors and the cheaper end of the market. This processor uses silicon from TSMC, made in Taiwan, and from GlobalFoundries, made in New York, which are then packaged together. We have heard some discussion, from people outside the industry, that this approach makes cheaper processors (sub-$100) less feasible. It is entirely possible that AMD might address that market with future APUs.

What AMD has plans for in the future, I don’t know. I don’t have a crystal ball. But it does look like AMD has some room to grow in the future if they need to.

Comments

  • KOneJ - Thursday, January 10, 2019 - link

    "The key to AMD's strategy right now is squarely pegged on the yields of 7nm output... much like Intel's on 14nm++, and that can be quite dangerous"
    I don't think there's much need to worry. TSMC was early to 7nm high-volume manufacturing, has a strong track record, and has worked well for Apple, Huawei, and Qualcomm. The chiplets are small enough that yields are more comparable to mobile dies than to Intel's monster monolithic parts.
  • nukunukoo - Thursday, January 10, 2019 - link

    Zen 2 seems to have finally matched Intel's single-core IPC. Too bad the "leaked" specs from last December do not match the 8-core's TDP at the keynote. Just hope the pricing will keep Intel on its toes. Having said that, a 16-core Z2 would definitely be sweet for my next 3DS/Maya/Resolve build!
  • willis936 - Thursday, January 10, 2019 - link

    Zen 1 has higher IPC than Intel's generations of the past half decade.
  • KOneJ - Thursday, January 10, 2019 - link

    Don't know about that. It was an engineering sample. They probably set the clocks to edge out the 9900K and the 75W TDP was an incidental byproduct. There may well not be a final product with a 75W TDP. I'd say that's likely actually.
  • Haawser - Thursday, January 10, 2019 - link

    AMD's manufacturing strategy looks like it is taking another huge leap forward here. A fair bet says that they will use exactly the same 7nm CPU chiplet in *everything*: server, desktop, HEDT, APUs, and even the next-gen consoles.

    Only having *one* 7nm chiplet to design/test/validate (and manufacture) has so many positive advantages it would be hard to list them all. But that's obviously their plan, and it's one that I doubt Intel will have an answer to anytime soon, even if they do get their 'relaxed 10nm' working.
  • PixyMisa - Thursday, January 10, 2019 - link

    Yes, the ideal design for a next-gen console is a big custom die with GPU, memory, and I/O, and then a standard 8-core CPU chiplet.
  • KOneJ - Thursday, January 10, 2019 - link

    IDEALLY, an MCM GPU would be better than a monolithic one. It just requires a creative, scalable architecture that is transparent to software and sidesteps the challenges of mGPU programming.
  • Kevin G - Friday, January 11, 2019 - link

    nVidia has done research into this and published a paper on the topic.

    It would be foolish to think that AMD is not pursuing the same ideas on the GPU side that they have shown on the CPU side.

    In fact, one of the obvious things to do for GPUs and chiplets is to spin off miscellaneous IO (PCIe, HDMI, DP controllers) and the various codec engines onto their own die, as those don't need perpetual updates between GPU generations nor cutting-edge manufacturing.
  • KOneJ - Saturday, January 12, 2019 - link

    I know about the NV paper. And AMD is very aggressively pursuing scalability. I think MCM GPUs from AMD that aren't plagued with mGPU programming challenges are slated for post-GCN. Whoever makes it to MCM GPUs will have a MASSIVE advantage over the competition. But MCM GPUs are significantly more difficult than MCM CPUs, as David Wang rightly pointed out. Some aspects of Zen will definitely migrate to the RTG side. They already have, even with Zen and Vega, if you read the white papers.
  • Kevin G - Monday, January 14, 2019 - link

    It depends entirely on how they are split up.

    The balanced-performance approach would implement a crossbar die housing the memory controllers, so that every compute die would have even access to memory. The catch is that the crossbar is massive in current designs, and scaling up the number of nodes only further increases complexity. Getting enough memory bandwidth is conceptually also a challenge, as the memory has to be placed close to the crossbar die. This does mimic AMD's strategy with the Zen 2 based EPYC.

    A NUMA-based GPU design with, say, four nodes in a package, each with its own pools of HBM memory on two sides for a 4096-bit wide bus, wouldn't be a bad decision either. However, like the centralized idea above, its scaling faces some physical layout challenges, and it doesn't inherently make individual dies smaller. nVidia's research paper explored this and estimated around ~85% of the performance of a monolithic die, due to the sheer number of links and high-bandwidth connections between each die (if you're using interposers, go all out here). While many think that this would be seen as a quad-GPU setup (and for certain workloads, you'd probably want it to be), it can be virtualized to appear as a single GPU for gaming. The dies are close enough, and have enough connectivity and bandwidth between each other, that I don't see this being an issue for most workloads.

    An extension of the above idea would be to build compute dies and memory-router dies with the HBM sitting directly on top. Essentially this is a tiled grid of compute and memory routers, where each router can coherently manage traffic from its four nearest neighbors. This enables scaling to higher compute, memory bandwidth, and memory capacity, but comes at the cost of far greater scalability issues: instead of a single high-bandwidth hop between nodes, we could be approaching 10 in a realistic design. The issue shifts from bandwidth to latency, and to being able to distribute a workload evenly.
