Part of the story behind the Xeon Scalable platform, built upon server-level Skylake processing cores with AVX-512 and a new mesh topology, was that the CPU was designed to be partnered with additional silicon in the same package. Out of the gate were versions bundled with Intel’s OmniPath controller, allowing for networking fabric connections. There has always been an expectation that Intel would launch a Xeon Scalable processor with an integrated Intel Altera FPGA on the same package, and now that expectation has become reality. Intel is now shipping its Xeon Gold 6138P processor with a built-in Altera Arria 10 GX 1150 FPGA.

Back at Supercomputing 2016, Intel demonstrated what was supposed to be a Broadwell-based Xeon system with an FPGA built into the same package; however, no real details were given and the chip itself was not on display. This year, at Mobile World Congress (of all places), Intel had a demonstration system showing a Xeon Scalable processor with an FPGA built into the same package, but again the chip was not on display, only a system that supposedly had the chip inside. I was not allowed to use my screwdriver to open the system up. The Intel attendant next to the system explained that the platform would help accelerate Edge Computing for data used by 5G networks, although questions about the finer details, such as how many SKUs there would be and the size of the FPGA, were met with a refusal to answer. As a result, I didn’t post anything at MWC; I could not confirm anything that was being said and Intel was not prepared to say any more.

Lisa Spellman showing Intel Xeon + FPGA during Intel's Presentation at the Fujitsu Forum, Tokyo
Source: PC-Watch

Fast forward a couple of months, and PC-Watch is reporting that Intel has announced via its itpeernetwork hub (rather than its traditional PR outreach) the mass production of the Xeon Gold 6138P with an integrated Arria 10 GX 1150 FPGA, with some select customers already being sampled. The announcement states that Fujitsu is one of the Intel partners planning a system around this processor.

Intel Xeon Gold: Adding an FPGA

| AnandTech | Xeon Gold 6138 | Xeon Gold 6138P with Arria 10 FPGA |
|---|---|---|
| Socket | Socket P, LGA 3647 | Socket P, LGA 3647 |
| Cores / Threads | 20 / 40 | 20 / 40 ? |
| Base Frequency | 2000 MHz | 2000 MHz ? |
| Turbo Frequency | 3700 MHz | 3700 MHz ? |
| PCIe Lanes | 48 | 32 |
| DRAM | Six Channels | Six Channels |
| On-Package FPGA | - | Arria 10 GX 1150 |
| Logic Elements | - | 1150K (1.15m) |
| Embedded Memory | - | 53 Mb |
| UPI Links | Three | Two |
| TDP | 125 W | 125 W CPU + 60 - 70 W FPGA = 195 W Total ? |
| Price | $2612 | Arm, Leg |

Intel is connecting the Xeon processor to the FPGA with 160 Gbps of bandwidth per socket (it doesn’t state if this is bi-directional) using a cache coherent interconnect. From what we know about how the Intel OmniPath Fabric connects in package to a Xeon, this connection likely implements a different protocol over the PCIe x16 interface reserved for in-package components, but also takes advantage of Intel’s Ultra-Path Interconnect (UPI) for cache coherency and access to data across the platform. If one UPI link from the processor is in use for the FPGA, this may limit Xeon+FPGA setups to dual socket at best; however, Intel did not provide briefings on the new parts to confirm this. We can confirm from an old Intel slide that the platform should be using a High Speed Serial Interface (HSSI) for connectivity; this slide also states that the new processors have different power specifications to standard Skylake-SP sockets, and as such the Xeon Gold 6138P is unlikely to be a drop-in processor for current systems.
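To put the 160 Gbps figure in context, here is a rough back-of-the-envelope sketch in Python. It converts the quoted number to gigabytes per second and compares it with the theoretical one-direction bandwidth of a PCIe 3.0 x16 link (8 GT/s per lane with 128b/130b encoding, per the public PCIe 3.0 specification). The function and variable names are ours, and how Intel actually arrives at 160 Gbps is not stated in the announcement, so treat the comparison as illustrative only.

```python
# Figure quoted in the announcement; direction (one-way or combined) is not specified.
LINK_GBPS = 160.0

# PCIe 3.0 public spec figures: 8 GT/s per lane, 128b/130b line encoding.
PCIE3_GT_PER_LANE = 8.0
PCIE3_ENCODING_EFFICIENCY = 128 / 130


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def pcie3_gbps(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in Gbps, before protocol overhead."""
    return PCIE3_GT_PER_LANE * PCIE3_ENCODING_EFFICIENCY * lanes


link_gb_per_s = gbps_to_gigabytes_per_s(LINK_GBPS)
x16_gbps = pcie3_gbps(16)
x16_gb_per_s = gbps_to_gigabytes_per_s(x16_gbps)

print(f"Quoted CPU-FPGA link: {LINK_GBPS:.0f} Gbps = {link_gb_per_s:.1f} GB/s")
print(f"PCIe 3.0 x16, one direction: {x16_gbps:.1f} Gbps = {x16_gb_per_s:.2f} GB/s")
```

The quoted figure is higher than a single PCIe 3.0 x16 link can carry in one direction, which fits with either a bi-directional total or a contribution from another link such as UPI, but the announcement does not say which.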

For this launch, Intel has built a virtual switching reference design, which uses the FPGA for infrastructure dataplane switching with virtual machines on the CPU implementing direct compute on the dataplane. Intel states that its reference design offers 3.2x better throughput and half the latency compared to a CPU-only solution when running the Open vSwitch (OVS) framework, as measured through its DPDK forwarding performance. Intel also stated that an OVS system with additional performance monitoring was on display at the Fujitsu Forum in Tokyo this week.

The system under test was a 2P server using two of the new ‘Intel Xeon Gold 6138P with Integrated Arria 10 GX 1150 FPGA’ processors, 12x16 GB of DDR4-2666 (one DIMM per channel), and a 100G Alaska network card from Marvell. Amusingly, it says the system also had a PCIe 3.0 x10 slot, alongside a PCIe 3.0 x8 slot. 10 seems like a different number to normal.
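As a quick sanity check on that memory configuration, the short Python sketch below works through the arithmetic: two sockets with six channels each and one DIMM per channel gives 12 DIMMs, matching the 12x16 GB listed. The peak bandwidth line uses the standard DDR4 figure of 8 bytes per transfer on a 64-bit channel; it is a theoretical ceiling rather than anything Intel quoted, and the variable names are ours.

```python
# Configuration taken from the system-under-test description.
SOCKETS = 2
CHANNELS_PER_SOCKET = 6
DIMMS_PER_CHANNEL = 1
DIMM_CAPACITY_GB = 16
DDR4_MEGATRANSFERS_PER_S = 2666  # DDR4-2666

# A standard DDR4 channel is 64 bits wide, i.e. 8 bytes per transfer.
BYTES_PER_TRANSFER = 8

dimm_count = SOCKETS * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL
total_capacity_gb = dimm_count * DIMM_CAPACITY_GB

# Theoretical peak bandwidth, in GB/s (decimal), per channel and per socket.
channel_gb_per_s = DDR4_MEGATRANSFERS_PER_S * BYTES_PER_TRANSFER / 1000
socket_gb_per_s = channel_gb_per_s * CHANNELS_PER_SOCKET

print(f"DIMMs: {dimm_count}, total capacity: {total_capacity_gb} GB")
print(f"Peak per channel: {channel_gb_per_s:.3f} GB/s")
print(f"Peak per socket:  {socket_gb_per_s:.1f} GB/s")
```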

Also in the announcement was a mention of Intel’s desire to offer a discrete FPGA solution with a faster high-bandwidth coherent connection, although details of this interconnect were not provided (it could be UPI through a physical discrete add-in card slot?). These discrete FPGA solutions will support migration of code developed on the Xeon+FPGA system in this announcement, as well as on Altera’s Arria 10 GX acceleration cards.

One of Intel's current Arria 10 GX 1150 Programmable Acceleration Cards

Wider availability of the Xeon Gold 6138P with Arria 10 is not yet known. Interested parties are expected to get in contact with their Intel representative or OEM partner.

Source: Intel's ITPeerNetwork, PC-Watch (main image)

  • flgt - Thursday, May 17, 2018 - link

    I know people are down on Intel now but they appear to have diversified well (except for the major loss in mobile). They can build a lot of interesting systems even though their manufacturing process and CPU technology has stalled out. Maybe they saw the writing on the wall awhile ago.
  • jjj - Thursday, May 17, 2018 - link

    Look at that, glued together garbage is not a bad thing anymore?
  • sgeocla - Thursday, May 17, 2018 - link

    Glue is so 2017. It's all about EMIB now. Embedded Multi-die Interconnect Binder.
  • HStewart - Thursday, May 17, 2018 - link

    It is so interesting how people are so uneducated about the process especially with this chip and not just about EMiB ( which you are absolutely 100% corrected - it not glue )

    Most people here probably don't understand what FPGA is actually is - it stands for field programmable gate array and it used for custom logic on chip that is programmable. Altera is pretty much the leader in this industry - and in last year or so Intel purchase Altera.

    It probably could be research, but I bet the technology behind EMiB came about because of Intel purchase of Altera, For more information on FPGA there is a wiki on it
  • HStewart - Thursday, May 17, 2018 - link

    woops sorry another link was included - from previous conversation
  • CoolDeepBlue - Thursday, May 17, 2018 - link

    Altera was NOT the leader, Xilinx owned/owns more than all the other programmable logic vendors combined together.
    And by the way, the entire industry is NOT using EMIB (or an EMIB like) solution for a reason: it is difficult and expensive (yield)
  • patrickjp93 - Thursday, May 17, 2018 - link

    The industry can't use what Intel keeps to itself. And yields are far better than big interposer solutions. And Intel's Stratix X family destroys everything Xilinx offers. Intel is King of FPGAs now.
  • ZolaIII - Friday, May 18, 2018 - link

    Nope by; market share, internal FPGA interconnect they are behind Xilinx. Only thing they are ahead is a manufacturing process. FPGA's need to become a prime main resident's on heterogeneous and complete SoC's (either with out of much glue or with much better interconnect between glued part's).
  • patrickjp93 - Friday, May 18, 2018 - link

    ??? Intel's latest family is faster on interconnect (58G vs. Xilinx's best 54) with higher integrated compute (10 TFlops vs. Xilinx's best 7), and Intel actually has a coherrent interconnect and system fabric for their designs. Xilinx is years behind now. The only thing Xilinx offers that Intel doesn't is high-speed derivatives trading AICs.
  • BurntMyBacon - Tuesday, May 22, 2018 - link

    Doesn't change the fact that Xilinx is still well ahead by market share. In practice, the 58 vs 54 difference in interconnect speed doesn't usually make much of a difference. Developers are often tied to some standard interconnect (PCIe, SRIO, Fibre Channel, etc.) that both manufacturers have more than enough interconnect bandwidth to support. The compute resources difference is more compelling, but the bulk of development doesn't happen on the top end chips. Many other factors play into it. To name a few: Development Environment (Altera had an edge here, but I'm not sure that is still true), Available IP (Last I checked, Xilinx has the edge here in both free and paid IP), balance of resources (compute units, interconnects, on die memories, fabric, clock generators, etc.), power, and cost.