Intel's Ivy Bridge Architecture Exposed
by Anand Lal Shimpi on September 17, 2011 2:00 AM EST - Posted in
- CPUs
- Intel
- Ivy Bridge
- IDF 2011
- Trade Shows
Core Architecture Changes
Ivy Bridge is considered a tick from the CPU perspective but a tock from the GPU perspective. On the CPU core side that means you can expect clock-for-clock performance improvements in the 4 - 6% range. Despite the limited improvement in core-level performance, there's a lot of cleanup that went into the design. In order to maintain a strict design schedule, it's not uncommon for features to be cut from a design and rolled into the subsequent product instead. Ticks are great for this.
Five years ago Intel introduced Conroe, which has defined the high-level architecture of every generation since. Sandy Bridge was the first significant overhaul since Conroe, and even it didn't look very different from the original Core 2. Ivy Bridge continues the trend.
The front end in Ivy Bridge is still 4-wide with support for fusion of both x86 instructions and decoded uOps. The uOp cache introduced in Sandy Bridge remains in Ivy with no major changes.
Some structures within the chip are now better optimized for single threaded execution. Hyper Threading requires a bunch of partitioning of internal structures (e.g. buffers/queues) to allow instructions from multiple threads to use those structures simultaneously. In Sandy Bridge, many of those structures are statically partitioned. If you have a buffer that can hold 20 entries, each thread gets up to 10 entries in the buffer. In the event of a single threaded workload, half of the buffer goes unused. Ivy Bridge reworks a number of these data structures to dynamically allocate resources to threads. Now if there's only a single thread active, these structures will dedicate all resources to servicing that thread. One such example is the DSB queue that serves the uOp cache mentioned above. There's a lookup mechanism for putting uOps into the cache. Those requests are placed into the DSB queue, which used to be split evenly between threads. In Ivy Bridge the DSB queue is allocated dynamically to one or both threads.
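As a rough illustration of the difference, here is a minimal software sketch of the two allocation policies; the 20-entry size, the function names, and the policy details are invented for the example and have nothing to do with the actual hardware implementation.

```c
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_ENTRIES 20   /* hypothetical 20-entry structure shared by two threads */

/* Sandy Bridge style: static partitioning. Each thread always owns half
 * the entries, even when its sibling thread is idle. */
static int static_limit(bool sibling_active)
{
    (void)sibling_active;
    return QUEUE_ENTRIES / 2;   /* half the structure goes unused with one thread */
}

/* Ivy Bridge style: dynamic allocation. A lone thread can claim the whole
 * structure; with both threads active the entries are shared. */
static int dynamic_limit(bool sibling_active)
{
    return sibling_active ? QUEUE_ENTRIES / 2 : QUEUE_ENTRIES;
}

int main(void)
{
    printf("one thread active:  static=%d dynamic=%d\n",
           static_limit(false), dynamic_limit(false));
    printf("two threads active: static=%d dynamic=%d\n",
           static_limit(true), dynamic_limit(true));
    return 0;
}
```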
In Sandy Bridge Intel did a ground-up redesign of its branch predictor, and once again it doesn't make sense to redo it for Ivy Bridge, so branch prediction remains the same. In the past the prefetchers stopped at page boundaries since they operate on physical addresses; Ivy Bridge lifts this restriction and lets prefetching continue across a page boundary.
The number of execution units hasn't changed in Ivy Bridge, but the units themselves have been tweaked. The FP/integer divider sees another performance gain this round: Ivy Bridge's divider has twice the throughput of the unit in Sandy Bridge. The advantage shows up mostly in FP workloads, which tend to make heavier use of division.
MOV operations can now be handled in the register renaming stage instead of occupying an execution port. The x86 MOV instruction simply copies the contents of one register into another. In Ivy Bridge a MOV is executed by simply pointing the destination register at the physical register that already holds the source's value. This is enabled by the physical register file first introduced in Sandy Bridge, in addition to a whole lot of clever logic within IVB. Although MOVs still occupy decode bandwidth, the instruction no longer takes up an execution port, allowing other instructions to execute in its place.
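A rough software analogy for what the renamer is doing (purely illustrative; the table sizes and names below are invented, and this is not how the hardware is actually built):

```c
#include <stdio.h>

/* Hypothetical model: architectural registers are just indices into a
 * shared physical register file, as with Sandy Bridge's PRF design. */
enum { RAX, RBX, NUM_ARCH_REGS };

static long long phys_regs[8] = { 42, 0 };          /* physical register file */
static int rename_table[NUM_ARCH_REGS] = { 0, 1 };  /* arch reg -> phys reg */

/* "MOV dst, src" handled at rename: no data is copied and no execution
 * port is used. The destination is simply remapped to the physical
 * register that already holds the source's value. */
static void mov_at_rename(int dst, int src)
{
    rename_table[dst] = rename_table[src];
}

int main(void)
{
    mov_at_rename(RBX, RAX);                              /* MOV RBX, RAX */
    printf("RBX = %lld\n", phys_regs[rename_table[RBX]]); /* prints 42 */
    return 0;
}
```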
ISA Changes
Intel also introduced a number of ISA changes in Ivy Bridge. The ones that stand out the most to me are the inclusion of a very high speed digital random number generator (DRNG) and supervisory mode execution protection (SMEP).
Ivy Bridge's DRNG can generate high quality, standards-compliant random numbers at 2 - 3 Gbps. The DRNG is available to both user and OS level code. This will be very important for security applications and algorithms going forward.
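The DRNG is reached through a new instruction, RDRAND. As a minimal sketch of how software might consume it, assuming a compiler that exposes the _rdrand32_step intrinsic and a CPU with the feature (build with something like gcc -mrdrnd):

```c
#include <stdio.h>
#include <immintrin.h>

int main(void)
{
    unsigned int value;

    /* _rdrand32_step returns 1 on success and 0 if the DRNG could not
     * deliver a value yet, in which case the caller simply retries. */
    if (_rdrand32_step(&value))
        printf("hardware random value: %u\n", value);
    else
        printf("RDRAND not ready, retry\n");

    return 0;
}
```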
SMEP in Ivy Bridge provides hardware protection against a common class of privilege escalation attacks: with SMEP enabled, code sitting in user-mode pages cannot be executed while the processor is running at a more privileged (supervisor) level.
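Conceptually, SMEP adds one extra check to instruction fetches performed at supervisor privilege. The sketch below is a simplified software model of that check (not actual CPU or kernel source), just to show which combination of conditions gets blocked:

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified model of the check SMEP adds to instruction fetches:
 * with SMEP on, supervisor-mode code may not fetch instructions from a
 * page marked user-accessible (the U/S bit in its page table entry);
 * the fetch faults instead of executing. */
static bool fetch_allowed(bool smep_enabled, bool supervisor_mode,
                          bool page_is_user_accessible)
{
    if (smep_enabled && supervisor_mode && page_is_user_accessible)
        return false;   /* blocks "jump into a user-mode payload" exploits */
    return true;
}

int main(void)
{
    printf("kernel fetch from user page, SMEP on:  %s\n",
           fetch_allowed(true, true, true) ? "allowed" : "faults");
    printf("kernel fetch from user page, SMEP off: %s\n",
           fetch_allowed(false, true, true) ? "allowed" : "faults");
    return 0;
}
```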
97 Comments
Meegulthwarp - Saturday, September 17, 2011 - link
Thanks man, you're a star. You really should just ignore whiny comments like mine as you provide some of the best (if not the best) tech articles online and it's free of charge! Every time you push an article like this my life comes to a standstill so I can read it. Keep up the good work!
zshift - Saturday, September 17, 2011 - link
I agree with this 100%. I love reading the articles here on AnandTech. The articles are well written, and include plenty of charts/data/photos to provide as complete an understanding as possible of the product in question. I also like the fairly recent upsurge in articles; you have a great team here.
PS: Bench rocks!
lowenz - Saturday, September 17, 2011 - link
From the power page: "Voltage changes have a cubic affect on power"
Cubic?
P ~ C * v^2 * freq * switching activity
know of fence - Saturday, September 17, 2011 - link
Cubic as in "to the third power". I remember a slide from one of the Intel presentations saying that, but I'd like to know how it comes about.
Vcore^3 ~ power
Here somebody posted some data of Vcore vs Power. If you were to plot power consumption in relation to Vcore^3 then one ought to get a linear graph.
http://www.awardfabrik.de/forum/showthread.php?t=6...
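For what it's worth, the back-of-the-envelope version of the cubic argument, assuming frequency scales roughly linearly with voltage over the normal operating range (with $\alpha$ the switching activity and $C$ the capacitance from the formula quoted above):

$$ P_{\text{dyn}} \propto \alpha \, C \, V^{2} f, \qquad f \approx kV \;\Rightarrow\; P_{\text{dyn}} \propto \alpha \, C \, k \, V^{3} $$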
KalTorak - Saturday, September 17, 2011 - link
Cubic. Because f, for a big chunk of the V-f curve, tends to be linear in V.
gevorg - Saturday, September 17, 2011 - link
Will IVB have 8-core unlocked CPUs like 2500K/2600K SNB?
Sabresiberian - Saturday, September 17, 2011 - link
"Ivy Bridge won't get rid of the need for a discrete GPU but, like Sandy Bridge, it is a step in the right direction."I'm not so sure I'd agree getting rid of the need for a discrete GPU is a good thing. In terms of furthering technological possibilities, yes, I get that; in terms of me building the computer I want to build and tailoring the results to my purposes, I really don't want these things to be tied together in an inflexible way.
;)
platedslicer - Sunday, September 18, 2011 - link
Standardization seems to be the current trend... next thing you know, the computer industry has gone the way of car manufacturers.
JonnyDough - Monday, September 19, 2011 - link
Standards are not all bad. In the case of car manufacturers, we now have things like sealed bearings (so you don't have to regularly grease the bearings in your wheels, and they actually last longer and cost less), safety systems like ABS, seat belts, airbags, etc. With computers, we need standards as well for compatibility. It lowers cost, ensures that hardware works fluidly between platforms, etc. If we didn't have standards we would have things like Rambus - which would only cost us a fortune and slow technological progression.
JonnyDough - Monday, September 19, 2011 - link
I think the author means that you won't NEED a discrete CHIP (GPU other than the one on-die) to run a system. Discrete here seems to imply an IGP (integrated onto the motherboard) OR on a separate graphics card. That isn't to say one won't still be required for graphics intensive applications. Ideally, the on-die GPU will be able to work in tandem when a graphics card is installed.