Introduction

In our first article, we explained that dynamic power, power leakage, the memory wall and wire delay have forced CPU designers to rethink the methods that they use to achieve higher-performance CPUs.

In Part 2, we will investigate the advantages and disadvantages of the new market trend: multi-core CPUs. Will dual core enhance your gaming experience? Tim Sweeney, the lead developer behind the Unreal 3 engine, was kind enough to give us concise answers to our questions about multi-threaded development. And there is more: in the third part of this series, we will investigate what future multi-core and single core architectures will bring. We examine whether the stories about "the new era of multi-threaded multi-core CPUs" are true and whether or not this will really benefit the consumer.

Should you care?

Should you care whether or not we are moving to multi-core and multi-threaded CPUs? After all, over the past decades, we have consistently been getting more performance for lower prices. However, it is far from clear that multi-core will benefit all consumers. We will explain this statement in more detail, but it is well worth asking whether it will benefit you. Last spring's IDF was all about multi-core CPUs, yet there was very little information on how this is going to benefit consumers. So, let us take a critical look at this new direction that desktop CPUs have taken.

Multi-core, multi-expensive?

Dual cores are expensive to manufacture. Yields (the percentage of working chips on a wafer) decrease as die size goes up: larger, dual core chips will always have lower yields than smaller, single core chips on the same process technology (a back-of-the-envelope example follows below). But that is only a small problem. A bigger and more obvious problem is that you get only half the number of chips per wafer (in fact, slightly fewer). So, a dual core built from two separate dies (such as Presler) costs at least twice as much to manufacture as a single core chip, and a monolithic dual core (such as Yonah or the Pentium-D) most likely even more.

Dual and multi-cores might not increase the thermal density (dissipated power per mm²), but they do increase the total power. Granted, from the viewpoint of a heat sink designer, it is not much harder to cool a 112 mm² Prescott chip that dissipates roughly 90 Watts than a theoretical 206 mm² Pentium-D that dissipates 180 Watts. However, making sure that those 180 Watts do not cook all the components inside the case is an almost impossible task for the system designer who wants to build a relatively silent PC. The result is that multi-core CPUs will run at lower clock speeds than their single core counterparts. The Pentium-D, the dual core Prescott, is limited to 130 Watts and 3.2 GHz, while the current single core Prescott dissipates up to 115 Watts and runs at 3.8 GHz.

Last, but not least, dual core CPUs need more bandwidth than a single core to make a difference, and they increase the "CPU perceived" latency: cache coherency traffic and contention for the same memory bus both increase the total latency that each core sees, and thus lower performance.
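
To put rough numbers on the yield argument, consider the simple Poisson yield model (the defect density here is an assumed figure, chosen purely for illustration):

    Y = e^{-D_0 A}

where A is the die area and D_0 the defect density. With an assumed D_0 of 0.5 defects per cm², a 112 mm² (1.12 cm²) die yields e^{-0.56}, or about 57% working chips, while a 206 mm² (2.06 cm²) die yields only e^{-1.03}, or about 36%. The larger die thus delivers both fewer candidate chips per wafer and a smaller fraction of working ones.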

Multi-core, multi-performance?

The advantages of multi-core and multi-threaded CPUs far outweigh the disadvantages in the server market. Because most server applications spawn a lot of threads and processes, performance scales close to linearly as more cores are added to the die. This is in sharp contrast with the superscalar CPU, where increasingly complex designs require exponentially more transistors and power while showing diminishing returns, especially in server applications where the IPC can drop below 1. And while dual core CPUs are more expensive to manufacture, they are far easier to design than turning a single core CPU into an even wider-issue, more complex CPU. Development costs for a new CPU design are astronomically high. So, it does not surprise us at all that server CPU manufacturers have turned en masse towards multi-core designs: significant performance gains with a fraction of the time and money invested. And the same can be said about a big part of the HPC market.

For a good example of how well server applications can scale with more CPUs, refer to our DB2 tests, which showed up to a 96% performance increase going from one to two CPUs, and a boost of up to 89% when we increased the number of Opterons from two to four. Most desktop and many workstation applications are single-threaded, however. Or, more accurately, they might be multi-threaded in order to be more responsive, but there is only one thread that really needs CPU power.
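
To make the "multi-threaded for responsiveness" point concrete, here is a minimal sketch in C with POSIX threads (the workload and names are invented for illustration, not taken from any real application). The interface thread stays idle and snappy while a single worker thread does all the heavy lifting, so a second core has very little to contribute:

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* The one thread that really needs CPU power: a long-running job. */
    static void *render_job(void *arg)
    {
        volatile double sum = 0.0;
        for (long i = 1; i < 100000000L; i++)
            sum += 1.0 / (double)i;    /* stand-in for real work */
        printf("worker: job finished (sum = %f)\n", sum);
        return NULL;
    }

    int main(void)
    {
        pthread_t worker;
        pthread_create(&worker, NULL, render_job, NULL);

        /* The "UI" thread stays responsive, but it barely touches the
           CPU - it just wakes up, prints and sleeps again. */
        for (int tick = 0; tick < 5; tick++) {
            printf("ui: still responsive (tick %d)\n", tick);
            sleep(1);
        }

        pthread_join(worker, NULL);
        return 0;
    }

Built with gcc -pthread, the "ui" messages appear on time with one core or two, and the worker finishes in roughly the same time either way, because only one thread ever needs the CPU. That is exactly the pattern in which a second core contributes almost nothing.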

Even some workstation applications that are supposed to be prime examples of multi-threaded software are not as multi-core friendly as they appear to be. I ran a lot of Adobe Premiere benchmarks with different video formats, and I found that the second CPU offered a meagre 10% to 40% speed increase in video editing (rendering). 3DSMax shows big increases only when you use very complex scenes. When using a relatively light animation scene, the second CPU adds about 20% to 50%. One of the best cases, the architecture scene of the SPEC test, shows an 89% increase when adding a second Opteron, but two extra Opterons already show some diminishing returns - performance went up by only 72%.

Multitasking scenarios might be another way to use the power of dual and multi-cores. However, many of the CPU heavy applications that desktop and workstation users like to run in the background - archiving, encoding - also hammer the hard disk. And despite the merits of NCQ (Native Command Queuing), higher rotation speeds and lower seek times, disk heavy tasks - especially multithreaded ones - can bring a whole system to a crawl when there is too much hard disk activity. So, it is clear that there are big challenges ahead before multi-core CPUs will really bring benefits to most consumers and employees.

Comments

  • hzmonte - Tuesday, October 4, 2005 - link

    For everyone's convenience, here is Part 3: http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
  • bmayer - Friday, March 25, 2005 - link

    About the automatic parallelization:
    It can be fairly easy to do. I work on some Cray X1s and X1es. A little bit about the X1: the processors (called Multi-Streaming Processors, or MSPs) are made up of 4 Single Streaming Processors (SSPs). They are vector units with a length of 64 or 32 (can't remember; the point is that they are good sized vector units). The processors are clocked at 800MHz for the X1, or 1.3GHz if you have an X1e.

    Ok, so what do we see? These CPUs *suck* if your code is not vectorized and running in parallel. Guess what the Cray compiler does? It automatically vectorizes, streams (takes advantage of a full MSP instead of a single SSP), and parallelizes.

    They lay out very clearly the conditions under which the compiler can NOT optimize, and give you directives with which you can force it to do so. You can also get a listing of why it did not do a given optimization for any given line. Actually, it gives you all of that information by default, which, combined with grep, is nice.
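
    To give a flavour of what that looks like, here is a sketch of mine (purely illustrative - the exact directive spelling is compiler-specific, and this is not lifted from Cray's docs):

        /* A loop the compiler can vectorize and stream on its own:
           every iteration is independent. */
        void saxpy(int n, float a, const float *x, float *y)
        {
            for (int i = 0; i < n; i++)
                y[i] = a * x[i] + y[i];
        }

        /* When aliasing makes the compiler conservative, a directive
           asserts that the iterations are independent anyway; Cray C
           spells its pragmas with a _CRI prefix (shown here only as
           an illustration). */
        void gather_add(int n, const int *idx, const float *x, float *y)
        {
        #pragma _CRI ivdep
            for (int i = 0; i < n; i++)
                y[idx[i]] += x[i];
        }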

    OK, so there are different types of parallelism, and the one I have just talked about is different from what they are trying to do. I have been talking about speeding up the execution of some inner loop, which is very different from doing two different things at the same time (an AI module and a sound module running concurrently). BUT this can still be used to great effect: when the inner loops execute in half the time that they would on a single core/CPU machine, we have more time left over to do other things, and thus see a speed improvement.

    I have thought that Sony/IBM should get in touch with Cray to supply compiler tech for the Cell processor. If the Cell is as easy to write parallel code for as the X1, we will have some very awesome games, and clusters of PS3s.

    If you want to see a very nice overview of processor history and some of the crazy things people are proposing to do with the multi-cores check out ftp://ftp.cs.wisc.edu/sohi/talks/2003/pact.pdf

    I agree that parallel C++ is just not happening very well. There are languages like UPC which are starting to take hold in the HPC market, and which *could* find some use in the game market. But as the state of the art stands, it is Fortran that is really great for automatically generating parallel code. And who could seriously claim that anyone outside of engineering writes code in Fortran?

    Great article, lots to think about!
  • blckgrffn - Friday, March 18, 2005 - link

    Loved it. All of it. Especially the interview with Sweeney - it is always nice to hear where the future *will* lie with regards to at least one major application/game. Now, just get an interview with Carmack, and I will be happy for a long time... :D

    Nat
  • Caleb Jasin - Friday, March 18, 2005 - link

    #41

    Sorry for the late reply.

    And yeah, pthreads are not really threads. They are processes. When you call the pthread_create() function, you create a new process ;)
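
    For reference, a minimal sketch of the call in question (illustrative, not from any particular codebase):

        #include <pthread.h>
        #include <stdio.h>

        static void *worker(void *arg)
        {
            printf("hello from thread %ld\n", (long)arg);
            return NULL;
        }

        int main(void)
        {
            pthread_t tid;
            /* Under the old LinuxThreads library this was backed by a
               clone()d process; under the newer NPTL it is a kernel-level
               thread. The API looks the same either way. */
            pthread_create(&tid, NULL, worker, (void *)1L);
            pthread_join(tid, NULL);
            return 0;
        }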
  • ravedave - Wednesday, March 16, 2005 - link

    Sorry to double post. You could easily make a benchmark that saw a 1000000000% speed increase. Take one application, give it high priority and have it loop for 5 days. It would lock everything else up. Throw in a second processor and you no longer have that problem - hence, a huge speedup in the other processes. I don't trust any numbers from any manufacturer.
  • ravedave - Wednesday, March 16, 2005 - link

    Excellent article. Extremely excellent. I like the fact that you mentioned GUI updates; most people forget that almost all applications are multi-threaded as far as the GUI and core logic go. I really think that Microsoft is on the right track with .NET, though. I believe .NET 2 or 2.5 will really take multithreading to the next level.
  • RockHydra11 - Tuesday, March 15, 2005 - link

    My fear is that instead of creating new architectures for their processors to increase performance, they will just shove more cores onto them and pass that off on people.
  • Verdant - Tuesday, March 15, 2005 - link

    #40

    I do see a shift to something like C#, but anything that brings a "performance" hit is likely to scare away developers, especially since on non-Windows platforms, the hit is pretty huge atm.

    You are right that a compiler probably isn't the answer. I was merely stating that if the industry were dedicated to creating a "deserializing" compiler, it would be possible - extremely complex, and probably technically more than a "compiler", but still possible...


    Also, you are thinking of UNIX; the Linux kernel has supported threads for as long as I can remember. Take a look at the pthreads and linuxthreads (glibc2) libraries.
  • Caleb Jasin - Tuesday, March 15, 2005 - link

    #35

    Yeah, agreed - from my own experience, C# threading is much easier than threading in C, and I would say that it is the same for most code. Developing in C# is generally much faster than in C or C++, and the tests I have seen show about a 10% performance hit between optimal C# and C++ code. So, I think it is just a question of time before we see games coded primarily in C#. In the end, the time saved could be used to write more optimal code, I guess, so maybe the performance hit would be negligible.


    However, I don't think that we will see compilers that are smart enough to multithread code any time soon. I wrote a very simple compiler for a very simple language at university, and coding compilers is extremely complicated. As Tim says, they aren't threading the game logic, and that wouldn't make much sense anyway, because there are too many dependencies. But even though threading takes a lot of time, there is a lot of relatively easily parallelizable code in games.
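
    To illustrate the kind of thing I mean by "easily parallelizable", here is a sketch (invented names, and it assumes the particles are fully independent) that splits a per-frame update across two cores with plain pthreads:

        #include <pthread.h>

        #define N_PARTICLES 100000

        typedef struct { float x, y, vx, vy; } particle_t;
        typedef struct { int start, end; } range_t;

        static particle_t particles[N_PARTICLES];
        static const float dt = 0.016f;   /* one 60 fps frame */

        /* Each particle depends only on its own state, so two halves of
           the array can be integrated on separate cores without locks. */
        static void *integrate(void *arg)
        {
            range_t *r = (range_t *)arg;
            for (int i = r->start; i < r->end; i++) {
                particles[i].x += particles[i].vx * dt;
                particles[i].y += particles[i].vy * dt;
            }
            return NULL;
        }

        void update_particles_parallel(void)
        {
            pthread_t t;
            range_t lo = { 0, N_PARTICLES / 2 };
            range_t hi = { N_PARTICLES / 2, N_PARTICLES };

            pthread_create(&t, NULL, integrate, &hi);  /* second core */
            integrate(&lo);                            /* this core   */
            pthread_join(t, NULL);
        }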


    Btw, there is a small error in the article. It says that Linux has thread support. It really doesn't. A thread in Linux is a process; there is no difference at the kernel level between starting a new thread and forking a new process.
  • melgross - Tuesday, March 15, 2005 - link

    Don't forget that the idea behind these game engines is the reusability of the code. What I mean is that they will first tackle the problems that Sweeney thought most important (and easiest), and then, one by one, the harder problems will be resolved. This might take years, but performance increases are always going to be appreciated, and competing products are always going to put pressure on each other.

    Ten years from now the discussion will be about how they accomplished all of this.

    While dual-core GPUs have never been used, since that is only now becoming a viable technology, dual and quad GPU configurations have been used for many years on high end boards - not the gamer boards that we see for $500 and below.
