Browsing through a manufacturer's website can offer a startling view of its product lineup. Such was the case when I trawled through Gigabyte's range, only to find that it offers a line of server products, including dual processor motherboards. These are typically sold in a B2B environment (to system builders and integrators) rather than to the public, but after a couple of emails Gigabyte was happy to send over its GA-7PESH1 model and a couple of Xeon CPUs for testing. Coming from a background where we used dual processor systems for serious workstation CPU throughput, it was interesting to see how the Sandy Bridge-E Xeons compared to consumer grade hardware for getting the job done.

In my recent academic career as a computational chemist, we developed our own code to solve problems of diffusion and migration. This started with implicit grid solvers – everyone in the research group (coming from chemistry rather than computer science backgrounds) wrote their own grid and solver classes in C++ as part of their training, and these formed the backbone of the results obtained in their doctorates. Due to the idiosyncratic nature of coders and of learning to code, some students naturally wrote classes that were easily multithreaded at a high level, whereas others relied on a large amount of localized cache, which made multithreading impractical. Either way, single threaded performance was a major factor in obtaining the results of simulations that could last anywhere from seconds to weeks. As part of my role in the group, I introduced the chemists to OpenMP, which sped up some of their simulations and shifted the group's code towards multithreading. I orchestrated the purchase of dual processor (DP) Nehalem workstations from Dell (the institution's preferred source of IT equipment, despite my openness to building custom hardware in-house) to speed up the newly multithreaded code, with ECC memory for safety. I then embarked on my own research, which moved from off-the-shelf FEM solvers to explicit calculations that parallelized the code at a low level, and from there to GPUs – work that resulted in nine first author research papers over those three years.
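
To give a flavour of the change involved, here is a minimal sketch rather than the group's actual solver: an explicit finite-difference step with illustrative array names, where a single OpenMP pragma is enough to spread an independent grid loop across cores.

    // Minimal sketch (not the group's code): one explicit 1D diffusion step,
    // parallelized with OpenMP. The grid, array names and stability factor r
    // are illustrative assumptions.
    #include <vector>

    void diffusion_step(const std::vector<double>& c, std::vector<double>& c_next, double r)
    {
        const int n = static_cast<int>(c.size());
        // Each point depends only on the previous time step, so the iterations
        // are independent and safe to divide among threads.
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)
            c_next[i] = c[i] + r * (c[i - 1] - 2.0 * c[i] + c[i + 1]);
    }

Compiled with OpenMP enabled (e.g. -fopenmp), the loop iterations are shared across the available threads. The implicit solvers some of the group had written needed far more restructuring than this, which is part of why some code took to multithreading readily and some did not.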

In a lot of the simulations written during that period by the various researchers, one element was consistent – trying to use as much processor power as possible. When one of us needed more horsepower for a larger batch of simulations, we used each other's machines to get the job done quicker. Thus, when it came to purchasing those DP machines, I explored the SR-2 route and the possibility of building the machines ourselves, but this was quickly shot down by the IT department, which preferred pre-built machines with a warranty. In the end we purchased three dual E5520 systems, giving each machine 8 cores / 16 threads of processing power along with some ECC memory (thankfully the simulations needed no more than a few megabytes each), to fit within the budget. When I left that position, these machines were still going strong, with one colleague using all three to correlate theoretical predictions with experimental results.

Since leaving that position and joining AnandTech, I still explore, in my spare time and without funding, avenues my research could take. Thankfully, moving to a single overclocked Sandy Bridge-E processor lets me keep the high level CPU code comparable to what we ran in the research group, even if I no longer have ECC memory. The GPU code is also faster, having moved from a GTX 480 during my research to GTX 580s and 680s now. One of the benchmarks in my motherboard reviews is derived from one of my research papers – regular readers will recognize the 3DPM benchmark from those reviews and from the review today – just to see how far computation has come. Being a chemist rather than a computer scientist, my code for this benchmark should be comparable to what similarly non-CompSci-trained individuals would write: from a complexity point of view it is very basic, slightly optimized for faster calculations (FMA), but far from the best it could be with full blown SSE/SSE2/AVX extensions et al.
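
Purely as an illustration of the style of workload, rather than the 3DPM code itself, the core of such a test looks something like the sketch below: each particle takes a large number of random steps in 3D, and the rate at which those movements are processed is what gets measured. The particle count, step count and random direction scheme here are assumptions made for the sketch.

    // Hypothetical sketch of a 3D particle movement style workload - NOT the
    // actual 3DPM benchmark. Particle and step counts are arbitrary.
    #include <cmath>
    #include <cstdio>
    #include <random>

    int main()
    {
        const int particles = 10000;
        const int steps = 1000;
        const double pi = 3.14159265358979323846;
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> uni(0.0, 1.0);

        double total = 0.0;  // accumulate results so the work is not optimized away
        for (int p = 0; p < particles; ++p) {
            double x = 0.0, y = 0.0, z = 0.0;
            for (int s = 0; s < steps; ++s) {
                // Random unit step: uniform in cos(theta) and in phi.
                const double cosT = 2.0 * uni(rng) - 1.0;
                const double sinT = std::sqrt(1.0 - cosT * cosT);
                const double phi  = 2.0 * pi * uni(rng);
                x += sinT * std::cos(phi);
                y += sinT * std::sin(phi);
                z += cosT;
            }
            total += std::sqrt(x * x + y * y + z * z);
        }
        std::printf("mean end-to-end distance: %f\n", total / particles);
        return 0;
    }

Each particle is independent of the others, so a loop like this splits cleanly across threads, and most of the runtime sits in square roots, trigonometry and random number generation – exactly the sort of work where per-core throughput and instruction extensions show up clearly.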

With the vast number of possible uses for high performance systems, it would be impossible for me to cover them all. Johan de Gelas, our server reviewer, lives and breathes this type of technology, and hence his benchmark suite deals more with virtualization, VMs and database access. As my perspective is usually one of performance and utility, the review of this motherboard is based around my history and viewpoint. As I mentioned previously, this product is sold primarily B2B (business to business) rather than B2C (business to consumer); however, from a home build standpoint it offers an alternative to the two main Sandy Bridge-E Xeon home-build workstation products on the market – the ASUS Z9PE-D8 WS and the EVGA SR-X. Hopefully we will get those products in as comparison points for you.

Comments

  • Hulk - Saturday, January 5, 2013 - link

    I had no idea you were so adept with mathematics. "Consider a point in space..." Reading this brought me back to Finite Element Analysis in college! I am very impressed. Being an ME I would have preferred some flow models using the Navier-Stokes equations, but hey, I like chemistry as well.
  • IanCutress - Saturday, January 5, 2013 - link

    I never wrote any FEM code myself, so I wouldn't know where to start. The next angle of testing would have been using a C++ AMP fluid dynamics simulation and adjusting the code from the SDK example, as with the n-Body testing. If there is enough interest, I could spend a few days organising it for the normal motherboard reviews :)

    Ian
  • mayankleoboy1 - Saturday, January 5, 2013 - link

    How the frick did you get the i7-3770K to *5.4GHZ* ? :shock:
  • IanCutress - Saturday, January 5, 2013 - link

    A few members of the Overclock.net HWBot team helped with testing by running my benchmark while they were using DICE/LN2/phase change cooling for overclocking contests (i.e. not 24/7 runs). The i7-3770K will go over 7 GHz if (a) you get a good chip, (b) cool it down enough, and (c) know what you are doing. If you're interested in competitive overclocking, head over to HWBot, Xtreme Systems or Overclock.net - there are plenty of people with info to help you get started.

    Ian
  • JlHADJOE - Tuesday, January 8, 2013 - link

    The incredible performance of those overclocked Ivy Bridge systems really hammers home the importance of raw IPC. You can spend a lot of time optimizing code, but IPC is free speed when it's available.
  • jd_tiger - Saturday, January 5, 2013 - link

    http://www.youtube.com/watch?v=Ccoj5lhLmSQ
  • smonsees - Saturday, January 5, 2013 - link

    You might try modifying your algorithm to pin the data to a specific core (therefore cache) to keep the thrashing as low as possible. Google "processor affinity c++". I will admit this adds complexity to your straightforward algorithm. In C#, I would use a parallel loop with a range partition to do it as a starting point: http://msdn.microsoft.com/en-us/library/dd560853.a...
  • nickgully - Saturday, January 5, 2013 - link

    Mr. Cutress,
    Do you think, with all the virtualized CPU time available, researchers will still build their own systems, since a machine is something concrete to put into a grant application, versus the power-by-the-hour of cloud computing?

    Thanks.
  • IanCutress - Saturday, January 5, 2013 - link

    We examined both scenarios. Our university had cluster time available to buy, and there is always the Amazon cloud. By our calculations, a 16 thread machine from Dell paid for itself in under six months of continuous running, did not require a large adjustment in the way people were already coding (i.e. staying in Windows rather than moving to Linux), and could be passed down the research group when newer hardware was released.

    If you are using production level code, manipulating it each time to get results, and you can guarantee the results will be good each time, then power-by-the-hour could work. As we were constantly writing and testing new code for different scenarios, building or buying your own workstation won out. Having your own system also helps when developing GPU code: if you want a better GPU it is easier to swap the card out than to rely on a cloud computing upgrade.

    Ian
  • jtv - Sunday, January 6, 2013 - link

    One big consideration is who the researchers are. I work in x-ray spectroscopy (as a computational theorist). Experimentalists in this field use some of our codes without wanting the bother of maintaining big computational resources. We have looked at providing some of our codes through a cloud-based service so that they can be used on demand.

    Otherwise I would agree with Ian's reply. When I'm improving code, debugging code, or trying to implement new theoretical approaches I absolutely want my own hardware to do it on.
