Quick update: more Interlagos testing
by Johan De Gelas on December 8, 2011 5:11 AM EST
As promised in our last Opteron "Interlagos" review, we have been taking the time to deepen our understanding of AMD's newest Interlagos server platform and the "Bulldozer" architecture. Server reviewing remains a complex undertaking: some of the benchmarks take hours to set up and run, and power management policies, I/O subsystems and configuration settings can completely alter the outcome of a benchmark. That sounds very obvious, right? In practice, it is not.
Let me give you an example of how subtle server benchmarking can be. One of the benchmarks missing in the original review was the MS SQL Server benchmark, and for a reason. We did some extensive scaling benchmarks and our gut feeling told us that some of the results were a bit off the mark. So we kept the benchmark out of the original review until we pinpointed the problem.
Just a few days ago, we found out that a small fraction of time-outs (1%, caused mostly by a data provider time-out setting) can erroneously boost the results by about 20%, because the actual workload decreases. So our MS SQL Server benchmark was not as accurate as we thought it was. Luckily we have solved all problems, and the benchmark is now more accurate than ever. You can expect to see the MS SQL Server benchmarks on different server platforms and an in-depth analysis in a forthcoming article.
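To make the arithmetic concrete, here is a minimal Python sketch of how a 1% time-out rate could inflate a throughput figure by roughly 20%. The article does not say which queries timed out or how expensive they were, so the query mix below (99 light queries and one heavy query that costs about 20 times as much) is purely hypothetical, as are the function name `apparent_speedup` and the assumption that a timed-out query costs the server nothing.

```python
def apparent_speedup(query_costs, timed_out, timeout_cost=0.0):
    """Ratio of real work to the work actually performed when the queries
    whose indices are in `timed_out` are charged `timeout_cost` instead of
    their real cost. A value above 1.0 means throughput looks better than
    it really is."""
    real_work = sum(query_costs)
    observed_work = sum(
        timeout_cost if i in timed_out else cost
        for i, cost in enumerate(query_costs)
    )
    return real_work / observed_work


# Hypothetical mix: 99 light queries and 1 heavy query (1% of all requests).
costs = [1.0] * 99 + [19.8]

# Assumption: only the heavy query times out and is abandoned immediately.
speedup = apparent_speedup(costs, timed_out={99})
print(f"apparent throughput gain: {speedup - 1:.0%}")
```

The point of the sketch is only that the queries most likely to hit a time-out are also likely to be the expensive ones, so dropping a few of them can remove a disproportionate share of the work.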
While solving the MS SQL Server benchmark issues required a lot of testing, analysis and debate with Dieter, the lead developer of our stress testing tool vApus, we missed a more obvious tweak that could have improved our Blender benchmarking. Luckily, we still have a community that is willing to give us valuable feedback. Greg Wereszko pointed out that our Blender benchmark cuts the render job up into only 64 tiles (X=8, Y=8). The result is that near the end of the test several cores are inactive, especially on the Interlagos Opteron (32 cores/threads).
So we increased the number of tiles beyond 8x8 to check whether this improves performance on our 32 and 24 thread machines, and it did. (Quick note: the Blender benchmark on Windows is one of the worst benchmarks for the Opteron Interlagos, so see this as a "worst case" performance point.)
Instead of trailing behind the Opteron 6174, the Opteron "Interlagos" 6276 manages to perform a tiny bit better than its older sibling when we use 256 (16x16) tiles. The Opteron 6276 improves performance by 24%, the Xeon X5650 and Opteron 6174 by 19%.
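The tile-count effect can be illustrated with a small scheduling simulation. This is only a sketch under stated assumptions, not a model of Blender's actual scheduler: it assumes each render thread simply grabs the next tile as soon as it is free, and the tile costs are invented for illustration (four "heavy" tiles that take four times as long as the rest). The names `simulate`, `coarse_skewed` and `fine_skewed` are hypothetical.

```python
import heapq


def simulate(tile_costs, workers):
    """Greedy scheduling: each tile goes to the worker that frees up first.

    Returns (makespan, utilization), where utilization is the fraction of
    total worker time spent rendering rather than idling.
    """
    finish_times = [0.0] * workers  # when each worker becomes free
    heapq.heapify(finish_times)
    for cost in tile_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    makespan = max(finish_times)
    utilization = sum(tile_costs) / (workers * makespan)
    return makespan, utilization


# Hypothetical scene: 4 of the 64 coarse tiles are 4x as expensive.
coarse_skewed = [4.0] * 4 + [1.0] * 60      # 8x8 = 64 tiles
# Same total work, each coarse tile split into 4 finer tiles.
fine_skewed = [1.0] * 16 + [0.25] * 240     # 16x16 = 256 tiles

for label, costs in (("8x8", coarse_skewed), ("16x16", fine_skewed)):
    makespan, utilization = simulate(costs, workers=32)
    print(f"{label}: makespan={makespan:.2f}, utilization={utilization:.0%}")
```

With these made-up costs, the 8x8 run on 32 workers is held up by the four heavy tiles while most workers sit idle, and the 16x16 run finishes sooner because the leftover work comes in smaller pieces. With perfectly uniform tiles, 64 tiles divide evenly over 32 threads, so the idle cores seen in practice presumably come from uneven tile costs; on 24 threads even uniform tiles leave a partially filled last round (`simulate([1.0] * 64, 24)`).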
Using more tiles, all CPUs are able to show their top performance. It also shows the rather "fragile performance profile" of the new Opteron. Many users are going to use standard settings and will never bother with this kind of tuning. As a result they are not going to use the full potential of the new Opteron. The Xeon's higher single-threaded performance makes it less vulnerable to less optimal software settings.
On the other side of the coin, once well tuned, the Opteron 62xx offers an interesting performance per dollar ratio, and this "fragile performance profile" may become very robust in FP intensive applications once the use of AVX gets widespread. We are taking quite a bit of time to make sure that the next server article can give more detailed information, but rest assured that we did not give up: we will update our server benchmarking...when it is finished.
chrone - Thursday, December 8, 2011
it seems the new interlagos is outperformed with previous version of opteron. this is bad for amd. :(
i like the more cores in server, it could be useful for web server. i hope next generation of amd will improve its performance per watt.
MrSpadge - Thursday, December 8, 2011
No, but that's what we already knew. What's new is that "If using the optimum settings, the new one can catch up and slightly outperform the old one. The gap to the Xeons closes a bit, too."
JohanAnandtech - Thursday, December 8, 2011
I quickly added the disclaimer:
(quick note: the Blender benchmark on Windows is one of the worst benchmarks for the Opteron Interlagos, so see this as "worst case" performance point).
So make sure you see this in perspective, it is not one of the Interlagos favorite benchmarks.
Morg. - Monday, December 12, 2011
This is basically the impression those benchmarks give, but it's dead wrong.
For anything windows, the current scheduler is unable to adapt to the Bulldozer architecture and will thus completely waste any and all advantage it brings, that is why you don't get much better performance in any windows benchmark of either fx- or interlagos CPU's.
That is temporary, as microsoft will update their scheduler in the future for WS2008 and W7 in the process.
For anything ESX, we can again see how Intel's strategy of helping other vendors customize and adapt to Xeon is paying.
So there, the Interlagos is doing bad because AMD didn't bother to pay someone @ ESX to create a power plan adapted to Interlagos - and we can see the default power plan is even more retarded than the windows scheduler (it's not that it doesn't understand modules, it doesn't understand power states ...).
In the hands of an ESX / Interlagos expert, you could have a finely tuned ESX host that would definitely outperform the Istanbul, and most probably the Xeon, given how much faster Interlagos is compared to Istanbul.
Moving on, we get into "stupid" benchmarks like Cinebench, which should be altogether dropped from benchmarks, be it Desktop or Server - as it is Intel favored and globally irrelevant for anyone.
Missing from a server benchmark are the raw performance numbers, some fine-tuned SQL benchmarks (those can't be done in-house at anandtech but hey, they could be copied or something), and some real world virtualization performance per watt benchmarks. (No, ESX is not the only one, and it wasn't even properly tweaked)
Globally, if anything, the benchmarks presented here on Anandtech say three things :
1) Windows Threading Scheme Sucks
2) ESX power plans suck big time
3) Some benchmarks should be removed (rendering included)
If anyone still has a doubt, go ask Cray why they picked AMD . and you'll understand that not everything is as it's presented in Anandtech benchmarks.
chrone - Thursday, December 8, 2011
Dear Johan,
Is there any PostgreSQL benchmark on AMD and Intel new CPU?
JohanAnandtech - Thursday, December 8, 2011
Unfortunately I have very little knowledge of PostgreSQL. The last time we tried (together with a decent PostgreSQL expert), it scaled slightly better than MySQL (>4 cores, <8 cores), but nowhere near MS SQL Server (which can tackle 32-64 cores). The big problem is getting these kinds of databases to work with 8 cores and more.
Short answer: not that I know of :-).
chrone - Thursday, December 8, 2011
oh okay, thanks for the reply, i really appreciate it. :)
argh, too bad postgresql is not as popular as mysql. hehe
samuraid - Thursday, December 8, 2011
Johan,
Hopefully you and the Anandtech team get a chance to test out Postgresql again, especially with version 9.2 or newer. There have been some recent scaling improvements to postgresql that might prove interesting:
Morg. - Monday, December 12, 2011
Johan,
Scaling PostgreSQL is totally possible, but it's indeed reserved to the experts.
If you need someone to help you setup a benchmark, you might want to ask around on the postgreSQL performance mailing list, I'm pretty sure they'll help you setup your test bed.
PostgreSQL has been scaled far beyond 8 cores in the past so it shouldn't be an issue for just two socket Interlagos.
And, please don't ever consider MySQL and PostgreSQL as the same kind of databases, the first is a toy for webdevs, the second is a real concrete alternative to Oracle for DBA's.
Elite99 - Thursday, December 8, 2011
Read an article in CT (german magazine) which also did some tests with Linux, Interlagos and some - obscure - compiler optimization flags. Andreas Stiller did those tests I believe.
CT achieved much better benchmark scores than current Xeon cpu's using Linux. Strange to see multiple benchmarks that show entirely different results.