3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normally distributed random number generation, and differ in how heavily they rely on trigonometric operations, conditional statements, generation and rejection, fused operations, etc.  The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them for a final score.  This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark.  The benchmark is also parallel across the particles simulated, and we test single threaded as well as multithreaded performance.

It should be noted that this benchmark is purely floating point, indicative of a lot of research-written code where several months of optimization is not possible, or where the know-how is not common within the research group.  The compiler is not clever enough to convert the code into equivalent integer operations, so Bulldozer and Piledriver based processors will only perform as well as the single FPU in each module will allow.  The benchmark is unaffected by memory speed, as thread creation happens on the CPU and all data generated fits well within the per-core L2 cache.
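To give a flavor of what such code looks like, here is a minimal sketch of two ways to generate a random unit step in 3D: one using trigonometric conversion, one using generation and rejection. This is not the 3DPM source, which is not reproduced here; the function names (`step_trig`, `step_rejection`, `run`) and the scoring in movements per second are assumptions made purely for illustration.

```python
import math
import random
import time


def step_trig(rng):
    """Unit step from two uniform draws plus trigonometric conversion."""
    z = rng.uniform(-1.0, 1.0)               # cosine of the polar angle
    phi = rng.uniform(0.0, 2.0 * math.pi)    # azimuthal angle
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def step_rejection(rng):
    """Unit step by generation and rejection: draw points in the cube,
    discard those outside the unit sphere, then normalise the survivor."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        d2 = x * x + y * y + z * z
        if 0.0 < d2 <= 1.0:
            d = math.sqrt(d2)
            return x / d, y / d, z / d


def run(step_fn, particles, steps, seed=0):
    """Move each particle `steps` times.

    Returns (movements per second, list of final positions).
    Particles are independent, which is what makes the problem
    easy to split across threads.
    """
    rng = random.Random(seed)
    positions = []
    start = time.perf_counter()
    for _ in range(particles):
        px = py = pz = 0.0
        for _ in range(steps):
            dx, dy, dz = step_fn(rng)
            px, py, pz = px + dx, py + dy, pz + dz
        positions.append((px, py, pz))
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return particles * steps / elapsed, positions


if __name__ == "__main__":
    for fn in (step_trig, step_rejection):
        rate, _ = run(fn, particles=100, steps=1000)
        print(f"{fn.__name__}: {rate:,.0f} movements/s")
```

Both routines are dominated by floating point multiplies, square roots and trig calls, which is why a workload of this shape leans so heavily on the FPU.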

3D Particle Movement Single Threaded

In the single threaded test, a lot of conclusions can be drawn from the comparison of AMD architectures.  Direct comparison of Piledriver to Bulldozer (A10-5800K to FX-8150) gives a boost in single core performance of 7%; however, the old Stars cores of the A8-3850 at 2.9 GHz perform roughly the same as the new Piledriver core at 4.2 GHz.  So even with a 1.3 GHz advantage, Piledriver is only as good as Stars, and is less efficient per clock in floating point work.  If we compare Piledriver to Thuban, i.e. A10-5800K to X6 1100T, the Piledriver core gets stomped on by a good 25%.  I find this quite staggering, as most of the code I ever encountered as a computational chemist was floating point based, dealing with single and double precision on a regular basis.  On this result, I would steer clear of Piledriver.
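The per-clock gap between Stars and Piledriver can be put into a number with a quick back-of-the-envelope calculation. The sketch below assumes the two single threaded scores are exactly equal (the text only says "roughly the same") and that both chips sit at the quoted clock speeds for the whole run.

```python
stars_ghz = 2.9        # A8-3850 (Stars), clock quoted in the text
piledriver_ghz = 4.2   # A10-5800K (Piledriver), clock quoted in the text

# Assumption: equal scores, so relative per-clock throughput is simply
# the inverse ratio of the clock speeds.
per_clock_ratio = stars_ghz / piledriver_ghz

print(f"Piledriver per-clock FP throughput vs Stars: {per_clock_ratio:.2f}x")
print(f"Per-clock deficit: {(1 - per_clock_ratio) * 100:.0f}%")
```

Under those assumptions, Piledriver gets through roughly 0.69x the floating point work per clock that Stars does in this test.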

3D Particle Movement MultiThreaded

The multithreaded version of 3DPM is slightly tougher to analyze.  Due to the FP nature of the program, the A10-5800K is essentially a processor with 2 FPUs, whereas all the other comparative AMD processors have either 4 or 6 FPUs to play with.  What is perhaps worth considering is that the Bulldozer processor with 4 modules scores 326.32, whereas the Piledriver processor with only 2 modules scores 203.06, which is more than half.  This would mean that the Piledriver core actually achieves around 20% better performance at the same frequency, despite our ST test giving Piledriver only a 7% increase.  Part of this could be put down to architecture improvements: scheduling for heavily threaded loads was one of the downfalls of Bulldozer, and the improvements made to it in Piledriver could be the reason here.
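The per-module comparison can be checked directly from the two scores. The sketch below divides each score by its module count and applies no clock speed adjustment, so treat it as a rough sanity check rather than a like-for-like figure; the variable names are illustrative only.

```python
bulldozer_score, bulldozer_modules = 326.32, 4    # Bulldozer, 4 modules
piledriver_score, piledriver_modules = 203.06, 2  # Piledriver, 2 modules

per_module_bd = bulldozer_score / bulldozer_modules
per_module_pd = piledriver_score / piledriver_modules

# Raw per-module gain, with no correction for differing clock speeds.
gain = per_module_pd / per_module_bd - 1.0

print(f"Bulldozer per module:  {per_module_bd:.2f}")
print(f"Piledriver per module: {per_module_pd:.2f}")
print(f"Raw per-module gain:   {gain * 100:.0f}%")
```

The raw figure lands a little above the rounded 20% quoted in the text, and either way it is well clear of the 7% seen in the single threaded test.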

WinRAR x64 3.93

With 64-bit WinRAR, we compress the set of files used in the USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible, and serves as a good test for when a system has a variable threaded load.  If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.  WinRAR is also very sensitive to memory speeds and subtimings.

WinRAR x64 3.93

Analyzing the WinRAR results can be a little confusing, given that at different points of our previous testing the memory settings have been different (I should point out that they have been consistent when comparing like-for-like).  When the A10-5800K was playing ball with DDR3-2400 10-12-12 memory, it pulled miles ahead of the Thuban and Llano processors in the WinRAR test.  However, the Sandy Bridge i5-2500K, with 4 INT and 4 FPU units, gave the Trinity processor a proverbial thrashing despite being limited to DDR3-1333 at the time of testing.

FastStone Image Viewer 4.2

FastStone Image Viewer is a free piece of software I have been using for quite a few years now.  It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters.  It also has a bulk image conversion tool, which we use here.  The software currently operates only in single-thread mode, which should change in later versions of the software.  For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.

FastStone Image Viewer 4.2

In the FastStone results, we see the Piledriver processor beat the Bulldozer and Stars cores by a considerable margin, and nudge the Thuban as well.

Xilisoft Video Converter

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices.  By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD APP for AMD GPUs.  For this test, we use a set of 32 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU.  The time taken to convert these videos gives us our result.

Xilisoft Video Converter 7

Our Xilisoft benchmark is rather new, and thus we do not have results for all the AMD cores we have used in the past.  If we compare the A10-5800K directly with the i3-3225, we unfortunately have a memory difference to contend with (DDR3-2133 vs. DDR3-1600).  With that being said, it is clear that video conversion is an INT process and all four of the A10-5800K INT units are being used.

x264 HD 4.0.1 Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps.  This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed.  The benchmark performs a 2-pass encode, and the results shown are the average for each pass over four runs.

x264 HD 4.0.1 Pass 1

x264 HD 4.0.1 Pass 2

66 Comments

  • IanCutress - Wednesday, November 28, 2012

    1) Interestingly enough it is not a reviewer's job to debug. I do correlate my results to the manufacturers, but I test on the latest publicly available BIOS at the time of testing. If I sat around waiting for 'the next BIOS' then each review would take 3x as long and I couldn't feed my family. Sorry to disappoint. (Also, not all reviewers are masculine as per your pronoun usage.)

    2) The USB 3.0 and SATA 6 Gbps are both native on Intel and AMD unless specified otherwise. I believe it is an appropriate comparison. People deciding to upgrade will want a comparison between what is available now in the market, not what was on the market. There is scope for editorials to look at how certain dynamics have changed over the years, but also tests change. My old data for 9xx chipsets is not relevant here.

    3) Again, tests change over time in order to correlate with newer hardware and test the capabilities. If you have the spare time to dig out the hardware and run the newer tests, that's up to you. The other 14 motherboards I have in that need to be tested get priority here, otherwise they become irrelevant. I would love to have infinite hardware and infinite time to do the comparisons, but that is not a logistical possibility.

    4) Each chipset is tested against a single CPU. P67/Z68 was i5-2500K, X79 was i7-3960X, FM1 was A6-3650, FM2 was A10-5800K.

    5) My apologies, next time I'll forgo the initial release review because it's the only board in my hand before release and wait a few months until I have six reviewed then post them at once when they become a little irrelevant.

    6) Have you? Have you got time to do stability testing? What about testing it at high altitude, or in the Sahara?

    If you believe there are things missing from the review, helpfully suggest additions for future consideration. My email is through my name on the review.
  • brookheather - Thursday, October 11, 2012

    Typo - "there is few reasons to jump on board to Trinity".
  • Mugur - Friday, October 12, 2012

    Well, she was hot back in the first Matrix days... :-)
  • silverblue - Thursday, October 11, 2012

    "With that being said, it is clear that video conversion is an INT process and all four of the A10-5800K INT units are being used"

    Are they, though? If so, it's a bit disappointing. Are all four threads maxed out in Task Manager? It'd be interesting to see a 4C/4T Intel processor thrown in there (2500K seems a perfect candidate) as well.

    From looking at this, it should mean that an identically clocked Piledriver (83xx) CPU wouldn't be too far behind the 3770K in this one test. It does also mean, unfortunately, that even with linear performance scaling, even the top Piledriver CPU won't dethrone Thuban in the 3DPM MP test.
  • Soulnibbler - Thursday, October 11, 2012

    What is this line in the performance section supposed to mean?
    QUOTE:
    From a practical standpoint, the lack of floating point units in the CPU gives cause for concern as not everyone codes in hex or integer style (my own personal software all uses FP – INT would be confusing to code for me for negligible gain on most architectures).
    /QUOTE:

    I'm assuming you are referring to the bulldozer/steamroller architecture with a shared FPU unit per pair of integer units. On first reading it implies (and this implication is uncontested by the bizarre contents of the following parentheses) that there is no floating point unit on the chip. That is patently wrong as there is a rather nice FPU shared between every two integer cores.

    The other interpretation is that you think it needs MORE than half an FPU per core. That is an arguable point, but then the strange text in the parentheses paints you as someone who needs much more study towards what actually happens in a program. So much of your normal computing occurs in integer space. There isn't really any sort of program I can easily think of where you don't use integer operations (even memory mapping is integer) many times in order to prepare to do a single FP operation. The counter examples are all pretty much graphics examples where we want to work on vectors. The Trinity FPU has a nice vector processor too. If you break down and look at the machine code that any of your programs use you will find that an overwhelming majority (much greater than 66%) of that code is integer code.

    Crying OH NOES 1/2 A FPU, is not good reporting. YES the AMD chips lag the Intel chips, YES the design parameters are different. The unfounded supposition that performance differences are due to that specific portion of the architectural choice is frankly bad journalism. If you want to make claims like that you have to point to a set of benchmarks that demonstrate clearly that the 1/2 FPUs are to blame. I doubt this is the case as most analysis that I've seen points to larger memory subsystem problems as a much bigger factor.
  • IanCutress - Wednesday, November 28, 2012

    If your supposition is true, then the A10-5800K should not have experienced as much of a decrease against the competition as it did in the results.

    My 3DPM results make my position on the matter clear:

    "In the single threaded test, a lot of conclusions can be drawn from the comparison of AMD architectures. Direct comparison of Piledriver to Bulldozer (A10-5800K to FX-8150) gives a boost in single core performance of 7%, however comparing the old Stars cores of the A8-3850 at 2.9 GHz is roughly the same as the new Piledriver core at 4.2 GHz. So even with a 1.3 GHz advantage, Piledriver is only as good as Stars and less efficient in floating point results. If we compare Piledriver to Thuban, i.e. A10-5800K to X6 1100T, the Piledriver core gets stomped on by a good 25% performance. I find this quite staggering – most of the code I ever encountered as a computational chemist was floating point based, dealing with single and double precision on a regular basis. On this result, I would steer clear of Piledriver.

    The multithreaded version of 3DPM is slightly tougher to analyze. Due to the FP nature of the program, the A10-5800K is essentially a 2 core FPU processor, whereas all the other comparative AMD processors have either 4 or 6 FPUs to play with. What is perhaps worth considering is that the Bulldozer processor with 4 modules scores 326.32, whereas the Piledriver processor with only 2 modules scores 203.06, which is more than half. This would mean that the Piledriver core actually achieves 20% better performance at the same frequency, despite our ST test giving Piledriver only a 7% increase. Part of this could be put down to the architecture improvements – improved scheduling for heavily threaded loads, one of the downfalls of Bulldozer but was improved in Piledriver could be the reason here."

    My basis for my comments is from a computational complexity standpoint. Sure memory mapping may be an int process, but if I only do it at the beginning and end of a matrix transformation (and thereby having a total processing time less than 0.1% of the program) then it becomes insignificant.

    What AMD have done is project that applications in the future which require heavy computational throughput will be driven by INT ops. The big software vendors can do this, making video conversion and ray tracing type applications enhanced by use of INT ops. But for the non-CompSci scientist who relies more on readable code but also wants a speed increase, then going all out on the INT side may not be possible, and we get limited performance due to the scheduling and the lack of pure grunt due to the gutted APU. It's a design choice AMD have to live with, and I'm not the only one who is not entirely in favor of it.
  • Scootiep7 - Thursday, October 11, 2012

    Ok, I'm trying not to break down and just buy a Llano for my HTPC build, but does anybody know how much longer it'll be till I can get some nice options for a mini-ITX such as http://news.softpedia.com/news/MSI-Presents-FM2-A7... and the 5700k? What's the holdup on these!
  • groundhogdaze - Thursday, October 11, 2012

    AMD should play to their strengths, which is an affordable CPU with a relatively fast integrated GPU. That means focusing on small form factor systems such as AIO, ITX, HTPC class systems; however, I am surprised and disappointed at the relative lack of options when it comes to ITX FMx motherboards. I sold my AMD stock when I concluded they had their strategy wrong. Most folks who want to use a full sized case would also want to use a dedicated GPU, otherwise, what's the point of having a full sized case? Wrong marketing choice.

    Unless AMD can improve their heat/power ratings, the Intel G530 makes better sense as a NAS solution as it is dirt cheap and uses less power than its advertised 65w TDP while running circles around the Atom class processors. I hope AMD is reading the forums and best luck to them.
  • Mugur - Friday, October 12, 2012

    You are right. Full ATX and Trinity makes little sense. mATX and mini ITX with 8 SATA3 and integrated graphics should be the focus. Full ATX in fact makes little sense today, anyway... :-)

    If you want more than a NAS from a server, the best 65W Trinity part should be nice. I have a Phenom II X2 rated 80W in my HTPC and an Athlon II X4 (95W) in my server at home. Neither of them comes even close to their rated TDP, according to the "green" ICs and software of the motherboards (Gigabyte and Asus).
  • silverblue - Friday, October 12, 2012

    AMD seems to volt their processors conservatively, so K10Stat (or other utilities) or using the BIOS to reduce the voltage may prove useful in reducing power consumption noticeably without affecting performance more than a couple of percent.

    Toms ran an article on this as regards Trinity, and have done so with various AMD models in the past:

    http://www.tomshardware.com/reviews/a10-5800k-trin...

    Saving 14W for a tiny performance deficit is more than acceptable in my eyes.

    I undervolted my Phenom II X3 710 as per the following article:

    http://www.tomshardware.com/reviews/processor-powe...

    (though I needed to raise voltages by 0.025v to keep it 100% stable in my case)
