AMD's Radeon HD 5870: Bringing About the Next Generation Of GPUs
by Ryan Smith on September 23, 2009 9:00 AM EST - Posted in GPUs
DirectX11 Redux
With the launch of the 5800 series, AMD is quite proud of the position they’re in. They have a DX11 card launching a month before DX11 reaches consumers in the form of Win7, and NVIDIA’s slower timing means that AMD has had silicon ready far sooner. This makes Cypress the de facto hardware implementation of DX11, a situation that helps the company in the long term, as game development will need to begin solely on AMD’s hardware (and be programmed against AMD’s advantages and quirks) until NVIDIA’s hardware is ready. AMD has not enjoyed this position since 2002 with the Radeon 9700 and DirectX 9.0, as DirectX 10 was anchored by NVIDIA due in large part to AMD’s late hardware.
As we have already covered DirectX 11 in-depth with our first look at the standard nearly a year ago, this is going to be a recap of what DX11 is bringing to the table. If you’d like to get the entire inside story, please see our in-depth DirectX 11 article.
DirectX 11, as we have previously mentioned, is a pure superset of DirectX 10. Rather than being the massive overhaul of DirectX that DX10 was compared to DX9, DX11 builds off of DX10 without throwing away the old ways. The result is easy to see in the hardware of the 5870: as features were added to the Direct3D pipeline, they were added to the RV770 pipeline in its transformation into Cypress.
New to the Direct3D pipeline for DirectX 11 are the tessellation system, which is divided into three parts, and the Compute Shader. Starting at the very top of the tessellation stack, we have the Hull Shader. The Hull Shader is responsible for taking in patches and control points (tessellation directions) to prepare a piece of geometry to be tessellated.
Next up is the tessellator proper, which is a rather significant piece of fixed function hardware. The tessellator’s sole job is to take geometry and break it up into more complex portions, in effect creating additional geometric detail where there was none. As setting up geometry at the start of the graphics pipeline is comparatively expensive, this is a very cool hack to get more geometric detail out of an object without the need to fully deal with what amounts to “eye candy” polygons.
As the tessellator is not programmable, it simply tessellates whatever it is fed. This is what makes the Hull Shader so important, as it serves as the programmable input side of the tessellator.
Once the tessellator is done, it hands its work off to the Domain Shader, and the Hull Shader hands its original inputs to the Domain Shader as well. The Domain Shader is responsible for any further manipulation of the tessellated data, such as applying displacement maps, before passing it along to other parts of the GPU.
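To make the division of labor between the three stages concrete, here is a minimal CPU-side sketch in Python. It is not Direct3D or HLSL code: the function names, the distance-based tessellation factor, and the flat four-corner patch are all assumptions made for illustration. The point is only that the middle stage does nothing but subdivide, while the stages on either side are where the decisions get made.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
UV = Tuple[float, float]


def hull_stage(distance_to_camera: float) -> int:
    """Programmable input side: decide how finely to split the patch.

    The distance threshold and the two factors are illustrative assumptions.
    """
    return 8 if distance_to_camera < 10.0 else 2


def tessellator(factor: int) -> List[UV]:
    """Fixed-function stage: emit a (factor + 1)^2 grid of domain coordinates."""
    return [(i / factor, j / factor)
            for j in range(factor + 1)
            for i in range(factor + 1)]


def domain_stage(corners: List[Point], uv: UV,
                 displacement: Callable[[float, float], float]) -> Point:
    """Place one generated vertex on the patch, then displace it along z."""
    u, v = uv
    # Bilinear weights for the corners ordered as (0,0), (1,0), (0,1), (1,1).
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    x, y, z = (sum(w * p[axis] for w, p in zip(weights, corners))
               for axis in range(3))
    return (x, y, z + displacement(u, v))


def tessellate_patch(corners: List[Point], distance_to_camera: float,
                     displacement: Callable[[float, float], float]) -> List[Point]:
    """Run the three stages in order for a single four-corner patch."""
    factor = hull_stage(distance_to_camera)
    return [domain_stage(corners, uv, displacement) for uv in tessellator(factor)]
```

Feeding the same four corners through `tessellate_patch` at a near and a far distance yields a dense or a sparse vertex grid respectively, and the `displacement` callable stands in for the displacement map lookup mentioned above.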
The tessellator is very much AMD’s baby in DX11. They’ve been playing with tessellators since as early as 2001, only for them to never gain traction on the PC. The tessellator has seen use in the Xbox 360, where the AMD-designed Xenos GPU has one (albeit much simpler than DX11’s), but when that same tessellator was brought over and put in the R600 and successive hardware, it was never used since it was not a part of the DirectX standard. Now that tessellation is finally part of that standard, we should expect to see it picked up and used by a large number of developers. For AMD, it’s vindication for all the work they’ve put into tessellation over the years.
The other big addition to the Direct3D pipeline is the Compute Shader, which allows for programs to access the hardware of a GPU and treat it like a regular data processor rather than a graphical rendering processor. The Compute Shader is open for use by games and non-games alike, although when it’s used outside of the Direct3D pipeline it’s usually referred to as DirectCompute rather than the Compute Shader.
For its use in games, the big thing AMD is pushing right now is Order Independent Transparency, which uses the Compute Shader to sort transparent textures in a single pass so that they are rendered in the correct order. This isn’t something that was previously impossible using other methods (e.g. pixel shaders), but using the Compute Shader is much faster.
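The underlying idea can be sketched in a few lines of Python. This is our own simplified illustration, not AMD's implementation: we assume each pixel collects an unordered list of transparent fragments as `(depth, rgb, alpha)` tuples, with larger depth meaning farther away, and that the resolve step sorts them and blends back to front.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Fragment = Tuple[float, Color, float]  # (depth, rgb, alpha); hypothetical layout


def resolve_pixel(fragments: List[Fragment], background: Color) -> Color:
    """Blend a pixel's transparent fragments in back-to-front order.

    Fragments may arrive in any order; sorting by depth here is what makes
    the final color independent of submission order.
    """
    color = background
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        # Standard "over" blend of the nearer fragment onto what is behind it.
        color = tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(rgb, color))
    return color
```

Because the sort happens per pixel at resolve time, the application can submit transparent geometry in whatever order is convenient and still get the same image.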
Other features finding their way into Direct3D include some significant changes for textures, in the name of improving image quality. Texture sizes are being bumped up to 16K x 16K (that’s a 256MP texture) which for all practical purposes means that textures can be of an unlimited size given that you’ll run out of video memory before being able to utilize such a large texture.
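A quick back-of-the-envelope check shows why. The sketch below assumes an uncompressed texture at 4 bytes per pixel with no mipmaps; both are our assumptions for the sake of the arithmetic.

```python
def texture_footprint(width: int, height: int, bytes_per_pixel: int = 4):
    """Return (pixel count, size in bytes) for an uncompressed texture.

    bytes_per_pixel=4 assumes 8-bit RGBA; mipmaps are ignored.
    """
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


pixels, size_bytes = texture_footprint(16 * 1024, 16 * 1024)
megapixels = pixels / 2**20      # 256.0 when "M" means 2^20
gibibytes = size_bytes / 2**30   # 1.0 GiB for a single texture
```

Under those assumptions a single maximum-size texture weighs in at 1 GiB before anything else in the scene has been loaded.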
The other change to textures is the addition of two new texture compression schemes, BC6H and BC7. These new texture compression schemes are another one of AMD’s pet projects, as AMD developed them and pushed for their inclusion in DX11. BC6H is the first texture compression method dedicated to compressing HDR textures, which previously compressed very poorly using even less-lossy schemes like BC3/DXT5. It can compress textures at a lossy 6:1 ratio. Meanwhile BC7 is for use with regular textures, and is billed as a replacement for BC3/DXT5. It has the same 3:1 compression ratio for RGB textures.
We’re actually rather excited about these new texture compression schemes, as better ways to compress textures lead directly to better texture quality. Compressing HDR textures allows for larger/better textures due to the space saved, and using BC7 in place of BC3 is an outright quality improvement in the same amount of space, given an appropriate texture. Better compression and tessellation stand to be the biggest contributors towards improving the base image quality of games by leading to better textures and better geometry.
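Those ratios fall out of simple arithmetic, sketched below in Python. Both formats store each 4x4 block of pixels in 16 bytes; the source bit depths used here (48 bits per pixel for half-float RGB, 24 bits per pixel for 8-bit RGB) are our assumptions about what the ratios are measured against.

```python
BLOCK_BYTES = 16    # BC6H and BC7 both use 128-bit blocks
BLOCK_PIXELS = 16   # each block covers a 4x4 pixel tile


def compression_ratio(source_bits_per_pixel: int) -> float:
    """Ratio of uncompressed to block-compressed size for one texture."""
    compressed_bits_per_pixel = BLOCK_BYTES * 8 / BLOCK_PIXELS  # 8 bits per pixel
    return source_bits_per_pixel / compressed_bits_per_pixel


bc6h_ratio = compression_ratio(48)  # HDR RGB, three 16-bit channels
bc7_ratio = compression_ratio(24)   # regular RGB, three 8-bit channels
```

With those inputs the function returns 6.0 and 3.0, matching the 6:1 and 3:1 figures quoted for the two formats.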
We had been hoping to supply some examples of these new texture compression methods in action with real textures, but we have not been able to secure the necessary samples in time. In the meantime we have Microsoft’s examples from GameFest 2008, which drive the point home well enough in spite of being synthetic.
Moving beyond the Direct3D pipeline, the next big feature coming in DirectX 11 is better support for multithreading. By allowing multiple threads to simultaneously create resources, manage states, and issue draw commands, it will no longer be necessary to have a single thread do all of this heavy lifting. As this is an optimization focused on better utilizing the CPU, graphics performance in GPU-limited situations stands to gain little. Rather, this is going to help the CPU better utilize the graphics hardware in CPU-limited situations. Technically this feature does not require DX11 hardware support (it’s a high-level construct available for use with DX10/10.1 cards too) but it’s still a significant technology being introduced with DX11.
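The pattern looks roughly like the following Python sketch. It is an analogy rather than Direct3D code: the function names and the string "commands" are hypothetical stand-ins, and we assume the work splits cleanly into independent batches that are recorded on worker threads and then submitted in order by one thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def record_commands(object_ids: List[int]) -> List[str]:
    """Worker-thread job: turn a batch of scene objects into a command list."""
    return [f"draw({object_id})" for object_id in object_ids]


def build_frame(batches: List[List[int]], workers: int = 4) -> List[str]:
    """Record each batch on its own thread, then submit the lists in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order regardless of finish order.
        command_lists = list(pool.map(record_commands, batches))
    submitted: List[str] = []
    for command_list in command_lists:
        submitted.extend(command_list)
    return submitted
```

The expensive part (building the command lists) spreads across cores, while the final submission stays ordered and cheap, which is why the benefit shows up in CPU-limited rather than GPU-limited situations.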
Last but not least, DX11 is bringing with it High Level Shader Language 5.0, which in turn is bringing several new instructions that are primarily focused on speeding up common tasks, and some new features that make it more C-like. Classes and interfaces will make an appearance here, which will make shader code development easier by allowing for easier segmentation of code. This will go hand-in-hand with dynamic shader linkage, which helps to clean up code by only linking in shader code suitable for the target device, taking the management of that task out of the hands of the coder.
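As a rough analogy for how interfaces and dynamic linkage fit together, consider the Python sketch below. It is not HLSL, and the class names, the capability flag, and the shading math are all hypothetical; it only illustrates writing the calling code once against an interface and choosing the concrete implementation when the shader is bound.

```python
from abc import ABC, abstractmethod


class Material(ABC):
    """Interface the main shader body is written against."""

    @abstractmethod
    def shade(self, base: float) -> float:
        ...


class SimpleMaterial(Material):
    def shade(self, base: float) -> float:
        return base


class DetailedMaterial(Material):
    def shade(self, base: float) -> float:
        # Illustrative extra work for more capable hardware.
        return min(1.0, base * 1.5)


def link_shader(device_supports_detail: bool) -> Material:
    """Pick the implementation suited to the target device at bind time."""
    return DetailedMaterial() if device_supports_detail else SimpleMaterial()
```

The caller only ever invokes `shade`, so adding another implementation does not require touching the code that uses it.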
327 Comments
SiliconDoc - Monday, September 28, 2009
When the GTX295 still beats the latest ati card, your wish probably won't come true. Not only that, ati's own 4870x2 just recently here promoted as the best value, is a slap in its face.
It's rather difficult to believe all those crossfire promoting red ravers suddenly getting a different religion...
Then we have the no DX11 released yet, and the big, big problem...
NO 5870'S IN THE CHANNELS, reports are it runs hot and the drivers are beta problematic.
---
So, celebrating a red revolution of market share - is only your smart aleck fantasy for now.
LOL - Awwww...
silverblue - Monday, September 28, 2009
It's nearly as fast as a dual GPU solution. I'd say that was impressive.
DirectX 11 comes out in less than a month... hardly a wait. It's not as if the card won't do DX9/10.
Hot card? Designed to be that way. If it was a real issue they'd have made the exhaust larger.
Beta problematic drivers? Most ATI launches seem to go that way. They'll be fixed soon enough.
SiliconDoc - Monday, September 28, 2009
Gee, I thought the red rooster said nvidia sales will be low for a while, and I pointed out why they won't be, and you, well you just couldn't handle that.
I'd say a 60.96% increase in a next gen gpu is "impressive", and that's what Nvidia did just this last time with GT200.
http://www.anandtech.com/video/showdoc.aspx?i=3334...
--
BTW - the 4870 to 4890 move had an additional 3M core transistors, and we were told by you and yours that was not a "rebrand".
BUT - the G80 move to G92 added 73M core transistors, and you couldn't stop shrieking "rebrand".
---
nearly as fast= second best
DX11 in a month = not now and too early
hot card = IT'S OK JUST CLAIM ATI PLANNED ON IT BEING HOT! ROFL, IT'S OK TO LIE ABOUT IT IN REVIEWS, TOO! COCKA DOODLE DOOO!
beta drivers = ALL ATI LAUNCHES GO THAT WAY, NOT "MOST"
----
Now, you can tell smart aleck this is a paper launch like the 4870, the 4770, and now this 5870 and no 5850, because....
"YOU'LL PUT YOUR HEAD IN THE SAND AND SCREAM IN CAPS BECAUSE THAT'S HOW YOU ROLL IN RED ROOSERVILLE ! "
(thanks for the composition Jared, it looks just as good here as when you add it to my posts, for "convenience" of course)
ClownPuncher - Monday, September 28, 2009
It would be awesome if you were to stop posting altogether.
SiliconDoc - Monday, September 28, 2009
It would be awesome if this 5870 was 60.96% better than the last ati card, but it isn't.
JarredWalton - Monday, September 28, 2009
But the 5870 *is* up to 65% faster than the 4890 in the tested games. If you were to compare the GTX 280 to the 9800 GX2, it also wasn't 60% faster. In fact, 9800 GX2 beat the GTX 280 in four out of seven tested games, tied it in one, and only trailed in two games: Enemy Territory (by 13%) and Oblivion (by 3%), making ETQW the only substantial win for the GT200.
So we're biased while you're the beacon of impartiality, I suppose, since you didn't intentionally make a comparison similar to comparing apples with cantaloupes. Comparing ATI's new card to their last dual-GPU solution is the way to go, but NVIDIA gets special treatment and we only compare it with their single GPU solution.
If you want the full numbers:
1) On average, the 5870 is 30% faster than the 4890 at 1680x1050, 35% faster at 1920x1200, and 45% faster at 2560x1600.
2) Note that the margin goes *up* as resolution increases, indicating the major bottleneck is not memory bandwidth at anything but 2560x1600 on the 5870.
3) Based on the old article you linked, GTX 280 was on average 5% slower than 9800X2 and 59% faster than the 9800 GTX - the 9800X2 was 6.4% faster than the GTX 280 in the tested titles.
4) Making the same comparisons, 5870 is only 3.4% faster than the 4870X2 in the tested games and 45% faster than the 4890HD.
Now, the games used for testing are completely different, so we have to throw that out. DoW2 is a huge bonus in favor of the 5870 and it scales relatively poorly with CF, hurting the X2. But you're still trying to paint a picture of the 5870 as a terrible disappointment when in fact we could say it essentially equals what NVIDIA did with the GTX 280.
On average, at 2560x1600, if NVIDIA's GT300 were to come out and be 60% faster than the GTX 285, it will beat ATI's 5870 by about 15%. If it's the same price, it's the clear choice... if you're willing to wait a month or two. That's several "ifs" for what amounts to splitting hairs. There is no current game that won't run well on the HD 5870 at 2560x1600, and I suspect that will hold true of the GT300 as well.
(FWIW, Crysis: Warhead is as bad as it gets, and dropping 4xAA will boost performance by at least 25%. It's an outlier, just like Crysis, since the higher settings are too much for anything but the fastest hardware. "High" settings are more than sufficient.)
SiliconDoc - Tuesday, September 29, 2009
In other words, even with your best fudging and whining about games and all the rest, you can't even bring it with all the lies from the 15-30 percent people are claiming up to 60.96%
--
Yes, as I thought.
zshift - Thursday, September 24, 2009
My thoughts exactly ;)
I knew the 5870 was gonna be great based on the design philosophy that AMD/ATi had with the 4870, but I never thought I'd see anything this impressive. LESS power, with MORE power! (pun intended), and DOUBLE the speed, at that!
Funny thing is, I was actually considering an Nvidia gpu when I saw how impressive PhysX was on Batman AA. But I think I would rather have near double the frame rates compared to seeing extra paper fluffing around here and there (though the scenes with the scarecrow are downright amazing). I'll just have to wait and see how the GT300 series does, seeing as I can't afford any of this right now (but boy, oh boy, is that upgrade bug itching like it never has before).
SiliconDoc - Thursday, September 24, 2009
Fine, but performance per dollar is on the very low end, often the lowest of all the cards. That's why it was omitted here.
http://www.techpowerup.com/reviews/ATI/Radeon_HD_5...
THE LOWEST overall, or darn near it.
erple2 - Friday, September 25, 2009
So what you're saying then is that everyone should buy the 9500 GT and ignore everything else? If that's the most important thing to you, then clearly, that's what you mean.
I think that the performance per dollar metrics that are shown are misleading at best and terrible at worst. It does not take into account that any frame rates significantly above your monitor refresh are for all intents and purposes wasted, and any frame rates significantly below 30 should be heavily weighted negatively. I haven't seen how techpowerup does their "performance per dollar" or how (if at all) they weight the FPS numbers in the dollar category.
SLI/Crossfire has always been a lose-lose in the "performance per dollar" category. Curiously, I don't see any of the nvidia SLI cards listed (other than the 295).
That sounds like biased "reporting" on your part.