The AnandTech Podcast: Episode 10
by Anand Lal Shimpi on November 21, 2012 12:40 AM EST
We've made it to 10 episodes of the AnandTech Podcast! As promised, this week's episode is a bit more PC focused as we discuss the future of AMD. Intel's SSD DC S3700 is up for discussion, as well as the HPC space including the launch of Intel's Xeon Phi (based on the architecture formerly known as Larrabee).
There's a bit of mobile discussion in the second half of the podcast, addressing TI's exit from the market and some final thoughts on the Nexus 4 from Brian.
The AnandTech Podcast - Episode 10
featuring Anand Shimpi, Brian Klug, Ryan Smith & Dr. Ian Cutress
RSS - mp3, m4a
Direct Links - mp3, m4a
Total Time: 2 hours 3 minutes
Outline - hh:mm
Intel SSD DC S3700 and the Evolution of SSDs - 00:01
Intel's Xeon Phi - 00:16
AMD in the HPC Space - 00:41
AMD's Tough Times - 00:55
TI Exiting the Mobile SoC Business - 01:32
More on the Nexus 4 - 01:44
As always, comments are welcome and appreciated.
hammer256 - Wednesday, November 21, 2012
Ok, this is definitely my favorite episode so far. Very nice segment on the HPC stuff. What's you guys' take on Nvidia's GK104/110 strategy, i.e., little die with limited compute capability for general consumers and big die for professional/HPC space? Is this going to be a trend that will continue for the next generations, and will AMD also join in?
Also, I wonder how easy it really is to port OpenMP code over to utilize Phi. A common issue with programming for PCI-E accelerator cards, be it Nvidia, AMD, or Intel, is the bottleneck of the PCI-E bus itself. That means the programmer has to be aware of the separate memory spaces of the accelerator and the host, and has to arrange memory transfers efficiently to avoid that bottleneck. I know in my simulation using CUDA, that is a very large part of the code. To me, that is definitely a barrier to entry if I were only used to programming for the CPU. So I wonder how Intel is going to deal with it, maybe with compiler directives just as with OpenMP to denote which memory blocks should reside on the GPU? But it seems that this problem alone is enough to make porting existing OpenMP code to efficiently utilize the Phi a less than trivial process.
Of course, this is just speculation, since I don't have a Phi to try OpenMP with. Maybe in supercomputers this is a non-issue, since the bandwidth bottleneck there is probably internode communication. What are your thoughts?
IanCutress - Friday, November 23, 2012
A lot of my CUDA work didn't rely on PCIe speed at all. One copy of the memory buffer from host to device, then a few thousand to a billion loops each with a few billion threads spawned, then copy back. Total time transferring over the PCIe bus was sub 0.1% of a long simulation.
I could see where Host->Device->Host copy times could be problematic, but it is all algorithm dependent. If you want to do a Matrix convert once on a bit of data, then yes the transfer will be a limiting factor. I try and keep my PCIe transfers to a minimum with my matrix solvers - keep the data on the host and only probe the data you actually need. If it's a science based simulation, you don't need the results of every time step - take every 10th or 100th loop around.
If PCIe is the bottleneck, then perhaps CUDA/GPGPU isn't the best way to look at your code? Or buy a machine where you can bump up the PCIe bus a bit and still maintain data coherency (if you need ECC).
With Xeon Phi, I imagine it'll be an API call to probe the Xeon Phi devices present, then a separate pragma for calls to Xeon Phi devices. Hopefully (fingers crossed) it will automatically split over multiple Phi cards present in the system and do the cross talking automatically. That won't be the best solution for all, but my Brownian motion simulations will love it. I wonder if they will include SLI/XFire type bridges between cards to minimize the PCIe crosstalk.
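To make the pattern in this exchange concrete (copy the buffer to the accelerator once, iterate many times, pull back only an occasional sample, copy the result back once), here is a minimal sketch in C. It uses the `target` directives that were later standardized in OpenMP 4.0; that is an assumption made for illustration, not necessarily what Intel offered for Xeon Phi at the time of this thread. The function name `run_simulation`, its parameters, and the toy update rule are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy "simulation": every step nudges each element by 0.5.
 * Samples x[0] every `sample_every` steps and returns how many samples
 * were written to `samples`. */
int run_simulation(double *x, int n, int steps, int sample_every,
                   double *samples)
{
    int taken = 0;

    /* Map x to the device once on entry; copy it back once on exit.
     * Without offload support the pragmas are ignored and the code
     * simply runs on the host with the same result. */
    #pragma omp target data map(tofrom: x[0:n])
    {
        for (int step = 1; step <= steps; ++step) {
            /* Compute on data that is already resident on the device,
             * so no per-step transfer of the whole buffer is needed. */
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < n; ++i) {
                x[i] += 0.5;
            }

            if (step % sample_every == 0) {
                /* Pull back only the single element we want to inspect. */
                #pragma omp target update from(x[0:1])
                samples[taken++] = x[0];
            }
        }
    }
    return taken;
}
```

Calling `run_simulation(x, n, 100, 10, samples)` transfers the full buffer twice in total and one element on every tenth step. Whether that is enough to keep the bus from being a bottleneck is, as noted above, algorithm dependent.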
hammer256 - Saturday, November 24, 2012
Yeah it took me a lot of effort to minimize transfer over the pci-e bus, which I needed to happen at each time step for synchronization across my neural network simulation.
It's not a bottleneck anymore, but for a moment there I was wondering why I didn't move my last bit of host code to the GPU and not deal with pci-e transfer at all. But then I remembered that I was programming to handle multiple cards, and it was easier to keep that bit of code on the host and handle the synchronization from multiple GPUs to the host over pci-e.
With multiple cards it seems that full optimization requires treating the code as running on a multi-node system. Which is probably a non-issue for people in the super computer space since they have to deal with multiple nodes anyways. But for scientists who want to have a super computer in a desktop, to run OpenMP code with little modification, it will be a barrier if they want multiple cards. So like you said, hopefully Intel can have software and hardware solutions to make that easier to handle.
Should be exciting times.
tipoo - Sunday, November 25, 2012
It's kind of funny how AMD and Nvidia switched boats there, the 500 series actually often has better compute performance than the 600, but the 600 is much more efficient for what most consumers will use it for, namely games. And AMD had a game oriented architecture with the 6000 series then added in all the compute stuff with the 7k, just as Nvidia relegated all that to the more professionally oriented cards.
Is it a good strategy? I guess so, smaller dies = lower cost and power consumption and most users won't miss all the compute stuff. But it may limit some next generation games which may lean on GPGPU calculations, who knows how that will pan out. But for current tech, I think it's a good tradeoff, just a bummer for enthusiasts and scientists who may have wanted to run GPGPU calculations on a card that doesn't cost thousands.
IanCutress - Tuesday, November 27, 2012
The only reasons the top cards cost thousands are that they use ECC, often have better double precision rates, and come with support if things go wrong. You essentially pay more for the HW and an extra layer of testing before you get the card. If you don't need ECC as an enthusiast, then don't bother - but for commercial results, ECC tends to be the barrier between several years of work or several years on the streets.
The NVIDIA 600 series does have its place - ideal when spawning a lot more lightweight threads. But the AMD change was more to do with architecture - VLIW4/5 was good but only great in a few niche examples that took care of ILP. GCN is a more general way of tackling everything GPGPU related, hence why those VLIW4/5 codes do not work as well, but everything else tends to work better.
hammer256 - Wednesday, November 21, 2012
An AnandTech podcast just won't be the same without the Brian Rant.
maximumGPU - Thursday, November 22, 2012
One of my favourite episodes too. Insight on the HPC market was super informative. Looking forward to you guys getting your hands on a Phi!
The thoughts on what went wrong with AMD were very insightful too. Never owned one of their products, but I know how bad for all of us a world without AMD would be.
Dman23 - Thursday, November 22, 2012
Good podcast! I really feel that these guys who are talking know their stuff!! Obviously, Anand wouldn't have hired them if they didn't have technical backgrounds in compute... just saying, they're smart. ;)
BTW, with speculation of AMD being bought out or sold off piece by piece, what about Apple buying them up?? I think they would be MUCH more likely to buy them up than Samsung since they are both American companies, have their headquarters right next to each other in Silicon Valley (so the integration of both companies would be much easier), and obviously Apple has interest in acquiring semiconductor companies in order to leverage their own products for their businesses.
Ideally tho, I would rather have ARM be able to somehow buy them up and be able to integrate their energy efficient designs with some of AMD's high-powered GPU prowess. Not only that but it could create a company that has the scale and technical talent to match up with Intel. I don't see that happening tho because ARM doesn't have the pile of cash to probably pull that off, except with maybe an outside investor group providing the financial capital to pull something like that off.
Pfffman - Friday, November 23, 2012
Great episode. Always interesting to hear more about different industries and how they interact with each other. Brian's rant is justified. Hopefully developers will fix it in future patches. My Padfone fortunately doesn't do that.
Will there be more coverage on Padfone 2? I was waiting to see Anandtech's take on the first one before I got it, then I just got impatient and got it.
Several podcasts ago I think Anand mentioned that he was getting a Thunderbolt to PCIe slot device from OWC and was going to test it out putting a GPU on it and seeing what would happen. Any updates from that?
I understand that it is already a lot of work to get the podcast together, however I think people would greatly appreciate a link dump of the topics, companies, articles, etc. that you cover, like Rooster Teeth does with theirs. Brian mentioned that a company would do screen calibrations for Google if they just approached them. I wanted to look up the company to know more about it, and this happens regularly whenever people are just talking about tech.
Thanks again and keep them up :)
iAnders - Friday, December 28, 2012
Just listened to episode 10, very high quality discussions. Loved the insight into AMD's bleak situation. Just wanted to say thanks!