Not only is it not trivial to port those, but they would also constantly have to play catch-up and be at the mercy of NVIDIA with regard to the spec.

AMD is in a catch-22: support CUDA and be in a perpetual catch-up position, or don't support it and continue to be ignored by the market at large, simply because of the momentum NVIDIA has managed to achieve with CUDA over the years.



There are several well known (certainly to AMD) strategies and tactics for dealing with this situation.

One example: AMD can support CUDA while focusing on price/performance at the low end (initiating an Innovator's Dilemma for NVIDIA), and if successful then solidify AMD's position, for example by starting a standards process for a successor to CUDA.

NVIDIA, of course, has various countering moves available such as IP moats, initiating competing standards, and so on.

Rarely is a credible player entirely boxed in. And even then, there are moves available to make the best of things, like making some acquisitions and then spinning the division off into an independent company, or selling it to another player (in this case probably Intel, but maybe ARM could be suckered into it), and so on.


> if successful then solidify AMD's position, for example by starting a standards process for a successor to CUDA.

I thought OpenCL was the standard in this space.


Yes, well, sometimes there are both de-facto standards and de-jure standards in circulation.

XHTML 2.0 comes to mind.

Tactically, creating a backward-compatible successor to CUDA (a superset, essentially) would have the advantage (for AMD, that is) that NVIDIA can't decide to un-implement the existing CUDA support in their products in order to spike AMD's efforts.

Then again, there is a lot of IP sloshing around in this space, so the specific tactic used by AMD may have to be something sneakier and subtler.

Anyway, this is all just one of the hypothetical ways AMD could try to unseat NVIDIA; there are others.


NVIDIA is playing catch-up with themselves and the shifting market too!

Just look at the Maxwell-based Tesla cards that came out of the blue (as if they were an afterthought); I bet Facebook, Baidu, Google, etc. told NVIDIA that Kepler was shit for their use-cases and they did not want to wait for Pascal.

Or look at the bizarre Pascal product line where the GP100 does support half precision, but the others don't, not even the P4 or P40. Strange, isn't it?

Things are changing so quickly that it isn't hard at all to find footing as long as you have something useful to offer. At the same time, I agree, a robust software stack is an advantage, but for the likes of Google or Baidu even that is not so big of a deal (Google wrote their own CUDA compiler!).


I'm not sure Maxwell came out of the blue. Maxwell 1 was designed for mobile, embedded and Tesla; Maxwell 2 most likely came out because Pascal at large was delayed.

As for half precision, it's pretty much the same thing NVIDIA has been doing since Kepler: dumping FP64 and FP16, especially FP16, due to the silicon costs.

NVIDIA came out with the Titan and Titan Black with baller FP16 performance and no one seemed to care; the Titan X then dropped it and people bought it like cupcakes. I'm pretty sure they have good market research showing that most people can live without it, and those who can't will pay through the nose.

As for Google and Baidu, while they are huge I'm not sure how "important" they are. Google can pretty much design their own hardware at this point, and as you mentioned they don't really use the software ecosystem that much, since they can write everything from scratch, even the driver if need be.

What CUDA gives is a huge ecosystem for 3rd parties and, more importantly, a lock on developers, since that's "all they know". It's not that different from how MSFT got a lock on the IT industry through sysadmins who only knew Windows; even now, with DevOps and everyone and their mother running Linux, they are still an important player.

If the majority of commercial software is CUDA-based, and if most researchers and developers are exposed more to CUDA and are more experienced with it, NVIDIA has a lock on the market.

I'm also not entirely sure how big or how good of a client Google will be; they can demand pretty steep discounts, and they are probably a heartbeat away from building their own hardware anyhow.

NVIDIA doesn't want to be locked into 2-3 huge contracts that pretty much dictate what their hardware and software should look like; that's what put AMD in a bind, with the console contracts dictating how their GPUs are going to look for a few generations now.


> I'm not sure Maxwell came out of the blue. Maxwell 1 was designed for mobile, embedded and Tesla; Maxwell 2 most likely came out because Pascal at large was delayed.

First off not sure what you are referring to by "Maxwell 1" and "Maxwell 2"; there's GM204, GM206, GM200 all very similar, and GM20B the slight outlier.

Maxwell was an arch tweak on Kepler which, due to everyone but Intel being stuck at 28 nm, had to cut down on all but the gaming-essential stuff to deliver what consumers expected (>1.5x gen-over-gen boost). For that reason, and because HPC has traditionally been thought to need DP, IIRC no public roadmap mentioned a Maxwell Tesla at all. Instead, the GK210 (K80) was the attempt at a bridge-gap chip for HPC until Pascal; it was released just a few months before the big GM200 consumer chips came out.

That is, until fall '15, when the late-arriving M40 and M4 were pitched as "deep learning accelerators" (though GRID/virtualization versions were released a little earlier, in August). Quite obvious naming, plus no sane HPC shop would buy a crippled (no DP) chip less than a year before Pascal, the promised miracle chip, was planned to be released. Let's not forget that the P100 was known to be quite late and the K80 was insufficient for the needs of many customers, so another bridge-gap was necessary, and that's what the M Teslas were: bridge-gap ML/DL cards that Google/FB/Baidu and the like wanted, and picked up pretty quickly, e.g. see [1].

> As for half precision, it's pretty much the same thing NVIDIA has been doing since Kepler: dumping FP64 and FP16, especially FP16, due to the silicon costs.

Not really. In Kepler they experimented with the SP/DP balance a bit (1/3), in Maxwell they were pushed by 28 nm, in Pascal they returned to the DP = 1/2 SP throughput.

Also, FP16 is not completely separate silicon; AFAIK FP16 instructions are dual-issued on the SP hardware.
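For what it's worth, here's a minimal sketch (mine, purely illustrative, not from any NVIDIA docs) of how FP16 math is exposed on chips with native support (e.g. Tegra X1 or GP100): two half values are packed into a half2 and the intrinsics operate on both lanes at once, which is where the 2x-over-SP throughput comes from; on chips without native FP16 these intrinsics are slow or unavailable.

    // Hypothetical kernel: y = a*x + y on packed half2 data.
    // Assumes a device with native FP16 (sm_53 / sm_60+).
    #include <cuda_fp16.h>

    __global__ void haxpy(int n, __half a, const __half2 *x, __half2 *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n / 2) {
            __half2 a2 = __half2half2(a);    // broadcast the scalar into both lanes
            y[i] = __hfma2(a2, x[i], y[i]);  // fused multiply-add on both lanes at once
        }
    }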

> NVIDIA came out with the Titan and Titan Black with baller FP16 performance and no one seemed to care

Source? AFAIK, FP16 was previously only supported natively by textures and by some conversion instructions; the first chip with native FP16 arithmetic was the GM20B/Tegra X1.

> As for Google and Baidu, while they are huge I'm not sure how "important" they are. Google can pretty much design their own hardware at this point,

Some hardware, where it gives them large benefits, but definitely not all hardware. They're happily relying heavily on GPUs, are planning to pick up Power8+NVLink, etc. Designing chips is expensive, especially if there isn't a huge market to pay for it.

> and as you mentioned they don't really use the software ecosystem that much, since they can write everything from scratch, even the driver if need be.

Note that they "only" wrote the frontend and IR optimizer; the code generator is NVIDIA's NVPTX backend [2]!
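(That compiler work has since been upstreamed as clang's CUDA support, so the whole flow is visible in the open: device code goes clang frontend -> LLVM IR -> NVPTX backend -> PTX, and only the PTX-to-SASS step stays with NVIDIA's tools. A rough, illustrative sketch, not taken from the slides:)

    // Illustrative only: a trivial CUDA kernel that clang can lower through the
    // LLVM NVPTX backend to PTX, roughly along the lines of
    //   clang++ -x cuda --cuda-gpu-arch=sm_35 --cuda-device-only -S axpy.cu
    // (flags from memory; check the clang CUDA docs before relying on them)
    __global__ void axpy(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }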

To wrap up, since this is getting long: to your last points I'd say the big DL players are very important for NVIDIA, because they are the trendsetters leading many aspects of AI/DNN research with OSS toolkits for GPUs, and they'd be silly not to use GPUs in-house for their own needs, which they're happy to talk about at various conferences and trade shows (e.g. both GTC 2015 keynotes [3] [4]).

[1] http://arstechnica.com/information-technology/2015/12/facebo...
[2] http://llvm.org/devmtg/2015-10/slides/Wu-OptimizingLLVMforGP...
[3] http://www.ustream.tv/recorded/60071572
[4] http://www.ustream.tv/recorded/60071572


>First off not sure what you are referring to by "Maxwell 1" and "Maxwell 2"; there's GM204, GM206, GM200 all very similar, and GM20B the slight outlier.

GM1XX is Maxwell 1st Gen, which was the 750 Ti, the GeForce 800M series, Tegra K1 and Tesla M10; GM2XX is Maxwell 2nd Gen.

>Not really. In Kepler they experimented with the SP/DP balance a bit (1/3), in Maxwell they were pushed by 28 nm, in Pascal they returned to the DP = 1/2 SP throughput

Maxwell wasn't 1/3, it was 1/32 (no, this isn't a typo) ;) and it's the same (or even worse, IIRC it's 1/64 now) with Pascal, with the exception of the GP100; the Pascal Titan and Quadro cards are 1/32 or 1/64.

As I said, NVIDIA keeps this to very limited silicon; if you buy a desktop/workstation GPU, even with Pascal, don't expect DP/HP performance.
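To put those ratios in perspective, here's a quick back-of-the-envelope sketch (core counts and clocks are rough public figures, so treat the numbers as illustrative, not exact):

    // Rough peak-DP estimate: cores * clock * 2 (FMA) * DP:SP ratio.
    // Figures below are approximate public specs, for illustration only.
    #include <cstdio>

    double peak_gflops(int cores, double clock_ghz, double ratio) {
        return cores * clock_ghz * 2.0 * ratio;
    }

    int main() {
        printf("GTX Titan  (GK110, 1/3):  ~%.0f DP GFLOPS\n", peak_gflops(2688, 0.88, 1.0 / 3.0));
        printf("GTX 780 Ti (GK110, 1/24): ~%.0f DP GFLOPS\n", peak_gflops(2880, 0.93, 1.0 / 24.0));
        printf("Titan X    (GM200, 1/32): ~%.0f DP GFLOPS\n", peak_gflops(3072, 1.00, 1.0 / 32.0));
        printf("Tesla P100 (GP100, 1/2):  ~%.0f DP GFLOPS\n", peak_gflops(3584, 1.30, 1.0 / 2.0));
        return 0;
    }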

As for the big players, yes they are trendsetters, but they can also easily turn into "blackmailers": if you have a single client that buys half your GPUs, they dictate the terms. Companies have gone under because of contracts that took up too much of their delivery pipeline.


> GM1XX is Maxwell 1st Gen, which was the 750 Ti, the GeForce 800M series, Tegra K1 and Tesla M10; GM2XX is Maxwell 2nd Gen.

My bad, I forgot about 5.0 devices being called GM1xx. Still, AFAIR there was little practical difference (in particular in the instruction set) between all of them except the compute capability 5.3 Tegra.

> Maxwell wasn't 1/3, it was 1/32 (no, this isn't a typo) ;) and it's the same (or even worse, IIRC it's 1/64 now) with Pascal, with the exception of the GP100; the Pascal Titan and Quadro cards are 1/32 or 1/64.

I did not say Maxwell had a 1/3 DP flop rate; I said Kepler had that, but Maxwell was pushed by the 28 nm process (so they had to get rid of all the DP ALUs).

> As I said, NVIDIA keeps this to very limited silicon; if you buy a desktop/workstation GPU, even with Pascal, don't expect DP/HP performance.

Never disagreed, but they can only do that because their professional compute division has grown big enough that it is worth designing different silicon; that shift in fact happened after Kepler, where the GK110 was more or less the same on GeForce and Tesla (e.g. 780 Ti and K40). They're also trying to find a good way to do market segmentation, and DP ALU die area is an obvious candidate to play with. Still, no HP on GP102/104 is a curious thing, especially as the Tesla P4/P40 that officially target ML/DL would really need it. My bet is they won't "forget" HP on lower-end Volta; not all consumer chips will support it, but unless they design different dies for GV102/104 (or whatever they'll call the non-DP Tesla uarch), which I doubt, some desktop parts will also support it.


NVIDIA said they have new double precision silicon for Volta, so in effect "DP" should become the new base unit, which would give you either 1:1 DP/SP, or 1:2 DP/SP if they can run 2 SP operations on a single DP unit, but there hasn't been much talk about HP. The emphasis on DP performance across the board was extreme with NVIDIA this time around, which makes me wonder if they think FP16 is irrelevant for some reason.

As for Kepler, well, it had 1:3, but only on a few chips; most Kepler-based GPUs had 1:24 DP performance. The 780 Ti had 1:24 while the Kepler Titan had 1:3, then the Titan Black had 1:24 again, and again no one seemed to care...



