Hacker News

Google claims[0] the TPU is many times faster for the workloads they've designed it for.

> On our production AI workloads that utilize neural network inference, the TPU is 15x to 30x faster than contemporary GPUs and CPUs.

As far as I know, this will be the first opportunity for the public to test those claims, since until now TPUs haven't been available on GCP. I don't mean to sound skeptical; I'm quite confident they're not exaggerating.

[0]: https://cloudplatform.googleblog.com/2017/04/quantifying-the...



Keep in mind that what you linked refers to TPUv1, which is built for quantized 8-bit inference. The TPUv2, which was announced in this blog post, is for general purpose training and uses 32-bit weights, activations, and gradients.

It will have very different performance characteristics.
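To make the distinction concrete, here's a minimal sketch of what "quantized 8-bit inference" means in the TPUv1 sense: weights and activations are mapped from float32 to int8 with a per-tensor scale. This is a generic symmetric quantization scheme for illustration, not Google's actual implementation.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float32 tensor to int8.
    Returns the int8 values plus the scale needed to dequantize."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
qw, s = quantize_int8(w)
# Round-trip error is bounded by half a quantization step.
assert np.abs(dequantize(qw, s) - w).max() <= s / 2 + 1e-6
```

The point is that inference can tolerate this precision loss, while training (TPUv2's target) generally needs wider floating-point formats for gradients and accumulations.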


Thanks for pointing that out!


The reserve TPU button has been available on the dashboard for the last few months. But I assume instances have been prioritized for large customers such as Two Sigma.

From the paper:

"Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU."

In-Datacenter Performance Analysis of a Tensor Processing Unit

https://arxiv.org/abs/1704.04760

The price is about 5x that of a cloud Nvidia GPU instance on an hourly basis.


It will be interesting to see some benchmarks that compare TPUs to V100, since all previously published comparisons from Google compare TPU to K80 (3 GPU architectures ago).


Perf per watt matters to Google but not to you. You should only think of it on a perf/$ basis, right?


They're closely related, though: if the perf per watt is higher, Google can charge you fewer dollars per unit of performance. The price they charge you is ultimately tied to their operating cost.
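The link can be shown with back-of-envelope arithmetic. All numbers below are made up for illustration; the point is only that a chip with better perf/W has a lower hourly operating cost at equal throughput, which gives the provider room to price lower.

```python
def hourly_cost(capex_per_hour, power_watts, usd_per_kwh=0.06):
    """Provider's cost to run one accelerator for an hour:
    amortized hardware cost plus electricity (hypothetical figures)."""
    return capex_per_hour + (power_watts / 1000.0) * usd_per_kwh

# Two hypothetical chips delivering the same throughput:
gpu = hourly_cost(capex_per_hour=0.30, power_watts=300)
tpu = hourly_cost(capex_per_hour=0.30, power_watts=100)  # 3x better perf/W
assert tpu < gpu  # lower power -> lower cost -> room to charge less
```

Electricity is only part of the picture (cooling and density also scale with watts), but the direction of the effect is the same.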


I wonder how these would compare with Amazon's FPGA instances running a comparable core.



