2nd Gen Tensor Processing Unit Provides 11.5 Petaflops of Power

Google's 2nd generation Tensor Processing Unit (TPU) packs four 45-TFLOPS chips into a single 180-TFLOPS unit, available for use in the cloud later this year. Google calls them Cloud TPUs, and they will initially be available on the Google Compute Engine.

The new TPUs can be used for both training and inference. As you know, training a machine learning model is a major pain in the tuchus and takes an unreasonably long time to get right, so significantly improved training times have high value. Each card has its own high-speed interconnects, and 64 of the cards can be linked into a pod with 11.5 petaflops in total (one petaflops is 10^15 floating-point operations per second).
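The arithmetic behind these headline numbers can be sketched as a quick back-of-the-envelope check (chip, card, and pod counts are from the article; the rest is plain unit conversion):

```python
# Back-of-the-envelope check of the TPU numbers quoted in this article.
CHIP_TFLOPS = 45        # per-chip performance
CHIPS_PER_CARD = 4      # one Cloud TPU card
CARDS_PER_POD = 64      # cards linked into a pod

card_tflops = CHIP_TFLOPS * CHIPS_PER_CARD    # 180 TFLOPS per card
pod_tflops = card_tflops * CARDS_PER_POD      # 11,520 TFLOPS per pod
pod_petaflops = pod_tflops / 1000             # 1 PFLOPS = 1,000 TFLOPS

print(f"Card: {card_tflops} TFLOPS")          # Card: 180 TFLOPS
print(f"Pod:  {pod_petaflops:.1f} PFLOPS")    # Pod:  11.5 PFLOPS
```

The 11.5-petaflops figure is simply the 180-TFLOPS cards multiplied out across a full 64-card pod.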

For comparison, Nvidia's new Volta-based Tesla V100 GPU accelerators provide 15 TFLOPS of single-precision performance and 120 TFLOPS for "deep learning" workloads to improve training times. AMD's Vega GPU provides 13 TFLOPS of single-precision and 25 TFLOPS of half-precision performance. Microsoft's FPGAs offer similar performance improvements, though these are difficult to measure and compare.

Speed matters in data science, and it warms the heart to play with more powerful and faster toys.