What Is TPU And Is It Better Than CPU And GPU?

In this article, we are going to talk about the Tensor Processing Unit (TPU), a custom-designed processor created by Google for machine learning and deep learning tasks. We will talk about why Google created the TPU, how the TPU is different from a CPU and a GPU, and so on.

What is TPU?

TPU is an AI accelerator, an application-specific integrated circuit (ASIC) developed by Google specifically for AI-related tasks such as machine learning, deep learning, and neural networks. TPU is not a general-purpose processor, so it only runs models built with frameworks that target it: originally TensorFlow, and more recently also JAX and PyTorch (through PyTorch/XLA). As its name suggests, it is very powerful at dealing with tensors (multi-dimensional arrays of numbers that generalize scalars, vectors, and matrices, and that can be used to describe physical properties).
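To make the idea of a tensor concrete, here is a minimal sketch in plain Python that treats nested lists as tensors. The helper name tensor_shape is made up for this illustration, and it assumes non-empty, rectangular lists; real frameworks use their own array types.

```python
def tensor_shape(t):
    """Return the shape of a nested-list tensor, e.g. [[1, 2], [3, 4]] -> (2, 2).

    Assumption: lists are non-empty and rectangular (every row has the same length).
    """
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return tuple(shape)


scalar = 3.0                        # rank 0: a single number
vector = [1.0, 2.0, 3.0]            # rank 1: a list of numbers
matrix = [[1.0, 2.0], [3.0, 4.0]]   # rank 2: a list of rows
batch = [matrix, matrix, matrix]    # rank 3: a stack of matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    shape = tensor_shape(t)
    print(name, "has rank", len(shape), "and shape", shape)
```

The rank is simply the number of dimensions, so a scalar has rank 0 and a matrix has rank 2.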

When was TPU introduced?

TPU was introduced in May 2016 at Google I/O, but Google had been using it internally in its data centers since 2015 for its own products, such as Google Photos, RankBrain, and Translate. TPU was made available for third-party use in 2018, both as part of Google's Cloud infrastructure and as a smaller version of the chip offered for sale.

What led to the development of TPU?

The demand for AI and machine learning, and progress in both, are growing very fast. Almost every sector now uses artificial intelligence and machine learning to improve its performance. But conventional processors are not good enough at handling these workloads effectively: they can be expensive, and they consume a lot of power and time per operation. This led to the development of the TPU, which is designed specifically for this purpose.

How does TPU work?

The major advantage of the TPU is that it reduces the von Neumann bottleneck, since its primary task is matrix processing. Inside the TPU, thousands of multipliers and adders are connected directly to each other to form a large physical matrix of those operators, known as a systolic array architecture. Cloud TPU v2 has two systolic arrays of 128×128, aggregating 32,768 ALUs for 16-bit floating-point values in a single processor.
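As a quick check of the ALU count quoted above, the arithmetic can be written out. The variable names are only for illustration.

```python
# Two systolic arrays per Cloud TPU v2 processor, each 128 x 128 cells.
arrays_per_processor = 2
array_side = 128

total_alus = arrays_per_processor * array_side * array_side
print(total_alus)  # 32768
```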

The systolic array executes neural network calculations in these 3 steps:

First, all parameters are loaded from memory into the matrix of multipliers and adders.
After that, all data is loaded from memory.
Then, the result of every multiplication is passed to the next multiplier while the summation is taken at the same time. The output is the sum of all the multiplication results of data and parameters.
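The three steps above can be sketched in plain Python. This is a simplified model, not Google's implementation: it ignores the clock-by-clock timing of a real systolic array and only shows how the parameters stay in place while data streams through and partial sums are handed from one cell to the next. The function name systolic_matmul is hypothetical.

```python
def systolic_matmul(weights, inputs):
    """Compute W @ x for each input vector x, one multiply-accumulate cell at a time.

    weights: list of rows, shape (n_out, n_in)
    inputs:  list of input vectors, each of length n_in
    """
    # Step 1: the parameters are loaded once and stay in the cells.
    cells = [row[:] for row in weights]

    outputs = []
    # Step 2: the data is streamed in from memory, one input vector at a time.
    for x in inputs:
        y = []
        for cell_row in cells:
            partial_sum = 0.0
            # Step 3: each cell multiplies its weight by the incoming value,
            # adds the result to the partial sum, and passes it to the next cell.
            for w, x_j in zip(cell_row, x):
                partial_sum = partial_sum + w * x_j
            y.append(partial_sum)
        outputs.append(y)
    return outputs


W = [[1.0, 2.0],
     [3.0, 4.0]]
X = [[1.0, 1.0],
     [2.0, 0.0]]
result = systolic_matmul(W, X)
print(result)  # [[3.0, 7.0], [2.0, 6.0]]
```

The point of the design is that intermediate results never go back to memory: each partial sum moves directly to the neighbouring cell, which is what reduces memory traffic during the matrix multiplication.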

To keep this article simple, we have not discussed the working of the TPU in detail here. Instead, we have created a separate article in which we compare TPU, CPU, and GPU and discuss how the TPU works in detail. You can read it here: "An In-Depth Exploration On TPU Working".

How much does a TPU cost?

A TPU is not something that you can buy and plug into your computer as we do with a GPU. It can be used only through Google Cloud Platform (GCP) or through notebooks (Kaggle and Colab). A TPU is not too expensive to use; you can start using it for just a few dollars.

[Image: Cloud TPU pricing in the US (Source)]

This is the pricing of the TPU service in the US. If you use preemptible instances, it will be a lot cheaper. Preemptible is a Google service that lets you use spare VM instances, that is, VM instances that are currently not in use. A preemptible VM automatically stops after 24 hours, and Google can stop it earlier if it needs that capacity back. You should not use the preemptible service if your app is not fault tolerant or cannot run as batch processing.
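To show what "fault tolerant" means for a job that may be stopped at any time, here is a minimal sketch of a batch job that saves a checkpoint after every item and resumes from it on restart. Everything here is an assumption made for illustration: the function name run_batch_job, the JSON checkpoint file, and the fail_after parameter that simulates a preemption. It does not use any Google Cloud API.

```python
import json
import os
import tempfile


def run_batch_job(items, checkpoint_path, fail_after=None):
    """Sum the items in order, saving progress so a restart can resume.

    fail_after is a hypothetical test hook: stop after that many items
    in this run, to imitate the VM being preempted.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved progress

    processed_this_run = 0
    for i in range(state["next_index"], len(items)):
        if fail_after is not None and processed_this_run >= fail_after:
            raise RuntimeError("simulated preemption")
        state["total"] += items[i]   # the "work": here, a running sum
        state["next_index"] = i + 1
        # Write to a temporary file first, then swap it in, so an
        # interruption never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
        processed_this_run += 1
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    data = [1, 2, 3, 4, 5]
    try:
        run_batch_job(data, path, fail_after=2)  # first run is interrupted
    except RuntimeError:
        pass
    result = run_batch_job(data, path)           # second run picks up at item 3

print(result)  # 15
```

A job written this way loses at most the item it was working on when the VM stops, which is the kind of workload that fits preemptible instances.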

If you need even more computation power, you can use a TPU Pod. There are two Pods: one is a set of 512 TPU v2 cores with a peak performance of 11.5 petaflops (FLOPS: floating-point operations per second), and the other is a set of 2048 TPU v3 cores with a peak performance of 100+ petaflops. You can choose the service depending on your needs. You can see the prices of the different Cloud TPU Pods below.

[Image: Cloud TPU Pod pricing (Source)]
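The Pod figures quoted above can be turned into a rough per-core number with a little arithmetic. This is only a back-of-the-envelope sketch: it uses the core counts and petaflops values from this article, and since the TPU v3 figure is given as "100+", the v3 result is a lower bound.

```python
# (cores, peak petaflops) as quoted in this article
pods = {
    "TPU v2 Pod": (512, 11.5),
    "TPU v3 Pod": (2048, 100.0),  # quoted as "100+", so treat as a lower bound
}

per_core = {}
for name, (cores, petaflops) in pods.items():
    # 1 petaflops = 1000 teraflops
    per_core[name] = petaflops * 1000 / cores
    print(f"{name}: about {per_core[name]:.1f} teraflops per core")
```

This works out to roughly 22.5 teraflops per TPU v2 core and at least 48.8 teraflops per TPU v3 core.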

Anyway, we generally don't need a TPU. It is needed only when you have a really massive amount of data and require really high computation power. Also, if you require predictions with high precision (as in medical cases), a TPU may not be ideal for you: the first-generation TPU works on an 8-bit architecture and uses quantization to compress 32-bit or 16-bit floating-point values into 8-bit integers. Generally, the loss of accuracy is not significant, and this works in many cases. But I guess in a few domains, like medical and scientific research, it won't work.
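To give a feel for the accuracy loss that quantization introduces, here is a minimal sketch of a simple min/max (affine) scheme that maps floating-point values to 8-bit integers and back. This is a generic textbook approach chosen for illustration, not necessarily the scheme used by the TPU toolchain; the function names quantize and dequantize are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] using the min/max range."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    # Guard against a zero range when all values are equal.
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo


def dequantize(q, scale, lo):
    """Recover approximate floats from the quantized integers."""
    return [lo + qi * scale for qi in q]


weights = [-1.0, -0.5, 0.0, 0.3, 1.0]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("8-bit values:", q)
print("largest round-trip error:", max_error)
```

With this scheme the round-trip error of each value is at most half of one quantization step (scale / 2), which is why the loss is small for many models but can matter when very fine numerical differences are important.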

Conclusion

TPU (Tensor Processing Unit) is an ASIC (application-specific integrated circuit) processor developed by Google. It is developed only for tasks related to deep learning and machine learning, so it only runs models from frameworks that target it (originally TensorFlow, and now also JAX and PyTorch through PyTorch/XLA). Training a model on a TPU can be cheaper and consume less power and time than on a GPU or CPU. A TPU should be used only if you have a very large dataset or need very high and fast computation power.

Aman Kumar

Data Scientist with 3+ years of experience in building data-intensive applications in diverse industries. Proficient in predictive modeling, computer vision, natural language processing, data visualization, etc. Aside from being a data scientist, I am also a blogger and photographer.