In this article, we are going to talk about the Tensor Processing Unit (TPU), a custom-designed processor created by Google for machine learning and deep learning tasks. We will talk about why Google created the TPU, how the TPU differs from the CPU and GPU, and so on.
What is TPU?
A TPU is an AI accelerator, an application-specific integrated circuit (ASIC) developed by Google specifically for AI-related tasks such as machine learning, deep learning, and neural networks. A TPU is not a general-purpose processor: it runs models built with frameworks that target it, originally TensorFlow and now also JAX and PyTorch (through the XLA compiler), and it is not suited to arbitrary programs. As its name suggests, it is very powerful at dealing with tensors (multi-dimensional arrays of numbers that generalize scalars, vectors, and matrices).
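To make the idea of a tensor concrete, here is a minimal sketch in plain Python. It represents tensors as nested lists, which is only an illustration and not how any TPU framework stores them; the helper name `tensor_shape` is our own.

```python
# Illustrative only: tensors of rank 0, 1, and 2 as nested Python lists.

def tensor_shape(t):
    """Return the shape of a non-empty, regularly nested list (() for a scalar)."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]  # assumes every sub-list at this level has the same length
    return tuple(shape)

scalar = 3.0                                    # rank 0: a single number
vector = [1.0, 2.0, 3.0]                        # rank 1: a list of numbers
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # rank 2: rows and columns

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    shape = tensor_shape(t)
    print(name, "has rank", len(shape), "and shape", shape)
```

The rank is simply the number of dimensions, so the scalar, vector, and matrix above have ranks 0, 1, and 2.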
When was TPU introduced?
The TPU was announced in May 2016 at Google I/O, but Google had been using it internally in its data centers since 2015 for its own products such as Google Photos, RankBrain, and Translate. TPUs were made available for third-party use in 2018, both as part of Google's cloud infrastructure and as a smaller version of the chip offered for sale.
What led to the development of TPU?
The demand for AI and machine learning, and the progress in both, are growing very fast. Almost every sector now uses artificial intelligence and machine learning to improve its performance. But conventional processors are not good at handling these workloads efficiently: they can be expensive, and they consume a lot of power and time per operation. This led to the development of the TPU, which is designed specifically for this purpose.
How does a TPU work?
The TPU's main advantage is that it greatly reduces the von Neumann bottleneck, since its primary task is matrix processing. Inside the TPU, thousands of multipliers and adders are connected directly to each other to form a large physical matrix of those operators, known as a systolic array architecture, so intermediate results pass from one operator to the next without going back to memory. Cloud TPU v2 has two 128×128 systolic arrays, aggregating 32,768 ALUs for 16-bit floating-point values in a single processor.
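The ALU count quoted above follows directly from the array dimensions. A quick check in Python (the constant names are ours, chosen for illustration):

```python
# Two 128x128 systolic arrays per Cloud TPU v2 processor, as stated in the text.
ARRAYS_PER_PROCESSOR = 2
ARRAY_ROWS = 128
ARRAY_COLS = 128

alus_per_array = ARRAY_ROWS * ARRAY_COLS
total_alus = ARRAYS_PER_PROCESSOR * alus_per_array

print("ALUs per array:", alus_per_array)
print("ALUs per processor:", total_alus)
```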
A systolic array executes neural network calculations in these 3 steps:
- First, all parameters are loaded into the matrix of multipliers and adders.
- After that, the data is loaded from memory.
- Then, as each multiplication is executed, its result is passed to the next multiplier and added to the running sum at the same time. The output is the sum of all the products of data and parameters.
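The three steps can be sketched in plain Python. This is a simplified model under our own assumptions, not Google's implementation: it keeps each parameter fixed in one cell (a "weight-stationary" layout) and follows the partial sum from cell to cell, but it ignores the cycle-by-cycle timing and parallelism of real hardware. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(data, weights):
    """Multiply data (M x K) by weights (K x N), following the three steps above."""
    k_dim, n_dim = len(weights), len(weights[0])

    # Step 1: load the parameters; cell (k, n) holds weights[k][n] and keeps it.
    cells = [row[:] for row in weights]

    outputs = []
    # Step 2: stream each row of data in from memory.
    for x in data:
        out_row = []
        for n in range(n_dim):
            partial_sum = 0.0
            # Step 3: each cell multiplies its input by its stored parameter,
            # adds the product to the incoming partial sum, and passes it on.
            for k in range(k_dim):
                partial_sum = partial_sum + x[k] * cells[k][n]
            out_row.append(partial_sum)
        outputs.append(out_row)
    return outputs

data = [[1.0, 2.0], [3.0, 4.0]]
weights = [[5.0, 6.0], [7.0, 8.0]]
result = systolic_matmul(data, weights)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```

In hardware, all cells work at the same time and the partial sums never leave the array until the final result is ready, which is where the savings in memory access come from.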
To keep this article simple, we have not discussed how the TPU works in detail here. Instead, we have written a separate article that compares the TPU, CPU, and GPU and explains the TPU's operation in depth. You can read it here: “An In-Depth Exploration On TPU Working”.
How much does a TPU cost?
A Cloud TPU is not something that you can buy and plug into your computer as you would a GPU. It can be used only through Google Cloud Platform (GCP) or through Google's notebooks (Kaggle and Colab). Using a TPU is not very expensive; you can start for just a few dollars.
The pricing here is for the TPU service in the US; if you use the preemptible option, it will be a lot cheaper. Preemptible is a Google service that lets you use spare VM instances that are not currently in use. A preemptible VM automatically stops after 24 hours, and Google can stop it earlier if it needs that capacity back. You should not use the preemptible service if your application is not fault tolerant or cannot run as batch processing.
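What "fault tolerant" means in practice is that a job can be stopped at any moment and later pick up where it left off. Here is a minimal sketch of that idea in plain Python. It is not tied to any Google Cloud API; the function `run_batch`, the `stop_after` parameter (which simulates a preemption), and the checkpoint file name are all hypothetical.

```python
import json
import os
import tempfile

def run_batch(items, checkpoint_path, stop_after=None):
    """Sum the items one by one, saving progress so a stopped run can resume."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # A previous run was interrupted: continue from its saved progress.
        with open(checkpoint_path) as f:
            state = json.load(f)

    processed = 0
    while state["next_index"] < len(items):
        if stop_after is not None and processed >= stop_after:
            return None  # simulated preemption: the VM is taken away
        state["total"] += items[state["next_index"]]
        state["next_index"] += 1
        processed += 1
        # Save progress after every item so little work is lost on a stop.
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["total"]

checkpoint = os.path.join(tempfile.mkdtemp(), "progress.json")
items = [1, 2, 3, 4, 5]
first = run_batch(items, checkpoint, stop_after=2)  # interrupted after 2 items
second = run_batch(items, checkpoint)               # resumes and finishes
print(first, second)  # None 15
```

A job written this way loses at most one item of work when the VM is stopped, which is the kind of workload the preemptible option suits.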
If you need even more computation power, you can use a TPU Pod. There are two pods: one is a set of 512 TPU v2 cores with a peak performance of 11.5 petaflops (flops: floating-point operations per second), and the other is a set of 2048 TPU v3 cores with a peak performance of 100+ petaflops. Depending on your need, you can choose the service. You can see the prices of the different Cloud TPU pods.
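To put the pod figures in perspective, we can divide the peak performance by the number of cores. This is back-of-the-envelope arithmetic on the numbers quoted above, not an official per-core specification, and the v3 value is only a lower bound because the pod is quoted as "100+" petaflops.

```python
TERAFLOPS_PER_PETAFLOP = 1000

# Figures quoted in the text above.
v2_pod_petaflops, v2_pod_cores = 11.5, 512
v3_pod_petaflops, v3_pod_cores = 100.0, 2048  # "100+", so treat as a minimum

v2_per_core_tflops = v2_pod_petaflops * TERAFLOPS_PER_PETAFLOP / v2_pod_cores
v3_per_core_tflops = v3_pod_petaflops * TERAFLOPS_PER_PETAFLOP / v3_pod_cores

print(f"TPU v2: about {v2_per_core_tflops:.2f} teraflops per core")
print(f"TPU v3: at least {v3_per_core_tflops:.2f} teraflops per core")
```

By this estimate, a v3 core delivers roughly twice the peak throughput of a v2 core, on top of the pod having four times as many cores.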
In general, most of us do not need a TPU; it is required only when you have a really massive amount of data and need very high computation power. Also, if you need predictions with very high precision (as in medical cases), a TPU may not be ideal. The first-generation TPU works on an 8-bit architecture: it compresses 32-bit or 16-bit floating-point values to 8-bit integers using quantization, while later Cloud TPUs also compute on 16-bit floating-point values, which is still less precise than 32-bit. Generally, the loss of accuracy is not significant and is acceptable in many cases, but in a few domains, such as medical and scientific research, it may not be.
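To see where the loss of accuracy comes from, here is a minimal sketch of quantization in plain Python. It uses simple symmetric quantization with one scale for the whole list, which is our assumption for illustration; the scheme actually used by TPU tooling may differ, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("restored: ", restored)
print("largest rounding error:", max_error)
```

Each value can be off by up to half of `scale` after the round trip. Here the small weight 0.003 is rounded all the way to 0, which is the kind of detail that may matter in precision-sensitive domains.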
To sum up, the TPU (Tensor Processing Unit) is an ASIC (application-specific integrated circuit) developed by Google. It is built only for deep learning and machine learning tasks, so it runs only models built with frameworks that target it, such as TensorFlow. Training a model on a TPU can be cheaper and can consume less power and time than on a GPU or CPU. A TPU should be used only if you have a very large dataset or need very high and fast computation power.