**Introduction**

In a neural network, each neuron computes a weighted sum of its inputs, and an activation function is applied to that sum to introduce non-linearity into the neuron's output. The activation function determines how strongly a neuron is activated for a given input. This non-linear transformation is what lets networks perform well on complex data such as audio, video, and text. Let's look at what activation functions are, the main types, why they are used, how to choose one, and their benefits and drawbacks!

**Overview**

- What are Activation Functions in Neural Networks?
- Linear vs Non-linear Activation Functions
- Types of Activation Functions in Neural Networks
- Advantages and Disadvantages of Non-linear Activation Functions
- Why are Activation Functions used in Neural Networks?
- How to choose Activation Functions in Neural Networks?
- Conclusion

## What are Activation Functions in Neural Networks?

Activation functions, also known as transfer functions, are mathematical functions that determine the output of each neuron in a neural network. Some of them, such as sigmoid and tanh, also squash the output of a neuron into a fixed range (0 to 1, or -1 to 1). Activation functions introduce non-linearity into the network, which lets it learn complex patterns in data and make accurate predictions. During training, the weights and biases of each neuron are updated based on the error of the output. The activation functions used in practice are differentiable (at least almost everywhere), which is what makes backpropagation possible.
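
As a minimal sketch of how a single neuron combines a weighted sum with an activation function, consider the following. The inputs, weights, and bias are made-up values, and the names `neuron`, `x`, `w`, and `b` are illustrative only, not taken from any library.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range 0 to 1
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation
    z = np.dot(w, x) + b
    return activation(z)

# Hypothetical inputs, weights, and bias chosen only for illustration
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

out = neuron(x, w, b, sigmoid)
print(out)  # the weighted sum is -0.4, so this prints sigmoid(-0.4), about 0.401
```

Swapping `sigmoid` for another function changes how the same weighted sum is turned into the neuron's output.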

## Linear vs Non-linear Activation Functions:

| Linear Activation Functions | Non-linear Activation Functions |
| --- | --- |
| The derivative is constant, so the gradient used in backpropagation carries no information about the input. | They are differentiable with a derivative that depends on the input, so the network can learn a relation between input and output when the data is non-linear. |
| They are suitable only for linear models such as linear regression. | They are non-linear, and most of the common ones are monotonic. |
| They do not allow a complex mapping between input and output. | They can build complex mappings between input and output. |
| Any number of stacked layers with linear activations collapses into a single layer. | Layers of neurons can be stacked, and each layer adds to what the network can represent. |
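
The last row of the comparison above can be checked numerically. The sketch below assumes two small, randomly initialised layers (the shapes and the seed are arbitrary choices) and shows that two linear layers give the same output as one layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Two hypothetical layers with a linear (identity) activation
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Forward pass through both linear layers
two_layers = W2 @ (W1 @ x + b1) + b2

# Equivalent single layer: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True

# With a ReLU between the layers, the two-layer network is no longer
# a single linear map, so it generally differs from one_layer
relu_layers = W2 @ np.maximum(0, W1 @ x + b1) + b2
print(np.allclose(relu_layers, one_layer))
```

The first check holds for any weights, because composing two affine maps always gives another affine map.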


## Types of Activation Functions in Neural Networks:

There are three types of activation functions in neural networks:

- Step Activation Functions
- Linear Activation Functions
- Non-linear Activation Functions

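## Step Activation Function:

The step activation function is mainly used in binary classification. It activates or deactivates the neuron depending on whether the input crosses a certain threshold. In the code below the threshold is 0: the output is 1 for inputs at or above 0 and 0 otherwise.

###### Mathematically, it can be explained using Python code: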

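```
import numpy as np
import matplotlib.pyplot as plt

def binaryStep(x):
    # 0 for x < 0, 1 for x > 0; the second argument sets the value at x == 0
    return np.heaviside(x, 1)

x = np.linspace(-10, 10)
plt.plot(x, binaryStep(x))
plt.axis('tight')
plt.title('Activation Function: binaryStep')
plt.show()
```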

## Linear Activation Function:

A linear activation is a straight line, i.e. y = mx, so the activation is proportional to the input. Its derivative is constant, which means the gradient used in backpropagation does not depend on the input. Adding more layers to the network does not help either: a stack of linear layers is still equivalent to a single layer.

Mathematical explanation using Python code:

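```
import numpy as np
import matplotlib.pyplot as plt

def linear(x):
    # Identity activation: the output equals the input (slope m = 1)
    return x

x = np.linspace(-10, 10)
plt.figure(figsize=(12, 5))
plt.plot(x, linear(x))
plt.axis('tight')
plt.title('Activation Function: Linear')
plt.grid()
plt.legend(['Linear'])
plt.show()
```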

## Non-linear Activation Functions:

There are many non-linear activation functions for neural networks. The most widely used ones are explained here.

### Sigmoid:

The sigmoid activation function is one of the most widely used activation functions in neural networks. It is also known as the logistic function, and it maps any input into the range 0 to 1. Keeping the output, or predicted value, within a fixed range like this can help the efficiency and accuracy of the model.

###### Mathematical explanation using Python code:

```
import numpy as np
import matplotlib.pyplot as plt

# In a Jupyter notebook, add `%matplotlib inline` to render plots in the page
x = np.arange(-5, 15, 0.01)
y = 1 / (1 + np.exp(-x))  # sigmoid: 1 / (1 + e^(-x))
plt.figure(figsize=(12, 5))
plt.plot(x, y)
plt.title("Sigmoid Activation Function")
plt.grid()
plt.legend(['Sigmoid Activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
```

### Tanh:

Tanh (hyperbolic tangent) is very similar to the sigmoid function. It is often preferred over sigmoid because its output ranges from -1 to 1 and is centred on zero.

###### Mathematical explanation using Python code:

```
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.01)
y = (2 / (1 + np.exp(-2 * x))) - 1  # tanh written in terms of the sigmoid
plt.figure(figsize=(12, 5))
plt.plot(x, y)
plt.title('Tanh Activation Function')
plt.grid()
plt.legend(['Tanh activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
```
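
The expression used for `y` in the tanh code is the identity tanh(x) = 2 * sigmoid(2x) - 1. As a quick check, the sketch below compares it against NumPy's built-in `np.tanh`. The helper name `sigmoid` and the sample grid are choices made here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)

# tanh expressed through the sigmoid: stretch the output to (-1, 1)
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0

print(np.allclose(tanh_from_sigmoid, np.tanh(x)))  # True
```

This is why the two curves have the same S shape: tanh is a rescaled and shifted sigmoid.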

### Rectified Linear Unit (ReLU) Activation Function:

ReLU (Rectified Linear Unit) returns the input unchanged when the input is positive and returns zero otherwise. For negative inputs the gradient is also zero, so no error signal flows back through the neuron during backpropagation, and a neuron that only receives negative inputs stops learning.

###### Mathematical explanation using Python code:

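```
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.01)
z = np.zeros(len(x))
y = np.maximum(z, x)  # element-wise max(0, x)
plt.plot(x, y)
plt.title('ReLU Activation Function')
plt.grid()
plt.legend(['ReLU activation'])  # a list, so the label is not split into characters
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
```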

### Leaky ReLU Activation Function:

Leaky ReLU is similar to ReLU, but it gives negative inputs a small non-zero slope, which addresses the dying neuron problem in ReLU.

###### Mathematical explanation of Leaky ReLU using Python code:

```
import numpy as np
import matplotlib.pyplot as plt

# Leaky Rectified Linear Unit (leaky ReLU) Activation Function
def leaky_ReLU(x):
    # Positive values pass through; negative values are scaled by 0.05
    data = [max(0.05 * value, value) for value in x]
    return np.array(data, dtype=float)

# Derivative for leaky ReLU
def der_leaky_ReLU(x):
    # Slope is 1 for positive inputs and 0.05 otherwise
    data = [1 if value > 0 else 0.05 for value in x]
    return np.array(data, dtype=float)

# Generating Visual of leaky ReLU
x_data = np.linspace(-10, 10, 100)
y_data = leaky_ReLU(x_data)
dy_data = der_leaky_ReLU(x_data)

# Graph
plt.figure(figsize=(12, 5))
plt.plot(x_data, y_data)
plt.title('leaky ReLU')
plt.legend(['leaky_ReLU'])
plt.grid()
plt.show()
```

### Softplus Activation Function:

Softplus is a smooth approximation of ReLU, defined as log(1 + e^x).

###### Mathematical explanation using Python code:

```
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.01)
y = np.log(1 + np.exp(x))  # softplus: log(1 + e^x)
plt.plot(x, y)
plt.title('Softplus Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
```
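
To see how closely softplus tracks ReLU, the sketch below measures the gap between the two at a few sample points. It uses `np.logaddexp(0, x)`, which equals log(1 + e^x) but avoids overflow for large inputs. That substitution is a choice made here, not part of the code above, and the sample values are arbitrary.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) computed as logaddexp(0, x) to avoid overflow for large x
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-20.0, -1.0, 0.0, 1.0, 20.0, 1000.0])
gap = softplus(x) - relu(x)

# The gap is largest at x = 0, where it equals log(2),
# and shrinks towards 0 as |x| grows
print(gap)
```

Away from zero the two functions are nearly identical; the difference is that softplus has a smooth curve where ReLU has a corner.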

## Advantages and Disadvantages of Activation Functions in Neural Networks:

### Sigmoid:

| Advantages of Sigmoid | Disadvantages of Sigmoid |
| --- | --- |
| It introduces non-linearity into the network, so it can be used to activate hidden layers. | Vanishing gradient problem: during backpropagation, the gradient becomes very close to zero as it is propagated back through the layers. It is also computationally expensive. |
| Its output ranges from 0 to 1, so it can be interpreted as a probability for prediction. | When the gradient reaches zero, the network stops learning. |
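
The vanishing gradient problem follows from the sigmoid's derivative, sigmoid(x) * (1 - sigmoid(x)), which never exceeds 0.25. The sketch below is a simplified illustration: it ignores the weights, which also enter the backpropagated product, and only shows how quickly a product of such derivatives shrinks. The function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest value the derivative can take
print(sigmoid_grad(10.0))  # roughly 4.5e-05: the saturated region

# Upper bound on the product of sigmoid derivatives across n layers
for n in (1, 5, 10):
    print(n, 0.25 ** n)
```

After ten layers the bound is already below one in a million, which is why early layers in deep sigmoid networks receive very small updates.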

### Tanh:

| Advantages of Tanh | Disadvantages of Tanh |
| --- | --- |
| It also produces negative values, with an output range from -1 to 1. | It suffers from the same vanishing gradient problem as sigmoid. |
| It is zero-centred, which is why it is often preferred over sigmoid. | It is computationally expensive because of its exponential nature. |

### ReLU:

| Advantages of ReLU | Disadvantages of ReLU |
| --- | --- |
| Negative values are converted to zero. | It is non-differentiable at zero and unbounded above. |
| The vanishing gradient problem is reduced because the function does not saturate for positive inputs: the output grows without bound and the gradient stays at 1, which tends to give better prediction and accuracy. | It is mostly used only in the hidden layers of a neural network. |
| Computation time is low. | |
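
The gradient behaviour behind these points can be sketched directly. The code below compares the derivative of ReLU with that of leaky ReLU, reusing the 0.05 slope from the leaky ReLU code earlier in this article. The value of the derivative at exactly 0 is a convention, and 0 (or the leaky slope) is assumed here.

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_grad(x, alpha=0.05):
    # Same as ReLU for positive inputs, but a small slope alpha elsewhere
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

print(relu_grad(x))        # [0. 0. 0. 1. 1.]
print(leaky_relu_grad(x))  # 0.05 for the first three entries, 1 for the rest
```

For positive inputs the gradient is exactly 1, so it does not shrink as it passes through the neuron. For negative inputs plain ReLU passes back nothing at all, while leaky ReLU still passes back a small signal.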


## Why are activation functions used in neural networks?

Activation functions introduce non-linearity into multi-layer networks so that they can detect non-linear features in the data. Without non-linearity, a neural network is restricted to learning linear functions only.

## How to choose an activation function for your neural network?

In a neural network, the same type of activation function is typically used in all hidden layers. The most commonly used activation functions for hidden layers are ReLU, sigmoid, and tanh. ReLU is the most popular of these because it reduces the vanishing gradient problem. Beyond that, you can follow common practice when choosing your activation function.
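
As a rough sketch of that convention, the forward pass below applies one shared activation to every hidden layer and a separate one to the output layer. The layer sizes, the random weights, the `forward` helper, and the choice of a sigmoid output (as one might use for binary classification) are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers, hidden_activation=relu, output_activation=sigmoid):
    # layers is a list of (W, b) pairs; every hidden layer shares one activation
    a = x
    for W, b in layers[:-1]:
        a = hidden_activation(W @ a + b)
    W, b = layers[-1]
    return output_activation(W @ a + b)

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8)),  # hidden layer 1
    (rng.normal(size=(8, 8)), np.zeros(8)),  # hidden layer 2
    (rng.normal(size=(1, 8)), np.zeros(1)),  # output layer
]
x = rng.normal(size=4)

y = forward(x, layers)
print(y)  # a single value in the range 0 to 1
```

Passing `hidden_activation=np.tanh` instead would switch every hidden layer at once, which makes it easy to compare choices on the same network.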

## Conclusion:

After this discussion, you should have a better understanding of what activation functions are and of their key properties: they introduce non-linearity, they are differentiable (almost everywhere), and they differ in the range of values they produce. We explained the non-linear activation functions using Python code. If any of the math is off, if you would like more detail on a topic, or if you have any doubts, feel free to ask your questions in the comment section.

Thank you so much for reading this blog!

Data Scientist with 3+ years of experience in building data-intensive applications in diverse industries. Proficient in predictive modeling, computer vision, natural language processing, data visualization etc. Aside from being a data scientist, I am also a blogger and photographer.