Introduction
In neural networks, each neuron computes a weighted sum of its inputs, and an activation function is applied to that sum to bring nonlinearity into the neuron's output. The activation function lets the network activate or deactivate a neuron depending on its input. Because of this nonlinear transformation, neural networks perform better on complicated data such as audio, video, and text. Let's look at what activation functions are, their types, why they are used, which one to choose, and their benefits and drawbacks!
Overview
 What are Activation Functions in Neural Networks?
 Linear vs Nonlinear Activation Functions
 Types of Activation Functions in Neural Networks
 Advantages and Disadvantages of Nonlinear Activation Functions
 Why Activation Functions are Used in Neural Networks?
 How to Choose Activation Functions in Neural Networks?
 Conclusion
What are Activation Functions in Neural Networks?
Activation functions, also known as transfer functions, are mathematical functions that determine the output of each neuron in a neural network. Some of them also normalize the neuron's output to a fixed range, such as 0 to 1 or -1 to 1. Activation functions introduce nonlinearity into neural networks, which helps the network learn complex patterns in the data and make accurate predictions. During training, the weights and biases of each neuron are updated based on the error of the output. Most activation functions are differentiable (at least almost everywhere), which is what makes backpropagation possible.
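As a minimal sketch of the idea above, the snippet below computes one neuron's output: a weighted sum of the inputs plus a bias, passed through an activation function. The names `neuron_output`, `inputs`, `weights`, and `bias`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(weights, inputs) + bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

z = np.dot(weights, inputs) + bias                  # pre-activation value
a = neuron_output(inputs, weights, bias, sigmoid)   # value after the activation
print(f"pre-activation z = {z:.3f}, activated output a = {a:.3f}")
```

Whatever the pre-activation value is, the sigmoid keeps the final output strictly between 0 and 1.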
Linear vs Nonlinear Activation Functions:
| Linear Activation Functions | Nonlinear Activation Functions |
| --- | --- |
| The derivative is a constant that does not depend on the input, so backpropagation learns nothing about how the input relates to the output. | They are differentiable and capture the relation between input and output when the data is nonlinear. |
| They are suited only to linear models such as linear regression. | They are nonlinear, and the common ones are monotonic. |
| They do not allow complex mappings between input and output. | They can build complex mapping relationships. |
| Any number of layers with linear activations collapses into a single layer. | Multiple layers of neurons can be stacked usefully. |
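The claim that stacked linear layers collapse into a single layer can be checked numerically. The sketch below uses two small hypothetical weight matrices (`W1`, `W2`) with no bias. It compares the stacked layers with one merged layer, then repeats the comparison with a ReLU between the layers.

```python
import numpy as np

# Hypothetical weights and input, chosen only for illustration (biases omitted).
W1 = np.array([[1.0, 0.0],
               [0.0, -1.0]])   # layer 1: 2 inputs -> 2 units
W2 = np.array([[1.0, 1.0]])    # layer 2: 2 units -> 1 output
x = np.array([1.0, 2.0])

stacked_linear = W2 @ (W1 @ x)   # two layers with a linear (identity) activation
merged = (W2 @ W1) @ x           # one layer whose weights are the product W2 @ W1
print(np.allclose(stacked_linear, merged))

# With a nonlinearity (ReLU) between the layers, the merge no longer holds.
stacked_relu = W2 @ np.maximum(0.0, W1 @ x)
print(np.allclose(stacked_relu, merged))
```

The first comparison holds because matrix multiplication is associative. The second fails because ReLU zeroes the negative hidden value before the second layer sees it.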
Types of Activation Functions in Neural Networks:
There are three types of activation functions in neural networks:
 Step Activation Functions
 Linear Activation Functions
 Nonlinear Activation Functions
Step Activation Function:
This activation function is mainly used in binary classification. It activates or deactivates the neuron depending on whether the input crosses a certain threshold.
It can be illustrated using Python code:
import numpy as np
import matplotlib.pyplot as plt

def binaryStep(x):
    # Returns 0 for x < 0 and 1 for x > 0; the second argument is the value used at x == 0
    return np.heaviside(x, 1)

x = np.linspace(-10, 10)
plt.plot(x, binaryStep(x))
plt.axis('tight')
plt.title('Activation Function: binaryStep')
plt.show()
Linear Activation Function:
A linear activation is a straight line, i.e., Y = MX, where the activation is proportional to the input. Its derivative is constant, so the gradient used in backpropagation carries no information about the input. If we add more layers with linear activations to the network, the whole network is still equivalent to a single layer.
Illustrated using Python code:
def linear(x):
    # Identity function: the output equals the input
    return x

x = np.linspace(-10, 10)
plt.figure(figsize=(12,5))
plt.plot(x, linear(x))
plt.axis('tight')
plt.title('Activation Function: Linear')
plt.grid()
plt.legend(['Linear'])
plt.show()
Nonlinear Activation Functions:
Neural networks use several nonlinear activation functions. Here we explain the most commonly used ones.
Sigmoid:
The sigmoid activation function is one of the most widely used activation functions in neural networks. It is also known as the logistic function, and it maps any input into the range 0 to 1. Its main purpose is to keep the output or predicted value within this fixed range, which is convenient when the output is interpreted as a probability.
Illustrated using Python code:
import numpy as np
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove the next line when running as a plain script
%matplotlib inline

# Range centered on zero so that both tails of the S-shaped curve are visible
x = np.arange(-5, 5, 0.01)
y = 1 / (1 + np.exp(-x))
plt.figure(figsize=(12,5))
plt.plot(x, y)
plt.title("Sigmoid Activation Function")
plt.grid()
plt.legend(['Sigmoid Activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
Hyperbolic Tangent (Tanh):
Tanh is very similar to the sigmoid function. It is often preferred over sigmoid because its output is zero-centered, ranging from -1 to 1.
Illustrated using Python code:
x = np.arange(-5, 5, 0.01)
# tanh written as a rescaled sigmoid: 2 / (1 + e^(-2x)) - 1
y = (2 / (1 + np.exp(-2*x))) - 1
plt.figure(figsize=(12,5))
plt.plot(x, y)
plt.title('Tanh Activation Function')
plt.grid()
plt.legend(['Tanh activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
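Tanh can be written as a rescaled and shifted sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, which is why the two curves look so alike. As a quick sanity check, the sketch below compares that expression with NumPy's built-in `np.tanh`. The helper name `sigmoid` is defined here only for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)

# tanh expressed through the sigmoid
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0

# Largest absolute difference from NumPy's own tanh over the sampled range
max_diff = np.max(np.abs(tanh_from_sigmoid - np.tanh(x)))
print(max_diff)
print(tanh_from_sigmoid.min(), tanh_from_sigmoid.max())
```

The difference stays at the level of floating-point rounding, and the outputs remain inside the range -1 to 1.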
Rectified Linear Unit (ReLU) Activation Function:
ReLU (Rectified Linear Unit) outputs the input unchanged when the input is greater than zero, and outputs zero otherwise. For negative inputs the gradient is also zero, so the weights feeding such a neuron receive no update during backpropagation.
Illustrated using Python code:
x = np.arange(-5, 5, 0.01)
z = np.zeros(len(x))
# Element-wise maximum of 0 and x
y = np.maximum(z, x)
plt.plot(x, y)
plt.title('ReLU Activation Function')
plt.grid()
# Pass the label as a list; a bare string would be split into single characters
plt.legend(['ReLU activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
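To make the gradient behaviour of ReLU concrete, here is a small sketch of the function and its derivative. The helper names `relu` and `relu_grad` are introduced only for this example, and treating the derivative at exactly zero as 0 is an assumed convention, since ReLU is not differentiable at that point.

```python
import numpy as np

def relu(x):
    """ReLU: element-wise maximum of 0 and x."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    ReLU is not differentiable at exactly 0; using 0 there is an
    assumed convention for this sketch.
    """
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # negative inputs are clipped to zero
print(relu_grad(x))   # zero gradient wherever the input is not positive
```

A neuron whose inputs stay negative therefore passes back a zero gradient and stops updating.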
Leaky ReLU Activation Function:
It is similar to ReLU, but it applies a small slope to negative inputs instead of setting them to zero, which fixes the dying neuron problem in ReLU.
Illustration of leaky ReLU using Python code:
import numpy as np
import matplotlib.pyplot as plt

# Leaky Rectified Linear Unit (leaky ReLU) Activation Function
def leaky_ReLU(x):
    # Keep positive values as they are and scale negative values by 0.05
    data = [max(0.05*value, value) for value in x]
    return np.array(data, dtype=float)

# Derivative for leaky ReLU
def der_leaky_ReLU(x):
    # Slope is 1 for positive inputs and 0.05 otherwise
    data = [1 if value > 0 else 0.05 for value in x]
    return np.array(data, dtype=float)

# Generating Visual of leaky ReLU
x_data = np.linspace(-10, 10, 100)
y_data = leaky_ReLU(x_data)
dy_data = der_leaky_ReLU(x_data)

# Graph
plt.figure(figsize=(12,5))
plt.plot(x_data, y_data)
plt.title('leaky ReLU')
plt.legend(['leaky_ReLU'])
plt.grid()
plt.show()
Softplus Activation Function:
Softplus is a smooth approximation of ReLU: it behaves almost like ReLU away from zero but has no sharp corner at zero.
Illustrated using Python code:
x = np.arange(-5, 5, 0.01)
# softplus(x) = log(1 + e^x)
y = np.log(1 + np.exp(x))
plt.plot(x, y)
plt.title('Softplus Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()
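The direct formula log(1 + e^x) overflows for large inputs. A minimal sketch of a more robust variant, assuming NumPy's `np.logaddexp`, is shown below. The names `softplus_naive` and `softplus_stable` are introduced only for this example.

```python
import numpy as np

def softplus_naive(x):
    """Direct formula log(1 + e^x); overflows for large x."""
    return np.log(1.0 + np.exp(x))

def softplus_stable(x):
    """Same function, computed as log(e^0 + e^x) without overflow."""
    return np.logaddexp(0.0, x)

x = np.array([-5.0, 0.0, 5.0])
print(np.allclose(softplus_naive(x), softplus_stable(x)))

# For large inputs softplus is practically equal to ReLU, i.e. to x itself.
large_value = softplus_stable(np.array([1000.0]))[0]
print(large_value)

# At zero, softplus gives log(2) where ReLU gives 0.
value_at_zero = softplus_stable(np.array([0.0]))[0]
print(value_at_zero)
```

Both versions agree on moderate inputs, and the stable version keeps working where the direct formula would overflow.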
Advantages and Disadvantages of Activation Functions in Neural Networks:
Sigmoid:
| Advantages of Sigmoid | Disadvantages of Sigmoid |
| --- | --- |
| It introduces nonlinearity into the network, so hidden layers can learn nonlinear features. | Vanishing-gradient problem: during backpropagation, as the error moves back through the layers, the gradient becomes very close to zero. It is also computationally expensive. |
| Its output ranges from 0 to 1, so it can be read as a probability for prediction. | When the gradient reaches zero, the network stops learning. |
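The vanishing-gradient problem follows from the sigmoid's derivative, sigmoid(x) * (1 - sigmoid(x)), which never exceeds 0.25. The sketch below evaluates it at a few points and shows how quickly a product of such factors shrinks with depth. It is a rough upper bound that ignores the weights, and the helper names are introduced only for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

grad_at_zero = sigmoid_grad(0.0)     # the largest value the derivative takes
grad_saturated = sigmoid_grad(10.0)  # far from zero the sigmoid is almost flat
print(grad_at_zero, grad_saturated)

# Upper bound on the factor contributed by the activations alone
# when the gradient passes back through n sigmoid layers.
for n_layers in (1, 5, 10):
    print(n_layers, grad_at_zero ** n_layers)
```

Even in the best case, ten sigmoid layers scale the gradient by less than one millionth, which is why early layers learn very slowly.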
Tanh:
| Advantages of Tanh | Disadvantages of Tanh |
| --- | --- |
| It also produces negative values, down to a minimum of -1. | It has the same vanishing-gradient problem as sigmoid. |
| It is a zero-centered activation function, which is often an advantage over sigmoid. | It is computationally expensive because of its exponential terms. |
ReLU:
| Advantages of ReLU | Disadvantages of ReLU |
| --- | --- |
| Negative values are converted to zero. | It is non-differentiable at zero and unbounded above. |
| The vanishing-gradient problem is reduced because the function does not saturate for positive inputs, which often gives better prediction and accuracy. | It is mostly used only in the hidden layers of a neural network. |
| Computation time is low. | |
Why Activation Functions are Used in Neural Networks?
Activation functions introduce nonlinearity into multilayer networks so that they can detect nonlinear features in the data. Without nonlinearity, a neural network is restricted to learning linear functions.
How to Choose Activation Functions for Your Neural Networks?
In a neural network, the same type of activation function is usually used in all hidden layers. The most commonly used activation functions for hidden layers are ReLU, sigmoid, and tanh. ReLU is the most popular of these because it reduces the vanishing-gradient problem. Beyond that, following common practice for your type of problem is a reasonable starting point.
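As a hedged sketch of this advice, the forward pass below uses the same activation in every hidden layer and lets you swap it through a parameter. The sigmoid on the output layer assumes a binary-classification setting. The layer sizes, the random weights, and the names `forward` and `params` are all hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, params, hidden_activation=relu, output_activation=sigmoid):
    """Forward pass using one activation for all hidden layers."""
    a = x
    for W, b in params[:-1]:
        a = hidden_activation(W @ a + b)
    W_out, b_out = params[-1]
    return output_activation(W_out @ a + b_out)

# Hypothetical network: 3 inputs -> 4 -> 4 -> 1 output, random untrained weights.
rng = np.random.default_rng(42)
layer_sizes = [3, 4, 4, 1]
params = [
    (rng.normal(size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = np.array([0.2, -0.7, 1.5])
p_relu = forward(x, params)                             # ReLU hidden layers
p_tanh = forward(x, params, hidden_activation=np.tanh)  # tanh hidden layers
print(p_relu, p_tanh)
```

Keeping the hidden activation as a parameter makes it easy to try ReLU, tanh, or sigmoid on the same network and compare the results.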
Conclusion:
After this discussion, you should have a better understanding of what activation functions are and of their properties: they introduce nonlinearity, they are mostly continuously differentiable, and they map inputs to a particular range. We explained nonlinear activation functions using Python code. If any math is off, if a topic needs more detail, or if you have any doubts, feel free to ask your questions in the comment section.
Thank you so much for reading this blog!
Data Scientist with 3+ years of experience in building dataintensive applications in diverse industries. Proficient in predictive modeling, computer vision, natural language processing, data visualization etc. Aside from being a data scientist, I am also a blogger and photographer.

Aman Kumar, https://buggyprogrammer.com/author/buggy5454/
