Understanding Activation Functions in Neural Networks: How to Choose the Best One

Introduction

In a neural network, each neuron computes a weighted sum of its inputs, and an activation function is applied to that sum to bring non-linearity into the neuron’s output. It determines whether, and how strongly, a neuron is activated for a given input. This non-linear transformation is what lets networks perform well on complicated data such as audio, video, and text. Let’s look at what activation functions are, their types, why they are used, how to choose one, and their benefits and drawbacks!

Overview

What are Activation Functions in Neural Networks?
Linear vs Non-linear Activation Functions
Types of Activation Functions in Neural Networks
Advantages and Disadvantages of Non-linear Activation Functions
Why are Activation Functions Used in Neural Networks?
How to Choose Activation Functions in Neural Networks?
Conclusion

What are Activation Functions in Neural Networks?

Activation functions, also known as transfer functions, are mathematical functions that determine the output of each neuron, and therefore of the neural network. Bounded activation functions also normalize a neuron’s output to a fixed range, such as 0 to 1 (sigmoid) or -1 to 1 (tanh). Activation functions introduce non-linearity into neural networks, which helps the network learn complex patterns in data and make accurate predictions. During training, the weights and biases of each neuron are updated based on the error of the output. This update relies on gradients, so the activation functions used in practice are differentiable (at least almost everywhere), which is what makes backpropagation possible.

Figure: Activation function in neural networks
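To make the description above concrete, here is a minimal sketch of a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The function names (`sigmoid`, `neuron_output`) and the input, weight, and bias values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(weights, inputs) + bias  # pre-activation: a plain linear combination
    return activation(z)

# Hypothetical example values, chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

output = neuron_output(inputs, weights, bias)
print(output)  # a value between 0 and 1
```

Swapping the `activation` argument for another function (for example `np.tanh`) changes the output range without touching the weighted-sum step.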

Linear vs Non-linear Activation Functions:

Linear Activation Functions	Non-linear Activation Functions
The derivative is constant and does not depend on the input, so backpropagation gets no input-dependent gradient signal from the activation.	The derivative depends on the input, so backpropagation can capture the relation between input and output when the data is non-linear.
Suited only to linear models such as linear regression.	Non-linear, and often monotonic (sigmoid, tanh, and ReLU all are).
It doesn’t allow complex mapping between input and output.	It can build complex mapping relationships.
A stack of layers with linear activations collapses into a single layer.	Stacking multiple layers of neurons adds representational power.

Linear Vs Non-linear activation function
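The claim that stacked linear layers collapse into a single layer can be checked numerically. The sketch below uses small hand-picked weight matrices (hypothetical values, not from any trained model) and compares two stacked linear layers against the equivalent single layer, then shows that putting a ReLU between the layers breaks the equivalence.

```python
import numpy as np

# Hypothetical weights and biases for a 2 -> 2 -> 1 network
W1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.5])
x = np.array([1.0, 2.0])

# Two stacked layers with a linear (identity) activation
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping written as one layer
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layer, one_layer))  # True: the extra layer added nothing

# With a non-linearity (ReLU) between the layers, the result differs
relu_two_layer = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(two_layer, relu_two_layer)
```

For this particular input the first hidden unit has a negative pre-activation, so ReLU zeroes it and the output changes. No single weight matrix reproduces that behavior for all inputs.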

If you are wondering why we need non-linearity in a neural network, the video below explains it.

Why non-linearity is important in neural network

Types Of Activation Functions In Neural Networks:

There are three types of activation functions in neural networks:

Step Activation Functions
Linear Activation Functions
Non-linear activation Functions

Step Activation Function:

The step function activates or deactivates a neuron based on a threshold: the output is 1 when the input reaches the threshold and 0 otherwise. It is mainly used in simple binary classifiers.

It can be expressed in Python code as:

import numpy as np
import matplotlib.pyplot as plt

def binaryStep(x):
    # Heaviside step: 0 for x < 0, 1 for x > 0; the second argument is the value at x == 0
    return np.heaviside(x, 1)

x = np.linspace(-10, 10)
plt.plot(x, binaryStep(x))
plt.axis('tight')
plt.title('Activation Function: binaryStep')
plt.show()

Figure: Step activation function

Linear activation Function:

A linear activation is a straight line, i.e. Y = MX, where the activation is proportional to the input. Its derivative is constant and carries no information about the input, so backpropagation cannot learn anything non-linear through it. If we add more layers to the neural network, the result is still equivalent to a single layer.

Expressed in Python code:

import numpy as np
import matplotlib.pyplot as plt

def linear(x):
    # Identity activation: the output equals the input
    return x

x = np.linspace(-10, 10)
plt.figure(figsize=(12,5))
plt.plot(x, linear(x))
plt.axis('tight')
plt.title('Activation Function: Linear')
plt.grid()
plt.legend(['Linear'])
plt.show()

Figure: Linear activation function

Non-linear activation functions:

There are several non-linear activation functions used in neural networks. The most widely used ones are explained here.

Sigmoid:

The sigmoid activation function is one of the most widely used activation functions in neural networks. Also known as the logistic function, it maps any input to an output in the range 0 to 1. Keeping the output in a fixed range makes it easy to interpret as a probability and keeps activations from growing without bound.

Expressed in Python code:

Figure: Sigmoid (logistic) activation function

import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (Jupyter-only magic; uncomment when running in a notebook)

x = np.arange(-5,15,0.01)
y = 1/(1 + np.exp(-x))  # sigmoid: 1 / (1 + e^(-x))
plt.figure(figsize=(12,5))
plt.plot(x,y)
plt.title("Sigmoid Activation Function")
plt.grid()
plt.legend(['Sigmoid Activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()

Tanh:

It is very similar to the sigmoid function, but its output ranges from -1 to 1. Because the output is zero-centered, tanh is usually preferred over the sigmoid in hidden layers.

Expressed in Python code:

x = np.arange(-5, 5, 0.01)
y = (2 / (1 + np.exp(-2*x)))-1  # tanh written in terms of the exponential
plt.figure(figsize=(12,5))
plt.plot(x,y)
plt.title('Tanh Activation Function')
plt.grid()
plt.legend(['Tanh activation'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()

Figure: Tanh activation function
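The formula used in the plot above is tanh written as a rescaled sigmoid, tanh(x) = 2·sigmoid(2x) − 1. A quick sketch to check this against NumPy's built-in `np.tanh`, and to compare how the two functions are centered on a symmetric input range (the `sigmoid` helper is defined here for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)

# tanh expressed through the sigmoid
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0
print(np.allclose(tanh_from_sigmoid, np.tanh(x)))  # True

# On a symmetric input range, tanh outputs average to about 0,
# while sigmoid outputs average to about 0.5
print(np.tanh(x).mean(), sigmoid(x).mean())
```

This is what "zero-centered" means in practice: tanh outputs are balanced around zero, while sigmoid outputs are always positive.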

Rectified Linear Unit (ReLU) activation function:

ReLU (Rectified Linear Unit) returns the input unchanged when it is greater than zero and returns zero otherwise. For negative inputs the gradient is also zero, so a neuron whose inputs stay negative receives no weight updates during backpropagation (the "dying ReLU" problem).

Expressed in Python code:

x = np.arange(-5, 5, 0.01)
z = np.zeros(len(x))
y = np.maximum(z,x)  # element-wise max(0, x)
plt.plot(x,y)
plt.title('ReLU Activation Function')
plt.grid()
plt.legend(['ReLU activation'])  # pass a list; a bare string is split into characters
plt.xlabel('Input')
plt.ylabel('Output')
plt.show()

Figure: ReLU activation function
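The gradient behavior of ReLU is easy to see in code. This is a minimal sketch with hypothetical helper names (`relu`, `relu_grad`). The derivative at exactly zero is undefined, and setting it to 0 here is a common convention rather than a mathematical fact.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; the value at exactly 0 is set to 0 by convention."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # zeros for the non-positive inputs, then 0.5 and 2
print(relu_grad(x))  # 0 for the non-positive inputs, 1 for the positive ones
```

A gradient of 1 on the positive side means the error signal passes through unchanged. A gradient of 0 on the negative side means no update reaches the weights behind that neuron for that input.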

Leaky ReLU activation function:

It is similar to ReLU, but gives negative inputs a small non-zero slope (0.05 in the code below) instead of zero, which fixes the dying neuron problem in ReLU.

Leaky ReLU expressed in Python code:

import numpy as np
import matplotlib.pyplot as plt


# Leaky Rectified Linear Unit (leaky ReLU) Activation Function
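def leaky_ReLU(x):
    # Positive values pass through; negative values are scaled by 0.05
    data = [max(0.05*value,value) for value in x]
    return np.array(data, dtype=float)

# Derivative for leaky ReLU
def der_leaky_ReLU(x):
    # Slope is 1 for positive inputs and 0.05 otherwise
    data = [1 if value>0 else 0.05 for value in x]
    return np.array(data, dtype=float)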

# Generating Visual of leaky ReLU
x_data = np.linspace(-10,10,100)
y_data = leaky_ReLU(x_data)
dy_data = der_leaky_ReLU(x_data)

# Graph
plt.figure(figsize=(12,5))
plt.plot(x_data, y_data)
plt.title('leaky ReLU')
plt.legend(['leaky_ReLU'])
plt.grid()
plt.show()

Figure: Leaky ReLU activation function
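The list-comprehension version above is easy to read but loops in Python. A vectorized variant using `np.where` is sketched below. The function names and the `alpha` parameter are hypothetical, with `alpha=0.05` chosen to match the slope used above.

```python
import numpy as np

def leaky_relu(x, alpha=0.05):
    """Vectorized leaky ReLU: x for positive inputs, alpha * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.05):
    """Slope is 1 for positive inputs and alpha otherwise (never exactly zero)."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, alpha)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(leaky_relu(x))       # negative inputs are scaled by 0.05, positive ones unchanged
print(leaky_relu_grad(x))  # 0.05 for non-positive inputs, 1 for positive inputs
```

Because the gradient never drops to zero, a neuron with negative inputs still receives a small update during backpropagation.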

Softplus Activation Function:

It is a smooth approximation of ReLU.

Expressed in Python code:

x = np.arange(-5, 5, 0.01)
y = np.log(1+np.exp(x))
plt.plot(x,y)
plt.title('Softplus Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')

Figure: Softplus activation function
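The direct formula `np.log(1 + np.exp(x))` overflows for large inputs. One common alternative, sketched here as an assumption about how you might implement it, is `np.logaddexp(0, x)`, which computes log(e^0 + e^x) in a numerically stable way. The sketch also checks numerically that the slope of softplus is the sigmoid function.

```python
import numpy as np

def softplus(x):
    # log(exp(0) + exp(x)) equals log(1 + exp(x)), computed without overflow
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(softplus(0.0))     # log(2), about 0.693
print(softplus(1000.0))  # 1000.0; the naive formula would overflow to inf here

# Central finite differences: the slope of softplus matches the sigmoid
x = np.array([-5.0, 0.0, 5.0])
h = 1e-5
numeric_grad = (softplus(x + h) - softplus(x - h)) / (2 * h)
print(np.allclose(numeric_grad, sigmoid(x), atol=1e-6))  # True
```

For large positive inputs softplus is practically equal to the input, and for large negative inputs it approaches zero, which is why it tracks ReLU so closely.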

Advantages and Disadvantages of Non-linear Activation Functions:

Sigmoid:

Advantages of Sigmoid	Disadvantages of Sigmoid
It introduces non-linearity into the network, so it can be used to activate hidden layers.	Vanishing-gradient problem: during backpropagation, the gradient becomes very close to zero as it is passed back through the layers. It is also computationally expensive because of the exponential.
Its output ranges from 0 to 1, which can be interpreted as a probability for prediction.	When the gradient reaches zero, there is no learning in the network.

Advantages and disadvantages of sigmoid
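The vanishing-gradient problem follows from the sigmoid's derivative, s(x)·(1 − s(x)), which is never larger than 0.25. The sketch below (helper names are illustrative) prints the derivative at a few points and the upper bound on the product of sigmoid derivatives across several layers, ignoring the weights for simplicity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest value the derivative can take
print(sigmoid_grad(10.0))  # roughly 4.5e-05: the function is saturated here

# Upper bound on the product of sigmoid derivatives through `depth` layers
# (weights are ignored in this simplified illustration)
bounds = {depth: 0.25 ** depth for depth in (1, 5, 10, 20)}
for depth, bound in bounds.items():
    print(depth, bound)
```

After 20 layers the bound is already below 1e-12, so the earliest layers receive almost no gradient signal.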

Tanh:

Advantages of Tanh	Disadvantages of Tanh
Its output covers negative values, with a minimum of -1.	It suffers from the same vanishing-gradient problem as the sigmoid.
It is a zero-centered activation function, which usually makes it preferable to the sigmoid in hidden layers.	Computationally expensive because of its exponential nature.

Advantages and Disadvantages of Tanh

ReLU:

Advantages of ReLU	Disadvantages of ReLU
Negative values are converted to zero.	It is non-differentiable at zero and unbounded above.
The vanishing-gradient problem is reduced because the gradient is 1 for every positive input and the function does not saturate as inputs grow, which tends to give better prediction and accuracy.	It is mostly used in hidden layers only.
Computation time is low.

Advantages and Disadvantages of ReLU


Why are activation functions used in neural networks?

Activation functions introduce non-linearity into multi-layer networks so that they can detect non-linear features in the data. Without non-linearity, a neural network is restricted to learning only linear functions.
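XOR is a classic example of a non-linear feature. The sketch below fits the best purely linear model with least squares and compares it to a tiny ReLU network whose weights are hand-picked for illustration (they are not learned, and are just one of many possible choices).

```python
import numpy as np

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best purely linear model: least squares with a bias column
X_bias = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
linear_pred = X_bias @ coef
print(linear_pred)  # about 0.5 for every input: the linear model cannot separate XOR

def relu(z):
    return np.maximum(0.0, z)

# Hand-picked two-unit ReLU network: XOR(a, b) = relu(a + b) - 2 * relu(a + b - 1)
s = X.sum(axis=1)
relu_pred = relu(s) - 2.0 * relu(s - 1.0)
print(relu_pred)  # matches y exactly
```

The linear model can do no better than predicting 0.5 everywhere, while two ReLU units are enough to reproduce XOR exactly.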

How to choose an activation function for your neural network?

In a neural network, the same type of activation function is typically used in all hidden layers. ReLU, sigmoid, and tanh are the most commonly used activation functions for hidden layers. ReLU is the most popular because it reduces the vanishing-gradient problem. You can also follow common practice when choosing your activation function:

Figure: Activation functions for hidden layers

Figure: Activation functions at the output layer
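For the output layer, a widely used convention (stated here as an assumption, since the original charts are not reproduced in this text) is a linear output for regression, a sigmoid for binary classification, and a softmax for multi-class classification. The lookup table and its key names below are hypothetical and only illustrate that convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    """Turns a vector of scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical mapping from task type to a conventional output activation
OUTPUT_ACTIVATIONS = {
    "regression": lambda z: z,             # linear: any real value
    "binary_classification": sigmoid,      # one probability in (0, 1)
    "multiclass_classification": softmax,  # one probability per class
}

logits = np.array([2.0, 1.0, 0.1])  # example scores for three classes
probs = OUTPUT_ACTIVATIONS["multiclass_classification"](logits)
print(probs, probs.sum())
```

The largest score gets the largest probability, and the outputs sum to 1, which is what makes softmax convenient for picking one class out of several.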

Conclusion:

You should now have a better understanding of what activation functions are and of their key properties: they introduce non-linearity, they are differentiable (ReLU everywhere except at zero), and they map inputs to a particular output range. We explained the non-linear activation functions using Python code. If any of the math is off, if a topic needs more detail, or if you have any doubts, feel free to ask your questions in the comment section.

Thank you so much for reading this blog!

Aman Kumar

Data Scientist with 3+ years of experience in building data-intensive applications in diverse industries. Proficient in predictive modeling, computer vision, natural language processing, data visualization etc. Aside from being a data scientist, I am also a blogger and photographer.
