Today, let’s discuss the Support Vector Machine, or SVM for short. SVM can be used for both regression and classification. Let’s see how SVM looks on a graph in both cases.
By looking at these two graphs, you should get the idea that SVM does not work well for regression but works well enough for classification. What you are looking at in the classification graph is called binary classification, where the target has only two categories.
Let’s discuss the keywords given to the different parts of an SVM.
Support Vectors:- The data points that lie closest to the hyperplane.
Margin:- The gap between the decision boundary and the closest data points on either side.
Decision Boundary or Hyperplane:- The main dividing line that separates the two classes.
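To make these terms concrete, here is a minimal sketch in scikit-learn (using a toy dataset of my own choosing, not the data behind the graphs above): after fitting a linear SVC, the support vectors and the hyperplane coefficients are available directly on the model.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, one per class (illustrative toy data)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

# The data points closest to the hyperplane -- the support vectors
print(model.support_vectors_)

# Coefficients of the separating hyperplane w.x + b = 0
print(model.coef_, model.intercept_)
```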
When you work with real datasets, you will rarely see data like what you are looking at here in the classification example. You are lucky if you get such data, because real-world data is usually complex and cannot be divided so easily.
Let us discuss some of the hyperparameters that you will be working with.
Regularization in support vector machine
C is the regularization parameter, which we can use to control model complexity and in turn avoid overfitting. In simpler words, the C parameter helps the model classify the data more precisely. Let’s visualize this to understand it better.
When C = 1, the margins are bigger compared to when C = 10. You can also observe that when the margins are bigger, the chances of misclassification are higher, while in the case of C = 10 (smaller margins) no misclassification has happened.
Misclassification:- The case where the model predicts class Red as class Green, and vice versa.
But in reality there will surely be some misclassification of the data; otherwise, the model will be overfitted.
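Here is a hedged sketch of the C comparison described above; the make_blobs data is an illustrative stand-in for the dataset in the original plots, so the exact numbers will differ.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so some misclassification is unavoidable
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (1, 10):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C -> wider margin, more misclassification tolerated;
    # larger C -> narrower margin, misclassification penalized harder
    print(f"C={C}: training accuracy = {model.score(X, y):.2f}, "
          f"support vectors = {len(model.support_vectors_)}")
```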
Gamma in support vector machine
What if the data is not well separated and you cannot apply a linear kernel (a straight line that separates the two classes, as in the diagrams above)? You then need to use other kernels, like Poly or RBF, and with those kernels you need to tune gamma, which controls how far the influence of a single training point reaches and so helps you separate the classes.
Let’s understand this with graphs.
This data cannot be divided linearly, so you need to use a different kernel, like Poly or RBF.
Look at the hyperplane: it has divided the data reasonably well, but there are still lots of misclassifications. To fix this, we can increase the gamma value. Notice the red and green backgrounds: whatever data falls inside the red background is predicted as class Red, and whatever falls inside the green background is predicted as class Green.
Now increasing the gamma value has improved the result.
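A quick sketch of this gamma effect (make_circles is my choice of non-linearly separable toy data here, not the data behind the graphs):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: a classic non-linearly separable dataset
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

for gamma in (0.1, 1, 10):
    model = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    # Higher gamma lets the boundary hug the data more tightly
    print(f"gamma={gamma}: training accuracy = {model.score(X, y):.2f}")
```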
Kernel in support vector machine
When we want to apply SVM to non-linearly separable data, we use the non-linear kernels available, like Poly and RBF. This is where the kernel trick comes in: when non-linearly separable data is passed to the model, the low-dimensional data that is not linearly separable is implicitly mapped into a higher-dimensional space where it becomes linearly separable, without the mapping ever being computed explicitly.
Look at the first graph given below: it cannot be classified using the linear kernel, so we will apply the kernel trick here.
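To see the intuition behind the kernel trick, here is a toy sketch: mapping 2-D circular data to 3-D by hand (adding the squared radius as a third feature) makes it linearly separable. The real kernel trick achieves the same effect implicitly, without ever computing the new features.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# In the original 2-D space, a linear kernel cannot separate the circles
print("linear kernel, 2-D:", SVC(kernel="linear").fit(X, y).score(X, y))

# Lift to 3-D with an explicit squared-radius feature: x3 = x1^2 + x2^2
X3 = np.hstack([X, (X ** 2).sum(axis=1, keepdims=True)])
print("linear kernel, 3-D:", SVC(kernel="linear").fit(X3, y).score(X3, y))
```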
1) Polynomial Kernel
This is applied to data that is not linearly separable. When you use the poly kernel, you also tune a hyperparameter called degree, which controls the flexibility of the polynomial kernel. A degree of 1 results in a linear kernel; a higher degree gives a more flexible decision boundary. Let’s look at some graphs to understand this.
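A minimal sketch of the degree hyperparameter (make_moons is an assumed stand-in for the data in the graphs):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for degree in (1, 3, 5):
    model = SVC(kernel="poly", degree=degree).fit(X, y)
    # degree=1 behaves like a linear kernel; higher degrees bend the boundary
    print(f"degree={degree}: training accuracy = {model.score(X, y):.2f}")
```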
2) RBF Kernel
Radial Basis Function, or RBF for short, is my favorite, as it works very well on data that is not linearly separable. Let me show some of its graphs to explain why.
You can observe that increasing the gamma value lets the model classify the data very well. But one thing to keep in mind is that increasing the gamma value too much can lead to overfitting.
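One way to catch this overfitting in practice is to compare training and test accuracy as gamma grows; a sketch (the gamma values here are illustrative assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.15, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (1, 100, 1000):
    model = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    # A widening gap between train and test scores signals overfitting
    print(f"gamma={gamma}: train={model.score(X_tr, y_tr):.2f}, "
          f"test={model.score(X_te, y_te):.2f}")
```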
Soft Margin and Hard Margin in SVM
Hard Margin
The core idea of a hard margin is to maximize the margin in a way that the classifier does not make any mistakes, i.e., it does not misclassify any data point. But this is problematic because the model becomes very sensitive to new data and tries to refit the decision boundary to accommodate it.
You can observe that, to satisfy the new data point and avoid any misclassification, the decision boundary in the hard-margin case got shifted.
Soft Margin
The problem with the hard margin is overcome by the soft margin. A soft margin allows some misclassification, so encountering new data will not shift its decision boundary.
Observe that the decision boundary has not shifted at all. The benefit of a soft margin is that it helps avoid overfitting.
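Note that scikit-learn has no separate hard-margin switch; a very large C approximates a hard margin, while a small C gives a soft one. A rough sketch:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=1)

# Very large C: almost no slack allowed (approximately a hard margin)
hard = SVC(kernel="linear", C=1e6).fit(X, y)

# Small C: some misclassification tolerated (a soft margin)
soft = SVC(kernel="linear", C=0.1).fit(X, y)

print("hard-ish margin support vectors:", len(hard.support_vectors_))
print("soft margin support vectors:", len(soft.support_vectors_))
```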
Some points to remember when using SVM:
Can be used for both regression and classification, but it does not work well for regression.
Can be used when lots of outliers are present.
But remember that SVM is slow to train, and its prediction speed is moderate.
It gives high performance when the dataset is small.
Links to refer to and study more about SVM:
https://en.wikipedia.org/wiki/Support-vector_machine
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
YouTube link to refer to: https://www.youtube.com/watch?v=_YPScrckx28&t=1s