A regular expression, also known as a regex or rational expression, is a sequence of characters that defines a search pattern. It is used to match or find (pattern matching) specific characters or strings in a large database, a big document, and so on. Regular expressions are also used in web scraping to extract useful information such as names, phone numbers, and addresses through pattern matching.
Also, a regular expression is not a programming language, which is a common point of confusion. It is simply a pattern language, or a particular kind of grammar, for describing and parsing strings or textual information, and it can be used with almost any programming language, such as C++, Python, Java, etc.
Why do we need regular expressions?
Sometimes we want to extract important information like names, emails, addresses, or phone numbers from a webpage, a large document, or a database. Searching for these manually, one by one, is practically impossible, so we use regex instead. As we discussed earlier, regex is a pattern-matching tool: we just specify the pattern of what we want to search for, and it finds all the matches.
For example, suppose you have a huge document that contains your friend’s email address and you want to extract it. How would you do that? Reading the whole document to find it would not be a good idea, right? This is exactly where regex helps.
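As a quick sketch of “specify the pattern and it finds all the matches”, here is a short Python example that pulls every phone number out of a block of text. The sample text and the phone format (three digits, three digits, four digits, separated by hyphens) are assumptions for illustration; real phone numbers come in many more formats.

```python
import re

# Hypothetical sample text; the phone format used here is an assumption.
text = """Contact Alice at 555-123-4567 or Bob at 555-987-6543.
The office line, 555-000-1111, is only open on weekdays."""

# \d{3} matches exactly three digits; \b keeps us from matching inside longer numbers.
phone_pattern = r"\b\d{3}-\d{3}-\d{4}\b"

# re.findall returns every non-overlapping match, in order.
phones = re.findall(phone_pattern, text)
print(phones)  # ['555-123-4567', '555-987-6543', '555-000-1111']
```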
Applications of regular expressions
Regex has many applications. As we already discussed, it is used in web scraping to find relevant data. Here is another example: have you ever noticed that when you create an account on a website or app, it asks you to include special characters, numbers, and both uppercase and lowercase letters in your password? So how does the program know whether your password includes them? Yes, you guessed it: such checks are often written with regex. Similarly, sites use regex to make sure you enter an email address, credit card number, phone number, etc. in the correct format.
Let’s take one more example. Suppose you wrote a research paper about dogs that is 12 or more pages long, and then you found out that you had mistakenly written ‘cat’ everywhere instead of ‘dog’. Now you have to change it everywhere. It might appear more than 100 times in your document, and replacing it manually could take you an hour or two. Instead, you can use regex to replace every occurrence at once, and it will barely take a minute.
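Here is a minimal Python sketch of the kind of password check described above. The specific rules (at least 8 characters, one lowercase letter, one uppercase letter, one digit, and one special character) are assumptions; every site picks its own. Each `(?=...)` is a lookahead that checks one requirement without consuming any characters.

```python
import re

# Assumed rules for illustration: 8+ characters, with at least one lowercase
# letter, one uppercase letter, one digit, and one non-alphanumeric character.
PASSWORD_RULE = re.compile(
    r"(?=.*[a-z])"         # at least one lowercase letter
    r"(?=.*[A-Z])"         # at least one uppercase letter
    r"(?=.*\d)"            # at least one digit
    r"(?=.*[^a-zA-Z0-9])"  # at least one character that is not a letter or digit
    r".{8,}"               # at least 8 characters in total
)

def is_strong_password(password: str) -> bool:
    """Return True if the whole password satisfies every rule above."""
    return PASSWORD_RULE.fullmatch(password) is not None

print(is_strong_password("hunter2"))       # False
print(is_strong_password("Hunter2!pass"))  # True
```

An alternative design is to check each rule with its own small pattern, which makes it easy to tell the user exactly which requirement failed.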
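A short Python sketch of that fix, using `re.subn`, which returns the new text together with the number of replacements made. The sample sentence is hypothetical. The `\b` word boundaries make sure only the whole word ‘cat’ is replaced, so a word like ‘category’ is left alone, which a plain find-and-replace would not guarantee.

```python
import re

# Hypothetical excerpt standing in for the research paper.
paper = "The cat is loyal. A cat needs daily walks. This category covers training."

# \b marks a word boundary, so "cat" inside "category" is not touched.
fixed, count = re.subn(r"\bcat\b", "dog", paper)

print(fixed)  # The dog is loyal. A dog needs daily walks. This category covers training.
print(count)  # 2
```

Note that this pattern is case-sensitive, so a capitalized ‘Cat’ at the start of a sentence would need its own rule.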
A simple example of email matching with regular expressions
If you want to extract an email address from a document, a webpage, or anything else, you need to match its pattern with regex. To do that, let’s take a closer look at the structure of email addresses and what they contain.
Email addresses follow a common pattern: a username, followed by the ‘@’ symbol, followed by the mail server, and finally a top-level domain such as ‘.com’, ‘.edu’, or ‘.org’. For example, in john.doe@gmail.com the username is john.doe, the mail server is gmail, and the top-level domain is .com.
With this observation, we first match the username, which contains letters, numbers, and symbols such as ‘.’ and ‘-’. This can be done with “[a-zA-Z0-9.-]+”. Next comes the literal ‘@’ symbol, which we match with “@”. Then we match the mail server, which in our case contains only letters, so it will be “[a-zA-Z]+”. After that we have a dot; since ‘.’ on its own matches any character, we escape it as “\.”. Finally, for the top-level domain we write “[a-zA-Z]+”. Put together, the full pattern is “[a-zA-Z0-9.-]+@[a-zA-Z]+\.[a-zA-Z]+”, and that’s it.
There are many more regex components you will need to know in order to use it effectively, so here is a quick cheat sheet to help you learn them.
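Putting the pieces together in Python, the sketch below runs the assembled pattern over a short, made-up document with `re.findall`, which returns every non-overlapping match.

```python
import re

# The pattern assembled step by step: username, a literal "@",
# the mail server, an escaped dot, and the top-level domain.
EMAIL_PATTERN = r"[a-zA-Z0-9.-]+@[a-zA-Z]+\.[a-zA-Z]+"

# Hypothetical document text for illustration.
document = """Meeting notes: send the draft to john.doe@gmail.com and
cc jane-smith@yahoo.com. Questions go to support@example.org."""

emails = re.findall(EMAIL_PATTERN, document)
print(emails)  # ['john.doe@gmail.com', 'jane-smith@yahoo.com', 'support@example.org']
```

This is a deliberately simplified pattern. Real-world addresses, for example ones with a ‘+’ in the username or a mail server with subdomains, need a more permissive pattern.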
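As a small taste of what such a cheat sheet covers, the Python sketch below runs a few common building blocks (the digit class `\d`, the quantifiers `+` and `?`, a character class, the `^` anchor, and a capturing group) through `re.findall`. The sample strings are made up for illustration.

```python
import re

# Each entry: (pattern, sample text, what the pattern demonstrates).
examples = [
    (r"\d+", "Order 66 shipped in 3 boxes", r"\d matches a digit; + means one or more"),
    (r"[A-Z][a-z]+", "Alice met Bob in Paris", "[...] is a character class"),
    (r"^Hello", "Hello world", "^ anchors the match to the start of the string"),
    (r"colou?r", "color or colour", "? makes the previous character optional"),
    (r"(\w+)@", "admin@site.com", "(...) captures part of the match"),
]

results = {}
for pattern, text, note in examples:
    # re.findall returns all matches; with a capturing group it returns only the group.
    results[pattern] = re.findall(pattern, text)
    print(f"{pattern:<14} {note}: {results[pattern]}")
```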
Should you learn regular expressions?
If you work in penetration testing, cybersecurity, network administration, or Linux administration, regex will add real value to your career. But even if you don’t, it is a useful skill in almost any domain, so it’s worth learning. If you are planning to learn more about it, I would suggest starting with Corey Schafer’s video; he did a great job of explaining it.