The term regular expression was originally borrowed from automata theory in theoretical computer science. Broadly, it refers to a pattern against which a substring is matched.
The comic should have already given you an idea of what regular expressions could be useful for. It should not be surprising that many programming languages, text processing tools, data validation tools and search engines make extensive use of them.
The key idea is that a regular expression is a pattern which matches a set of target strings.
\w+@\w+\.(com|org|net|in) is a regex that matches many simple email addresses that end with .com, .net, .org or .in.
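To see the pattern above in action, here is a minimal sketch. It assumes Python's built-in re module, whose syntax is close to the Perl style; the helper name looks_like_email and the sample addresses are made up for illustration.

```python
import re

# The pattern from the text, as a raw string so the backslashes
# reach the regex engine unchanged.
EMAIL_PATTERN = re.compile(r"\w+@\w+\.(com|org|net|in)")

def looks_like_email(text: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return EMAIL_PATTERN.fullmatch(text) is not None

print(looks_like_email("alice@example.com"))       # True
print(looks_like_email("bob@example.in"))          # True
print(looks_like_email("carol@example.edu"))       # False: .edu is not in the list
print(looks_like_email("first.last@example.com"))  # False: \w does not match "."
```

The last case shows why the pattern only covers simple addresses: \w matches letters, digits and underscores, so a dot or hyphen in the name breaks the match.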
Regular Expressions Concepts
Regex syntax comes in many flavors that vary with the language or tool. Here, we will examine Perl-style regex, since most other flavors are variations on it.
Before we dive into the syntax, these are the kinds of things that the patterns consist of:
- Literals: The simplest things to match. A literal matches exactly the characters it is written with, such as a or a1.
- Meta characters: They do not mean what they look like; they stand for something else. For example, \d matches any digit.
- Vertical Bar: The | symbol is a boolean OR. It matches any one of the alternatives it separates.
- Quantifiers: They specify how many times the preceding pattern must be matched.
- Grouping and Capturing: Parentheses can be used to group parts of the regex or to capture parts of the match for later use.
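The five building blocks listed above can each be tried in a line or two. This is only a sketch, assuming Python's re module as a stand-in for a Perl-style engine; the sample strings are invented.

```python
import re

# Literal: the characters a1 match themselves.
literal = re.search(r"a1", "xa1y").group()
print(literal)        # a1

# Meta character: \d stands for any single digit.
digits = re.findall(r"\d", "room 42")
print(digits)         # ['4', '2']

# Vertical bar: match either alternative.
animals = re.findall(r"cat|dog", "cat, dog, bird")
print(animals)        # ['cat', 'dog']

# Quantifier: + means one or more of the preceding item.
numbers = re.findall(r"\d+", "room 42, floor 7")
print(numbers)        # ['42', '7']

# Grouping and capturing: parentheses keep parts of the match for later use.
match = re.search(r"(\w+)@(\w+)", "alice@example")
print(match.group(1), match.group(2))   # alice example
```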
Regular Expression Syntax
Let’s look at what the meta characters do in a little more detail.
Meta character | Description |
^ | Start of a string |
$ | End of a string |
\t | Tab |
\n | Newline |
\r | Carriage return |
\s | Any whitespace character |
\S | Any non-whitespace character |
\d | Any digit |
\D | Any non-digit |
\w | Any word character |
\W | Any non-word character |
\b | Any word boundary |
\B | Any non-word boundary |
. | Any single character, usually barring a newline |
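A few of the meta characters in the table can be checked with a short sketch like the one below. It assumes Python's re module, where these escapes behave in the Perl-like way described here; the sample text is made up.

```python
import re

sample = "cat 42\tdog_7"

digits = re.findall(r"\d", sample)        # ['4', '2', '7']
spaces = re.findall(r"\s", sample)        # [' ', '\t']
words = re.findall(r"\w+", sample)        # ['cat', '42', 'dog_7']

# \b only matches at the edge of a word, so "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats")    # ['cat']

# ^ and $ anchor the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", sample) is not None   # True
ends_with_digit = re.search(r"\d$", sample) is not None    # True

# . matches any single character except a newline (by default).
dot_matches = re.findall(r"c.t", "cat cut c\nt")           # ['cat', 'cut']

print(digits, spaces, words, whole_words)
print(starts_with_cat, ends_with_digit, dot_matches)
```

Note that \w treats the underscore as a word character, which is why dog_7 comes back as one word.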
By the way, if you want to match a metacharacter literally, you need to use \ to escape it. For example, \. would just match the . character.
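The difference between an escaped and an unescaped dot is easy to see in a small experiment. The sketch below assumes Python's re module; re.escape is a helper from that module that adds the backslashes automatically, and the sample strings are invented.

```python
import re

# Unescaped, . matches every character in the string.
unescaped = re.findall(r".", "a.b")      # ['a', '.', 'b']

# Escaped, \. matches only a literal dot.
escaped = re.findall(r"\.", "a.b")       # ['.']

# re.escape() adds the backslashes for you when the text comes from elsewhere.
pattern = re.escape("3.14")
matches_exact = re.fullmatch(pattern, "3.14") is not None   # True
matches_other = re.fullmatch(pattern, "3x14") is not None   # False

print(unescaped, escaped, matches_exact, matches_other)
```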
Now, let us look at the constructs that give patterns more flexibility: character classes, quantifiers, alternation and groups.
Expression | Meaning |
[abc] | Matches any one of a, b, or c |
[^abc] | Matches any one character other than a, b, or c |
[a-d] | Matches any one character in the range a-d |
a* | Matches a zero or more times |
a? | Matches a zero or one time |
a+ | Matches a one or more times |
a|b | Matches either a or b |
a{3} | Matches exactly 3 of a |
a{3,} | Matches 3 or more of a |
a{3,5} | Matches 3, 4 or 5 of a (inclusive range) |
( ) | Captures everything inside the parentheses |
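The expressions in this table can be tried out as follows. This is a sketch under the assumption that Python's re module is used; the sample strings are invented, and other engines may differ in small details.

```python
import re

# Character classes
in_class = re.findall(r"[abc]", "cabbage")        # ['c', 'a', 'b', 'b', 'a']
not_in_class = re.findall(r"[^abc]", "cabbage")   # ['g', 'e']
in_range = re.findall(r"[a-d]+", "abcdef")        # ['abcd']

# Quantifiers
zero_or_more = re.findall(r"ab*", "a ab abbb")    # ['a', 'ab', 'abbb']
one_or_more = re.findall(r"ab+", "a ab abbb")     # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour") # ['color', 'colour']
rejects_four = re.fullmatch(r"a{3}", "aaaa") is None      # True: four a's is one too many
three_to_five = re.findall(r"a{3,5}", "aa aaa aaaaaaa")   # ['aaa', 'aaaaa']

# Alternation and capturing
either = re.findall(r"a|b", "abc")                # ['a', 'b']
year, month = re.search(r"(\d{4})-(\d{2})", "released 2021-07").groups()

print(in_class, not_in_class, in_range)
print(zero_or_more, one_or_more, optional, rejects_four, three_to_five)
print(either, year, month)
```

In the a{3,5} case the run of seven a's yields only one match: the quantifier is greedy and takes five, and the two that remain are too few to match again.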