Regular Expressions Using Perl
Perl is the language most famous for its use of regular expressions, and for good reason.
We use the =~ operator to bind a string to a match or a substitution, depending on the operator that follows it. The !~ operator reverses the sense of the match.
There are basically two regex operators in Perl:
- Matching: m//
- Substitution: s///
The purpose of the // is to enclose the regex. However, other delimiters such as {}, "", etc. can be used.
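As a rough illustration of why alternative delimiters are handy, the sketch below matches a file path, where slash delimiters would force every slash in the pattern to be escaped. The sample path and variable names are assumptions made for this example only. Note that the leading m can be omitted only when the delimiters are slashes.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # hypothetical sample value

# With the default delimiters, every slash in the pattern must be escaped.
my $with_slashes = ($path =~ m/^\/usr\/local\//) ? 1 : 0;

# With braces as delimiters, the slashes can be written as-is.
my $with_braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# Substitution accepts other delimiters too; here '#' separates the parts.
(my $short = $path) =~ s#^/usr/local/##;

print "$with_slashes $with_braces $short\n";   # prints "1 1 bin/perl"
```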
Matching
To use the matching operator, we bind the string on the left to the pattern on the right using the =~ and m// operators.
The following sets $true to 1 if and only if $foo matches the regular expression foo:
$true = ($foo =~ m/foo/);
It is not difficult to see that just the opposite is achieved with !~:
$false = ($foo !~ m/foo/);
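In practice the two operators usually appear directly in a condition rather than in an assignment. The following is a minimal sketch, with made-up sample strings, that sorts lines into those that match foo and those that do not. Because the pattern is not anchored, it matches anywhere in the string, so "food" counts as a match.

```perl
use strict;
use warnings;

my @lines = ("foo bar", "bar baz", "food");   # hypothetical sample data

my (@matching, @non_matching);
for my $line (@lines) {
    if ($line !~ m/foo/) {          # true when the pattern does NOT match
        push @non_matching, $line;
        next;
    }
    push @matching, $line;          # reached only when the pattern matched
}

print "matched: ", join(", ", @matching), "\n";          # matched: foo bar, food
print "not matched: ", join(", ", @non_matching), "\n";  # not matched: bar baz
```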
Capturing
As promised, () can be used to capture parts of a match. When the pattern inside a pair of parentheses matches, the matched text goes into the special variables $1, $2, etc., numbered in the order of the opening parentheses.
Example:
Here’s how one would extract the hours, minutes, seconds from a time string:
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
    $hours   = $1;
    $minutes = $2;
    $seconds = $3;
}
In list context, a successful match returns the list of captured values ($1, $2, $3, ...):
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
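One detail worth guarding against is a string that does not match: in list context a failed match returns an empty list, so the variables would be left undefined. The sketch below wraps the list-context form in a helper that returns early on failure. The name parse_time and the sample inputs are hypothetical, and the pattern is anchored with ^ and $ here as an extra assumption.

```perl
use strict;
use warnings;

# parse_time is a hypothetical helper, not something from the text above.
sub parse_time {
    my ($time) = @_;
    # A failed match yields an empty list, so the list assignment
    # counts zero elements and we return early with nothing.
    my ($hours, $minutes, $seconds) = ($time =~ m/^(\d+):(\d+):(\d+)$/)
        or return;
    return ($hours, $minutes, $seconds);
}

my @good = parse_time("12:34:56");
my @bad  = parse_time("half past noon");

print "parsed: @good\n";                               # parsed: 12 34 56
print "fields from bad input: ", scalar(@bad), "\n";   # fields from bad input: 0
```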
Substitution
This is our favorite search and replace feature. Almost the same syntax rules apply here, except that there is an extra part between the second and third delimiters that tells us what to replace the match with.
$x = "Time to feed the cat!";
$x =~ s/cat/hacker/;                  # $x contains "Time to feed the hacker!"
if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
    $more_insistent = 1;
}
$y = "'quoted words'";
$y =~ s/^'(.*)'$/$1/;                 # strip single quotes,
                                      # $y contains "quoted words"
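The if test above works because s/// returns a true value when it replaces something and a false value otherwise. The sketch below makes that explicit, and also shows one common way to leave the original string untouched by substituting on a copy. The variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $x = "Time to feed the cat!";

my $hit  = ($x =~ s/cat/hacker/);   # true: one replacement was made
my $miss = ($x =~ s/dog/hacker/);   # false: nothing to replace, $x unchanged

# Copy first, then substitute on the copy, so $original is preserved.
my $original = "'quoted words'";
(my $stripped = $original) =~ s/^'(.*)'$/$1/;

print "$x\n";                          # Time to feed the hacker!
print "first substitution hit\n"   if $hit;
print "second substitution hit\n"  if $miss;   # not printed
print "$original -> $stripped\n";      # 'quoted words' -> quoted words
```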
Modifiers
Modifiers can be appended to the end of a regex operator to change its matching behavior.
Here is a list of some important modifiers:
Modifier | Description |
i | Case-insensitive matching |
s | Allows . to match newlines |
x | Allows whitespace in the regex for clarity |
g | Globally finds all matches |
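To make the first three modifiers concrete, here is a small sketch using made-up sample strings. It shows i ignoring case, s letting . cross a newline, and x allowing the pattern to be spread over several lines. With x, Perl also permits # comments inside the pattern, which the example uses.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld";   # hypothetical sample value with an embedded newline

my $case_insensitive = ($text =~ m/hello/i)       ? 1 : 0;   # 1: case ignored
my $dot_default      = ($text =~ m/Hello.World/)  ? 1 : 0;   # 0: . stops at newline
my $dot_with_s       = ($text =~ m/Hello.World/s) ? 1 : 0;   # 1: . matches newline

# With x, whitespace in the pattern is ignored and comments are allowed.
my ($hours, $minutes, $seconds) = "12:34:56" =~ m/
    (\d+) :    # hours
    (\d+) :    # minutes
    (\d+)      # seconds
/x;

print "$case_insensitive $dot_default $dot_with_s\n";   # prints "1 0 1"
print "$hours $minutes $seconds\n";                     # prints "12 34 56"
```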
Here's how one might want to use the g modifier:
$x = "I batted 4 for 4";
$x =~ s/4/four/;     # doesn't do it all:
                     # $x contains "I batted four for 4"
$x = "I batted 4 for 4";
$x =~ s/4/four/g;    # does it all:
                     # $x contains "I batted four for four"
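The g modifier is also useful with the matching operator. As a hedged sketch with a made-up sample sentence: in list context, m//g returns every match of the capture group, and s///g returns the number of replacements it made.

```perl
use strict;
use warnings;

my $text = "I batted 4 for 4, then 2 for 5";   # hypothetical sample value

# In list context, m//g returns all matches of the capture group.
my @numbers = ($text =~ m/(\d+)/g);

# s///g returns how many substitutions were performed.
my $replaced = ($text =~ s/4/four/g);

print "numbers: @numbers\n";       # numbers: 4 4 2 5
print "replaced: $replaced\n";     # replaced: 2
print "$text\n";                   # I batted four for four, then 2 for 5
```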