In this tutorial we will learn about regular expressions in PHP and how a programmer can apply them to pattern matching.
What is a Regular Expression?
Regular expressions, popularly referred to as ‘regex’ or ‘RegExp’, are specially formatted text strings that one can use to find patterns in text. They are very useful for processing and manipulating text. For instance, you can use one to check whether data typed by a user, such as a name, phone number, or email address, is in the correct format, or to find or replace a matching string in text content.
The table below shows some of the most common built-in PHP pattern matching functions.

| Function | What it Does |
| --- | --- |
| preg_match() | Performs a regular expression match. |
| preg_match_all() | Performs a global regular expression match. |
| preg_replace() | Performs a regular expression search and replace. |
| preg_grep() | Returns the elements of the input array that match the pattern. |
| preg_split() | Splits a string into substrings using a regular expression. |
| preg_quote() | Quotes (escapes) regular expression characters found within a string. |

The PHP preg_match() function stops searching once it finds the first match. The preg_match_all() function, on the other hand, continues searching until it reaches the end of the string and finds all possible matches instead of halting at the first one.
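The difference between the two functions can be seen by running both against the same input. The snippet below is a minimal sketch; the sample text, the pattern, and the variable names are assumptions chosen only for illustration.

```php
<?php
// Hypothetical sample text and pattern, chosen only for illustration.
$text = "cat, cot and cut";
$pattern = '/c[aou]t/';

// preg_match() stops at the first match; $first receives that match only.
$foundOne = preg_match($pattern, $text, $first);

// preg_match_all() keeps scanning; $all[0] receives every full match.
$foundAll = preg_match_all($pattern, $text, $all);

echo $foundOne . " match via preg_match(): " . $first[0] . "\n";
echo $foundAll . " matches via preg_match_all(): " . implode(", ", $all[0]) . "\n";
```

Both functions return the number of matches found, so preg_match() can only ever return 0 or 1 on success.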
The syntax of a Regular Expression
Regular expression syntax includes special characters. The characters that have a special meaning inside a regular expression are: . * ? + [ ] ( ) { } ^ $ | \
To match any of these characters literally, you must escape it with a backslash. For instance, to match “.” you need to write \. All other characters take their literal meaning automatically.
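To make the effect of escaping concrete, the sketch below compares an unescaped dot with an escaped one and then lets preg_quote() do the escaping. The sample strings and variable names are assumptions made for this illustration.

```php
<?php
// Unescaped dot: "." matches any character, so "3x14" is accepted.
$loose = preg_match('/3.14/', "3x14");

// Escaped dot: only a literal "." is accepted.
$strict = preg_match('/3\.14/', "3x14");
$exact = preg_match('/3\.14/', "3.14");

// preg_quote() adds the backslashes for you; the second argument
// also escapes the delimiter if it appears in the input.
$quoted = preg_quote("1+1=2?", "/");
$found = preg_match("/" . $quoted . "/", "Is 1+1=2? Yes.");

echo $loose . $strict . $exact . "\n";
echo $quoted . "\n";
```

Single-quoted PHP strings are used for the patterns so that the backslashes reach the regular expression engine unchanged.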
Character classes
Square brackets that enclose a set of characters are referred to as a character class. A character class matches a single character from the list of characters it contains.
You can also define negated character classes, which match any character except the ones inside the brackets. To define a negated character class, place the caret symbol (^) immediately after the opening bracket, for example [^abc].
It is also possible to define a range of characters by placing the hyphen (-) character inside a character class, such as [0-9]. Below are examples of character classes.

| RegExp | What it Does |
| --- | --- |
| [abc] | Matches any one of the characters a, b, or c. |
| [^abc] | Matches any one character other than a, b, or c. |
| [a-z] | Matches any one character from lowercase a to lowercase z. |
| [A-Z] | Matches any one character from uppercase A to uppercase Z. |
| [a-zA-Z] | Matches any one letter, lowercase or uppercase. (A range such as [a-Z] is out of order and is rejected by PHP.) |
| [0-9] | Matches a single digit between 0 and 9. |
| [a-z0-9] | Matches a single character between a and z or between 0 and 9. |

The example below demonstrates how to determine whether a pattern is present in a string or not by using regular expressions and the PHP preg_match() function.
<?php
// "ca", then either k or f, then "e": matches "cake" and "cafe"
$pattern = "/ca[kf]e/";
$text = "He was eating cake in the cafe.";
if (preg_match($pattern, $text)) {
    echo "Match found!";
} else {
    echo "Match not found";
}
Similarly, all matches in a string can be found with the preg_match_all() function:
<?php
$pattern = "/ca[kf]e/";
$text = "He was eating cake in the cafe.";
// $array receives the matched substrings; the return value is the match count
$matches = preg_match_all($pattern, $text, $array);
echo $matches . " matches were found.";
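Ranges and negated classes from the character class table can be tried in the same way. The following is a small sketch; the sample string "Room 42B" and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample string.
$text = "Room 42B";

// [0-9]: any single digit.
preg_match_all('/[0-9]/', $text, $digits);

// [A-Z]: any single uppercase letter.
preg_match_all('/[A-Z]/', $text, $upper);

// [^0-9 ]: any single character that is neither a digit nor a space.
preg_match_all('/[^0-9 ]/', $text, $others);

echo implode(",", $digits[0]) . "\n";
echo implode(",", $upper[0]) . "\n";
echo implode(",", $others[0]) . "\n";
```

Each class matches exactly one character per match, which is why preg_match_all() returns the characters one by one.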
Predefined Character Classes
Certain character classes, such as digits, letters, and whitespace, are used so often that shortcut names are defined for them. The table below lists some of the predefined character classes:

| Shortcut | What it Does |
| --- | --- |
| . | Matches any single character except a newline (\n). |
| \d | Matches any digit character. Similar to [0-9]. |
| \D | Matches any non-digit character. Similar to [^0-9]. |
| \s | Matches any whitespace character. Similar to [ \t\n\r]. |
| \w | Matches any word character (a letter, a digit, or an underscore). Similar to [a-zA-Z_0-9]. |
| \W | Matches any non-word character. Similar to [^a-zA-Z_0-9]. |

The example below shows how to find whitespace in a string and replace it with a hyphen character by applying a regular expression and the PHP preg_replace() function.
<?php
$pattern = "/\s/";
$replacement = "-";
$text = "Earth revolves around\nthe\tSun";
// Replace spaces, newlines and tabs
echo preg_replace($pattern, $replacement, $text);
echo "<br>";
// Replace only spaces
echo str_replace(" ", "-", $text);
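The other shortcuts work the same way. As a rough sketch, \D can strip everything that is not a digit from a phone number, and \W can strip everything that is not a word character. The sample inputs and variable names below are made up for illustration.

```php
<?php
// Hypothetical user input.
$phone = "+1 (555) 010-9999";

// \D matches every non-digit, so replacing it with "" keeps digits only.
$digitsOnly = preg_replace('/\D/', '', $phone);

// \W matches every non-word character (anything except letters, digits, "_").
$wordOnly = preg_replace('/\W/', '', "snake_case 42!");

// \s counts each space, tab, and newline separately.
$whitespaceCount = preg_match_all('/\s/', "a b\tc\nd");

echo $digitsOnly . "\n";
echo $wordOnly . "\n";
echo $whitespaceCount . "\n";
```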
Repetition Quantifiers
Quantifiers describe the number of times a character in a regular expression should match. The table below shows several ways that one can quantify a specific pattern.

| RegExp | What it Does |
| --- | --- |
| p+ | Matches one or more occurrences of the letter p. |
| p* | Matches zero or more occurrences of the letter p. |
| p? | Matches zero or one occurrence of the letter p. |
| p{2} | Matches exactly two occurrences of the letter p. |
| p{2,3} | Matches at least two occurrences of the letter p, but not more than three. |
| p{2,} | Matches two or more occurrences of the letter p. |
| p{0,3} | Matches at most three occurrences of the letter p. (The shorthand p{,3} is only recognized as a quantifier by newer PCRE2 versions, so p{0,3} is the portable form.) |

The regular expression used in the example below splits the string at a comma, a whitespace character, a run of either, or any combination of the two, using the PHP preg_split() function.
<?php
$pattern = "/[\s,]+/";
$text = "My favourite colors are red, green and blue";
$parts = preg_split($pattern, $text);
// Loop through parts array and display substrings
foreach ($parts as $part) {
    echo $part . "<br>";
}
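To see how the quantifiers from the table differ, it can help to apply each of them to the same input and look at what gets matched. The sketch below does that; firstMatch() is a hypothetical helper written for this illustration, not a PHP built-in, and the input "apppp" is an arbitrary choice.

```php
<?php
// Hypothetical helper: returns the first substring matched by $pattern, or null.
function firstMatch(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = "apppp";

// Quantifiers are greedy: each takes as many "p" characters as it is allowed.
$results = [
    'ap+'     => firstMatch('/ap+/', $text),
    'ap*'     => firstMatch('/ap*/', $text),
    'ap?'     => firstMatch('/ap?/', $text),
    'ap{2}'   => firstMatch('/ap{2}/', $text),
    'ap{2,3}' => firstMatch('/ap{2,3}/', $text),
    'ap{2,}'  => firstMatch('/ap{2,}/', $text),
];

foreach ($results as $regex => $match) {
    echo $regex . " => " . $match . "\n";
}
```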
Position Anchors
There are cases where you want to match at the start or end of a line, string, or word. To achieve this, you can use anchors. Two common anchors are the caret (^), which marks the start of a string, and the dollar sign ($), which marks the end of a string.

| RegExp | What it Does |
| --- | --- |
| ^p | Matches the letter p at the start of the string (or at the start of each line when the m modifier is used). |
| p$ | Matches the letter p at the end of the string (or at the end of each line when the m modifier is used). |

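The regular expression in the following example uses the preg_grep() function to display only those names from the names array that begin with the letter “J”.
<?php
$pattern = "/^J/";
$names = array("John Carter", "Clark Kent", "Jane Doe"); // hypothetical sample names
$matches = preg_grep($pattern, $names);
// Loop through matches array and display matched names
foreach ($matches as $match) {
    echo $match . "<br>";
}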
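Using both anchors together forces the pattern to cover the whole string, which is a common way to validate input. The following sketch shows each anchor on its own and then combined; the word list and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical word list.
$words = ["pump", "plan", "stop"];

// ^p: the word must start with p. preg_grep() preserves the original keys,
// so array_values() is used to re-index the result.
$startsWithP = array_values(preg_grep('/^p/', $words));

// p$: the word must end with p.
$endsWithP = array_values(preg_grep('/p$/', $words));

// ^...$ together: the entire string must consist of lowercase letters.
$wholeOk = preg_match('/^[a-z]+$/', "pump");
$wholeBad = preg_match('/^[a-z]+$/', "pump it");

echo implode(", ", $startsWithP) . "\n";
echo implode(", ", $endsWithP) . "\n";
echo $wholeOk . " " . $wholeBad . "\n";
```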
Pattern Modifiers
A pattern modifier lets a developer control how a pattern match is performed. Pattern modifiers appear directly after the closing delimiter of the regular expression. If you want to look for a pattern in a case-insensitive way, for example, you should use the i modifier, as in /pattern/i. The table below lists some commonly cited pattern modifiers and how they apply to PHP.

| Modifier | What it Does |
| --- | --- |
| i | Makes the match case-insensitive. |
| m | Changes the behavior of ^ and $ to match against a newline boundary (i.e. start or end of each line within a multiline string), instead of a string boundary. |
| g | Performs a global match, i.e. finds all occurrences. This modifier exists in other regex flavors but is not accepted by PHP's preg functions; use preg_match_all() instead. |
| o | Evaluates the expression only once. This is a Perl modifier and is not accepted by PHP's preg functions. |
| s | Changes the behavior of . (dot) to match all characters, including newlines. |
| x | Allows you to use whitespace and comments within a regular expression for clarity. |

A global, case-insensitive search can be carried out by combining the i modifier with the PHP preg_match_all() function.
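As a minimal sketch of such a search, the snippet below counts occurrences of a word with and without the i modifier. The sample sentence and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample sentence containing three spellings of the same word.
$text = "Color red is more visible than color blue; COLOR matters.";

// Without a modifier the match is case-sensitive.
$sensitive = preg_match_all('/color/', $text, $exactCase);

// With the i modifier, "Color", "color" and "COLOR" all match.
$insensitive = preg_match_all('/color/i', $text, $anyCase);

echo $sensitive . " case-sensitive match(es)\n";
echo $insensitive . " case-insensitive match(es): " . implode(", ", $anyCase[0]) . "\n";
```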
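The m, s, and x modifiers can be explored with a short multiline string. The snippet below is a sketch under the assumption of a three-line sample text; the text and variable names are invented for illustration.

```php
<?php
// Hypothetical three-line sample text.
$text = "pear\nplum\napple";

// Without m, ^ only matches at the very start of the string.
$withoutM = preg_match_all('/^p/', $text);
// With m, ^ also matches at the start of every line.
$withM = preg_match_all('/^p/m', $text);

// Without s, the dot refuses to match the newline between the words.
$withoutS = preg_match('/pear.plum/', $text);
// With s, the dot matches the newline as well.
$withS = preg_match('/pear.plum/s', $text);

// With x, whitespace and # comments inside the pattern are ignored.
$verbose = '/
    ^p      # line starts with p
    \w+     # followed by more word characters
/mx';
preg_match_all($verbose, $text, $lines);

echo $withoutM . " " . $withM . " " . $withoutS . " " . $withS . "\n";
echo implode(", ", $lines[0]) . "\n";
```

Modifiers can be combined, as in /mx above. Comments inside an x-mode pattern should not contain the delimiter character, because PHP would treat it as the end of the pattern.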
Word Boundaries
A word boundary character (\b) helps you find words that start or end with a pattern. For instance, the regexp /\bcar/ matches words that start with the pattern car, so it matches cartoon, carrot, and cart but does not match Oscar.
In the same way, the regexp /car\b/ matches words that end with the pattern car, so it matches scar, supercar, and Oscar but does not match cart. Similarly, /\bcar\b/ matches words that both start and end with the pattern car, so it matches only the word car. The example below displays the words beginning with car in bold:
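<?php
$pattern = '/\bcar\w*/';
$replacement = '<b>$0</b>';
$text = 'Words beginning with car: cart, carrot, cartoon';
// $0 in the replacement stands for the whole matched word
echo preg_replace($pattern, $replacement, $text);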
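The three boundary patterns described above can be compared side by side. The sketch below runs each one over the same short word list; the sample string and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample string.
$text = "car cart scar Oscar";

// \bcar\w*: whole words that start with "car".
preg_match_all('/\bcar\w*/', $text, $starts);

// \w*car\b: whole words that end with "car".
preg_match_all('/\w*car\b/', $text, $ends);

// \bcar\b: the standalone word "car" only.
preg_match_all('/\bcar\b/', $text, $exact);

echo implode(", ", $starts[0]) . "\n";
echo implode(", ", $ends[0]) . "\n";
echo implode(", ", $exact[0]) . "\n";
```

The \w* parts are only there to capture the rest of each word; the \b anchors alone decide where a match may begin or end.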
Hi! I am Anuj Kumar, a professional web developer with 5+ years of experience in this sector. I found PHPGurukul in September 2015. My keen interest in technology and sharing knowledge with others became the main reason for starting PHPGurukul. My basic aim is to offer all web development tutorials like PHP, PDO, jQuery, PHP oops, MySQL, etc.