re
The re module implements regular expressions in Python. For a full description, see the official Regular Expression HOWTO.
import re
Usage
The re module provides several common operations: match, search, findall, and finditer. You can call these functions directly from the re module, which is the easiest option for experiments and interactive sessions. Alternatively, you can use re.compile to precompile a regular expression into a pattern object. This approach is widely used in production code, because the compiled pattern can be stored and reused when processing many strings. The module-level functions also cache recently used patterns, so the benefit of re.compile is as much about clear, reusable code as about speed.
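As a minimal sketch of that reuse (the input lines below are made up for illustration), the pattern is compiled once and the resulting object is applied to several strings in turn:

```python
import re

# Compile once; the resulting Pattern object is reused for every line.
five_letter_word = re.compile(r'\b\w{5}\b')

# Hypothetical input lines, used here only for illustration.
lines = [
    'Hello, world!',
    'Regex makes short work of text',
    'nothing to see',
]

# Apply the same compiled pattern to each string.
results = [five_letter_word.findall(line) for line in lines]
print(results)  # [['Hello', 'world'], ['Regex', 'makes', 'short'], []]
```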
The following code shows how re.findall can be used:
re.findall(r'\b\w{5}\b', 'Hello, world!')
['Hello', 'world']
The same example, using re.compile to build a pattern object and then calling its findall method:
expression_object = re.compile(r'\b\w{5}\b')
expression_object.findall('Hello, world!')
['Hello', 'world']
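finditer, the last of the operations listed above, works like findall but lazily yields Match objects instead of strings, which also gives access to the position of each match. A short sketch reusing the same five-letter-word pattern:

```python
import re

pattern = re.compile(r'\b\w{5}\b')

# finditer yields Match objects one at a time instead of building a list of strings.
# group() is the matched text; span() is its (start, end) position in the input.
spans = [(m.group(), m.span()) for m in pattern.finditer('Hello, world!')]
print(spans)  # [('Hello', (0, 5)), ('world', (7, 12))]
```

Because finditer is lazy, it is a reasonable choice for large inputs where only some matches are needed.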
Find all
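The findall function returns all non-overlapping matches of the pattern as a list of strings. If the pattern contains capturing groups, the list holds the captured text instead of the whole match.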
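What each list element looks like depends on the number of capturing groups in the pattern. The sketch below uses a made-up key=value string to show the three cases:

```python
import re

text = 'a=1, b=2'  # illustrative input

# No groups: each element is the whole match.
no_groups = re.findall(r'\w=\d', text)
# One group: each element is the text captured by that group.
one_group = re.findall(r'(\w)=\d', text)
# Several groups: each element is a tuple with one string per group.
two_groups = re.findall(r'(\w)=(\d)', text)

print(no_groups)   # ['a=1', 'b=2']
print(one_group)   # ['a', 'b']
print(two_groups)  # [('a', '1'), ('b', '2')]
```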
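The following cell extracts all words enclosed in square brackets ([]). The non-greedy .*? stops at the first closing bracket, and the capturing group makes findall return only the text inside the brackets.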
re.findall(
r'\[(.*?)\]',
'[wow] -> [this] -> [is] -> [cool]'
)
['wow', 'this', 'is', 'cool']
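The ? after .* is what makes this work. A sketch comparing the greedy and non-greedy versions of the same pattern on the same input:

```python
import re

text = '[wow] -> [this] -> [is] -> [cool]'

# Greedy .* runs as far as it can, so one match spans from the
# first '[' to the last ']'.
greedy = re.findall(r'\[(.*)\]', text)
# Non-greedy .*? stops at the earliest ']', giving one match per bracket pair.
lazy = re.findall(r'\[(.*?)\]', text)

print(greedy)  # ['wow] -> [this] -> [is] -> [cool']
print(lazy)    # ['wow', 'this', 'is', 'cool']
```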
Match/Search
The match and search functions return a Match object that describes the part of the string matched by the regular expression, or None if there is no match.
match only looks for the pattern at the beginning of the string.
search scans the whole string and returns the first location where the pattern matches.
See the dedicated page for more details.
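Because both functions return None when nothing matches, calling .group() directly on a failed result raises AttributeError. A minimal sketch of the usual guard, which also reads the match position from the Match object:

```python
import re

m = re.search(r'\[(.*?)\]', 'this is test [hello] test')

# Check for None before using the Match object.
if m is None:
    print('no match')
else:
    whole = m.group()  # the full matched text
    span = m.span()    # (start, end) indices, same as (m.start(), m.end())
    print(whole, span)  # [hello] (13, 20)
```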
The following code shows how to use match:
re.match(
r'\[(.*?)\] -> \[(.*?)\]',
'[wow] -> [this] -> [is] -> [cool]'
).group()
'[wow] -> [this]'
The result contains only the first occurrence of the pattern, starting at the beginning of the string.
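Each (.*?) in that pattern is a capturing group, so the same Match object also exposes the bracket contents individually. A sketch:

```python
import re

m = re.match(
    r'\[(.*?)\] -> \[(.*?)\]',
    '[wow] -> [this] -> [is] -> [cool]'
)

# group(n) returns the text captured by the n-th group; groups() returns all of them.
first = m.group(1)
second = m.group(2)
all_groups = m.groups()

print(first, second)  # wow this
print(all_groups)     # ('wow', 'this')
print(m.end())        # 15, the index just past '[wow] -> [this]'
```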
However, if the pattern appears only somewhere in the middle of the input string, match returns None, because the pattern does not start at the beginning of the string.
ans = re.match(r"\[(.*?)\]", "this is test [hello] [wow] test")
print(ans)
None
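A related detail: match only tries the very start of the string, even when re.MULTILINE is passed, whereas search combined with ^ and re.MULTILINE can match at the start of any line. A sketch with a made-up two-line string:

```python
import re

text = 'first line\n[hello] second line'  # illustrative input

# match only tries position 0 of the string, even with re.MULTILINE.
m_match = re.match(r'\[(.*?)\]', text, re.MULTILINE)
# With re.MULTILINE, ^ also matches right after each newline, so search finds it.
m_search = re.search(r'^\[(.*?)\]', text, re.MULTILINE)

print(m_match)           # None
print(m_search.group())  # [hello]
```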
In the same situation, search finds the first occurrence of the pattern:
re.search(r"\[(.*?)\]", "this is test [hello] test").group()
'[hello]'
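search stops at the first occurrence, so on the earlier string with two bracketed words only hello is returned; findall collects all of them. A sketch:

```python
import re

text = 'this is test [hello] [wow] test'

# search returns a Match for the first occurrence only.
first = re.search(r'\[(.*?)\]', text).group(1)
# findall returns every occurrence (the group text, since there is one group).
every = re.findall(r'\[(.*?)\]', text)

print(first)  # hello
print(every)  # ['hello', 'wow']
```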