re#
The re module in Python implements regular expressions. For a full description, see the official Regular Expression HOWTO.
import re
Usage#
The re module provides several common operations: match, search, findall, and finditer. You can call these functions directly on the re module, which is the easiest option for experiments and one-off command-line work. Alternatively, you can precompile a regular expression with re.compile and call the same methods on the resulting pattern object. This approach is common in production code because the compiled pattern can be reused efficiently across many strings.
The following code shows how re.findall can be used.
re.findall(r'\b\w{5}\b', 'Hello, world!')
['Hello', 'world']
The same example, this time using re.compile to build a pattern object and calling findall on it.
expression_object = re.compile(r'\b\w{5}\b')
expression_object.findall('Hello, world!')
['Hello', 'world']
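The benefit of re.compile shows up when the same pattern is applied to many strings. The following is a minimal sketch of that reuse; the name five_letter_word and the sample lines are made up for illustration.

```python
import re

# Compile once, then reuse the same pattern object for every input string.
five_letter_word = re.compile(r'\b\w{5}\b')

# Hypothetical sample inputs, chosen only for illustration.
lines = ['Hello, world!', 'Regex saves hours', 'a bc def']

results = [five_letter_word.findall(line) for line in lines]
print(results)
# [['Hello', 'world'], ['Regex', 'saves', 'hours'], []]
```

A string with no five-letter words produces an empty list rather than an error.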
Find all#
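The findall method returns all non-overlapping matches of the pattern as a list. If the pattern contains a single capturing group, the list holds the contents of that group rather than the whole match.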
The following cell extracts all words that are surrounded by [].
re.findall(
r'\[(.*?)\]',
'[wow] -> [this] -> [is] -> [cool]'
)
['wow', 'this', 'is', 'cool']
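The shape of the result depends on the number of capturing groups in the pattern. The sketch below reuses the same input string to show two related cases: findall with two groups, which returns tuples, and finditer, which yields match objects so that positions are available too. The variable names are illustrative only.

```python
import re

text = '[wow] -> [this] -> [is] -> [cool]'

# Two capturing groups: findall returns a list of tuples, one tuple per match.
pairs = re.findall(r'\[(.*?)\] -> \[(.*?)\]', text)
print(pairs)
# [('wow', 'this'), ('is', 'cool')]

# finditer yields match objects, so start/end positions can be read as well.
spans = [(m.group(1), m.start(), m.end())
         for m in re.finditer(r'\[(.*?)\]', text)]
print(spans)
# [('wow', 0, 5), ('this', 9, 15), ('is', 19, 23), ('cool', 27, 33)]
```

Because matches do not overlap, the two-group pattern pairs up 'wow' with 'this' and 'is' with 'cool', but never 'this' with 'is'.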
Match/Search#
The match and search functions return a match object that describes where and how the pattern matched, or None if there is no match.
match only tries to match the pattern at the beginning of the string.
search scans the whole string for the first location where the pattern matches.
See the dedicated page for more details.
The following code shows how to use match.
re.match(
r'\[(.*?)\] -> \[(.*?)\]',
'[wow] -> [this] -> [is] -> [cool]'
).group()
'[wow] -> [this]'
The result contains only the first occurrence of the pattern. Calling group() with no arguments returns the whole matched text.
If the pattern occurs only somewhere in the middle of the input string, match returns None, because it only tries to match at the very beginning of the string.
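The match object from the example above also exposes each captured group individually. The following short sketch shows the standard accessors; the variable name m is arbitrary.

```python
import re

m = re.match(r'\[(.*?)\] -> \[(.*?)\]', '[wow] -> [this] -> [is] -> [cool]')

print(m.group(0))  # whole match: '[wow] -> [this]'
print(m.group(1))  # first capturing group: 'wow'
print(m.group(2))  # second capturing group: 'this'
print(m.groups())  # all groups as a tuple: ('wow', 'this')
print(m.span())    # (start, end) of the whole match: (0, 15)
```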
ans = re.match(r"\[(.*?)\]", "this is test [hello] [wow] test")
print(ans)
None
With search, the first occurrence of the pattern is found even when it sits in the middle of the string.
re.search(r"\[(.*?)\]", "this is test [hello] test").group()
'[hello]'
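Since both match and search return None when nothing matches, calling .group() directly on the result raises AttributeError in that case. Below is a minimal sketch of a None-safe wrapper; first_bracketed is a hypothetical helper name, not part of the re module.

```python
import re

def first_bracketed(text):
    """Return the text inside the first [...] pair, or None if there is none."""
    m = re.search(r'\[(.*?)\]', text)
    # Check for None before touching the match object.
    return m.group(1) if m is not None else None

print(first_bracketed('this is test [hello] test'))  # hello
print(first_bracketed('no brackets here'))           # None
```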