Importing the re module lets a Python program use regular expressions.
1. Character matching
findall(x, str)
Scans the string, finds every place where the regular expression matches, and returns the matches as a list.

    import re
    re.findall('clocky7', 'clocky7 is a nice clocky7')

Result:

    ['clocky7', 'clocky7']
match
Checks whether the regular expression matches at the very start of the string.

    x = re.match('clocky7', 'clocky7 is a nice clocky7')
    print(x)
    y = re.match('is', 'clocky7 is a nice clocky7')
    print(y)

Result:

    <re.Match object; span=(0, 7), match='clocky7'>
    None
search
Scans the string, finds the first place where the regular expression matches, and returns a match object.

    re.search('clocky7', 'he is a good clocky7')

Result:

    <re.Match object; span=(13, 20), match='clocky7'>
finditer()
Scans the string, finds every place where the regular expression matches, and returns the match objects as an iterator.

    r = re.finditer('ky', 'clocky is a good 1ky1')
    print(type(r))
    for i in r:
        print(i)

Result:

    <class 'callable_iterator'>
    <re.Match object; span=(4, 6), match='ky'>
    <re.Match object; span=(18, 20), match='ky'>
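The match objects returned by search() and finditer() carry more than their printed form. The following is a minimal sketch of reading the matched text and its position; the variable names are only illustrative.

```python
import re

text = 'clocky is a good 1ky1'

# search() returns a match object for the first hit, or None if nothing matches.
first = re.search('ky', text)
first_text = first.group()   # the matched substring
first_span = first.span()    # (start, end) indexes into text

# finditer() yields one match object per hit; collect their spans in a list.
all_spans = [m.span() for m in re.finditer('ky', text)]

# match() only succeeds at position 0, so it returns None for this text.
at_start = re.match('ky', text)

print(first_text, first_span, all_spans, at_start)
```

Because match() and search() return None on failure, check the result before calling group() on it.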
2. Regex rules
hqyj
Matches the literal text hqyj.

    re.findall('hqyj', 'hqyj is a nice hqyj')

Result:

    ['hqyj', 'hqyj']
[hqyj]
Matches any single character that is h, q, y, or j.

    re.findall('[hqyj]', 'hqyj is a nice hqyj')

Result:

    ['h', 'q', 'y', 'j', 'h', 'q', 'y', 'j']
[^hqyj]
Matches any single character other than h, q, y, and j.

    re.findall('[^hqyj]', 'hqyj is a nice hqyj')

Result:

    [' ', 'i', 's', ' ', 'a', ' ', 'n', 'i', 'c', 'e', ' ']
[a-z]
Matches any single character from a to z (other ranges such as [0-9] work the same way).

    re.findall('[a-z]', 'he is a 12345')

Result:

    ['h', 'e', 'i', 's', 'a']
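Ranges, explicit characters, and negation can be combined inside one pair of brackets. A small sketch on the same sample string (the variable names are made up for illustration):

```python
import re

s = 'he is a 12345'

letters = re.findall('[a-z]', s)            # lowercase letters only
digits = re.findall('[0-9]', s)             # digits only
letters_digits = re.findall('[a-z0-9]', s)  # two ranges in one class
no_digit_no_space = re.findall('[^0-9 ]', s)  # anything except digits and spaces

print(letters, digits, letters_digits, no_digit_no_space)
```

On this input the negated class returns the same list as [a-z], because letters are the only characters left once digits and spaces are excluded.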
.
Matches any single character except a newline; whitespace such as a space is matched too.

    re.findall('.', 'he is a 12345')

Result:

    ['h', 'e', ' ', 'i', 's', ' ', 'a', ' ', '1', '2', '3', '4', '5']
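.
More pattern characters can follow the dot. By default it then matches exactly one arbitrary character followed by the given text. Adding + makes it match as many characters as possible, and +? as few as possible (but still at least one).

    a = re.findall('.12', 'he is a 12345')
    print(a)

    b = re.findall('.+12', 'he is a 12345')
    print(b)

    c = re.findall('.+?12', '12 he is a 12345')
    print(c)

Result:

    [' 12']
    ['he is a 12']
    ['12 he is a 12']

In the last case the leading 12 cannot be matched as the literal 12, because .+? has to consume at least one character first, so the match runs on to the second 12.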
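The difference between the greedy .+ and the lazy .+? is easier to see on a string that contains the target text more than once. This is an illustrative sketch; the sample strings are not from the original examples.

```python
import re

text = 'a12b12c'

# Greedy: .+ grabs as much as it can, then backs off to the LAST '12'.
greedy = re.findall('.+12', text)

# Lazy: .+? grabs as little as it can, stopping at the FIRST '12' it reaches.
lazy = re.findall('.+?12', text)

# .+? still needs at least one character, so a leading '12' is swallowed.
leading = re.findall('.+?12', '12ab12')

print(greedy)   # ['a12b12']
print(lazy)     # ['a12', 'b12']
print(leading)  # ['12ab12']
```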
3. Special characters
- \d
Matches any digit.

    import re
    re.findall(r'\d', 'abc123')

Result:

    ['1', '2', '3']
- \D
Matches any character that is not a digit. Write the pattern as a raw string (r'\D'): with a plain '\D' the result is the same, but Python also reports SyntaxWarning: invalid escape sequence '\D'.

    import re
    s = '123abc456'
    print(re.findall(r'\D', s))

Result:

    ['a', 'b', 'c']
- \s
Matches any whitespace character, such as a space, tab, or newline.

    s = 'a\tb\nc\td'
    print(re.findall(r'\s', s))

Result:

    ['\t', '\n', '\t']
- \S
Matches any character that is not whitespace.

    import re
    s = 'a\tb\tc'
    print(re.findall(r'\S', s))

Result:

    ['a', 'b', 'c']
- \w
Matches any word character: a letter, a digit, or an underscore.

    s = 'hello world_1234\t\n'
    print(re.findall(r'\w', s))

Result:

    ['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd', '_', '1', '2', '3', '4']
- \W
Matches any character that is not a word character.

    import re
    s = 'a1b2c3d4e5f6 g7h8i9j0'
    print(re.findall(r'\W', s))

Result:

    [' ']
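- \b
Matches a word boundary: the position between a word character and a non-word character, or the start or end of the string.

    import re
    s = 'clocky7 ikys a nice clocky'
    t = re.findall(r'ky\b', s)
    print(t)

Result:

    ['ky']

Only the ky at the end of the string is followed by a boundary; the other two are followed by 7 and s.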
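There is one thing to watch out for with \b:

    import re
    s = 'clocky7 is a nice clocky'

    t = re.findall('ky\b', s)
    print(t)

    print("test \\b :lo\bve")

Result:

    []

Without the r prefix, Python reads \b inside a string literal as the backspace character, so the regex engine never sees a word boundary and findall finds nothing. The second print shows the same effect: \\b prints a literal \b, while \b emits a backspace, which a terminal typically renders by overwriting the preceding o.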
- \B
Matches a position that is not a word boundary.

    import re
    s = 'clocky7 ikys a nice clocky'
    t = re.findall(r'ky\B', s)
    print(t)

Result:

    ['ky', 'ky']
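The \b caveat above can be checked directly by comparing the three ways of writing the pattern. This is a small sketch; the variable names are only for illustration.

```python
import re

s = 'clocky7 is a nice clocky'

plain = 'ky\b'      # Python turns \b into the backspace character (3 characters in total)
raw = r'ky\b'       # raw string: the backslash reaches the regex engine (4 characters)
escaped = 'ky\\b'   # same pattern as raw, written with a doubled backslash

plain_hits = re.findall(plain, s)   # looks for 'ky' followed by a backspace
raw_hits = re.findall(raw, s)       # looks for 'ky' at a word boundary

print(len(plain), len(raw), raw == escaped)
print(plain_hits, raw_hits)
```

Using raw strings for every pattern avoids both this silent mismatch and the invalid escape sequence warnings for sequences such as \d and \w.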
4. Quantifiers
*
Repeats the preceding item zero or more times.

    import re

    s = 'clocky7 is_a nice clocky77'
    print(re.findall('clocky7*', s))

Result:

    ['clocky7', 'clocky77']
+
Repeats the preceding item one or more times.

    s = 'clocky7 is_a nice clocky77'
    print(re.findall('7+', s))

Result:

    ['7', '77']
?
Repeats the preceding item zero times or once.

    import re
    s = 'clocky7 is a nice clockyy7'
    print(re.findall('kyy?', s))

Result:

    ['ky', 'kyy']
{n}
Repeats the preceding item exactly n times.

    import re
    s = 'clocky666 is a clocky66'
    print(re.findall('ky6{2}', s))

Result:

    ['ky66', 'ky66']
{n,}
Repeats the preceding item n or more times.

    import re
    s = 'clocky666 is a clocky66'
    print(re.findall('ky6{2,}', s))

Result:

    ['ky666', 'ky66']
{n,m}
Repeats the preceding item between n and m times.

    import re
    s = 'clocky666 is a clocky66,and clocky6666666'
    print(re.findall('ky6{2,3}', s))

Result:

    ['ky666', 'ky66', 'ky666']
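Running all the quantifiers against one string makes the differences easier to compare. The sample string below is made up for this sketch and is not one of the original examples.

```python
import re

s = 'clocky clocky6 clocky66 clocky666'

# Each pattern applies a different quantifier to the trailing 6.
patterns = ['ky6*', 'ky6+', 'ky6?', 'ky6{2}', 'ky6{2,}', 'ky6{1,2}']
results = {p: re.findall(p, s) for p in patterns}

for pattern, found in results.items():
    print(pattern, found)
```

Each quantifier applies only to the item directly before it (here the single character 6), and all of them are greedy: they take as many repetitions as the upper limit allows.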
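5. Groups
()
Captures the part of the match you are interested in. You can require specific characters before and after the parentheses, but findall returns only the text matched inside them.

    import re
    s = 'clocky7 is a nice clocky78898'
    print(re.findall(r'clocky(\d+)', s))

Result:

    ['7', '78898']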
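    import re
    # The sample text stays in Chinese: \w also matches CJK characters.
    # Each clause reads "<name> 得了MVP", i.e. "<name> won MVP".
    s = '伊内斯得了MVP,琴柳得了MVP,温蒂得了MVP,维什戴尔得了MVP'
    print(re.findall(r'(\w{2,})得了MVP', s))

Result:

    ['伊内斯', '琴柳', '温蒂', '维什戴尔']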
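What findall returns depends on how many groups the pattern contains. The sketch below reuses the clocky string to show the three cases, plus the group() method of a match object; the variable names are illustrative.

```python
import re

s = 'clocky7 is a nice clocky78898'

# No group: findall returns the whole match.
no_group = re.findall(r'clocky\d+', s)

# One group: findall returns only the text captured by that group.
one_group = re.findall(r'clocky(\d+)', s)

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r'(clock)y(\d+)', s)

# On a match object, group(0) is the whole match and group(1) the first group.
m = re.search(r'clocky(\d+)', s)
whole, number = m.group(0), m.group(1)

print(no_group, one_group, two_groups, whole, number)
```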
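6. Start and end
^
Anchors the match at the start of the string.

    import re
    s = "clocky7 is a good clocky8"
    print(re.findall(r"^clocky\d", s))

Result:

    ['clocky7']

$
Anchors the match at the end of the string.

    import re
    s = "clocky7 is a good clocky8"
    print(re.findall(r'clocky\d$', s))

Result:

    ['clocky8']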
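Comparing the same pattern with and without anchors shows what ^ and $ filter out. This is a minimal sketch on the string used above.

```python
import re

s = "clocky7 is a good clocky8"

unanchored = re.findall(r'clocky\d', s)    # every occurrence
at_start = re.findall(r'^clocky\d', s)     # only an occurrence at the start
at_end = re.findall(r'clocky\d$', s)       # only an occurrence at the end
both = re.findall(r'^clocky\d$', s)        # the whole string must be clocky + digit
exact = re.findall(r'^clocky\d$', 'clocky7')

print(unanchored, at_start, at_end, both, exact)
```

Using both anchors together is a common way to validate that an entire string has a given shape, rather than merely containing it.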
Note: symbols such as * . \ {} () have special meanings in regular expressions. To match one of them literally, escape it with \.
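A short sketch of what escaping changes, using a made-up price string. re.escape() is the standard-library helper that adds the backslashes for you.

```python
import re

s = 'price: 3.50 (was 3x50)'

# Unescaped, the dot matches any character, so '3x50' is found as well.
unescaped = re.findall(r'3.50', s)

# Escaped, the dot only matches a literal '.'.
escaped = re.findall(r'3\.50', s)

# A literal parenthesis must be escaped too, or it would open a group.
paren = re.findall(r'\(was', s)

# re.escape() builds the escaped pattern from plain text.
auto = re.findall(re.escape('3.50'), s)

print(unescaped, escaped, paren, auto)
```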
7. Common regex functions
findall(), match(), and search() were covered above, so here are a few others:
- sub(a, b, c) replaces every match of pattern a in string c with b, similar to the string replace method.

    import re
    s = 'clocky7 is a clock'
    print(re.sub('clocky7', 'MOl', s))

Result:

    MOl is a clock
- split(a, b) splits string b at every match of pattern a and returns the pieces as a list.

    import re
    s = 'clocky7 like6 arknights'
    print(re.split(r'\d+', s))

Result:

    ['clocky', ' like', ' arknights']
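Both functions accept a few options that the examples above do not use. The sketch below shows the count keyword of re.sub and a capturing group in re.split; the patterns are illustrative choices, not part of the original examples.

```python
import re

s = 'clocky7 like6 arknights'

# Replace every digit, then only the first one via the count keyword.
replaced_all = re.sub(r'\d', '#', s)
replaced_once = re.sub(r'\d', '#', s, count=1)

# Including the whitespace in the separator removes the leading spaces.
parts = re.split(r'\d+\s*', s)

# A capturing group around the separator keeps the separators in the result.
parts_with_sep = re.split(r'(\d+)', s)

print(replaced_all, replaced_once, parts, parts_with_sep)
```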