Alibaba Cloud Developer Community


2019-06-01 1943


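Summary: core functions and methods of the re module, common regular expressions, and matching strings with match(), which is both an re module function and a regular expression object method.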

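The re module: core functions and methods

Common regular expressions
Matching strings with match()
- match() is both a function of the re module and a method of compiled regular expression objects. It tries to match the pattern at the start of the string. On success it returns a match object; on failure it returns None.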
```
import re

m = re.match('foo', 'foo')  # the pattern matches the string
m.group()                   # return the entire matched text

Out[10]: 'foo'
```

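If the pattern does not match, match() returns None, so calling group() on the result raises AttributeError:

    m = re.match('foo', 'bar')  # the pattern does not match the string
    print(m)                    # None
    m.group()                   # AttributeError: 'NoneType' object has no attribute 'group'

A match at the start of the string succeeds even when the string is longer than the pattern:

    re.match('foo', 'food on the table').group()
    Out[13]: 'foo'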
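Because a failed match returns None, it is usually safer to test the result before calling group(). A minimal sketch of that check follows; the helper name `first_match` is hypothetical and not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so test the result before calling group()
    return m.group() if m is not None else None

hit = first_match('foo', 'food on the table')
miss = first_match('foo', 'bar')
print(hit)   # foo
print(miss)  # None
```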

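Finding a pattern inside a string with search() (searching versus matching)
- search() looks for the first occurrence of the given regular expression pattern at any position in the string. If the search succeeds it returns a match object; otherwise it returns None.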
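```
m = re.match('foo', 'seafood')  # no match at the start of the string, so m is None
m.groups()                      # raises AttributeError because m is None

re.search('foo', 'seafood').group()  # the search succeeds and returns the first occurrence
Out[18]: 'foo'
```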
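The difference between the two calls can be seen side by side on the same input. This is a small sketch using only `re.match`, `re.search`, and the match object's `span()` method; the variable names are illustrative.

```python
import re

text = 'seafood'

# match() only tries position 0, so it fails here
at_start = re.match('foo', text)

# search() scans forward and reports where the first match sits
anywhere = re.search('foo', text)

print(at_start)          # None
print(anywhere.group())  # foo
print(anywhere.span())   # (3, 6)
```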

Matching more than one string

The alternation (|) operator

    bt = 'bat|bet|bit'              # regex pattern: bat, bet, or bit
    m = re.match(bt, 'bat')         # 'bat' is a match
    m.group()
    Out[19]: 'bat'

    m = re.match(bt, 'blt')         # 'blt' matches no alternative, so m is None
    m.group()                       # raises AttributeError

    m = re.match(bt, 'he bit me!')  # the string does not start with a match, so m is None
    m.group()                       # raises AttributeError

    m = re.search(bt, 'he bit me!') # search() finds 'bit' inside the string
    m.group()
    Out[22]: 'bit'
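When an alternation is combined with other pattern parts, it usually needs to be grouped. The sketch below is one possible way to do that, assuming whole-word matching is wanted: `(?:...)` groups the alternatives without capturing them and `\b` marks word boundaries. The name `word` is illustrative.

```python
import re

# Compile once; (?:...) groups the alternatives without capturing,
# and \b keeps the match from landing inside a longer word.
word = re.compile(r'\b(?:bat|bet|bit)\b')

found = word.search('he bit me!').group()
inside_longer_word = word.search('a bitter taste')
all_words = word.findall('bat, bet and bit')

print(found)               # bit
print(inside_longer_word)  # None
print(all_words)           # ['bat', 'bet', 'bit']
```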

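Matching any single character

The dot (.) matches any single character except a newline (\n). It always consumes one character, so it cannot match the empty string.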

    anyend = '.end'
    re.match(anyend, 'bend').group()  # the dot matches 'b'
    Out[24]: 'bend'

    re.match(anyend, 'end').group()   # there is no character for the dot to match; raises AttributeError

    re.search('.end', 'The end.').group()  # the dot matches the space before 'end'
    Out[25]: ' end'
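Two related points are worth a short illustration: the `re.DOTALL` flag lets the dot match a newline as well, and a literal period has to be escaped as `\.`. The following is a minimal sketch with illustrative variable names.

```python
import re

# By default '.' does not match a newline, so this match fails
no_newline = re.match('.end', '\nend')

# With re.DOTALL, '.' also matches '\n'
with_dotall = re.match('.end', '\nend', re.DOTALL)

# To match a literal period, escape it
literal = re.search(r'end\.', 'The end.')

print(no_newline)                 # None
print(repr(with_dotall.group()))  # '\nend'
print(literal.group())            # end.
```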

Creating character classes ([])

    re.match('[cr][23][dp][o2]', 'c3po').group()  # matches 'c3po'
    Out[28]: 'c3po'

    re.match('[cr][23][dp][o2]', 'c2do').group()  # matches 'c2do'
    Out[29]: 'c2do'

    re.match('r2d2|c3po', 'c2do').group()  # 'c2do' is neither alternative; raises AttributeError
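The examples above suggest that a sequence of character classes is more permissive than an alternation of whole strings. The sketch below compares the two patterns on a few inputs; the names `loose`, `strict`, and `results` are illustrative, and `fullmatch()` is used so that the entire string has to match.

```python
import re

loose = re.compile('[cr][23][dp][o2]')  # any combination, one character per class
strict = re.compile('r2d2|c3po')        # only the two exact strings

names = ('c3po', 'r2d2', 'c2do', 'r3p2')
results = {
    name: (bool(loose.fullmatch(name)), bool(strict.fullmatch(name)))
    for name in names
}

for name, (is_loose, is_strict) in results.items():
    print(name, is_loose, is_strict)
# c3po True True
# r2d2 True True
# c2do True False
# r3p2 True False
```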

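Finding every occurrence with findall() and finditer()

findall() finds all non-overlapping occurrences of a regular expression pattern in a string and returns them as a list.
finditer() is similar to findall(): it finds every substring that the regular expression matches, but returns them through an iterator of match objects.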

    import re

    re.findall('car', 'car')
    Out[4]: ['car']

    re.findall('car', 'scary')
    Out[6]: ['car']

    re.findall('car', 'carry the barcard to the car')
    Out[5]: ['car', 'car', 'car']

    s = 'This and that.'  # s is not defined in the original; this value is assumed, chosen to agree with the output below
    it = re.finditer(r'(th\w+) and (th\w+)', s, re.I)
    for match in it:
        print(match.groups())
    ('This', 'that')

    match.group(1)  # match still refers to the last match object from the loop
    Out[19]: 'This'

    match.group(2)
    Out[20]: 'that'
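One detail that is easy to miss: when the pattern contains capturing groups, findall() returns tuples of the groups rather than whole matches, while finditer() yields match objects that also carry positions. A small sketch, assuming the sample text `'This and that.'` (an assumed value, not taken from the original):

```python
import re

s = 'This and that.'  # assumed sample text
pattern = r'(th\w+) and (th\w+)'

# With capturing groups, findall() returns a list of tuples of the groups
pairs = re.findall(pattern, s, re.I)

# finditer() yields match objects, which also expose the match position
positions = [(m.group(), m.span()) for m in re.finditer(r'th\w+', s, re.I)]

print(pairs)      # [('This', 'that')]
print(positions)  # [('This', (0, 4)), ('that', (9, 13))]
```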

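Searching and replacing with sub() and subn()

The two are almost identical: both replace every part of the string that matches the regular expression. The difference is that subn() returns a tuple of the new string and the total number of replacements made.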

    re.sub('X', 'Mr. Smith', 'attn:X\n\nDear X,\n')
    Out[33]: 'attn:Mr. Smith\n\nDear Mr. Smith,\n'

    re.subn('X', 'Mr. Smith', 'attn:X\n\nDear X,\n')
    Out[34]: ('attn:Mr. Smith\n\nDear Mr. Smith,\n', 2)

    print(re.sub('X', 'Mr. Smith', 'attn:X\n\nDear X,\n'))
    attn:Mr. Smith

    Dear Mr. Smith,

    re.sub('[aa]', 'X', 'abcdef')
    Out[36]: 'Xbcdef'

    re.subn('[aa]', 'X', 'abcdef')
    Out[37]: ('Xbcdef', 1)

    # swap month and day: \1 is the month, \2 the day, \3 the year
    re.sub(r'(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})', r'\2/\1/\3', '2/20/91')
    Out[30]: '20/2/91'

    re.sub(r'(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})', r'\2/\1/\3', '2/20/1991')
    Out[31]: '20/2/1991'
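The replacement argument of sub() can also be a function that receives each match object, and the `count` argument limits how many replacements are made. The sketch below illustrates both; the function name `swap_month_day` is hypothetical, and the pattern lists `\d{4}` before `\d{2}` so that a four-digit year is captured whole, which differs slightly from the pattern in the examples above.

```python
import re

def swap_month_day(m):
    """Rewrite MM/DD/YY(YY) as DD/MM/YY(YY) using the captured groups."""
    month, day, year = m.groups()
    return f'{day}/{month}/{year}'

date = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})')

swapped = date.sub(swap_month_day, 'due 2/20/1991, paid 3/1/91')
print(swapped)  # due 20/2/1991, paid 1/3/91

# count limits how many replacements are made
first_only = re.sub('X', 'Mr. Smith', 'attn:X\n\nDear X,\n', count=1)
print(repr(first_only))  # 'attn:Mr. Smith\n\nDear X,\n'
```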

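Splitting strings on a delimiter pattern with split()

re.split() works much like the string split() method, but instead of splitting on a fixed string it splits on a regular expression pattern.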

    re.split(':', 'str1:Str2:str3')
    Out[42]: ['str1', 'Str2', 'str3']

    DATA = ('Mountain View, CA 94040',
            'Sunnyvale, CA',
            'Los Altos, 94023',
            'Cupertino 95014',
            'Palo Alto CA', )
    for datum in DATA:
        # split on ', ' or on a space that is followed by a five-digit ZIP code or a two-letter state code
        print(re.split(r', |(?= (?:\d{5}|[A-Z]{2})) ', datum))
    ['Mountain View', 'CA', '94040']
    ['Sunnyvale', 'CA']
    ['Los Altos', '94023']
    ['Cupertino', '95014']
    ['Palo Alto', 'CA']
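re.split() has a few further options that may be useful here: `maxsplit` caps the number of splits, and a capturing group in the pattern keeps the delimiters in the result. A brief sketch with made-up input strings and illustrative variable names:

```python
import re

# maxsplit caps the number of splits; the remainder stays in the last item
limited = re.split(':', 'str1:Str2:str3', maxsplit=1)

# A capturing group in the pattern keeps the delimiters in the result
with_delims = re.split('(:)', 'str1:Str2:str3')

# A character class splits on any of several delimiters
mixed = re.split(r'[,;]\s*', 'a, b;c,  d')

print(limited)      # ['str1', 'Str2:str3']
print(with_delims)  # ['str1', ':', 'Str2', ':', 'str3']
print(mixed)        # ['a', 'b', 'c', 'd']
```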
