# Regular Expression (正規表示式)

2025-02-07

## Introduction

A **regular expression** (often abbreviated as regex, regexp, or RE) is known in Chinese as **正規表示式**, and also as **正規表達式**, **正規表示法**, **規則運算式**, or **常規表示法**.

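Regular expressions are used to work with strings: a rule (pattern) describes the text you are looking for, and the pattern is used to search a string and retrieve the parts that satisfy it.

For that reason they are often used to parse plain-text documents such as txt, html, xml, and json files, either to extract the text you need or to process the file in some other way.

In Python, regular expressions are provided by the `re` module. You first define the rule (pattern), supply the string to be processed, and then call the appropriate function of the `re` module.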
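The three steps can be sketched as follows. This is a minimal illustration rather than a prescribed recipe; the sample sentence is the one used in the `findall()` examples of this article, and the variable names are arbitrary.

```python
import re

pattern = r'\d+'                   # step 1: the rule (one or more digits)
text = 'That will be 59 dollars'   # step 2: the string to process

match = re.search(pattern, text)   # step 3: call a function of the re module
if match:
    print(match.group())           # 59
else:
    print('No match')
```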

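> Tip:
> The pattern is usually written as a Python raw string (a string literal prefixed with `r`). The backslash `\` is used both by regular expressions and by Python's own string escapes, and the two conflict, so writing the pattern as a raw string keeps Python from interpreting the backslashes before the regex engine sees them.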
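A small sketch of the conflict, assuming a hypothetical sample sentence: in a normal string literal `'\b'` is the backspace character, whereas in a raw string it reaches the regex engine as the word-boundary sequence `\b`.

```python
import re

text = 'a cat sat'  # hypothetical sample text

# Normal string: Python turns '\b' into a backspace character (\x08),
# so the regex looks for backspace + "cat" + backspace and finds nothing.
plain = re.findall('\bcat\b', text)

# Raw string: the backslashes are passed through unchanged,
# so the regex engine sees the word-boundary sequence \b.
raw = re.findall(r'\bcat\b', text)

print(plain)  # []
print(raw)    # ['cat']
```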

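#### Online resources

- Regular Expression quiz: https://regexone.com/
- Regular Expression tester: https://regex101.com/

## Uses

1. Finding data (`findall`)
2. Validating data (`search`, `match`)
3. Extracting data (`split`, `sub`)

## Commonly used `re` module functions

| Function | Description |
|--------------------------------------|------------------------------------------------------------|
| `findall(pattern, string)` | Returns every substring of `string` that matches `pattern`, as a list. |
| `finditer(pattern, string)` | Returns every match of `pattern` in `string`, as an iterator of Match objects. |
| `search(pattern, string)` | Returns a Match object for the first place in `string` where `pattern` matches, or `None` if there is no match. |
| `match(pattern, string)` | Matches only at the beginning of `string`. Returns a Match object on success and `None` on failure. To require that the whole string matches, the pattern must end with `$`. |
| `fullmatch(pattern, string)` | Checks whether the whole of `string` matches `pattern`. Returns a Match object if it does and `None` otherwise. |
| `compile(pattern)` | Compiles the pattern string and returns a compiled pattern object that can be reused for matching. |
| `split(pattern, string, maxsplit=0)` | Splits `string` at each match of `pattern` and returns the pieces as a list. |
| `sub(pattern, repl, string, count=0)` | Replaces the matches of `pattern` in `string` with `repl` and returns the new string. |
| `subn(pattern, repl, string, count=0)` | Same as `sub()`, but returns a tuple `(new_string, number_of_substitutions)`. |
| `escape(pattern)` | Adds a backslash before the special characters in `pattern` and returns the new string. |
| `purge()` | Clears the internal regular expression cache. |

> Note:
>
> **Difference between re.match() and re.search()**
>
> `re.match()` only matches at the beginning of the string. If the start of the string does not match the regular expression, the match fails and the function returns `None`. `re.search()` scans the whole string and succeeds as soon as it finds one match; it returns `None` only when nothing in the string matches.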
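The sketch below exercises several functions from the table on the sample sentence used elsewhere in this article. It is meant as an illustration of the behaviours described above, not as a complete tour of the module; the variable names and the `'a.b axb'` test string are made up for the example.

```python
import re

text = 'The rain in Spain'

# match() only looks at the start of the string; search() scans all of it.
at_start = re.match(r'rain', text)     # None: 'rain' is not at the start
anywhere = re.search(r'rain', text)    # Match object covering text[4:8]
whole = re.fullmatch(r'The', text)     # None: pattern must cover the whole string

# compile() returns a reusable pattern object with the same methods.
word_with_ai = re.compile(r'\w*ai\w*')
words = word_with_ai.findall(text)

# finditer() yields Match objects, so positions are available as well.
starts = [m.start() for m in word_with_ai.finditer(text)]

# subn() returns the new string together with the number of replacements.
replaced = re.subn(r'\s', '9', text)

# escape() makes metacharacters literal: the dot no longer matches any character.
literal = re.escape('a.b')
hits = re.findall(literal, 'a.b axb')

print(at_start, anywhere.span(), whole)  # None (4, 8) None
print(words, starts)                     # ['rain', 'Spain'] [4, 12]
print(replaced)                          # ('The9rain9in9Spain', 3)
print(hits)                              # ['a.b']
```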

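## Metacharacters

| Metacharacter | Description | Example | Meaning of the example |
|--------|------------------------------------------------|-----------------|----------------------------------------------|
| `[]` | A set of characters. | `[a-m]` | Any lowercase letter from a to m |
| `\` | Signals a special sequence (can also be used to escape special characters). | `\d` | Any digit |
| `.` | Any character except a newline. | `he..o` | "he", followed by any two characters, followed by "o" |
| `^` | The string starts with this. | `^hello` | The string begins with "hello" |
| `$` | The string ends with this. | `world$` | The string ends with "world" |
| `*` | The preceding character or group occurs any number of times (including zero). | `aix*` | ai, aix, aixx, or more x's all match. |
| `?` | The preceding character or group occurs zero times or once. | `aix?` | Only ai and aix match. |
| `+` | The preceding character or group occurs at least once. | `aix+` | aix, aixx, or more x's match; ai does not. |
| `{m,n}` | The preceding character or group occurs between m and n times (`{m}` means exactly m times). | `al{2}`, `al{3,6}` | An a followed by exactly 2 l's; an a followed by 3 to 6 l's |
| `\|` | Either/or between single characters or groups, for example `a\|b` is "a" or "b". | `falls\|stays` | The string contains falls or stays |
| `()` | Forms a group from the characters inside the parentheses. | | |

## Special sequences

| Special sequence | Description |
|--------|--------------------------------|
| `\A` | The start of the string. |
| `\b` | A word boundary. |
| `\B` | A position that is not a word boundary. |
| `\d` | A digit, from 0 to 9. |
| `\D` | A non-digit. |
| `\s` | Any whitespace character, including the newline `\n`. |
| `\S` | A non-whitespace character. |
| `\w` | Any word character: letters, digits, and the underscore. |
| `\W` | A non-word character, including whitespace. |
| `\Z` | The end of the string. |

> Supplement:
>
> `\A` and `\Z` behave much like `^` and `$`. The difference appears in multiline mode (`re.MULTILINE`): `\A` and `\Z` always refer to the whole string, while `^` and `$` also match at the start and end of each line.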
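The following sketch illustrates two points from the tables above: how `^`/`$` differ from `\A`/`\Z` once `re.MULTILINE` is set, and how `\b` differs from `\B`. The two-line sample string is hypothetical; the second sample is the sentence used throughout this article.

```python
import re

two_lines = 'hello world\nhello Spain'  # hypothetical two-line sample

# Without re.MULTILINE, ^ only matches at the very start of the string.
caret_default = re.findall(r'^hello', two_lines)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
caret_multi = re.findall(r'^hello', two_lines, re.MULTILINE)
dollar_multi = re.findall(r'\w+$', two_lines, re.MULTILINE)

# \A and \Z ignore re.MULTILINE and always refer to the whole string.
a_multi = re.findall(r'\Ahello', two_lines, re.MULTILINE)
z_multi = re.findall(r'\w+\Z', two_lines, re.MULTILINE)

print(caret_default, caret_multi, a_multi)  # ['hello'] ['hello', 'hello'] ['hello']
print(dollar_multi, z_multi)                # ['world', 'Spain'] ['Spain']

text = 'The rain in Spain'

# \b requires a word boundary before "ain"; \B requires that there is none.
at_boundary = re.findall(r'\bain', text)
inside_word = re.findall(r'\Bain', text)

print(at_boundary, inside_word)             # [] ['ain', 'ain']
```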

### `findall()`

#### Example 1: find the lowercase letters between a and m

```
import re

txt = 'The rain in Spain'
x = re.findall(r'[a-m]', txt)
print(x)
```

Output:

```
['h', 'e', 'a', 'i', 'i', 'a', 'i']
```

#### Example 2: find the digits

```
import re

txt = 'That will be 59 dollars'
x = re.findall(r'\d', txt)
print(x)
```

Output:

```
['5', '9']
```

#### Example 3: find "he" followed by any two characters and then "o"

```
import re

txt = 'hello world'
x = re.findall(r'he..o', txt)
print(x)
```

Output:

```
['hello']
```

#### Example 4: the string must start with hello

```
import re

txt = 'hello world'
x = re.findall(r'^hello', txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print('No match')
```

Output:

```
Yes, the string starts with 'hello'
```

#### Example 5: the string ends with world

```
import re

txt = 'hello world'
x = re.findall(r'world$', txt)
if x:
    print("Yes, the string ends with 'world'")
else:
    print('No match')
```

Output:

```
Yes, the string ends with 'world'
```

#### Example 6: find "ai" followed by zero or more x characters

```
import re

txt = 'The rain in Spain falls mainly in the plain!'
x = re.findall(r'aix*', txt)
print(x)
if x:
    print('Yes, there is at least one match!')
else:
    print('No match')
```

Output:

```
['ai', 'ai', 'ai', 'ai']
Yes, there is at least one match!
```

#### Example 7: find "ai" followed by one or more x characters

```
import re

txt = 'The rain in Spain falls mainly in the plain!'
x = re.findall(r'aix+', txt)
print(x)
if x:
    print('Yes, there is at least one match!')
else:
    print('No match')
```

Output:

```
[]
No match
```

#### Example 8: find an a followed by exactly two l's

```
import re

txt = 'The rain in Spain falls mainly in the plain!'
x = re.findall(r'al{2}', txt)
print(x)
if x:
    print('Yes, there is at least one match!')
else:
    print('No match')
```

Output:

```
['all']
Yes, there is at least one match!
```

#### Example 9: the string contains falls or stays

```
import re

txt = 'The rain in Spain falls mainly in the plain!'
# Check if the string contains either 'falls' or 'stays':
x = re.findall(r'falls|stays', txt)
print(x)
if x:
    print('Yes, there is at least one match!')
else:
    print('No match')
```

Output:

```
['falls']
Yes, there is at least one match!
```

### `search()`

#### Find the position of the first whitespace character

```
import re

txt = 'The rain in Spain'
x = re.search(r'\s', txt)
print('The first white-space character is located in position:', x.start())
```

Output:

```
The first white-space character is located in position: 3
```

#### Check whether Portugal appears in the string

```
import re

txt = 'The rain in Spain'
x = re.search(r'Portugal', txt)
print(x)
```

Output:

```
None
```

### `split()`

#### Split a string on whitespace characters

```
import re

txt = 'The rain in Spain'
x = re.split(r'\s', txt)
print(x)
```

Output:

```
['The', 'rain', 'in', 'Spain']
```

#### Split a string on whitespace characters, limiting the maximum number of splits

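```
import re

# Split the string at the first white-space character only:
txt = 'The rain in Spain'
x = re.split(r'\s', txt, maxsplit=1)
print(x)
```

Output:

```
['The', 'rain in Spain']
```

### `sub()`

#### Replace every whitespace character with 9

```
import re

# Replace all white-space characters with the digit '9':
txt = 'The rain in Spain'
x = re.sub(r'\s', '9', txt)
print(x)
```

Output:

```
The9rain9in9Spain
```

#### Replace whitespace characters with 9, limiting the maximum number of replacements

```
import re

# Replace the first two occurrences of a white-space character with the digit 9:
txt = 'The rain in Spain'
x = re.sub(r'\s', '9', txt, count=2)
print(x)
```

Output:

```
The9rain9in Spain
```

## Set examples

| Set | Description |
|:-------------|:-----------------------------------------------------------|
| `[arn]` | Matches any one of the lowercase characters a, r, or n. |
| `[a-n]` | Matches any lowercase character from a to n. |
| `[^arn]` | Matches any character except a, r, and n. |
| `[0123]` | Matches any one of the digits 0, 1, 2, or 3. |
| `[0-9]` | Matches any digit from 0 to 9. |
| `[0-5][0-9]` | Matches any two-digit number from 00 to 59. |
| `[a-zA-Z]` | Matches any letter from a to z, lowercase or uppercase. |
| `[+]` | Matches a literal + sign. Inside a set, `+`, `*`, `.`, `\|`, `()`, `$`, and `{}` have no special meaning and simply stand for themselves. |

## Comparing `match`, `search`, `findall`, and `finditer`

| | match | search | findall | finditer |
|--------|-----------------------------------------------|-------------------------------------|---------------------------------|---------------------------------------|
| Description | Starts at the beginning of the string; succeeds if the string starts with text that matches the pattern | Succeeds if the pattern occurs anywhere in the string | Returns all substrings of the string that match the pattern | Returns an iterator over all substrings of the string that match the pattern |
| Returns on success | Match object | Match object | List | Iterator of Match objects |
| Returns on failure | None | None | Empty list | Empty iterator |
| Other | If the whole string has to match the pattern, the pattern must end with `$` | | | |

## Application: searching web page attributes with regular expressions

#### Use a regular expression to find several tags that meet a condition

```
from bs4 import BeautifulSoup
import re

# NOTE: the original HTML markup was lost when this page was extracted.
# The tags and id values below are an assumed reconstruction; only the text
# content and the id prefix "item" come from the source.
html_doc = '''
<html>
<head><title>This is the HTML document title</title></head>
<body>
<ul>
<li id="item1">Beef jerky</li>
<li id="item2">Pork jerky</li>
<li id="item3">Lamb jerky</li>
<li id="item4">Bird jerky</li>
<li id="item5">Chicken jerky</li>
</ul>
</body>
</html>
'''

# Build a BeautifulSoup object to parse the HTML document
soup = BeautifulSoup(html_doc, 'lxml')
# Find every tag whose id attribute starts with "item"
items = soup.find_all(id=re.compile(r'^item'))
for i in items:
    print(i)
```

Output (with the assumed markup above):

```
<li id="item1">Beef jerky</li>
<li id="item2">Pork jerky</li>
<li id="item3">Lamb jerky</li>
<li id="item4">Bird jerky</li>
<li id="item5">Chicken jerky</li>
```

#### Validating a mobile phone number

1. The number has 10 digits in total
2. It starts with 09
3. Every character is a digit

```
^09\d{8}$
```

##### Example:

```
import re

phones = ['0912345678', '023456789', '096312345']
for p in phones:
    result = re.findall(r'^09\d{8}$', p)
    if len(result) > 0:
        print(p + ' is a mobile number')
    else:
        print(p + ' is not a mobile number')
```

## Exercises

1. From the email list below, extract only the account part and discard the rest:

```
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
```

Keep only:

```
aaronho
andyliu
apple
abner
amberok
```

2. Which of the following strings can be matched by this RE: `/\w\w\w.\d\d\d/`?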

```
1. 000000
2. 9999999
3. aaaaaaa
4. 0a0a000
5. 0a0a0a0
6. cc3c777
7. cccc777
```

3. Find the words that contain tion or sion in the following passage:

```
A regular expression (shortened as regex or regexp; also referred to as rational
expression) is a sequence of characters that define a search pattern. Usually
such patterns are used by string-searching algorithms for "find" or "find and
replace" operations on strings, or for input validation. It is a technique
developed in theoretical computer science and formal language theory.
```

#### Answers

1. The part to remove from each address is matched by:

```
@.+$
```

2.

```
9999999
0a0a000
cc3c777
cccc777
```

3.

```
import re

txt = '''
A regular expression (shortened as regex or regexp; also referred to as rational
expression) is a sequence of characters that define a search pattern. Usually
such patterns are used by string-searching algorithms for "find" or "find and
replace" operations on strings, or for input validation. It is a technique
developed in theoretical computer science and formal language theory.
'''
result = re.findall(r'\w*tion\w*|\w*sion\w*', txt)
print(result)
```
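The answer to exercise 2 can be checked mechanically. The sketch below assumes that "can be matched" means the pattern is found anywhere in the string, which is what `search()` tests, and writes the pattern without the surrounding `/.../` delimiters because Python does not use them.

```python
import re

# The seven candidate strings from exercise 2
candidates = ['000000', '9999999', 'aaaaaaa', '0a0a000',
              '0a0a0a0', 'cc3c777', 'cccc777']

# Three word characters, any one character, then three digits
pattern = re.compile(r'\w\w\w.\d\d\d')

matched = [s for s in candidates if pattern.search(s)]
print(matched)  # ['9999999', '0a0a000', 'cc3c777', 'cccc777']
```

The first candidate fails because it has only six characters while the pattern needs at least seven, and `0a0a0a0` fails because it never contains three consecutive digits.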