Perl Regular Expressions (正規表示式) - Totui - PIXNET (痞客邦)

2025-01-03


Posted Thu, Aug 04, 2011, 21:19

Perl Regular Expression (正規表示式)

A regular expression is usually used to search for a specific string pattern; this is the feature known as pattern matching.

Its operators are =~ (read as "match") and !~ (read as "not match").
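As a minimal sketch of the two operators (the variable names and the sample sentence are made up for illustration), =~ is true when the pattern is found in the string and !~ is true when it is not:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my $sentence = "Hello, world";

# =~ is true when the pattern occurs somewhere in the string.
my $has_hello = ($sentence =~ /Hello/) ? 1 : 0;

# !~ is the negation: true when the pattern does not occur.
my $lacks_perl = ($sentence !~ /Perl/) ? 1 : 0;

# Without a modifier the match is case-sensitive, so "hello" is not found.
my $has_lowercase = ($sentence =~ /hello/) ? 1 : 0;

print "has Hello: $has_hello\n";          # has Hello: 1
print "lacks Perl: $lacks_perl\n";        # lacks Perl: 1
print "has hello: $has_lowercase\n";      # has hello: 0
```

Both operators return a true or false value, so they are normally used directly inside if or unless conditions.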

Syntax: $string =~ /regular expression/expression modifier
Ex: $sentence =~ /Hello/

(a) Modifiers: Modifiers are optional; they adjust how the whole expression behaves.

g    Match globally, i.e. find all occurrences.
i    Makes the search case-insensitive.
m    Treats the string as multiple lines. Without this modifier, ^ and $ match only at the beginning and end of the whole string, even if it has newline characters embedded within it; with it, they also match at the start and end of each line.
o    Only compile the pattern once.
s    Treats the string as a single line. Normally the character . matches any character except a newline; this modifier allows . to match a newline character as well.
x    Allows whitespace (and comments) in the expression.

(b) Metacharacters: The following characters all have special meanings and let you build more complex searching patterns.

\     Tells Perl to accept the following character as a regular character; this removes the special meaning from any metacharacter.
^     Matches the beginning of the string; with /m, also the beginning of each line.
.     Matches any character except a newline character, unless /s is used.
$     Matches the end of the string (or just before a trailing newline); with /m, also the end of each line.
|     Expresses alternation: the expression matches if any one of the alternative patterns is found.
()    Groups expressions to assist in alternation and backreferencing.
[]    Looks for any one of a set of characters.

(c) Pattern Quantifiers: These express how many times the preceding item may occur.

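*        Matches 0 or more times.
+        Matches 1 or more times.
?        Matches 0 or 1 times.
{n}      Matches exactly n times.
{n,}     Matches at least n times.
{n,m}    Matches at least n times but no more than m times.

(d) Character Patterns: The following sequences match particular kinds of characters:

\r      Carriage return (CR), ASCII 13 (decimal).
\n      Newline; ASCII 10 (decimal) on UNIX. On DOS (Windows) a line ending is ASCII 13 + ASCII 10 (decimal).
\t      Tab, ASCII 9 (decimal).
\w      Matches an alphanumeric character. Alphanumeric also includes _, i.e. [A-Za-z0-9_].
\W      Matches a nonalphanumeric character, i.e. [^A-Za-z0-9_].
\s      Matches a whitespace character. This includes space, tab, form feed and CR/LF, i.e. [ \t\f\r\n].
\S      Matches a non-whitespace character, i.e. [^ \t\f\r\n].
\d      Matches a digit, i.e. [0-9].
\D      Matches a nondigit character, i.e. [^0-9].
\b      Matches a word boundary.
\B      Matches a nonword boundary.
\033    Octal char.
\x1B    Hex char.

(e) Examples:

/abc/ => matches a string that contains abc
/^abc/ => matches a string that begins with abc
/abc$/ => matches a string that ends with abc
/a|b/ => matches a string that contains a or b; the alternatives can also be whole words
/ab{2,4}c/ => matches a followed by 2 to 4 b's, followed by c; /ab{2,}c/ matches two or more b's
/ab*c/ => matches a followed by zero or more b's, followed by c; same as /ab{0,}c/
/ab+c/ => matches a followed by one or more b's, followed by c; same as /ab{1,}c/
/a.c/ => . stands for any character except the newline character (\n)
/[abc]/ => matches a string that contains any one of these three characters
/\d/ => matches a string that contains a digit; same as /[0-9]/
/\w/ => matches a string that contains a word character (letter, digit or underscore); same as /[a-zA-Z0-9_]/
/\s/ => matches a string that contains whitespace; same as /[ \t\r\n\f]/
/[^abc]/ => matches a string that contains at least one character other than a, b and c
/\*/ => matches a string that contains the character *; Perl treats the character after a backslash "\" as an ordinary character.

        If you are not sure whether a punctuation symbol is a special character, you can simply put \ in front of it to be safe. This applies only to non-alphanumeric symbols: a backslash before a letter or digit can create an escape sequence such as \d or \b.

/abc/i => matches abc regardless of case
/(\d+)\.(\d+)\.(\d+)\.(\d+)/ => matches a string that looks like an IP address and stores its four numbers in the special variables $1, $2, $3 and $4 for later use.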

Ex:
  if ($x =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
  {
    print "成功大學" if ("$1.$2" eq "140.116");   # National Cheng Kung University
  }

m//gimosx => The m command lets you choose your own pattern delimiters, and gimosx are its modifiers; see (a) Modifiers.

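Ex:
  $url = "http://my.machine.tw:8080/noname/test.pl";
  ($host, $port, $file) = ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);

This regular expression is fairly complex. Its purpose is to parse the given URL and extract the host name, the port number and the corresponding file.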
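The URL example above can be tried out with a sketch like the following. The helper name parse_url is hypothetical, the pattern is written with the x modifier and {} delimiters purely for readability, and the input is assumed to start with http:// because the pattern requires that prefix:

```perl
use strict;
use warnings;

# parse_url is a hypothetical helper name used only for this sketch.
sub parse_url {
    my ($url) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, leaving all three undefined.
    my ($host, $port, $file) = $url =~ m{
        http://      # literal scheme prefix
        ([^/:]+)     # group 1: host, characters that are neither / nor :
        :?           # optional colon, equivalent to :{0,1}
        (\d*)        # group 2: port digits, possibly empty
        (\S*)        # group 3: the rest, non-whitespace characters
        $            # end of string
    }x;
    return ($host, $port, $file);
}

my ($host, $port, $file) = parse_url("http://my.machine.tw:8080/noname/test.pl");
print "host=$host port=$port file=$file\n";
# host=my.machine.tw port=8080 file=/noname/test.pl
```

When the URL has no port, the second group matches the empty string, so the port comes back as an empty string rather than undef.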

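I will go through it piece by piece:

$url =~ m||
    What follows m is the delimiter; the pattern is whatever sits between the two | characters.

([^/:]+)
    Matches a run of one or more characters that contains neither / nor :. The matched string is stored in $1.

:{0,1}(\d*)
    Matches zero or one :, followed by a run of digits or nothing. The matched digits are stored in $2; if there are none, $2 is empty.

(\S*)$
    Matches a run of non-whitespace characters that extends to the end of the string. The matched string is stored in $3.

()=()
    ($host, $port, $file) = ($1, $2, $3), i.e.
    $host = "my.machine.tw"
    $port = "8080"
    $file = "/noname/test.pl"

s/PATTERN/REPLACEMENT/egimosx
This is the substitution command.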

It searches for the part of the string that matches PATTERN and replaces it with the REPLACEMENT string.
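A minimal sketch of the substitution command follows, using made-up sample strings. It assumes the usual idiom of copying a string and substituting on the copy, and it relies on s/// returning the number of substitutions made; the g and e modifiers it uses are the ones listed for this command:

```perl
use strict;
use warnings;

# Hypothetical sample text for illustration.
my $text = "cat hat cat";

# Without g, only the first match is replaced.
(my $first_only = $text) =~ s/cat/dog/;      # "dog hat cat"

# With g, every match is replaced.
(my $all = $text) =~ s/cat/dog/g;            # "dog hat dog"

# s/// returns the number of substitutions it performed.
my $copy  = $text;
my $count = ($copy =~ s/cat/dog/g);          # 2

# With e, the replacement is evaluated as Perl code for each match.
(my $doubled = "3 apples and 4 pears") =~ s/(\d+)/$1 * 2/ge;

print "$first_only\n$all\n$count\n$doubled\n";
```

The last line of output is "6 apples and 8 pears", since each captured number is multiplied by two before being put back.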

The modifiers of s/// add the e option; the rest are the same as above:

e    Evaluate the right side as an expression.
g    Replace globally, i.e. all occurrences.
i    Do case-insensitive pattern matching.
m    Treat string as multiple lines.
o    Only compile pattern once.
s    Treat string as single line.
x    Use extended regular expressions.

Ex:
  $x =~ s/\s+//g => removes all whitespace
  $x =~ s/([^:]*):([^:]*)/$2:$1/ => swaps two fields separated by ":"
  $path =~ s|/usr/bin|/usr/local/bin| => you can choose your own delimiter

tr/SEARCHLIST/REPLACEMENTLIST/cds
This is also a replacement command. Unlike the previous one, SEARCHLIST and REPLACEMENTLIST are plain lists of characters (ranges such as a-z are allowed), not regular expressions, so it is faster.

It also has fewer modifiers:

c    Complement the SEARCHLIST.
d    Delete found but unreplaced characters.
s    Squash duplicate replaced characters.

Ex:
  $x =~ tr/this/that/ => maps characters one by one (t to t, h to h, i to a, s to t), so "this" becomes "that"; any other i or s in $x is changed as well
  $x =~ tr/a-z/A-Z/ => converts all lowercase letters to uppercase
  $count = $x =~ tr/*/*/ => counts how many "*" there are in $x
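The character-by-character behaviour of tr is easy to misread, so here is a small sketch with made-up sample strings. It shows that tr maps single characters rather than whole words, that it returns the number of characters it matched, and what the d and s modifiers do:

```perl
use strict;
use warnings;

# tr maps characters, not words: i becomes a and s becomes t everywhere.
(my $mapped = "his list") =~ tr/this/that/;       # "hat latt"

# Ranges convert lowercase letters to uppercase.
(my $upper = "perl regex") =~ tr/a-z/A-Z/;        # "PERL REGEX"

# tr returns how many characters matched; mapping * to * leaves the string unchanged.
my $x     = "a*b**c";
my $count = ($x =~ tr/*/*/);                      # 3

# d deletes matched characters that have no replacement.
(my $no_digits = "a1b2c3") =~ tr/0-9//d;          # "abc"

# s squashes runs of the same translated character into one.
(my $squeezed = "aaabbbccc") =~ tr/a-z//s;        # "abc"

print "$mapped\n$upper\n$count\n$no_digits\n$squeezed\n";
```

Because tr works on single characters, replacing the word "this" with the word "that" is a job for s/this/that/ rather than tr.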