re — Regular expression operations — Python 3.10.5 ...
re — Regular expression operations
Source code: Lib/re.py
This module provides regular expression matching operations similar to
those found in Perl.
Both patterns and strings to be searched can be Unicode strings (str)
as well as 8-bit strings (bytes).
However, Unicode strings and 8-bit strings cannot be mixed:
that is, you cannot match a Unicode string with a byte pattern or
vice-versa; similarly, when asking for a substitution, the replacement
string must be of the same type as both the pattern and the search string.
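The type rule above can be checked directly. The following is a minimal sketch; the sample text and variable names are illustrative only, and it assumes that mixing types surfaces as a TypeError, which is what CPython raises in practice.

```python
import re

# A str pattern works on str input, and a bytes pattern on bytes input.
str_match = re.search(r"\d+", "order 42")
bytes_match = re.search(rb"\d+", b"order 42")

# Searching a str with a bytes pattern is rejected.
try:
    re.search(rb"\d+", "order 42")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# In a substitution, the replacement must have the same type as well.
try:
    re.sub(r"\d+", b"N", "order 42")
    repl_error = None
except TypeError as exc:
    repl_error = exc

print(str_match.group(), bytes_match.group())
print(type(mixed_error).__name__, type(repl_error).__name__)
```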
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without invoking
their special meaning. This collides with Python’s usage of the same
character for the same purpose in string literals; for example, to match
a literal backslash, one might have to write '\\\\' as the pattern
string, because the regular expression must be \\, and each
backslash must be expressed as \\ inside a regular Python string
literal. Also, please note that any invalid escape sequences in Python’s
usage of the backslash in string literals now generate a DeprecationWarning
and in the future this will become a SyntaxError. This behaviour
will happen even if it is a valid escape sequence for a regular expression.
The solution is to use Python’s raw string notation for regular expression
patterns; backslashes are not handled in any special way in a string literal
prefixed with 'r'. So r"\n" is a two-character string containing
'\' and 'n', while "\n" is a one-character string containing a
newline. Usually patterns will be expressed in Python code using this raw
string notation.
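A short sketch of the difference between raw and ordinary string literals, and of the two equivalent ways to spell a pattern that matches one literal backslash. The sample path is made up for illustration.

```python
import re

# r"\n" keeps the backslash: two characters, a backslash and the letter n.
raw_len = len(r"\n")
# "\n" is a single newline character.
plain_len = len("\n")

# Both literals below produce the same two-character regex \\ ,
# which matches one literal backslash in the searched text.
same_pattern = (r"\\" == "\\\\")

text = "C:\\temp"  # the seven-character string C:\temp
found = re.findall(r"\\", text)

print(raw_len, plain_len, same_pattern, found)
```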
It is important to note that most regular expression operations are available as
module-level functions and methods on
compiled regular expressions. The functions are shortcuts
that don’t require you to compile a regex object first, but miss some
fine-tuning parameters.
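One example of such a fine-tuning parameter is the optional pos and endpos arguments of a compiled pattern's search() method, which the module-level re.search() does not accept. A minimal sketch, with an invented sample string:

```python
import re

text = "id=7; id=42; id=1000"

# Module-level shortcut: no explicit compile step is needed.
first = re.search(r"\d+", text)

# Compiled pattern object: the same search, plus optional pos and endpos
# arguments that restrict the region of the string being searched.
digits = re.compile(r"\d+")
second = digits.search(text, 5)        # start searching at index 5
bounded = digits.search(text, 5, 10)   # only consider text up to index 10

print(first.group(), second.group(), bounded.group())
```

Compiling once is also convenient when the same pattern is reused many times, since the pattern object can be stored and passed around.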
See also
The third-party regex module,
which has an API compatible with the standard library re module,
but offers additional functionality and a more thorough Unicode support.
Regular Expression Syntax
A regular expression (or RE) specifies a set of strings that matches it; the
functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular
string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A
and B are both regular expressions, then AB is also a regular expression.
In general, if a string p matches A and another string q matches B, the
string pq will match AB. This holds unless A or B contain low precedence
operations; boundary conditions between A and B; or have numbered group
references. Thus, complex expressions can easily be constructed from simpler
primitive expressions like the ones described here. For details of the theory
and implementation of regular expressions, consult the Friedl book [Frie09],
or almost any textbook about compiler construction.
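The concatenation rule and one of its exceptions can be sketched as follows. The sub-patterns are arbitrary examples; wrapping each part in a non-capturing group is one possible way to protect a low-precedence operator such as |, not the only one.

```python
import re

# Ordinary case: p matches A, q matches B, so pq matches AB.
A, B = r"ab+", r"c?d"
p, q = "abb", "d"
concatenated_ok = re.fullmatch(A + B, p + q) is not None

# Exception: A contains the low-precedence operator |.
# Plain concatenation gives a|bc, which means 'a' or 'bc', not 'ac'.
A2, B2 = r"a|b", r"c"
naive = re.fullmatch(A2 + B2, "ac")

# Grouping each part restores the expected behaviour.
grouped = re.fullmatch(f"(?:{A2})(?:{B2})", "ac")

print(concatenated_ok, naive, grouped is not None)
```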
A brief explanation of the format of regular expressions follows. For further
information and a gentler presentation, consult the Regular Expression HOWTO.
Regular expressions can contain both special and ordinary characters. Most
ordinary characters, like 'A', 'a', or '0', are the simplest regular
expressions; they simply match themselves. You can concatenate ordinary
characters, so last matches the string 'last'. (In the rest of this
section, we’ll write RE’s in this special style, usually without quotes, and
strings to be matched 'in single quotes'.)
Some characters, like '|' or '(', are special. Special
characters either stand for classes of ordinary characters, or affect
how the regular expressions around them are interpreted.
Repetition qualifiers (*, +, ?, {m,n}, etc) cannot be
directly nested. This avoids ambiguity with the non-greedy modifier suffix
?, and with other modifiers in other implementations. To apply a second
repetition to an inner repetition, parentheses may be used. For example,
the expression (?:a{6})* matches any multiple of six 'a' characters.
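A small sketch of the nesting rule: applying * directly to a{6} is rejected at compile time, while the parenthesised form from the text compiles and behaves as described. Variable names are illustrative.

```python
import re

# Directly nesting two repetition qualifiers is a compile-time error.
try:
    re.compile(r"a{6}*")
    nested_error = None
except re.error as exc:
    nested_error = exc

# Wrapping the inner repetition in a non-capturing group is accepted.
multiple_of_six = re.compile(r"(?:a{6})*")
twelve = multiple_of_six.fullmatch("a" * 12)
seven = multiple_of_six.fullmatch("a" * 7)
empty = multiple_of_six.fullmatch("")   # zero is also a multiple of six

print(nested_error is not None, twelve is not None, seven, empty is not None)
```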
The special characters are:
. (Dot.) In the default mode, this matches any character except a newline. If
the DOTALL flag has been specified, this matches any character
including a newline.
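As a quick illustration of the dot with and without DOTALL, using a three-character sample string chosen for this sketch:

```python
import re

text = "a\nb"

# Default mode: the dot skips the newline.
default_hits = re.findall(r".", text)

# With re.DOTALL the newline is matched like any other character.
dotall_hits = re.findall(r".", text, re.DOTALL)

print(default_hits, dotall_hits)
```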
^ (Caret.) Matches the start of the string, and in MULTILINE mode also
matches immediately after each newline.
$ Matches the end of the string or just before the newline at the end of the
string, and in MULTILINE mode also matches before a newline. foo
matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches
only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n'
matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for
a single $ in 'foo\n' will find two (empty) matches: one just before
the newline, and one at the end of the string.
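The anchor behaviour described above can be reproduced with the same sample strings. This is only a sketch; the ^\w+ pattern is an extra illustration not taken from the text.

```python
import re

text = "foo1\nfoo2\n"

# $ without MULTILINE: only the end of the string (or just before a final newline).
normal = re.search(r"foo.$", text).group()
# $ with MULTILINE: also just before every newline, so the first line wins.
multi = re.search(r"foo.$", text, re.MULTILINE).group()

# ^ without MULTILINE matches only at the very start of the string.
string_start = re.findall(r"^\w+", text)
# ^ with MULTILINE also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# A lone $ in 'foo\n' yields two empty matches.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]

print(normal, multi, string_start, line_starts, dollar_positions)
```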
* Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed
by any number of ‘b’s.
+ Causes the resulting RE to match 1 or more repetitions of the preceding RE.
ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not
match just ‘a’.
? Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either ‘a’ or ‘ab’.
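The three qualifiers can be compared side by side. The candidate strings below are chosen for illustration, and fullmatch() is used so that the whole string has to match.

```python
import re

candidates = ["a", "ab", "abbb"]

# ab*  : 'a' followed by zero or more 'b'
star = [s for s in candidates if re.fullmatch(r"ab*", s)]
# ab+  : 'a' followed by one or more 'b'
plus = [s for s in candidates if re.fullmatch(r"ab+", s)]
# ab?  : 'a' followed by at most one 'b'
optional = [s for s in candidates if re.fullmatch(r"ab?", s)]

print(star, plus, optional)
```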
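*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match
as much text as possible. Sometimes this behaviour isn’t desired; if the RE
<.*> is matched against '<a> b <c>', it will match the entire string, and not
just '<a>'. Adding ? after the qualifier makes it perform the match in
non-greedy or minimal fashion; as few characters as possible will be matched.
Using the RE <.*?> will match only '<a>'.
[...]
Named groups can be referenced in three contexts. If the pattern is
(?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either
single or double quotes):
in the same pattern itself: (?P=quote) (as shown), \1
when processing match object m: m.group('quote'), m.end('quote') (etc.)
in a string passed to the repl argument of re.sub(): \g<quote>, \g<1>, \1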
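The three ways of referring to a named group can be sketched as follows, assuming the group is called quote as in the list above. The sample text and the replacement string are invented for this example.

```python
import re

pattern = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
text = """say "hi" or 'bye'"""

# 1. Inside the pattern: (?P=quote) and the numbered form \1 are equivalent.
numbered = re.compile(r"""(?P<quote>['"]).*?\1""")
same_matches = pattern.findall(text) == numbered.findall(text)

# 2. On a match object: look the group up by name.
m = pattern.search(text)
whole = m.group(0)
quote_char = m.group("quote")
quote_end = m.end("quote")

# 3. In the repl argument of sub(): \g<quote>, \g<1> and \1 all refer to it.
by_name = pattern.sub(r"\g<quote>...\g<quote>", text)
by_number = pattern.sub(r"\g<1>...\1", text)

print(same_matches, whole, quote_char, quote_end)
print(by_name)
```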
(?P=name) A backreference to a named group; it matches whatever text was matched by the
earlier group named name.
(?#...) A comment; the contents of the parentheses are simply ignored.
(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is
called a lookahead assertion. For example, Isaac (?=Asimov) will match
'Isaac ' only if it’s followed by 'Asimov'.
(?!...) Matches if ... doesn’t match next. This is a negative lookahead assertion.
For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not
followed by 'Asimov'.
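A brief sketch of both lookahead forms on one sample sentence (the sentence is made up here). Note that the matched text never includes the name that follows, because the assertion does not consume it.

```python
import re

text = "Isaac Asimov and Isaac Newton"

# Positive lookahead: 'Isaac ' only where 'Asimov' comes next.
followed = [m.start() for m in re.finditer(r"Isaac (?=Asimov)", text)]

# Negative lookahead: 'Isaac ' only where 'Asimov' does not come next.
not_followed = [m.start() for m in re.finditer(r"Isaac (?!Asimov)", text)]

# The assertion itself is not part of the match.
matched_text = re.search(r"Isaac (?=Asimov)", text).group()

print(followed, not_followed, repr(matched_text))
```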
(?<=...) Matches if the current position in the string is preceded by a match for ...
that ends at the current position. This is called a positive lookbehind
assertion. (?<=abc)def will find a match in 'abcdef', since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
abc or a|b are allowed, but a* and a{3,4} are not. Note that
patterns which start with positive lookbehind assertions will not match at the
beginning of the string being searched; you will most likely want to use the
search() function rather than the match() function:
>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
This example looks for a word following a hyphen:
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'
Changed in version 3.5: Added support for group references of fixed length.
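Two of the lookbehind constraints mentioned above can be checked with a short sketch: match() cannot succeed for a pattern that starts with a lookbehind, and a variable-width pattern inside the lookbehind is rejected when compiling. The test strings are illustrative.

```python
import re

# match() is anchored at index 0, where nothing precedes the position.
at_start = re.match(r"(?<=abc)def", "abcdef")
# search() may start at index 3, where 'abc' does precede the position.
anywhere = re.search(r"(?<=abc)def", "abcdef")

# A variable-width pattern such as a* is not allowed inside a lookbehind.
try:
    re.compile(r"(?<=a*)b")
    width_error = None
except re.error as exc:
    width_error = exc

# Alternatives of the same fixed width, such as a|b, are accepted.
fixed_ok = re.search(r"(?<=a|b)c", "xbc")

print(at_start, anywhere.group(), width_error is not None, fixed_ok.start())
```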
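[...]
(?(id/name)yes-pattern|no-pattern) Will try to match with yes-pattern if the group with given id or
name exists, and with no-pattern if it doesn’t. no-pattern is optional and can be omitted. For example,
(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern, which
will match with '<user@host.com>' as well as 'user@host.com', but not with '<user@host.com' nor 'user@host.com>'.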
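A minimal sketch of the poor email matching pattern mentioned above, assuming it is the conditional pattern (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) from the standard re documentation. The sample addresses are placeholders, and match() is used so that every attempt starts at the beginning of the string.

```python
import re

# Group 1 optionally captures '<'. The conditional (?(1)>|$) then requires
# a closing '>' if group 1 matched, and the end of the string otherwise.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

samples = [
    "<user@host.com>",   # both brackets present
    "user@host.com",     # no brackets
    "<user@host.com",    # opening bracket only
    "user@host.com>",    # closing bracket only
]
results = {s: email.match(s) is not None for s in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```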