re — Regular expression operations — Python 3.10.5 ...
re — Regular expression operations
Source code: Lib/re.py
This module provides regular expression matching operations similar to
those found in Perl.
Both patterns and strings to be searched can be Unicode strings (str)
as well as 8-bit strings (bytes).
However, Unicode strings and 8-bit strings cannot be mixed:
that is, you cannot match a Unicode string with a byte pattern or
vice-versa; similarly, when asking for a substitution, the replacement
string must be of the same type as both the pattern and the search string.
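The str/bytes rule above can be checked with a short sketch. The sample text and variable names are made up for illustration; only `re.search()` comes from the module itself.

```python
import re

# A str pattern searches str input; a bytes pattern searches bytes input.
str_match = re.search(r"\d+", "order 42")
bytes_match = re.search(rb"\d+", b"order 42")

# Mixing a bytes pattern with a str subject raises TypeError.
try:
    re.search(rb"\d+", "order 42")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

print(str_match.group(), bytes_match.group(), type(mixed_error).__name__)
```

The match result has the same type as the input: `str_match.group()` is a str and `bytes_match.group()` is a bytes object.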
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without invoking
their special meaning. This collides with Python’s usage of the same
character for the same purpose in string literals; for example, to match
a literal backslash, one might have to write '\\\\' as the pattern
string, because the regular expression must be \\, and each
backslash must be expressed as \\ inside a regular Python string
literal. Also, please note that any invalid escape sequences in Python’s
usage of the backslash in string literals now generate a DeprecationWarning
and in the future this will become a SyntaxError. This behaviour
will happen even if it is a valid escape sequence for a regular expression.
The solution is to use Python’s raw string notation for regular expression
patterns; backslashes are not handled in any special way in a string literal
prefixed with 'r'. So r"\n" is a two-character string containing
'\' and 'n', while "\n" is a one-character string containing a
newline. Usually patterns will be expressed in Python code using this raw
string notation.
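A minimal sketch of the raw-string point above. The sample path `C:\temp` and the variable names are illustrative assumptions, not part of the documentation.

```python
import re

# r"\n" holds two characters (backslash, 'n'); "\n" holds one (a newline).
raw_len = len(r"\n")
plain_len = len("\n")

# Both spellings produce the same two-character pattern string, which is
# the regular expression \\ (match one literal backslash).
escaped_pattern = "\\\\"
raw_pattern = r"\\"
same_pattern = escaped_pattern == raw_pattern

text = "C:\\temp"  # the 7-character string C:\temp
found = re.search(raw_pattern, text)

print(raw_len, plain_len, same_pattern, found.start())
```

The raw form is usually easier to read because the pattern in the source code looks the same as the regular expression the engine receives.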
It is important to note that most regular expression operations are available as
module-level functions and methods on
compiled regular expressions. The functions are shortcuts
that don’t require you to compile a regex object first, but miss some
fine-tuning parameters.
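As one illustration of a fine-tuning parameter, the sketch below uses the optional start position accepted by the `search()` method of a compiled pattern, which the module-level `re.search()` does not take. The sample text is made up.

```python
import re

text = "spam eggs spam"

# Module-level shortcut: no explicit compile step is needed.
first = re.search(r"spam", text)

# Compiled pattern: search() also accepts a start position (pos).
pattern = re.compile(r"spam")
second = pattern.search(text, 1)  # begin scanning at index 1

print(first.start(), second.start())
```

Compiling once is also convenient when the same pattern is applied to many strings.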
See also
The third-party regex module,
which has an API compatible with the standard library re module,
but offers additional functionality and a more thorough Unicode support.
Regular Expression Syntax
A regular expression (or RE) specifies a set of strings that matches it; the
functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular
string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A
and B are both regular expressions, then AB is also a regular expression.
In general, if a string p matches A and another string q matches B, the
string pq will match AB. This holds unless A or B contain low precedence
operations; boundary conditions between A and B; or have numbered group
references. Thus, complex expressions can easily be constructed from simpler
primitive expressions like the ones described here. For details of the theory
and implementation of regular expressions, consult the Friedl book [Frie09],
or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further
information and a gentler presentation, consult the Regular Expression HOWTO.
Regular expressions can contain both special and ordinary characters. Most
ordinary characters, like 'A', 'a', or '0', are the simplest regular
expressions; they simply match themselves. You can concatenate ordinary
characters, so last matches the string 'last'. (In the rest of this
section, we’ll write RE’s in this special style, usually without quotes, and
strings to be matched 'in single quotes'.)
Some characters, like '|' or '(', are special. Special
characters either stand for classes of ordinary characters, or affect
how the regular expressions around them are interpreted.
Repetition qualifiers (*, +, ?, {m,n}, etc) cannot be
directly nested. This avoids ambiguity with the non-greedy modifier suffix
?, and with other modifiers in other implementations. To apply a second
repetition to an inner repetition, parentheses may be used. For example,
the expression (?:a{6})* matches any multiple of six 'a' characters.
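A small sketch of the nesting rule above. It assumes the directly nested form `a{6}*` is rejected with `re.error` at compile time; the variable names are illustrative.

```python
import re

# Stacking * directly on {6} is rejected when the pattern is compiled.
try:
    re.compile(r"a{6}*")
    nested_error = None
except re.error as exc:
    nested_error = exc

# Wrapping the inner repetition in a non-capturing group makes it legal.
six_as = re.compile(r"(?:a{6})*")
matches_12 = six_as.fullmatch("a" * 12) is not None
matches_7 = six_as.fullmatch("a" * 7) is not None

print(nested_error is not None, matches_12, matches_7)
```

`fullmatch()` is used here so that a partial match of the first six characters does not count as success.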
The special characters are:
. (Dot.) In the default mode, this matches any character except a newline. If
the DOTALL flag has been specified, this matches any character
including a newline.
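The two modes of the dot can be compared directly. This is a small illustrative sketch with a made-up input string.

```python
import re

text = "a\nb"

# Default mode: the dot does not match the newline, so a.b finds nothing.
default_match = re.search(r"a.b", text)

# With DOTALL the dot matches the newline as well.
dotall_match = re.search(r"a.b", text, re.DOTALL)

print(default_match, dotall_match.group() == text)
```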
^ (Caret.) Matches the start of the string, and in MULTILINE mode also
matches immediately after each newline.
$ Matches the end of the string or just before the newline at the end of the
string, and in MULTILINE mode also matches before a newline. foo
matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches
only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n'
matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for
a single $ in 'foo\n' will find two (empty) matches: one just before
the newline, and one at the end of the string.
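The anchor behaviour described above can be reproduced with the same strings the text uses. The `^` example with 'one\ntwo' is an added illustration.

```python
import re

text = "foo1\nfoo2\n"

# Without MULTILINE, $ only matches at the end (or before a final newline).
normal = re.search(r"foo.$", text).group()
# With MULTILINE, $ also matches before every newline.
multiline = re.search(r"foo.$", text, re.MULTILINE).group()

# A lone $ finds two empty matches in 'foo\n'.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]

# ^ matches after each newline only in MULTILINE mode.
caret_default = re.findall(r"^\w+", "one\ntwo")
caret_multiline = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

print(normal, multiline, dollar_positions, caret_default, caret_multiline)
```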
* Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed
by any number of ‘b’s.
+ Causes the resulting RE to match 1 or more repetitions of the preceding RE.
ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not
match just ‘a’.
? Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either ‘a’ or ‘ab’.
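A quick sketch comparing the three qualifiers on the same candidates. The list of candidate strings and the helper name `accepted` are illustrative.

```python
import re

candidates = ["a", "ab", "abb", "abbb"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r"ab*")      # zero or more 'b'
plus = accepted(r"ab+")      # one or more 'b'
optional = accepted(r"ab?")  # zero or one 'b'

print(star, plus, optional)
```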
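*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match
as much text as possible. Sometimes this behaviour isn’t desired; if the RE
<.*> is matched against '<a> b <c>', it will match the entire string, and not
just '<a>'. Adding ? after the qualifier makes it perform the match in
non-greedy or minimal fashion; as few characters as possible will be matched.
Using the RE <.*?> will match only '<a>'.
[...]
Named groups can be referenced in three contexts. If the pattern is
(?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either
single or double quotes):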
in the same pattern itself: (?P=quote) (as shown); \1
when processing match object m: m.group('quote'); m.end('quote') (etc.)
in a string passed to the repl argument of re.sub(): \g<quote>; \g<1>; \1
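The three reference contexts can be sketched as follows, assuming the pattern (?P<quote>['"]).*?(?P=quote). The sample text and the replacement string are made up for illustration.

```python
import re

pattern = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
text = """say "hello" or 'bye'"""

# 1. In the pattern itself: (?P=quote) requires the closing quote to be
#    the same character that opened the quoted string.
m = pattern.search(text)
matched = m.group(0)

# 2. On the match object: look the group up by name.
quote_char = m.group("quote")
quote_end = m.end("quote")

# 3. In the repl argument of sub(): \g<quote> (or \g<1>, or \1).
by_name = pattern.sub(r"\g<quote>X\g<quote>", text)
by_number = pattern.sub(r"\1X\1", text)

print(matched, quote_char, quote_end, by_name)
```

The named and numbered forms give the same result here because "quote" is the first group in the pattern.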
(?P=name) A backreference to a named group; it matches whatever text was matched by the
earlier group named name.
(?#...) A comment; the contents of the parentheses are simply ignored.
(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is
called a lookahead assertion. For example, Isaac (?=Asimov) will match
'Isaac ' only if it’s followed by 'Asimov'.
(?!...) Matches if ... doesn’t match next. This is a negative lookahead assertion.
For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not
followed by 'Asimov'.
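A short sketch of both lookahead forms, writing the pattern with a space after Isaac. The second test string, "Isaac Newton", is an illustrative assumption.

```python
import re

lookahead = re.compile(r"Isaac (?=Asimov)")
negative = re.compile(r"Isaac (?!Asimov)")

# The lookahead checks what follows but does not consume it, so the
# matched text is only 'Isaac ' (with the trailing space).
pos_hit = lookahead.search("Isaac Asimov").group()
pos_miss = lookahead.search("Isaac Newton")

neg_hit = negative.search("Isaac Newton").group()
neg_miss = negative.search("Isaac Asimov")

print(repr(pos_hit), pos_miss, repr(neg_hit), neg_miss)
```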
(?<=...) Matches if the current position in the string is preceded by a match for ...
that ends at the current position. This is called a positive lookbehind
assertion. (?<=abc)def will find a match in 'abcdef', since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
abc or a|b are allowed, but a* and a{3,4} are not. Note that
patterns which start with positive lookbehind assertions will not match at the
beginning of the string being searched; you will most likely want to use the
search() function rather than the match() function:
>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
This example looks for a word following a hyphen:
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'
Changed in version 3.5: Added support for group references of fixed length.
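Two points from the lookbehind description can be checked with a brief sketch: match() fails where search() succeeds, and a variable-length lookbehind is rejected. It assumes the rejection surfaces as `re.error` at compile time; the variable names are illustrative.

```python
import re

# search() scans forward and finds 'def' at index 3.
search_hit = re.search(r"(?<=abc)def", "abcdef")

# match() only tries index 0, where nothing precedes the position,
# so the lookbehind cannot succeed.
match_hit = re.match(r"(?<=abc)def", "abcdef")

# a* has no fixed length, so it is not allowed inside a lookbehind.
try:
    re.compile(r"(?<=a*)def")
    width_error = None
except re.error as exc:
    width_error = exc

print(search_hit.start(), match_hit, width_error is not None)
```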
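(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern, which
will match with '<user@host.com>' as well as 'user@host.com', but not with
'<user@host.com' nor 'user@host.com>'.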