Regular Expression (Regex) Tutorial
Regular Expressions (Regex)
Regular Expression, or regex or regexp in short, is extremely and amazingly powerful in searching and manipulating text strings, particularly in processing text files. One line of regex can easily replace several dozen lines of programming code.
Regex is supported in all the popular scripting languages (such as Perl, Python, PHP, and JavaScript), in general-purpose programming languages such as Java, and even in word processors such as Word for searching text. Getting started with regex may not be easy due to its geeky syntax, but it is certainly worth the investment of your time.
Regex By Examples
This section is meant for those who need to refresh their memory. For novices, go to the next section to learn the syntax, before looking at these examples.
Regex Syntax Summary
Character: All characters, except those having special meaning in regex, match themselves. E.g., the regex x matches substring "x"; regex 9 matches "9"; regex = matches "="; and regex @ matches "@".
Special Regex Characters: These characters have special meaning in regex (to be discussed below): ., +, *, ?, ^, $, (, ), [, ], {, }, |, \.
Escape Sequences (\char):
To match a character having special meaning in regex, you need to use an escape sequence prefixed with a backslash (\). E.g., \. matches "."; regex \+ matches "+"; and regex \( matches "(".
You also need to use regex \\ to match "\" (back-slash).
Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for an up to 3-digit octal number, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and (in some languages, e.g., Python) \Uhhhhhhhh for an 8-digit Unicode code point. Exact support varies by language.
A Sequence of Characters (or String): Strings can be matched by combining a sequence of characters (called sub-expressions). E.g., the regex Saturday matches "Saturday". The matching, by default, is case-sensitive, but can be set to case-insensitive via a modifier.
OR Operator (|): E.g., the regex four|4 accepts strings "four" or "4".
Character class (or Bracket List):
[...]: Accept ANY ONE of the characters within the square brackets, e.g., [aeiou] matches "a", "e", "i", "o" or "u".
[.-.] (Range Expression): Accept ANY ONE of the characters in the range, e.g., [0-9] matches any digit; [A-Za-z] matches any uppercase or lowercase letter.
[^...]: NOT ONE of the characters, e.g., [^0-9] matches any non-digit.
Only these four characters require an escape sequence inside the bracket list: ^, -, ], \.
Occurrence Indicators (or Repetition Operators):
+: one or more (1+), e.g., [0-9]+ matches one or more digits such as '123', '000'.
*: zero or more (0+), e.g., [0-9]* matches zero or more digits. It accepts all those in [0-9]+ plus the empty string.
?: zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
{m,n}: m to n (both inclusive)
{m}: exactly m times
{m,}: m or more (m+)
Metacharacters: match a character
. (dot): ANY ONE character except newline. Same as [^\n]
\d, \D: ANY ONE digit/non-digit character. Digits are [0-9]
\w, \W: ANY ONE word/non-word character. For ASCII, word characters are [a-zA-Z0-9_]
\s, \S: ANY ONE space/non-space character. For ASCII, whitespace characters are [ \n\r\t\f] (most engines also include the vertical tab \v)
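Position Anchors: do not match a character, but a position such as start-of-line, end-of-line, start-of-word and end-of-word.
^, $: start-of-line and end-of-line respectively. E.g., ^[0-9]+$ matches a numeric string (^[0-9]$ would match a single digit only).
\b: boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b matches the word "cat" in the input string.
\B: Inverse of \b, i.e., a position that is not a word boundary.
\<, \>: start-of-word and end-of-word respectively, similar to \b. These are mainly supported by GNU/POSIX tools such as grep and Vim, not by Perl, Python, Java or JavaScript.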
Code Example in PHP
[TODO]
Example: Full Numeric Strings ^[0-9]+$ or ^\d+$
The leading ^ and the trailing $ are known as position anchors, which match the start and end positions of the line, respectively. As a result, the entire input string must be matched, instead of only a portion of it (a substring).
This regex matches any non-empty numeric string (comprising digits 0 to 9), e.g., "0" and "12345". It does not match "" (empty string), "abc", "a123", "abc123xyz", etc. However, it also matches "000", "0123" and "0001", which have leading zeros.
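Below is a minimal Python sketch of the full-numeric check using the standard re module. The helper name is_numeric is only illustrative. The sketch uses [0-9] rather than \d because, in Python 3, \d on a str pattern also matches non-ASCII digits unless the re.ASCII flag is given.

```python
import re

# ^ and $ anchor the match to the whole string; [0-9]+ requires at least one digit.
NUMERIC = re.compile(r'^[0-9]+$')

def is_numeric(s: str) -> bool:
    """Return True if s is a non-empty string made only of the digits 0-9."""
    return NUMERIC.search(s) is not None

for s in ["0", "12345", "", "abc", "a123", "abc123xyz", "000", "0123"]:
    print(repr(s), is_numeric(s))

# Python's $ also matches just before a trailing newline, so "123\n" passes;
# re.fullmatch() requires the pattern to cover the entire string instead.
print(is_numeric("123\n"))                        # True
print(re.fullmatch(r'[0-9]+', "123\n") is None)   # True
```

Use re.fullmatch() when a trailing newline must be rejected.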
Example: Positive Integer Literals [1-9][0-9]*|0 or [1-9]\d*|0
[1-9] matches any character between 1 and 9; [0-9]* matches zero or more digits. The * is an occurrence indicator representing zero or more occurrences. Together, [1-9][0-9]* matches any number without a leading zero.
| represents the OR operator, which is used to include the number 0.
This expression matches "0" and "123", but does not match "000" and "0123" (but see below).
You can replace [0-9] by the metacharacter \d, but not [1-9].
We did not use the position anchors ^ and $ in this regex. Hence, it can match any part of the input string. For example,
If the input string is "abc123xyz", it matches the substring "123".
If the input string is "abcxyz", it matches nothing.
If the input string is "abc123xyz456_0", it matches the substrings "123", "456" and "0" (three matches).
If the input string is "0012300", it matches the substrings "0", "0" and "12300" (three matches)!!!
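These unanchored results can be reproduced in Python. re.findall() scans left to right and returns every non-overlapping match. POSITIVE_INT is an illustrative name.

```python
import re

# No ^/$ anchors, so findall() returns every non-overlapping match, left to right.
POSITIVE_INT = re.compile(r'[1-9][0-9]*|0')

for s in ["abc123xyz", "abcxyz", "abc123xyz456_0", "0012300"]:
    print(repr(s), POSITIVE_INT.findall(s))
# 'abc123xyz' ['123']
# 'abcxyz' []
# 'abc123xyz456_0' ['123', '456', '0']
# '0012300' ['0', '0', '12300']
```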
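Example: Full Integer Literals ^[+-]?([1-9][0-9]*|0)$ or ^[+-]?([1-9]\d*|0)$
This regex matches an integer literal (for the entire string, with the position anchors), whether positive, negative or zero. The parentheses are needed because | has the lowest precedence of all regex operators. Without them, ^[+-]?[1-9][0-9]*|0$ would be read as ^[+-]?[1-9][0-9]* OR 0$, so each alternative would carry only one of the two anchors, and an input such as "123abc" would be accepted.
[+-] matches either a + or - sign. ? is an occurrence indicator denoting 0 or 1 occurrence, i.e., optional. Hence, [+-]? matches an optional leading + or - sign.
We have covered three occurrence indicators: + for one or more, * for zero or more, and ? for zero or one.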
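Because | has the lowest precedence, grouping matters when anchors are combined with alternatives. The Python sketch below compares the grouped pattern ^[+-]?([1-9][0-9]*|0)$ with the ungrouped ^[+-]?[1-9][0-9]*|0$ on a few inputs. FULL_INT and UNGROUPED are illustrative names.

```python
import re

# Grouped: ^ and $ apply to both alternatives, so the whole string must be an integer.
FULL_INT = re.compile(r'^[+-]?([1-9][0-9]*|0)$')
# Ungrouped: parsed as (^[+-]?[1-9][0-9]*) OR (0$), so each side has only one anchor.
UNGROUPED = re.compile(r'^[+-]?[1-9][0-9]*|0$')

for s in ["0", "-42", "+7", "007", "123abc"]:
    print(f"{s!r:10} grouped={bool(FULL_INT.search(s))} ungrouped={bool(UNGROUPED.search(s))}")
```

Only "123abc" separates the two. The ungrouped first alternative only needs to match at the start of the string, so the trailing "abc" is never checked.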
Example: Identifiers (or Names) [a-zA-Z_][0-9a-zA-Z_]* or [a-zA-Z_]\w*
Begin with one letter or underscore, followed by zero or more digits, letters and underscores.
You can use the metacharacter \w for a word character [a-zA-Z0-9_]. Recall that the metacharacter \d can be used for a digit [0-9].
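Here is a minimal Python sketch of an identifier check. re.fullmatch() stands in for the ^...$ anchors. The helper is_identifier is hypothetical, for illustration only. The last two lines show that \w in Python 3 also accepts non-ASCII letters unless re.ASCII is passed.

```python
import re

# First character: letter or underscore; the rest may also be digits.
IDENT = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]*')

def is_identifier(s: str) -> bool:
    # fullmatch() requires the pattern to cover the whole string, like ^...$ would.
    return IDENT.fullmatch(s) is not None

for s in ["count", "_tmp1", "x", "1abc", "my-var", ""]:
    print(repr(s), is_identifier(s))

print(re.fullmatch(r'[a-zA-Z_]\w*', 'café') is not None)            # True
print(re.fullmatch(r'[a-zA-Z_]\w*', 'café', re.ASCII) is not None)  # False
```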
Example: Image Filenames ^\w+\.(gif|png|jpg|jpeg)$
The position anchors ^ and $ match the beginning and the ending of the input string, respectively. That is, this regex must match the entire input string, instead of a part of it (substring).
\w+ matches one or more word characters (same as [a-zA-Z0-9_]+).
\. matches the dot (.) character. We need to use \. to represent . because . has a special meaning in regex. The \ is known as the escape code, which restores the original literal meaning of the following character. Similarly, * , + , ? (occurrence indicators) and ^ , $ (position anchors) have special meanings in regex. You need to use an escape code to match these characters.
(gif|png|jpg|jpeg) matches either "gif", "png", "jpg" or "jpeg". The | denotes the "OR" operator. The parentheses are used for grouping the selections.
Appending the modifier i to the regex, as in /^\w+\.(gif|png|jpg|jpeg)$/i (Perl, JavaScript), specifies case-insensitive matching. Other languages use a flag instead, e.g., re.IGNORECASE in Python or Pattern.CASE_INSENSITIVE in Java. That is, it accepts "test.GIF" and "TesT.Gif".
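In Python, the trailing /i modifier becomes the re.IGNORECASE flag passed to re.compile(). This is a minimal sketch; IMAGE_FILE is an illustrative name.

```python
import re

# re.IGNORECASE plays the role of the trailing /i modifier in Perl or JavaScript.
IMAGE_FILE = re.compile(r'^\w+\.(gif|png|jpg|jpeg)$', re.IGNORECASE)

for name in ["test.gif", "test.GIF", "TesT.Gif", "photo.jpeg", "notes.txt", "my photo.png"]:
    print(f"{name!r:16} {bool(IMAGE_FILE.search(name))}")
```

"my photo.png" is rejected because \w+ cannot match the space, and ^ pins the match to the start of the string.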
Example: Email Addresses ^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$
The position anchors ^ and $ match the beginning and the ending of the input string, respectively. That is, this regex must match the entire input string, instead of a part of it (substring).
\w+ matches 1 or more word characters (same as [a-zA-Z0-9_]+).
[.-]? matches an optional character . or -. Although dot (.) has a special meaning in regex, in a character class (square brackets) any character except ^, -, ] or \ is a literal and does not require an escape sequence.
([.-]?\w+)* matches 0 or more occurrences of [.-]?\w+.
The sub-expression \w+([.-]?\w+)* is used to match the username in the email, before the @ sign. It begins with at least one word character [a-zA-Z0-9_], followed by more word characters or . or -. However, a . or - must be followed by a word character [a-zA-Z0-9_]. That is, the input string cannot begin or end with . or -, and cannot contain "..", "--", ".-" or "-.". An example of a valid string is "a.1-2-3".
The @ matches itself. In regex, all characters other than those having special meanings match themselves, e.g., a matches a, b matches b, etc.
Again, the sub-expression \w+([.-]?\w+)* is used to match the email domain name, with the same pattern as the username described above.
The sub-expression \.\w{2,3} matches a . followed by two or three word characters, e.g., ".com", ".edu", ".us", ".uk", ".co".
(\.\w{2,3})+ specifies that the above sub-expression can occur one or more times, e.g., ".com", ".co.uk", ".edu.sg", etc.
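The sketch below tries the email pattern unchanged with Python's re.match() on a few hand-picked inputs. The sample addresses are made up for illustration.

One caution is an observation about this pattern, not a claim from the tutorial. Because [.-]? is optional, ([.-]?\w+)* can split a run of word characters in many ways. That may make long non-matching inputs slow to reject (catastrophic backtracking).

```python
import re

EMAIL = re.compile(r'^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$')

samples = [
    "a.1-2-3@example.com",    # separators followed by word characters: accepted
    "john_doe@mail.co.uk",    # (\.\w{2,3})+ takes ".uk" after backtracking
    ".john@example.com",      # cannot begin with "."
    "john..doe@example.com",  # ".." is not allowed
    "john@example",           # needs at least one ".xx" or ".xxx" suffix
]
for s in samples:
    print(f"{s!r:26} {bool(EMAIL.match(s))}")
```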
Exercise: Interpret this regex, which provides another representation of an email address: ^[\w\-\.\+]+\@[a-zA-Z0-9\.\-]+\.[a-zA-Z0-9]{2,4}$.
Example: Swapping Words using Parenthesized Back-References ^(\S+)\s+(\S+)$ and $2 $1
The ^ and $ match the beginning and ending of the input string, respectively.
The \s (lowercase s) matches a whitespace (blank, tab \t, and newline \r or \n). On the other hand, the \S (uppercase S) matches anything that is NOT matched by \s, i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D for non-digit.
The above regex matches two words (without whitespaces) separated by one or more whitespaces.
Parentheses () have two meanings in regex:
to group sub-expressions, e.g., (abc)*
to provide a so-called back-reference for capturing and extracting matches.
The parentheses in (\S+), called a parenthesized back-reference, are used to extract the matched substring from the input string. In this regex, there are two (\S+), which match the first two words, separated by one or more whitespaces \s+. The two matched words are extracted from the input string and typically kept in the special variables $1 and $2 (or \1 and \2 in Python), respectively.
To swap the two words, you can access the special variables and print "$2 $1" (via a programming language), or use the substitute operator "s/(\S+)\s+(\S+)/$2 $1/" (in Perl).
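Code Example in Python
Python keeps the parenthesized back-references in \1, \2, .... The entire match is available as \g<0> in a replacement string, or as group(0) of a match object (\0 does not refer to the entire match).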
$ python3
>>> import re
>>> re.findall(r'^(\S+)\s+(\S+)$', 'apple orange')
[('apple', 'orange')]   # A list of tuples if the pattern has more than one back-reference
# Back-references are kept in \1, \2, \3, etc.
>>> re.sub(r'^(\S+)\s+(\S+)$', r'\2 \1', 'apple orange')   # Prefix r for a raw string, in which backslashes are not string escapes
'orange apple'
>>> re.sub(r'^(\S+)\s+(\S+)$', '\\2 \\1', 'apple orange')  # In a regular string, you need to write \\ for \
'orange apple'
Code Example in Java
Java keeps the parenthesized back-references in $1, $2, ....
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexSwapWords {
   public static void main(String[] args) {
      String inputStr = "apple orange";
      String regexStr = "^(\\S+)\\s+(\\S+)$";   // Regex pattern to be matched
      String replacementStr = "$2 $1";          // Replacement pattern with back-references
      // Step 1: Allocate a Pattern object to compile a regex
      Pattern pattern = Pattern.compile(regexStr);
      // Step 2: Allocate a Matcher object from the Pattern, and provide the input
      Matcher matcher = pattern.matcher(inputStr);
      // Step 3: Perform the matching and process the matching result
      String outputStr = matcher.replaceFirst(replacementStr);   // first match only
      System.out.println(outputStr);   // Output: orange apple
   }
}
Example: HTTP Addresses ^http:\/\/\S+(\/\S+)*(\/)?$
Begin with http://. Take note that you may need to write / as \/ with an escape code in languages where the regex itself is delimited by slashes, such as /.../ in JavaScript and Perl.
Followed by \S+, one or more non-whitespaces, for the domain name.
Followed by (\/\S+)*, zero or more "/...", for the sub-directories.
Followed by (\/)?, an optional (0 or 1) trailing /, for a directory request.
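Below is a minimal Python sketch of the HTTP-address pattern. Inside a Python string, / needs no escaping. The sample URLs are made up for illustration.

Because \S also matches /, the groups (/\S+)* and (/)? do not change which strings are accepted. The pattern behaves like ^http://\S+$, and the overlap may cause heavy backtracking on long non-matching inputs. A stricter variant (an assumption, not from the original) would restrict the domain part to [^\s/]+.

```python
import re

# In a Python string the slashes need no escaping; \/ is only needed inside /.../ literals.
HTTP_URL = re.compile(r'^http://\S+(/\S+)*(/)?$')

samples = [
    "http://www.example.com",
    "http://www.example.com/docs/index.html",
    "http://example.com/dir/",
    "https://example.com",   # rejected: the pattern only allows http
    "http://exa mple.com",   # rejected: whitespace is not allowed
]
for s in samples:
    print(f"{s!r:42} {bool(HTTP_URL.match(s))}")
```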
Example: Regex Patterns in AngularJS
The following rather complex regex patterns are used by AngularJS in JavaScript syntax:
var ISO_DATE_REGEXP = /^\d{4,}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d\.\d+(?:[+-][0-2]\d:[0-5]\d|Z)$/;
var URL_REGEXP = /^[a-z][a-z\d.+-]*:\/*(?:[^:@]+(?::[^@]+)?@)?(?:[^\s:/?#]+|\[[a-f\d:]+])(?::\d+)?(?:\/[^?#]*)?(?:\?[^#]*)?(?:#.*)?$/i;
var EMAIL_REGEXP = /^(?=.{1,254}$)(?=.{1,64}@)[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+(\.[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+)*@[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$/;
// Match both uppercase and lowercase letters, single-quote but not double-quote
var NUMBER_REGEXP = /^\s*(-|\+)?(\d+|(\d*(\.\d*)))([eE][+-]?\d+)?\s*$/;
var DATE_REGEXP = /^(\d{4,})-(\d{2})-(\d{2})$/;
var DATETIMELOCAL_REGEXP = /^(\d{4,})-(\d\d)-(\d\d)T(\d\d):(\d\d)(?::(\d\d)(\.\d{1,3})?)?$/;
var WEEK_REGEXP = /^(\d{4,})-W(\d\d)$/;
var MONTH_REGEXP = /^(\d{4,})-(\d\d)$/;
var TIME_REGEXP = /^(\d\d):(\d\d)(?::(\d\d)(\.\d{1,3})?)?$/;
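AngularJS applies these patterns in JavaScript. The sketch below simply reuses two of them, DATE_REGEXP and TIME_REGEXP, unchanged (minus the /.../ delimiters) with Python's re module. It shows how the capture groups split a value into fields. These patterns check the shape only, so a month of 13 is still accepted.

```python
import re

# Patterns copied from the AngularJS list above, without the JavaScript /.../ delimiters.
DATE_REGEXP = re.compile(r'^(\d{4,})-(\d{2})-(\d{2})$')
TIME_REGEXP = re.compile(r'^(\d\d):(\d\d)(?::(\d\d)(\.\d{1,3})?)?$')

print(DATE_REGEXP.match('2024-03-15').groups())    # ('2024', '03', '15')
print(DATE_REGEXP.match('2024-13-99').groups())    # ('2024', '13', '99'): no range check
print(TIME_REGEXP.match('09:30').groups())         # ('09', '30', None, None)
print(TIME_REGEXP.match('09:30:15.250').groups())  # ('09', '30', '15', '.250')
print(TIME_REGEXP.match('9:30'))                   # None: two hour digits are required
```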
Example: Sample Regex in Perl
s/^\s+//          # Remove leading whitespaces (substitute with empty string)
s/\s+$//          # Remove trailing whitespaces
s/^\s+|\s+$//g    # Remove leading and trailing whitespaces
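Here is a Python sketch of whitespace trimming. It uses the pattern ^\s+|\s+$ with re.sub(), which replaces every match (the counterpart of Perl's /g). It also shows that ^\s+.*\s+$ matches the whole string, so substituting it with an empty string deletes everything rather than only the surrounding whitespace. The helper name trim is illustrative. In real Python code, str.strip() does the same job.

```python
import re

def trim(s: str) -> str:
    # ^\s+ strips leading whitespace, \s+$ strips trailing whitespace;
    # re.sub() replaces every match, like Perl's s///g.
    return re.sub(r'^\s+|\s+$', '', s)

print(repr(trim('   hello world \t')))                       # 'hello world'
# ^\s+.*\s+$ spans the entire string, so the whole input is replaced:
print(repr(re.sub(r'^\s+.*\s+$', '', '   hello world \t')))  # ''
```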
Regular Expression (Regex) Syntax
A Regular Expression (or Regex) is a pattern (or filter) that describes a set of strings that match the pattern. In other words, a regex accepts a certain set of strings and rejects the rest.
A regex consists of a sequence of characters, metacharacters (such as ., \d, \D, \s, \S, \w, \W) and operators (such as +, *, ?, |, ^). Regexes are constructed by combining many smaller sub-expressions.
Matching a Single Character
The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches substring "x"; z matches "z"; and 9 matches "9".
Non-alphanumeric characters without special meaning in regex also match themselves. For example, = matches "="; @ matches "@".
Regex Special Characters and Escape Sequences
Regex's Special Characters
These characters have special meaning in regex (I will discuss them in detail in the later sections):
metacharacter: dot (.)
bracket list: []
position anchors: ^, $
occurrence indicators: +, *, ?, {}
parentheses: ()
or: |
escape and metacharacter: backslash (\)
Escape Sequences
The characters listed above have special meanings in regex. To match these characters, we need to prepend them with a backslash (\), known as an escape sequence. For example, \+ matches "+"; \[ matches "["; and \. matches ".".
Regex also recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for an up to 3-digit octal number, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and (in some languages, e.g., Python) \Uhhhhhhhh for an 8-digit Unicode code point.
Code Example in Python
$ python3
>>> import re   # Need module 're' for regular expression
# Try find: re.findall(regexStr, inStr) -> matchedStrList
# r'...' denotes a raw string, in which backslashes are not escapes, i.e., r'\n' is '\\' + 'n' (two characters)
>>> re.findall(r'a', 'abcabc')
['a', 'a']
>>> re.findall(r'=', 'abc=abc')    # '=' is not a special regex character
['=']
>>> re.findall(r'\.', 'abc.com')   # '.' is a special regex character, need regex escape sequence
['.']
>>> re.findall('\\.', 'abc.com')   # You need to write \\ for \ in a regular Python string
['.']
Code Example in JavaScript
[TODO]
Code Example in Java
[TODO]
Matching a Sequence of Characters (String or Text)
Sub-Expressions
A regex is constructed by combining many smaller sub-expressions or atoms. For example, the regex Friday matches the string "Friday". The matching, by default, is case-sensitive, but can be set to case-insensitive via a modifier.
OR (|) Operator
You can provide alternatives using the "OR" operator, denoted by a vertical bar '|'. For example, the regex four|for|floor|4 accepts strings "four", "for", "floor" or "4".
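Below is a short Python sketch of the OR operator. In backtracking engines such as Python's (and Perl's, Java's and JavaScript's), alternatives are tried left to right at each position, and the first one that succeeds wins even if a later one would match more. POSIX tools such as grep -E instead prefer the longest match. The input sentence is made up for illustration.

```python
import re

ALTERNATIVES = re.compile(r'four|for|floor|4')
found = ALTERNATIVES.findall('for four floors, 4 doors')   # ['for', 'four', 'floor', '4']

# Order of alternatives matters: the leftmost successful alternative is chosen.
short_first = re.match(r'fo|four', 'four').group()   # 'fo'
long_first  = re.match(r'four|fo', 'four').group()   # 'four'
print(found, short_first, long_first)
```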
Bracket List (Character Class) [...], [^...], [.-.]
A bracket expression is a list of characters enclosed by [], also called a character class. It matches ANY ONE character in the list. However, if the first character of the list is the caret (^), then it matches ANY ONE character NOT in the list. For example, the regex [02468] matches a single digit 0, 2, 4, 6, or 8; the regex [^02468] matches any single character other than 0, 2, 4, 6, or 8.
Instead of listing all characters, you could use a range expression inside the bracket. A range expression consists of two characters separated by a hyphen (-). It matches any single character that sorts between the two characters, inclusive. For example, [a-d] is the same as [abcd]. You could include a caret (^) in front of the range to invert the matching. For example, [^a-d] is equivalent to [^abcd].
Most of the special regex characters lose their meaning inside a bracket list, and can be used as they are, except ^, -, ] and \.
To include a ], place it first in the list (not in JavaScript, where [] is an empty class), or use the escape \].
To include a ^, place it anywhere but first, or use the escape \^.
To include a -, place it last, or use the escape \-.
To include a \, use the escape \\.
No escape is needed for the other characters, such as ., +, *, ?, (, ), {, }, etc., inside the bracket list.
You can also include metacharacters (to be explained in the next section), such as \w, \W, \d, \D, \s, \S, inside the bracket list.
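The bracket-list rules can be checked quickly with Python's re.findall(). Each line below exercises one rule, and the input strings are made up for illustration.

```python
import re

evens     = re.findall(r'[02468]', 'a1b2c3d4')   # ['2', '4']
non_evens = re.findall(r'[^02468]', '1234')      # ['1', '3']
a_to_d    = re.findall(r'[a-d]', 'abcdefg')      # ['a', 'b', 'c', 'd']
# ']' placed first, '^' not first, '\' escaped as '\\', '-' placed last
specials  = re.findall(r'[]^\\-]', r']a^b\c-')   # [']', '^', '\\', '-']
# '.', '+', '*', '?' are literal inside a bracket list
literals  = re.findall(r'[.+*?]', '1+2*3?4.')    # ['+', '*', '?', '.']
print(evens, non_evens, a_to_d, specials, literals)
```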
Named Character Classes in Bracket List (POSIX syntax; supported by Perl, PCRE/PHP and grep, but not by Python, Java or JavaScript)
Named (POSIX) classes of characters are pre-defined within bracket expressions. They are:
[:alnum:], [:alpha:], [:digit:]: letters + digits, letters, digits.
[:xdigit:]: hexadecimal digits.
[:lower:], [:upper:]: lowercase/uppercase letters.
[:cntrl:]: control characters.
[:graph:]: printable characters, excluding space.
[:print:]: printable characters, including space.
[:punct:]: printable characters, excluding letters, digits and space.
[:space:]: whitespace.
For example, [[:alnum:]] means [0-9A-Za-z] (in the C/ASCII locale). (Note that the square brackets in these class names are part of the symbolic names, and must be included in addition to the square brackets delimiting the bracket list.)
Metacharacters ., \w, \W, \d, \D, \s, \S
A metacharacter is a symbol with a special meaning inside a regex.
The metacharacter dot (.) matches any single character except newline \n (same as [^\n]). For example, ... matches any 3 characters (including letters, digits and whitespaces, but excluding newline); the.. matches "the" followed by any two characters, e.g., "there", "these", and so on.
\w (word character) matches any single letter, digit or underscore (same as [a-zA-Z0-9_]). The uppercase counterpart \W (non-word-character) matches any single character that isn't matched by \w (same as [^a-zA-Z0-9_]).
In regex, the uppercase metacharacter is always the inverse of the lowercase counterpart.
\d (digit) matches any single digit (same as [0-9]). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9]).
\s (space) matches any single whitespace (same as [ \t\n\r\f]: blank, tab, newline, carriage-return and form-feed). The uppercase counterpart \S (non-space) matches any single character that isn't matched by \s (same as [^ \t\n\r\f]).
These ASCII equivalences are the classic definitions; in Unicode-aware engines (e.g., Python 3 str patterns), \w, \d and \s also match non-ASCII letters, digits and spaces.
Examples:
\s\s          # Matches two spaces
\S\S\s        # Two non-spaces followed by a space
\s+           # One or more spaces
\S+\s\S+      # Two words (non-spaces) separated by a space
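The sketch below tries the metacharacters on small made-up inputs with Python's re.findall(). It also shows the effect of the re.ASCII flag, which restricts \w to [a-zA-Z0-9_].

```python
import re

text = "Price: 42 USD\tcafé"
digits     = re.findall(r'\d+', text)                          # ['42']
words      = re.findall(r'\w+', text)                          # ['Price', '42', 'USD', 'café']
ascii_word = re.findall(r'\w+', text, re.ASCII)                # ['Price', '42', 'USD', 'caf']
pairs      = re.findall(r'\S+\s\S+', 'one two three four')     # ['one two', 'three four']
the_dot    = re.findall(r'the..', 'there these the\nend')      # ['there', 'these']: . skips \n
print(digits, words, ascii_word, pairs, the_dot)
```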
Backslash (\) and Regex Escape Sequences
Regex uses backslash (\) for two purposes:
for metacharacters such as \d (digit), \D (non-digit), \s (space), \S (non-space), \w (word), \W (non-word).
to escape special regex characters, e.g., \. for ., \+ for +, \* for *, \? for ?. You also need to write \\ for \ in regex to avoid ambiguity.
Regex also recognizes \n for newline, \t for tab, etc.
Take note that in many programming languages (C, Java, Python), backslash (\) is also used for escape sequences in strings, e.g., "\n" for newline, "\t" for tab, and you also need to write "\\" for \. Consequently, to write the regex pattern \\ (which matches one \) in these languages, you need to write "\\\\" (two levels of escape!!!). Similarly, you need to write "\\d" for the regex metacharacter \d. This is cumbersome and error-prone!!!
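The two levels of escaping can be seen side by side in Python. Raw strings (r"...") remove the string-literal level, so the regex is written exactly once. The sample path is made up for illustration. A plain "\d" (single backslash) happens to work only because Python passes unknown string escapes through, and recent Python versions warn about it.

```python
import re

path = "C:\\temp\\new"                  # regular literal for the text C:\temp\new
regular = re.findall("\\\\", path)      # regex \\ in a regular string: two levels of escape
raw     = re.findall(r"\\", path)       # the same regex \\ written as a raw string

digits_regular = re.findall("\\d", "a1b2")   # regex \d in a regular string
digits_raw     = re.findall(r"\d", "a1b2")   # regex \d in a raw string
print(regular, raw, digits_regular, digits_raw)
```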
Occurrence Indicators (Repetition Operators): +, *, ?, {m}, {m,n}, {m,}
A regex sub-expression may be followed by an occurrence indicator (aka repetition operator):
?: The preceding item is optional and matched at most once (i.e., occurs 0 or 1 times, or optional).
*: The preceding item will be matched zero or more times, i.e., 0+
+: The preceding item will be matched one or more times, i.e., 1+
{m}: The preceding item is matched exactly m times.
{m,}: The preceding item is matched m or more times, i.e., m+
{m,n}: The preceding item is matched at least m times, but not more than n times.
For example, the regex xy{2,4} accepts "xyy", "xyyy" and "xyyyy".
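Here is a quick Python check of xy{2,4}. It also shows that a repetition operator applies only to the single item before it. To repeat "xy" as a unit, the sketch groups it first.

```python
import re

XY = re.compile(r'xy{2,4}')   # {2,4} applies to the y only, not to "xy"
results = {s: XY.fullmatch(s) is not None for s in ["xy", "xyy", "xyyy", "xyyyy", "xyyyyy"]}
print(results)  # {'xy': False, 'xyy': True, 'xyyy': True, 'xyyyy': True, 'xyyyyy': False}

# To repeat a multi-character item, group it first.
grouped = re.fullmatch(r'(xy){2,4}', 'xyxyxy') is not None   # True
print(grouped)
```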
Modifiers
You can apply modifiers to a regex to tailor its behavior, such as global, case-insensitive, multiline, etc. The ways to apply modifiers differ among languages.
In Perl, you can attach modifiers after a regex, in the form of /.../modifiers. For example:
m/abc/i     # case-insensitive matching
m/abc/g     # global (match ALL instead of match first)
In Java, you apply modifiers when compiling the regex Pattern. For example,
Pattern p1 = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // for case-insensitive matching
Pattern p2 = Pattern.compile(regex, Pattern.MULTILINE);  // for multiline input string
Pattern p3 = Pattern.compile(regex, Pattern.DOTALL);  // Dot (.) matches all characters including newline
The commonly-used modifier modes are:
Case-Insensitive mode (or i): case-insensitive matching for letters.
Global (or g): match all instead of the first match.
Multiline mode (or m): changes the meaning of ^ and $. In multiline mode, ^ matches start-of-line or start-of-input, and $ matches end-of-line or end-of-input. \A and \Z are not affected: \A always matches start-of-input and \Z end-of-input.
Single-line mode (or s): Dot (.) will match all characters, including newline.
Comment mode (or x): allow and ignore whitespace and embedded comments starting with # till end-of-line (EOL).
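In Python, modifiers are passed as flags to the re functions, and they can be combined with |. Python has no g flag. re.findall() and re.finditer() return all matches, while re.search() returns the first one. The sample text and the name SIGNED_INT are illustrative.

```python
import re

text = "first line\nSecond line"

starts_default   = re.findall(r'^\w+', text)                    # ['first']
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)      # ['first', 'Second']
case_insensitive = re.findall(r'second', text, re.IGNORECASE)   # ['Second']
dot_default      = re.findall(r'line.Second', text)             # []
dot_all          = re.findall(r'line.Second', text, re.DOTALL)  # ['line\nSecond']

# re.VERBOSE (comment mode): whitespace and # comments inside the pattern are ignored.
SIGNED_INT = re.compile(r"""
    [+-]?   # optional sign
    \d+     # one or more digits
""", re.VERBOSE)
print(SIGNED_INT.fullmatch("-42") is not None)   # True
```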
Greediness, Laziness and Backtracking for Repetition Operators
Greediness of Repetition Operators *, +, ?, {m,n}: The repetition operators are greedy operators, and by default grab as many characters as possible for a match. For example, the regex xy{2,4} tries to match "xyyyy", then "xyyy", and then "xyy".
Lazy Quantifiers *?, +?, ??, {m,n}?, {m,}?: You can put an extra ? after the repetition operators to curb their greediness (i.e., stop at the shortest match). For example,
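input = "The <code>first</code> and <code>second</code> instances"
regex = <code>.*</code> matches "<code>first</code> and <code>second</code>"
But
regex = <code>.*?</code> produces two matches: "<code>first</code>" and "<code>second</code>"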
Backtracking: If a regex reaches a state where a match cannot be completed, it backtracks by unwinding one character from the greedy match. For example, if the regex z*zzz is matched against the string "zzzz", the z* first matches "zzzz"; unwinds to match "zzz"; unwinds to match "zz"; and finally unwinds to match "z", such that the rest of the pattern can find a match.
Possessive Quantifiers *+, ++, ?+, {m,n}+, {m,}+: You can put an extra + after the repetition operators to disable backtracking, even if it may result in match failure. E.g., z++z will not match "zzzz". This feature is supported in, e.g., Java, Perl, PCRE (PHP) and Python 3.11+, but not in JavaScript.
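Below is a Python sketch of this section's three behaviours: greedy versus lazy matching, backtracking, and possessive quantifiers. The greedy and lazy lines use an input in which each instance is wrapped in <code>...</code> tags. Possessive quantifiers were only added to Python's re module in version 3.11, so that part is guarded by a version check.

```python
import re
import sys

text = "The <code>first</code> and <code>second</code> instances"

# Greedy: .* runs to the end of the input, then backtracks to the LAST </code>.
greedy = re.findall(r'<code>.*</code>', text)
# ['<code>first</code> and <code>second</code>']

# Lazy: .*? stops at the FIRST </code> that lets the match succeed.
lazy = re.findall(r'<code>.*?</code>', text)
# ['<code>first</code>', '<code>second</code>']

# Backtracking: z* first takes all four z's, then gives them back one at a time
# until the remaining zzz can match; the group shows what z* finally kept.
kept_by_star = re.fullmatch(r'(z*)zzz', 'zzzz').group(1)   # 'z'

print(greedy, lazy, kept_by_star)

# Possessive quantifiers (no backtracking) need Python 3.11 or later.
if sys.version_info >= (3, 11):
    print(re.fullmatch(r'z++z', 'zzzz'))   # None: z++ keeps every z, nothing is left
```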
Position Anchors ^, $, \b, \B, \<, \>, \A, \Z
Positional anchors DO NOT match actual characters, but match positions in a string, such as start-of-line, end-of-line, start-of-word, and end-of-word.
^ and $: The ^ matches the start-of-line. The $ matches the end-of-line excluding newline, or end-of-input (for input not ending with newline). These are the most commonly-used position anchors. For example,
ing$              # ending with 'ing'
^testing 123$     # Matches only one pattern. Should use equality comparison instead.
^[0-9]+$          # Numeric string
\b and \B: The \b matches the boundary of a word (i.e., start-of-word or end-of-word); and \B matches the inverse of \b, or non-word-boundary. For example,
\bcat\b           # matches the word "cat" in input string "This is a cat."
                  # but does not match input "This is a catalog."
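The anchor examples translate directly to Python. The sketch below checks \bcat\b against the two sentences, shows \B matching "cat" only inside a word, and tests the ing$ anchor. The input strings other than the two sentences are made up for illustration.

```python
import re

CAT_WORD = re.compile(r'\bcat\b')
in_sentence = CAT_WORD.search("This is a cat.") is not None      # True
in_catalog  = CAT_WORD.search("This is a catalog.") is not None  # False: "a" follows "cat"
inner       = re.findall(r'\Bcat\B', 'concatenate')              # ['cat']: inside a word
ends_ing    = re.search(r'ing$', 'testing') is not None          # True
print(in_sentence, in_catalog, inner, ends_ing)
```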
\