Regular expression - Wikipedia
文章推薦指數: 80 %
A regular expression is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for ... Regularexpression FromWikipedia,thefreeencyclopedia Jumptonavigation Jumptosearch Sequenceofcharactersthatformsasearchpattern "Regex"redirectshere.Forthecomicbook,seeRe:Gex. Thematchresultsofthepattern(?<=\.){2,}(?=[A-Z]) Atleasttwospacesarematched,butonlyiftheyoccurdirectlyafteraperiod(.)andbeforeanuppercaseletter. Aregularexpression(shortenedasregexorregexp;[1]sometimesreferredtoasrationalexpression[2][3])isasequenceofcharactersthatspecifiesasearchpatternintext.Usuallysuchpatternsareusedbystring-searchingalgorithmsfor"find"or"findandreplace"operationsonstrings,orforinputvalidation.Regularexpressiontechniquesaredevelopedintheoreticalcomputerscienceandformallanguagetheory. Theconceptofregularexpressionsbeganinthe1950s,whentheAmericanmathematicianStephenColeKleeneformalizedtheconceptofaregularlanguage.TheycameintocommonusewithUnixtext-processingutilities.Differentsyntaxesforwritingregularexpressionshaveexistedsincethe1980s,onebeingthePOSIXstandardandanother,widelyused,beingthePerlsyntax. Regularexpressionsareusedinsearchengines,insearchandreplacedialogsofwordprocessorsandtexteditors,intextprocessingutilitiessuchassedandAWK,andinlexicalanalysis.Mostgeneral-purposeprogramminglanguagessupportregexcapabilitieseithernativelyorvialibraries,includingforexamplePython,[4]C,[5]C++,[6] Java,[7]andJavaScript.[8] Contents 1History 2Patterns 3Basicconcepts 4Formallanguagetheory 4.1Formaldefinition 4.2Expressivepowerandcompactness 4.3Decidingequivalenceofregularexpressions 5Syntax 5.1Delimiters 5.2Standards 5.2.1POSIXbasicandextended 5.2.2POSIXextended 5.2.3Characterclasses 5.3PerlandPCRE 5.4Lazymatching 5.5Possessivematching 6Patternsfornon-regularlanguages 7Implementationsandrunningtimes 8Unicode 9Uses 10Examples 11Induction 12Seealso 13Notes 14References 15Externallinks History[edit] StephenColeKleene,whointroducedtheconcept Regularexpressionsoriginatedin1951,whenmathematicianStephenColeKleenedescribedregularlanguagesusinghismathematicalnotationcalledregularevents.[9][10]Thesearoseintheoreticalcomputerscience,inthesubfieldsofautomatatheory(modelsofcomputation)andthedescriptionandclassificationofformallanguages.OtherearlyimplementationsofpatternmatchingincludetheSNOBOLlanguage,whichdidnotuseregularexpressions,butinsteaditsownpatternmatchingconstructs. Regularexpressionsenteredpopularusefrom1968intwouses:patternmatchinginatexteditor[11]andlexicalanalysisinacompiler.[12]AmongthefirstappearancesofregularexpressionsinprogramformwaswhenKenThompsonbuiltKleene'snotationintotheeditorQEDasameanstomatchpatternsintextfiles.[11][13][14][15]Forspeed,Thompsonimplementedregularexpressionmatchingbyjust-in-timecompilation(JIT)toIBM7094codeontheCompatibleTime-SharingSystem,animportantearlyexampleofJITcompilation.[16]HelateraddedthiscapabilitytotheUnixeditored,whicheventuallyledtothepopularsearchtoolgrep'suseofregularexpressions("grep"isawordderivedfromthecommandforregularexpressionsearchingintheededitor:g/re/pmeaning"GlobalsearchforRegularExpressionandPrintmatchinglines").[17]AroundthesametimewhenThompsondevelopedQED,agroupofresearchersincludingDouglasT.Rossimplementedatoolbasedonregularexpressionsthatisusedforlexicalanalysisincompilerdesign.[12] ManyvariationsoftheseoriginalformsofregularexpressionswereusedinUnix[15]programsatBellLabsinthe1970s,includingvi,lex,sed,AWK,andexpr,andinotherprogramssuchasEmacs.Regexesweresubsequentlyadoptedbyawiderangeofprograms,withtheseearlyformsstandardizedinthePOSIX.2standardin1992. Inthe1980s,themorecomplicatedregexesaroseinPerl,whichoriginallyderivedfromaregexlibrarywrittenbyHenrySpencer(1986),wholaterwroteanimplementationofAdvancedRegularExpressionsforTcl.[18]TheTcllibraryisahybridNFA/DFAimplementationwithimprovedperformancecharacteristics.SoftwareprojectsthathaveadoptedSpencer'sTclregularexpressionimplementationincludePostgreSQL.[19]PerllaterexpandedonSpencer'soriginallibrarytoaddmanynewfeatures.[20]PartoftheeffortinthedesignofRaku(formerlynamedPerl6)istoimprovePerl'sregexintegration,andtoincreasetheirscopeandcapabilitiestoallowthedefinitionofparsingexpressiongrammars.[21]Theresultisamini-languagecalledRakurules,whichareusedtodefineRakugrammaraswellasprovideatooltoprogrammersinthelanguage.TheserulesmaintainexistingfeaturesofPerl5.xregexes,butalsoallowBNF-styledefinitionofarecursivedescentparserviasub-rules. Theuseofregexesinstructuredinformationstandardsfordocumentanddatabasemodelingstartedinthe1960sandexpandedinthe1980swhenindustrystandardslikeISOSGML(precursoredbyANSI"GCA101-1983")consolidated.Thekernelofthestructurespecificationlanguagestandardsconsistsofregexes.ItsuseisevidentintheDTDelementgroupsyntax.Priortotheuseofregularexpressions,manysearchlanguagesallowedsimplewildcards,forexample"*"tomatchanysequenceofcharacters,and"?"tomatchasinglecharacter.Relicsofthiscanbefoundtodayintheglobsyntaxforfilenames,andintheSQLLIKEoperator. Startingin1997,PhilipHazeldevelopedPCRE(PerlCompatibleRegularExpressions),whichattemptstocloselymimicPerl'sregexfunctionalityandisusedbymanymoderntoolsincludingPHPandApacheHTTPServer.[citationneeded] Today,regexesarewidelysupportedinprogramminglanguages,textprocessingprograms(particularlylexers),advancedtexteditors,andsomeotherprograms.Regexsupportispartofthestandardlibraryofmanyprogramminglanguages,includingJavaandPython,andisbuiltintothesyntaxofothers,includingPerlandECMAScript.Implementationsofregexfunctionalityisoftencalledaregexengine,andanumberoflibrariesareavailableforreuse.Inthelate2010s,severalcompaniesstartedtoofferhardware,FPGA,[22]GPU[23]implementationsofPCREcompatibleregexenginesthatarefastercomparedtoCPUimplementations. Patterns[edit] Thephraseregularexpressions,orregexes,isoftenusedtomeanthespecific,standardtextualsyntaxforrepresentingpatternsformatchingtext,asdistinctfromthemathematicalnotationdescribedbelow.Eachcharacterinaregularexpression(thatis,eachcharacterinthestringdescribingitspattern)iseitherametacharacter,havingaspecialmeaning,oraregularcharacterthathasaliteralmeaning.Forexample,intheregexb.,'b'isaliteralcharacterthatmatchesjust'b',while'.'isametacharacterthatmatcheseverycharacterexceptanewline.Therefore,thisregexmatches,forexample,'b%',or'bx',or'b5'.Together,metacharactersandliteralcharacterscanbeusedtoidentifytextofagivenpatternorprocessanumberofinstancesofit.Patternmatchesmayvaryfromapreciseequalitytoaverygeneralsimilarity,ascontrolledbythemetacharacters.Forexample,.isaverygeneralpattern,[a-z](matchalllowercaselettersfrom'a'to'z')islessgeneralandbisaprecisepattern(matchesjust'b').Themetacharactersyntaxisdesignedspecificallytorepresentprescribedtargetsinaconciseandflexiblewaytodirecttheautomationoftextprocessingofavarietyofinputdata,inaformeasytotypeusingastandardASCIIkeyboard. Averysimplecaseofaregularexpressioninthissyntaxistolocateawordspelledtwodifferentwaysinatexteditor,theregularexpressionseriali[sz]ematchesboth"serialise"and"serialize".Wildcardcharactersalsoachievethis,butaremorelimitedinwhattheycanpattern,astheyhavefewermetacharactersandasimplelanguage-base. Theusualcontextofwildcardcharactersisinglobbingsimilarnamesinalistoffiles,whereasregexesareusuallyemployedinapplicationsthatpattern-matchtextstringsingeneral.Forexample,theregex^[\t]+|[\t]+$matchesexcesswhitespaceatthebeginningorendofaline.Anadvancedregularexpressionthatmatchesanynumeralis[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?. TranslatingtheKleenestar(s*means"zeroormoreofs") Aregexprocessortranslatesaregularexpressionintheabovesyntaxintoaninternalrepresentationthatcanbeexecutedandmatchedagainstastringrepresentingthetextbeingsearchedin.OnepossibleapproachistheThompson'sconstructionalgorithmtoconstructanondeterministicfiniteautomaton(NFA),whichisthenmadedeterministicandtheresultingdeterministicfiniteautomaton(DFA)isrunonthetargettextstringtorecognizesubstringsthatmatchtheregularexpression. ThepictureshowstheNFAschemeN(s*)obtainedfromtheregularexpressions*,wheresdenotesasimplerregularexpressioninturn,whichhasalreadybeenrecursivelytranslatedtotheNFAN(s). Basicconcepts[edit] Aregularexpression,oftencalledapattern,specifiesasetofstringsrequiredforaparticularpurpose.Asimplewaytospecifyafinitesetofstringsistolistitselementsormembers.However,thereareoftenmoreconciseways:forexample,thesetcontainingthethreestrings"Handel","Händel",and"Haendel"canbespecifiedbythepatternH(ä|ae?)ndel;wesaythatthispatternmatcheseachofthethreestrings.However,therecanbemanywaystowritearegularexpressionforthesamesetofstrings:forexample,(Hän|Han|Haen)delalsospecifiesthesamesetofthreestringsinthisexample. Mostformalismsprovidethefollowingoperationstoconstructregularexpressions. Boolean"or" Averticalbarseparatesalternatives.Forexample,gray|greycanmatch"gray"or"grey". Grouping Parenthesesareusedtodefinethescopeandprecedenceoftheoperators(amongotheruses).Forexample,gray|greyandgr(a|e)yareequivalentpatternswhichbothdescribethesetof"gray"or"grey". Quantification Aquantifierafteranelement(suchasatoken,character,orgroup)specifieshowmanytimestheprecedingelementisallowedtorepeat.Themostcommonquantifiersarethequestionmark?,theasterisk*(derivedfromtheKleenestar),andtheplussign+(Kleeneplus). ? Thequestionmarkindicateszerooroneoccurrencesoftheprecedingelement.Forexample,colou?rmatchesboth"color"and"colour". * Theasteriskindicateszeroormoreoccurrencesoftheprecedingelement.Forexample,ab*cmatches"ac","abc","abbc","abbbc",andsoon. + Theplussignindicatesoneormoreoccurrencesoftheprecedingelement.Forexample,ab+cmatches"abc","abbc","abbbc",andsoon,butnot"ac". {n}[24] Theprecedingitemismatchedexactlyntimes. {min,}[24] Theprecedingitemismatchedminormoretimes. {,max}[24] Theprecedingitemismatcheduptomaxtimes. {min,max}[24] Theprecedingitemismatchedatleastmintimes,butnotmorethanmaxtimes. Wildcard Thewildcard.matchesanycharacter.Forexample,a.bmatchesanystringthatcontainsan"a",andthenanycharacterandthen"b";anda.*bmatchesanystringthatcontainsan"a",andthenthecharacter"b"atsomelaterpoint. Theseconstructionscanbecombinedtoformarbitrarilycomplexexpressions,muchlikeonecanconstructarithmeticalexpressionsfromnumbersandtheoperations+,−,×,and÷. Theprecisesyntaxforregularexpressionsvariesamongtoolsandwithcontext;moredetailisgivenin§ Syntax. Formallanguagetheory[edit] Regularexpressionsdescriberegularlanguagesinformallanguagetheory.Theyhavethesameexpressivepowerasregulargrammars. Formaldefinition[edit] Regularexpressionsconsistofconstants,whichdenotesetsofstrings,andoperatorsymbols,whichdenoteoperationsoverthesesets.Thefollowingdefinitionisstandard,andfoundassuchinmosttextbooksonformallanguagetheory.[25][26]GivenafinitealphabetΣ,thefollowingconstantsaredefined asregularexpressions: (emptyset)∅denotingtheset∅. (emptystring)εdenotingthesetcontainingonlythe"empty"string,whichhasnocharactersatall. (literalcharacter)ainΣdenotingthesetcontainingonlythecharactera. GivenregularexpressionsRandS,thefollowingoperationsoverthemaredefined toproduceregularexpressions: (concatenation)(RS)denotesthesetofstringsthatcanbeobtainedbyconcatenatingastringacceptedbyRandastringacceptedbyS(inthatorder).Forexample,letRdenote{"ab","c"}andSdenote{"d","ef"}.Then,(RS)denotes{"abd","abef","cd","cef"}. (alternation)(R|S)denotesthesetunionofsetsdescribedbyRandS.Forexample,ifRdescribes{"ab","c"}andSdescribes{"ab","d","ef"},expression(R|S)describes{"ab","c","d","ef"}. (Kleenestar)(R*)denotesthesmallestsupersetofthesetdescribedbyRthatcontainsεandisclosedunderstringconcatenation.Thisisthesetofallstringsthatcanbemadebyconcatenatinganyfinitenumber(includingzero)ofstringsfromthesetdescribedbyR.Forexample,ifRdenotes{"0","1"},(R*)denotesthesetofallfinitebinarystrings(includingtheemptystring).IfRdenotes{"ab","c"},(R*)denotes{ε,"ab","c","abab","abc","cab","cc","ababab","abcab",...}. ToavoidparenthesesitisassumedthattheKleenestarhasthehighestpriority,thenconcatenationandthenalternation.Ifthereisnoambiguitythenparenthesesmaybeomitted.Forexample,(ab)ccanbewrittenasabc,anda|(b(c*))canbewrittenasa|bc*. Manytextbooksusethesymbols∪,+,or∨foralternationinsteadoftheverticalbar. Examples: a|b*denotes{ε,"a","b","bb","bbb",...} (a|b)*denotesthesetofallstringswithnosymbolsotherthan"a"and"b",includingtheemptystring:{ε,"a","b","aa","ab","ba","bb","aaa",...} ab*(c|ε)denotesthesetofstringsstartingwith"a",thenzeroormore"b"sandfinallyoptionallya"c":{"a","ac","ab","abc","abb","abbc",...} (0|(1(01*0)*1))*denotesthesetofbinarynumbersthataremultiplesof3:{ε,"0","00","11","000","011","110","0000","0011","0110","1001","1100","1111","00000",...} Expressivepowerandcompactness[edit] Theformaldefinitionofregularexpressionsisminimalonpurpose,andavoidsdefining?and+—thesecanbeexpressedasfollows:a+=aa*,anda?=(a|ε).Sometimesthecomplementoperatorisadded,togiveageneralizedregularexpression;hereRcmatchesallstringsoverΣ*thatdonotmatchR.Inprinciple,thecomplementoperatorisredundant,becauseitdoesn'tgrantanymoreexpressivepower.However,itcanmakearegularexpressionmuchmoreconcise—eliminatingasinglecomplementoperatorcancauseadoubleexponentialblow-upofitslength.[27][28][29] Regularexpressionsinthissensecanexpresstheregularlanguages,exactlytheclassoflanguagesacceptedbydeterministicfiniteautomata.Thereis,however,asignificantdifferenceincompactness.Someclassesofregularlanguagescanonlybedescribedbydeterministicfiniteautomatawhosesizegrowsexponentiallyinthesizeoftheshortestequivalentregularexpressions.Thestandardexamplehereisthelanguages Lkconsistingofallstringsoverthealphabet{a,b}whosekth-from-lastletterequals a.Ontheonehand,aregularexpressiondescribingL4isgivenby ( a ∣ b ) ∗ a ( a ∣ b ) ( a ∣ b ) ( a ∣ b ) {\displaystyle(a\midb)^{*}a(a\midb)(a\midb)(a\midb)} . GeneralizingthispatterntoLkgivestheexpression: ( a ∣ b ) ∗ a ( a ∣ b ) ( a ∣ b ) ⋯ ( a ∣ b ) ⏟ k − 1 times . {\displaystyle(a\midb)^{*}a\underbrace{(a\midb)(a\midb)\cdots(a\midb)}_{k-1{\text{times}}}.\,} Ontheotherhand,itisknownthateverydeterministicfiniteautomatonacceptingthelanguageLkmusthaveatleast2kstates.Luckily,thereisasimplemappingfromregularexpressionstothemoregeneralnondeterministicfiniteautomata(NFAs)thatdoesnotleadtosuchablowupinsize;forthisreasonNFAsareoftenusedasalternativerepresentationsofregularlanguages.NFAsareasimplevariationofthetype-3grammarsoftheChomskyhierarchy.[25] Intheoppositedirection,therearemanylanguageseasilydescribedbyaDFAthatarenoteasilydescribedbyaregularexpression.Forinstance,determiningthevalidityofagivenISBNrequirescomputingthemodulusoftheintegerbase11,andcanbeeasilyimplementedwithan11-stateDFA.However,aregularexpressiontoanswerthesameproblemofdivisibilityby11isatleastmultiplemegabytesinlength.[citationneeded] Givenaregularexpression,Thompson'sconstructionalgorithmcomputesanequivalentnondeterministicfiniteautomaton.AconversionintheoppositedirectionisachievedbyKleene'salgorithm. Finally,itisworthnotingthatmanyreal-world"regularexpression"enginesimplementfeaturesthatcannotbedescribedbytheregularexpressionsinthesenseofformallanguagetheory;rather,theyimplementregexes.Seebelowformoreonthis. Decidingequivalenceofregularexpressions[edit] Asseeninmanyoftheexamplesabove,thereismorethanonewaytoconstructaregularexpressiontoachievethesameresults. Itispossibletowriteanalgorithmthat,fortwogivenregularexpressions,decideswhetherthedescribedlanguagesareequal;thealgorithmreduceseachexpressiontoaminimaldeterministicfinitestatemachine,anddetermineswhethertheyareisomorphic(equivalent). AlgebraiclawsforregularexpressionscanbeobtainedusingamethodbyGischerwhichisbestexplainedalonganexample:Inordertocheckwhether(X+Y)*and(X*Y*)*denotethesameregularlanguage,forallregularexpressionsX,Y,itisnecessaryandsufficienttocheckwhethertheparticularregularexpressions(a+b)*and(a*b*)*denotethesamelanguageoverthealphabetΣ={a,b}.Moregenerally,anequationE=Fbetweenregular-expressiontermswithvariablesholdsif,andonlyif,itsinstantiationwithdifferentvariablesreplacedbydifferentsymbolconstantsholds.[30][31] EveryregularexpressioncanbewrittensolelyintermsoftheKleenestarandsetunions.Thisisasurprisinglydifficultproblem.Assimpleastheregularexpressionsare,thereisnomethodtosystematicallyrewritethemtosomenormalform.Thelackofaxiominthepastledtothestarheightproblem.In1991,DexterKozenaxiomatizedregularexpressionsasaKleenealgebra,usingequationalandHornclauseaxioms.[32] Alreadyin1964,Redkohadprovedthatnofinitesetofpurelyequationalaxiomscancharacterizethealgebraofregularlanguages.[33] Syntax[edit] Aregexpatternmatchesatargetstring.Thepatterniscomposedofasequenceofatoms.Anatomisasinglepointwithintheregexpatternwhichittriestomatchtothetargetstring.Thesimplestatomisaliteral,butgroupingpartsofthepatterntomatchanatomwillrequireusing( )asmetacharacters.Metacharactershelpform:atoms;quantifierstellinghowmanyatoms(andwhetheritisagreedyquantifierornot);alogicalORcharacter,whichoffersasetofalternatives,andalogicalNOTcharacter,whichnegatesanatom'sexistence;andbackreferencestorefertopreviousatomsofacompletingpatternofatoms.Amatchismade,notwhenalltheatomsofthestringarematched,butratherwhenallthepatternatomsintheregexhavematched.Theideaistomakeasmallpatternofcharactersstandforalargenumberofpossiblestrings,ratherthancompilingalargelistofalltheliteralpossibilities. Dependingontheregexprocessorthereareaboutfourteenmetacharacters,charactersthatmayormaynothavetheirliteralcharactermeaning,dependingoncontext,orwhethertheyare"escaped",i.e.precededbyanescapesequence,inthiscase,thebackslash\.ModernandPOSIXextendedregexesusemetacharactersmoreoftenthantheirliteralmeaning,sotoavoid"backslash-osis"orleaningtoothpicksyndromeitmakessensetohaveametacharacterescapetoaliteralmode;butstartingout,itmakesmoresensetohavethefourbracketingmetacharacters( )and{ }beprimarilyliteral,and"escape"thisusualmeaningtobecomemetacharacters.Commonstandardsimplementboth.Theusualmetacharactersare{}[]()^$.|*+?and\.TheusualcharactersthatbecomemetacharacterswhenescapedaredswDSWandN. Delimiters[edit] Whenenteringaregexinaprogramminglanguage,theymayberepresentedasausualstringliteral,henceusuallyquoted;thisiscommoninC,Java,andPythonforinstance,wheretheregexreisenteredas"re".However,theyareoftenwrittenwithslashesasdelimiters,asin/re/fortheregexre.Thisoriginatesined,where/istheeditorcommandforsearching,andanexpression/re/canbeusedtospecifyarangeoflines(matchingthepattern),whichcanbecombinedwithothercommandsoneitherside,mostfamouslyg/re/pasingrep("globalregexprint"),whichisincludedinmostUnix-basedoperatingsystems,suchasLinuxdistributions.Asimilarconventionisusedinsed,wheresearchandreplaceisgivenbys/re/replacement/andpatternscanbejoinedwithacommatospecifyarangeoflinesasin/re1/,/re2/.ThisnotationisparticularlywellknownduetoitsuseinPerl,whereitformspartofthesyntaxdistinctfromnormalstringliterals.Insomecases,suchassedandPerl,alternativedelimiterscanbeusedtoavoidcollisionwithcontents,andtoavoidhavingtoescapeoccurrencesofthedelimitercharacterinthecontents.Forexample,insedthecommands,/,X,willreplacea/withanX,usingcommasasdelimiters. Standards[edit] TheIEEEPOSIXstandardhasthreesetsofcompliance:BRE(BasicRegularExpressions),[34]ERE(ExtendedRegularExpressions),andSRE(SimpleRegularExpressions).SREisdeprecated,[35]infavorofBRE,asbothprovidebackwardcompatibility.ThesubsectionbelowcoveringthecharacterclassesappliestobothBREandERE. BREandEREworktogether.EREadds?,+,and|,anditremovestheneedtoescapethemetacharacters( )and{ },whicharerequiredinBRE.Furthermore,aslongasthePOSIXstandardsyntaxforregexesisadheredto,therecanbe,andoftenis,additionalsyntaxtoservespecific(yetPOSIXcompliant)applications.AlthoughPOSIX.2leavessomeimplementationspecificsundefined,BREandEREprovidea"standard"whichhassincebeenadoptedasthedefaultsyntaxofmanytools,wherethechoiceofBREorEREmodesisusuallyasupportedoption.Forexample,GNUgrephasthefollowingoptions:"grep-E"forERE,and"grep-G"forBRE(thedefault),and"grep-P"forPerlregexes. Perlregexeshavebecomeadefactostandard,havingarichandpowerfulsetofatomicexpressions.Perlhasno"basic"or"extended"levels.AsinPOSIXEREs,( )and{ }aretreatedasmetacharactersunlessescaped;othermetacharactersareknowntobeliteralorsymbolicbasedoncontextalone.Additionalfunctionalityincludeslazymatching,backreferences,namedcapturegroups,andrecursivepatterns. POSIXbasicandextended[edit] InthePOSIXstandard,BasicRegularSyntax(BRE)requiresthatthemetacharacters( )and{ }bedesignated\(\)and\{\},whereasExtendedRegularSyntax(ERE)doesnot. Metacharacter Description ^ Matchesthestartingpositionwithinthestring.Inline-basedtools,itmatchesthestartingpositionofanyline. . Matchesanysinglecharacter(manyapplicationsexcludenewlines,andexactlywhichcharactersareconsiderednewlinesisflavor-,character-encoding-,andplatform-specific,butitissafetoassumethatthelinefeedcharacterisincluded).WithinPOSIXbracketexpressions,thedotcharactermatchesaliteraldot.Forexample,a.cmatches"abc",etc.,but[a.c]matchesonly"a",".",or"c". [ ] Abracketexpression.Matchesasinglecharacterthatiscontainedwithinthebrackets.Forexample,[abc]matches"a","b",or"c".[a-z]specifiesarangewhichmatchesanylowercaseletterfrom"a"to"z".Theseformscanbemixed:[abcx-z]matches"a","b","c","x","y",or"z",asdoes[a-cx-z]. The-characteristreatedasaliteralcharacterifitisthelastorthefirst(afterthe^,ifpresent)characterwithinthebrackets:[abc-],[-abc].Notethatbackslashescapesarenotallowed.The]charactercanbeincludedinabracketexpressionifitisthefirst(afterthe^)character:[]abc]. [^ ] Matchesasinglecharacterthatisnotcontainedwithinthebrackets.Forexample,[^abc]matchesanycharacterotherthan"a","b",or"c".[^a-z]matchesanysinglecharacterthatisnotalowercaseletterfrom"a"to"z".Likewise,literalcharactersandrangescanbemixed. $ Matchestheendingpositionofthestringorthepositionjustbeforeastring-endingnewline.Inline-basedtools,itmatchestheendingpositionofanyline. () Definesamarkedsubexpression.Thestringmatchedwithintheparenthesescanberecalledlater(seethenextentry,\n).Amarkedsubexpressionisalsocalledablockorcapturinggroup.BREmoderequires\( \). \n Matcheswhatthenthmarkedsubexpressionmatched,wherenisadigitfrom1to9.ThisconstructisvaguelydefinedinthePOSIX.2standard.Sometoolsallowreferencingmorethanninecapturinggroups.Alsoknownasabackreference.backreferencesareonlysupportedinBREmode * Matchestheprecedingelementzeroormoretimes.Forexample,ab*cmatches"ac","abc","abbbc",etc.[xyz]*matches"","x","y","z","zx","zyx","xyzzy",andsoon.(ab)*matches"","ab","abab","ababab",andsoon. {m,n} Matchestheprecedingelementatleastmandnotmorethanntimes.Forexample,a{3,5}matchesonly"aaa","aaaa",and"aaaaa".Thisisnotfoundinafewolderinstancesofregexes.BREmoderequires\{m,n\}. Examples: .atmatchesanythree-characterstringendingwith"at",including"hat","cat","bat","4at","#at"and"at"(startingwithaspace). [hc]atmatches"hat"and"cat". [^b]atmatchesallstringsmatchedby.atexcept"bat". [^hc]atmatchesallstringsmatchedby.atotherthan"hat"and"cat". ^[hc]atmatches"hat"and"cat",butonlyatthebeginningofthestringorline. [hc]at$matches"hat"and"cat",butonlyattheendofthestringorline. \[.\]matchesanysinglecharactersurroundedby"["and"]"sincethebracketsareescaped,forexample:"[a]","[b]","[7]","[@]","[]]",and"[]"(bracketspacebracket). s.*matchessfollowedbyzeroormorecharacters,forexample:"s","saw","seed","s3w96.7",and"s6#h%(>>>mnmQ". POSIXextended[edit] ThemeaningofmetacharactersescapedwithabackslashisreversedforsomecharactersinthePOSIXExtendedRegularExpression(ERE)syntax.Withthissyntax,abackslashcausesthemetacharactertobetreatedasaliteralcharacter.So,forexample,\( \)isnow( )and\{ \}isnow{ }.Additionally,supportisremovedfor\nbackreferencesandthefollowingmetacharactersareadded: Metacharacter Description ? Matchestheprecedingelementzerooronetime.Forexample,ab?cmatchesonly"ac"or"abc". + Matchestheprecedingelementoneormoretimes.Forexample,ab+cmatches"abc","abbc","abbbc",andsoon,butnot"ac". | Thechoice(alsoknownasalternationorsetunion)operatormatcheseithertheexpressionbeforeortheexpressionaftertheoperator.Forexample,abc|defmatches"abc"or"def". Examples: [hc]?atmatches"at","hat",and"cat". [hc]*atmatches"at","hat","cat","hhat","chat","hcat","cchchat",andsoon. [hc]+atmatches"hat","cat","hhat","chat","hcat","cchchat",andsoon,butnot"at". cat|dogmatches"cat"or"dog". POSIXExtendedRegularExpressionscanoftenbeusedwithmodernUnixutilitiesbyincludingthecommandlineflag-E. Characterclasses[edit] Thecharacterclassisthemostbasicregexconceptafteraliteralmatch.Itmakesonesmallsequenceofcharactersmatchalargersetofcharacters.Forexample,[A-Z]couldstandforanyuppercaseletterintheEnglishalphabet,and\dcouldmeananydigit.CharacterclassesapplytobothPOSIXlevels. Whenspecifyingarangeofcharacters,suchas[a-Z](i.e.lowercaseatouppercaseZ),thecomputer'slocalesettingsdeterminethecontentsbythenumericorderingofthecharacterencoding.Theycouldstoredigitsinthatsequence,ortheorderingcouldbeabc…zABC…Z,oraAbBcC…zZ.SothePOSIXstandarddefinesacharacterclass,whichwillbeknownbytheregexprocessorinstalled.Thosedefinitionsareinthefollowingtable: Description POSIX Non-standard Perl/Tcl Vim Java ASCII ASCIIcharacters [:ascii:][36] \p{ASCII} [\x00-\x7F] Alphanumericcharacters [:alnum:] \p{Alnum} [A-Za-z0-9] Alphanumericcharactersplus"_" [:word:][36] \w \w \w [A-Za-z0-9_] Non-wordcharacters \W \W \W [^A-Za-z0-9_] Alphabeticcharacters [:alpha:] \a \p{Alpha} [A-Za-z] Spaceandtab [:blank:] \s \p{Blank} [\t] Wordboundaries \b \ \b (?<=\W)(?=\w)|(?<=\w)(?=\W) Non-wordboundaries \B (?<=\W)(?=\W)|(?<=\w)(?=\w) Controlcharacters [:cntrl:] \p{Cntrl} [\x00-\x1F\x7F] Digits [:digit:] \d \d \p{Digit}or\d [0-9] Non-digits \D \D \D [^0-9] Visiblecharacters [:graph:] \p{Graph} [\x21-\x7E] Lowercaseletters [:lower:] \l \p{Lower} [a-z] Visiblecharactersandthespacecharacter [:print:] \p \p{Print} [\x20-\x7E] Punctuationcharacters [:punct:] \p{Punct} [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-] Whitespacecharacters [:space:] \s \_s \p{Space}or\s [\t\r\n\v\f] Non-whitespacecharacters \S \S \S [^\t\r\n\v\f] Uppercaseletters [:upper:] \u \p{Upper} [A-Z] Hexadecimaldigits [:xdigit:] \x \p{XDigit} [A-Fa-f0-9] POSIXcharacterclassescanonlybeusedwithinbracketexpressions.Forexample,[[:upper:]ab]matchestheuppercaselettersandlowercase"a"and"b". Anadditionalnon-POSIXclassunderstoodbysometoolsis[:word:],whichisusuallydefinedas[:alnum:]plusunderscore.Thisreflectsthefactthatinmanyprogramminglanguagesthesearethecharactersthatmaybeusedinidentifiers.TheeditorVimfurtherdistinguisheswordandword-headclasses(usingthenotation\wand\h)sinceinmanyprogramminglanguagesthecharactersthatcanbeginanidentifierarenotthesameasthosethatcanoccurinotherpositions:numbersaregenerallyexcluded,soanidentifierwouldlooklike\h\w*or[[:alpha:]_][[:alnum:]_]*inPOSIXnotation. NotethatwhatthePOSIXregexstandardscallcharacterclassesarecommonlyreferredtoasPOSIXcharacterclassesinotherregexflavorswhichsupportthem.Withmostotherregexflavors,thetermcharacterclassisusedtodescribewhatPOSIXcallsbracketexpressions. PerlandPCRE[edit] Seealso:PerlCompatibleRegularExpressions Becauseofitsexpressivepowerand(relative)easeofreading,manyotherutilitiesandprogramminglanguageshaveadoptedsyntaxsimilartoPerl's—forexample,Java,JavaScript,Julia,Python,Ruby,Qt,Microsoft's.NETFramework,andXMLSchema.SomelanguagesandtoolssuchasBoostandPHPsupportmultipleregexflavors.Perl-derivativeregeximplementationsarenotidenticalandusuallyimplementasubsetoffeaturesfoundinPerl5.0,releasedin1994.Perlsometimesdoesincorporatefeaturesinitiallyfoundinotherlanguages.Forexample,Perl5.10implementssyntacticextensionsoriginallydevelopedinPCREandPython.[37] Lazymatching[edit] InPythonandsomeotherimplementations(e.g.Java),thethreecommonquantifiers(*,+and?)aregreedybydefaultbecausetheymatchasmanycharactersaspossible.[38]Theregex".+"(includingthedouble-quotes)appliedtothestring "Ganymede,"hecontinued,"isthelargestmoonintheSolarSystem." matchestheentireline(becausetheentirelinebeginsandendswithadouble-quote)insteadofmatchingonlythefirstpart,"Ganymede,".Theaforementionedquantifiersmay,however,bemadelazyorminimalorreluctant,matchingasfewcharactersaspossible,byappendingaquestionmark:".+?"matchesonly"Ganymede,".[38] Possessivematching[edit] InJava,quantifiersmaybemadepossessivebyappendingaplussign,whichdisablesbackingoff(inabacktrackingengine),evenifdoingsowouldallowtheoverallmatchtosucceed:[39]Whiletheregex".*"appliedtothestring "Ganymede,"hecontinued,"isthelargestmoonintheSolarSystem." matchestheentireline,theregex".*+"doesnotmatchatall,because.*+consumestheentireinput,includingthefinal".Thus,possessivequantifiersaremostusefulwithnegatedcharacterclasses,e.g."[^"]*+",whichmatches"Ganymede,"whenappliedtothesamestring. Anothercommonextensionservingthesamefunctionisatomicgrouping,whichdisablesbacktrackingforaparenthesizedgroup.Thetypicalsyntaxis(?>group).Forexample,while^(wi|w)i$matchesbothwiandwii,^(?>wi|w)i$onlymatcheswiibecausetheengineisforbiddenfrombacktrackingandtrywithsettingthegroupas"w".[40] Possessivequantifiersareeasiertoimplementthangreedyandlazyquantifiers,andaretypicallymoreefficientatruntime.[39] Patternsfornon-regularlanguages[edit] Manyfeaturesfoundinvirtuallyallmodernregularexpressionlibrariesprovideanexpressivepowerthatexceedstheregularlanguages.Forexample,manyimplementationsallowgroupingsubexpressionswithparenthesesandrecallingthevaluetheymatchinthesameexpression(backreferences).Thismeansthat,amongotherthings,apatterncanmatchstringsofrepeatedwordslike"papa"or"WikiWiki",calledsquaresinformallanguagetheory.Thepatternforthesestringsis(.+)\1. Thelanguageofsquaresisnotregular,norisitcontext-free,duetothepumpinglemma.However,patternmatchingwithanunboundednumberofbackreferences,assupportedbynumerousmoderntools,isstillcontextsensitive.[41]ThegeneralproblemofmatchinganynumberofbackreferencesisNP-complete,growingexponentiallybythenumberofbackrefgroupsused.[42] However,manytools,libraries,andenginesthatprovidesuchconstructionsstillusethetermregularexpressionfortheirpatterns.Thishasledtoanomenclaturewherethetermregularexpressionhasdifferentmeaningsinformallanguagetheoryandpatternmatching.Forthisreason,somepeoplehavetakentousingthetermregex,regexp,orsimplypatterntodescribethelatter.LarryWall,authorofthePerlprogramminglanguage,writesinanessayaboutthedesignofRaku: "Regularexpressions"[…]areonlymarginallyrelatedtorealregularexpressions.Nevertheless,thetermhasgrownwiththecapabilitiesofourpatternmatchingengines,soI'mnotgoingtotrytofightlinguisticnecessityhere.Iwill,however,generallycallthem"regexes"(or"regexen",whenI'minanAnglo-Saxonmood).[21] Assertion Lookbehind Lookahead Positive (?<=pattern) (?=pattern) Negative (?=5.\n"; } Output: HelloWorld haslength>=5. () Groupsaseriesofpatternelementstoasingleelement.Whenyoumatchapatternwithinparentheses,youcanuseanyof$1,$2,...latertorefertothepreviouslymatchedpattern.Someimplementationsmayuseabackslashnotationinstead,like\1,\2. $string1="HelloWorld\n"; if($string1=~m/(H..).(o..)/){ print"Wematched'$1'and'$2'.\n"; } Output: Wematched'Hel'and'oW'. + Matchestheprecedingpatternelementoneormoretimes. $string1="HelloWorld\n"; if($string1=~m/l+/){ print"Thereareoneormoreconsecutiveletter\"l\"'sin$string1.\n"; } Output: Thereareoneormoreconsecutiveletter"l"'sinHelloWorld. ? Matchestheprecedingpatternelementzerooronetime. $string1="HelloWorld\n"; if($string1=~m/H.?e/){ print"Thereisan'H'anda'e'separatedby"; print"0-1characters(e.g.,HeHueHee).\n"; } Output: Thereisan'H'anda'e'separatedby0-1characters(e.g.,HeHueHee). ? Modifiesthe*,+,?or{M,N}'dregexthatcomesbeforetomatchasfewtimesaspossible. $string1="HelloWorld\n"; if($string1=~m/(l.+?o)/){ print"Thenon-greedymatchwith'l'followedbyoneor"; print"morecharactersis'llo'ratherthan'lloWo'.\n"; } Output: Thenon-greedymatchwith'l'followedbyoneormorecharactersis'llo'ratherthan'lloWo'. * Matchestheprecedingpatternelementzeroormoretimes. $string1="HelloWorld\n"; if($string1=~m/el*o/){ print"Thereisan'e'followedbyzerotomany"; print"'l'followedby'o'(e.g.,eo,elo,ello,elllo).\n"; } Output: Thereisan'e'followedbyzerotomany'l'followedby'o'(e.g.,eo,elo,ello,elllo). {M,N} DenotestheminimumMandthemaximumNmatchcount.NcanbeomittedandMcanbe0:{M}matches"exactly"Mtimes;{M,}matches"atleast"Mtimes;{0,N}matches"atmost"Ntimes.x*y+z?isthusequivalenttox{0,}y{1,}z{0,1}. $string1="HelloWorld\n"; if($string1=~m/l{1,2}/){ print"Thereexistsasubstringwithatleast1"; print"andatmost2l'sin$string1\n"; } Output: Thereexistsasubstringwithatleast1andatmost2l'sinHelloWorld […] Denotesasetofpossiblecharactermatches. $string1="HelloWorld\n"; if($string1=~m/[aeiou]+/){ print"$string1containsoneormorevowels.\n"; } Output: HelloWorld containsoneormorevowels. | Separatesalternatepossibilities. $string1="HelloWorld\n"; if($string1=~m/(Hello|Hi|Pogo)/){ print"$string1containsatleastoneofHello,Hi,orPogo."; } Output: HelloWorld containsatleastoneofHello,Hi,orPogo. \b Matchesazero-widthboundarybetweenaword-classcharacter(seenext)andeitheranon-wordclasscharacteroranedge;sameas (^\w|\w$|\W\w|\w\W). $string1="HelloWorld\n"; if($string1=~m/llo\b/){ print"Thereisawordthatendswith'llo'.\n"; } Output: Thereisawordthatendswith'llo'. \w Matchesanalphanumericcharacter,including"_";sameas[A-Za-z0-9_]inASCII,and [\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}] inUnicode,[52]wheretheAlphabeticpropertycontainsmorethanLatinletters,andtheDecimal_NumberpropertycontainsmorethanArabdigits. $string1="HelloWorld\n"; if($string1=~m/\w/){ print"Thereisatleastonealphanumeric"; print"characterin$string1(A-Z,a-z,0-9,_).\n"; } Output: ThereisatleastonealphanumericcharacterinHelloWorld (A-Z,a-z,0-9,_). \W Matchesanon-alphanumericcharacter,excluding"_";sameas[^A-Za-z0-9_]inASCII,and [^\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}] inUnicode. $string1="HelloWorld\n"; if($string1=~m/\W/){ print"ThespacebetweenHelloand"; print"Worldisnotalphanumeric.\n"; } Output: ThespacebetweenHelloandWorldisnotalphanumeric. \s Matchesawhitespacecharacter,whichinASCIIaretab,linefeed,formfeed,carriagereturn,andspace;inUnicode,alsomatchesno-breakspaces,nextline,andthevariable-widthspaces(amongstothers). $string1="HelloWorld\n"; if($string1=~m/\s.*\s/){ print"In$string1thereareTWOwhitespacecharacters,whichmay"; print"beseparatedbyothercharacters.\n"; } Output: InHelloWorld thereareTWOwhitespacecharacters,whichmaybeseparatedbyothercharacters. \S Matchesanythingbutawhitespace. $string1="HelloWorld\n"; if($string1=~m/\S.*\S/){ print"In$string1thereareTWOnon-whitespacecharacters,which"; print"maybeseparatedbyothercharacters.\n"; } Output: InHelloWorld thereareTWOnon-whitespacecharacters,whichmaybeseparatedbyothercharacters. \d Matchesadigit;sameas[0-9]inASCII;inUnicode,sameasthe\p{Digit}or\p{GC=Decimal_Number}property,whichitselfthesameasthe\p{Numeric_Type=Decimal}property. $string1="99bottlesofbeeronthewall."; if($string1=~m/(\d+)/){ print"$1isthefirstnumberin'$string1'\n"; } Output: 99isthefirstnumberin'99bottlesofbeeronthewall.' \D Matchesanon-digit;sameas[^0-9]inASCIIor\P{Digit}inUnicode. $string1="HelloWorld\n"; if($string1=~m/\D/){ print"Thereisatleastonecharacterin$string1"; print"thatisnotadigit.\n"; } Output: ThereisatleastonecharacterinHelloWorld thatisnotadigit. ^ Matchesthebeginningofalineorstring. $string1="HelloWorld\n"; if($string1=~m/^He/){ print"$string1startswiththecharacters'He'.\n"; } Output: HelloWorld startswiththecharacters'He'. $ Matchestheendofalineorstring. $string1="HelloWorld\n"; if($string1=~m/rld$/){ print"$string1isalineorstring"; print"thatendswith'rld'.\n"; } Output: HelloWorld isalineorstringthatendswith'rld'. \A Matchesthebeginningofastring(butnotaninternalline). $string1="Hello\nWorld\n"; if($string1=~m/\AH/){ print"$string1isastring"; print"thatstartswith'H'.\n"; } Output: Hello World isastringthatstartswith'H'. \z Matchestheendofastring(butnotaninternalline).[57] $string1="Hello\nWorld\n"; if($string1=~m/d\n\z/){ print"$string1isastring"; print"thatendswith'd\\n'.\n"; } Output: Hello World isastringthatendswith'd\n'. [^…] Matcheseverycharacterexcepttheonesinsidebrackets. $string1="HelloWorld\n"; if($string1=~m/[^abc]/){ print"$string1containsacharacterotherthan"; print"a,b,andc.\n"; } Output: HelloWorld containsacharacterotherthana,b,andc. Induction[edit] Mainarticle:Inductionofregularlanguages Regularexpressionscanoftenbecreated("induced"or"learned")basedonasetofexamplestrings.Thisisknownastheinductionofregularlanguagesandispartofthegeneralproblemofgrammarinductionincomputationallearningtheory.Formally,givenexamplesofstringsinaregularlanguage,andperhapsalsogivenexamplesofstringsnotinthatregularlanguage,itispossibletoinduceagrammarforthelanguage,i.e.,aregularexpressionthatgeneratesthatlanguage.Notallregularlanguagescanbeinducedinthisway(seelanguageidentificationinthelimit),butmanycan.Forexample,thesetofexamples{1,10,100},andnegativeset(ofcounterexamples){11,1001,101,0}canbeusedtoinducetheregularexpression1⋅0*(1followedbyzeroormore0s). Seealso[edit] Comparisonofregular-expressionengines ExtendedBackus–Naurform Matchingwildcards Regulartreegrammar Thompson'sconstruction–convertsaregularexpressionintoanequivalentnondeterministicfiniteautomaton(NFA) Notes[edit] ^Goyvaerts,Jan."RegularExpressionTutorial-LearnHowtoUseRegularExpressions".www.regular-expressions.info.Archivedfromtheoriginalon2016-11-01.Retrieved2016-10-31. ^Mitkov,Ruslan(2003).TheOxfordHandbookofComputationalLinguistics.OxfordUniversityPress.p. 754.ISBN 978-0-19-927634-9.Archivedfromtheoriginalon2017-02-28.Retrieved2016-07-25. ^Lawson,MarkV.(17September2003).FiniteAutomata.CRCPress.pp. 98–100.ISBN 978-1-58488-255-8.Archivedfromtheoriginalon27February2017.Retrieved25July2016. ^"re—Regularexpressionoperations—Python3.10.4documentation".docs.python.org.Retrieved2022-04-27. ^"regex(3)-Linuxmanualpage".man7.org.Retrieved2022-04-27. ^"Regularexpressionslibrary-cppreference.com".en.cppreference.com.Retrieved2022-04-27. ^"Pattern(JavaPlatformSE7)".docs.oracle.com.Retrieved2022-04-27. ^"Regularexpressions-JavaScript|MDN".developer.mozilla.org.Retrieved2022-04-27. ^Kleene1951. ^Leung,Hing(16September2010)."RegularLanguagesandFiniteAutomata"(PDF).NewMexicoStateUniversity.Archivedfromtheoriginal(PDF)on5December2013.Retrieved13August2019.TheconceptofregulareventswasintroducedbyKleeneviathedefinitionofregularexpressions. ^abThompson1968. ^abJohnsonetal.1968. ^Kernighan,Brian(2007-08-08)."ARegularExpressionsMatcher".BeautifulCode.O'ReillyMedia.pp. 1–2.ISBN 978-0-596-51004-6.Archivedfromtheoriginalon2020-10-07.Retrieved2013-05-15. ^Ritchie,DennisM."AnincompletehistoryoftheQEDTextEditor".Archivedfromtheoriginalon1999-02-21.Retrieved9October2013. ^abAho&Ullman1992,10.11BibliographicNotesforChapter10,p.589. ^Aycock2003,2.JITCompilationTechniques,2.1Genesis,p.98.sfnerror:notarget:CITEREFAycock2003(help) ^Raymond,EricS.citingDennisRitchie(2003)."JargonFile4.4.7:grep".Archivedfromtheoriginalon2011-06-05.Retrieved2009-02-17. ^"NewRegularExpressionFeaturesinTcl8.1".Archivedfromtheoriginalon2020-10-07.Retrieved2013-10-11. ^"PostgreSQL9.3.1Documentation:9.7.PatternMatching".Archivedfromtheoriginalon2020-10-07.Retrieved2013-10-12. ^Wall,LarryandthePerl5developmentteam(2006)."perlre:Perlregularexpressions".Archivedfromtheoriginalon2009-12-31.Retrieved2006-10-10. ^abWall(2002) ^"GROVF|BigDataAnalyticsAcceleration".grovf.com.Archivedfromtheoriginalon2020-10-07.Retrieved2019-10-22. ^"CUDAgrep".bkase.github.io.Archivedfromtheoriginalon2020-10-07.Retrieved2019-10-22. ^abcdgrep(1)manpage ^abHopcroft,Motwani&Ullman(2000) ^Sipser(1998) ^Gelade&Neven(2008,p. 332,Thm.4.1) ^Gruber&Holzer(2008) ^BasedonGelade&Neven(2008),aregularexpressionoflengthabout850suchthatitscomplementhasalengthabout232canbefoundatFile:RegexComplementBlowup.png. ^JayL.Gischer(1984).(Titleunknown)(TechnicalReport).StanfordUniv.,Dept.ofComp.Sc. ^JohnE.Hopcroft;RajeevMotwani&JeffreyD.Ullman(2003).IntroductiontoAutomataTheory,Languages,andComputation.UpperSaddleRiver/NJ:AddisonWesley.ISBN 978-0-201-44124-6.Here:Sect.3.4.6,p.117-120.—Thispropertyneednotholdforextendedregularexpressions,eveniftheydescribenolargerclassthanregularlanguages;cf.p.121. ^Kozen(1991)[page needed] ^V.N.Redko(1964)."Ondefiningrelationsforthealgebraofregularevents".UkrainskiiMatematicheskiiZhurnal.16(1):120–126.Archivedfromtheoriginalon2018-03-29.Retrieved2018-03-28.(InRussian) ^ISO/IEC9945-2:1993Informationtechnology–PortableOperatingSystemInterface(POSIX)–Part2:ShellandUtilities,successivelyrevisedasISO/IEC9945-2:2002Informationtechnology–PortableOperatingSystemInterface(POSIX)–Part2:SystemInterfaces,ISO/IEC9945-2:2003,andcurrentlyISO/IEC/IEEE9945:2009Informationtechnology–PortableOperatingSystemInterface(POSIX®)BaseSpecifications,Issue7 ^TheSingleUnixSpecification(Version2) ^ab"33.3.1.2CharacterClasses—Emacslispmanual—Version25.1".gnu.org.2016.Archivedfromtheoriginalon2020-10-07.Retrieved2017-04-13. ^"PerlRegularExpressionDocumentation".perldoc.perl.org.ArchivedfromtheoriginalonDecember31,2009.RetrievedJanuary8,2012. ^ab"RegularExpressionSyntax".Python3.5.0documentation.PythonSoftwareFoundation.Archivedfromtheoriginalon18July2018.Retrieved10October2015. ^ab"Essentialclasses:RegularExpressions:Quantifiers:DifferencesAmongGreedy,Reluctant,andPossessiveQuantifiers".TheJavaTutorials.Oracle.Archivedfromtheoriginalon7October2020.Retrieved23December2016. ^"AtomicGrouping".RegexTutorial.Archivedfromtheoriginalon7October2020.Retrieved24November2019. ^CezarCâmpeanu;KaiSalomaa&ShengYu(Dec2003)."AFormalStudyofPracticalRegularExpressions".InternationalJournalofFoundationsofComputerScience.14(6):1007–1018.doi:10.1142/S012905410300214X.Archivedfromtheoriginalon2015-07-04.Retrieved2015-07-03.Theorem3(p.9) ^"PerlRegularExpressionMatchingisNP-Hard".perl.plover.com.Archivedfromtheoriginalon2020-10-07.Retrieved2019-11-21. ^WanderingLogic."Howtosimulatelookaheadsandlookbehindsinfinitestateautomata?".ComputerScienceStackExchange.Archivedfromtheoriginalon7October2020.Retrieved24November2019. ^Cox(2007) ^Laurikari(2009) ^"gnulib/lib/dfa.c".Archivedfromtheoriginalon2021-08-18.Retrieved2022-02-12.Ifthescannerdetectsatransitiononbackref,itreturnsakindof"semi-success"indicatingthatthematchwillhavetobeverifiedwithabacktrackingmatcher. ^Kearns,Steven(August2013)."SublinearMatchingWithFiniteAutomataUsingReverseSuffixScanning".arXiv:1308.3822[cs.DS]. ^Navarro,Gonzalo(10November2001)."NR‐grep:afastandflexiblepattern‐matchingtool"(PDF).Software:PracticeandExperience.31(13):1265–1312.doi:10.1002/spe.411.S2CID 3175806.Archived(PDF)fromtheoriginalon7October2020.Retrieved21November2019. ^"travisdowns/polyregex".GitHub.5July2019.Archivedfromtheoriginalon14September2020.Retrieved21November2019. ^Schmid,MarkusL.(March2019)."RegularExpressionswithBackreferences:Polynomial-TimeMatchingTechniques".arXiv:1903.05896[cs.FL]. ^"Vimdocumentation:pattern".Vimdoc.sourceforge.net.Archivedfromtheoriginalon2020-10-07.Retrieved2013-09-25. ^ab"UTS#18onUnicodeRegularExpressions,AnnexA:CharacterBlocks".Archivedfromtheoriginalon2020-10-07.Retrieved2010-02-05. ^Horowitz,Bradley(24October2011)."Afallsweep".GoogleBlog.Archivedfromtheoriginalon21October2018.Retrieved4May2019. ^Thecharacter'm'isnotalwaysrequiredtospecifyaPerlmatchoperation.Forexample,m/[^abc]/couldalsoberenderedas/[^abc]/.The'm'isonlynecessaryiftheuserwishestospecifyamatchoperationwithoutusingaforward-slashastheregexdelimiter.Sometimesitisusefultospecifyanalternateregexdelimiterinordertoavoid"delimitercollision".See'perldocperlreArchived2009-12-31attheWaybackMachine'formoredetails. ^E.g.,seeJavainaNutshell,p.213;PythonScriptingforComputationalScience,p.320;ProgrammingPHP,p.106. ^NotethatalltheifstatementsreturnaTRUEvalue ^Conway,Damian(2005)."RegularExpressions,EndofString".PerlBestPractices.O'Reilly.p. 240.ISBN 978-0-596-00173-5.Archivedfromtheoriginalon2020-10-07.Retrieved2017-09-10. References[edit] Aho,AlfredV.(1990).vanLeeuwen,Jan(ed.).Algorithmsforfindingpatternsinstrings.HandbookofTheoreticalComputerScience,volumeA:AlgorithmsandComplexity.TheMITPress.pp. 255–300. Aho,AlfredV.;Ullman,JeffreyD.(1992)."Chapter10.Patterns,Automata,andRegularExpressions"(PDF).FoundationsofComputerScience.Archivedfromtheoriginalon2020-10-07.Retrieved2013-12-14. "RegularExpressions".TheSingleUNIXSpecification,Version2.TheOpenGroup.1997.Archivedfromtheoriginalon2020-10-07.Retrieved2011-12-13. "Chapter9:RegularExpressions".TheOpenGroupBaseSpecifications.TheOpenGroup(6).2004.IEEEStd1003.1,2004Edition.Archivedfromtheoriginalon2011-12-02.Retrieved2011-12-13. Cox,Russ(2007)."RegularExpressionMatchingCanBeSimpleandFast".Archivedfromtheoriginalon2010-01-01.Retrieved2008-04-27. Forta,Ben(2004).SamsTeachYourselfRegularExpressionsin10Minutes.Sams.ISBN 978-0-672-32566-3. Friedl,JeffreyE.F.(2002).MasteringRegularExpressions.O'Reilly.ISBN 978-0-596-00289-3.Archivedfromtheoriginalon2005-08-30.Retrieved2005-04-26. Gelade,Wouter;Neven,Frank(2008).SuccinctnessoftheComplementandIntersectionofRegularExpressions.Proceedingsofthe25thInternationalSymposiumonTheoreticalAspectsofComputerScience(STACS2008).pp. 325–336.arXiv:0802.2869.Archivedfromtheoriginalon2011-07-18.Retrieved2009-06-15. Goyvaerts,Jan;Levithan,Steven(2009).RegularExpressionsCookbook.[O'reilly].ISBN 978-0-596-52068-7. Gruber,Hermann;Holzer,Markus(2008).FiniteAutomata,DigraphConnectivity,andRegularExpressionSize(PDF).Proceedingsofthe35thInternationalColloquiumonAutomata,LanguagesandProgramming(ICALP2008).LectureNotesinComputerScience.Vol. 5126.pp. 39–50.doi:10.1007/978-3-540-70583-3_4.ISBN 978-3-540-70582-6.Archived(PDF)fromtheoriginalon2011-07-11.Retrieved2011-02-03. Habibi,Mehran(2004).RealWorldRegularExpressionswithJava1.4.Springer.ISBN 978-1-59059-107-9. Hopcroft,JohnE.;Motwani,Rajeev;Ullman,JeffreyD.(2000).IntroductiontoAutomataTheory,Languages,andComputation(2nd ed.).Addison-Wesley. Johnson,WalterL.;Porter,JamesH.;Ackley,StephanieI.;Ross,DouglasT.(1968)."Automaticgenerationofefficientlexicalprocessorsusingfinitestatetechniques".CommunicationsoftheACM.11(12):805–813.doi:10.1145/364175.364185.S2CID 17253809. Kleene,StephenC.(1951).Shannon,ClaudeE.;McCarthy,John(eds.).RepresentationofEventsinNerveNetsandFiniteAutomata(PDF).AutomataStudies.PrincetonUniversityPress.pp. 3–42.Archived(PDF)fromtheoriginalon2020-10-07.Retrieved2017-12-10. Kozen,Dexter(1991)."ACompletenessTheoremforKleeneAlgebrasandtheAlgebraofRegularEvents".[1991]ProceedingsSixthAnnualIEEESymposiumonLogicinComputerScience.Proceedingsofthe6thAnnualIEEESymposiumonLogicinComputerScience(LICS1991).pp. 214–225.doi:10.1109/LICS.1991.151646.hdl:1813/6963.ISBN 978-0-8186-2230-4.S2CID 19875225. Laurikari,Ville(2009)."TRElibrary0.7.6".Archivedfromtheoriginalon2010-07-14.Retrieved2009-04-01. Liger,François;McQueen,Craig;Wilton,Paul(2002).VisualBasic.NETTextManipulationHandbook.WroxPress.ISBN 978-1-86100-730-8. Sipser,Michael(1998)."Chapter1:RegularLanguages".IntroductiontotheTheoryofComputation.PWSPublishing.pp. 31–90.ISBN 978-0-534-94728-6. Stubblebine,Tony(2003).RegularExpressionPocketReference.O'Reilly.ISBN 978-0-596-00415-6. Thompson,Ken(1968)."ProgrammingTechniques:Regularexpressionsearchalgorithm".CommunicationsoftheACM.11(6):419–422.doi:10.1145/363347.363387.S2CID 21260384. Wall,Larry(2002)."Apocalypse5:PatternMatching".Archivedfromtheoriginalon2010-01-12.Retrieved2006-10-11. Externallinks[edit] Wikibookshasabookonthetopicof:RegularExpressions TheWikibookRProgramminghasapageonthetopicof:TextProcessing LookupregularexpressioninWiktionary,thefreedictionary. MediarelatedtoRegexatWikimediaCommons RegularExpressionsatCurlie ISO/IEC9945-2:1993Informationtechnology–PortableOperatingSystemInterface(POSIX)–Part2:ShellandUtilities ISO/IEC9945-2:2002Informationtechnology–PortableOperatingSystemInterface(POSIX)–Part2:SystemInterfaces ISO/IEC9945-2:2003Informationtechnology–PortableOperatingSystemInterface(POSIX)–Part2:SystemInterfaces ISO/IEC/IEEE9945:2009Informationtechnology–PortableOperatingSystemInterface(POSIX®)BaseSpecifications,Issue7 RegularExpression,IEEEStd1003.1-2017,OpenGroup vteAutomatatheory:formallanguagesandformalgrammarsChomskyhierarchyGrammarsLanguagesAbstractmachines Type-0 — Type-1 — — — — — Type-2 — — Type-3 — — Unrestricted (nocommonname) Context-sensitive Positiverangeconcatenation Indexed — Linearcontext-freerewritingsystems Tree-adjoining Context-free Deterministiccontext-free Visiblypushdown Regular — Non-recursive Recursivelyenumerable Decidable Context-sensitive Positiverangeconcatenation* Indexed* — Linearcontext-freerewritinglanguage Tree-adjoining Context-free Deterministiccontext-free Visiblypushdown Regular Star-free Finite Turingmachine Decider Linear-bounded PTIMETuringMachine Nestedstack Threadautomaton restrictedTreestackautomaton Embeddedpushdown Nondeterministicpushdown Deterministicpushdown Visiblypushdown Finite Counter-free(withaperiodicfinitemonoid) Acyclicfinite Eachcategoryoflanguages,exceptthosemarkedbya*,isapropersubsetofthecategorydirectlyaboveit.Anylanguageineachcategoryisgeneratedbyagrammarandbyanautomatoninthecategoryinthesameline. vteStringsStringmetric Approximatestringmatching Bitapalgorithm Damerau–Levenshteindistance Editdistance GestaltPatternMatching Hammingdistance Jaro–Winklerdistance Leedistance Levenshteinautomaton Levenshteindistance Wagner–Fischeralgorithm String-searchingalgorithm Apostolico–Giancarloalgorithm Boyer–Moorestring-searchalgorithm Boyer–Moore–Horspoolalgorithm Knuth–Morris–Prattalgorithm Rabin–Karpalgorithm Multiplestringsearching Aho–Corasick Commentz-Walteralgorithm Regularexpression Comparisonofregular-expressionengines Regulargrammar Thompson'sconstruction Nondeterministicfiniteautomaton Sequencealignment Hirschberg'salgorithm Needleman–Wunschalgorithm Smith–Watermanalgorithm Datastructure DAFSA Suffixarray Suffixautomaton Suffixtree Generalizedsuffixtree Rope Ternarysearchtree Trie Other Parsing Patternmatching Compressedpatternmatching Longestcommonsubsequence Longestcommonsubstring Sequentialpatternmining Sorting Authoritycontrol:Nationallibraries France(data) Germany UnitedStates Japan CzechRepublic Retrievedfrom"https://en.wikipedia.org/w/index.php?title=Regular_expression&oldid=1095256273" Categories:Automata(computation)FormallanguagesPatternmatchingProgrammingconstructsRegularexpressions1951introductionsHiddencategories:HarvandSfnno-targeterrorsWikipediaarticlesneedingpagenumbercitationsfromFebruary2015WebarchivetemplatewaybacklinksArticleswithshortdescriptionShortdescriptionmatchesWikidataAllarticleswithunsourcedstatementsArticleswithunsourcedstatementsfromJune2022ArticleswithunsourcedstatementsfromFebruary2018Articlescontainingpotentiallydatedstatementsfrom2016AllarticlescontainingpotentiallydatedstatementsCommonscategorylinkisonWikidataArticleswithCurlielinksArticleswithBNFidentifiersArticleswithGNDidentifiersArticleswithLCCNidentifiersArticleswithNDLidentifiersArticleswithNKCidentifiersArticleswithexamplecode Navigationmenu Personaltools NotloggedinTalkContributionsCreateaccountLogin Namespaces ArticleTalk English Views ReadEditViewhistory More Search Navigation MainpageContentsCurrenteventsRandomarticleAboutWikipediaContactusDonate Contribute HelpLearntoeditCommunityportalRecentchangesUploadfile Tools WhatlinkshereRelatedchangesUploadfileSpecialpagesPermanentlinkPageinformationCitethispageWikidataitem Print/export DownloadasPDFPrintableversion Inotherprojects WikimediaCommonsWikibooksWikiversity Languages AfrikaansالعربيةAzərbaycancaবাংলাБългарскиBoarischCatalàČeštinaDanskDeutschEestiΕλληνικάEspañolEsperantoEuskaraفارسیFrançaisGalego한국어Հայերենहिन्दीHrvatskiBahasaIndonesiaÍslenskaItalianoעבריתქართულიКыргызчаLatviešuMagyarМакедонскиMirandésNederlands日本語NorskbokmålپنجابیPolskiPortuguêsRomânăРусскийShqipSimpleEnglishSlovenčinaSlovenščinaСрпски/srpskiSuomiSvenskaதமிழ்ไทยTürkçeУкраїнськаاردوTiếngViệtWalon吴语粵語中文 Editlinks
延伸文章資訊
- 1正規表達式- JavaScript - MDN Web Docs
Regular expressions are used with the RegExp methods test and exec and with the String methods ma...
- 2正規表示式- 維基百科,自由的百科全書
Regular Expression的Regular一般被譯為正規、正則或常規。 此處的Regular即是規則、規律的意思,Regular Expression即「描述某種規則的表達式」之 ...
- 3一輩子受用的Regular Expressions -- 兼談另類的電腦學習態度
Regular Expression (簡稱regexp 或RE) 是什麼? 有人直譯為「常規表示式」; 筆者偏好意譯, 姑且叫它「字串樣版」。 它的功能是協助我們搜尋字串, 甚至對檔案內的特定...
- 4Regular expression - Wikipedia
A regular expression is a sequence of characters that specifies a search pattern in text. Usually...
- 5規則運算式語言- 快速參考 - Microsoft Docs
expression 會解譯為零寬度判斷提示。 若要避免具名或編號的擷取群組模棱兩可,您可以選擇性地使用明確的判斷提示,如下所示: