Regular Expressions - Pages supplied by users


REGULAR EXPRESSIONS
Multi-threaded NewsWatcher

Contents

  Regular Expressions
    What are they?
    More information
  Regexp basics
    Matching simple expressions
    Matching any character
    Repeating expressions
    Grouping expressions
    Choosing one character from many
    Matching the beginning or end of a line
  Regexp extensions
    Matching words
    Alternatives
  Some useful examples

Go back to the page on filtering.
See the page on creating and editing filters.

REGULAR EXPRESSIONS

What are regular expressions?

Regular expressions are a system for matching patterns in text data. They are widely used in UNIX systems, and occasionally on personal computers as well. They provide a very powerful, but also rather obtuse, set of tools for finding particular words or combinations of characters in strings.

On first reading, this all seems particularly complicated and not of much use over and above the standard string matching provided in the Edit Filters dialog (Word matching, for example). In actual fact, in these cases NewsWatcher converts your string matching criteria into a regular expression when applying filters to articles.

However, you can use some of the simpler matching criteria with ease (some examples are suggested below), and gradually build up the complexity of the regular expressions that you use.

One point to note is that regular expressions are not wildcards. The regular expression 'c*t' does not mean 'match "cat", "cot"' etc. In this case, it means 'match zero or more 'c' characters followed by a t', so it would match 't', 'ct', 'cccct' etc.

Information sources

The information here is an amalgamation of the documentation of regular expressions in the Metrowerks CodeWarrior IDE, and of a chapter in the book UNIX Power Tools (Peek, O'Reilly & Loukides). Online information (often the man pages for UNIX utilities) is available by using one of the search engines (e.g. InfoSeek) to search for 'regular expressions'.
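The point that 'c*t' is not a wildcard can be tried out directly. The following is a minimal sketch that uses Python's re module as a stand-in for NewsWatcher's matcher; this is an assumption, since NewsWatcher's own engine may differ in details. The helper name first_match is made up for illustration.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)  # search anywhere in the string
    return m.group(0) if m else None

# 'c*t' means: zero or more 'c' characters, then a 't'.
for text in ["t", "ct", "cccct", "cat", "dog"]:
    print(repr(text), "->", repr(first_match(r"c*t", text)))

# The wildcard-like reading ("c", any one character, "t") is written 'c.t'.
for text in ["cat", "cot", "ct"]:
    print(repr(text), "->", repr(first_match(r"c.t", text)))
```

Note that 'c*t' does find something in "cat", but only the final "t": the 'a' is never matched by 'c*'.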
REGEX BASICS

Matching simple expressions

Most characters match themselves. The only exceptions are called special characters:

  asterisk (*),
  plus sign (+),
  question mark (?),
  backslash (\),
  period (.),
  caret (^),
  square brackets ([ and ]),
  dollar sign ($),
  ampersand (&),
  or sign (|).

To match a special character, precede it with a backslash, like this: \*.

For example,

  This expression...   matches this...   but not this...
  a                    a                 b
  \.\*                 .*                dog
  100                  100               ABCDEFG

Matching any character

A period (.) matches any character except a newline character.

  This expression...   matches this...   but not this...
  .art                 dart              art
                       cart              hurt
                       tart              dark

Repeating expressions

You can repeat expressions with an asterisk or plus sign.

A regular expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.

A regular expression followed by a plus sign (+) matches one or more occurrences of the one-character regular expression. If there is any choice, the first matching string in a line is used.

A regular expression followed by a question mark (?) matches zero or one occurrence of the one-character regular expression.

For example:

  This expression...   matches this...   but not this...
  a+b                  ab                b
                       aaab              baa
  a*b                  b                 daa
                       ab
                       aaab
  .*cat                cat               dog
                       9393cat
                       the old cat
                       c7sb@#puiercat
  a[n]? h              a herb            ann hat
                       an herb

So to match any series of zero or more characters, use ".*". On its own this isn't much use, but in the middle of a longer regular expression, it can be.

Grouping expressions

If an expression is enclosed in parentheses (( and )), the editor treats it as one expression and applies any asterisk (*) or plus (+) to the whole expression.

For example:

  This expression...   matches this...   but not this...
  (ab)*c               abc               ababab
                       ababababc         ababd
  (.a)+b               xab               b
                       ra5afab           aagb

Choosing one character from many

A string of characters enclosed in square brackets ([]) matches any one character in that string. If the first character in the brackets is a caret (^), it matches any character except those in the string. For example, [abc] matches a, b, or c, but not x, y, or z. However, [^abc] matches x, y, or z, but not a, b, or c.
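The example tables above, and the [abc] rule just described, can be checked mechanically. This is a rough sketch only: it assumes Python's re module behaves like NewsWatcher's matcher for these simple patterns, and it treats "matches" as "found anywhere in the string" (re.search). The names CASES and failures are invented for the illustration.

```python
import re

# (pattern, strings that should match, strings that should not match)
CASES = [
    (r"a",      ["a"],                             ["b"]),
    (r"\.\*",   [".*"],                            ["dog"]),
    (r".art",   ["dart", "cart", "tart"],          ["art", "hurt", "dark"]),
    (r"a+b",    ["ab", "aaab"],                    ["b", "baa"]),
    (r"a*b",    ["b", "ab", "aaab"],               ["daa"]),
    (r".*cat",  ["cat", "9393cat", "the old cat"], ["dog"]),
    (r"(ab)*c", ["abc", "ababababc"],              ["ababab", "ababd"]),
    (r"(.a)+b", ["xab", "ra5afab"],                ["b", "aagb"]),
    (r"[abc]",  ["a", "b", "c"],                   ["x", "y", "z"]),
    (r"[^abc]", ["x", "y", "z"],                   ["a", "b", "c"]),
]

def failures(cases):
    """Return (pattern, string) pairs that disagree with the tables."""
    bad = []
    for pattern, accepted, rejected in cases:
        rx = re.compile(pattern)
        bad += [(pattern, s) for s in accepted if not rx.search(s)]
        bad += [(pattern, s) for s in rejected if rx.search(s)]
    return bad

print(failures(CASES))  # an empty list means every row agrees
```

Adding your own rows to CASES is a quick way to test a pattern before using it in a filter.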
A minus sign (-) within square brackets indicates a range of consecutive ASCII characters. For example, [0-9] is the same as [0123456789]. The minus sign loses its special meaning if it's the first (after an initial ^, if any) or last character in the string.

If a right square bracket is immediately after a left square bracket, it does not terminate the string but is considered to be one of the characters to match. If any special character, such as backslash (\), asterisk (*), or plus sign (+), is immediately after the left square bracket, it doesn't have its special meaning and is considered to be one of the characters to match.

  This expression...   matches this...   but not this...
  [aeiou][0-9]         a6                ex
                       i3                9a
                       u2                $6
  [^cfl]og             dog               cog
                       bog               fog
  END[.]               END.              END;
                                         END DO
                                         ENDIAN

Matching the beginning or end of a line

You can specify that a regular expression match only the beginning or end of the line. In NewsWatcher, a line is the whole field that is being matched, for example the author or subject fields. The caret and dollar sign used for this are called anchor characters:

  If a caret (^) is at the beginning of the entire regular expression, it matches the beginning of a line.
  If a dollar sign ($) is at the end of the entire regular expression, it matches the end of a line.
  If an entire regular expression is enclosed by a caret and dollar sign (^like this$), it matches an entire line.

  This expression...   matches this...   but not this...
  ^(the cat).+         the cat runs      see the cat run
  .+(the cat)$         watch the cat     the cat eats

So, to match all strings containing just one character, use "^.$".

REGEX EXTENSIONS

Matching words

You can specify that a regular expression match parts of words with \< (match the start of a word) and \> (match the end of a word). An expression like "ing\>" will match all words ending in -ing. To match a whole word, use an expression like "\<word\>".

NewsWatcher provides facilities for doing word matches (which use these expressions internally), but if you want more flexibility, these come in useful. For example, you might want

  M.*\<Excel\>

to match MS Excel, Microsoft Excel, Microsquish Excel etc. To remind you, the .* means 'zero or more (*) of any character (.)'.

Alternatives

You can define an expression like (cash|money) to match strings which contain either the word 'cash', or the word 'money', or both.
Note that the parentheses around the expression are required.

REGEX EXAMPLES

Examples

Here are some sample regular expressions that I've found useful.

Kill if 'subject' contains the reg. exp. "(cash|money)"

  This kills articles with 'cash' or 'money' in the subject. This should be a case-insensitive match.

Kill if 'subject' contains the reg. exp. "^\[?F.?S.?"

  This kills 'For Sale' articles, which have a subject line that starts ('^') with either FS, F.S., [FS] or [F.S.]. Here the '[' needs to be escaped to '\[', and the '?' means 'match zero or one instance of'.

Kill if 'subject' contains the reg. exp. "[[$%|_\*!][[$%|_\*!][[$%|_\*!]"

  This is a nifty one that kills those posts with subjects like "$$$ blah blah" or "_______ this..." which are almost surely not worth reading. The regular expression reads like this. The bracketed set [[$%|_\*!] matches any one of the characters inside the outer [ ]. ([ is normally interpreted as starting a bracketed set like this unless it is the first character after a [, hence its position here.) The set is then repeated three times, to match subjects like $_* or *** or !_!. You could prepend a ^ to force the match at the beginning of the line.

Kill if 'Xref' contains the reg. exp. "[^ ]+ [^ ]+ [^ ]+ [^ ]+"

  This kills articles which have been cross-posted to four or more groups, and works by looking for runs of non-space characters (the [^ ]) separated by spaces.

Hilite if 'subject' contains the reg. exp. "News ?Watcher" (ignore case)

  This will match "newswatcher", "NewsWatcher", "News Watcher", "news Watcher" and so on. The ' ?' means match zero or one space.

Hilite if 'subject' contains the reg. exp. "Kaleid[aeio]scope"

  This will match "Kaleidoscope", as well as all the misspellings that are common, the [] meaning match any of the alternatives within the square brackets.

Hilite if 'subject' contains the reg. exp. "^\[?A[Nn][Nn]"

  This is useful for catching announcement posts, where the subject line starts with [ANN] or Ann or [Ann. The first "^" forces a match at the beginning of the line. Then it looks for zero or one (the meaning of the "?") "[" characters, but since this is a reserved character, it has to be escaped to "\[". Then we look for an "A", followed by either "N" or "n", and then a second "N" or "n".
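For readers who want to try the example filters outside NewsWatcher, here is a hedged sketch in Python. It assumes Python's re module is close enough to NewsWatcher's matcher for these patterns, that "contains the reg. exp." corresponds to re.search, and that the case-insensitive filters map to re.IGNORECASE. The filter names, the hits helper and the sample subjects are invented for illustration. One adaptation: inside a bracketed set, the leading [ is written \[ because recent Python versions warn about an unescaped [ there.

```python
import re

# Hypothetical names for the example filters in the text.
FILTERS = {
    "cash_or_money": re.compile(r"(cash|money)", re.IGNORECASE),
    "for_sale":      re.compile(r"^\[?F.?S.?"),
    "junk_runs":     re.compile(r"[\[$%|_\*!][\[$%|_\*!][\[$%|_\*!]"),
    "crossposted":   re.compile(r"[^ ]+ [^ ]+ [^ ]+ [^ ]+"),
    "newswatcher":   re.compile(r"News ?Watcher", re.IGNORECASE),
    "kaleidoscope":  re.compile(r"Kaleid[aeio]scope"),
    "announcement":  re.compile(r"^\[?A[Nn][Nn]"),
}

def hits(name, text):
    """True if the named filter's pattern is found anywhere in text."""
    return FILTERS[name].search(text) is not None

# Made-up sample fields, not taken from real articles.
samples = [
    ("cash_or_money", "Make MONEY fast"),
    ("for_sale",      "[F.S.] modem"),
    ("for_sale",      "Re: FS: Mac"),        # no hit: '^' anchors the start
    ("junk_runs",     "$$$ blah blah"),
    ("crossposted",   "one two three four"),
    ("newswatcher",   "news Watcher"),
    ("kaleidoscope",  "Kaleidascope"),
    ("announcement",  "An update"),          # no hit: only one 'n'
]
for name, text in samples:
    print(name, repr(text), hits(name, text))
```

Because the announcement pattern requires exactly two N or n characters after the A, "An update" is not caught while "[ANN]" and "Ann" are.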