Regular expressions - memoQ Documentation
This topic is for memoQ 9.10.

Regular expressions are a powerful means for finding character sequences in text. In memoQ, they are used to define segmentation rules, auto-translation rules, or rules for the Regex tagger. You can also use regular expressions in Find and replace, and in the Filter fields in the translation editor.

Finding character sequences is a familiar task to everyone who has used a word processor or text editor before. The Find or Search dialog serves this purpose: if you search for 'cat', your editor will highlight words (or parts of words) such as 'cat', 'cats', or even 'sophisticated'.

Regular expressions, however, give you much more freedom to tell the computer what you are looking for. You can identify sequences such as:
- a letter 'a' followed by two or three letters 'c';
- a number of letters followed by one or more digits;
- either of the words 'cat', 'dog' or 'mouse';
- the occurrences of a word where it is between quotation marks.

And much more is possible. After reading through this page and experimenting with the examples, you'll know exactly how. If you do not feel ready to learn the details, the Regex Assistant will help you.

Note: The term regular expression comes from the mathematical theory on which this pattern matching method is based. It is often abbreviated as regexp or regex. Here we'll use regex, or in the plural, regexes.

Literal and Meta

In a word processor's old-school Find function every character is interpreted literally. If you search for 'Yes? No...' it will highlight 'Yes? No...', or nothing if these characters do not appear in the text. In a regex, however, some characters have special meaning. These are called metacharacters. The most important metacharacters are:

Expression  Description
.           Matches any character.
|           Either expression on its left and right side matches the target string. For example, 'a|b' matches 'a' and 'b'.
[]          Any of the enclosed characters may match the target character. For example, '[ab]' matches 'a' and 'b'. '[0-9]' matches any digit.
[^]         None of the enclosed characters may match the target character. For example, '[^ab]' matches all characters except 'a' and 'b'. '[^0-9]' matches any non-digit character.
*           The character to the left of the asterisk should match 0 or more times. For example, 'be*' matches 'b', 'be' and 'bee'.
+           The character to the left of the plus sign should match 1 or more times. For example, 'be+' matches 'be' and 'bee' but not 'b'.
?           The character to the left of the question mark should match 0 or 1 time. For example, 'be?' matches 'b' and 'be' but not 'bee'.
{num}       The character to the left of the enclosed number should match num times. For example, 'be{2}' matches 'bee' but not 'be'.
()          Creates a group and 'remembers' the matching section of the string. Groups can be used to re-order parts of a string, e.g. when converting dates to a different format.
\           Escape character. If you want to use the character '\' itself, you should use '\\'.

Confusing? This table is only meant as a short summary and reference. The meaning of all of these expressions will be clarified in the sections below.

For now, let's focus on the first one, the dot. In a regex it means 'any character may stand here'. So the regex 'No...' will match any of the following:

Notes
Notte
No...
No&%X

So what do you need to write in a regex to match precisely 'No...' and no other text? To use a character that has a special meaning as a literal character, you must 'escape' it, which means preceding it with a backslash. Thus, 'No\.\.\.' will match exactly 'No...' and nothing else.

How to test regular expressions

In memoQ, regular expressions also work in Find and replace. To try a regular expression, open a document for translation and press Ctrl+F. Type your regular expression in the Find what box, and click Find Next.
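You can also try the dot examples outside memoQ. The sketch below uses Python's standard `re` module as a convenient stand-in. It is not memoQ's own engine, but it treats the dot and the backslash escape the same way. One difference: `re.fullmatch` only accepts a string if the whole string matches, whereas memoQ's Find also reports matches inside longer text. The helper name `matching` is made up for illustration.

```python
import re

# The four example strings from the text above.
candidates = ["Notes", "Notte", "No...", "No&%X"]

def matching(pattern, strings):
    """Return the strings that the pattern matches from start to end."""
    return [s for s in strings if re.fullmatch(pattern, s)]

# Unescaped: each '.' may stand for any character.
any_char = matching(r"No...", candidates)
# Escaped: each '\.' stands for a literal dot only.
literal_dots = matching(r"No\.\.\.", candidates)

print(any_char)      # ['Notes', 'Notte', 'No...', 'No&%X']
print(literal_dots)  # ['No...']
```

Writing the patterns as raw strings (`r"..."`) keeps Python from interpreting the backslashes before the regex engine sees them.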
Alternatively, you can filter your document in the translation editor using regular expressions:
1. Just above the translation grid, check the check box with the Use regular expressions icon.
2. Type the regular expression in the filter box above the source or the target segments.
3. Press Enter. memoQ will show the segments that match your regular expression.

For example, you can filter for segments that include numbers followed by a dot.

If you need more detailed feedback from memoQ as you learn regular expressions, you can 'abuse' auto-translation rules for experimenting:
1. Create a test project.
2. In the Settings pane of Project home, click the Auto-translation rules tab.
3. In the dialog that appears, delete every rule already there, and enter a rule of your own.
4. For that rule, also add a replace order rule consisting of the single letter 'x'. (What a replace order rule means and why you need it here will be explained below.)
5. Click Preview. Type some test text in the Before auto-translation box, and click the Preview button.

The 'x' in the Replace order rules field tells memoQ to replace any text that the regex matches with the letter 'x'. That's how you know that your regex is working in this experiment. In the Auto-translation preview dialog you can see exactly which parts of your text are replaced by an 'x', which lets you test your regex.

Character classes

Now that we've covered the dot and know how to experiment with new regexes, let's move on to some more serious expressions. Brackets in regexes allow you to specify a set of characters, or a character class. '[ab][01]' will match two-character-long sequences where the first character is either an 'a' or a 'b', and the second is either a '0' or a '1'. This yields 4 possible matches: 'a0', 'b0', 'a1', 'b1'.

Character classes can express things like 'a digit followed by a comma or an exclamation mark', which could be written as '[0123456789][,!]'. This, however, would be very inconvenient to write. Regexes know better: you can specify a range of characters by writing '[0-9][,!]', which is exactly the same as the previous expression.
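Here is a quick check of the character class examples above, again in Python's `re` as a stand-in engine. The last pattern, `[0-9]+\.`, is only an assumed way to express "numbers followed by a dot". It is not taken from memoQ, and it uses the '+' quantifier covered in the Quantifiers section.

```python
import re

# '[ab][01]': first character 'a' or 'b', second character '0' or '1'.
pairs = re.findall(r"[ab][01]", "a0 b1 c0 a2 b0 a1")

# Listing every digit and using the range 0-9 give the same result.
text = "Total: 3, then 7! and 9?"
listed = re.findall(r"[0123456789][,!]", text)
ranged = re.findall(r"[0-9][,!]", text)

# Assumed pattern for "numbers followed by a dot" (not from memoQ itself).
numbered = re.findall(r"[0-9]+\.", "See 1. and 12. but not 3 or v4")

print(pairs)             # ['a0', 'b1', 'b0', 'a1']
print(listed == ranged)  # True
print(numbered)          # ['1.', '12.']
```

`re.findall` returns every non-overlapping match from left to right. This is roughly what you would see highlighted one by one with Find Next.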
Note: Can you use ranges to say 'match an alphabetical letter'? Yes and no. A typical solution used to be '[a-z]', which matches any of the letters between a and z. Keep in mind, however, that memoQ works with many different languages, and these often have special characters in their alphabets. The Icelandic letter 'ð', for instance, is definitely not in the range a-z. Therefore memoQ uses a special extension to deal with alphabetical letters, which will be described below.

Also, keep in mind that all letters in memoQ regexes are interpreted in a case-sensitive way. Thus, '[a-z]' will match 'f' but not 'F'.

Besides specifying what you want to match, you can also use character classes to specify what not to match. The regex '[^0a].' will match any two-character sequence, so long as the first character is not '0' or 'a'.

Escape sequences

As you saw above, you can give a metacharacter back its literal meaning by preceding it with a backslash ('\'). This is called escaping. Other practical escape sequences are also available. The ones most important for the ways memoQ uses regexes are:

Sequence  Description
\s        Whitespace: space, tab or newline
\S        Anything but whitespace
\t        Tab
\n        Newline
\d        Digit (between 0 and 9)
\D        Anything but digits
\w        Alphanumeric character and underscore
\W        Anything but alphanumeric characters and underscore

Quantifiers

Now that you've learned to specify a set of alternative characters to match at a given position, it's time to move down the road and tell memoQ how many characters to match. The special characters '*', '+' and '?', and the expression {num} are used for this purpose.

The regex 'x+' will match a sequence of one or more 'x's: 'x', 'xx', 'xxx' and so on.

The regex 'x{3}' will match a sequence of exactly 3 'x's: 'xxx', but not 'x' or 'xx'. If the text is 'xxxx', the regex will match the first 3 'x's and ignore the fourth. For a parallel, remember that the traditional Find dialog will find the word 'cat' in 'cats'.

You can also use the {num} quantifier with a minimum or maximum value (or both). Thus, 'x{3,5}' will match between 3 and 5 'x's, 'x{3,}' will match any sequence of at least 3 'x's, and 'x{,5}' will match any sequence of at most 5 'x's.
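The quantifier examples above can be replayed in Python's `re`, used here as a stand-in engine. One caveat applies: for str patterns, Python's `\d` and `\w` are Unicode-aware, so they can match more than the 0-9 and ASCII ranges described in the table. The sample strings are made up for illustration.

```python
import re

runs = "x xx xxx xxxx"

one_or_more = re.findall(r"x+", runs)      # every run of x's, matched whole
exactly_three = re.findall(r"x{3}", runs)  # 'xxx', plus the first three of 'xxxx'
three_to_five = re.findall(r"x{3,5}", "xx xxx xxxxxxx")
at_least_three = re.findall(r"x{3,}", "xx xxx xxxxxxx")

# Matching is case-sensitive, so '[a-z]' skips 'F'.
lowercase = re.findall(r"[a-z]", "fF")

# '\d+' combines an escape sequence with a quantifier: one or more digits.
digits = re.findall(r"\d+", "Room 12, floor 3")

print(one_or_more)     # ['x', 'xx', 'xxx', 'xxxx']
print(exactly_three)   # ['xxx', 'xxx']
print(three_to_five)   # ['xxx', 'xxxxx']
print(at_least_three)  # ['xxx', 'xxxxxxx']
print(lowercase)       # ['f']
print(digits)          # ['12', '3']
```

Note that 'x{3,5}' takes as many 'x's as it can, up to five. In a run of seven, it matches five, and the remaining two are too few for another match.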
Perhaps the funniest of the quantifiers is the asterisk ('*'). It means 'match zero or more of the given character'. What on earth is that good for? Well, you can say things like "match the letter 'T' preceded by some 'a's, or maybe none". The corresponding regex is 'a*T', which will match 'T', 'aT', 'aaT' and so on.

A little less exciting but no less useful quantifier is the question mark. It means 'match zero or one of the character in front of it'. Thus, 'ax?y' will match 'ay' and 'axy', but not 'axxy'.

If you think quantifiers are fun, it's time to combine them with character sets. You can write quantifiers after character sets just as you write them after characters. '[0-9]+%' will match a sequence of digits followed by a percent sign, for instance '1%' or '99%', but not '10a%'.

Groups and Alternatives

Having covered character sets and quantifiers, there are only two standard regex features left to explore: groups and alternatives.

Using the pipe ('|') symbol you can join several smaller regexes to say 'match either this, that or the other thing'. The regex 'EUR|USD|GBP' will match any of these words, and only these.

When working with alternatives, you usually need to group them together with parentheses to get the results you want. Let's say you want a regex that matches any of these expressions: 'EUR 15 million', 'USD 37 million' and 'GBP 5 million'. As a first try, you might be inclined to write 'EUR|USD|GBP \d{1,} million'. This, however, will not do, because it only matches the following strings: 'EUR', 'USD' and 'GBP [any natural number] million'. You need to group your alternatives together in the regex: '(EUR|USD|GBP) \d{1,} million'. Here 'EUR|USD|GBP' can be 'EUR', 'USD' or 'GBP', and '\d{1,}' can be any sequence of one or more digits.

Replacing and reordering

For the purposes of segmentation, memoQ only uses regexes to match patterns in the translation document's text. For auto-translation rules it also uses another powerful regex feature that has to do with groups: replacing and reordering parts of the matched text.

Replacing a matched text with a single string: You already saw a possible use for replacement in the How to test section of this page. There we defined the rather simplistic replace order rule 'x', which replaces a regex match with the letter 'x' for testing purposes.
Reordering and/or replacing parts of a matched text: Here you need to enclose each part of the regex that you want to reference in its own pair of parentheses. memoQ remembers the match enclosed in each pair of parentheses and assigns it a number, starting with 1. When writing the replace order rule, you can reference these remembered substrings as '$1', '$2', etc. The numbers follow the order in which the opening parentheses appear in the regex.

In the previous regex example, you also have to put '\d{1,}' in parentheses to make it possible to reorder the currencies and their values: '(EUR|USD|GBP) (\d{1,}) million'. In the replace order rule you can reference 'EUR|USD|GBP' as '$1', and '\d{1,}' as '$2'. So if you want to change their order, the replace order rule could be '$2 Millionen $1'.

memoQ extensions

Finding tags

If you want to find tags with regular expressions, you can use three special escape sequences to match them:

\tag will match any tag
\itag will match an inline tag (one that appears in a heptagon or a hexagon)
\mtag will match a memoQ tag (one that appears in {curly brackets} in the text)

Tags are usually joined to the text that precedes or follows them. It is therefore best to put them in parentheses '()' when you are looking for a combination of tags and text. Example: '(\itag)int' will match inline tags (opening, closing, or empty) that are followed by words like 'integrated', 'interesting' or 'intentional'.

Custom lists

For segmentation and auto-translation rules it is often useful to work with lists of words: abbreviations, the names of months, currencies etc. In theory you could list these words as grouped alternatives in the regular expressions, as you saw in the preceding section. However, this would result in very complicated regexes that are hard to maintain. memoQ therefore introduces a special extension to regular expressions: custom lists.

Lists of words used in regular expressions can be defined in the Custom lists tab of the segmentation rules dialog or of the auto-translation rules dialog, or in the Translation pairs tab of the auto-translatables dialog.

The custom lists in the Custom lists tab of the segmentation rules dialog should contain characters and abbreviations that are important for segmentation (e.g. '.', '!', 'e.g.').
The custom lists in the Custom lists tab of the auto-translatables dialog should contain words that have the same source and target form (e.g. '€', '$').

The custom lists in the Translation pairs tab of the auto-translatables dialog should contain source words with their target equivalents (e.g. in English-German projects 'January' should be translated as 'Januar', 'February' as 'Februar' etc.).

The name of a custom list must always start and end with a hash mark ('#'). The words in a custom list are always interpreted as plain text, i.e. no characters are treated as metacharacters with a special meaning.

Note: For segmentation rules memoQ defines one more special item: '#!#'. This extension does not influence regex matching in any way. Instead, it tells memoQ to introduce a segment break at the given location if the expression matches text in the imported document.

Example for using custom lists of the Custom lists tab of the Auto-translation rules dialog

Suppose you want memoQ to offer you '15 Millionen EUR' in the Translation results pane for every occurrence of 'EUR 15 million', and '37 Millionen USD' for 'USD 37 million'. Create a custom list labeled '#currency#' in the Custom lists tab containing 'EUR', 'USD' and 'GBP'.

Now create the regex '(#currency#) (\d{1,}) million', which is equivalent to '(EUR|USD|GBP) (\d{1,}) million'. The replace order rule could be '$2 Millionen $1'. In the preview, this regex and replace order rule turn 'EUR 15 million' into '15 Millionen EUR'.

Translation pairs

Suppose you want memoQ to offer you '15 Millionen Euro' in the Translation results pane for every occurrence of 'EUR 15 million', and '37 Millionen Dollar' for 'USD 37 million'. Create a custom list labeled '#currency2#' in the Translation pairs tab containing the following translation pairs: 'EUR' to 'Euro', 'USD' to 'Dollar' and 'GBP' to 'Pfund'.

Note: The name of the Translation pairs list must be different from the names of the custom lists.

Now create the regex '(#currency2#) (\d{1,}) million'. The replace order rule could be '$2 Millionen $1'. In the preview, this regex and replace order rule turn 'EUR 15 million' into '15 Millionen Euro'.
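You can replay the grouping, replacement, reordering and custom-list examples from the last few sections in Python's `re` as well. This is a sketch under several assumptions:
- Python has no custom lists. The lists `currency` and `currency2` below are plain Python stand-ins for '#currency#' and '#currency2#'.
- The made-up helper `as_alternation` turns these lists into an ordinary grouped alternation.
- Python writes group references as `\g<1>` and `\g<2>`, where a memoQ replace order rule writes '$1' and '$2'.

```python
import re

text = "EUR 15 million, USD 37 million and GBP 5 million"

# Without a group, '|' splits the whole regex into three alternatives:
# 'EUR', 'USD' and 'GBP \d{1,} million'.
ungrouped = re.findall(r"EUR|USD|GBP \d{1,} million", text)

# With a group, the alternatives apply only to the currency code.
# Replacing every match with 'x' mimics the testing trick described earlier.
tested = re.sub(r"(EUR|USD|GBP) \d{1,} million", "x", text)

# Reordering: \g<2> and \g<1> play the role of memoQ's $2 and $1.
reordered = re.sub(r"(EUR|USD|GBP) (\d{1,}) million", r"\g<2> Millionen \g<1>", text)

# Hypothetical stand-ins for the '#currency#' custom list and the
# '#currency2#' translation pair list from the examples above.
currency = ["EUR", "USD", "GBP"]
currency2 = {"EUR": "Euro", "USD": "Dollar", "GBP": "Pfund"}

def as_alternation(words):
    """Join list entries into a grouped alternation, escaped so they match as plain text."""
    return "(" + "|".join(re.escape(w) for w in words) + ")"

# '(#currency#) (\d{1,}) million' expands to the same regex as above.
from_list = re.sub(as_alternation(currency) + r" (\d{1,}) million",
                   r"\g<2> Millionen \g<1>", text)

# For translation pairs, the matched source word is swapped for its target form.
from_pairs = re.sub(as_alternation(currency2) + r" (\d{1,}) million",
                    lambda m: f"{m.group(2)} Millionen {currency2[m.group(1)]}", text)

print(ungrouped)   # ['EUR', 'USD', 'GBP 5 million']
print(tested)      # x, x and x
print(reordered)   # 15 Millionen EUR, 37 Millionen USD and 5 Millionen GBP
print(from_list)   # 15 Millionen EUR, 37 Millionen USD and 5 Millionen GBP
print(from_pairs)  # 15 Millionen Euro, 37 Millionen Dollar and 5 Millionen Pfund
```

Escaping each entry with `re.escape` mirrors the rule that custom list words are plain text. With it, an entry such as 'e.g.' matches only a literal 'e.g.', not any four characters with 'e' and 'g' in the first and third positions.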