Writing a regex to detect a range of numbers? Why not just parse the string to integers instead


Weiyuan (Director of Engineering, ZilLearn | Former Tech Lead Manager, Hubble | Former Tech Lead, Rakuten Viki | bit.ly/weiyuan)
Published in Level Up Coding

Parsing a string to integer vs. regex matching numerical values: which one is the “better” option?

I pondered this question as I was exploring options for a bucket-based approach to A/B testing with Nginx. The premise of the idea went like this. Let's say cohort A encompasses 10% of the audience, while the remaining 90% are slotted automatically into cohort B. If we were to increase cohort A to 20%, how could we do this in production without having to reassign the users in cohort A, ensuring persistence at the same time?

Part of the idea was to use “buckets” with numerical values. Each cohort is assigned a range of numerical values (e.g. cohort A's users are assigned values between 0 and 19, while cohort B's users are assigned values between 20 and 99). In Nginx, we test for this assigned value and proxy to the correct cohort as part of the A/B test. However, since we can't perform numerical comparisons in Nginx (operators such as < and > are not allowed), using regular expressions (regex) seemed to be a suitable solution. I also wanted to build a range regex generator to make working with regex simpler.

Before comparing the performance of evaluating a regex vs parsing a string into an integer for conditional checks, let's examine how we can break down a numerical range into its regex equivalent.

The Regex Puzzle

The simplest way to solve this problem is to convert the numerical range into a regex that lists every single number within the range. A simple range of 0-99 would be converted to the following regex:

/^(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99)$/

However, the above doesn't take advantage of basic regex features, such as matching a range of characters and grouping. Take the range 10, 11, 12 ... 19 as an example. Since the first character of each number is the same in this range, we can group on the first character. Another observation is that the second character covers every digit from 0 to 9, which we can express with a simpler term in regex. The resulting regex is as follows:

/^1[0-9]$/

Expanding the logic to the earlier range of 0-99, we get a much simpler regex:

/^[0-9]{1,2}$/

(Strictly speaking, this short form also accepts zero-padded values such as 05, which the long form rejects.)

How does the performance of the long and short regex match up? When testing the performance in JavaScript, I got the following:

[Figure: Performance benchmark, courtesy of jsperf.com]

As seen above, I ran a performance benchmark of the long and short regex 3 times to make sure that I was getting consistent results. The conclusion was that while the shorter regex performed better, the difference was small. However, as the regex grows longer for larger ranges, we can expect the difference to grow as well. Besides, the shorter regex is easier to read, and there's no need to inundate your code with a long regex, especially if the numbers and range involved are huge.

Side note: regex101 has a really cool rule breakdown and explanation for all things regex, and is a handy way to see how the regex above is explained.

But is the solution that simple? With the optimisation above, we run into another problem: it is merely a subset of the final solution. Take the range 2-99 as an example. We cannot simply use the same solution as 0-99, as it would match 0 and 1 as well. However, if we remove 2-9 from the range and test for 10-99 instead, we can modify the previous solution slightly to get an acceptable solution for this sub-range:

/^[1-9][0-9]$/

On the other hand, detecting 2-9 is reminiscent of the same problem, and was simple to solve:

/^[2-9]$/

Putting the two solutions together, we get a solution for the whole range:

/^([2-9]|[1-9][0-9])$/

The complete solution

Based on the above, I started formulating rules for what would constitute a complete solution for generating a regex to detect numbers in a range:

1. Get the largest sub-range that can be represented as a regex, which usually lies in the middle of the full range. This sub-range typically starts with the number in the range that has the most “0”s, counting from the right. The number at the end of this sub-range usually has the same number of characters as the first number, and has as many trailing “9”s as the first number has trailing “0”s.
2. Using the sub-range remaining on the left of the full range, generate a regex.
3. Using the sub-range remaining on the right of the full range, generate a regex.

Let's see an example of applying the above rules to the range 2-98.

Woot! It seems like it worked! How about something more complicated, like 2-963?

Uh oh. Breaking the full range down into 3 sub-ranges didn't seem to solve the problem. But if you're observant enough, you'll see that on the left, we have already solved the problem for the range 2-99. Can we apply the solution iteratively on both sides until we reach the final solution? Turns out that we can! Adjusting the rules to match the above, we get these revised rules:

1. … (same as before)
2. Using the sub-range remaining on the left of the full range, generate a regex. If this sub-range cannot be resolved immediately to a regex, apply all the steps, starting from step 1, to this sub-range.
3. Using the sub-range remaining on the right of the full range, generate a regex. If this sub-range cannot be resolved immediately to a regex, apply all the steps, starting from step 1, to this sub-range.

👍👍👍👍👍👍

Building the regex generator

With the rules set in stone, building the regex generator was just a matter of implementing them. I decided to build a simple frontend app with TypeScript, something that I had not yet explored at work. Anyway, here's the generator in its full glory:

[Embedded app: the range regex generator]

The Moment of Truth

Armed with my regex generator, I decided to “go to town” with this comparison benchmark. The test here was to check whether the value 1234567 is within the range 0-12345678.

Noooooooo, the regex solution consistently performed worse, by about 98%.

Hmm, what if I were to simplify the value being matched?

Sadly, the regex solution again didn't deliver in performance compared to parsing a string to a number, regardless of the complexity of the regex, in JavaScript.

If you're curious about running the tests above on your own machine, do check out these links:

Complex regex vs parseInt: https://jsperf.com/regex-vs-parseint-comp
Simple regex vs parseInt: https://jsperf.com/simple-regex-vs-parseint-compe

Anyway, that's all I have with regard to this very interesting problem. If you have any thoughts, do let me know in the comments! Ciao!

TL;DR

Tested in JavaScript: parsing a string and comparing it is way faster than using regex.
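To make the bucket idea from the opening concrete, here is a minimal TypeScript sketch of the cohort logic, written in the plain parse-and-compare style. It assumes each user is given a random bucket from 0 to 99 exactly once, which is then persisted (for example in a cookie, not shown) and read back as a string. The names assignBucket and cohortFor are hypothetical. In the article's setup the check itself would live in Nginx rather than in application code.

```typescript
// Hypothetical helper: draw a bucket in 0..99 once per user.
// Persisting it (e.g. in a cookie) is assumed and not shown here.
function assignBucket(random: () => number = Math.random): number {
  return Math.floor(random() * 100);
}

type Cohort = "A" | "B";

// Cohort A owns buckets 0..upperA; every other bucket falls through to B.
// The stored string is parsed and compared numerically.
function cohortFor(bucketValue: string, upperA: number): Cohort {
  const bucket = Number.parseInt(bucketValue, 10);
  // NaN fails both comparisons, so a malformed value lands in B.
  return bucket >= 0 && bucket <= upperA ? "A" : "B";
}

// 10% rollout: A owns buckets 0..9. 20% rollout: A owns buckets 0..19.
const stored = String(assignBucket());
console.log(stored, cohortFor(stored, 9), cohortFor(stored, 19));
```

Because cohort A's range only grows upward from 0 (0-9, then 0-19), any bucket that mapped to A before still maps to A afterwards. Only users in buckets 10-19 move from B to A, which gives the persistence the premise asks for.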
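For readers who want to experiment with the revised rules from “The complete solution”, here is a minimal TypeScript sketch of a range-to-regex generator. It is not the author's generator, whose source the article does not show. It follows the same idea: split off a middle block such as [1-8][0-9], then resolve the left and right remainders recursively.

It adds one assumption of its own: the range is first cut at digit-count boundaries (9|10, 99|100, ...), so each piece has a fixed number of digits. It also assumes non-negative integers written without signs or leading zeros. The function names are hypothetical.

```typescript
// A single digit position: a literal digit, or a class such as [2-9].
function digitClass(from: number, to: number): string {
  return from === to ? String(from) : `[${from}-${to}]`;
}

// `count` trailing positions that may hold any digit.
function anyDigits(count: number): string {
  if (count === 0) return "";
  return count === 1 ? "[0-9]" : `[0-9]{${count}}`;
}

// Alternatives covering lo..hi, where lo and hi have the same number of digits.
function sameLengthPatterns(lo: string, hi: string): string[] {
  if (lo.length === 0) return [""];
  const loHead = lo.charAt(0);
  const hiHead = hi.charAt(0);
  const loTail = lo.slice(1);
  const hiTail = hi.slice(1);
  if (loHead === hiHead) {
    // Shared leading digit: keep it and resolve the remaining digits.
    return sameLengthPatterns(loTail, hiTail).map((p) => loHead + p);
  }
  const rest = loTail.length;
  const zeros = "0".repeat(rest);
  const nines = "9".repeat(rest);
  let first = Number(loHead);
  let last = Number(hiHead);
  const left: string[] = [];
  const right: string[] = [];
  if (loTail !== zeros) {
    // Left remainder, e.g. 15..19 becomes 1[5-9]; resolved recursively.
    left.push(...sameLengthPatterns(loTail, nines).map((p) => loHead + p));
    first += 1;
  }
  if (hiTail !== nines) {
    // Right remainder, e.g. 90..98 becomes 9[0-8]; resolved recursively.
    right.push(...sameLengthPatterns(zeros, hiTail).map((p) => hiHead + p));
    last -= 1;
  }
  // Middle block: leading digit first..last followed by free digits.
  const middle = first <= last ? [digitClass(first, last) + anyDigits(rest)] : [];
  return [...left, ...middle, ...right];
}

// Build an anchored regex matching the decimal integers min..max (no leading zeros).
function rangeToRegex(min: number, max: number): RegExp {
  if (!Number.isSafeInteger(min) || !Number.isSafeInteger(max) || min < 0 || min > max) {
    throw new RangeError(`unsupported range ${min}-${max}`);
  }
  const parts: string[] = [];
  let lo = min;
  while (lo <= max) {
    // Cut at digit-count boundaries so each piece has a fixed length.
    const hi = Math.min(max, 10 ** String(lo).length - 1);
    parts.push(...sameLengthPatterns(String(lo), String(hi)));
    lo = hi + 1;
  }
  return new RegExp(`^(?:${parts.join("|")})$`);
}

console.log(rangeToRegex(2, 963).source);
// ^(?:[2-9]|[1-9][0-9]|[1-8][0-9]{2}|9[0-5][0-9]|96[0-3])$
```

For 2-98 this sketch yields ^(?:[2-9]|[1-8][0-9]|9[0-8])$. For 2-963, its first two alternatives are exactly the 2-99 solution from earlier, mirroring the observation that the left side was already solved. Unlike /^[0-9]{1,2}$/, the generated patterns reject zero-padded input such as 05.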
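The comparison in “The Moment of Truth” can be approximated locally with a rough TypeScript micro-benchmark like the one below. It is a sketch, not the jsperf setup used in the article. The bucket range 0-19 and its hand-written regex ^(?:[0-9]|1[0-9])$ are illustrative choices, and timing uses Date.now(). Results depend heavily on the JavaScript engine, so treat any numbers as indicative only.

Also note that parseInt is more lenient than the regex: parseInt("7abc", 10) returns 7, which the regex rejects.

```typescript
// Check 1: parse the string, then compare numerically (NaN fails both tests).
function inRangeByParse(value: string, min: number, max: number): boolean {
  const n = Number.parseInt(value, 10);
  return n >= min && n <= max;
}

// Check 2: match the string against a precomputed range regex.
function inRangeByRegex(value: string, re: RegExp): boolean {
  return re.test(value);
}

// Run `check` many times; count hits so the calls cannot be optimised away.
function timeIt(check: () => boolean, iterations: number): { ms: number; hits: number } {
  let hits = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    if (check()) hits++;
  }
  return { ms: Date.now() - start, hits };
}

// Illustrative bucket range 0..19 (cohort A in the article's example).
const bucketRegex = /^(?:[0-9]|1[0-9])$/;
const value = "7";
const iterations = 1000000;

const parsed = timeIt(() => inRangeByParse(value, 0, 19), iterations);
const matched = timeIt(() => inRangeByRegex(value, bucketRegex), iterations);
console.log(`parseInt: ${parsed.ms} ms, regex: ${matched.ms} ms`);
```

Swapping in a longer value and a generated regex for a wider range (such as the article's 0-12345678) is the natural next experiment.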


