Extract a substring from a string in Python (position, regex)


Posted: 2022-05-20 / Tags: Python, String, Regex

This article describes how to extract a substring from a string in Python. You can extract a substring by specifying the position and number of characters, or with regular expression patterns.

If you want to replace a substring with another string, see the article "Replace strings in Python (replace, translate, re.sub, re.subn)".

Extract a substring by specifying the position and number of characters

Extract a character by index

You can get the character at a given position by specifying an index in []. Indexes begin with 0 (zero-based indexing).

```python
s = 'abcde'

print(s[0])
# a

print(s[4])
# e
```

You can specify a position counted from the end with negative values. -1 represents the last character.

```python
print(s[-1])
# e

print(s[-5])
# a
```

An error is raised if you specify an index that does not exist.

```python
# print(s[5])
# IndexError: string index out of range

# print(s[-6])
# IndexError: string index out of range
```

Extract a substring by slicing

You can extract a substring in the range start <= x < stop with [start:stop]. If start is omitted, the range begins at the start of the string; if stop is omitted, it extends to the end. If start > stop, no error is raised and an empty string '' is extracted.

```python
print(s[3:1])
#

print(s[3:1] == '')
# True
```

Unlike indexing, out-of-range positions in a slice do not raise an error; they are clamped to the ends of the string.

```python
print(s[-100:100])
# abcde
```

In addition to the start position start and the end position stop, you can specify an increment step, as in [start:stop:step]. If step is negative, characters are taken from the end toward the beginning.
```python
print(s[1:4:2])
# bd

print(s[::2])
# ace

print(s[::3])
# ad

print(s[::-1])
# edcba

print(s[::-2])
# eca
```

For more information on slicing, see the article "How to slice a list, string, tuple in Python".

Extract based on the number of characters

The built-in function len() returns the number of characters. For example, you can use it to get the central character, or to extract the first or second half of the string with slicing.

Note that only integers (int) can be used as an index [] or in a slice [:]. Division with / raises an error because its result is always a floating-point number (float), even when the division is exact.

The following example uses integer division //, which discards the fractional part.

```python
s = 'abcdefghi'

print(len(s))
# 9

# print(s[len(s)/2])
# TypeError: string indices must be integers

print(s[len(s)//2])
# e

print(s[:len(s)//2])
# abcd

print(s[len(s)//2:])
# efghi
```

Extract a substring with regular expressions: re.search(), re.findall()

You can use regular expressions with the re module of the standard library. For details, see the Python 3.10.4 documentation for the re module (Regular expression operations).

Use re.search() to extract a substring matching a regular expression pattern. Specify the pattern as the first argument and the target string as the second argument.

```python
import re

s = '012-3456-7890'

print(re.search(r'\d+', s))
# <re.Match object; span=(0, 3), match='012'>
```

\d matches a digit character, and + matches one or more repetitions of the preceding pattern. Thus, \d+ matches one or more consecutive digits.

Since the backslash \ is used in regular expression special sequences such as \d, it is convenient to use a raw string by adding r before '' or "". See "Raw strings in Python".

When the string matches the pattern, re.search() returns a match object; when there is no match, it returns None. You can get the matched part as a string (str) with the group() method of the match object.

```python
m = re.search(r'\d+', s)

print(m.group())
# 012

print(type(m.group()))
# <class 'str'>
```

As the example above shows, re.search() returns a match object only for the first matching part, even if there are several matching parts.

re.findall() returns all matching parts as a list of strings.
```python
print(re.findall(r'\d+', s))
# ['012', '3456', '7890']
```

Regular expression pattern examples

This section presents some examples of regular expression patterns that use metacharacters and special sequences.

Wildcard-like patterns

. matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.

For example, a.*b matches a string that starts with a and ends with b. Since * also matches zero repetitions, it matches ab as well.

```python
print(re.findall('a.*b', 'axyzb'))
# ['axyzb']

print(re.findall('a.*b', 'a---b'))
# ['a---b']

print(re.findall('a.*b', 'aあいうえおb'))
# ['aあいうえおb']

print(re.findall('a.*b', 'ab'))
# ['ab']
```

+ matches one or more repetitions of the preceding pattern, so a.+b does not match ab.

```python
print(re.findall('a.+b', 'ab'))
# []

print(re.findall('a.+b', 'axb'))
# ['axb']

print(re.findall('a.+b', 'axxxxxxb'))
# ['axxxxxxb']
```

? matches zero or one repetition of the preceding pattern. a.?b therefore matches ab and any string with exactly one character between a and b.

```python
print(re.findall('a.?b', 'ab'))
# ['ab']

print(re.findall('a.?b', 'axb'))
# ['axb']

print(re.findall('a.?b', 'axxb'))
# []
```

Greedy and non-greedy

*, +, and ? are all greedy: they match as much text as possible. *?, +?, and ?? are non-greedy (minimal): they match as few characters as possible.

```python
s = 'axb-axxxxxxb'

print(re.findall('a.*b', s))
# ['axb-axxxxxxb']

print(re.findall('a.*?b', s))
# ['axb', 'axxxxxxb']
```

Extract part of the pattern with parentheses

If you enclose part of a regular expression pattern in parentheses (), only the substring that matches that part is extracted.

```python
print(re.findall('a(.*)b', 'axyzb'))
# ['xyz']
```

To match parentheses () as literal characters, escape them with a backslash \.

```python
print(re.findall(r'\(.+\)', 'abc(def)ghi'))
# ['(def)']

print(re.findall(r'\((.+)\)', 'abc(def)ghi'))
# ['def']
```

Match any single character

Enclosing a set of characters in [] matches any one of those characters.
If you connect two Unicode code points with -, as in [a-z], all characters between them are covered. For example, [a-z] matches any one lowercase letter of the alphabet.

```python
print(re.findall('[abc]x', 'ax-bx-cx'))
# ['ax', 'bx', 'cx']

print(re.findall('[abc]+', 'abc-aaa-cba'))
# ['abc', 'aaa', 'cba']

print(re.findall('[a-z]+', 'abc-xyz'))
# ['abc', 'xyz']
```

Match the start/end of the string

^ matches the start of the string, and $ matches the end of the string.

```python
s = 'abc-def-ghi'

print(re.findall('[a-z]+', s))
# ['abc', 'def', 'ghi']

print(re.findall('^[a-z]+', s))
# ['abc']

print(re.findall('[a-z]+$', s))
# ['ghi']
```

Extract by multiple patterns

Use | to extract a substring that matches any one of several patterns. For example, for regular expression patterns A and B, you can write A|B.

```python
s = 'axxxb-012'

print(re.findall('a.*b', s))
# ['axxxb']

print(re.findall(r'\d+', s))
# ['012']

print(re.findall(r'a.*b|\d+', s))
# ['axxxb', '012']
```

Case-insensitive

The re module is case-sensitive by default. Set the flags argument to re.IGNORECASE to perform case-insensitive matching.

```python
s = 'abc-Abc-ABC'

print(re.findall('[a-z]+', s))
# ['abc', 'bc']

print(re.findall('[A-Z]+', s))
# ['A', 'ABC']

print(re.findall('[a-z]+', s, flags=re.IGNORECASE))
# ['abc', 'Abc', 'ABC']
```
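The article notes that re.search() returns a match object when the pattern matches. Two details often matter in practice: re.search() returns None when nothing matches, so calling group() on the result directly would raise AttributeError; and group(1) returns only the part captured by the first pair of parentheses. The sketch below shows both. The helper name first_number is hypothetical, made up for this illustration; re.search(), the match object's group(), and re.findall() are standard re module APIs.

```python
import re


def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    m = re.search(r'\d+', text)
    # re.search() returns None when there is no match, so check before group()
    return m.group() if m else None


print(first_number('012-3456-7890'))
# 012

print(first_number('abc'))
# None

# group() is the whole match; group(1) is the part inside the first ( )
m = re.search(r'a(.*)b', 'axyzb')
print(m.group())
# axyzb
print(m.group(1))
# xyz

# With two or more groups, re.findall() returns a list of tuples
print(re.findall(r'(\d+)-(\d+)', '012-3456-7890'))
# [('012', '3456')]
```

Only one tuple is returned in the last call because the first match consumes 012-3456, and the remaining -7890 contains no further digits-hyphen-digits sequence.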
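Position-based and regex-based extraction can also be combined. A match object records where the match was found: span() returns the pair (start, end), and start() and end() return each position separately, so the same substring can be taken with ordinary slicing. re.finditer(), which the article does not cover, returns an iterator of match objects for every match. This is a minimal sketch; the sample string 'abc-12345-xyz' is made up for illustration.

```python
import re

s = 'abc-12345-xyz'
m = re.search(r'\d+', s)  # the pattern is known to match this sample string

print(m.span())
# (4, 9)

# Slicing with the recorded positions gives the same text as group()
start, end = m.start(), m.end()
print(s[start:end])
# 12345
print(s[start:end] == m.group())
# True

# The text before and after the match, using ordinary slicing
print(s[:start])
# abc-
print(s[end:])
# -xyz

# re.finditer() yields a match object for every match, with its position
for mm in re.finditer(r'\d+', '012-3456-7890'):
    print(mm.span(), mm.group())
# (0, 3) 012
# (4, 8) 3456
# (9, 13) 7890
```

Because end is exclusive, exactly like stop in [start:stop], s[m.start():m.end()] always equals m.group().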


