Extract a substring from a string in Python (position, regex)
Posted: 2022-05-20 / Tags: Python, String, Regex
This article describes how to extract a substring from a string in Python. You can extract a substring by specifying the position and number of characters, or with regular expression patterns.
Extract a substring by specifying the position and number of characters
Extract a character by index
Extract a substring by slicing
Extract based on the number of characters
Extract a substring with regular expressions: re.search(), re.findall()
Regular expression pattern examples
Wildcard-like patterns
Greedy and non-greedy
Extract part of the pattern with parentheses
Match any single character
Match the start/end of the string
Extract by multiple patterns
Case-insensitive
If you want to replace a substring with another string, see the following article.
Replace strings in Python (replace, translate, re.sub, re.subn)
Extract a substring by specifying the position and number of characters
Extract a character by index
You can get a character at the desired position by specifying an index in []. Indexes begin with 0 (zero-based indexing).
s = 'abcde'
print(s[0])
# a
print(s[4])
# e
source: str_index_slice.py
You can specify a backward position with negative values. -1 represents the last character.
print(s[-1])
# e
print(s[-5])
# a
source: str_index_slice.py
An IndexError is raised if a non-existent index is specified.
# print(s[5])
# IndexError: string index out of range
# print(s[-6])
# IndexError: string index out of range
source: str_index_slice.py
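If an index may fall outside the string, one option is to catch the IndexError instead of letting it propagate. The following is only a minimal sketch; the helper name char_at and its default parameter are hypothetical and do not come from the original sample code.

```python
def char_at(s, index, default=''):
    """Return s[index], or default when the index is out of range."""
    try:
        return s[index]
    except IndexError:
        # Raised for both too-large and too-negative indexes.
        return default


s = 'abcde'
print(char_at(s, 0))
# a
print(char_at(s, -1))
# e
print(char_at(s, 5, default='?'))
# ?
```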
Extract a substring by slicing
You can extract a substring in the range start <= x < stop with [start:stop]. If start > stop, no error is raised and an empty string '' is extracted.
print(s[3:1])
#
print(s[3:1] == '')
# True
source: str_index_slice.py
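The basic [start:stop] forms can be sketched as follows. The string is redefined so the example is self-contained. Omitting start or stop, and using negative positions, is standard Python slicing behavior rather than something specific to this article.

```python
s = 'abcde'

# start is included, stop is excluded.
print(s[1:3])
# bc

# Omitting start slices from the beginning; omitting stop slices to the end.
print(s[:3])
# abc
print(s[3:])
# de

# Negative positions count from the end of the string.
print(s[-3:-1])
# cd
```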
Positions outside the range of the string are ignored.
print(s[-100:100])
# abcde
source: str_index_slice.py
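Because out-of-range positions are ignored, slicing offers a simple way to cap a string at a maximum length without checking len() first. A minimal sketch, assuming max_chars is a non-negative integer; the function name truncate is a hypothetical helper.

```python
def truncate(s, max_chars):
    """Return at most the first max_chars characters of s.

    Assumes max_chars >= 0; a negative value would instead drop
    characters from the end of the string.
    """
    return s[:max_chars]


print(truncate('abcde', 3))
# abc
print(truncate('abcde', 100))
# abcde
```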
In addition to the start position start and end position stop, you can specify an increment step like [start:stop:step]. If step is negative, characters are extracted from the back.
print(s[1:4:2])
# bd
print(s[::2])
# ace
print(s[::3])
# ad
print(s[::-1])
# edcba
print(s[::-2])
# eca
source: str_index_slice.py
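Two common uses of step can be sketched as follows: separating the characters at even and odd positions, and checking whether a string reads the same in both directions. The function name is_palindrome is a hypothetical helper used only for illustration.

```python
s = 'abcde'

# Characters at even indexes (0, 2, 4) and odd indexes (1, 3).
even_chars = s[::2]
odd_chars = s[1::2]
print(even_chars)
# ace
print(odd_chars)
# bd


def is_palindrome(text):
    """Return True if text equals its own reversal."""
    return text == text[::-1]


print(is_palindrome('level'))
# True
print(is_palindrome(s))
# False
```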
For more information on slicing, see the following article.
How to slice a list, string, tuple in Python
Extract based on the number of characters
The built-in function len() returns the number of characters. For example, you can use this to get the central character or extract the first or second half of the string with slicing.
Note that you can specify only integer (int) values for index [] and slice [:]. Division with / raises an error because the result is a floating-point number (float).
The following example uses integer division //. The fractional part of the result is discarded.
s = 'abcdefghi'
print(len(s))
# 9
# print(s[len(s) / 2])
# TypeError: string indices must be integers
print(s[len(s) // 2])
# e
print(s[:len(s) // 2])
# abcd
print(s[len(s) // 2:])
# efghi
source: str_index_slice.py
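The same combination of len(), integer division, and slicing extends to other length-based extractions. The sketch below shows two possibilities; the helper names middle and split_every are hypothetical and are not part of the original sample code.

```python
def middle(s, n):
    """Return n characters taken from the center of s."""
    start = (len(s) - n) // 2
    return s[start:start + n]


def split_every(s, n):
    """Split s into consecutive pieces of n characters (last may be shorter)."""
    return [s[i:i + n] for i in range(0, len(s), n)]


s = 'abcdefghi'
print(middle(s, 3))
# def
print(split_every(s, 4))
# ['abcd', 'efgh', 'i']
```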
Extract a substring with regular expressions: re.search(), re.findall()
You can use regular expressions with the re module of the standard library.
re - Regular expression operations - Python 3.10.4 documentation
Use re.search() to extract a substring matching a regular expression pattern. Specify the regular expression pattern as the first parameter and the target string as the second parameter.
import re
s = '012-3456-7890'
print(re.search(r'\d+', s))
# <re.Match object; span=(0, 3), match='012'>
source: str_extract_re.py
\d matches a digit character, and + matches one or more repetitions of the preceding pattern. Thus, \d+ matches one or more consecutive digits.
Since backslash \ is used in regular expression special sequences such as \d, it is convenient to use a raw string by adding r before '' or "".
Raw strings in Python
When a string matches the pattern, re.search() returns a match object. You can get the matched part as a string (str) with the group() method of the match object.
m = re.search(r'\d+', s)
print(m.group())
# 012
print(type(m.group()))
# <class 'str'>
source: str_extract_re.py
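When nothing matches, re.search() returns None, so calling group() directly on the result raises an AttributeError. A minimal sketch of guarding against that case follows; the function name first_digits is a hypothetical helper.

```python
import re


def first_digits(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r'\d+', text)
    if m is None:
        # No match: there is no match object to call group() on.
        return None
    return m.group()


print(first_digits('012-3456-7890'))
# 012
print(first_digits('abc'))
# None
```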
As in the example above, re.search() returns only the match object of the first part, even if there are multiple matching parts.
re.findall() returns all matching parts as a list of strings.
print(re.findall(r'\d+', s))
# ['012', '3456', '7890']
source: str_extract_re.py
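Since re.findall() returns plain strings, the results can be passed straight to other functions. The sketch below converts each match to an integer and also shows that an empty list is returned when nothing matches. The variable names are illustrative only.

```python
import re

s = '012-3456-7890'

# Each element of the result is a str, so int() can convert it.
numbers = [int(x) for x in re.findall(r'\d+', s)]
print(numbers)
# [12, 3456, 7890]

# With no match, the result is an empty list rather than None.
no_match = re.findall(r'\d+', 'abc')
print(no_match)
# []
```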
Regular expression pattern examples
This section presents some examples of regular expression patterns with metacharacters/special sequences.
Wildcard-like patterns
. matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.
For example, a.*b matches a string starting with a and ending with b. Since * matches zero repetitions, it also matches ab.
print(re.findall('a.*b', 'axyzb'))
# ['axyzb']
print(re.findall('a.*b', 'a---b'))
# ['a---b']
print(re.findall('a.*b', 'aあいうえおb'))
# ['aあいうえおb']
print(re.findall('a.*b', 'ab'))
# ['ab']
source: str_extract_re.py
+ matches one or more repetitions of the preceding pattern. a.+b does not match ab.
print(re.findall('a.+b', 'ab'))
# []
print(re.findall('a.+b', 'axb'))
# ['axb']
print(re.findall('a.+b', 'axxxxxxb'))
# ['axxxxxxb']
source: str_extract_re.py
? matches zero or one repetition of the preceding pattern. In the case of a.?b, it matches ab and any string with exactly one character between a and b.
print(re.findall('a.?b', 'ab'))
# ['ab']
print(re.findall('a.?b', 'axb'))
# ['axb']
print(re.findall('a.?b', 'axxb'))
# []
source: str_extract_re.py
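To compare the three quantifiers side by side, the sketch below runs each pattern against the same set of strings and records which ones match. The names targets and results are illustrative only.

```python
import re

targets = ['ab', 'axb', 'axxb']
results = {}

for pattern in ['a.*b', 'a.+b', 'a.?b']:
    # Keep the target strings for which the pattern finds a match.
    results[pattern] = [t for t in targets if re.findall(pattern, t)]
    print(pattern, results[pattern])
# a.*b ['ab', 'axb', 'axxb']
# a.+b ['axb', 'axxb']
# a.?b ['ab', 'axb']
```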
Greedy and non-greedy
*, +, and ? are all greedy matches, matching as much text as possible. *?, +?, and ?? are non-greedy, minimal matches, matching as few characters as possible.
s = 'axb-axxxxxxb'
print(re.findall('a.*b', s))
# ['axb-axxxxxxb']
print(re.findall('a.*?b', s))
# ['axb', 'axxxxxxb']
source: str_extract_re.py
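The difference matters most when a delimiter appears several times in the target string. The following sketch uses a made-up string with angle-bracket tags purely as an illustration; it is not intended as a way to parse real HTML.

```python
import re

s = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last > in the string.
greedy = re.findall('<.*>', s)
print(greedy)
# ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first > it can.
non_greedy = re.findall('<.*?>', s)
print(non_greedy)
# ['<b>', '</b>', '<i>', '</i>']
```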
Extract part of the pattern with parentheses
If you enclose part of a regular expression pattern in parentheses (), you can extract a substring in that part.
print(re.findall('a(.*)b', 'axyzb'))
# ['xyz']
source: str_extract_re.py
If you want to match parentheses () as characters, escape them with backslash \.
print(re.findall(r'\(.+\)', 'abc(def)ghi'))
# ['(def)']
print(re.findall(r'\((.+)\)', 'abc(def)ghi'))
# ['def']
source: str_extract_re.py
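A pattern can contain more than one pair of parentheses. In that case re.findall() returns a list of tuples, and a match object from re.search() exposes each part through group() with a number. A minimal sketch using the phone-number-like string from earlier in the article:

```python
import re

s = '012-3456-7890'

# With several groups, each result is a tuple of the grouped parts.
parts = re.findall(r'(\d+)-(\d+)-(\d+)', s)
print(parts)
# [('012', '3456', '7890')]

m = re.search(r'(\d+)-(\d+)-(\d+)', s)
print(m.group(0))  # group(0) is the whole match
# 012-3456-7890
print(m.group(2))  # group(2) is the second parenthesized part
# 3456
print(m.groups())
# ('012', '3456', '7890')
```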
Match any single character
Enclosing a string with [] matches any one of the characters in the string.
If you connect consecutive Unicode code points with -, such as [a-z], all characters between them are covered. For example, [a-z] matches any one character of the lowercase alphabet.
print(re.findall('[abc]x', 'ax-bx-cx'))
# ['ax', 'bx', 'cx']
print(re.findall('[abc]+', 'abc-aaa-cba'))
# ['abc', 'aaa', 'cba']
print(re.findall('[a-z]+', 'abc-xyz'))
# ['abc', 'xyz']
source: str_extract_re.py
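Several ranges can be placed inside the same brackets. The sketch below combines lowercase letters, uppercase letters, and digits; the sample string is made up for illustration.

```python
import re

s = 'abc-XYZ_123'

# One class covering a-z, A-Z, and 0-9; '-' and '_' are not included,
# so they act as separators between the matches.
alnum_runs = re.findall('[a-zA-Z0-9]+', s)
print(alnum_runs)
# ['abc', 'XYZ', '123']

# A single range only picks up the characters it covers.
digit_runs = re.findall('[0-9]+', s)
print(digit_runs)
# ['123']
```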
Match the start/end of the string
^ matches the start of the string, and $ matches the end of the string.
s = 'abc-def-ghi'
print(re.findall('[a-z]+', s))
# ['abc', 'def', 'ghi']
print(re.findall('^[a-z]+', s))
# ['abc']
print(re.findall('[a-z]+$', s))
# ['ghi']
source: str_extract_re.py
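Using ^ and $ together requires the pattern to span the whole string, which is one way to check a string's overall shape rather than extract a piece of it. A minimal sketch; the function name is_digit_groups is a hypothetical helper, and the three-group format is only an assumption for the example.

```python
import re


def is_digit_groups(text):
    """Return True if text consists of three digit runs joined by hyphens."""
    return re.search(r'^\d+-\d+-\d+$', text) is not None


print(is_digit_groups('012-3456-7890'))
# True
print(is_digit_groups('tel: 012-3456-7890'))
# False
print(is_digit_groups('012-3456-7890 ext'))
# False
```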
Extract by multiple patterns
Use | to extract a substring that matches one of the multiple patterns. For example, for regular expression patterns A and B, you can write A|B.
s = 'axxxb-012'
print(re.findall('a.*b', s))
# ['axxxb']
print(re.findall(r'\d+', s))
# ['012']
print(re.findall(r'a.*b|\d+', s))
# ['axxxb', '012']
source: str_extract_re.py
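With A|B, matches are returned in the order they appear in the target string, regardless of which alternative matched. The sketch below splits a made-up string into runs of letters and runs of digits.

```python
import re

s = 'abc123def45'

# Each match is either a run of lowercase letters or a run of digits.
tokens = re.findall(r'[a-z]+|\d+', s)
print(tokens)
# ['abc', '123', 'def', '45']
```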
Case-insensitive
The re module is case-sensitive by default. Set the flags argument to re.IGNORECASE to perform case-insensitive matching.
s = 'abc-Abc-ABC'
print(re.findall('[a-z]+', s))
# ['abc', 'bc']
print(re.findall('[A-Z]+', s))
# ['A', 'ABC']
print(re.findall('[a-z]+', s, flags=re.IGNORECASE))
# ['abc', 'Abc', 'ABC']
source: str_extract_re.py
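The flags argument works the same way with re.search(), and re.I is a shorter alias for re.IGNORECASE in the standard library. A minimal sketch with made-up sample strings:

```python
import re

# The matched text keeps the case it has in the target string.
m = re.search('abc', 'xyzABC', flags=re.IGNORECASE)
print(m.group())
# ABC

# re.I is equivalent to re.IGNORECASE.
words = re.findall('abc', 'abc-Abc-ABC', flags=re.I)
print(words)
# ['abc', 'Abc', 'ABC']
```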