How to remove \xa0 from string in Python?
Asked 10 years, 4 months ago · Modified 3 months ago · Viewed 402k times
I am currently using BeautifulSoup to parse an HTML file and calling `get_text()`, but it seems like I'm being left with a lot of `\xa0` Unicode representing spaces. Is there an efficient way to remove all of them in Python 2.7 and change them into spaces? I guess the more generalized question would be: is there a way to remove Unicode formatting?

I tried using `line = line.replace(u'\xa0', ' ')`, as suggested by another thread, but that changed the `\xa0`'s to `u`'s, so now I have "u"s everywhere instead. ):

EDIT: The problem seems to be resolved by `str.replace(u'\xa0', ' ').encode('utf-8')`, but just doing `.encode('utf-8')` without `replace()` seems to cause it to spit out even weirder characters, `\xc2` for instance. Can anyone explain this?
Tags: python, python-2.7, unicode, beautifulsoup, utf-8
Asked Jun 12, 2012 at 9:12 by zhuyxn (edited Jul 14, 2020 at 15:32 by ivanleoncz)
Tried that already: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128) – zhuyxn, Jun 12, 2012 at 9:19
Embrace Unicode. Use u''s instead of ''s. :-) – jpaugh, Jun 12, 2012 at 9:26
Tried using str.replace(u'\xa0', ' ') but got "u"s everywhere instead of \xa0s :/ – zhuyxn, Jun 12, 2012 at 9:30
If the string is the unicode one, you have to use the u' ' replacement, not the ' '. Is the original string the unicode one? – pepr, Jun 12, 2012 at 10:51
15 Answers
`\xa0` is actually non-breaking space in Latin1 (ISO 8859-1), also `chr(160)`. You should replace it with a space.

    string = string.replace(u'\xa0', u' ')

When you call `.encode('utf-8')`, it encodes the Unicode string to UTF-8, which means every Unicode character is represented by 1 to 4 bytes. In this case, `\xa0` is represented by the 2 bytes `\xc2\xa0`.

Read up on http://docs.python.org/howto/unicode.html.

Please note: this answer is from 2012. Python has moved on; you should be able to use `unicodedata.normalize` now.
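A minimal Python 3 sketch of the same idea, assuming the text is already a `str` (Unicode) rather than bytes; the sample string is made up for illustration:

```python
text = "Dear\xa0Parent,\xa0hello"

# Replace U+00A0 NO-BREAK SPACE with an ordinary ASCII space.
cleaned = text.replace("\xa0", " ")
print(cleaned)  # Dear Parent, hello

# Encoding is a separate step: UTF-8 stores U+00A0 as the two bytes C2 A0,
# which is where the extra \xc2 in the question comes from.
print("\xa0".encode("utf-8"))   # b'\xc2\xa0'
print(cleaned.encode("utf-8"))  # b'Dear Parent, hello'
```

Replacing first and encoding last keeps all intermediate work on text, so no stray `\xc2` bytes appear.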
Answered Jul 19, 2012 at 17:42 by samwize (edited Jun 11, 2019 at 1:45 by TFD)
I don't know a huge amount about Unicode and character encodings, but it seems like unicodedata.normalize would be more appropriate than str.replace – dbr, Sep 9, 2013 at 7:45
Yours is workable advice for strings, but note that all references to this string will also need to be replaced. For example, if you have a program that opens files, and one of the files has a non-breaking space in its name, you will need to rename that file in addition to doing this replacement. – user67416, Sep 23, 2014 at 10:52
U+00a0 is a non-breakable space Unicode character that can be encoded as the b'\xa0' byte in latin1 encoding, and as two bytes b'\xc2\xa0' in utf-8 encoding. It can be represented as &nbsp; in HTML. – jfs, Jan 20, 2015 at 12:39
When I try this, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 397: ordinal not in range(128). – jds, May 28, 2015 at 22:15
I tried this code on a list of strings; it didn't do anything, and the \xa0 character remained. If I re-encoded my text file to UTF-8, the character would appear as an uppercase A with a caret on its head, and if I encoded it in Unicode the Python interpreter crashed. – MushroomMan, Jul 20, 2016 at 22:02
There are many useful things in Python's `unicodedata` library. One of them is the `.normalize()` function.

Try:

    new_str = unicodedata.normalize("NFKD", unicode_str)

Replace NFKD with any of the other normalization forms (NFC, NFD, NFKC) if you don't get the results you're after.
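To see what each normalization form does to `\xa0`, here is a small Python 3 sketch (the sample string is an assumption). The compatibility forms (NFKC, NFKD) map U+00A0 to a plain space, while the canonical forms (NFC, NFD) leave it alone; the compatibility forms also rewrite other characters, e.g. the ordinal indicator `º` becomes `o`:

```python
import unicodedata

sample = "1\u00ba\xa0dia"  # '1º' + NO-BREAK SPACE + 'dia'

results = {form: unicodedata.normalize(form, sample)
           for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, value in results.items():
    print(form, repr(value))
# NFC '1º\xa0dia'
# NFD '1º\xa0dia'
# NFKC '1o dia'
# NFKD '1o dia'
```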
Answered Jan 8, 2016 at 4:24 by Jamie
Not so sure; you may want normalize('NFKD', '1º\xa0dia') to return '1º dia', but it returns '1o dia'. – Faccion, Nov 8, 2017 at 14:58
Here are the docs about unicodedata.normalize – TT--, Dec 4, 2017 at 15:04
Ah, if the text is Korean, do not try this. All the characters end up broken. – Cho, Oct 17, 2019 at 9:05
This solution changes the Russian letter й to an identical-looking sequence of two Unicode characters. The problem here is that strings that used to be equal do not match anymore. Fix: use "NFKC" instead of "NFKD". – Markus, Apr 21, 2020 at 19:23
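The difference described in that comment can be checked directly. A Python 3 sketch: NFKD splits `й` (U+0439) into `и` (U+0438) plus a combining breve (U+0306), so it no longer compares equal to the original, while NFKC recomposes it and still turns `\xa0` into a space:

```python
import unicodedata

word = "\u0439"  # CYRILLIC SMALL LETTER SHORT I, 'й'

nfkd = unicodedata.normalize("NFKD", word)
nfkc = unicodedata.normalize("NFKC", word)

print([hex(ord(ch)) for ch in nfkd])  # ['0x438', '0x306']
print(nfkd == word, nfkc == word)     # False True

# NFKC still maps NO-BREAK SPACE to an ordinary space.
spaced = unicodedata.normalize("NFKC", "\u0439\xa0\u0439")
print(spaced == "\u0439 \u0439")      # True
```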
This is awesome. It changes the one-letter string ﷼ to the four-letter string ریال that it actually is, so it's much easier to replace when needed. You'd normalize and then replace, without having to care which one it was: normalize("NFKD", "﷼").replace("ریال", ''). – Amir Shabani, Apr 29, 2021 at 7:55
After trying several methods, to summarize, this is how I did it. Following are two ways of avoiding/removing \xa0 characters from a parsed HTML string.

Assume we have our raw HTML as follows:

    raw_html = '<p>Dear Parent,&nbsp;</p><p><span>This is a test message,&nbsp;</span><span>kindly ignore it.&nbsp;</span></p><p><span>Thanks</span></p>'

So let's try to clean this HTML string:

    from bs4 import BeautifulSoup
    raw_html = '<p>Dear Parent,&nbsp;</p><p><span>This is a test message,&nbsp;</span><span>kindly ignore it.&nbsp;</span></p><p><span>Thanks</span></p>'
    text_string = BeautifulSoup(raw_html, "lxml").text
    print text_string
    # u'Dear Parent,\xa0This is a test message,\xa0kindly ignore it.\xa0Thanks'

The above code produces these \xa0 characters in the string. To remove them properly, we can use two ways.
Method #1 (Recommended):
The first one is BeautifulSoup's get_text method with the strip argument set to True. So our code becomes:

    clean_text = BeautifulSoup(raw_html, "lxml").get_text(strip=True)
    print clean_text
    # Dear Parent,This is a test message,kindly ignore it.Thanks

Method #2:
The other option is to use Python's unicodedata library:

    import unicodedata
    text_string = BeautifulSoup(raw_html, "lxml").text
    clean_text = unicodedata.normalize("NFKD", text_string)
    print clean_text
    # u'Dear Parent, This is a test message, kindly ignore it. Thanks'

I have also detailed these methods on this blog, which you may want to refer to.
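Without BeautifulSoup, the standard library shows where the character comes from: the HTML entity `&nbsp;` decodes to U+00A0. A Python 3 sketch using `html.unescape` (the fragment is an assumption, not the author's exact markup); replacing the character after decoding keeps a space between the pieces, whereas `get_text(strip=True)` glues them together:

```python
import html

fragment = "Dear Parent,&nbsp;This is a test message,&nbsp;kindly ignore it."

decoded = html.unescape(fragment)  # each '&nbsp;' becomes '\xa0'
print(repr(decoded))

clean = decoded.replace("\xa0", " ")
print(clean)  # Dear Parent, This is a test message, kindly ignore it.
```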
Answered Jan 16, 2018 at 16:57 by Ali Raza Bhayani
get_text(strip=True) really did the trick. Thanks m8 – ChewChew, Nov 24, 2021 at 18:57
This is very specific for raw HTML returning unicode after cleaning with bs4 or regex. Works perfectly, but it will not remove line breaks or tabs. – Y4RD13, May 9 at 12:18
Try using `.strip()` at the end of your line. `line.strip()` worked well for me.
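In Python 3, `str.strip()` treats U+00A0 as whitespace, but it only removes it at the ends of the string. A short sketch (the sample line is made up):

```python
line = "\xa0 Dear\xa0Parent \xa0"

stripped = line.strip()
print(repr(stripped))  # 'Dear\xa0Parent' (the inner \xa0 survives)

# To also clean the interior, replace after stripping.
fully_clean = stripped.replace("\xa0", " ")
print(fully_clean)  # Dear Parent
```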
Answered Jul 21, 2015 at 21:50 by user3590113
Try this:

    string.replace('\\xa0', '')
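Whether this helps depends on what the string actually contains: `'\\xa0'` is four literal characters (backslash, x, a, 0), typically left over when text was written out as its repr, while `'\xa0'` is the single NO-BREAK SPACE character. A Python 3 sketch contrasting the two (sample strings are assumptions):

```python
real = "Dear\xa0Parent"      # one U+00A0 character between the words
escaped = "Dear\\xa0Parent"  # four literal characters: \ x a 0

print(len("\xa0"), len("\\xa0"))           # 1 4
print(escaped.replace("\\xa0", " "))       # Dear Parent
print(real.replace("\\xa0", " ") == real)  # True: no literal escape to replace
print(real.replace("\xa0", " "))           # Dear Parent
```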
Answered Jun 12, 2012 at 9:20 by user278064
@RyanMartin: this replaces four bytes: len(b'\\xa0') == 4 but len(b'\xa0') == 1. If possible, you should fix whatever upstream generates these escapes. – jfs, Jan 20, 2015 at 12:43
This solution worked for me: string.replace('\xa0', '') – JenyaPu, Jul 4, 2020 at 14:31
I ran into this same problem pulling some data from a sqlite3 database with Python. The above answers didn't work for me (not sure why), but this did: `line = line.decode('ascii', 'ignore')`. However, my goal was deleting the \xa0s, rather than replacing them with spaces.

I got this from this super-helpful Unicode tutorial by Ned Batchelder.
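A Python 3 sketch of what `decode('ascii', 'ignore')` does to UTF-8 bytes (the sample text is an assumption): every non-ASCII byte is dropped, so accented letters disappear along with the non-breaking space. Decoding with the right codec and then replacing keeps them:

```python
raw = "Caf\u00e9\u00a0Menu".encode("utf-8")  # b'Caf\xc3\xa9\xc2\xa0Menu'

lossy = raw.decode("ascii", "ignore")
print(lossy)  # CafMenu (the é and the space are both gone)

proper = raw.decode("utf-8").replace("\xa0", " ")
print(proper)  # Café Menu
```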
Answered Dec 11, 2012 at 20:39 by user1774699 (edited Jun 20, 2020 at 9:12 by Community Bot)
You are now removing anything that isn't an ASCII character; you are probably masking your actual problem. Using 'ignore' is like shoving through the shift stick even though you don't understand how the clutch works. – Martijn Pieters ♦, Dec 11, 2012 at 20:58
@MartijnPieters The linked Unicode tutorial is good, but you are completely correct: str.encode(..., 'ignore') is the Unicode-handling equivalent of try: ... except: .... While it might hide the error message, it rarely solves the problem. – dbr, Sep 9, 2013 at 7:43
For some purposes, like dealing with EMAIL or URLs, it seems perfect to use .decode('ascii', 'ignore') – andilabs, Dec 12, 2014 at 10:15
samwize's answer didn't work for you because it works on Unicode strings. line.decode() in your answer suggests that your input is a bytestring (you should not call .decode() on a Unicode string; to enforce it, the method is removed in Python 3). I don't understand how it is possible to see the tutorial that you've linked in your answer and miss the difference between bytes and Unicode (do not mix them). – jfs, Jan 20, 2015 at 12:49
Try this code:

    import re
    re.sub(r'[^\x00-\x7F]+', '', 'paste your string here').decode('utf-8', 'ignore').strip()
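The trailing `.decode(...)` only exists on Python 2 byte strings; on a Python 3 `str` it raises `AttributeError`. A Python 3 sketch of the same regex approach (the helper name `drop_non_ascii` and its `replacement` parameter are hypothetical). Passing a space as the replacement keeps words separated:

```python
import re


def drop_non_ascii(text: str, replacement: str = "") -> str:
    # Replace every run of characters outside U+0000..U+007F, then trim the ends.
    return re.sub(r"[^\x00-\x7F]+", replacement, text).strip()


sample = "  Dear\xa0Parent, caf\xe9  "
print(drop_non_ascii(sample))       # DearParent, caf
print(drop_non_ascii(sample, " "))  # Dear Parent, caf
```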
Answered Mar 20, 2017 at 13:04 by shiva
Python recognizes it as a whitespace character, so you can split on it without arguments and join with a normal space:

    line = ' '.join(line.split())
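A quick Python 3 check (the sample line is an assumption): `str.split()` with no arguments splits on any whitespace run, including U+00A0, tabs and newlines, so this also collapses repeated spaces and trims the ends:

```python
line = "  Dear\xa0Parent,\t this  is\xa0\xa0a test \n"

normalized = " ".join(line.split())
print(normalized)  # Dear Parent, this is a test
```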
Answered Apr 23, 2019 at 7:16 by Jonhy Beebop
I ended up here while googling for the problem with a non-printable character. I use MySQL UTF-8 general_ci and deal with the Polish language. For problematic strings I have to proceed as follows:

    text = text.replace('\xc2\xa0', '')

It is just a fast workaround, and you probably should try something with the right encoding setup.
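A Python 3 sketch contrasting this byte-level workaround with decoding first (the Polish sample text is an assumption). On UTF-8 bytes the non-breaking space is the pair `b'\xc2\xa0'`; once decoded it is the single character `'\xa0'`:

```python
raw = "Dzie\u0144\xa0dobry".encode("utf-8")  # 'Dzień dobry' with a NO-BREAK SPACE

# Byte-level workaround: operate on the UTF-8 bytes, then decode.
via_bytes = raw.replace(b"\xc2\xa0", b" ").decode("utf-8")

# Text-level approach: decode first, work on str, encode only at the very end.
via_text = raw.decode("utf-8").replace("\xa0", " ")

print(via_bytes == via_text)  # True
print(via_text)               # Dzień dobry
```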
Answered Feb 22, 2014 at 12:09 by andilabs (edited Jun 10, 2015 at 15:30)
This works if text is a bytestring that represents text encoded using UTF-8. If you are working with text, decode it to Unicode first (.decode('utf-8')) and encode it to a bytestring only at the very end (if the API does not support Unicode directly, e.g., socket). All intermediate operations on the text should be performed on Unicode. – jfs, Jan 20, 2015 at 12:57
In BeautifulSoup, you can pass `get_text()` the `strip` parameter, which strips whitespace from the beginning and end of the text. This will remove \xa0 or any other whitespace if it occurs at the start or end of the string. BeautifulSoup replaced an empty string with \xa0, and this solved the problem for me.

    mytext = soup.get_text(strip=True)
Answered Jan 19, 2015 at 14:51 by Mark (edited Jan 19, 2015 at 15:25 by shauryachats)
strip=True works only if &nbsp; is at the beginning or end of each bit of text. It won't remove the space if it is in between other characters in the text. – jfs, Jan 20, 2015 at 13:01
It's the equivalent of a space character, so strip it:

    print(string.strip())  # no more xa0
Answered Mar 6, 2019 at 17:23 by 8bitjunkie
This will only remove it if it's at the beginning or end of the string. – Bill, Jan 18, 2021 at 23:55
0xA0 (Unicode) is 0xC2A0 in UTF-8. `.encode('utf8')` will just take your Unicode 0xA0 and replace it with UTF-8's 0xC2A0. Hence the apparition of 0xC2s... Encoding is not replacing, as you've probably realized by now.
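The byte values can be checked in Python 3 (a sketch, not tied to any particular input). Decoding the UTF-8 bytes as Latin-1 by mistake also produces the `Â` followed by a non-breaking space that people often see:

```python
nbsp = "\u00a0"  # same character as "\xa0"

print(nbsp.encode("latin-1"))  # b'\xa0'
print(nbsp.encode("utf-8"))    # b'\xc2\xa0'

# Mojibake: UTF-8 bytes misread as Latin-1 give 'Â' plus NO-BREAK SPACE.
mojibake = nbsp.encode("utf-8").decode("latin-1")
print(mojibake == "\u00c2\u00a0")  # True
```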
Answered Jun 12, 2012 at 12:02 by dda (edited Sep 26, 2012 at 5:55)
0xc2a0 is ambiguous (byte order). Use the b'\xc2\xa0' bytes literal instead. – jfs, Jan 20, 2015 at 13:03
You can try `string.strip()`. It worked for me! :)
Answered Jan 30, 2021 at 14:13 by Saema Miftah (edited Jan 30, 2021 at 14:54 by sta)
Generic version with a regular expression (it removes every literal `\x..` escape sequence, such as the four characters `\xa0`, rather than the characters themselves):

    import re

    def remove_control_chars(s):
        return re.sub(r'\\x..', '', s)
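The pattern `r'\\x..'` matches the literal text backslash, x and two more characters, not the characters themselves. A Python 3 sketch contrasting it with a filter on the Unicode general category `Cc` (both helper names are hypothetical); U+00A0 is category `Zs` (a space separator), so a real control-character filter leaves it in place:

```python
import re
import unicodedata


def remove_escape_text(s: str) -> str:
    # Matches a literal backslash, 'x' and any two characters, e.g. the text "\xa0".
    return re.sub(r"\\x..", "", s)


def drop_cc_chars(s: str) -> str:
    # Keeps every character whose Unicode general category is not Cc (control).
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cc")


print(remove_escape_text("Dear\\xa0Parent"))       # DearParent
print(repr(remove_escape_text("Dear\xa0Parent")))  # 'Dear\xa0Parent'
print(unicodedata.category("\xa0"))                # Zs
print(repr(drop_cc_chars("a\x00b\x07c\xa0d")))     # 'abc\xa0d'
```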
Answered Jul 2, 2018 at 12:28 by ranaFire (edited Aug 30, 2018 at 6:23)
This is how I solved this issue when I encountered \xa0 in an HTML-encoded string.

I discovered that a non-breaking space is inserted to ensure that a word and the subsequent HTML markup are not separated when the page is resized. This presents a problem for the parsing code, as it introduced codec encoding issues. What made it hard was that we are not privy to the encoding used. From Windows machines it can be latin-1 or CP1252 (Western ISO), but more recent OSes have standardized on UTF-8. By normalizing the Unicode data, we strip \xa0:

    my_string = unicodedata.normalize('NFKD', my_string).encode('ASCII', 'ignore')
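A Python 3 sketch of this one-liner (the sample string is an assumption). In Python 3 `.encode()` returns `bytes`, so decode again if you need a `str`; accents are lost because NFKD splits them into combining marks that the ASCII encoder then drops with `'ignore'`:

```python
import unicodedata

my_string = "Caf\u00e9\u00a0cr\u00e8me"  # 'Café crème' with a NO-BREAK SPACE

ascii_bytes = unicodedata.normalize("NFKD", my_string).encode("ascii", "ignore")
print(ascii_bytes)                  # b'Cafe creme'
print(ascii_bytes.decode("ascii"))  # Cafe creme
```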
Answered Jul 6 at 3:13 by Amro Younes