Remove \ufeff from a string in Python | bobbyhadz

Borislav Hadzhiev
Last updated: Aug 14, 2022

Use the str.replace() method to remove the \ufeff BOM character from a string, e.g. result = my_str.replace('\ufeff', ''). The replace() method removes the \ufeff character from the string by replacing it with an empty string.

main.py
```python
# ✅ remove \ufeff from a string
my_str = '\ufefffirst line'

result = my_str.replace('\ufeff', '')
print(repr(result))  # 👉️ 'first line'

# -----------------------------------------

# ✅ remove \ufeff when reading from a file
# 👇️ explicitly set encoding to utf-8-sig
with open('example.txt', 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()
    print(lines)
```

The \ufeff character is a byte order mark (BOM) and is interpreted as a zero-width non-breaking space.

The BOM character causes an issue when we decode bytes with a codec other than the one that was used to encode them. For example, a file saved as utf-8-sig and read back as plain utf-8 keeps the BOM as the first character of the text.

If you have a string that contains a BOM character, use the str.replace() method to remove it.

main.py
```python
my_str = '\ufefffirst line'

result = my_str.replace('\ufeff', '')
print(repr(result))  # 👉️ 'first line'
```

The str.replace method returns a copy of the string with all occurrences of a substring replaced by the provided replacement.

The method takes the following parameters:

| Name  | Description                                                |
| ----- | ---------------------------------------------------------- |
| old   | The substring we want to replace in the string             |
| new   | The replacement for each occurrence of old                 |
| count | Only the first count occurrences are replaced (optional)   |

The method doesn't change the original string. Strings are immutable in Python.

If you got the error "UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff'" while working with text read from a file, explicitly set the encoding keyword argument to utf-8-sig when opening the file. The error is raised when text that still contains the BOM is later encoded with the ascii codec, e.g. when it is printed or written out.

main.py
```python
with open('example.txt', 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()
    print(lines)
```

The open() function takes an encoding keyword argument, which can be set to utf-8-sig to treat the byte order mark as metadata instead of as part of the text.

When decoding, the utf-8-sig codec skips the BOM if it appears at the very start of the file. In UTF-8 the BOM is the three-byte sequence EF BB BF.

When using the utf-8 encoding, the use of the byte order mark (BOM) is discouraged and should be avoided.
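
The difference between the utf-8 and utf-8-sig codecs described above can be checked with a minimal sketch. The temporary file and the strip_leading_bom helper are assumptions made for illustration and are not part of the original article. The helper removes a BOM only when it is the first character, whereas str.replace() removes every occurrence in the string.

```python
import os
import tempfile

BOM = '\ufeff'


def strip_leading_bom(text: str) -> str:
    """Hypothetical helper: drop one BOM, but only if it is the first character."""
    return text[1:] if text.startswith(BOM) else text


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Writing with utf-8-sig puts a BOM at the start of the file.
    with open(path, 'w', encoding='utf-8-sig') as f:
        f.write('first line\n')

    # Raw bytes: the BOM is the three-byte sequence EF BB BF.
    with open(path, 'rb') as f:
        raw = f.read()

    # Plain utf-8 keeps the BOM as the first character of the text.
    with open(path, 'r', encoding='utf-8') as f:
        with_bom = f.read()

    # utf-8-sig skips the BOM while decoding.
    with open(path, 'r', encoding='utf-8-sig') as f:
        without_bom = f.read()

print(raw[:3])                            # b'\xef\xbb\xbf'
print(repr(with_bom))                     # '\ufefffirst line\n'
print(repr(without_bom))                  # 'first line\n'
print(repr(strip_leading_bom(with_bom)))  # 'first line\n'
```

Stripping only the leading character may be preferable when a \ufeff in the middle of the text is meaningful and should be left alone. When every occurrence should go, str.replace() is the simpler choice.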


