Python: How to fix Unexpected UTF-8 BOM error when using ...
With Python, it is really easy to retrieve data from 3rd party API services, so I made a script for this purpose. The script worked without any issue for many different API URLs, but recently, when I wanted to load the server's response from a specific API URL into the json.loads method, it threw an "Unexpected UTF-8 BOM" error. In this article, we will examine what the error means and various ways to solve it.

To retrieve the data from a 3rd party API service, I use this code in my Python script:

    import requests
    import json

    url = "API_ENDPOINT_URL"
    r = requests.get(url)
    data = json.loads(r.text)
    # ... do something with the data ...

The above code uses the requests library to read the data from the URL and then uses the json.loads method to deserialize the server's string response containing JSON data into an object.

Until this particular case, the above code worked just fine, but now I was getting the following error:

    json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)

The error was caused by json.loads(r.text), so I examined the value of r.text, which had this:

    \ufeff\n{retrieved data from the api call}

The content of the server's response contained the data from the API, but it also had that strange \ufeff Unicode character at the beginning. It turns out that the Unicode character U+FEFF (encoded in UTF-8 as the bytes \xef\xbb\xbf) is a byte order mark (BOM) character.

Table of Contents

- What is BOM
- What is utf-8-sig?
- Solution 1 - using codecs module
- Solution 2 - without using the codecs module
- Solution 3 - using requests.Response content property
- Solution 4 - using requests.Response encoding property

What is BOM

According to Wikipedia, the BOM is an optional value at the beginning of a text stream, and its presence can mean different things. With UTF-8 text streams, for example, it can be used to signal that the text is encoded in UTF-8, while with UTF-16 and UTF-32, the presence of the BOM signals the byte order of the stream.
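To make the problem concrete, here is a minimal sketch that reproduces the error without any network call. The `raw` bytes are a made-up stand-in for a server response (a BOM, a newline, then a small JSON document), not the data from the actual API.

```python
import codecs
import json

# The UTF-8 BOM is the three bytes EF BB BF; codecs exposes it as a constant.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Hypothetical response body: BOM, newline, then a JSON document.
raw = codecs.BOM_UTF8 + b'\n{"status": "ok"}'

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
print(repr(text[0]))

# json.loads refuses a str that starts with U+FEFF.
error_message = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The point of the sketch is that the BOM survives a plain UTF-8 decode as an ordinary character at index 0, and that is what json.loads complains about.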
In my case, the data was in UTF-8 and had already been received, so having that BOM character in r.text seemed unnecessary, and since it was causing the json.loads method to throw the JSONDecodeError, I wanted to get rid of it.

The hint on how to solve this problem can be found in the Python error itself. It mentions "decode using utf-8-sig", so let's examine this next.

What is utf-8-sig?

utf-8-sig is a Python variant of the UTF-8 codec. When used for encoding, it writes the BOM before anything else; when used for decoding, it skips the UTF-8 BOM if one is present, and this is exactly what I needed.

So the solution is simple. We just need to decode the data using the utf-8-sig encoding, which will get rid of the BOM value. There are several ways to accomplish that.

Solution 1 - using codecs module

First, I tried to use the codecs module, which is part of the Python standard library. It contains encoders and decoders, mostly for converting text. We can use the codecs.decode() function to decode the data using the utf-8-sig encoding. Something like this:

    import codecs
    decoded_data = codecs.decode(r.text, 'utf-8-sig')

Unfortunately, codecs.decode didn't accept strings, as it threw the following error:

    TypeError: decoding with 'utf-8-sig' codec failed (TypeError: a bytes-like object is required, not 'str')

Next, I tried to convert the string into a bytes object. This can be done using the encode() method available for strings. If no encoding argument is provided, it uses UTF-8, which is the default for str.encode() in Python 3 on every platform:

    decoded_data = codecs.decode(r.text.encode(), 'utf-8-sig')
    data = json.loads(decoded_data)

The decoded_data variable finally contained the data without the BOM character, and I was finally able to pass it to the json.loads method.

So, this worked, but I didn't like that I was using an extra module just to get rid of one Unicode BOM character.

Solution 2 - without using the codecs module

It turns out there is a way to encode and decode strings without importing the codecs module. We can simply call the decode() method on the return value of the string's encode() method, so we can just do this:

    decoded_data = r.text.encode().decode('utf-8-sig')
    data = json.loads(decoded_data)

Let's try to simplify this further.
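Before moving on, Solutions 1 and 2 can be checked offline with a minimal sketch. The `text` value below is a hypothetical stand-in for r.text, assuming the response was decoded as UTF-8 so that the BOM shows up as a single \ufeff character.

```python
import codecs
import json

# Hypothetical stand-in for r.text: BOM character, newline, then JSON.
text = '\ufeff\n{"status": "ok"}'

# Solution 1: convert the str to bytes, then decode with the codecs module.
via_codecs = codecs.decode(text.encode(), "utf-8-sig")

# Solution 2: the same round-trip using the built-in bytes.decode() method.
via_builtin = text.encode().decode("utf-8-sig")

# Both variants drop the leading BOM and leave the rest untouched.
assert via_codecs == via_builtin
assert not via_builtin.startswith("\ufeff")

data = json.loads(via_builtin)
print(data)
```

Both variants do the same work: the string is turned back into UTF-8 bytes and decoded again with a codec that skips the BOM.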
Solution 3 - using requests.Response content property

So far, the code in this article used r.text, which contains the response content as a string. We can skip the encoding part altogether by using r.content instead, as this property already contains the server's response content in bytes. We then simply call the decode() method on r.content:

    decoded_data = r.content.decode('utf-8-sig')
    data = json.loads(decoded_data)

Solution 4 - using requests.Response encoding property

We can skip calling the encode() and decode() methods shown in the previous examples altogether and instead use the encoding property of the requests.Response object. We just need to make sure we set the value before the call to r.text, as shown below:

    r.encoding = 'utf-8-sig'
    data = json.loads(r.text)

Conclusion

If the json.loads() method throws an Unexpected UTF-8 BOM error, it is due to a BOM value being present in the stream or file. In this article, we first examined what this BOM is, then touched on the utf-8-sig encoding, and finally examined 4 ways to solve this problem.

5 Comments

glbfor, December 20, 2019
I couldn't resolve my problem with the suggestions above, but I finally used the method below to solve the problem successfully.

    resp = requests.get(url, params=j, headers=headers_url_encoded, verify=False)
    # print(resp.content.decode('utf-8'))
    resp.encoding = 'utf-8-sig'
    content = resp.text.encode().decode('utf-8-sig')
    return json.loads(content)

Igor, January 25, 2020
Very Good!

Brice, February 3, 2020
Could r.json() work directly? As in:
```
r.encoding = 'utf-8-sig'
data = r.json()
```

Steve, May 15, 2020
I just added the 'b' option to read the file in as bytes and then was able to work with the file without any of the above. But this page was helpful in getting me to think through the process:

    input_file = open('filename', 'br')

Ashwani Gupta, September 28, 2020
Superb Answer 😀
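Solutions 3 and 4, as well as the binary-mode idea from Steve's comment, can also be tried without a real API. The sketch below uses only the standard library: `raw` is a hypothetical stand-in for r.content, `str(raw, "utf-8-sig")` only approximates what r.text returns once r.encoding is set, and the in-memory `io.BytesIO` object stands in for a file opened with the 'rb' mode.

```python
import codecs
import io
import json

# Hypothetical stand-in for r.content: BOM bytes, newline, then JSON.
raw = codecs.BOM_UTF8 + b'\n{"status": "ok"}'

# Solution 3: decode the raw bytes with utf-8-sig, which drops the BOM.
from_content = json.loads(raw.decode("utf-8-sig"))

# Solution 4 (approximation): with the encoding set to utf-8-sig, the text
# is the body decoded with that codec, imitated here with str().
from_text = json.loads(str(raw, "utf-8-sig"))

# Since Python 3.6, json.loads also accepts bytes and detects the BOM itself.
from_bytes = json.loads(raw)

# Binary-mode reading: json.load reads bytes from the file-like object,
# so the same detection applies. BytesIO stands in for open(..., 'rb').
from_file = json.load(io.BytesIO(raw))

print(from_content)
```

All four calls should produce the same dictionary, which suggests that handing bytes to the json module is enough on its own when the raw response or file is available.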