Right way to write string into UTF-8 file? - Python Forum


#1  Winfried, Aug-28-2018, 02:05 PM

Hello,

I need to append a string to a text file that's encoded in UTF-8.

It appears that, by default, Python 3 tries to write in ANSI (Latin-1, ISO8859-1, cp1252, or whatever the correct name is). As a result, I end up with a file that cannot be displayed correctly, since it uses two encodings in the same file. (In ANSI, "è" is indeed 0xE8.)

I tried the following but it doesn't work:

    file = open("test.latin1.utf8.txt", "w")
    file.write("Crème")
    stringutf8 = "Crème".encode('utf-8')
    print(stringutf8)
    #BAD Error: TypeError: write() argument must be str, not bytes
    file.write(stringutf8)
    file.close()

Any idea how to do this?

Thank you.

Edit: Python won't let me open the file as UTF-8, since it detects an ANSI character ("É" = 0xc9) wrongly added by another script; and it won't let me replace that faulty string either:

    import codecs

    #Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 327: invalid continuation byte
    f = codecs.open(inputfile, "r", "utf-8")
    content = f.read()
    f.close()
    …
    filename = "Crème"
    #Error: AttributeError: 'str' object has no attribute 'decode'
    filename = filename.decode('utf-8')

#2  snippsat, Aug-28-2018, 02:27 PM

Python 3 has full Unicode support, and UTF-8 is its default encoding for source files and for str.encode(). For file input and output, always specify UTF-8 explicitly, and do not encode or decode anything unless you have to: text (Unicode by default) can be written from and read into Python 3 directly.

    s = 'Crème and Spicy jalapeño ☂'
    with open('unicode.txt', 'w', encoding='utf-8') as f_out:
        f_out.write(s)

Output on disk:

    Crème and Spicy jalapeño ☂

Read in:

    with open('unicode.txt', encoding='utf-8') as f:
        print(f.read())

Output:

    Crème and Spicy jalapeño ☂

#3  Winfried, Aug-28-2018, 04:04 PM

Thank you.

The code works fine as-is, but for some reason it messes up the original file when I use this code:

    with open("input.gpx", 'r') as f:
        content = f.read()

    content += 'Crème and Spicy jalapeño ☂'

    with open("output.gpx", 'w', encoding='utf-8') as f_out:
        f_out.write(content)

If you'd like to give it a quick shot: https://we.tl/t-neH7vye8wd

#4  snippsat, Aug-28-2018, 05:31 PM

You don't pass encoding='utf-8' when you read the file, as shown here. Test with your file:

    with open('input.gpx') as f:
        print(f.read())

Output:

    Départ entre 7 et 8h

Fix:

    with open('input.gpx', encoding='utf-8') as f:
        print(f.read())

Output:

    Départ entre 7 et 8h

As it's an XML file, here is a test with the BeautifulSoup parser:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(open('input.gpx', encoding='utf-8'), 'xml')
    print(soup.find('desc').text)

Output:

    Départ entre 7 et 8h

#5  Gribouillis, Aug-28-2018, 07:59 PM

@Winfried Make sure you are using Python 3. One of the major achievements of Python 3 over Python 2 is the correct handling of Unicode data.

#6  Winfried, Aug-29-2018, 05:22 AM

Thanks much for the tip on open(…, encoding='utf-8')!

I've also learned that UltraEdit (famous Windows editor) encodes files in Latin1 while PyScripter (IDE) uses UTF-8, so the latter is a much better alternative when working with accented strings.

    stuff = "Crème"

    with open("cp1252.txt", 'w') as outFile:
        outFile.write(stuff)

    with open("utf8.txt", mode='w', encoding='utf-8') as outFile:
        outFile.write(stuff)

Using open() without any additional option meant that I ended up with a mix of Latin1 and UTF-8, which prevented me from using GpsBabel to merge GPX files.

I am using Python 3 (3.7.0).

Thanks again.

Edit: UltraEdit seems to have been rewritten to use Unicode instead.

#7  buran, Aug-29-2018, 05:41 AM

Quoting Winfried (Aug-29-2018, 05:22 AM): "UltraEdit (famous Windows editor)"

Never heard of it, so maybe not that famous, but anyway, look at http://forums.ultraedit.com/set-default-...17446.html

EDIT: I see you added that there is a new version with default UTF support.

#8  Winfried, Aug-29-2018, 07:06 AM

It's a Windows editor that's been around for about 25 years.

I have yet another encoding issue, this time with geojson :-/

If I use its dump(), UTF-8 data is turned into UTF-16 (apparently), e.g. "Fran\u00e7ois" instead of "François":

    import geojson
    from geojson import dump

    with open('input.geojson', encoding='utf-8') as f:
        gj = geojson.load(f)

    for track in gj['features']:
        #NO DIFF with open(track['properties']['name'][0]+'.geojson','a+',encoding='utf-8') as f:
        with open(track['properties']['name'][0]+'.geojson', 'a+') as f:
            dump(track, f, indent=2)
            #UnicodeEncodeError: 'charmap' codec can't encode character '\u2194' in position 7: character maps to <undefined>
            #dump(track, f, indent=2, ensure_ascii=False)
            #NOT DEFINED
            #dumps(track, f, indent=2)
            #AttributeError: encode
            #dump(track.encode("utf-8"), f, indent=2, ensure_ascii=False)

As shown, I found and tried several things, all to no avail.

Should I use another method than "dump"?
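
A note on the "UTF-16" guess in post #8: `\u00e7` is most likely JSON's own escape syntax rather than a change of encoding. The standard `json` module escapes every non-ASCII character by default (`ensure_ascii=True`), and any JSON parser reads the escape back as the original character. Here is a minimal sketch using only the standard library; whether `geojson.dump` forwards `ensure_ascii` to `json.dump` is an assumption to check against the geojson documentation.

```python
import json

name = "François"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(name)

# With ensure_ascii=False the character is kept as-is in the output string.
literal = json.dumps(name, ensure_ascii=False)

print(escaped)   # "Fran\u00e7ois"
print(literal)   # "François"

# Both forms are valid JSON and decode to the same Python string.
assert json.loads(escaped) == json.loads(literal) == name
```

Either form is valid JSON, so the escaped file is not damaged, only harder to read. With `ensure_ascii=False`, the target file should be opened with `encoding='utf-8'`; on a Windows code page such as cp1252, characters outside that code page can otherwise trigger the 'charmap' UnicodeEncodeError quoted above.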
#9  Winfried, Aug-29-2018, 06:10 PM

If there's no way around it, I can live with accents being human-unreadable, but I wonder why JSON turns them into e.g. \u00e7.

BTW, I learned that you can't simply dump tracks into a file like I did above. To have a clean file, you must first build a list, then a collection, and dump the collection:

    import geojson
    from geojson import FeatureCollection, dump

    with open(INPUTFILE, encoding='utf-8') as f:
        gj = geojson.load(f)

    features = []
    for track in gj['features']:
        features.append(track)

    feature_collection = FeatureCollection(features)

    with open('myfile.geojson', 'w') as f:
        dump(feature_collection, f, indent=2)

#10  snippsat, Aug-29-2018, 06:22 PM

Quoting Winfried (Aug-29-2018, 07:06 AM): "Should I use another method than "dump"?"

Not sure, because I don't know where it goes wrong; I don't have the file you use. You should not use f for both the read and the dump.

Here is an example where I use a GeoJSON file and put in François.

    import json
    from pprint import pprint

    with open('map.geojson', encoding='utf-8') as f:
        data = json.load(f)

    pprint(data)

Output:

    {'features': [{'geometry': {'coordinates': [[8.9208984375, 61.05828537037916],
                                                [9.84375, 61.4597705702975],
                                                [10.7666015625,
                                                 60.930432202923335]],
                                'type': 'François'},
                   'properties': {},
                   'type': 'Feature'}],
     'type': 'FeatureCollection'}

Dump data:

    import json

    with open("data_file.json", "w", encoding='utf-8') as f_out:
        json.dump(data, f_out, ensure_ascii=False)

Content of data_file.json:

    {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "properties": {},
          "geometry": {
            "type": "François",
            "coordinates": [
              [
                8.9208984375,
                61.05828537037916
              ],
              [
                9.84375,
                61.4597705702975
              ],
              [
                10.7666015625,
                60.930432202923335
              ]
            ]
          }
        }
      ]
    }
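
The edit in post #1 (a UTF-8 file containing a stray ANSI byte, 0xC9 for "É") is not answered in the thread. One possible approach, sketched here under the assumption that every byte that fails UTF-8 decoding is really Latin-1, is to register a custom codec error handler that decodes only the offending bytes as Latin-1. The names `latin1_fallback` and `decode_mixed` are made up for this illustration.

```python
import codecs


def _latin1_fallback(exc):
    """Error handler: decode bytes that are invalid UTF-8 as Latin-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    # Return the replacement text and the position where decoding resumes.
    return bad_bytes.decode("latin-1"), exc.end


# Hypothetical handler name, chosen for this example only.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_mixed(data: bytes) -> str:
    """Decode mostly-UTF-8 bytes, treating invalid bytes as Latin-1."""
    return data.decode("utf-8", errors="latin1_fallback")


# "É" as a stray Latin-1 byte (0xC9), followed by "t" and a UTF-8 "é".
sample = b"\xc9t\xc3\xa9"
print(decode_mixed(sample))  # Été
```

To repair a file, read it in binary mode (`open(path, 'rb')`), pass the bytes to `decode_mixed`, and write the result back with `encoding='utf-8'`. This only rescues bytes that are invalid as UTF-8; Latin-1 bytes that happen to form a valid UTF-8 sequence would pass through undetected, so the result is worth checking by eye before overwriting the original file.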


