Right way to write string into UTF-8 file?
#1 Winfried, Aug-28-2018, 02:05 PM
Hello,
I need to append a string to a text file that's encoded in UTF-8.
It appears that, by default, Python 3 tries to write in ANSI (Latin-1, ISO 8859-1, cp1252, or whatever the correct name is). As a result, I end up with a file that cannot be displayed correctly, since it uses two encodings in the same file.
(In ANSI, "è" is indeed 0xE8.)
I tried the following, but it doesn't work:
file = open("test.latin1.utf8.txt", "w")
file.write("Crème")
stringutf8 = "Crème".encode('utf-8')
print(stringutf8)
# BAD Error: TypeError: write() argument must be str, not bytes
file.write(stringutf8)
file.close()
Any idea how to do this?
Thank you.
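For the original goal of appending to a file that is already UTF-8, a minimal sketch: in text mode, pass encoding='utf-8' and hand write() a str; if you prefer to encode by hand, open the file in binary mode ("ab") so write() accepts the bytes. The function names and the temporary file are illustrative, not from the thread.

```python
import os
import tempfile


def append_utf8(path: str, text: str) -> None:
    # Text mode with an explicit encoding: the file object encodes the str,
    # so write() must be given a str, never bytes.
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)


def append_utf8_bytes(path: str, text: str) -> None:
    # Binary mode: encode by hand; write() then expects bytes.
    with open(path, "ab") as f:
        f.write(text.encode("utf-8"))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.utf8.txt")  # throwaway file
    append_utf8(path, "Crème")
    append_utf8_bytes(path, " brûlée")
    with open(path, "rb") as f:
        print(f.read())  # b'Cr\xc3\xa8me br\xc3\xbbl\xc3\xa9e'
```

Mixing the two styles on one file is safe here only because both of them write UTF-8.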
---
Edit: Python won't let me open the file as UTF-8, since it detects an ANSI character ("É" = 0xC9) wrongly added by another script, and it won't let me replace that faulty string either:
import codecs
# Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 327: invalid continuation byte
f = codecs.open(inputfile, "r", "utf-8")
content = f.read()
f.close()
…
filename = "Crème"
# Error: AttributeError: 'str' object has no attribute 'decode'
filename = filename.decode('utf-8')
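For the Edit, one possible way to salvage a file that is mostly UTF-8 but contains stray cp1252/Latin-1 bytes such as 0xC9: read it as bytes and decode as UTF-8 with a custom error handler that re-decodes only the offending bytes as cp1252. This is a heuristic sketch, not something proposed in the thread; the handler name "cp1252_fallback" and the helper repair_to_utf8 are hypothetical. It cannot catch a stray byte that happens to form a valid UTF-8 sequence with the byte after it. As for the AttributeError: in Python 3 a str like "Crème" is already decoded text, so it has no .decode(); only bytes do.

```python
import codecs


def _cp1252_fallback(err):
    # Called by the UTF-8 decoder for each undecodable byte range.
    # Re-decode just those bytes as cp1252 (e.g. 0xC9 -> "É") and resume after them.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("cp1252", errors="replace"), err.end


# Hypothetical handler name; any unused name works.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_to_utf8(src: str, dst: str) -> None:
    # Read raw bytes, decode as UTF-8 with the fallback, write clean UTF-8.
    with open(src, "rb") as f:
        text = f.read().decode("utf-8", errors="cp1252_fallback")
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```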
#2 snippsat, Aug-28-2018, 02:27 PM
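Python 3 has full Unicode support, and UTF-8 is its default source-code encoding. However, open() without an encoding argument falls back to the platform's locale encoding (cp1252 on most Western Windows setups), so always pass encoding='utf-8' for files going out and coming in.
Do not encode or decode anything unless necessary: text in Python 3 is Unicode (str) by default, both when it goes out to a file and when it comes back in.
s = 'Crème and Spicy jalapeño ☂'
with open('unicode.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(s)
Output on disk:
Crème and Spicy jalapeño ☂
Read in:
with open('unicode.txt', encoding='utf-8') as f:
    print(f.read())
Output:
Crème and Spicy jalapeño ☂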
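A small in-memory sketch (no files involved) of why the encoding used for writing and the one used for reading must match: the same "è" is one byte in cp1252 and two bytes in UTF-8, and decoding with the wrong codec either garbles the text or fails outright.

```python
text = "Crème"

cp1252_bytes = text.encode("cp1252")  # b'Cr\xe8me'      (è is one byte, 0xE8)
utf8_bytes = text.encode("utf-8")     # b'Cr\xc3\xa8me'  (è is two bytes, 0xC3 0xA8)

# Decoding UTF-8 bytes as cp1252 "works" but yields mojibake.
print(utf8_bytes.decode("cp1252"))    # CrÃ¨me

# Decoding cp1252 bytes as UTF-8 fails, like the error in post #1.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)                 # invalid continuation byte
```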
#3 Winfried, Aug-28-2018, 04:04 PM
Thank you.
The code works fine as-is, but for some reason it messes up the original file when I use this code:
with open("input.gpx", 'r') as f:
    content = f.read()
content += 'Crème and Spicy jalapeño ☂'
with open("output.gpx", 'w', encoding='utf-8') as f_out:
    f_out.write(content)
If you'd like to give it a quick shot: https://we.tl/t-neH7vye8wd
#4 snippsat, Aug-28-2018, 05:31 PM
You don't have encoding='utf-8' when you read the file, as shown here.
Test with your file:
with open('input.gpx') as f:
    print(f.read())
Output:
Départ entre 7 et 8h
Fix:
with open('input.gpx', encoding='utf-8') as f:
    print(f.read())
Output:
Départ entre 7 et 8h
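Applied to the code in post #3, a minimal sketch of the fix: decode the input explicitly as UTF-8 instead of relying on the locale default that open() falls back to. If a UTF-8 file is decoded as cp1252 instead, "Départ" becomes "DÃ©part", and writing that string back out as UTF-8 bakes the mojibake into the new file, which is a likely explanation for the "messed up" output. append_text_utf8 is a hypothetical helper name.

```python
import locale

# Roughly the encoding open() falls back to when no encoding= is given
# (typically 'cp1252' on Western-European Windows, 'UTF-8' on most Linux/macOS).
print(locale.getpreferredencoding(False))


def append_text_utf8(src: str, dst: str, extra: str) -> None:
    # Decode the input as UTF-8 explicitly instead of relying on the default.
    with open(src, encoding="utf-8") as f:
        content = f.read()
    content += extra
    # Encode the output as UTF-8 too, so the whole file uses one encoding.
    with open(dst, "w", encoding="utf-8") as f_out:
        f_out.write(content)
```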
As it's an XML file, here's a test with the BS (BeautifulSoup) parser:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('input.gpx', encoding='utf-8'), 'xml')
print(soup.find('desc').text)
Output:
Départ entre 7 et 8h
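A standard-library alternative, sketched here as an assumption rather than taken from the thread: xml.etree.ElementTree opens the file in binary mode and lets the XML parser apply the encoding from the file's XML declaration, so no encoding= argument is needed at all. The helper name first_desc is hypothetical; the namespace is stripped by hand so the code also runs on Python 3.7.

```python
import xml.etree.ElementTree as ET


def first_desc(path: str):
    # ET.parse reads the file as bytes; the parser honours the encoding in
    # <?xml ... encoding="..."?> (UTF-8 if the declaration is absent).
    root = ET.parse(path).getroot()
    for elem in root.iter():
        # GPX elements live in a namespace, so tags look like "{ns}desc";
        # compare only the local name.
        if elem.tag.rsplit("}", 1)[-1] == "desc":
            return elem.text
    return None
```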
#5 Gribouillis, Aug-28-2018, 07:59 PM
@Winfried Make sure you are using Python 3. One of the major achievements of Python 3 over Python 2 is the correct handling of Unicode data.
#6 Winfried, Aug-29-2018, 05:22 AM
Thanks much for the tip on open(…, encoding='utf-8')!
I've also learned that UltraEdit (famous Windows editor) encodes files in Latin-1 while PyScripter (IDE) uses UTF-8, so the latter is a much better alternative when working with accented strings.
stuff = "Crème"
# No encoding= given: the platform default is used (cp1252 on this Windows machine).
with open("cp1252.txt", 'w') as outFile:
    outFile.write(stuff)
# Explicit UTF-8.
with open("utf8.txt", mode='w', encoding='utf-8') as outFile:
    outFile.write(stuff)
Using open() without any additional option meant that I ended up with a mix of Latin-1 and UTF-8, which prevented me from using GpsBabel to merge GPX files.
I am using Python 3 (3.7.0).
Thanks again.
--
Edit: UltraEdit seems to have been rewritten to use Unicode instead.
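Before merging files with a tool like GpsBabel, it may help to find the ones that are not valid UTF-8. A minimal sketch, assuming the files should all be UTF-8: a strict decode of the raw bytes fails on any stray cp1252/Latin-1 byte such as 0xE8. The helper is_valid_utf8 is hypothetical; note that pure-ASCII files pass, which is harmless because ASCII is identical in both encodings.

```python
from pathlib import Path


def is_valid_utf8(path) -> bool:
    # Strict decoding (errors="strict" is the default) raises on any byte
    # sequence that is not well-formed UTF-8.
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `[p for p in Path(".").glob("*.gpx") if not is_valid_utf8(p)]` would list the GPX files in the current directory that still need converting.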
#7 buran, Aug-29-2018, 05:41 AM
(Aug-29-2018, 05:22 AM) Winfried Wrote: UltraEdit (famous Windows editor)
Never heard of it, so maybe not that famous, but anyway look at http://forums.ultraedit.com/set-default-...17446.html
EDIT: I see you added that there is a new version with default UTF support.
#8 Winfried, Aug-29-2018, 07:06 AM
It's a Windows editor that's been around for about 25 years.
I have yet another encoding issue, this time with geojson :-/
If I use its dump(), UTF-8 data is turned into UTF-16 (apparently), e.g. "Fran\u00e7ois" instead of "François":
import geojson
from geojson import dump

with open('input.geojson', encoding='utf-8') as f:
    gj = geojson.load(f)
for track in gj['features']:
    # NO DIFF: with open(track['properties']['name'][0] + '.geojson', 'a+', encoding='utf-8') as f:
    with open(track['properties']['name'][0] + '.geojson', 'a+') as f:
        dump(track, f, indent=2)
        # UnicodeEncodeError: 'charmap' codec can't encode character '\u2194' in position 7: character maps to <undefined>
        # dump(track, f, indent=2, ensure_ascii=False)
        # NOT DEFINED
        # dumps(track, f, indent=2)
        # AttributeError: encode
        # dump(track.encode("utf-8"), f, indent=2, ensure_ascii=False)
As shown, I found and tried several things, all to no avail.
Should I use another method than "dump"?
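"\u00e7" is not UTF-16: it is JSON's \uXXXX escape for U+00E7 (ç), which the json module emits by default (ensure_ascii=True) so that the output is pure ASCII. The 'charmap' error in the comments comes from combining ensure_ascii=False with open() without encoding=, because cp1252 has no "↔" (U+2194). Judging by that error, geojson's dump() appears to pass ensure_ascii through to json.dump, but the sketch below sticks to the standard json module; dump_utf8 is a hypothetical helper name.

```python
import json

name = "François ↔ Zoé"

print(json.dumps(name))                      # "Fran\u00e7ois \u2194 Zo\u00e9"
print(json.dumps(name, ensure_ascii=False))  # "François ↔ Zoé"


def dump_utf8(obj, path: str) -> None:
    # ensure_ascii=False writes characters as-is, so the file itself must be
    # opened with an encoding that can represent them, i.e. UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)
```

Both forms are valid JSON and load back to the same string; the escaped form is simply less readable for humans.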
#9 Winfried, Aug-29-2018, 06:10 PM
If there's no way around it, I can live with accents being human-unreadable, but I wonder why JSON turns them into e.g. \u00e7.
BTW, I learned that you can't simply dump tracks into a file like I did above: to get a clean file, you must first build a list and a collection, and then dump the collection:
import geojson
from geojson import FeatureCollection, dump

with open(INPUTFILE, encoding='utf-8') as f:
    gj = geojson.load(f)
features = []
for track in gj['features']:
    features.append(track)
feature_collection = FeatureCollection(features)
with open('myfile.geojson', 'w') as f:
    dump(feature_collection, f, indent=2)
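Combining this with the per-name files from post #8, here is a sketch using only the standard json module (an assumption; the thread uses the geojson package): group the features first, then write each group once as its own FeatureCollection, in "w" mode rather than "a+", with UTF-8 and ensure_ascii=False so accents stay readable. split_by_initial is a hypothetical helper, and it assumes the first character of every name is valid in a file name.

```python
import json
import os
from collections import defaultdict


def split_by_initial(src: str, out_dir: str) -> None:
    with open(src, encoding="utf-8") as f:
        gj = json.load(f)

    # Same key as in post #8: the first character of the track's name.
    groups = defaultdict(list)
    for track in gj["features"]:
        groups[track["properties"]["name"][0]].append(track)

    # One complete FeatureCollection per file, written once, so each file
    # is a single valid JSON document instead of concatenated dumps.
    for initial, features in groups.items():
        collection = {"type": "FeatureCollection", "features": features}
        path = os.path.join(out_dir, initial + ".geojson")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(collection, f, indent=2, ensure_ascii=False)
```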
#10 snippsat, Aug-29-2018, 06:22 PM
(Aug-29-2018, 07:06 AM) Winfried Wrote: Should I use another method than "dump"?
Not sure, because I don't know where it goes wrong; I don't have the file you use.
You should not use the name f for both the read and the dump.
Here is an example where I use a GeoJSON file and put in François.
import json
from pprint import pprint

with open('map.geojson', encoding='utf-8') as f:
    data = json.load(f)
pprint(data)
Output:
{'features': [{'geometry': {'coordinates': [[8.9208984375, 61.05828537037916],
                                            [9.84375, 61.4597705702975],
                                            [10.7666015625,
                                             60.930432202923335]],
                            'type': 'François'},
               'properties': {},
               'type': 'Feature'}],
 'type': 'FeatureCollection'}
Dump data:
import json

with open("data_file.json", "w", encoding='utf-8') as f_out:
    json.dump(data, f_out, indent=2, ensure_ascii=False)
Content of data_file.json:
Output:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "François",
        "coordinates": [
          [
            8.9208984375,
            61.05828537037916
          ],
          [
            9.84375,
            61.4597705702975
          ],
          [
            10.7666015625,
            60.930432202923335
          ]
        ]
      }
    }
  ]
}