Write to UTF-8 file in Python - Stack Overflow
Asked 13 years, 4 months ago. Modified 6 months ago. Viewed 421k times.

Question (score 233)

I'm really confused with the codecs.open function. When I do:

    file = codecs.open("temp", "w", "utf-8")
    file.write(codecs.BOM_UTF8)
    file.close()

It gives me the error

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

If I do:

    file = open("temp", "w")
    file.write(codecs.BOM_UTF8)
    file.close()

It works fine.

The question is: why does the first method fail? And how do I insert the BOM?

If the second method is the correct way of doing it, what is the point of using codecs.open(filename, "w", "utf-8")?

Tags: python, utf-8, character-encoding, byte-order-mark

asked Jun 1, 2009 at 9:42 by John Jiang; edited Sep 2, 2020 at 18:58 by dreftymac

Comments:

(60) Don't use a BOM in UTF-8. Please. – tchrist, Feb 9, 2012 at 11:12

(10) @tchrist Huh? Why not? – Salman von Abbas, Jun 1, 2013 at 5:16

(11) @SalmanPK BOM is not needed in UTF-8 and only adds complexity (e.g. you can't just concatenate BOM'd files and result with valid text). See this Q&A; don't miss the big comment under Q – Alois Mahdal, Aug 29, 2013 at 14:18

7 Answers

Answer (score 305)

I believe the problem is that codecs.BOM_UTF8 is a byte string, not a Unicode string. I suspect the file handler is trying to guess what you really mean based on "I'm meant to be writing Unicode as UTF-8-encoded text, but you've given me a byte string!"

Try writing the Unicode string for the byte order mark (i.e. Unicode U+FEFF) directly, so that the file just encodes that as UTF-8:

    import codecs

    file = codecs.open("lol", "w", "utf-8")
    file.write(u'\ufeff')
    file.close()

(That seems to give the right answer - a file with bytes EF BB BF.)

EDIT: S.Lott's suggestion of using "utf-8-sig" as the encoding is a better one than explicitly writing the BOM yourself, but I'll leave this answer here as it explains what was going wrong before.

answered Jun 1, 2009 at 9:46 by Jon Skeet; edited Jun 17, 2017 at 19:24 by Zanon

Comments:

Warning: open and open is not the same. If you do "from codecs import open", it will NOT be the same as you would simply type "open". – Apache, Aug 20, 2013 at 13:19

(2) you can also use codecs.open('test.txt', 'w', 'utf-8-sig') instead – beta-closed, Aug 24, 2016 at 15:04

(1) I'm getting "TypeError: an integer is required (got type str)". I don't understand what we're doing here. Can someone please help? I need to append a string (paragraph) to a text file. Do I need to convert that into an integer first before writing? – Mugen, Apr 2, 2018 at 12:40

@Mugen: The exact code I've written works fine as far as I can see. I suggest you ask a new question showing exactly what code you've got, and where the error occurs. – Jon Skeet, Apr 2, 2018 at 13:23

@Mugen you need to call codecs.open instead of just open – northben, May 15, 2018 at 12:48

Answer (score 197)

Read the following: http://docs.python.org/library/codecs.html#module-encodings.utf_8_sig

Do this

    with codecs.open("test_output", "w", "utf-8-sig") as temp:
        temp.write("hi mom\n")
        temp.write(u"This has ♭")

The resulting file is UTF-8 with the expected BOM.

answered Jun 1, 2009 at 9:58 by S.Lott; edited May 14, 2013 at 2:31 by Eric O Lebigot

Comments:

(2) Thanks. That worked (Windows 7 x64, Python 2.7.5 x64). This solution works well when you open the file in mode "a" (append). – Mohamad Fakih, Aug 23, 2013 at 7:54

This didn't work for me, Python 3 on Windows. I had to do this instead: with open(file_name, 'wb') as bomfile: bomfile.write(codecs.BOM_UTF8), then re-open the file for append. – Dustin Andrews, Nov 17, 2017 at 19:11

Maybe add temp.close()? – user2905353, Jan 4, 2020 at 2:08

(2) @user2905353: not required; this is handled by context management of open. – matheburg, Mar 28, 2020 at 15:42

Answer (score 38)

It is very simple, just use this. No library is needed.

    with open('text.txt', 'w', encoding='utf-8') as f:
        f.write(text)

answered Aug 12, 2021 at 11:17 by Kamran Gasimov

Answer (score 12)

@S-Lott gives the right procedure, but expanding on the Unicode issues, the Python interpreter can provide more insights.

Jon Skeet is right (unusual) about the codecs module - it contains byte strings:

    >>> import codecs
    >>> codecs.BOM
    '\xff\xfe'
    >>> codecs.BOM_UTF8
    '\xef\xbb\xbf'
    >>>

Picking another nit, the BOM has a standard Unicode name, and it can be entered as:

    >>> bom = u"\N{ZERO WIDTH NO-BREAK SPACE}"
    >>> bom
    u'\ufeff'

It is also accessible via unicodedata:

    >>> import unicodedata
    >>> unicodedata.lookup('ZERO WIDTH NO-BREAK SPACE')
    u'\ufeff'
    >>>

answered Jun 1, 2009 at 10:12 by gimel; edited Jun 1, 2009 at 17:10 by tzot

Answer (score 10)

I use the *nix file command to convert a file with an unknown charset into a UTF-8 file:

    # -*- encoding: utf-8 -*-

    # Convert a file of unknown encoding to UTF-8.
    # Python 2 only: the commands module does not exist in Python 3.

    import codecs
    import commands

    file_location = "jumper.sub"
    # Ask the file utility which encoding it detects for the input file.
    file_encoding = commands.getoutput('file -b --mime-encoding %s' % file_location)

    file_stream = codecs.open(file_location, 'r', file_encoding)
    file_output = codecs.open(file_location + "b", 'w', 'utf-8')

    # Copy line by line; codecs decodes on read and encodes as UTF-8 on write.
    for l in file_stream:
        file_output.write(l)

    file_stream.close()
    file_output.close()

answered Feb 8, 2012 at 20:35 by Ricardo

Comments:

(1) Use # coding: utf8 instead of # -*- coding: utf-8 -*- which is far easier to remember. – show0k, Apr 10, 2017 at 13:36

I am really interested in seeing something like that working on Windows – paradox, Jun 5, 2021 at 14:38

Answer (score 0)

Python 3.5+ using pathlib (Path.write_text was added in 3.5):

    import pathlib

    pathlib.Path("text.txt").write_text(text, encoding='utf-8')  # or utf-8-sig for BOM

answered Apr 8 at 20:52 by celsowm

Answer (score -2)

If you are using Pandas I/O methods like pandas.to_excel(), add an encoding parameter, e.g.

    pd.to_excel("somefile.xlsx", sheet_name="export", encoding='utf-8')

This works for most international characters, I believe.

answered Dec 8, 2021 at 12:04 by RogerZ
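The points made across the answers can be pulled together in a minimal Python 3 sketch. It assumes Python 3, where the built-in open takes an encoding argument, the "utf-8-sig" codec adds the BOM on write and strips it on read, and codecs.BOM_UTF8 is a bytes object that a text-mode file will not accept. The helper names write_text and starts_with_bom and the file names are illustrative only and do not come from the thread.

```python
import codecs
import os
import tempfile


def write_text(path, text, with_bom=False):
    """Write text as UTF-8; the 'utf-8-sig' codec prepends the BOM itself."""
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def starts_with_bom(path):
    """Return True if the file begins with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    plain = os.path.join(folder, "plain.txt")    # hypothetical file names
    signed = os.path.join(folder, "signed.txt")

    write_text(plain, "This has \u266d")
    write_text(signed, "This has \u266d", with_bom=True)
    print(starts_with_bom(plain), starts_with_bom(signed))

    # Reading with 'utf-8-sig' strips the BOM, so the text round-trips.
    with open(signed, encoding="utf-8-sig") as f:
        print(repr(f.read()))

    # A text-mode file expects str; passing the bytes BOM raises TypeError.
    try:
        with open(plain, "w", encoding="utf-8") as f:
            f.write(codecs.BOM_UTF8)
    except TypeError as exc:
        print("text-mode file rejected bytes:", exc)
```

Letting the codec handle the BOM keeps the calling code free of any mix of bytes and text, which is what caused the error in the original question.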