Python - Decode UTF-16 file with BOM - Stack Overflow
Asked 8 years, 7 months ago. Modified 1 year, 5 months ago. Viewed 23k times.

Question (score 18):

I have a UTF-16LE file with BOM. I'd like to flip this file into UTF-8 without BOM so I can parse it using Python.

The usual code that I use didn't do the trick; it returned unknown characters instead of the actual file contents.

    f = open('dbo.chrRaces.Table.sql').read()
    f = str(f).decode('utf-16le', errors='ignore').encode('utf8')
    print f

What would be the proper way to decode this file so I can parse through it with f.readlines()?

Tags: python, file, encoding, utf-8, utf-16

Asked Mar 17, 2014 at 15:52 by Dustin. Edited Jan 22, 2015 at 14:35 by loopbackbee.

Comment:
If this is on Windows, try opening the file in binary mode and see if that helps. – Mark Ransom, Mar 17, 2014 at 16:00

2 Answers

Answer (score 20):

Firstly, you should read in binary mode, otherwise things will get confusing. Then, check for and remove the BOM, since it is part of the file, but not part of the actual text.
    import codecs

    # you should read in binary mode to get the BOM correctly
    encoded_text = open('dbo.chrRaces.Table.sql', 'rb').read()
    # print dir(codecs) for other encodings
    bom = codecs.BOM_UTF16_LE
    # make sure the encoding is what you expect, otherwise you'll get wrong data
    assert encoded_text.startswith(bom)
    # strip away the BOM
    encoded_text = encoded_text[len(bom):]
    # decode to unicode
    decoded_text = encoded_text.decode('utf-16le')

Don't encode (to utf-8 or otherwise) until you're done with all parsing/processing. You should do all that using unicode strings.

Also, errors='ignore' on decode may be a bad idea. Consider what's worse: having your program tell you something is wrong and stop, or returning wrong data?

Answered Mar 17, 2014 at 16:00 by loopbackbee. Edited Apr 15, 2021 at 23:20.

Answer (score 10):

This works in Python 3:

    f = open('test_utf16.txt', mode='r', encoding='utf-16').read()
    print(f)

Answered May 25, 2020 at 14:35 by Alekzander. Edited Oct 15, 2020 at 3:05 by mgrandi.

Comments:
If you set encoding only to utf-16, you don't have to eliminate BOM manually. – 026, Jun 13, 2020 at 13:57
Edited to make it just be utf-16. It doesn't seem to be documented, but the encoding of utf-16 does seem to automatically handle the BOM. If you use utf-16le, it still works, but the BOM is still there, which you can remove yourself by using string functions and codecs.BOM_UTF16_BE. – mgrandi, Oct 15, 2020 at 3:07
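
For the original goal (UTF-16 with BOM in, UTF-8 without BOM out, then readlines()), the two answers above can be combined roughly as follows. This is only a sketch, assuming Python 3. The helper name utf16_file_to_utf8 and the demo file names are made up for illustration and do not come from the thread. It relies on the 'utf-16' codec consuming the BOM while decoding, and on plain 'utf-8' (as opposed to 'utf-8-sig') not writing one.

```python
import codecs
import os
import tempfile


def utf16_file_to_utf8(src_path, dst_path):
    """Rewrite a UTF-16 file that starts with a BOM as UTF-8 without a BOM.

    Hypothetical helper for illustration; returns the decoded text.
    """
    # 'utf-16' uses the BOM to pick the byte order and drops it from the
    # decoded text. newline='' keeps line endings exactly as stored.
    with open(src_path, mode='r', encoding='utf-16', newline='') as src:
        text = src.read()
    # Plain 'utf-8' does not emit a BOM ('utf-8-sig' would).
    with open(dst_path, mode='w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return text


# Demo with a throwaway UTF-16LE file (made-up names and content).
demo_dir = tempfile.mkdtemp()
src_path = os.path.join(demo_dir, 'demo_utf16.sql')
dst_path = os.path.join(demo_dir, 'demo_utf8.sql')
sample = 'SELECT 1;\nSELECT \u00e9;\n'
with open(src_path, 'wb') as fh:
    fh.write(codecs.BOM_UTF16_LE + sample.encode('utf-16-le'))

converted = utf16_file_to_utf8(src_path, dst_path)

# The converted file can now be parsed line by line.
with open(dst_path, encoding='utf-8') as fh:
    lines = fh.readlines()

# Raw bytes of the output, to check that no UTF-8 BOM was written.
with open(dst_path, 'rb') as fh:
    raw_utf8 = fh.read()
```

If the file only needs to be parsed and not converted on disk, the intermediate UTF-8 copy is unnecessary: opening the source with encoding='utf-16' and calling readlines() on it directly should be enough.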