Why Python 3 doesn't write the Unicode BOM - Peter Bloomfield
I’ve been using Python scripts to automatically edit and output Windows Resource files (.rc) for C++ projects in Visual Studio 2013. When handling Unicode, Windows and Visual Studio always want little endian UTF-16 encoding, and the resource file should always start with the Unicode BOM (Byte Order Mark). However, despite the promises in the documentation, I found that Python wasn’t outputting the BOM automatically.

I nearly resorted to outputting it manually, but as is often the case with Python, the correct approach is simpler than it seems.

What is the Unicode BOM?

The Unicode BOM (Byte Order Mark) is a character which can occur at the start of a Unicode text file to indicate what endianness the data is stored in. It’s very helpful for portability, as it means programs on different systems can automatically detect the encoding. This allows them to display, edit, and store the text appropriately, leaving no room for ambiguity.

Endianness is only relevant for encodings which use more than one byte per code unit, such as UTF-16 and UTF-32. There is also a BOM for UTF-8. However, it’s only used to identify the file as being UTF-8, as opposed to ASCII or some other encoding. Byte order is irrelevant in that case, and the UTF-8 BOM is actively discouraged.

Python and BOM

According to the Python documentation on reading and writing Unicode data:

"Some encodings, such as UTF-16, expect a BOM to be present at the start of a file; when such an encoding is used, the BOM will be automatically written as the first character and will be silently dropped when the file is read."

From this, it sounds like any UTF-16 or UTF-32 encoding will automatically take care of the BOM. However, try running the following code in a Python 3 script:

    with open("output.txt", mode="w", encoding="utf-16-le") as f:
        f.write("Hello World.")

Open the output file in an editor which reports the encoding, such as the excellent Notepad++ on Windows. You’ll see UTF-16 (or UCS-2) Little Endian, but it will say there is no BOM.

Where did the BOM go?
To answer that, look at the encoding argument in the code snippet above. It’s set to utf-16-le, which explicitly indicates Little Endian encoding. It turns out that if you explicitly specify endianness, Python assumes you don’t need a BOM. This is actually mentioned in the official documentation, but not particularly clearly.

Instead, change the encoding to utf-16. This lets Python use the platform’s native byte order, and it assumes that a BOM is therefore necessary. Here’s the modified code snippet:

    with open("output.txt", mode="w", encoding="utf-16") as f:
        f.write("Hello World.")

Once again, run that as a Python 3 script and then open the output file. Assuming you’re running on a little endian system (which should apply to anything running a Windows OS), the encoding should show up as UTF-16 (or UCS-2) Little Endian with BOM.
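
If you’d rather not rely on an editor to report the BOM, the difference between the two snippets above can also be checked by reading the first bytes of the file back in binary mode. The following is only a minimal sketch: the helper name leading_bytes and its prefix parameter are made up for illustration, and it writes to a temporary file instead of output.txt. It also tries the manual approach mentioned at the start (writing "\ufeff" yourself with utf-16-le), which might be worth considering if the output has to be little endian regardless of the machine running the script.

```python
import codecs
import os
import tempfile


def leading_bytes(encoding, text="Hello World.", prefix=""):
    """Write prefix + text using the given encoding, then return the file's first two bytes.

    Hypothetical helper for illustration only; it is not part of Python or the original post.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, mode="w", encoding=encoding) as f:
            f.write(prefix + text)
        # Read back in binary mode so no decoding hides or strips the BOM.
        with open(path, mode="rb") as f:
            return f.read(2)
    finally:
        os.remove(path)


# Explicit endianness: no BOM, the file starts directly with "H".
explicit_le = leading_bytes("utf-16-le")

# Plain "utf-16": a BOM is written in the platform's native byte order.
native = leading_bytes("utf-16")

# Manual approach: write U+FEFF ourselves, so the BOM is always little endian.
manual_le = leading_bytes("utf-16-le", prefix="\ufeff")

print("utf-16-le starts with:", explicit_le)
print("utf-16 starts with a BOM:", native in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print("manual BOM is little endian:", manual_le == codecs.BOM_UTF16_LE)
```

The codecs.BOM_UTF16_LE and codecs.BOM_UTF16_BE constants come from the standard library, so the comparison doesn’t depend on remembering the raw byte values.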