Why Python 3 doesn't write the Unicode BOM - Peter Bloomfield

2025-01-10


I've been using Python scripts to automatically edit and output Windows Resource files (.rc) for C++ projects in Visual Studio 2013. When handling Unicode, Windows and Visual Studio always want little endian UTF-16 encoding, and the resource file should always start with the Unicode BOM (Byte Order Mark). However, despite the promises in the documentation, I found that Python wasn't outputting the BOM automatically.

I nearly resorted to outputting it manually, but as is often the case with Python, the correct approach is simpler than it seems.

What is the Unicode BOM?

The Unicode BOM (Byte Order Mark) is a character (U+FEFF) which can occur at the start of a Unicode text file to indicate what endianness the data is stored in. It's very helpful for portability as it means programs on different systems can automatically detect the encoding. This allows them to display, edit, and store the text appropriately, leaving no room for ambiguity.

Endianness is only relevant for encodings which use more than one byte per code unit, such as UTF-16 and UTF-32. There is also a BOM for UTF-8. However, it's only used to identify the file as being UTF-8, as opposed to ASCII or some other encoding. Byte order is irrelevant in that case, and the UTF-8 BOM is actively discouraged.

Python and BOM

According to the Python documentation on reading and writing Unicode data:

"Some encodings, such as UTF-16, expect a BOM to be present at the start of a file; when such an encoding is used, the BOM will be automatically written as the first character and will be silently dropped when the file is read."

From this, it sounds like any UTF-16 or UTF-32 encoding will automatically take care of the BOM. However, try running the following code in a Python 3 script:

    # Explicitly request little endian UTF-16.
    with open("output.txt", mode="w", encoding="utf-16-le") as f:
        f.write("Hello World.")

Open the output file in an editor which reports the encoding, such as the excellent Notepad++ on Windows. You'll see UTF-16 (or UCS-2) Little Endian, but it will say there is no BOM.

Where did the BOM go?
To answer that, look at the encoding argument in the code snippet above. It's set to utf-16-le, which explicitly indicates Little Endian encoding. It turns out that if you explicitly specify endianness, Python assumes you don't need a BOM. This is actually mentioned in the official documentation, but not particularly clearly.

Instead, change the encoding to utf-16. This lets Python use the platform's native byte order, and it assumes that a BOM is therefore necessary. Here's the modified code snippet:

    # Generic UTF-16: Python picks the byte order and writes the BOM.
    with open("output.txt", mode="w", encoding="utf-16") as f:
        f.write("Hello World.")

Once again, run that as a Python 3 script and then open the output file. Assuming you're running on a little endian system (which should apply to anything running a Windows OS), the encoding should show up as UTF-16 (or UCS-2) Little Endian with BOM.
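If you'd rather check the result without opening an editor, you can read back the first bytes of the file and compare them with the BOM constants in the standard codecs module. The following is only a minimal sketch: the helper name first_two_bytes and the use of a temporary file are assumptions made for illustration, not part of the original scripts. It also includes the manual route of writing U+FEFF yourself with utf-16-le, which may be worth considering if you need a little endian BOM regardless of the platform's byte order.

```python
import codecs
import os
import tempfile


def first_two_bytes(encoding, text="Hello World."):
    """Write `text` to a temporary file using `encoding`; return the file's first two bytes.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, mode="w", encoding=encoding) as f:
            f.write(text)
        with open(path, mode="rb") as f:
            return f.read(2)
    finally:
        os.remove(path)


# Explicit endianness: no BOM, so the file starts directly with the letter "H".
explicit = first_two_bytes("utf-16-le")

# Generic "utf-16": a BOM is written in the platform's native byte order.
generic = first_two_bytes("utf-16")

# Explicit endianness plus a manually written U+FEFF as the first character.
manual = first_two_bytes("utf-16-le", text="\ufeffHello World.")

print("utf-16-le:         ", explicit.hex(), explicit == codecs.BOM_UTF16_LE)
print("utf-16:            ", generic.hex(), generic in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print("utf-16-le + U+FEFF:", manual.hex(), manual == codecs.BOM_UTF16_LE)
```

The first comparison should come out False and the other two True. On a little endian machine the generic utf-16 file and the manual one should start with the same two bytes.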
