Why Python 3 doesn't write the Unicode BOM - Peter Bloomfield
I've been using Python scripts to automatically edit and output Windows Resource files (.rc) for C++ projects in Visual Studio 2013. When handling Unicode, Windows and Visual Studio always want little endian UTF-16 encoding, and the resource file should always start with the Unicode BOM (Byte Order Mark). However, despite the promises in the documentation, I found that Python wasn't outputting the BOM automatically.

I nearly resorted to outputting it manually, but as is often the case with Python, the correct approach is simpler than it seems.

What is the Unicode BOM?

The Unicode BOM (Byte Order Mark) is a character which can occur at the start of a Unicode text file to indicate what endianness the data is stored in. It's very helpful for portability, as it means programs on different systems can automatically detect the encoding. This allows them to display, edit, and store the text appropriately, leaving no room for ambiguity.

Endianness is only relevant for encodings which use more than one byte per code unit, such as UTF-16 and UTF-32. There is also a BOM for UTF-8. However, it's only used to identify the file as being UTF-8, as opposed to ASCII or some other encoding. Byte order is irrelevant in that case, and the UTF-8 BOM is generally discouraged.

Python and BOM

According to the Python documentation on reading and writing Unicode data:

"Some encodings, such as UTF-16, expect a BOM to be present at the start of a file; when such an encoding is used, the BOM will be automatically written as the first character and will be silently dropped when the file is read."

From this, it sounds like any UTF-16 or UTF-32 encoding will automatically take care of the BOM. However, try running the following code in a Python 3 script:

    with open("output.txt", mode="w", encoding="utf-16-le") as f:
        f.write("Hello World.")

Open the output file in an editor which reports the encoding, such as the excellent Notepad++ on Windows. You'll see UTF-16 (or UCS-2) Little Endian, but it will say there is no BOM.

Where did the BOM go?

To answer that, look at the encoding argument in the code snippet above. It's set to utf-16-le, which explicitly indicates little endian encoding. It turns out that if you explicitly specify endianness, Python assumes you don't need a BOM. This is actually mentioned in the official documentation, but not particularly clearly.

Instead, change the encoding to utf-16. This lets Python use the platform's native byte order, and because the byte order is no longer stated in the encoding name, it writes a BOM to record it. Here's the modified code snippet:

    with open("output.txt", mode="w", encoding="utf-16") as f:
        f.write("Hello World.")

Once again, run that as a Python 3 script and then open the output file. Assuming you're running on a little endian system (which should apply to anything running a Windows OS), the encoding should show up as UTF-16 (or UCS-2) Little Endian with BOM.
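If you'd rather not rely on an editor to check the result, the first two bytes of the file tell the same story. The following is only a minimal sketch: the helper name leading_bytes, its prepend_bom parameter, and the use of a temporary file are illustrative choices, not anything from the resource-file scripts described above. It also shows the manual route (writing U+FEFF yourself with utf-16-le), which may be worth considering if the output must be little endian with a BOM even on a big endian machine.

```python
import codecs
import os
import tempfile


def leading_bytes(encoding, text="Hello World.", prepend_bom=False):
    """Write text to a temporary file and return the file's first two bytes.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # reopen below through the normal open() call
    try:
        with open(path, mode="w", encoding=encoding) as f:
            if prepend_bom:
                # U+FEFF is the BOM character; the codec encodes it like any other.
                f.write("\ufeff")
            f.write(text)
        with open(path, mode="rb") as f:
            return f.read(2)
    finally:
        os.remove(path)


# Explicit endianness: no BOM, the file starts with the letter "H".
explicit_le = leading_bytes("utf-16-le")

# Generic name: Python writes a BOM in the platform's native byte order.
generic = leading_bytes("utf-16")

# Explicit endianness plus a manually written BOM character.
manual = leading_bytes("utf-16-le", prepend_bom=True)

print(explicit_le)                     # b'H\x00'
print(generic == codecs.BOM_UTF16)     # True
print(manual == codecs.BOM_UTF16_LE)   # True
```

The codecs module exposes the BOM byte sequences as constants (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE, and the native-order codecs.BOM_UTF16), so the comparison doesn't need hard-coded byte values.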