Byte Order Mark - IBM
Unicode in the 16-bit UTF-16 form has no prescribed endian orientation for interchange, so communicating processes must determine the endian orientation correctly. To aid in this, the character U+FEFF ZERO WIDTH NO-BREAK SPACE can be used as a Byte Order Mark (BOM). When interpreted in the incorrect endian orientation, it evaluates to U+FFFE, which is defined as NOT A CHARACTER.

In UTF-8, the BOM is the byte sequence EF BB BF. Because UTF-8 is a byte-oriented encoding, it has no endian issues, but some applications (primarily on Windows) write the BOM to the start of a UTF-8 encoded file. An IBM® Netezza® system does not load the BOM code point; you can use the -bom switch to remove an initial BOM code point.

You can remove a BOM from the start of a UTF-8 file by using the nzconvert command, as in the following example:

    nzconvert -f utf8 -t utf8 -bom -df input_file -of output_file

When you are converting from or to UTF-16, you can use one of three converters, UTF16, UTF16be, or UTF16le, as the input (-f option) and output (-t option):

UTF16
    As input, Netezza checks for a BOM to indicate endianness; otherwise, Netezza interprets the input as big-endian. As output, Netezza writes a BOM and outputs in the native endianness of the machine. When converting from UTF-16 to any other encoding, such as UTF-8, the BOM is removed.

UTF16le
    As input, Netezza interprets all input as little-endian. As output, Netezza outputs as little-endian without a BOM. Any BOM is treated as data and converted, for example to UTF-8.

UTF16be
    As input, Netezza interprets all input as big-endian. As output, Netezza outputs as big-endian without a BOM. Any BOM is treated as data and converted, for example to UTF-8.
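The BOM rules described here can be illustrated outside Netezza with a short Python sketch. It is not how nzconvert is implemented; the function names strip_utf8_bom and decode_utf16 are hypothetical, and the code only mirrors the documented behavior: drop a leading EF BB BF from UTF-8 data, and for generic UTF-16 input honor a BOM if one is present, otherwise assume big-endian.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop one leading UTF-8 BOM (bytes EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 following the rule described for the UTF16 input
    converter: use a leading BOM to pick the endianness and discard it;
    with no BOM, assume big-endian. (Hypothetical helper, not Netezza code.)
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    return data.decode("utf-16-be")


if __name__ == "__main__":
    # UTF-8 file written with a BOM by some editor.
    print(strip_utf8_bom(b"\xef\xbb\xbfabc"))

    # Same text in both byte orders, each with its BOM.
    print(decode_utf16(b"\xff\xfeA\x00"), decode_utf16(b"\xfe\xff\x00A"))

    # An explicit-endian decoder keeps the BOM as data, so it survives
    # as U+FEFF and becomes EF BB BF when re-encoded as UTF-8.
    kept = b"\xff\xfeA\x00".decode("utf-16-le")
    print(kept.encode("utf-8"))
```

The last case corresponds to the UTF16le and UTF16be converters, where a BOM is treated as ordinary data. Note that Python's own "utf-16" codec falls back to the machine's native byte order when no BOM is present, which is why the sketch selects big-endian explicitly.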