Byte Order Mark - IBM
Unicode in the 16-bit UTF-16 form has no prescribed endian orientation for interchange. This requires communication processes to evaluate the endian orientation correctly. To aid in this, the character U+FEFF ZERO WIDTH NO-BREAK SPACE can be used as a Byte Order Mark (BOM). When interpreted in the incorrect endian orientation, it evaluates to U+FFFE, which is defined as NOT A CHARACTER.

Some applications, particularly on Windows systems, write a BOM character to the start of a file. In UTF-8, the BOM is the byte sequence EF BB BF. Because UTF-8 is a byte-oriented encoding, it has no endian issues, but these applications still write the BOM to the start of a UTF-8 encoded file. An IBM® Netezza® system does not load the BOM code point; you can use the -bom switch to remove an initial BOM code point.

You can remove a BOM from the start of a UTF-8 file by using the nzconvert command, as in the following example:

nzconvert -f utf8 -t utf8 -bom -df input_file -of output_file

When you are converting from or to UTF-16, you can use one of three converters, UTF16, UTF16be, or UTF16le, as the input (-f option) and output (-t option):

UTF16
As input, Netezza checks for a BOM to indicate endianness; otherwise, Netezza interprets the input as big-endian. As output, Netezza writes a BOM and outputs in the native endianness of the machine. When converting from UTF-16 to any other encoding, such as UTF-8, the BOM is removed.

UTF16le
As input, Netezza interprets all input as little-endian. As output, Netezza outputs as little-endian without a BOM. Any BOM is treated as data and converted, such as to UTF-8.

UTF16be
As input, Netezza interprets all input as big-endian. As output, Netezza outputs as big-endian without a BOM. Any BOM is treated as data and converted, such as to UTF-8.
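The BOM rules described above can be illustrated outside of Netezza with a short Python sketch. This is not how nzconvert is implemented. It only mirrors the documented behavior using Python's standard codecs module, and the function names (strip_utf8_bom, utf16_endianness, swap_code_unit) are hypothetical helpers introduced for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def utf16_endianness(data: bytes, default: str = "big") -> str:
    """Guess UTF-16 byte order from a leading BOM.

    Falls back to `default` when no BOM is found; "big" mirrors the
    big-endian fallback described for the UTF16 converter as input.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little"
    return default


def swap_code_unit(code_unit: int) -> int:
    """Swap the two bytes of a 16-bit code unit."""
    return ((code_unit & 0xFF) << 8) | (code_unit >> 8)
```

Reading U+FEFF with the wrong byte order corresponds to swap_code_unit(0xFEFF), which gives 0xFFFE, the noncharacter mentioned above. Python's own "utf-16-le" and "utf-16-be" codecs also keep a leading BOM as data, similar to what is described for UTF16le and UTF16be, while its "utf-8-sig" codec drops a leading UTF-8 BOM when decoding.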