Unicode Byte Order Mark (BOM) - Stefan Trost Media
Byte Order Mark (BOM)

The Unicode Byte Order Mark is a Unicode character that indicates the endianness of a Unicode file or stream. This character has the Unicode code point U+FEFF and can also be used to determine the encoding of a text file. The character always comes first in the file and is not interpreted as part of the text by software that supports the corresponding format. An advantage of this technique is that no additional information has to be supplied: the key for interpretation is located directly in the file.

Byte Order Mark of different Encodings
Interpretation of the Byte Order Mark
Change, Remove or Add Byte Order Mark

Byte Order Mark of different Encodings

Depending on the encoding, the character U+FEFF results in a different byte sequence. The byte sequences for the most popular encodings are summarized in this table:

Encoding               Byte Order Mark                ASCII
ANSI                   No BOM                         -
UTF-7                  2B 2F 76 (38 | 39 | 2B | 2F)   +/v8, +/v9, +/v+ or +/v/
UTF-8                  EF BB BF                       ï»¿
UTF-16 Big Endian      FE FF                          þÿ
UTF-16 Little Endian   FF FE                          ÿþ
UTF-32 Big Endian      00 00 FE FF                    ??þÿ
UTF-32 Little Endian   FF FE 00 00                    ÿþ??
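The byte sequences listed for UTF-8, UTF-16 and UTF-32 can be reproduced by encoding the character U+FEFF yourself. The following is a minimal sketch in Python, not part of the original article. The helper name bom_bytes is hypothetical, and the codecs constants are used only as a cross-check.

```python
import codecs

BOM_CHAR = "\ufeff"  # the character U+FEFF

def bom_bytes(encoding: str) -> bytes:
    """Return the bytes that U+FEFF produces in the given encoding.

    The explicit "-be"/"-le" codecs do not prepend a BOM of their own,
    so the result is exactly the encoded character.
    """
    return BOM_CHAR.encode(encoding)

ENCODINGS = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")

for name in ENCODINGS:
    # bytes.hex(sep) needs Python 3.8 or newer
    print(f"{name:<10} {bom_bytes(name).hex(' ').upper()}")

# Cross-check against the constants shipped with the standard library.
assert bom_bytes("utf-8") == codecs.BOM_UTF8
assert bom_bytes("utf-16-be") == codecs.BOM_UTF16_BE
assert bom_bytes("utf-32-le") == codecs.BOM_UTF32_LE
```

UTF-7 is left out of this sketch because its BOM has four possible forms, depending on the character that follows it.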
The last column of the table (ASCII) shows what the byte sequence of the byte order mark would look like if it were interpreted as ASCII characters in a text editor. In UTF-16 and UTF-32, the Byte Order Mark is essential for showing a file correctly whenever the byte order is not known by other means, because one character in these encodings occupies several bytes and the byte order mark indicates the order in which the bytes have to be interpreted (see Big Endian and Little Endian regarding the byte order). In UTF-8 and UTF-7, on the other hand, the BOM is not mandatory, but it can nonetheless lead to better results, because programs could otherwise interpret such texts as ANSI.

That the BOM indicates the order of the bytes is easy to see when comparing the byte sequences for Big Endian (most significant byte first) and Little Endian (least significant byte first), because the two have opposite byte orders. In UTF-16 Little Endian the byte sequence is FF FE, and in UTF-16 Big Endian it is the reverse (FE FF). UTF-32 generally uses four bytes per character, which is also evident from the BOM: 00 00 FE FF for UTF-32 Big Endian and FF FE 00 00 for UTF-32 Little Endian.

Interpretation of the Byte Order Mark

Problems and false interpretations occur when programs cannot interpret the BOM and show it as ANSI characters instead. For example, the characters ï»¿ may be shown for the BOM of UTF-8 (EF BB BF). This is somewhat problematic, because ANSI files may also legitimately contain the byte sequence EF BB BF. So, if you put the string ï»¿ at the beginning of a file and save this file as ANSI, most software will interpret the rest of the file as UTF-8. With applications like the TextConverter or the TextEncoder, you can read and write files with or without a Byte Order Mark, change the Unicode format of files, and choose whether a Byte Order Mark is used in the files.

If the character U+FEFF appears anywhere other than at the beginning of a file, it is treated as a zero-width no-break space. However, deliberate use of the character for this purpose is deprecated. U+FEFF should be used as a byte order mark only; for a character with no width and no break, use the code point U+2060 (WORD JOINER) instead.
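How a program might interpret the BOM, and what happens when it does not, can be sketched in a few lines of Python. This is an illustration under assumptions, not the method of any particular program. The names detect_bom and decode_with_bom are hypothetical, and "ANSI" is assumed to mean Windows-1252 (cp1252), which only holds on Western European Windows systems.

```python
import codecs

# UTF-32 is tested first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the longer signature must be checked before it.
BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_with_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode using the BOM if present, else the assumed ANSI fallback."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback)

sample = codecs.BOM_UTF8 + b"Hello"
print(decode_with_bom(sample))   # BOM recognised and removed
print(sample.decode("cp1252"))   # BOM not recognised: shown as text
```

The second print shows the three stray characters in front of the text, which is the false interpretation described above. Note that the check is a heuristic: a UTF-16 Little Endian file whose first character after the BOM is U+0000 would be mistaken for UTF-32 Little Endian.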
Change, Remove or Add Byte Order Mark

With the program TextEncoder you can change, remove or add the Byte Order Mark of files. After starting the TextEncoder, you can do the following:

1. Drag the files you want to edit from any folder onto the TextEncoder.
2. On the right side under "Changes", activate the option "Encoding".
3. Under "Write Byte Order Mark (BOM) into Files", set whether the files should get a Byte Order Mark or not.
4. In the storage options at the bottom right, set whether you want to overwrite the files or save them under a new name as new files.
5. Click on the button "Convert".

The file list in the TextEncoder contains a column named "BOM". Here you can see if your added files currently have a Byte Order Mark or not.
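Adding or removing a Byte Order Mark can also be scripted. The following is a minimal sketch in Python that assumes the files are already UTF-8. It is not how the TextEncoder works internally, and the function names add_utf8_bom, strip_utf8_bom and rewrite_file are hypothetical.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

def rewrite_file(path, write_bom: bool) -> None:
    """Overwrite a file in place, with or without a UTF-8 BOM.

    Assumption: the file content is UTF-8; other encodings are not handled.
    """
    p = Path(path)
    data = p.read_bytes()
    p.write_bytes(add_utf8_bom(data) if write_bom else strip_utf8_bom(data))
```

Working on raw bytes leaves the rest of the file untouched. When reading text rather than bytes, Python's "utf-8-sig" codec gives a similar effect: it skips a leading BOM when decoding and writes one when encoding.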