Unicode Byte Order Mark (BOM) - Stefan Trost Media


The Unicode Byte Order Mark is a Unicode character that indicates the endianness of a Unicode file or stream. This character has the Unicode position U+FEFF and can also be used to determine the encoding of a text file. The character always comes first in the file and is not interpreted as part of the text by software that supports the corresponding format. An advantage of this technique is that no additional information has to be supplied, because the key for interpretation is located directly in the file.

Byte Order Mark of different Encodings

Depending on the encoding, the character U+FEFF results in a different byte sequence. The byte sequences for the most popular encodings are summarized in this table:

Encoding                Byte Order Mark                ASCII
ANSI                    No BOM                         -
UTF-7                   2B 2F 76 (38 | 39 | 2B | 2F)   +/v8, +/v9, +/v+ or +/v/
UTF-8                   EF BB BF                       ï»¿
UTF-16 Big Endian       FE FF                          þÿ
UTF-16 Little Endian    FF FE                          ÿþ
UTF-32 Big Endian       00 00 FE FF                    ??þÿ
UTF-32 Little Endian    FF FE 00 00                    ÿþ??
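The table can be turned into a small lookup that guesses the encoding of a file from its first bytes. The following is only a minimal sketch in Python: the names `BOM_TABLE` and `detect_bom` are made up for this illustration, UTF-7 is left out for brevity, and the byte sequences come from the constants in Python's standard `codecs` module.

```python
import codecs

# Checked in this order on purpose: the UTF-32 Little Endian mark
# (FF FE 00 00) begins with the UTF-16 Little Endian mark (FF FE),
# so the longer UTF-32 marks have to be tested first.
BOM_TABLE = [
    (codecs.BOM_UTF32_BE, "UTF-32 Big Endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 Little Endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),
]


def detect_bom(data: bytes):
    """Return the encoding name suggested by a leading BOM, or None."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None  # no BOM found: could be ANSI or Unicode without a BOM


print(detect_bom(b"\xef\xbb\xbfHello"))  # UTF-8
print(detect_bom(b"\xff\xfeH\x00i\x00"))  # UTF-16 Little Endian
print(detect_bom(b"Hello"))               # None
```

Such a check is a heuristic only: a UTF-16 Little Endian text whose first character after the BOM is U+0000 would be reported as UTF-32 Little Endian, and a missing BOM says nothing about the actual encoding.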
The last column (ASCII) shows what the byte sequence of the byte order mark would look like if a text editor interpreted it as single-byte characters. In UTF-16 and UTF-32, the Byte Order Mark is important for displaying a file correctly, because one character in these encodings occupies several bytes and the byte order mark indicates the order in which those bytes have to be interpreted (see Big Endian and Little Endian regarding the byte order). Without a BOM, the byte order has to be known by other means. In UTF-8 and UTF-7, on the other hand, the BOM is not mandatory, but it nonetheless leads to better results, because programs could otherwise interpret such texts as ANSI.

You can easily see that the BOM indicates the order of the bytes by comparing the byte sequences for Big Endian (most significant byte first) and Little Endian (least significant byte first), because these two variants have opposite byte orders. In UTF-16 Little Endian, the byte sequence is FF FE, and in UTF-16 Big Endian it is exactly the reverse (FE FF). UTF-32 always uses four bytes per character. This is also evident from the BOM: 00 00 FE FF for UTF-32 Big Endian and FF FE 00 00 for UTF-32 Little Endian.

Interpretation of the Byte Order Mark

Problems and misinterpretations occur when a program cannot interpret the BOM and shows it as ANSI characters instead. For example, ï»¿ can be shown for the BOM of UTF-8 (EF BB BF). This case is somewhat problematic, because the byte sequence EF BB BF is also valid in ANSI files. So, if you store the string ï»¿ at the beginning of a file and save this file as ANSI, most software will interpret the rest of the file as encoded in UTF-8. With applications like the Text Converter or the Text Encoder, you can read and write files with or without a Byte Order Mark, and you can change the Unicode format of files or whether a Byte Order Mark is used in them.

If the character U+FEFF appears at a position other than the beginning of a file, it is treated as a character with a width of zero that does not allow a line break (zero width no-break space). However, the deliberate use of U+FEFF for this purpose is obsolete. U+FEFF should be used as a byte order mark only, and you should now use the code position U+2060 for a character with no width and no break.
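Both points, the opposite byte orders and the misinterpretation as ANSI, can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper `hex_bytes` and the variable names are made up for this example, and Windows-1252 (`cp1252`) is assumed as the ANSI code page.

```python
BOM = "\ufeff"  # the character U+FEFF


def hex_bytes(data: bytes) -> str:
    """Format bytes as upper-case hex pairs, e.g. 'FF FE'."""
    return data.hex(" ").upper()


# The same character yields a different byte sequence per encoding.
# The explicit -be/-le codecs do not add a BOM of their own.
for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{codec:10} {hex_bytes(BOM.encode(codec))}")

# A UTF-8 file with BOM, read by software that assumes ANSI:
data = BOM.encode("utf-8") + "Hello".encode("utf-8")
as_ansi = data.decode("cp1252")     # BOM bytes show up as visible characters
as_utf8 = data.decode("utf-8-sig")  # 'utf-8-sig' strips a leading BOM

print(repr(as_ansi))  # 'ï»¿Hello'
print(repr(as_utf8))  # 'Hello'
```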
Change, Remove or Add Byte Order Mark

With the program Text Encoder you can change, remove or add the Byte Order Mark of files. After starting the Text Encoder, you can do the following:

1. Drag the files you want to edit from any folder onto the Text Encoder.
2. On the right side under "Changes", activate the option "Encoding".
3. Under "Write Byte Order Mark (BOM) into Files", set whether the files should get a Byte Order Mark or not.
4. In the storage options at the bottom right, set whether you want to overwrite the files or save them under a new name as new files.
5. Click on the button "Convert".

The file list in the Text Encoder contains a column named "BOM". Here you can see whether the files you added currently have a Byte Order Mark or not.
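For readers who prefer a script, the same kind of change can be sketched in Python. This is not how the Text Encoder works internally. It is a minimal illustration that handles UTF-8 files only, overwrites the file in place, and uses hypothetical function names (`has_utf8_bom`, `add_utf8_bom`, `remove_utf8_bom`, `rewrite_file`).

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless it is already present."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


def remove_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def rewrite_file(path, want_bom: bool) -> None:
    """Overwrite the file so that it has (or lacks) a UTF-8 BOM.

    Assumes the file is UTF-8 encoded; other encodings are not handled.
    """
    file = Path(path)
    data = file.read_bytes()
    file.write_bytes(add_utf8_bom(data) if want_bom else remove_utf8_bom(data))
```

Calling `rewrite_file("notes.txt", want_bom=False)` would strip the mark from a file named `notes.txt` (an example name); working on raw bytes avoids re-encoding the rest of the content.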


