Display problems caused by the UTF-8 BOM - W3C
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them?

Answer

If you are dealing with a file encoded in UTF-8, your display problems may be caused by the presence of a UTF-8 signature (BOM) that the user agent doesn't recognize. This used to be a problem for static HTML files, but is no longer in recent versions of major browsers. However, if you use PHP to generate your HTML, this was still an issue with PHP version 5.3.6.

The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However, you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.

This article will help you determine whether the UTF-8 signature is causing the problem. If there is no evidence of a UTF-8 signature at the beginning of the file, then you will have to look elsewhere for a solution.

What is a UTF-8 signature (BOM)?

Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). Some applications, such as a text editor or a browser, will display the BOM as an extra line in the file; others will display unexpected characters, such as ï»¿.

The BOM is the Unicode code point U+FEFF, corresponding to the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (ZWNBSP).

In UTF-16 and UTF-32 encodings, unless there is some alternative indicator, the BOM is essential to ensure correct interpretation of the file's contents. Each character in the file is represented by 2 or 4 bytes of data and the order in which these bytes are stored in the file is significant; the BOM indicates this order.
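To make the byte-order point concrete, the short Python sketch below encodes U+FEFF in several Unicode encodings and prints the resulting bytes. It is only an illustration: the helper name `bom_bytes` is made up for this example and does not come from the original article.

```python
# Illustrative sketch: how the single code point U+FEFF is serialized
# in different Unicode encodings. `bom_bytes` is a hypothetical helper.
BOM = "\ufeff"


def bom_bytes(encoding: str) -> str:
    """Return the bytes of U+FEFF in `encoding` as spaced, upper-case hex."""
    return BOM.encode(encoding).hex(" ").upper()


for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {bom_bytes(name)}")

# The UTF-8 form, misread as Latin 1 (ISO 8859-1), becomes three visible
# characters, which is one way the signature shows up as unexpected text.
print(BOM.encode("utf-8").decode("latin-1"))
```

The two UTF-16 results are byte-swapped versions of each other, which is what lets a reader of the file infer the byte order. The UTF-8 result has only one possible form.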
In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 or UTF-32 encodings, there is no alternative ordering of the bytes in a character. The BOM may still occur in UTF-8 encoded text, however, either as a by-product of an encoding conversion or because it was added by an editor.

Detecting the BOM

First, we need to check whether there is indeed a BOM at the beginning of the file.

You can try looking for a BOM in your content, but if your editor handles the UTF-8 signature correctly you probably won't be able to see it. An editor which does not handle the UTF-8 signature correctly displays the bytes that compose that signature according to its own character encoding setting. (With the Latin 1 (ISO 8859-1) character encoding, the signature displays as the characters ï»¿.) With a binary editor capable of displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.

Alternatively, your editor may tell you in a status bar or a menu what encoding your file is in, including whether a UTF-8 signature is present.

If not, some kind of script-based test may help. (Note: if you think the problem is caused by a file included by PHP or some other mechanism, test the included file itself.)

Removing the BOM

If you have an editor which shows the characters that make up the UTF-8 signature, you may be able to delete them by hand. Chances are, however, that the BOM is there in the first place because you didn't see it.

Check whether your editor allows you to specify whether a UTF-8 signature is added or kept during a save. Such an editor provides a way of removing the signature by simply reading the file in, then saving it out again. For example, if Dreamweaver detects a BOM, the Save As dialogue box will have a check mark alongside the text "Include Unicode Signature (BOM)". Just uncheck the box and save.

One of the benefits of using a script is that you can remove the signature quickly, and from multiple files. In fact, the script could be run automatically as part of your process. If you use Perl, you could use a simple script created by Martin Dürst.
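For readers who do not use Perl, the same detect-and-remove idea can be sketched in Python. This is not Martin Dürst's script; it is a minimal illustration, and the function names (`has_utf8_bom`, `strip_utf8_bom`, `remove_bom_from_file`) are assumptions made up for this example. It works on raw bytes, so it does not depend on how an editor chooses to display the signature.

```python
# Minimal sketch (not the Perl script mentioned above): detect a UTF-8
# signature at the start of a file and rewrite the file without it.
import codecs
import sys
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the data begins with the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 signature, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def remove_bom_from_file(path: Path) -> bool:
    """Rewrite `path` in place without its signature; return True if changed."""
    data = path.read_bytes()
    if not has_utf8_bom(data):
        return False
    path.write_bytes(strip_utf8_bom(data))
    return True


if __name__ == "__main__":
    # Usage: pass one or more file names on the command line.
    for name in sys.argv[1:]:
        changed = remove_bom_from_file(Path(name))
        print(f"{name}: {'signature removed' if changed else 'no signature found'}")
```

Because the script overwrites files in place, it is safer to try it on copies first. When only reading text in Python, decoding with the standard `utf-8-sig` codec skips a leading signature without modifying the file.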
Note: You should check the process impact of removing the signature. It may be that some part of your content development process relies on the use of the signature to indicate that a file is in UTF-8. Bear in mind also that pages with a high proportion of Latin characters may look correct superficially but that occasional characters outside the ASCII range (U+0000 to U+007F) may be incorrectly encoded.

By the way

You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.

A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.

In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character encoding declarations to the contrary.

Further reading

Unicode FAQ about the Byte Order Mark
Setting encoding in web authoring applications
Unicode Bidirectional Algorithm basics