How to display and remove BOM in utf-8 encoded file


Carlo Trimarchi, Aug 9, 2011, 7:37:34 PM

Hi,

I developed a website with Vim, working both on Linux and Windows, and never had any problems. The other day someone else needed to edit some files and tried to use Mac and Windows. Apparently in the files he edited there is this Byte-Order Mark. I discovered this only via the W3C validator, which gave me this warning:

"Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported."

The only way I could solve the problem was using Notepad++, which has an option to explicitly save the file without the BOM. Is there a way to do the same thing in Vim? Maybe even to display this BOM?

Thanks,
Carlo


Neil Bird, Aug 9, 2011, 8:40:55 PM

Around about 09/08/11 12:37, Carlo Trimarchi typed ...
> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?

    :set bomb?

Do ':set nobomb' before saving to remove a BOM.

--
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit


Tony Mechelynck, Aug 9, 2011, 11:13:44 PM

On 09/08/11 13:37, Carlo Trimarchi wrote:
> "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
> (BOM) in UTF-8 encoded files is known to cause problems for some text
> editors and older browsers. You may want to consider avoiding its use
> until it is better supported."

That message is outdated. The BOM is supported in all Unicode encodings including UTF-8 by all "reasonably recent" browsers. It is also part of the HTML standard. Some text editors (such as Notepad, I think) choke on it, but the answer to that is to use a better editor, such as Vim or even WordPad, which know about the BOM and handle it correctly, even in UTF-8.

For some other kinds of text files (most source files and shell scripts, for instance), it is better to save the file without a BOM, but for most "web" formats including HTML, CSS, and, I think, XML, XHTML, etc., a BOM is no problem and can even be a help (e.g. in case the web server sets the charset incorrectly or not at all in its Content-Type header).

> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?

To save the file without a BOM:

    :setlocal nobomb
    :w

To ask Vim if there is a BOM:

    :setlocal bomb?

The answer is bomb for "BOM present" or nobomb for "BOM absent".

Note that regardless of the state of the 'bomb' option, a BOM can only exist if the 'fileencoding' is one of UTF-8, UTF-16 (or its UCS-2 subset) or UTF-32 (aka UCS-4), any of them (other than UTF-8, for which endianness is not relevant) in any endianness. For other 'fileencoding' values the 'bomb' option is irrelevant.

To display the presence or absence of the BOM on the status line:

    see http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line

Best regards,
Tony.
--
George Orwell was an optimist.


Christian Brabandt, Aug 10, 2011, 12:11:27 AM

On Tue, August 9, 2011 5:13 pm, Tony Mechelynck wrote:
> To save the file without a BOM:
>
>     :setlocal nobomb
>     :w

    :w ++bin

should also work IIRC.

regards,
Christian


Carlo Trimarchi, Aug 10, 2011, 1:36:14 AM

Tony Mechelynck wrote:
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browsers. It is also part of the
> HTML standard.

Well, with the BOM the whole layout of the website appeared broken in Internet Explorer 7. No problem with Firefox. Still, it seems it is not an issue to underestimate.

> For some other kinds of text files (most source files and shell scripts, for
> instance), it is better to save the file without a BOM, but for most "web"
> formats including HTML, CSS, and, I think, XML, XHTML, etc., a BOM is no
> problem and can even be a help (e.g. in case the web server sets the charset
> incorrectly or not at all in its Content-Type header).

It was a PHP file, so maybe that's the problem.

> To save the file without a BOM:
>
>     :setlocal nobomb
>     :w
>
> To ask Vim if there is a BOM:
>
>     :setlocal bomb?
>
> The answer is bomb for "BOM present" or nobomb for "BOM absent".
>
> To display the presence or absence of the BOM on the status line:
>
>     see http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line

Thanks for all the info and the commands. Very useful.


Ben Fritz, Aug 10, 2011, 5:54:08 AM

On Aug 9, 10:13 am, Tony Mechelynck wrote:
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browsers. It is also part of
> the HTML standard. Some text editors (such as Notepad, I think) choke on
> it, but the answer to that is to use a better editor, such as Vim or
> even WordPad, which know about the BOM and handle it correctly, even in
> UTF-8.

Not true. W3C still explicitly recommends against using a BOM for UTF-8 (but I don't remember the link off-hand, sorry, I think it was either in the HTML 4.01 or HTML5 spec somewhere). Even modern browsers like Firefox and Opera choke on a BOM in UTF-8 files for XHTML served as XML. Using a BOM for UTF-8 on the internet is a bad idea.

A BOM is however recommended and useful on UTF-16 or UTF-32 and the like.


pansz, Aug 10, 2011, 8:18:43 AM

On Tue, Aug 9, 2011 at 11:13 PM, Tony Mechelynck wrote:
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browsers. It is also part of the
> HTML standard.

BOM is a standard for UCS2 or UTF-16, not for UTF-8.

BOM for utf-8 will cause problems for most programs which expect text streams. gcc is a good example; most GNU CLI utilities will reject utf-8 with BOM.

And, W3C validator will of course complain about it...


Tony Mechelynck, Aug 10, 2011, 7:19:30 PM

On 10/08/11 02:18, pansz wrote:
> BOM is a standard for UCS2 or UTF-16, not for UTF-8.

According to the Unicode FAQ, http://www.unicode.org/faq//utf_bom.html#bom4 (two successive FAQ questions) a BOM can be used in UTF-8 as well as in UTF-16 or UTF-32; but since UTF-8 doesn't have endianness variants, with UTF-8 it specifies encoding only, not endianness. BTW, "good" editors (including at least Vim and WordPad, possibly others) handle the BOM correctly, even in UTF-8. In fact, in my experience WordPad won't read UTF-8 text correctly _unless_ there is a BOM.

However (about your next paragraph), when UTF-8 is fed "transparently" to a program which expects ASCII, and in particular to any program which expects #! at the start of a file, the BOM should not be used (see the 2nd FAQ question linked above, and also http://www.unicode.org/faq//utf_bom.html#bom10 "How I should deal with BOMs?", point 3).

> BOM for utf-8 will cause problem for most programs which expect text
> streams. gcc is a good example, most GNU CLI utilities will reject
> utf-8 with BOM.

I explicitly mentioned in the part you snipped that for some other kinds of text than HTML or CSS (such as, I said, source files and shell scripts) it is better to save the file without a BOM.

> And, W3C validator will of course complain about it...

... with a warning, not an error; and Tidy won't.

Best regards,
Tony.
--
"My weight is perfect for my height -- which varies"


Ben Fritz, Aug 10, 2011, 11:30:38 PM

On Aug 10, 6:19 am, Tony Mechelynck wrote:
> > And, W3C validator will of course complain about it...
>
> ... with a warning, not an error; and Tidy won't.

W3C specifically recommends you do NOT use a BOM for UTF-8 on HTML/XHTML/CSS documents. See http://www.w3.org/International/questions/qa-byte-order-mark#bomhow

While developing TOhtml, I ran into problems in some browsers when using UTF-8 with BOM. If I remember correctly, browsers which actually handle XHTML correctly, like Opera and Firefox, were interpreting the BOM as characters appearing before the XML prolog
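
For anyone who needs to check a batch of files outside Vim (for instance the PHP files mentioned in the thread), the same "is there a BOM, and if so drop it" idea could be sketched in Python roughly as below. This is only an illustration, not something posted in the thread: the function names (has_utf8_bom, strip_utf8_bom, remove_bom_from_file) are made up for the example, and it handles only the UTF-8 BOM (bytes EF BB BF), not the UTF-16 or UTF-32 ones.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other input is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was left untouched. (Hypothetical helper, named for this sketch.)
    """
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

Working on raw bytes avoids any guessing about the file's encoding; within Vim itself, ':setlocal nobomb' followed by ':w' remains the simpler route.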


