What's the difference between UTF-8 and UTF-8 with BOM?


Asked Feb 8, 2010 at 18:26 by simple; edited Sep 9 at 16:08 by Henke. Viewed 760k times. Tags: unicode, utf-8, character-encoding, byte-order-mark.

Question (974 votes): What's different between UTF-8 and UTF-8 with BOM? Which is better?

Comment (83 votes): UTF-8 can be auto-detected better by contents than by BOM. The method is simple: try to read the file (or a string) as UTF-8 and if that succeeds, assume that the data is UTF-8. Otherwise assume that it is CP1252 (or some other 8-bit encoding). Any non-UTF-8 eight-bit encoding will almost certainly contain sequences that are not permitted by UTF-8. Pure ASCII (7-bit) gets interpreted as UTF-8, but the result is correct that way too. – Tronic, Feb 11, 2010 at 13:25

Comment (45 votes): Scanning large files for UTF-8 content takes time. A BOM makes this process much faster. In practice you often need to do both. The culprit nowadays is that still a lot of text content isn't Unicode, and I still bump into tools that say they do Unicode (for instance UTF-8) but emit their content in a different code page. – Jeroen Wiert Pluimers, Dec 18, 2013 at 7:41

Comment (11 votes): @Tronic I don't really think that "better" fits in this case. It depends on the environment. If you are sure that all UTF-8 files are marked with a BOM, then checking the BOM is the "better" way, because it is faster and more reliable.
– mg30rg, Jul 31, 2014 at 9:31

Comment (36 votes): UTF-8 does not have a BOM. When you put a U+FEFF code point at the start of a UTF-8 file, special care must be taken to deal with it. This is just one of those Microsoft naming lies, like calling an encoding "Unicode" when there is no such thing. – tchrist, Oct 1, 2014 at 22:37

Comment (9 votes): "The modern Mainframe (and AIX) is little endian UTF-8 aware" UTF-8 doesn't have an endianness! There is no shuffling of bytes around to put pairs or groups of four into the right "order" for a particular system! To detect a UTF-8 byte sequence it may be useful to note that the first byte of a multi-byte sequence "codepoint" (the bytes that are NOT "plain" ASCII ones) has the MS bit set, plus one to three more successively less significant bits set, followed by a reset bit. The number of those additional set bits is one less than the number of bytes in that code point, and all of those bytes will have the MSB set... – SlySven, Aug 19, 2016 at 17:38

22 Answers

Answer (898 votes): The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8.

Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

According to the Unicode standard, the BOM for UTF-8 files is not recommended:

"2.6 Encoding Schemes ... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information."
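To make the three signature bytes described above concrete, here is a minimal Python sketch of detecting and removing them. The helper name `strip_utf8_bom` is made up for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = b"\xef\xbb\xbfABC"

# Manual removal at the byte level.
print(strip_utf8_bom(raw))

# The "utf-8-sig" codec skips a leading signature while decoding.
print(raw.decode("utf-8-sig"))

# The plain "utf-8" codec keeps U+FEFF as the first character of the string.
print(repr(raw.decode("utf-8")))
```

Decoding with `utf-8-sig` also works on input that has no signature, so it may be a reasonable default when reading files of unknown origin that are expected to be UTF-8.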
Answered Feb 8, 2010 at 18:33 by Martin Cote; edited Apr 16, 2020 at 22:43 by Peter Mortensen.

Comment (136 votes): It might not be recommended, but from my experience in Hebrew conversions the BOM is sometimes crucial for UTF-8 recognition in Excel, and may make the difference between gibberish and Hebrew. – Matanya, Dec 7, 2012 at 8:13

Comment (41 votes): It might not be recommended, but it did wonders for my PowerShell script when trying to output "æøå". – Marius, Nov 12, 2013 at 9:22

Comment (75 votes): Regardless of it not being recommended by the standard, it's allowed, and I greatly prefer having something to act as a UTF-8 signature rather than the alternatives of assuming or guessing. Unicode-compliant software should/must be able to deal with its presence, so I personally encourage its use. – martineau, Dec 31, 2013 at 20:41

Comment (33 votes): @bames53: Yes, in an ideal world storing the encoding of text files as file system metadata would be a better way to preserve it. But most of us living in the real world can't change the file system of the OS(s) our programs get run on, so using the Unicode standard's platform-independent BOM signature seems like the best and most practical alternative IMHO. – martineau, Jan 16, 2014 at 19:37

Comment (41 votes): @martineau Just yesterday I ran into a file with a UTF-8 BOM that wasn't UTF-8 (it was CP936). What's unfortunate is that the ones responsible for the immense amount of pain caused by the UTF-8 BOM are largely oblivious to it. – bames53, Jan 16, 2014 at 23:21

Answer (274 votes): The other excellent answers already answered that:

- There is no official difference between UTF-8 and BOM-ed UTF-8.
- A BOM-ed UTF-8 string will start with the three following bytes: EF BB BF.
- Those bytes, if present, must be ignored when extracting the string from the file/stream.

But, as additional information to this, the BOM for UTF-8 could be a good way to "smell" if a string was encoded in UTF-8... Or it could be a legitimate string in any other encoding...
For example, the data [EF BB BF 41 42 43] could either be:

- The legitimate ISO-8859-1 string "ï»¿ABC"
- The legitimate UTF-8 string "ABC"

So while it can be cool to recognize the encoding of a file's content by looking at the first bytes, you should not rely on this, as shown by the example above.

Encodings should be known, not divined.

Answered Feb 8, 2010 at 18:42 by paercebal; edited May 6, 2015 at 19:25 by Peter Mortensen.

Comment (67 votes): @Alcott: You understood correctly. The string [EF BB BF 41 42 43] is just a bunch of bytes. You need external information to choose how to interpret it. If you believe those bytes were encoded using ISO-8859-1, then the string is "ï»¿ABC". If you believe those bytes were encoded using UTF-8, then it is "ABC". If you don't know, then you must try to find out. The BOM could be a clue. The absence of invalid characters when decoded as UTF-8 could be another... In the end, unless you can memorize/find the encoding somehow, an array of bytes is just an array of bytes. – paercebal, Sep 11, 2011 at 18:57

Comment (23 votes): @paercebal While "ï»¿" is valid latin-1, it is very unlikely that a text file begins with that combination. The same holds for the ucs2-le/be markers ÿþ and þÿ. Also you can never know. – user877329, Jun 21, 2013 at 16:48

Comment (17 votes): @deceze It is probably linguistically invalid: first ï (which is ok), then some quotation mark without a space in between (not ok). ¿ indicates it is Spanish, but ï is not used in Spanish. Conclusion: it is not latin-1, with a certainty well above the certainty without it. – user877329, Nov 5, 2013 at 7:20

Comment (26 votes): @user Sure, it doesn't necessarily make sense. But if your system relies on guessing, that's where uncertainties come in. Some malicious user submits text starting with these 3 letters on purpose, and your system suddenly assumes it's looking at UTF-8 with a BOM, treats the text as UTF-8 where it should use Latin-1, and some Unicode injection takes place. Just a hypothetical example, but certainly possible. You can't judge a text encoding by its content, period.
– deceze ♦, Nov 5, 2013 at 7:44

Comment (50 votes): "Encodings should be known, not divined." The heart and soul of the problem. +1, good sir. In other words: either standardize your content and say, "We're always using this encoding. Period. Write it that way. Read it that way," or develop an extended format that allows for storing the encoding as metadata. (The latter probably needs some "bootstrap standard encoding," too. Like saying "The part that tells you the encoding is always ASCII.") – jpmc26, Jul 23, 2015 at 21:25

Answer (147 votes): There are at least three problems with putting a BOM in UTF-8 encoded files.

1. Files that hold no text are no longer empty because they always contain the BOM.
2. Files that hold text within the ASCII subset of UTF-8 are no longer themselves ASCII because the BOM is not ASCII, which makes some existing tools break down, and it can be impossible for users to replace such legacy tools.
3. It is not possible to concatenate several files together because each file now has a BOM at the beginning.

And, as others have mentioned, it is neither sufficient nor necessary to have a BOM to detect that something is UTF-8:

- It is not sufficient because an arbitrary byte sequence can happen to start with the exact sequence that constitutes the BOM.
- It is not necessary because you can just read the bytes as if they were UTF-8; if that succeeds, it is, by definition, valid UTF-8.

Answered Nov 15, 2012 at 13:28 by jpsecher; edited Sep 9 at 16:16 by Henke.

Comment (11 votes): Re point 1, "Files that hold no text are no longer empty because they always contain the BOM": this (1) conflates the OS filesystem level with the interpreted contents level, plus it (2) incorrectly assumes that when using a BOM one must also put a BOM in every otherwise empty file. The practical solution to (1) is to not do (2). Essentially the complaint reduces to "it's possible to impractically put a BOM in an otherwise empty file, thus preventing the easiest detection of a logically empty file (by checking file size)". Still, good software should be able to deal with it, since it has a purpose.
– Cheers and hth. - Alf, Jun 18, 2014 at 14:22

Comment (9 votes): Re point 2, "Files that hold ASCII text is no longer themselves ASCII": this conflates ASCII with UTF-8. A UTF-8 file that holds ASCII text is not ASCII, it's UTF-8. Similarly, a UTF-16 file that holds ASCII text is not ASCII, it's UTF-16. And so on. ASCII is a 7-bit single-byte code. UTF-8 is an 8-bit variable-length extension of ASCII. If "tools break down" due to >127 values then they're just not fit for an 8-bit world. One simple practical solution is to use only ASCII files with tools that break down for non-ASCII byte values. A probably better solution is to ditch those ungood tools. – Cheers and hth. - Alf, Jun 18, 2014 at 14:27

Comment (9 votes): Re point 3, "It is not possible to concatenate several files together because each file now has a BOM at the beginning" is just wrong. I have no problem concatenating UTF-8 files with BOM, so it's clearly possible. I think maybe you meant the Unix-land cat won't give you a clean result, a result that has a BOM only at the start. If you meant that, then that's because cat works at the byte level, not at the interpreted contents level, and in similar fashion cat can't deal with photographs, say. Still it doesn't do much harm. That's because the BOM encodes a zero-width non-breaking space. – Cheers and hth. - Alf, Jun 18, 2014 at 14:34

Comment (28 votes): @Cheers and hth. - Alf This answer is correct. You are merely pointing out Microsoft bugs. – tchrist, Oct 1, 2014 at 22:34

Comment (13 votes): @brighty: The situation isn't improved any by adding a BOM though. – Deduplicator, Sep 20, 2015 at 4:29

Answer (120 votes): Here are examples of BOM usage that actually cause real problems, and yet many people don't know about them.
BOM breaks scripts

Shell scripts, Perl scripts, Python scripts, Ruby scripts, Node.js scripts or any other executable that needs to be run by an interpreter all start with a shebang line which looks like one of those:

    #!/bin/sh
    #!/usr/bin/python
    #!/usr/local/bin/perl
    #!/usr/bin/env node

It tells the system which interpreter needs to be run when invoking such a script. If the script is encoded in UTF-8, one may be tempted to include a BOM at the beginning. But actually the "#!" characters are not just characters. They are in fact a magic number that happens to be composed out of two ASCII characters. If you put something (like a BOM) before those characters, then the file will look like it had a different magic number, and that can lead to problems.

See Wikipedia, article: Shebang, section: Magic number:

"The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 and 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts,[14] for this reason and for wider interoperability and philosophical concerns. Additionally, a byte order mark is not necessary in UTF-8, as that encoding does not have endianness issues; it serves only to identify the encoding as UTF-8." [emphasis added]

BOM is illegal in JSON

See RFC 7159, Section 8.1:

"Implementations MUST NOT add a byte order mark to the beginning of a JSON text."

BOM is redundant in JSON

Not only is it illegal in JSON, it is also not needed to determine the character encoding, because there are more reliable ways to unambiguously determine both the character encoding and endianness used in any JSON stream (see this answer for details).
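As an illustration of the JSON point above, the following Python sketch shows how one real parser reacts. It assumes CPython's standard `json` module, which, as far as I know, rejects a text string that begins with U+FEFF; other parsers may behave differently. The sample document is made up.

```python
import json

# A JSON document that was decoded with plain "utf-8" and therefore still
# carries U+FEFF as its first character.
text_with_bom = "\ufeff" + '{"ok": true}'

try:
    json.loads(text_with_bom)
    rejected = False
except json.JSONDecodeError as exc:
    rejected = True
    print("parser refused the input:", exc)

# Decoding the original bytes with "utf-8-sig" drops the signature first,
# after which the same document parses normally.
raw = text_with_bom.encode("utf-8")
parsed = json.loads(raw.decode("utf-8-sig"))
print(rejected, parsed)
```

Stripping the signature at the decoding step, rather than inside the parser, keeps the JSON layer strict while still tolerating files produced by BOM-writing editors.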
BOM breaks JSON parsers

Not only is it illegal in JSON and not needed, it actually breaks all software that determines the encoding using the method presented in RFC 4627:

Determining the encoding and endianness of JSON, examining the first four bytes for the NUL byte:

    00 00 00 xx - UTF-32BE
    00 xx 00 xx - UTF-16BE
    xx 00 00 00 - UTF-32LE
    xx 00 xx 00 - UTF-16LE
    xx xx xx xx - UTF-8

Now, if the file starts with a BOM it will look like this:

    00 00 FE FF - UTF-32BE
    FE FF 00 xx - UTF-16BE
    FF FE 00 00 - UTF-32LE
    FF FE xx 00 - UTF-16LE
    EF BB BF xx - UTF-8

Note that:

- UTF-32BE doesn't start with three NULs, so it won't be recognized
- UTF-32LE: the first byte is not followed by three NULs, so it won't be recognized
- UTF-16BE has only one NUL in the first four bytes, so it won't be recognized
- UTF-16LE has only one NUL in the first four bytes, so it won't be recognized

Depending on the implementation, all of those may be interpreted incorrectly as UTF-8 and then misinterpreted or rejected as invalid UTF-8, or not recognized at all.

Additionally, if the implementation tests for valid JSON as I recommend, it will reject even the input that is indeed encoded as UTF-8, because it doesn't start with an ASCII character < 128 as it should according to the RFC.

Other data formats

A BOM in JSON is not needed, is illegal, and breaks software that works correctly according to the RFC. It should be a no-brainer to just not use it then, and yet there are always people who insist on breaking JSON by using BOMs, comments, different quoting rules or different data types. Of course anyone is free to use things like BOMs or anything else if they need it; just don't call it JSON then.

For data formats other than JSON, take a look at what the data really looks like. If the only encodings are UTF-* and the first character must be an ASCII character lower than 128, then you already have all the information needed to determine both the encoding and the endianness of your data. Adding BOMs even as an optional feature would only make it more complicated and error-prone.

Other uses of BOM

As for the uses outside of JSON or scripts, I think there are already very good answers here. I wanted to add more detailed info specifically about scripting and serialization, because it is an example of BOM characters causing real problems.
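The NUL-pattern tables above can be turned into a few lines of Python. This is only a sketch of the RFC 4627 heuristic as the answer describes it; the function name `guess_json_encoding` and the choice to treat inputs shorter than four bytes as UTF-8 are assumptions made for the example.

```python
def guess_json_encoding(head: bytes) -> str:
    """Guess a JSON text's encoding from the NUL pattern of its first four bytes."""
    if len(head) < 4:
        return "utf-8"  # assumption: too short to classify, treat as UTF-8
    nul = tuple(b == 0 for b in head[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",    # 00 00 00 xx
        (True, False, True, False): "utf-16-be",   # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",    # xx 00 00 00
        (False, True, False, True): "utf-16-le",   # xx 00 xx 00
    }
    return patterns.get(nul, "utf-8")              # anything else: UTF-8


doc = '{"a": 1}'
for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    plain = doc.encode(enc)
    with_bom = ("\ufeff" + doc).encode(enc)
    print(enc, "->", guess_json_encoding(plain), "| with BOM ->", guess_json_encoding(with_bom))
```

Running the loop shows the failure mode listed in the notes: every BOM-prefixed UTF-16 or UTF-32 input falls through to the UTF-8 branch, because the signature bytes destroy the expected NUL pattern.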
Answered Jun 26, 2016 at 11:34 by rsp; edited Oct 7, 2021 at 7:34 by Community Bot.

Comment (7 votes): RFC 7159, which supersedes RFC 4627, actually suggests supporting BOM may not be so evil. Basically not having a BOM is just an ambiguous kludge so that old Windows and Unix software that is not Unicode-aware can still process UTF-8. – Eric Grange, Apr 10, 2017 at 7:59

Comment (2 votes): Sounds like JSON needs updating in order to support it, same with Perl scripts, Python scripts, Ruby scripts, Node.js. Just because these platforms opted to not include support doesn't necessarily kill the use for BOM. Apple has been trying to kill Adobe for a few years now, and Adobe is still around. But an enlightening post. – htm11h, Jul 24, 2017 at 15:47

Comment (19 votes): @EricGrange, you seem to be very strongly supporting BOM, but fail to realize that this would render the all-ubiquitous, universally useful, optimal-minimum "plain text" format a relic of the pre-UTF-8 past! Adding any sort of (in-band) header to the plain text stream would, by definition, impose a mandatory protocol on the simplest text files, making it never again the "simplest"! And for what gain? To support all the other, ancient CP encodings that also didn't have signatures, so you might mistake them for UTF-8? (BTW, ASCII is UTF-8, too. So, a BOM to those, too? ;) Come on.) – Sz., Mar 14, 2018 at 22:20

Comment (4 votes): This answer is the reason why I came to this question! I create my bash scripts in Windows and experience a lot of problems when publishing those scripts to Linux! Same thing with JSON files. – Tono Nam, Jul 2, 2019 at 14:43

Comment (4 votes): I wish I could vote this answer up about fifty times. I also want to add that at this point, UTF-8 has won the standards war, and nearly all text being produced on the Internet is UTF-8. Some of the most popular programming languages (such as C# and Java) use UTF-16 internally, but when programmers using those languages write files to output streams, they almost always encode them as UTF-8. Therefore, it no longer makes sense to have a BOM to mark a UTF-8 file; UTF-8 should be the default you use when reading, and only try other encodings if UTF-8 decoding fails. – rmunn, Aug 23, 2019 at 1:56

Answer (51 votes): What's different between UTF-8 and UTF-8 without BOM?
Short answer: In UTF-8, a BOM is encoded as the bytes EF BB BF at the beginning of the file.

Long answer:

Originally, it was expected that Unicode would be encoded in UTF-16/UCS-2. The BOM was designed for this encoding form. When you have 2-byte code units, it's necessary to indicate which order those two bytes are in, and a common convention for doing this is to include the character U+FEFF as a "Byte Order Mark" at the beginning of the data. The character U+FFFE is permanently unassigned so that its presence can be used to detect the wrong byte order.

UTF-8 has the same byte order regardless of platform endianness, so a byte order mark isn't needed. However, it may occur (as the byte sequence EF BB BF) in data that was converted to UTF-8 from UTF-16, or as a "signature" to indicate that the data is UTF-8.

Which is better?

Without. As Martin Cote answered, the Unicode standard does not recommend it. It causes problems with non-BOM-aware software.

A better way to detect whether a file is UTF-8 is to perform a validity check. UTF-8 has strict rules about what byte sequences are valid, so the probability of a false positive is negligible. If a byte sequence looks like UTF-8, it probably is.

Answered Jul 31, 2010 at 22:53 by dan04; edited May 6, 2015 at 19:27 by Peter Mortensen.

Comment (8 votes): This would also invalidate valid UTF-8 with a single erroneous byte in it, though :/ – endolith, Jul 15, 2012 at 1:05

Comment (9 votes): -1 re "It causes problems with non-BOM-aware software": that's never been a problem for me, but on the contrary, that absence of a BOM causes problems with BOM-aware software (in particular Visual C++) has been a problem. So this statement is very platform-specific, a narrow Unix-land point of view, but is misleadingly presented as if it applies in general. Which it does not. – Cheers and hth. - Alf, Jun 18, 2014 at 14:46

Comment (6 votes): No, UTF-8 has no BOM. This answer is incorrect. See the Unicode Standard.
– tchrist, Oct 1, 2014 at 22:35

Comment (2 votes): You can even think you have a pure ASCII file when just looking at the bytes. But this could be a UTF-16 file as well, where you'd have to look at words and not at bytes. Modern software should be aware of BOMs. Still, reading UTF-8 can fail when detecting invalid sequences, code points that could use a smaller sequence, or code points that are surrogates. For UTF-16, reading might fail too when there are orphaned surrogates. – brighty, Feb 9, 2015 at 16:56

Comment (2 votes): @Alf, I disagree with your interpretation of a non-BOM attitude as "platform-specific, a narrow Unix-land point of view." To me, the only way that the narrow-mindedness could lie with "Unix land" were if MS and Visual C++ came before *NIX, which they didn't. The fact that MS (I assume knowingly) started using a BOM in UTF-8 rather than UTF-16 suggests to me that they promoted breaking sh, perl, g++, and many other free and powerful tools. Want things to work? Just buy the MS versions. MS created the platform-specific problem, just like the disaster of their \x80-\x95 range. – bballdave025, Jan 17, 2020 at 23:17

Answer (38 votes): UTF-8 with BOM is better identified. I have reached this conclusion the hard way. I am working on a project where one of the results is a CSV file, including Unicode characters.

If the CSV file is saved without a BOM, Excel thinks it's ANSI and shows gibberish. Once you add "EF BB BF" at the front (for example, by re-saving it using Notepad with UTF-8, or Notepad++ with UTF-8 with BOM), Excel opens it fine.

Prepending the BOM character to Unicode text files is recommended by RFC 3629: "UTF-8, a transformation format of ISO 10646", November 2003, at https://www.rfc-editor.org/rfc/rfc3629 (this last info found at: http://www.herongyang.com/Unicode/Notepad-Byte-Order-Mark-BOM-FEFF-EFBBBF.html)

Answered Jun 28, 2012 at 17:34 by Helen Craigman; edited Oct 7, 2021 at 5:46 by Community Bot.

Comment (6 votes): Thanks for this excellent tip in case one is creating UTF-8 files for use by Excel. In other circumstances though, I would still follow the other answers and skip the BOM.
– barfuin, May 7, 2013 at 19:20

Comment (5 votes): It's also useful if you create files that contain only ASCII and later may have non-ASCII added to them. I have just run into such an issue: software that expects UTF-8 creates a file with some data for user editing. If the initial file contains only ASCII, is opened in some editors and then saved, it ends up in latin-1 and everything breaks. If I add the BOM, it will get detected as UTF-8 by the editor and everything works. – Roberto Alsina, Sep 9, 2013 at 22:03

Comment (1 vote): I have found multiple programming-related tools which require the BOM to recognise UTF-8 files correctly. Visual Studio, SSMS, SourceTree.... – kjbartel, Jan 27, 2015 at 13:24

Comment (7 votes): Where do you read a recommendation for using a BOM into that RFC? At most, there's a strong recommendation to not forbid it under certain circumstances where doing so is difficult. – Deduplicator, Aug 11, 2015 at 18:37

Comment (13 votes): "Excel thinks it's ANSI and shows gibberish": then the problem is in Excel. – user8017719, Nov 26, 2016 at 8:10

Answer (17 votes): Question: What's different between UTF-8 and UTF-8 without a BOM? Which is better?

Here are some excerpts from the Wikipedia article on the byte order mark (BOM) that I believe offer a solid answer to this question.

On the meaning of the BOM and UTF-8:

"The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8."

Argument for NOT using a BOM:

"The primary motivation for not using a BOM is backwards-compatibility with software that is not Unicode-aware... Another motivation for not using a BOM is to encourage UTF-8 as the "default" encoding."

Argument FOR using a BOM:

"The argument for using a BOM is that without it, heuristic analysis is required to determine what character encoding a file is using. Historically such analysis, to distinguish various 8-bit encodings, is complicated, error-prone, and sometimes slow. A number of libraries are available to ease the task, such as Mozilla Universal Charset Detector and International Components for Unicode.

Programmers mistakenly assume that detection of UTF-8 is equally difficult (it is not because of the vast majority of byte sequences are invalid UTF-8, while the encodings these libraries are trying to distinguish allow all possible byte sequences). Therefore not all Unicode-aware programs perform such an analysis and instead rely on the BOM.

In particular, Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad will not correctly read UTF-8 text unless it has only ASCII characters or it starts with the BOM, and will add a BOM to the start when saving text as UTF-8. Google Docs will add a BOM when a Microsoft Word document is downloaded as a plain text file."

On which is better, WITH or WITHOUT the BOM:

"The IETF recommends that if a protocol either (a) always uses UTF-8, or (b) has some other way to indicate what encoding is being used, then it “SHOULD forbid use of U+FEFF as a signature.”"

My Conclusion:

Use the BOM only if compatibility with a software application is absolutely essential.

Also note that while the referenced Wikipedia article indicates that many Microsoft applications rely on the BOM to correctly detect UTF-8, this is not the case for all Microsoft applications. For example, as pointed out by @barlop, when using the Windows Command Prompt with UTF-8†, commands such as type and more do not expect the BOM to be present. If the BOM is present, it can be problematic, as it is for other applications.

† The chcp command offers support for UTF-8 (without the BOM) via code page 65001.
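The observation quoted above, that most byte sequences are not valid UTF-8, suggests a validate-then-fall-back strategy instead of relying on a signature. A minimal Python sketch follows; the function name `decode_text` is illustrative, and CP1252 as the legacy fallback is an assumption, since the right fallback depends on where the data comes from.

```python
def decode_text(data: bytes, fallback: str = "cp1252"):
    """Decode as UTF-8 when the bytes are valid UTF-8, else use a legacy fallback.

    Returns a (text, encoding_used) pair.
    """
    try:
        # "utf-8-sig" is strict UTF-8 that also tolerates a leading signature.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy code page (an assumption, not a fact
        # about the data). Undefined bytes become U+FFFD instead of raising.
        return data.decode(fallback, errors="replace"), fallback


print(decode_text("café".encode("utf-8")))    # valid UTF-8
print(decode_text("café".encode("cp1252")))   # lone 0xE9 is not valid UTF-8
print(decode_text(b"plain ASCII"))            # ASCII is valid UTF-8 as well
```

This does not identify which legacy encoding non-UTF-8 data uses; it only separates "valid UTF-8" from "something else", which is the part the quoted passage says is easy.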
Answered Oct 2, 2014 at 20:24 by DavidRR; edited Mar 4, 2018 at 1:16.

Comment (5 votes): I'd better stick to WITHOUT the BOM. I found that .htaccess and gzip compression in combination with a UTF-8 BOM gives an encoding error. Changing to encoding in UTF-8 without BOM, following a suggestion as explained here, solved the problems. – eQ19, Apr 16, 2015 at 15:09

Comment (1 vote): 'Another motivation for not using a BOM is to encourage UTF-8 as the "default" encoding.' Which is so strong and valid an argument that you could have actually stopped the answer there! ;-o Unless you've got a better idea for universal text representation, that is. ;) (I don't know how old you are, or how many years you had to suffer in the pre-UTF-8 era (when linguists desperately considered even changing their alphabets), but I can tell you that every second we get closer to getting rid of the mess of all the ancient single-byte-with-no-metadata encodings, instead of having "the one", is pure joy.) – Sz., Mar 14, 2018 at 22:41

Comment: See also this comment about how adding a BOM (or anything!) to the simplest of the text file formats, "plain text", would mean preventing exactly the best universal text encoding format from being "plain" and "simple" (i.e. "overheadless")!... – Sz., Mar 14, 2018 at 22:58

Comment: BOM is mostly problematic on Linux because many utilities do not really support Unicode to begin with (they will happily truncate in the middle of code points, for instance). For most other modern software environments, use a BOM whenever the encoding is not unambiguous (through specs or metadata). – Eric Grange, Aug 23, 2019 at 7:58

Answer (16 votes): BOM tends to boom (no pun intended (sic)) somewhere, someplace. And when it booms (for example, doesn't get recognized by browsers, editors, etc.), it shows up as weird characters at the start of the document (for example, HTML file, JSON response, RSS, etc.) and causes the kind of embarrassments like the recent encoding issue experienced during the talk of Obama on Twitter.

It's very annoying when it shows up at places that are hard to debug or when testing is neglected. So it's best to avoid it unless you must use it.
Answered Jul 11, 2011 at 7:56 by Halil Özgür; edited May 6, 2015 at 19:28 by Peter Mortensen.

Comment: Yes, just spent hours identifying a problem caused by a file being encoded as UTF-8 instead of UTF-8 without BOM. (The issue only showed up in IE7, so that led me on quite a goose chase. I used Django's "include".) – user984003, Jan 31, 2013 at 20:45

Comment: Future readers: note that the tweet issue I've mentioned above was not strictly related to BOM, but if it was, then the tweet would be garbled in a similar way, but at the start of the tweet. – Halil Özgür, Feb 1, 2013 at 7:26

Comment (13 votes): @user984003 No, the problem is that Microsoft has misled you. What it calls UTF-8 is not UTF-8. What it calls UTF-8 without BOM is what UTF-8 really is. – tchrist, Oct 2, 2014 at 0:11

Comment: What does the "sic" add to your "no pun intended"? – JoelFan, Oct 23, 2017 at 21:15

Comment (2 votes): @JoelFan I can't recall anymore, but I guess the pun might have been intended despite the author's claim :) – Halil Özgür, Oct 23, 2017 at 21:34

Answer (15 votes): This question already has a million-and-one answers and many of them are quite good, but I wanted to try and clarify when a BOM should or should not be used.

As mentioned, any use of the UTF BOM (Byte Order Mark) in determining whether a string is UTF-8 or not is educated guesswork. If there is proper metadata available (like charset="utf-8"), then you already know what you're supposed to be using, but otherwise you'll need to test and make some assumptions. This involves checking whether the file a string comes from begins with the hexadecimal byte code EF BB BF.

If a byte code corresponding to the UTF-8 BOM is found, the probability is high enough to assume it's UTF-8 and you can go from there. When forced to make this guess, however, additional error checking while reading would still be a good idea in case something comes up garbled. You should only assume a BOM is not UTF-8 (i.e. latin-1 or ANSI) if the input definitely shouldn't be UTF-8 based on its source. If there is no BOM, however, you can simply determine whether it's supposed to be UTF-8 by validating against the encoding.

Why is a BOM not recommended?
- Non-Unicode-aware or poorly compliant software may assume it's latin-1 or ANSI and won't strip the BOM from the string, which can obviously cause issues.
- It's not really needed (just check if the contents are compliant and always use UTF-8 as the fallback when no compliant encoding can be found).

When should you encode with a BOM?

If you're unable to record the metadata in any other way (through a charset tag or file system meta), and the programs being used like BOMs, you should encode with a BOM. This is especially true on Windows, where anything without a BOM is generally assumed to be using a legacy code page. The BOM tells programs like Office that, yes, the text in this file is Unicode; here's the encoding used.

When it comes down to it, the only files I ever really have problems with are CSV. Depending on the program, it either must, or must not, have a BOM. For example, if you're using Excel 2007+ on Windows, it must be encoded with a BOM if you want to open it smoothly and not have to resort to importing the data.

Answered Jan 25, 2016 at 16:03 by jpc-ae; edited Apr 16, 2020 at 23:37 by Peter Mortensen.

Comment (7 votes): The last section of your answer is 100% correct: the only reason to use a BOM is when you have to interoperate with buggy software that doesn't use UTF-8 as its default to parse unknown files. – rmunn, Aug 23, 2019 at 2:01

Answer (8 votes): UTF-8 without BOM has no BOM, which doesn't make it any better than UTF-8 with BOM, except when the consumer of the file needs to know (or would benefit from knowing) whether the file is UTF-8-encoded or not.

The BOM is usually useful to determine the endianness of the encoding, which is not required for most use cases.

Also, the BOM can be unnecessary noise/pain for those consumers that don't know or care about it, and can result in user confusion.

Answered Feb 8, 2010 at 18:30 by Romain; edited Feb 8, 2010 at 18:42.

Comment (2 votes): "which has no use for UTF-8 as it is 8-bits per glyph anyway." Er... no, only ASCII-7 glyphs are 8 bits in UTF-8. Anything beyond that is going to be 16, 24, or 32 bits.
– Powerlord, Feb 8, 2010 at 18:38

Comment (4 votes): "The BOM is usually useful to determine the endianness of the encoding, which is not required for most use cases." ... endianness simply does not apply to UTF-8, regardless of use case. – JoelFan, Oct 23, 2017 at 21:30

Comment: A consumer that needs to know is broken by design. – Jasen, Aug 9, 2020 at 8:38

Answer (8 votes): It should be noted that for some files you must not have the BOM even on Windows. Examples are SQL*plus or VBScript files. In case such files contain a BOM, you get an error when you try to execute them.

Answered Jan 31, 2015 at 21:09 by Wernfried Domscheit; edited Aug 11, 2015 at 18:43 by Deduplicator.

Answer (7 votes): Quoted at the bottom of the Wikipedia page on BOM: http://en.wikipedia.org/wiki/Byte-order_mark#cite_note-2

"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"

Answered Feb 8, 2010 at 18:35 by pib.

Comment (2 votes): Do you have any example where software makes a decision of whether to use UTF-8 with/without BOM based on whether the previous encoding it is encoding from had a BOM or not?! That seems like an absurd claim. – barlop, Mar 3, 2018 at 15:31

Answer (7 votes): UTF-8 with BOM only helps if the file actually contains some non-ASCII characters. If it is included and there aren't any, then it will possibly break older applications that would have otherwise interpreted the file as plain ASCII. These applications will definitely fail when they come across a non-ASCII character, so in my opinion the BOM should only be added when the file can, and should, no longer be interpreted as plain ASCII.

I want to make it clear that I prefer to not have the BOM at all. Add it in if some old rubbish breaks without it, and replacing that legacy application is not feasible.

Don't make anything expect a BOM for UTF-8.
Answered Jul 3, 2014 at 2:43 by James Wakefield; edited Apr 16, 2020 at 23:15 by Peter Mortensen.

Comment (1 vote): It's not certain that non-UTF-8-aware applications will fail if they encounter UTF-8; the whole point of UTF-8 is that many things will just work. wc(1) will give a correct line and octet count, and a correct word count if no Unicode-only spacing characters are used. – Jasen, Aug 9, 2020 at 8:37

Comment: I agree with you @Jasen. Trying to work out if I should just delete this old answer. My current opinion is that the answer is simply: don't add a BOM. The end user can prepend one if they have to hack a file to make it work with old software. We shouldn't make software that perpetuates this incorrect behaviour. There is no reason why a file couldn't start with a zero-width non-joiner that is meant to be interpreted as one. – James Wakefield, Dec 16, 2021 at 4:31

Answer (6 votes): I look at this from a different perspective. I think UTF-8 with BOM is better as it provides more information about the file. I use UTF-8 without BOM only if I face problems.

I have been using multiple languages (even Cyrillic) on my pages for a long time, and when the files are saved without BOM and I re-open them for editing with an editor (as cherouvim also noted), some characters are corrupted.

Note that Windows' classic Notepad automatically saves files with a BOM when you try to save a newly created file with UTF-8 encoding.

I personally save server-side scripting files (.asp, .ini, .aspx) with BOM and .html files without BOM.

Answered May 11, 2012 at 8:34 by user1358065; edited May 23, 2017 at 11:55 by Community Bot.

Comment (4 votes): Thanks for the excellent tip about Windows classic Notepad. I already spent some time finding out the exact same thing. My consequence was to always use Notepad++ instead of Windows classic Notepad. :-) – barfuin, May 7, 2013 at 19:22

Comment: You'd better use madedit. It's the only editor that, in hex mode, shows one character if you select a UTF-8 byte sequence, instead of a 1:1 basis between byte and character. A hex editor that is aware of a UTF-8 file should behave like madedit does!
– brighty Feb 9, 2015 at 16:49

@brighty I don't think you need one-to-one for the sake of the BOM. It doesn't matter; it doesn't take much to recognise a utf-8 BOM is efbbbf or fffe (of fffe if read wrong). One can simply delete those bytes. It's not bad to have a mapping for the rest of the file, though, but also to be able to delete byte by byte. – barlop Mar 3, 2018 at 15:34

@barlop Why would you want to delete a UTF-8 BOM if the file's content is UTF-8 encoded? The BOM is recognized by modern text viewers, text controls and text editors. A one-to-one view of a UTF-8 sequence makes no sense, since n bytes result in one character. Of course a text editor or hex editor should allow you to delete any byte, but this can lead to invalid UTF-8 sequences. – brighty Mar 4, 2018 at 16:41

@brighty UTF-8 with BOM is an encoding, and UTF-8 without BOM is an encoding. The cmd prompt uses UTF-8 without BOM, so if you have a UTF-8 file and you run the command chcp 65001 for UTF-8 support, it's UTF-8 without BOM. If you do type myfile, it will only display properly if there is no BOM. If you do echo aaa>a.a or echo אאא>a.a to output the characters to file a.a, and you have chcp 65001, it will output with no BOM. – barlop Mar 5, 2018 at 4:55

6

When you want to display information encoded in UTF-8 you may not face problems. Declare, for example, an HTML document as UTF-8 and everything contained in the body of the document will be displayed in your browser.

But this is not the case when we have text, CSV and XML files, either on Windows or Linux.

For example, a text file in Windows or Linux, one of the simplest things imaginable, is not (usually) UTF-8.

Save it as XML and declare it as UTF-8 in the XML declaration. It will not display (it will not be read) correctly, even though it is declared as UTF-8.

I had a string of data containing French letters that needed to be saved as XML for syndication. Without creating a UTF-8 file from the very beginning (changing options in the IDE and "Create New File") or adding the BOM at the beginning of the file

    $file = "\xEF\xBB\xBF" . $string;

I was not able to save the French letters in an XML file.
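For readers who do not use PHP, the same BOM-prepending step can be sketched in Python. This is only an illustration of the technique in the line above, not the answerer's code; the function name `with_utf8_bom` and the sample string are assumptions.

```python
import codecs


def with_utf8_bom(text: str) -> bytes:
    """Encode text as UTF-8 and put the EF BB BF signature in front."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


payload = with_utf8_bom("<name>Hélène</name>")
print(payload[:3].hex())  # efbbbf
```

Encoding with Python's "utf-8-sig" codec produces the same bytes, so `text.encode("utf-8-sig")` is an equivalent one-liner. Whether a BOM is wanted at all depends on the consumer, as the other answers in this thread point out.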
edited May 6, 2015 at 19:33 Peter Mortensen
answered Sep 10, 2012 at 16:50 Florin Sima

4 I know this is an old answer, but I just want to mention that it's wrong. Text files on Linux (can't speak for other Unixes) usually /are/ UTF-8. – Functino Nov 14, 2015 at 23:41

6

One practical difference is that if you write a shell script for Mac OS X and save it as plain UTF-8 (that is, with a BOM), you will get the response:

    #!/bin/bash: No such file or directory

in response to the shebang line specifying which shell you wish to use:

    #!/bin/bash

If you save as UTF-8, no BOM (say in BBEdit), all will be well.

edited May 6, 2015 at 19:46 Peter Mortensen
answered Jan 24, 2014 at 20:38 David

10 That's because Microsoft has swapped the meaning of what the standard says. UTF-8 has no BOM: they have created Microsoft UTF-8, which inserts a spurious BOM in front of the data stream, and then told you that no, this is actually UTF-8. It is not. It is just extending and corrupting. – tchrist Oct 2, 2014 at 0:14

5

The Unicode Byte Order Mark (BOM) FAQ provides a concise answer:

Q: How should I deal with BOMs?

A: Here are some guidelines to follow:

- A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
- Some protocols allow optional BOMs in the case of untagged text. In those cases,
  - Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
  - Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
- Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
- Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.

answered Mar 8, 2018 at 13:58 Wernfried Domscheit

4

As mentioned above, UTF-8 with BOM may cause problems with non-BOM-aware (or compatible) software. I once edited HTML files encoded as UTF-8 + BOM with the Mozilla-based KompoZer, as a client required that WYSIWYG program.

Invariably the layout would get destroyed when saving. It took me some time to fiddle my way around this. These files then worked well in Firefox, but showed a CSS quirk in Internet Explorer that destroyed the layout again. After fiddling with the linked CSS files for hours to no avail, I discovered that Internet Explorer didn't like the BOM-fed HTML file. Never again.

Also, I just found this in Wikipedia:

The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts, for this reason and for wider interoperability and philosophical concerns

edited May 6, 2015 at 19:44 Peter Mortensen
answered Jun 22, 2013 at 4:56 Marek Möhling

3

From http://en.wikipedia.org/wiki/Byte-order_mark:

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF.
BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

Always using a BOM in your file will ensure that it always opens correctly in an editor which supports UTF-8 and BOM.

My real problem with the absence of BOM is the following. Suppose we've got a file which contains:

    abc

Without BOM this opens as ANSI in most editors. So another user of this file opens it and appends some native characters, for example:

    abc-αβγ

Oops... Now the file is still in ANSI and guess what, "αβγ" does not occupy 6 bytes, but 3. This is not UTF-8, and this causes other problems later on in the development chain.

edited May 6, 2015 at 19:23 Peter Mortensen
answered Feb 8, 2010 at 18:31 cherouvim

10 And ensure that spurious bytes appear at the beginning in non-BOM-aware software. Yay. – Romain Feb 8, 2010 at 18:33

1 @Romain Muller: e.g. PHP 5 will throw "impossible" errors when you try to send headers after the BOM. – Piskvor left the building Feb 8, 2010 at 18:47

5 αβγ is not ASCII, but can appear in 8-bit ASCII-based encodings. The use of a BOM disables a benefit of UTF-8: its compatibility with ASCII (the ability to work with legacy applications where pure ASCII is used). – ctrl-alt-delor Jan 7, 2011 at 13:03

1 This is the wrong answer. A string with a BOM in front of it is something else altogether. It is not supposed to be there and just screws everything up. – tchrist Oct 2, 2014 at 0:13

"Without BOM this opens as ANSI in most editors." I agree absolutely. If this happens, you're lucky if you deal with the correct code page, but indeed it's just a guess, because the code page is not part of the file. A BOM is. – brighty Feb 9, 2015 at 16:59

1

Here is my experience with Visual Studio, Sourcetree and Bitbucket pull requests, which has been giving me some problems:

It turns out that UTF-8 with a signature (BOM) will show up as a red dot character on each file when reviewing a pull request (it can be quite annoying).
If you hover over it, it will show a character like "ufeff". However, Sourcetree does not show these kinds of byte marks, so they will most likely end up in your pull requests. That should be OK, because that's how Visual Studio 2017 encodes new files now, so maybe Bitbucket should ignore this or show it in another way. More info here: Red dot marker BitBucket diff view

edited Apr 16, 2020 at 23:47 Peter Mortensen
answered Jul 31, 2019 at 9:30 Leo

0

When I save an AutoHotkey file as UTF-8, the Chinese characters become garbled. With a UTF-8 BOM, it works fine.

AutoHotkey will not automatically recognize a UTF-8 file unless it begins with a byte order mark.

https://www.autohotkey.com/docs/FAQ.htm#nonascii

answered May 8 at 3:41 GoodPen

-4

UTF with a BOM is better if you use UTF-8 in HTML files and if you use Serbian Cyrillic, Serbian Latin, German, Hungarian or some exotic language on the same page.

That is my opinion (30 years of computing and IT industry).

edited Apr 16, 2020 at 23:11 Peter Mortensen
answered Mar 15, 2013 at 10:01 user2173444

1 I find this to be true as well. If you use characters outside of the first 255 ASCII set and you omit the BOM, browsers interpret it as ISO-8859-1 and you get garbled characters. Given the answers above, this is apparently down to the browser vendors doing the wrong thing when they don't detect a BOM. But unless you work at Microsoft Edge/Mozilla/Webkit/Blink, you have no choice but to work with the defects these apps have. – asontu Nov 28, 2017 at 8:42

UTF what? UTF-8? UTF-16? Something else? – Peter Mortensen Apr 16, 2020 at 23:12

If your server doesn't indicate the correct MIME type charset parameter you should use the …