UTF-8 vs UTF-8 with BOM - Super User
Asked 2 years, 4 months ago. Modified 3 months ago. Viewed 10k times.

Question (score 4)

The latest Notepad.exe has a Save as UTF-8 and UTF-8 with BOM. Is UTF-8 with BOM the old UTF? What is UTF-8 now?

Tags: windows-10, notepad
Asked May 21, 2020 at 2:38 by OldGeezer

Comments:
(+1) Different site but same question answered here: stackoverflow.com/questions/2223882/… – MC10 May 21, 2020 at 2:49
(+1) This answer also answers that. No need for the downvote either; good question for this site as well. – Giacomo1968 May 21, 2020 at 3:28
docs.microsoft.com/en-us/windows/win32/api/winbase/… – Mark May 21, 2020 at 5:45

2 Answers

Answer (score 7)

UTF-8 is UTF-8 regardless of whether a BOM exists. Saving a file with a BOM (byte order mark) is not really needed for UTF-8. The fact that Notepad allows saving files as “UTF-8” or “UTF-8 with BOM” seems to be an option that exists to allow flexibility in cases where a BOM is needed. But in general, just saving the file without a BOM (meaning plain UTF-8) is really the best way to handle text files with UTF-8 content.
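To make the two save options concrete, here is a minimal Python sketch (Python is only an illustration language here, and the sample string is made up). It assumes that Python's `utf-8-sig` codec is a fair stand-in for what Notepad calls "UTF-8 with BOM": the same UTF-8 bytes with a three-byte signature in front.

```python
# Encode the same text with and without the UTF-8 signature (BOM).
text = "naïve café"  # hypothetical sample containing non-ASCII characters

without_bom = text.encode("utf-8")      # plain UTF-8
with_bom = text.encode("utf-8-sig")     # UTF-8 preceded by the BOM

# The only difference is the three-byte prefix EF BB BF.
print(with_bom[:3].hex(" "))             # ef bb bf
print(with_bom[3:] == without_bom)       # True
print(len(with_bom) - len(without_bom))  # 3
```

The character data itself is byte-for-byte identical in both cases, which is why the answer treats the two options as the same encoding.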
As explained on the Wikipedia page for byte order mark:

“BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.”

And the article goes deeper into it by stating the following:

“The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF, 0xBB, 0xBF.

The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM. The standard also does not recommend removing a BOM when it is there, so that round-tripping between encodings does not lose information, and so that code that relies on it continues to work. The IETF recommends that if a protocol either (a) always uses UTF-8, or (b) has some other way to indicate what encoding is being used, then it "SHOULD forbid use of U+FEFF as a signature."

Not using a BOM allows text to be backwards-compatible with some software that is not Unicode-aware. Examples include programming languages that permit non-ASCII bytes in string literals but not at the start of the file.”

As for why Microsoft cares about saving UTF-8 with a BOM in Notepad, this explains it well; it seems to be a specific requirement of Microsoft programming tools rather than of non-Microsoft tools:

“Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Google Docs also adds a BOM when converting a document to a plain text file for download.”

So unless you explicitly need a UTF-8 file to carry a BOM, just don’t worry about that saving option.
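The interference described in the quoted passage is easy to reproduce. The following is a small Python sketch, not something from the answer itself; the `key=value` content is a hypothetical config line. It shows that a plain `utf-8` decode keeps the BOM as an invisible U+FEFF character, while the `utf-8-sig` codec drops it when present and otherwise behaves like plain UTF-8.

```python
import codecs

# Bytes as they would appear in a file saved as "UTF-8 with BOM".
raw = codecs.BOM_UTF8 + "key=value".encode("utf-8")

# A plain UTF-8 decode succeeds, but the BOM survives as U+FEFF.
plain = raw.decode("utf-8")
print(plain.startswith("key"))   # False: the first character is U+FEFF
print(plain[0] == "\ufeff")      # True

# "utf-8-sig" strips a leading BOM if there is one...
print(raw.decode("utf-8-sig"))   # key=value

# ...and is harmless on input that has no BOM.
print(b"key=value".decode("utf-8-sig"))  # key=value
```

A parser that compares the first token against `key` fails on the first decode, which is the kind of breakage that makes BOM-less UTF-8 the safer default.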
Answered May 21, 2020 at 3:54 by Giacomo1968; edited Jul 3 at 22:00

Comments:
(+2) I wonder why standardizing on file metadata to specify the encoding type is a poorer choice than making everyone add all the extra logic to infer the actual encoding in use. – OldGeezer May 21, 2020 at 3:58
@OldGeezer Because metadata can be fudged and “lie.” It is better to create a standard that doesn’t require metadata for file content parsing than hope that every application in the world, new and old, can understand that newly introduced metadata. – Giacomo1968 May 21, 2020 at 4:06
(+1) @OldGeezer Metadata doesn't transfer well. Upload your file to a website and all metadata, except for the filename, is lost. And a BOM isn't perfect either: it's fine unless another encoding happens to interpret it as valid characters, and then you have to use heuristics anyway. Compatibility with legacy standards is hard. – gronostaj May 21, 2020 at 8:45
AutoHotkey requires a BOM in its configuration file (if you use extended UTF-8 characters). So even though Notepad displays the file correctly without a BOM, it will not work until you save it with the "UTF-8 with BOM" encoding. – Axel Bregnsbo Aug 8 at 7:52

Answer (score -3)

The other answer is wrong. It is some political thing.

ANSI is the default text format in Windows and has been for 36 years.

In Windows, files are assumed to be ANSI. Therefore you always use a BOM. Unix programs that can't handle BOMs are not Unicode compliant.

I write text editors. If the user doesn't specify, it is ANSI - ALWAYS.

Assuming you will get BOM-less Unicode means you have to call https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode to guess the format. Hardly proper programming.

Answered May 21, 2020 at 7:41 by Mark; edited May 22, 2020 at 6:23

Comments:
(+5) "In Windows files are assumed to be ANSI [...] If the user doesn't specify it is ANSI - ALWAYS" - either you're referring to some subset of Windows software (and this answer should clarify what subset it is) or this is incorrect. All competent text editors can heuristically detect UTF-8 without BOM, regardless of platform. Even Notepad does (tested with Windows 10 v1909 build 18363.836).
– gronostaj May 21, 2020 at 7:58
(+4) That's your opinion, not a fact. I've literally created a UTF-8 file with non-ASCII characters in Sublime Text, confirmed in hex view that there is no BOM and some characters are encoded multibyte, and then opened that file in Notepad. It worked just fine. Whether you like it or not, it's just not true that Windows software assumes ANSI unless indicated otherwise by BOM. – gronostaj May 21, 2020 at 8:21
(+2) Method of creating the file is irrelevant. – gronostaj May 21, 2020 at 8:39
(+2) Let me repeat: Notepad will correctly open a UTF-8 file without BOM, even on Windows 7 SP1. So it's not assuming ANSI since at least 2011. Your (original) opening sentence is factually incorrect. Notepad on Windows 10 will also by default save as UTF-8 without BOM, so your (new) opening sentence is also incorrect. – gronostaj May 22, 2020 at 7:32
(+3) No, it won't. UTF-8 without BOM is the default for saving in Windows 10 v1909. I also don't see how the other answer is a "Unix answer". – gronostaj May 22, 2020 at 8:49
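The heuristic detection that the comments argue about can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Notepad or `IsTextUnicode` actually work; the function name `guess_text_encoding` and the `cp1252` fallback (a common Western "ANSI" code page) are assumptions made for the example.

```python
import codecs


def guess_text_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a codec name for `data` (hypothetical helper, simplified)."""
    # 1. An explicit UTF-8 signature settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. Otherwise, test whether the bytes form valid UTF-8. Text in a
    #    legacy single-byte code page that contains non-ASCII characters
    #    rarely happens to be well-formed UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: assume the legacy "ANSI" code page.
        return fallback
    return "utf-8"


print(guess_text_encoding(codecs.BOM_UTF8 + b"abc"))   # utf-8-sig
print(guess_text_encoding("café".encode("utf-8")))     # utf-8
print(guess_text_encoding("café".encode("cp1252")))    # cp1252
```

Pure ASCII input is reported as `utf-8` here, which is harmless because ASCII decodes identically under both codecs. The check can still be fooled by legacy text that happens to be valid UTF-8, which is the residual ambiguity gronostaj mentions.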