How to add a UTF-8 BOM in Java? - Stack Overflow
Asked 11 years, 10 months ago. Modified 28 days ago. Viewed 86k times.

Question (score 28)

I have a Java stored procedure which fetches records from the table using a ResultSet object and creates a CSV file.

    BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
    retBLOB.open(BLOB.MODE_READWRITE);
    OutputStream bOut = retBLOB.setBinaryStream(0L);
    ZipOutputStream zipOut = new ZipOutputStream(bOut);
    PrintStream out = new PrintStream(zipOut, false, "UTF-8");
    out.write('\ufeff');
    out.flush();
    zipOut.putNextEntry(new ZipEntry("filename.csv"));
    while (rs.next()) {
        out.print("\"" + rs.getString(i) + "\"");
        out.print(",");
    }
    out.flush();
    zipOut.closeEntry();
    zipOut.close();
    retBLOB.close();
    return retBLOB;

But the generated CSV file doesn't show the correct German characters. The Oracle database also has an NLS_CHARACTERSET value of UTF8.

Please suggest.

Tags: java, character-encoding, oracle10g, byte-order-mark

Asked Dec 8, 2010 at 15:10 by Fadd; edited Mar 28, 2020 at 19:41 by informatik01.

Comments (8)

(+2) Just in case you haven't come across this before, note that the Unicode standard does not require or recommend using a BOM with UTF-8. It isn't illegal, either, but shouldn't be used indiscriminately. See here for the details, including some guidelines on when and where to use it. If you are trying to view the csv file in Windows, this is probably a valid use of the BOM.
– Marcelo Cantos, Dec 8, 2010 at 15:16

Yes, we are trying to view the CSV in Windows, but the generated CSV still shows garbled characters for German characters. Is this the right way to set the BOM?
– Fadd, Dec 8, 2010 at 15:20

Yes, that’s right. The Unicode standard recommends against using a so-called BOM (it isn’t really) with UTF-8.
– tchrist, Dec 8, 2010 at 17:05

(+4) @tchrist: it recommends against using a BOM when dealing with software and protocols that expect ASCII-only chars. If the OP knows that the Windows software he's using will use the BOM to detect that the file is actually encoded in UTF-8 (we don't care about the fact that it ain't a BOM, we care about the fact that it can allow some software to detect that the encoding is UTF-8). Also note that if you add a BOM to UTF-8 and some software fails, then that software is broken, because a BOM at the beginning of a UTF-8 file is perfectly valid.
– SyntaxT3rr0r, Dec 8, 2010 at 17:20

(+4) For the completeness of the BOM discussion: Excel 2003 strictly requires the BOM in UTF-8 encoded CSV files. Otherwise multibyte chars are unreadable.
– Michael-O, Jan 9, 2012 at 14:00

8 Answers

Answer (score 78)

    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
    out.write('\ufeff');
    out.write(...);

This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.
Answered Nov 14, 2011 at 11:18 by astro; edited Jul 5, 2017 at 12:39 by Julien H. - SonarSource Team.

Comments

(+4) This code is sensitive to default platform encoding. On Windows, I ended up with 0x3F written to the file. The correct way to get the BufferedWriter is: BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(theFile), StandardCharsets.UTF_8))
– Julien H. - SonarSource Team, Jul 5, 2017 at 12:22

Answer (score 17)

Just in case people are using PrintStreams, you need to do it a little differently. While a Writer will do some magic to convert a single character into 3 bytes, a PrintStream requires all 3 bytes of the UTF-8 BOM individually:

    // Print UTF-8 BOM; write(int) keeps only the low 8 bits of each value
    PrintStream out = System.out;
    out.write('\ufeef'); // emits 0xef
    out.write('\ufebb'); // emits 0xbb
    out.write('\ufebf'); // emits 0xbf

Alternatively, you can use the hex values for those directly:

    PrintStream out = System.out;
    out.write(0xef); // emits 0xef
    out.write(0xbb); // emits 0xbb
    out.write(0xbf); // emits 0xbf

Answered Mar 30, 2016 at 14:29 by Christopher Schultz.

Answer (score 12)

To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write().

Also, if you want to have a BOM in your CSV file, I guess you need to print the BOM after putNextEntry().

Answered Dec 8, 2010 at 15:41 by axtavt.

Comments

(+3) Aren’t all PrintStreams fundamentally flawed because they discard all errors that may occur on the stream, including I/O errors, full filesystems, network interruptions, and encoding mismatches? If this is not true, could you please tell me how to make them reliable (because I want to use them)? But if it is true, could you please explain when it could ever be appropriate to use an output method that suppresses correctness concerns? This is a serious question, because I don’t understand why this was set up to be so dangerous. Thanks for any insights.
– tchrist, Dec 8, 2010 at 17:09

@tchrist - it is true that PrintStreams suppress errors. However... 1) they are not entirely discarded - you can check to see if an error has occurred. 2) There are cases where you don't need to know about errors. An indisputable case is when you are sending characters to a stream that is writing to an in-memory buffer.
– Stephen C, Jan 15, 2013 at 22:46

@tchrist I guess this is all caused by using checked exceptions. Normally, you'd just throw on any error and be happy. You could make an existing PrintStream "safe" by wrapping each call and adding checkError and conditionally throwing. But the information about the exception is lost. So yes, PrintStream is a hopeless crap.
– maaartinus, Jul 16, 2014 at 10:15

Answer (score 11)

PrintStream#print

I think that out.write('\ufeff'); should actually be out.print('\ufeff');, calling the java.io.PrintStream#print method.

According to the javadoc, the write(int) method actually writes a byte ... without any character encoding. So out.write('\ufeff'); writes the byte 0xff. By contrast, the print(char) method encodes the character as one or more bytes using the stream's encoding, and then writes those bytes.

As noted in section 23.8 of the Unicode 9 specification, the BOM for UTF-8 is EF BB BF. That sequence is what you get when using UTF-8 encoding on '\ufeff'. See: Why UTF-8 BOM bytes efbbbf can be replaced by \ufeff?

Answered Dec 8, 2010 at 15:42 by Stephen C; edited Jul 28, 2021 at 22:44 by Basil Bourque.

Comments

(+2) Isn’t the only safe way to do encoded output in Java to use the rarely-seen OutputStreamWriter(OutputStream out, CharsetEncoder enc) form of the constructor, the only one of the four with an explicit CharsetEncoder argument, and never using the PrintStream that you’ve recommended here?
– tchrist, Dec 8, 2010 at 17:13

(+1) @tchrist - 1) No. 2) I didn't recommend PrintStream. I simply said how to do what the OP asked to do using the PrintStream he was already using. 3) In this case PrintStream should be safe because it is followed by other actions that will cause writes to the underlying stream (socket) and throw an exception if the previous PrintStream writes had silently failed.
– Stephen C, Jan 15, 2013 at 22:54

Answer (score 7)

Add this at the start of the CSV string:

    String CSV = "";
    byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    // Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets);
    // new String(BOM) alone would use the platform default charset.
    CSV = new String(BOM, StandardCharsets.UTF_8) + CSV;

This works for me.

Answered Jul 15, 2020 at 15:48 by Silent; edited Feb 11, 2021 at 14:20.

Answer (score 2)

If you just want to modify the same file (without creating a new file and deleting the old one, as I had issues with that):

    private void addBOM(File fileInput) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileInput, "rws")) {
            // Read the whole file, then rewrite it with the BOM in front
            byte[] text = new byte[(int) file.length()];
            file.readFully(text);
            file.seek(0);
            byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            file.write(bom);
            file.write(text);
        }
    }

Answered Jun 24, 2021 at 14:03 by timguy; edited Sep 13 at 11:21.

Answer (score 0)

In my case it works with the code:

    PrintWriter out = new PrintWriter(new File(filePath), "UTF-8");
    out.write(csvContent);
    out.flush();
    out.close();

Answered Dec 19, 2013 at 9:01 by Rocio.

Answer (score 0)

Here is a simple way to append a BOM header on any file:

    private static void appendBOM(File file) throws Exception {
        File bomFile = new File(file + ".bom");
        try (FileOutputStream output = new FileOutputStream(bomFile, true)) {
            // FileUtils comes from Apache Commons IO
            byte[] bytes = FileUtils.readFileToByteArray(file);
            output.write('\ufeef'); // emits 0xef
            output.write('\ufebb'); // emits 0xbb
            output.write('\ufebf'); // emits 0xbf
            output.write(bytes);
            output.flush();
        }
        file.delete();
        bomFile.renameTo(file);
    }

Answered Dec 22, 2020 at 15:24 by David.
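The main points from the answers in this thread are that write(int) emits one raw byte, that print(char) and a Writer encode through the charset, and that the BOM belongs after putNextEntry(). They can be checked with a small self-contained program. This is only a sketch under assumptions: the class name BomDemo and its helper methods are hypothetical, and an in-memory ByteArrayOutputStream stands in for the Oracle BLOB stream from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; an in-memory buffer stands in for the BLOB stream.
class BomDemo {

    private static PrintStream utf8PrintStream(ByteArrayOutputStream buf) {
        try {
            return new PrintStream(buf, false, "UTF-8");
        } catch (IOException e) { // UnsupportedEncodingException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    /** What the question's code does: write(int) keeps only the low byte of U+FEFF. */
    static byte[] viaWrite() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = utf8PrintStream(buf);
        out.write('\ufeff');
        out.flush();
        return buf.toByteArray();
    }

    /** print(char) runs the character through the stream's UTF-8 encoder. */
    static byte[] viaPrint() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = utf8PrintStream(buf);
        out.print('\ufeff');
        out.flush();
        return buf.toByteArray();
    }

    /** Builds a zip whose single entry starts with the BOM, written after putNextEntry(). */
    static byte[] zippedCsvWithBom(String entryName, String csv) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry(entryName));
            Writer writer = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
            writer.write('\ufeff'); // encoded by the Writer as EF BB BF
            writer.write(csv);
            writer.flush();         // push the data into the entry without closing the zip
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads back the raw bytes of the first entry of a zip archive. */
    static byte[] firstEntryBytes(byte[] zipBytes) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zin.getNextEntry() == null) {
                throw new IllegalStateException("zip archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zin.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("write(int):  " + hex(viaWrite())); // FF
        System.out.println("print(char): " + hex(viaPrint())); // EF BB BF
        byte[] entry = firstEntryBytes(zippedCsvWithBom("filename.csv", "\"Gr\u00fc\u00dfe\",\n"));
        System.out.println("zip entry:   " + hex(entry));
    }
}
```

In the question's code the BOM is written before putNextEntry(), so even with print() it would not land inside filename.csv. The sketch writes it as the first character of the entry instead, and uses a Writer rather than a PrintStream so that I/O errors surface as exceptions.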
Further reading
- 1. Java - How to add and remove BOM from UTF-8 file
1. Add BOM to a UTF-8 file ... To Add BOM to a UTF-8 file, we can directly write Unicode \ufeff o...
- 2. Reading and writing UTF-8 text with a BOM in Java - Web Design Tutorials
The BOM (byte-order mark) is a special marker inserted at the beginning of a Unicode file encoded as UTF-8, UTF-16 or UTF-32, used to identify the file's encoding.
- 3. How to add a UTF-8 BOM in Java? - Stack Overflow
As noted in section 23.8 of the Unicode 9 specification, the BOM for UTF-8 is EF BB BF . That seq...
- 4. Handling the BOM header of UTF-8 files in Java - 51CTO Blog
Handling the BOM header of UTF-8 files in Java. BOM stands for Byte Order Mark. Basic concepts: in the UCS encoding there is a character called "ZERO WIDTH NO-BREAK SPA...
- 5. Handle UTF8 file with BOM - Real's Java How-to
UTF8 file are a special case because it is not recommended to add a BOM to them. The presence of ...