Encode String to UTF-8 - java - Stack Overflow


Asked 11 years, 5 months ago. Modified 6 months ago. Viewed 1.1m times.

Question (score 210)

I have a String with a "ñ" character and I have some problems with it. I need to encode this String to UTF-8 encoding. I have tried it this way, but it doesn't work:

    byte ptext[] = myString.getBytes();
    String value = new String(ptext, "UTF-8");

How do I encode that string to UTF-8?

Tags: java, utf-8

edited Aug 8, 2016 at 17:34 by Eric Leschinski; asked Apr 20, 2011 at 11:55 by Alex

Comments:

It's unclear what exactly you're trying to do. Does myString correctly contain the ñ character and you have problems converting it to a byte array (in that case see answers from Peter and Amir), or is myString corrupted and you're trying to fix it (in that case, see answers from Joachim and me)? – Michael Borgwardt Apr 20, 2011 at 12:13

I need to send myString to a server with utf-8 encoding and I need to convert the "ñ" character to utf-8 encoding. – Alex Apr 20, 2011 at 12:20

Well, if that server expects UTF-8 then what you need to send it are bytes, not a String. So as per Peter's answer, specify the encoding in the first line and drop the second line. – Michael Borgwardt Apr 20, 2011 at 12:32

@Michael: I agree that it isn't clear what the real intent is here. There seem to be a lot of questions where people are trying to do explicit conversions between Strings and bytes rather than letting the {In,Out}putStream{Read,Writ}ers do it for them. I wonder why?
– tchrist Apr 21, 2011 at 15:05

@Michael: Thanks, I suppose that makes sense. But it also makes it harder than it needs to be, doesn't it? I am not very fond of languages that work that way, and so try to avoid working with them. I think Java's model of Strings of characters instead of bytes makes things a whole lot easier. Perl and Python also share the "everything is Unicode strings" model. Yes, in all three you can still get at bytes if you work at it, but in practice it seems rare that you truly need to: that's quite low-level. Plus it feels kinda like brushing a cat the wrong direction, if you know what I mean. :) – tchrist Apr 21, 2011 at 15:24

11 Answers

Answer (score 187)

How about using

    ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(myString)

edited Aug 11, 2018 at 23:43 by leventov; answered Apr 20, 2011 at 11:57 by Amir Rachum

Comments:

But how do I obtain an encoded String? It returns a ByteBuffer. – Alex Apr 20, 2011 at 12:16

@Alex: it's not possible to have a UTF-8 encoded Java String. You want bytes, so either use the ByteBuffer directly (could even be the best solution if your goal is to send it via a network connection) or call array() on it to get a byte[]. – Michael Borgwardt Apr 20, 2011 at 12:35

Something else that may be helpful is to use Guava's Charsets.UTF_8 enum instead of a String that may throw an UnsupportedEncodingException. String -> bytes: myString.getBytes(Charsets.UTF_8), and bytes -> String: new String(myByteArray, Charsets.UTF_8). – laughing_man Mar 12, 2014 at 3:24

Even better, use StandardCharsets.UTF_8. Available in Java 1.7+. – Kat Jul 29, 2014 at 23:25

The array returned by array() will most likely be bigger than needed and padded, as it is the ByteBuffer's internal array. Better to use string.getBytes(StandardCharsets.UTF_8), which will return a new array with the correct size. – Chirlo Mar 31, 2020 at 22:59

Answer (score 154)

String objects in Java use the UTF-16 encoding that can't be modified*.
The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).

* As a matter of implementation, String can internally use an ISO-8859-1 encoded byte[] when the range of characters fits it, but that is an implementation-specific optimization that isn't visible to users of String (i.e. you'll never notice unless you dig into the source code or use reflection to dig into a String object).

edited Mar 22 at 8:27; answered Apr 20, 2011 at 11:58 by Joachim Sauer

Comments:

Technically speaking, byte[] doesn't have any encoding. Byte array PLUS encoding can give you a string though. – Peter Štibraný Apr 20, 2011 at 14:34

@Peter: true. But attaching an encoding to it only makes sense for byte[], it doesn't make sense for String (unless the encoding is UTF-16, in which case it makes sense but it is still unnecessary information). – Joachim Sauer Apr 20, 2011 at 14:36

"String objects in Java use the UTF-16 encoding that can't be modified." Do you have an official source for this quote? – Ahmad Hajjar Oct 25, 2018 at 2:21

@AhmadHajjar docs.oracle.com/javase/10/docs/api/java/lang/…: "The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes." – MaxiGis Oct 4, 2019 at 14:43

Answer (score 88)

In Java 7 you can use:

    import static java.nio.charset.StandardCharsets.*;

    byte[] ptext = myString.getBytes(ISO_8859_1);
    String value = new String(ptext, UTF_8);

This has the advantage over getBytes(String) that it does not declare throws UnsupportedEncodingException.

If you're using an older Java version you can declare the charset constants yourself:

    import java.nio.charset.Charset;

    public class StandardCharsets {
        public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
        public static final Charset UTF_8 = Charset.forName("UTF-8");
        //....
    }

edited Apr 3, 2017 at 17:29 by Eduardo Cuomo; answered Nov 27, 2013 at 12:52 by rzymek

Comments:

This is the right answer. If someone wants to use a string data type, he can use it in the right format. Rest of the answers are pointing to the byte formatted type. – Neeraj Shukla Feb 8, 2015 at 9:36

Works in 6. Thanks. – Itsik Mauyhas Sep 26, 2017 at 12:26

Correct answer for me too. One thing though, when I used as above, German character changed to ?. So, I used this: byte[] ptext = myString.getBytes(UTF_8); String value = new String(ptext, UTF_8); This worked fine. – Farhan Hafeez Feb 12, 2019 at 7:23

The code sample doesn't make sense. If you first convert to ISO-8859-1, then that array of byte is not UTF-8, so the next line is totally incorrect. It will work for ASCII strings, of course, but then you could as well make a simple copy: String value = new String(myString);. – Alexis Wilke Aug 16, 2019 at 3:09

Answer (score 77)

Use byte[] ptext = String.getBytes("UTF-8"); instead of getBytes(). getBytes() uses so-called "default encoding", which may not be UTF-8.

answered Apr 20, 2011 at 11:57 by Peter Štibraný

Comments:

@Michael: he is clearly having trouble getting bytes from string. How is getBytes(encoding) missing the point? I think second line is there just to check if he can convert it back. – Peter Štibraný Apr 20, 2011 at 12:01

I interpret it as having a broken String and trying to "fix" it by converting to bytes and back (common misunderstanding). There's no actual indication that the second line is just checking the result. – Michael Borgwardt Apr 20, 2011 at 12:04

@Michael, no there isn't, it's just my interpretation. Yours is simply different. – Peter Štibraný Apr 20, 2011 at 12:05

@Peter: you're right, we'd need clarification from Alex what he really means. Can't rescind the downvote though unless the answer is edited... – Michael Borgwardt Apr 20, 2011 at 12:07

Answer (score 33)

A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.
So if you have an encoding problem, by the time you have a String, it's too late to fix. You need to fix the place where you create that String from a file, DB or network connection.

answered Apr 20, 2011 at 11:58 by Michael Borgwardt

Comments:

It's a common mistake to believe that strings are internally encoded as UTF-16. Usually they are, but if so, it is only an implementation-specific detail of the String class. Since the internal storage of the character data is not accessible through the public API, a specific String implementation may decide to use any other encoding. – jarnbjo Apr 20, 2011 at 12:45

@jarnbjo: The API explicitly states "A String represents a string in the UTF-16 format". Using anything else as internal format would be highly inefficient, and all actual implementations I know do use UTF-16 internally. So unless you can cite one that doesn't, you're engaging in pretty absurd hair splitting. – Michael Borgwardt Apr 20, 2011 at 13:30

Is it absurd to distinguish between public access and internal representation of data structures? – jarnbjo Apr 20, 2011 at 15:01

The JVM (as far as it is relevant to the VM at all) uses UTF-8 for string encoding, e.g. in the class files. The implementation of java.lang.String is decoupled from the JVM and I could easily implement the class for you using any other encoding for the internal representation if that is really necessary for you to realize that your answer is incorrect. Using UTF-16 as the internal format is in most cases highly inefficient as well when it comes to memory consumption and I don't see why e.g. Java implementations for embedded hardware wouldn't optimize for memory instead of performance. – jarnbjo Apr 20, 2011 at 16:19

@jarnbjo: And once more: as long as you cannot give a concrete example of a JVM whose standard API implementation does internally use something other than UTF-16 to implement Strings, my statement is correct. And no, the String class is not really decoupled from the JVM, due to things like intern() and the constant pool. – Michael Borgwardt Apr 20, 2011 at 18:25

Answer (score 25)

You can try this way.
    byte ptext[] = myString.getBytes("ISO-8859-1");
    String value = new String(ptext, "UTF-8");

edited Apr 20, 2011 at 16:56 by bstpierre; answered Apr 20, 2011 at 12:24 by user716840

Comments:

I was going crazy. Thank you, getting the bytes in "ISO-8859-1" first was the solution. – jhfdr3s Jun 19, 2018 at 21:22

This is wrong. If your string includes Unicode characters, converting it to 8859-1 is going to throw an exception or worse give you an invalid string (maybe the string without those characters with code point 0x100 and over). – Alexis Wilke Aug 16, 2019 at 3:22

works perfectly – eng.ahmed Dec 5, 2021 at 22:15

Answer (score 16)

I ran into this problem at one point and managed to solve it in the following way.

First I needed to import

    import java.nio.charset.Charset;

Then I had to declare constants for UTF-8 and ISO-8859-1

    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset ISO = Charset.forName("ISO-8859-1");

Then I could use them in the following way:

    String textwithaccent = "Thís ís a text with accent";
    String textwithletter = "Ñandú";
    String text1 = new String(textwithaccent.getBytes(ISO), UTF_8);
    String text2 = new String(textwithletter.getBytes(ISO), UTF_8);

edited Apr 9, 2018 at 2:41; answered Apr 8, 2018 at 22:16 by Quimbo

Comments:

perfect solution. – TundePizzle Aug 1, 2018 at 8:35

Answer (score 9)

    String value = new String(myString.getBytes("UTF-8"));

and, if you want to read from a text file encoded as "ISO-8859-1":

    String line;
    String f = "C:\\MyPath\\MyFile.txt";
    try {
        BufferedReader br = Files.newBufferedReader(Paths.get(f), Charset.forName("ISO-8859-1"));
        while ((line = br.readLine()) != null) {
            System.out.println(new String(line.getBytes("UTF-8")));
        }
    } catch (IOException ex) {
        //...
    }

answered Feb 19, 2015 at 19:34 by fedesanp

Answer (score 3)

I have used the code below to encode the special character by specifying the encoding format.
    String text = "This is an example é";
    byte[] byteText = text.getBytes(Charset.forName("UTF-8"));
    // To get original string from byte.
    String originalString = new String(byteText, "UTF-8");

answered May 4, 2016 at 7:49 by laxman954

Answer (score 2)

A quick step-by-step guide on how to configure the NetBeans default encoding as UTF-8. As a result NetBeans will create all new files in UTF-8 encoding.

NetBeans default encoding UTF-8 step-by-step guide

Go to the etc folder in the NetBeans installation directory
Edit the netbeans.conf file
Find the netbeans_default_options line
Add -J-Dfile.encoding=UTF-8 inside the quotation marks on that line
(example: netbeans_default_options="-J-Dfile.encoding=UTF-8")
Restart NetBeans

You have now set the NetBeans default encoding to UTF-8.

Your netbeans_default_options may contain additional parameters inside the quotation marks. In that case, add -J-Dfile.encoding=UTF-8 at the end of the string. Separate it with a space from the other parameters.

Example:

    netbeans_default_options="-J-client -J-Xss128m -J-Xms256m -J-XX:PermSize=32m -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dfile.encoding=UTF-8"

edited Jun 20, 2020 at 9:12 by Community Bot; answered Oct 9, 2019 at 6:36 by Laeeq Khan Niazi

Answer (score 0)

This solved my problem

    String inputText = "some text with escaped chars";
    InputStream is = new ByteArrayInputStream(inputText.getBytes("UTF-8"));

answered Dec 9, 2014 at 7:48 by PrasanthRJ
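The disagreement in the answers above (encode with UTF-8 directly versus going through ISO-8859-1 first) can be checked with a small program. What follows is only a sketch: the class name `Utf8Demo` and its helper methods are hypothetical and do not come from any answer, and it assumes Java 7 or later for `StandardCharsets`. It encodes "ñ" to UTF-8 bytes, decodes those bytes back, and then tries the ISO-8859-1 to UTF-8 round trip that several comments call incorrect.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; all names here are illustrative only.
class Utf8Demo {

    // String -> UTF-8 bytes (the "encode" step the question asks about).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes -> String (the "decode" step).
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // The ISO-8859-1 -> UTF-8 round trip suggested in some answers:
    // encode with one charset, then decode the same bytes with another.
    static String isoThenUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated uppercase hex, e.g. "C3 B1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ñ" written as a Unicode escape so the source file's own
        // encoding cannot affect the result.
        String text = "\u00F1";

        System.out.println(hex(encode(text)));                  // C3 B1
        System.out.println(decode(encode(text)).equals(text));  // true
        System.out.println(isoThenUtf8(text).equals(text));     // false
    }
}
```

The last line prints false because the single ISO-8859-1 byte for "ñ" (0xF1) is not a valid UTF-8 sequence on its own, so decoding it as UTF-8 yields a replacement character instead of "ñ". For pure ASCII input the round trip appears to work, which may be why some commenters report success with it.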
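Two comments in the thread suggest alternatives to calling getBytes() by hand: letting an OutputStreamWriter do the encoding when sending text somewhere, and being careful with ByteBuffer.array() because the backing array can be larger than the encoded data. The sketch below illustrates both under stated assumptions: the class `Utf8WriterDemo` and its methods are hypothetical, a ByteArrayOutputStream stands in for the real socket or file stream, and Java 8 or later is assumed for `UncheckedIOException`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; all names here are illustrative only.
class Utf8WriterDemo {

    // Let the Writer perform the String -> UTF-8 conversion.
    // The ByteArrayOutputStream is a stand-in for a network or file stream.
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed everything into the sink.
        return sink.toByteArray();
    }

    // Copy only the bytes between position and limit, instead of
    // returning the buffer's backing array, which may be larger.
    static byte[] exactBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "\u00F1"; // "ñ"

        System.out.println(writeAsUtf8(text).length); // 2
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(text);
        System.out.println(exactBytes(encoded).length); // 2
    }
}
```

In real code the writer would wrap the connection's output stream directly, so no intermediate byte[] is needed at all; the byte array is returned here only to make the result easy to inspect.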


