Encode String to UTF-8 - java - Stack Overflow
Asked 11 years, 5 months ago
Modified 6 months ago
Viewed 1.1m times
210
I have a String with a "ñ" character and I have some problems with it. I need to encode this String to UTF-8 encoding. I have tried it this way, but it doesn't work:
byte ptext[] = myString.getBytes();
String value = new String(ptext, "UTF-8");
How do I encode that string to utf-8?
java utf-8
edited Aug 8, 2016 at 17:34 by Eric Leschinski
asked Apr 20, 2011 at 11:55 by Alex
9
2
It's unclear what exactly you're trying to do. Does myString correctly contain the ñ character and you have problems converting it to a byte array (in that case see answers from Peter and Amir), or is myString corrupted and you're trying to fix it (in that case, see answers from Joachim and me)?
– Michael Borgwardt
Apr 20, 2011 at 12:13
I need to send myString to a server with utf-8 encoding and I need to convert the "ñ" character to utf-8 encoding.
– Alex
Apr 20, 2011 at 12:20
1
Well, if that server expects UTF-8 then what you need to send it are bytes, not a String. So as per Peter's answer, specify the encoding in the first line and drop the second line.
– Michael Borgwardt
Apr 20, 2011 at 12:32
@Michael: I agree that it isn’t clear what the real intent is here. There seem to be a lot of questions where people are trying to do explicit conversions between Strings and bytes rather than letting the {In,Out}putStream{Read,Writ}ers do it for them. I wonder why?
– tchrist
Apr 21, 2011 at 15:05
1
@Michael: Thanks, I suppose that makes sense. But it also makes it harder than it needs to be, doesn’t it? I am not very fond of languages that work that way, and so try to avoid working with them. I think Java’s model of Strings of characters instead of bytes makes things a whole lot easier. Perl and Python also share the “everything is Unicode strings” model. Yes, in all three you can still get at bytes if you work at it, but in practice it seems rare that you truly need to: that’s quite low-level. Plus it feels kinda like brushing a cat the wrong direction, if you know what I mean. :)
– tchrist
Apr 21, 2011 at 15:24
11 Answers
187
How about using
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(myString)
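A minimal sketch of how the ByteBuffer from this answer could be turned into a byte[] of exactly the encoded length. The class name ByteBufferEncodeDemo and the helper toUtf8 are hypothetical names chosen for illustration; only Charset.encode, ByteBuffer.remaining and ByteBuffer.get are standard API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferEncodeDemo {

    // Encodes text as UTF-8 and copies only the encoded bytes out of the buffer.
    static byte[] toUtf8(String text) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(text);
        byte[] bytes = new byte[buffer.remaining()]; // remaining() is the number of encoded bytes
        buffer.get(bytes);                           // copies from position up to limit
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8("\u00F1"); // U+00F1 is the letter n with tilde
        System.out.println(bytes.length); // 2, because UTF-8 encodes U+00F1 as C3 B1
    }
}
```

Copying via remaining() avoids relying on the size of the buffer's backing array; when only a byte[] is needed, myString.getBytes(StandardCharsets.UTF_8) gives the same bytes in one call.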
edited Aug 11, 2018 at 23:43 by leventov
answered Apr 20, 2011 at 11:57 by Amir Rachum
7
9
But how do I obtain an encoded String? It returns a ByteBuffer
– Alex
Apr 20, 2011 at 12:16
8
@Alex: it's not possible to have a UTF-8 encoded Java String. You want bytes, so either use the ByteBuffer directly (could even be the best solution if your goal is to send it via a network connection) or call array() on it to get a byte[]
– Michael Borgwardt
Apr 20, 2011 at 12:35
2
Something else that may be helpful is to use Guava's Charsets.UTF_8 enum instead of a String that may throw an UnsupportedEncodingException. String -> bytes: myString.getBytes(Charsets.UTF_8), and bytes -> String: new String(myByteArray, Charsets.UTF_8).
– laughing_man
Mar 12, 2014 at 3:24
25
Even better, use StandardCharsets.UTF_8. Available in Java 1.7+.
– Kat
Jul 29, 2014 at 23:25
1
The array returned by array() will most likely be bigger than needed and padded, as it is the ByteBuffer's internal array. Better to use string.getBytes(StandardCharsets.UTF_8) which will return a new array with the correct size.
– Chirlo
Mar 31, 2020 at 22:59
154
String objects in Java use the UTF-16 encoding that can't be modified*.
The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).
* As a matter of implementation, String can internally use an ISO-8859-1 encoded byte[] when the range of characters fits it, but that is an implementation-specific optimization that isn't visible to users of String (i.e. you'll never notice unless you dig into the source code or use reflection to dig into a String object).
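To make the "only a byte[] has an encoding" point concrete, here is a small sketch that encodes the same String with three charsets and prints the resulting bytes. The class EncodingLivesInBytesDemo and its helpers hex and encodeToHex are hypothetical names introduced for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingLivesInBytesDemo {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 B1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the bytes as hex.
    static String encodeToHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "\u00F1"; // one String, the letter n with tilde
        System.out.println(encodeToHex(text, StandardCharsets.UTF_8));      // C3 B1
        System.out.println(encodeToHex(text, StandardCharsets.ISO_8859_1)); // F1
        System.out.println(encodeToHex(text, StandardCharsets.UTF_16BE));   // 00 F1
    }
}
```

The String is identical in all three calls; only the byte arrays differ, which is why "a UTF-8 String" is not a meaningful thing to ask for.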
edited Mar 22 at 8:27
answered Apr 20, 2011 at 11:58 by Joachim Sauer
4
99
Technically speaking, byte[] doesn't have any encoding. Byte array PLUS encoding can give you string though.
– Peter Štibraný
Apr 20, 2011 at 14:34
1
@Peter: true. But attaching an encoding to it only makes sense for byte[], it doesn't make sense for String (unless the encoding is UTF-16, in which case it makes sense but it's still unnecessary information).
– Joachim Sauer
Apr 20, 2011 at 14:36
4
"String objects in Java use the UTF-16 encoding that can't be modified." Do you have an official source for this quote?
– Ahmad Hajjar
Oct 25, 2018 at 2:21
1
@AhmadHajjar docs.oracle.com/javase/10/docs/api/java/lang/…: "The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes."
– MaxiGis
Oct 4, 2019 at 14:43
88
In Java 7 you can use:
import static java.nio.charset.StandardCharsets.*;
byte[] ptext = myString.getBytes(ISO_8859_1);
String value = new String(ptext, UTF_8);
This has the advantage over getBytes(String) that it does not declare throws UnsupportedEncodingException.
If you're using an older Java version you can declare the charset constants yourself:
import java.nio.charset.Charset;
public class StandardCharsets {
    public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    public static final Charset UTF_8 = Charset.forName("UTF-8");
    //....
}
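A hedged sketch of what the ISO-8859-1 to UTF-8 round trip shown above does to two different inputs. It assumes one input is a String that was earlier mis-decoded (UTF-8 bytes read as ISO-8859-1) and the other is already correct; the class RoundTripDemo and the method latin1BytesAsUtf8 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {

    // The conversion under discussion: encode as ISO-8859-1, then decode those bytes as UTF-8.
    static String latin1BytesAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A String that was wrongly decoded earlier: the UTF-8 bytes C3 B1 read as ISO-8859-1.
        String misDecoded = "\u00C3\u00B1";
        // A String that already holds the intended character U+00F1.
        String correct = "\u00F1";

        // The round trip undoes the earlier mistake for the mis-decoded input.
        System.out.println(latin1BytesAsUtf8(misDecoded).equals("\u00F1")); // true

        // For the correct input, the single byte F1 is not valid UTF-8,
        // so decoding yields the replacement character U+FFFD.
        System.out.println(latin1BytesAsUtf8(correct).contains("\uFFFD")); // true
    }
}
```

This may explain why the thread reports both "works perfectly" and "changed to ?": the outcome depends on whether the String was already damaged before the conversion.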
edited Apr 3, 2017 at 17:29 by Eduardo Cuomo
answered Nov 27, 2013 at 12:52 by rzymek
4
2
This is the right answer. If someone wants to use a string datatype, he can use it in the right format. Rest of the answers are pointing to the byte formatted type.
– Neeraj Shukla
Feb 8, 2015 at 9:36
Works in 6. Thanks.
– Itsik Mauyhas
Sep 26, 2017 at 12:26
Correct answer for me too. One thing though, when I used as above, German character changed to ?. So, I used this: byte[] ptext = myString.getBytes(UTF_8); String value = new String(ptext, UTF_8); This worked fine.
– Farhan Hafeez
Feb 12, 2019 at 7:23
4
The code sample doesn't make sense. If you first convert to ISO-8859-1, then that array of byte is not UTF-8, so the next line is totally incorrect. It will work for ASCII strings, of course, but then you could as well make a simple copy: String value = new String(myString);.
– Alexis Wilke
Aug 16, 2019 at 3:09
77
Use byte[] ptext = myString.getBytes("UTF-8"); instead of getBytes(). getBytes() uses the so-called "default encoding", which may not be UTF-8.
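A short sketch contrasting the no-argument getBytes() with an explicit charset. It uses StandardCharsets.UTF_8 (Java 7+) rather than the "UTF-8" string form, which is a substitution on my part to avoid the checked UnsupportedEncodingException; ExplicitCharsetDemo and utf8Bytes are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {

    // Always encodes as UTF-8, regardless of the platform default charset.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00F1"; // the letter n with tilde

        // The default charset depends on the JVM version, OS and settings such as file.encoding.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("bytes with default charset: " + text.getBytes().length);

        // With an explicit charset the result is the same on every machine.
        System.out.println("bytes with UTF-8: " + utf8Bytes(text).length); // 2
    }
}
```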
answered Apr 20, 2011 at 11:57 by Peter Štibraný
4
9
@Michael: he is clearly having trouble getting bytes from string. How is getBytes(encoding) missing the point? I think second line is there just to check if he can convert it back.
– Peter Štibraný
Apr 20, 2011 at 12:01
1
I interpret it as having a broken String and trying to "fix" it by converting to bytes and back (common misunderstanding). There's no actual indication that the second line is just checking the result.
– Michael Borgwardt
Apr 20, 2011 at 12:04
@Michael, no there isn't, it's just my interpretation. Yours is simply different.
– Peter Štibraný
Apr 20, 2011 at 12:05
1
@Peter: you're right, we'd need clarification from Alex what he really means. Can't rescind the downvote though unless the answer is edited...
– Michael Borgwardt
Apr 20, 2011 at 12:07
33
A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.
So if you have an encoding problem, by the time you have a String, it's too late to fix. You need to fix the place where you create that String from a file, DB or network connection.
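A minimal sketch of "fixing the place where the String is created": the charset is chosen at the point where bytes become characters. A ByteArrayInputStream stands in for the file, DB or network source, and DecodeAtBoundaryDemo and readFirstLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeAtBoundaryDemo {

    // Reads the first line from a byte stream, decoding it with the given charset.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+00F1, standing in for data from a file or socket.
        byte[] wire = {(byte) 0xC3, (byte) 0xB1};

        // Decoding with the charset the data was written in gives the intended character.
        String right = readFirstLine(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(right.equals("\u00F1")); // true

        // Decoding with the wrong charset produces two unrelated characters instead.
        String wrong = readFirstLine(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("\u00C3\u00B1")); // true
    }
}
```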
answered Apr 20, 2011 at 11:58 by Michael Borgwardt
6
1
It's a common mistake to believe that strings are internally encoded as UTF-16. Usually they are, but if so, it is only an implementation-specific detail of the String class. Since the internal storage of the character data is not accessible through the public API, a specific String implementation may decide to use any other encoding.
– jarnbjo
Apr 20, 2011 at 12:45
4
@jarnbjo: The API explicitly states "A String represents a string in the UTF-16 format". Using anything else as internal format would be highly inefficient, and all actual implementations I know do use UTF-16 internally. So unless you can cite one that doesn't, you're engaging in pretty absurd hairsplitting.
– Michael Borgwardt
Apr 20, 2011 at 13:30
Is it absurd to distinguish between public access and internal representation of data structures?
– jarnbjo
Apr 20, 2011 at 15:01
6
The JVM (as far as it is relevant to the VM at all) uses UTF-8 for string encoding, e.g. in the class files. The implementation of java.lang.String is decoupled from the JVM and I could easily implement the class for you using any other encoding for the internal representation if that is really necessary for you to realize that your answer is incorrect. Using UTF-16 as the internal format is in most cases highly inefficient as well when it comes to memory consumption and I don't see why e.g. Java implementations for embedded hardware wouldn't optimize for memory instead of performance.
– jarnbjo
Apr 20, 2011 at 16:19
1
@jarnbjo: And once more: as long as you cannot give a concrete example of a JVM whose standard API implementation does internally use something other than UTF-16 to implement Strings, my statement is correct. And no, the String class is not really decoupled from the JVM, due to things like intern() and the constant pool.
– Michael Borgwardt
Apr 20, 2011 at 18:25
25
You can try this way.
byte ptext[] = myString.getBytes("ISO-8859-1");
String value = new String(ptext, "UTF-8");
edited Apr 20, 2011 at 16:56 by bstpierre
answered Apr 20, 2011 at 12:24 by user716840
3
1
I was going crazy. Thank you, getting the bytes in "ISO-8859-1" first was the solution.
– jhfdr3s
Jun 19, 2018 at 21:22
3
This is wrong. If your string includes Unicode characters, converting it to 8859-1 is going to throw an exception or worse give you an invalid string (maybe the string without those characters with code point 0x100 and over).
– Alexis Wilke
Aug 16, 2019 at 3:22
works perfectly
– eng.ahmed
Dec 5, 2021 at 22:15
16
I ran into this problem at one point and managed to solve it in the following way.
First I needed to import
import java.nio.charset.Charset;
Then I had to declare constants for UTF-8 and ISO-8859-1
private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final Charset ISO = Charset.forName("ISO-8859-1");
Then I could use them in the following way:
String textwithaccent = "Thís ís a text with accent";
String textwithletter = "Ñandú";
String text1 = new String(textwithaccent.getBytes(ISO), UTF_8);
String text2 = new String(textwithletter.getBytes(ISO), UTF_8);
edited Apr 9, 2018 at 2:41
answered Apr 8, 2018 at 22:16 by Quimbo
1
1
perfect solution.
– TundePizzle
Aug 1, 2018 at 8:35
9
String value = new String(myString.getBytes("UTF-8"));
and, if you want to read from a text file encoded in "ISO-8859-1":
String line;
String f = "C:\\MyPath\\MyFile.txt";
try {
    BufferedReader br = Files.newBufferedReader(Paths.get(f), Charset.forName("ISO-8859-1"));
    while ((line = br.readLine()) != null) {
        System.out.println(new String(line.getBytes("UTF-8")));
    }
} catch (IOException ex) {
    //...
}
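Note that new String(bytes) without a charset decodes with the platform default, so the snippet above only round-trips cleanly where that default is UTF-8. As an alternative sketch, assuming the goal is to turn an ISO-8859-1 file into a UTF-8 file, the charset can be named on both the reader and the writer. TranscodeFileDemo, latin1ToUtf8 and transcodeViaTempFiles are hypothetical names, and temp files stand in for real paths.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TranscodeFileDemo {

    // Copies a text file line by line, decoding ISO-8859-1 and encoding UTF-8.
    static void latin1ToUtf8(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Demo helper (an assumption for illustration): round-trips bytes through two temp files.
    static byte[] transcodeViaTempFiles(byte[] latin1Bytes) {
        try {
            Path source = Files.createTempFile("latin1-", ".txt");
            Path target = Files.createTempFile("utf8-", ".txt");
            try {
                Files.write(source, latin1Bytes);
                latin1ToUtf8(source, target);
                return Files.readAllBytes(target);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // F1 is the ISO-8859-1 byte for U+00F1.
        byte[] utf8 = transcodeViaTempFiles(new byte[] {(byte) 0xF1});
        // The first two bytes of the output are the UTF-8 form of U+00F1.
        System.out.printf("%02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF); // C3 B1
    }
}
```

The String passed between reader and writer carries no encoding of its own; the conversion happens entirely at the two I/O boundaries.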
answered Feb 19, 2015 at 19:34 by fedesanp
3
I have used the code below to encode the special character by specifying the encoding.
String text = "This is an example é";
byte[] byteText = text.getBytes(Charset.forName("UTF-8"));
// To get the original string back from the bytes.
String originalString = new String(byteText, "UTF-8");
answered May 4, 2016 at 7:49 by laxman954
2
A quick step-by-step guide to configuring UTF-8 as the NetBeans default encoding. As a result, NetBeans will create all new files in UTF-8 encoding.
NetBeans default encoding UTF-8 step-by-step guide
1. Go to the etc folder in the NetBeans installation directory
2. Edit the netbeans.conf file
3. Find the netbeans_default_options line
4. Add -J-Dfile.encoding=UTF-8 inside the quotation marks on that line
(example: netbeans_default_options="-J-Dfile.encoding=UTF-8")
5. Restart NetBeans
The NetBeans default encoding is now set to UTF-8.
Your netbeans_default_options may contain additional parameters inside the quotation marks. In that case, add -J-Dfile.encoding=UTF-8 at the end of the string, separated from the other parameters by a space.
Example:
netbeans_default_options="-J-client -J-Xss128m -J-Xms256m -J-XX:PermSize=32m -J-Dapple.laf.useScreenMenuBar=true -J-Dapple.awt.graphics.UseQuartz=true -J-Dsun.java2d.noddraw=true -J-Dsun.java2d.dpiaware=true -J-Dsun.zip.disableMemoryMapping=true -J-Dfile.encoding=UTF-8"
Here is a link for further details
edited Jun 20, 2020 at 9:12 by Community Bot
answered Oct 9, 2019 at 6:36 by Laeeq Khan Niazi
0
This solved my problem
String inputText = "some text with escaped chars";
InputStream is = new ByteArrayInputStream(inputText.getBytes("UTF-8"));
answered Dec 9, 2014 at 7:48 by PrasanthRJ