Difference between UTF-8, UTF-16 and UTF-32 Character ...
The main difference between the UTF-8, UTF-16, and UTF-32 character encodings is how many bytes each needs to represent a character. UTF-8 uses a minimum of one byte, while UTF-16 uses a minimum of two bytes. If a character's code point is greater than 127, the largest ASCII value, UTF-8 takes 2, 3, or 4 bytes, whereas UTF-16 takes either two or four bytes. UTF-32, on the other hand, is a fixed-width encoding scheme and always uses 4 bytes to encode a Unicode code point.

Now, let's start with what character encoding is and why it is important. Character encoding is what makes it possible to convert a stream of bytes into characters that can be displayed.

Two things are needed to convert bytes to characters: a character set and an encoding. Since there are so many characters and symbols in the world, a character set is required to cover all of them. A character set is nothing but a list of characters in which each symbol or character is mapped to a numeric value, known as its code point.

UTF-16, UTF-32, and UTF-8, on the other hand, are encoding schemes, which describe how these values (code points) are mapped to bytes using units of different sizes: 16 bits for UTF-16, 32 bits for UTF-32, and 8 bits for UTF-8. UTF stands for Unicode Transformation Format, which defines an algorithm to map every Unicode code point to a unique byte sequence.
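As a rough illustration of how a single code point maps to different byte sequences, the following Java sketch encodes a few sample code points with each scheme and prints the resulting bytes in hex. The class name EncodingDemo, the helper hexBytes, and the sample code points are illustrative choices, not part of the original post. The sketch also assumes the JVM provides a UTF-32BE charset, which OpenJDK does, although it is not one of the charsets every Java platform is required to support. The big-endian variants are used so that no byte order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: class and method names are hypothetical, not from the article.
final class EncodingDemo {

    // Encodes a single Unicode code point with the given charset
    // and returns the encoded bytes as an uppercase hex string.
    static String hexBytes(int codePoint, Charset charset) {
        String text = new String(Character.toChars(codePoint));
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: UTF-32BE is available on this JVM (it is not in StandardCharsets).
        Charset utf32 = Charset.forName("UTF-32BE");

        // Sample code points: 'A', 'é', '€', and an emoji outside the Basic Multilingual Plane.
        int[] codePoints = {0x0041, 0x00E9, 0x20AC, 0x1F600};

        for (int cp : codePoints) {
            System.out.printf("U+%04X  UTF-8=%s  UTF-16BE=%s  UTF-32BE=%s%n",
                    cp,
                    hexBytes(cp, StandardCharsets.UTF_8),
                    hexBytes(cp, StandardCharsets.UTF_16BE),
                    hexBytes(cp, utf32));
        }
    }
}
```

Run on such a JVM, U+0041 should come out as 41, 0041, and 00000041 (1, 2, and 4 bytes), while U+1F600 should take 4 bytes in all three encodings: F09F9880, D83DDE00, and 0001F600.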
For example, for the character A (Latin capital letter A), the Unicode code point is U+0041, the UTF-8 encoding is the single byte 41, the UTF-16 encoding is 0041, and the Java char literal is '\u0041'. In short, you need to know the character encoding to interpret a stream of bytes; without it, you cannot display the characters correctly. The Java programming language has extensive support for different charsets and character encodings. Strings are held in memory as UTF-16, while conversions between bytes and characters use the platform's default charset unless you specify one (UTF-8 has been the default since JDK 18).

Difference between UTF-32, UTF-16 and UTF-8 encoding

As I said earlier, UTF-8, UTF-16, and UTF-32 are just different ways to store Unicode code points, i.e. those U+ magic numbers, using 8-, 16-, and 32-bit units. Once a Unicode character is converted into bytes, it can easily be persisted to disk, transferred over a network, and recreated at the other end.

The fundamental difference between UTF-32 and the other two is that the former is a fixed-width encoding scheme, while the latter two are variable-length encodings. Although both UTF-8 and UTF-16 encode Unicode characters with a variable width, there are some differences between them as well.

1. UTF-8 uses a minimum of one byte per character, while UTF-16 uses a minimum of two bytes.
In UTF-8, every code point from 0 to 127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or up to 4 bytes. In short, UTF-8 is a variable-length encoding that takes 1 to 4 bytes, depending on the code point. UTF-16 is also a variable-length encoding, but it takes either 2 or 4 bytes. UTF-32, on the other hand, is a fixed 4 bytes.

2. UTF-8 is compatible with ASCII, while UTF-16 is incompatible with ASCII.
UTF-8 has an advantage where ASCII characters are the most used, because in that case most characters need only one byte. A UTF-8 file containing only ASCII characters has the same bytes as an ASCII file, which means English text looks exactly the same in UTF-8 as it did in ASCII. Given the dominance of ASCII in the past, this was the main reason for the initial acceptance of Unicode and UTF-8.

The same character can take a different number of bytes under each encoding scheme, i.e. UTF-8, UTF-16, and UTF-32.

Summary
1) UTF-16 is not fixed width; it uses 2 or 4 bytes. Only UTF-32 is fixed-width, and unfortunately it is rarely used.
Also worth knowing: Java Strings are represented as sequences of UTF-16 code units. Earlier they used UCS-2, which is fixed width.
2) You might think that because UTF-8 takes fewer bytes for many characters it would always take less memory than UTF-16, but that really depends on what language the string is in. For many non-European languages, such as Chinese, Japanese, or Hindi, UTF-8 needs three bytes for a character that UTF-16 stores in two, so UTF-8 requires more memory than UTF-16.
3) ASCII text is generally faster to process than text in a multi-byte encoding scheme, because less data to process means faster processing.

That's all about Unicode, UTF-8, UTF-32, and UTF-16 character encoding. As we have learned, Unicode is a character set of various symbols, while UTF-8, UTF-16, and UTF-32 are different ways to represent them as bytes. Both UTF-8 and UTF-16 are variable-length encodings, where the number of bytes used depends on the Unicode code point.

On the other hand, UTF-32 is a fixed-width encoding, where each code point takes 4 bytes. Unicode contains code points for almost every representable graphic symbol in the world and supports all major languages and scripts, e.g. English, Japanese, Mandarin, and Devanagari.

Always remember, UTF-32 is a fixed-width encoding that always takes 32 bits, but UTF-8 and UTF-16 are variable-length encodings, where UTF-8 can take 1 to 4 bytes while UTF-16 takes either 2 or 4 bytes.

By javinpaul

11 comments:

KunalKrishna85 said...
"BTW, if character's code point is greater than 127," what is a character's CODE POINT? Please explain.
February 17, 2015 at 9:21 PM

Anonymous said...
You said: "Java programming language has extensive support for different charset and character encoding, by default it use UTF-8." Then you said: "Also, worth knowing is that Java Strings are represented using UTF-16 bit characters". Could you clear this up?
February 18, 2015 at 11:29 AM

gm said...
One question. You mention the default encoding in Java is UTF-8, but at least Character and String have the default UTF-16 (http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html). Is there a different encoding you were referring to? Thanks, nice blog.
February 19, 2015 at 5:42 AM

Unknown said...
@Kunal "Code points are the numbers used in a coded character set, where a coded character set represents a collection of characters and each character is assigned a unique number. The coded character set defines the range of valid code points. Valid code points for Unicode are U+0000 to U+10FFFF." http://javarevisited.blogspot.com/2012/01/java-string-codepoint-get-unicode.html
February 19, 2015 at 8:42 AM

Anonymous said...
Hello. One point to note is that UTF-8 can go up to 6 bytes, I hope I am not wrong here. Thanks.
February 19, 2015 at 11:58 AM

javinpaul said...
@gm, Yes, Java String uses UTF-16, but when you convert a byte array to characters, Java uses the platform's default character encoding. It differs from place to place, e.g. in Eclipse it could be different from your Linux host.
February 21, 2015 at 11:45 PM

Anonymous said...
Hello there. What is the difference between UTF-16, UTF-16LE and UTF-16BE? Are they the same?
February 23, 2015 at 5:13 AM

Anonymous said...
@Anonymous, They are not the same. UTF-16LE stores bytes in little-endian order, while UTF-16BE stores bytes in big-endian order on disk. Since UTF-16 uses a minimum of 2 bytes to represent a character, the order in which those two bytes are stored affects the value of the character that is read back. In big endian, the most significant byte is stored first, at the lower address.
September 16, 2015 at 12:34 AM

vijaypratap said...
(£) We are taking this symbol from the database. While displaying the value in a .jsp page it is fine, but while getting the value into APIs it is coming as (A^£). We are using charset=utf-8. Could you please tell me why this is happening and what the solution is?
October 24, 2018 at 8:40 PM

Unknown said...
Use utf16
August 21, 2019 at 4:44 PM

Anonymous said...
A character set is nothing but list of characters, where each symbol or character is mapped to a numeric value, also known as code points.
December 11, 2020 at 12:21 PM