Difference between UTF-8, UTF-16 and UTF-32 Character ...
Difference between UTF-8, UTF-16 and UTF-32 Character Encoding? Example

The main difference between the UTF-8, UTF-16, and UTF-32 character encodings is how many bytes each needs to represent a character. UTF-8 uses a minimum of one byte, while UTF-16 uses a minimum of two bytes. If the character's code point is greater than 127, the largest value that fits in 7 bits (the ASCII range), UTF-8 takes 2, 3, or 4 bytes, whereas UTF-16 takes either two or four bytes. UTF-32, on the other hand, is a fixed-width encoding scheme and always uses 4 bytes to encode a Unicode code point.

Now, let's start with what character encoding is and why it's important. Character encoding is central to converting byte streams into characters that can be displayed.

Two things are needed to convert bytes to characters: a character set and an encoding. Since there are so many characters and symbols in the world, a character set is required to support all of them. A character set is nothing but a list of characters in which each symbol or character is mapped to a numeric value, also known as a code point.

UTF-16, UTF-32, and UTF-8, on the other hand, are encoding schemes that describe how these values (code points) are mapped to bytes, using code units of different sizes: 16 bits for UTF-16, 32 bits for UTF-32, and 8 bits for UTF-8. UTF stands for Unicode Transformation Format, which defines an algorithm to map every Unicode code point to a unique byte sequence.
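The byte counts described above can be checked with a small Java program. This is only a sketch: the class name EncodingSizes and its helper methods are hypothetical names chosen for illustration, and the UTF-32 size is computed as 4 bytes per code point rather than by calling an actual UTF-32 encoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
final class EncodingSizes {

    // Bytes needed to store the text in UTF-8 (1 to 4 bytes per code point).
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in UTF-16 (2 or 4 bytes per code point).
    // UTF_16BE is used so that no byte order mark is prepended.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed width, so the size is 4 bytes per code point.
    // Assumption: computed arithmetically instead of through a charset lookup.
    static int utf32Length(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII range
            "\u00A3",       // U+00A3 POUND SIGN
            "\u20AC",       // U+20AC EURO SIGN
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s), utf32Length(s));
        }
    }
}
```

Running it shows A taking 1, 2, and 4 bytes in UTF-8, UTF-16, and UTF-32 respectively, while U+1F600 takes 4 bytes in all three.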
For example, for the character A, LATIN CAPITAL LETTER A, the Unicode code point is U+0041, the UTF-8 encoded byte is 41 (hexadecimal), the UTF-16 encoding is 0041, and the Java char literal is '\u0041'. In short, you need a character encoding scheme to interpret a stream of bytes; without one, you cannot display the characters correctly. The Java programming language has extensive support for different charsets and character encodings. Java Strings use UTF-16 internally, and when bytes are converted to characters without an explicit charset, Java uses the platform's default character encoding, which is UTF-8 on many systems.

Difference between UTF-32, UTF-16 and UTF-8 encoding

As I said earlier, UTF-8, UTF-16, and UTF-32 are just three ways to store Unicode code points, i.e. those U+ magic numbers, using 8-, 16-, and 32-bit code units in the computer's memory. Once a Unicode character is converted into bytes, it can easily be persisted to disk, transferred over a network, and recreated at the other end.

The fundamental difference between UTF-32 and the other two is that the former is a fixed-width encoding scheme, while the latter two are variable-length encodings. Although both UTF-8 and UTF-16 encode Unicode characters with a variable width, there are some differences between them as well.

1. UTF-8 uses one byte at the minimum to encode a character, while UTF-16 uses a minimum of two bytes.
In UTF-8, every code point from 0 to 127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or up to 4 bytes. In short, UTF-8 is a variable-length encoding and takes 1 to 4 bytes, depending on the code point. UTF-16 is also a variable-length character encoding but takes either 2 or 4 bytes. UTF-32, on the other hand, is fixed at 4 bytes.

2. UTF-8 is compatible with ASCII, while UTF-16 is incompatible with ASCII.
UTF-8 has an advantage where ASCII characters are the most used, because in that case most characters need only one byte. A UTF-8 file containing only ASCII characters has the same encoding as an ASCII file, which means English text looks exactly the same in UTF-8 as it did in ASCII. Given the dominance of ASCII in the past, this was the main reason for the initial acceptance of Unicode and UTF-8.

[Image: example of how different characters are mapped to bytes under UTF-8, UTF-16, and UTF-32; different schemes take a different number of bytes to represent the same character.]

Summary

1) UTF-16 is not fixed width. It uses 2 or 4 bytes. Only UTF-32 is fixed width, and unfortunately it is rarely used.
Also worth knowing is that Java Strings are represented using UTF-16 characters; earlier they used UCS-2, which is fixed width.

2) You might think that because UTF-8 takes fewer bytes for many characters it would take less memory than UTF-16, but that really depends on what language the string is in. For non-European languages whose characters need 3 bytes in UTF-8 but only 2 in UTF-16 (Chinese, Japanese, or Hindi text, for example), UTF-8 requires more memory than UTF-16.

3) Pure ASCII text is generally faster to process than text in a multi-byte encoding scheme, because there is less data to process.

That's all about Unicode, UTF-8, UTF-32, and UTF-16 character encoding. As we have learned, Unicode is a character set of various symbols, while UTF-8, UTF-16, and UTF-32 are different ways to represent them in byte format. Both UTF-8 and UTF-16 are variable-length encodings, where the number of bytes used depends on the Unicode code point.

On the other hand, UTF-32 is a fixed-width encoding, where each code point takes 4 bytes. Unicode contains code points for almost all representable graphic symbols in the world, and it supports all major languages and scripts, e.g. English, Japanese, Mandarin, or Devanagari.

Always remember, UTF-32 is a fixed-width encoding that always takes 32 bits, but UTF-8 and UTF-16 are variable-length encodings, where UTF-8 can take 1 to 4 bytes while UTF-16 takes either 2 or 4 bytes.

By javinpaul
Labels: best of javarevisited, core java, programming

11 comments:

KunalKrishna85 said...
"BTW, if character's code point is greater than 127," what is a character's CODE POINT? Plz explain.
February 17, 2015 at 9:21 PM

Anonymous said...
You said: "Java programming language has extensive support for different charset and character encoding, by default it use UTF-8." Then you said: "Also, worth knowing is that Java Strings are represented using UTF-16 bit characters". Could you clear this up?
February 18, 2015 at 11:29 AM

gm said...
One question. You mention the default encoding in Java is UTF-8, but at least Character and String have the default UTF-16 (http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html). Is there a different encoding you were referring to? Tx, nice blog.
February 19, 2015 at 5:42 AM

Unknown said...
@Kunal "Code points are the numbers that are used in a coded character set, where a coded character set represents a collection of characters and each character is assigned a unique number. This coded character set defines the range of valid code points. Valid code points for Unicode are U+0000 to U+10FFFF." http://javarevisited.blogspot.com/2012/01/java-string-codepoint-get-unicode.html
February 19, 2015 at 8:42 AM

Anonymous said...
Hello. One point to note is that UTF-8 can go up to 6 bytes, I hope I am not wrong here. Thanks.
February 19, 2015 at 11:58 AM

javinpaul said...
@gm, Yes, Java String uses UTF-16, but when you convert a byte array to characters, Java uses the platform's default character encoding. It differs from place to place, e.g. in Eclipse it could be different from your Linux host.
February 21, 2015 at 11:45 PM

Anonymous said...
Hello there? What is the difference between UTF-16, UTF-16LE and UTF-16BE? Are they the same?
February 23, 2015 at 5:13 AM

Anonymous said...
@Anonymous, They are not the same. UTF-16LE stores bytes in little-endian order, while UTF-16BE stores bytes in big-endian order on disk. Since UTF-16 uses a minimum of 2 bytes to represent a character, the order in which those two bytes are stored affects how the value of the character is read back. In big endian, the most significant byte is stored first, at the lower address.
September 16, 2015 at 12:34 AM

vijaypratap said...
(£) We are taking this symbol from the database. While displaying this value in a .jsp page it is fine, but while getting the value into APIs it is coming as (A^£). We are using charset=utf-8. Could you please tell me why this is happening and what the solution is?
October 24, 2018 at 8:40 PM

Unknown said...
Use utf16
August 21, 2019 at 4:44 PM

Anonymous said...
A character set is nothing but a list of characters, where each symbol or character is mapped to a numeric value, also known as a code point.
December 11, 2020 at 12:21 PM

Copyright by Javin Paul 2010-2021.