UTF-8 - HTML & CSS Wiki - Fandom

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set, but unlike them it has the special property of being backward-compatible with ASCII. For this reason, it is steadily becoming the dominant character encoding for files, e-mail, web pages, and software that manipulates textual information.

UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes). The first 128 characters of the Unicode character set, which correspond directly to ASCII, use a single octet with the same binary value as in ASCII.

The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8.

Because the encoding is variable-width, each byte must announce its role. Each byte begins with 0 to 4 consecutive '1' bits followed by a '0' bit: a leading 0 marks a single-byte (ASCII) character, 10 marks a continuation byte, and 110, 1110, or 11110 marks the first byte of a 2-, 3-, or 4-byte sequence. The remaining bits of all bytes in a sequence are concatenated to give the Unicode code point.
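The leading-bit rule described above can be sketched as a small decoder. This is an illustrative sketch, not a library API: the function name `decode_utf8` is hypothetical, and in real Python code you would simply call `bytes.decode("utf-8")`. As an assumption beyond the text, it also rejects overlong forms, surrogates, and values above U+10FFFF, following RFC 3629.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into code points (hypothetical helper, illustrative only)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The leading '1' bits of the lead byte give the sequence length;
        # `minimum` is the smallest code point that legitimately needs that length.
        if lead >> 7 == 0b0:          # 0xxxxxxx: one byte (ASCII)
            length, value, minimum = 1, lead, 0x0
        elif lead >> 5 == 0b110:      # 110yyyyy: start of a 2-byte sequence
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif lead >> 4 == 0b1110:     # 1110zzzz: start of a 3-byte sequence
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif lead >> 3 == 0b11110:    # 11110www: start of a 4-byte sequence
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:                         # a stray 10xxxxxx byte, or 11111xxx
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for cont in data[i + 1 : i + length]:
            # Continuation bytes must match 10xxxxxx; append their 6 payload bits.
            if cont >> 6 != 0b10:
                raise ValueError(f"bad continuation byte 0x{cont:02X} after offset {i}")
            value = (value << 6) | (cont & 0x3F)
        # Assumed RFC 3629 checks: no overlong forms, surrogates, or values > U+10FFFF.
        if value < minimum or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
            raise ValueError(f"invalid code point U+{value:04X} at offset {i}")
        code_points.append(value)
        i += length
    return code_points


print([f"U+{cp:04X}" for cp in decode_utf8(bytes([0xE2, 0x82, 0xAC]))])  # ['U+20AC']
```

Checking the minimum value per length is what stops a 2-byte sequence such as 0xC0 0x80 from sneaking in a second encoding of U+0000.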
| Code point range | Binary code point | UTF-8 bytes | Example |
|---|---|---|---|
| U+0000 to U+007F | 0xxxxxxx | 0xxxxxxx | '$' U+0024 = 00100100 → 00100100 → 0x24 |
| U+0080 to U+07FF | 00000yyy yyxxxxxx | 110yyyyy 10xxxxxx | '¢' U+00A2 = 00000000 10100010 → 11000010 10100010 → 0xC2 0xA2 |
| U+0800 to U+FFFF | zzzzyyyy yyxxxxxx | 1110zzzz 10yyyyyy 10xxxxxx | '€' U+20AC = 00100000 10101100 → 11100010 10000010 10101100 → 0xE2 0x82 0xAC |
| U+010000 to U+10FFFF | 000wwwzz zzzzyyyy yyxxxxxx | 11110www 10zzzzzz 10yyyyyy 10xxxxxx | '𤭢' U+024B62 = 00000010 01001011 01100010 → 11110000 10100100 10101101 10100010 → 0xF0 0xA4 0xAD 0xA2 |

So the first 128 characters (US-ASCII) need one byte. The next 1,920 characters (U+0080 to U+07FF) need two bytes; these include Latin letters with diacritics and characters from the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets. Three bytes are needed for the rest of the Basic Multilingual Plane, which contains virtually all characters in common use. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters and various historic scripts.

By continuing the pattern given above, it is possible to encode much larger numbers. The original specification allowed sequences of up to six bytes, covering numbers up to 31 bits (the original limit of the Universal Character Set). In November 2003, however, RFC 3629 restricted UTF-8 to the range covered by the formal Unicode definition, U+0000 to U+10FFFF. (Note: the IETF doesn't define UTF-8; Unicode does.)

This page uses Creative Commons Licensed content from Wikipedia. Community content is available under CC-BY-SA unless otherwise noted.
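The bit layouts in the table, together with the RFC 3629 upper limit of U+10FFFF, can be sketched as a per-code-point encoder. This is a minimal illustration under stated assumptions: `encode_utf8` is a hypothetical name (Python's own `str.encode("utf-8")` does this in practice), and rejecting surrogate code points U+D800 to U+DFFF is an RFC 3629 rule added here rather than something the text above states.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point using the table's bit layouts (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:X} is outside U+0000..U+10FFFF (RFC 3629)")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded in UTF-8")
    if code_point <= 0x7F:
        return bytes([code_point])                            # 0xxxxxxx
    if code_point <= 0x7FF:
        return bytes([0xC0 | (code_point >> 6),               # 110yyyyy
                      0x80 | (code_point & 0x3F)])            # 10xxxxxx
    if code_point <= 0xFFFF:
        return bytes([0xE0 | (code_point >> 12),              # 1110zzzz
                      0x80 | ((code_point >> 6) & 0x3F),      # 10yyyyyy
                      0x80 | (code_point & 0x3F)])            # 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),                  # 11110www
                  0x80 | ((code_point >> 12) & 0x3F),         # 10zzzzzz
                  0x80 | ((code_point >> 6) & 0x3F),          # 10yyyyyy
                  0x80 | (code_point & 0x3F)])                # 10xxxxxx


# Reproduce the table's four examples.
for ch in "$¢€𤭢":
    encoded = encode_utf8(ord(ch))
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ').upper()}")
# U+0024 -> 24
# U+00A2 -> C2 A2
# U+20AC -> E2 82 AC
# U+24B62 -> F0 A4 AD A2
```

Each branch masks off the next 6 bits for a continuation byte and ORs in the fixed prefix (0x80 for 10xxxxxx, 0xC0/0xE0/0xF0 for the lead byte), which is exactly the x/y/z/w layout shown in the table.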


