UTF-8 - HTML & CSS Wiki - Fandom
UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set, but unlike them it has the special property of being backwards-compatible with ASCII. For this reason, it is steadily becoming the dominant character encoding for files, e-mail, web pages, and software that manipulates textual information.

UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes). The first 128 characters of the Unicode character set, which correspond one-to-one with ASCII, use a single octet with the same binary value as in ASCII.

The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8.

Because the encoding is variable-width, every byte signals its own role: it begins with 0 to 4 consecutive '1' bits followed by a '0' bit. A single leading '0' marks a one-byte (ASCII) character, '10' marks a continuation byte, and '110', '1110' or '11110' marks the first byte of a 2-, 3- or 4-byte sequence. The remaining bits of all the bytes in a sequence are concatenated to get the Unicode code point.
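The bit-prefix rules just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `encode_code_point` and `encode` are hypothetical helper names, and the output is cross-checked against Python's built-in `str.encode("utf-8")`. The sketch also refuses surrogate code points (U+D800 to U+DFFF), which are not characters and have no UTF-8 form.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        # UTF-16 surrogate halves are not characters and have no UTF-8 form
        raise ValueError(f"surrogate code point: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx: one byte, same value as ASCII
        return bytes([cp])
    if cp < 0x800:
        # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110www 10zzzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode(text: str) -> bytes:
    """Encode a whole string by concatenating each code point's bytes."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


for ch in "$¢€𤭢":
    encoded = encode(ch)
    assert encoded == ch.encode("utf-8")  # cross-check with Python's codec
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ').upper()}")

# Prints:
# U+0024 -> 24
# U+00A2 -> C2 A2
# U+20AC -> E2 82 AC
# U+24B62 -> F0 A4 AD A2
```

Each branch simply places the code point's bits into the payload positions left free by the prefix bits, six bits per continuation byte.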
Code point range       Binary code point           UTF-8 bytes
U+0000 to U+007F       0xxxxxxx                    0xxxxxxx
U+0080 to U+07FF       00000yyy yyxxxxxx           110yyyyy 10xxxxxx
U+0800 to U+FFFF       zzzzyyyy yyxxxxxx           1110zzzz 10yyyyyy 10xxxxxx
U+010000 to U+10FFFF   000wwwzz zzzzyyyy yyxxxxxx  11110www 10zzzzzz 10yyyyyy 10xxxxxx

Examples:
'$' U+0024 = 00100100 → 00100100 → 0x24
'¢' U+00A2 = 00000000 10100010 → 11000010 10100010 → 0xC2 0xA2
'€' U+20AC = 00100000 10101100 → 11100010 10000010 10101100 → 0xE2 0x82 0xAC
'𤭢' U+024B62 = 00000010 01001011 01100010 → 11110000 10100100 10101101 10100010 → 0xF0 0xA4 0xAD 0xA2

So the first 128 characters (US-ASCII) need one byte. The next 1,920 characters (U+0080 to U+07FF) need two bytes to encode. These include Latin letters with diacritics and characters from the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets. Three bytes are needed for the rest of the Basic Multilingual Plane, which contains virtually all characters in common use. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters and various historic scripts.

By continuing the pattern given above, it is possible to encode much larger numbers. The original specification allowed sequences of up to six bytes, covering numbers of up to 31 bits (the original limit of the Universal Character Set). However, in November 2003 RFC 3629 restricted UTF-8 to the range covered by the formal Unicode definition, U+0000 to U+10FFFF. (The IETF does not define UTF-8; the Unicode Standard does.)

This page uses Creative Commons licensed content from Wikipedia. Community content is available under CC-BY-SA unless otherwise noted.
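The RFC 3629 limit described above is what a strict decoder has to enforce. The sketch below uses a hypothetical `decode` helper written for illustration (not a library API). It accepts only the 1- to 4-byte forms and rejects stray continuation bytes, the old 5- and 6-byte lead bytes, overlong encodings, surrogates, and anything above U+10FFFF. Python's own `bytes.decode("utf-8")` is similarly strict, so it serves as a cross-check.

```python
def decode(data: bytes) -> str:
    """Strictly decode UTF-8 bytes into a str (RFC 3629 rules)."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The count of leading 1 bits in the lead byte gives the length;
        # `minimum` is the smallest code point that needs that many bytes.
        if lead < 0x80:            # 0xxxxxxx
            length, cp, minimum = 1, lead, 0
        elif 0xC0 <= lead < 0xE0:  # 110yyyyy
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:  # 1110zzzz
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:  # 11110www
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            # A stray continuation byte (10xxxxxx) or a lead byte from the
            # pre-2003 5- and 6-byte forms (0xF8 and above)
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for byte in data[i + 1:i + length]:
            if (byte & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (byte & 0x3F)  # append 6 payload bits
        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"disallowed code point {cp:#x} at offset {i}")
        chars.append(chr(cp))
        i += length
    return "".join(chars)


sample = "$¢€𤭢".encode("utf-8")
assert decode(sample) == sample.decode("utf-8")

# Each of these is rejected: U+110000 (beyond RFC 3629's limit), an overlong
# two-byte form of '/', and an old-style 5-byte sequence.
for bad in (b"\xf4\x90\x80\x80", b"\xc0\xaf", b"\xf8\x88\x80\x80\x80"):
    try:
        decode(bad)
    except ValueError as err:
        print(err)
```

The `minimum` check matters because the prefix scheme alone would let a character be padded into a longer sequence (such as 0xC0 0xAF for '/'), which strict decoders treat as an error.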