What is UTF-8? - Twilio


UTF-8 is a variable-width character encoding standard that uses between one and four eight-bit bytes to represent all valid Unicode code points.

UTF-8 Basics

UTF-8 (Unicode Transformation Format, 8-bit) is an encoding defined by the International Organization for Standardization (ISO) in ISO 10646. Its longest form, four bytes, carries 21 bits of code point data, so it can represent up to 2,097,152 (2^21) values, more than enough to cover the current 1,112,064 valid Unicode code points.

When discussing encoding systems, it is more accurate to refer to code points than to characters. A code point is an abstraction of the term "character" and is the atomic unit of information in an encoding. Most code points represent a single character, but some carry other information, such as formatting.

UTF-8 is a "variable-width" encoding standard. This means that the number of bytes it uses for a code point, between one and four, depends on the code point itself. As a space-saving measure, low-numbered code points, which include the most commonly used characters such as those of ASCII, are represented with fewer bytes than higher-numbered, less frequently used ones.

Backward compatibility with ASCII

UTF-8 uses a single byte to represent the code points 0 through 127. These first 128 Unicode code points correspond one-to-one with the ASCII character mappings, so any ASCII text is also valid UTF-8.

How UTF-8 works: an example

The first byte of a UTF-8 sequence signals how many bytes the sequence contains. The code point bits are then "distributed" over that byte and the ones that follow. This is best explained with an example.

Unicode assigns the French letter é to the code point U+00E9. This is 11101001 in binary; it is not part of the ASCII character set. UTF-8 represents this eight-bit number using two bytes.

The leading bits of both bytes contain metadata. The first byte begins with 110. The two 1s indicate that this is a two-byte sequence, and the 0 marks where the code point bits begin. The second byte begins with 10 to signal that it is a continuation byte in a UTF-8 sequence.
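As a rough sketch of the layout described above, the function below picks a lead-byte prefix based on the size of the code point and spreads the remaining bits over continuation bytes that each start with 10. The name `utf8_encode` is hypothetical and exists only for illustration; in real code, Python's built-in `str.encode("utf-8")` does this for you.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (illustrative only)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode code point for UTF-8")

    if code_point < 0x80:
        # Up to 7 bits: a single byte, 0xxxxxxx (identical to ASCII).
        return bytes([code_point])
    if code_point < 0x800:
        # Up to 11 bits: 110xxxxx 10xxxxxx
        lead, continuation_count = 0b11000000, 1
    elif code_point < 0x10000:
        # Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        lead, continuation_count = 0b11100000, 2
    else:
        # Up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        lead, continuation_count = 0b11110000, 3

    # The lead byte holds the prefix plus the highest-order code point bits.
    encoded = [lead | (code_point >> (6 * continuation_count))]
    # Each continuation byte holds the prefix 10 plus the next six bits.
    for i in range(continuation_count - 1, -1, -1):
        encoded.append(0b10000000 | ((code_point >> (6 * i)) & 0b00111111))
    return bytes(encoded)


encoded = utf8_encode(0xE9)  # é
print(" ".join(f"{byte:08b}" for byte in encoded))  # 11000011 10101001
print(encoded == "é".encode("utf-8"))               # True
```

Comparing the result against the built-in encoder, as the last line does, is a quick way to check a hand-written encoder like this one.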
The two prefixes (110 and 10) leave 11 "slots" for the code point bits: five in the first byte and six in the second. Remember that the U+00E9 code point only requires eight bits. UTF-8 pads it on the left with three 0s to fully "fill out" the remaining slots, giving 00011101001.

The resulting UTF-8 representation of é (U+00E9) is 11000011 10101001, or C3 A9 in hexadecimal.

How does Twilio handle UTF-8 characters?

UTF-8 is the dominant encoding of the World Wide Web, so your code is likely encoded with this standard.

For SMS messages, Twilio uses the most compact encoding method available. Twilio defaults to GSM-7 and falls back to UCS-2 if your message contains any non-GSM-7 characters. Whether a message is sent as GSM-7 or UCS-2 can affect the number of segments it takes to send it.

Twilio Copilot's Smart Encoding automatically detects easy-to-miss Unicode characters, such as a smart quote (〞) or long dash (—), and replaces them with a similar character. This keeps your number of message segments, and pricing, as low as possible.

No need to worry about whether your UTF-8 encoded string "Ooh là là" will arrive over SMS: Twilio's Programmable SMS has you covered.
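To make the GSM-7 versus UCS-2 fallback concrete, here is a minimal sketch that guesses which encoding a message would need and roughly how many segments it would take. It is not Twilio's implementation. The names `sms_encoding` and `estimate_segments` are hypothetical, the character tables are transcribed from the GSM 03.38 default alphabet, and the segment limits (160 or 153 units for GSM-7, 70 or 67 for UCS-2) are the commonly cited ones, used here as assumptions.

```python
import math

# GSM 03.38 default alphabet (basic table), transcribed for illustration.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ"
    "ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: each costs two 7-bit units (escape + character).
GSM7_EXTENDED = "^{}\\[~]|€\f"


def sms_encoding(message: str) -> str:
    """Return "GSM-7" if every character fits the GSM-7 tables, else "UCS-2"."""
    allowed = set(GSM7_BASIC + GSM7_EXTENDED)
    return "GSM-7" if all(ch in allowed for ch in message) else "UCS-2"


def estimate_segments(message: str) -> tuple[str, int]:
    """Estimate (encoding, segment count) under the assumed limits above."""
    encoding = sms_encoding(message)
    if encoding == "GSM-7":
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in message)
        single_limit, multipart_limit = 160, 153
    else:
        # UCS-2 is counted in 16-bit units; astral characters take two.
        units = len(message.encode("utf-16-be")) // 2
        single_limit, multipart_limit = 70, 67
    if units <= single_limit:
        return encoding, 1
    # Approximate: ignores that a two-unit character cannot straddle segments.
    return encoding, math.ceil(units / multipart_limit)


print(estimate_segments("Ooh là là"))         # ('GSM-7', 1)
print(estimate_segments("It\u2019s " * 20))   # ('UCS-2', 2)
```

The second call shows the effect Smart Encoding is meant to avoid: a single typographic apostrophe (U+2019) forces the whole message into UCS-2, so 100 characters that would fit in one GSM-7 segment need two.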


