What is UTF-8? - Twilio
UTF-8 is a variable-width character encoding standard that uses between one and four eight-bit bytes to represent all valid Unicode code points.

UTF-8 Basics

UTF-8 (Unicode Transformation Format, 8-bit) is an encoding defined by the International Organization for Standardization (ISO) in ISO/IEC 10646. Its bit layout has room for 21 bits of code point data, or up to 2,097,152 values (2^21). That is more than enough to cover the current 1,112,064 valid Unicode code points; in practice, UTF-8 is restricted to the Unicode range U+0000 to U+10FFFF.

When discussing encoding systems, it is more accurate to talk about code points than about characters. Code points provide an abstraction over the term "character" and are the atomic units of information that an encoding maps to bytes. Most code points represent a single character, but some represent information such as formatting.

UTF-8 is a “variable-width” encoding standard. This means that it encodes each code point with one, two, three, or four bytes, depending on the code point's value. As a space-saving measure, the lowest code points, which include the most commonly used characters, take fewer bytes than the higher, less frequently used ones.

Backward compatibility with ASCII

UTF-8 uses one byte to represent code points 0 to 127. These first 128 Unicode code points correspond one-to-one with the ASCII character mappings, so ASCII text is also valid UTF-8, byte for byte.

How UTF-8 works: an example

The first byte of a UTF-8 sequence signals how many bytes the sequence contains. The code point bits are then “distributed” over the remaining bits of that first byte and the continuation bytes that follow it. This is best explained with an example:

Unicode assigns the French letter é to the code point U+00E9. This is 11101001 in binary; it is not part of the ASCII character set. UTF-8 represents this eight-bit number using two bytes.

The leading bits of both bytes contain metadata. The first byte begins with 110: the two 1s indicate that this is a two-byte sequence, and the 0 ends the length prefix, after which the code point bits follow. The second byte begins with 10 to signal that it is a continuation byte in a UTF-8 sequence.
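The lead-byte and continuation-byte prefixes just described can be recognized with simple bit masks. The following is a minimal Python sketch, not a full validator: the function name classify_utf8_byte is hypothetical, and because it only inspects bit patterns it does not reject bytes such as 0xC0 or 0xF5 that a strict UTF-8 decoder treats as invalid.

```python
def classify_utf8_byte(b: int) -> str:
    """Describe the role of one byte in a UTF-8 stream from its leading bits.

    Hypothetical helper for illustration; it checks bit patterns only.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if (b & 0b1000_0000) == 0:            # 0xxxxxxx: one-byte (ASCII) code point
        return "1-byte sequence"
    if (b & 0b1100_0000) == 0b1000_0000:  # 10xxxxxx: continuation byte
        return "continuation"
    if (b & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: lead byte, two bytes total
        return "lead of 2-byte sequence"
    if (b & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: lead byte, three bytes total
        return "lead of 3-byte sequence"
    if (b & 0b1111_1000) == 0b1111_0000:  # 11110xxx: lead byte, four bytes total
        return "lead of 4-byte sequence"
    return "invalid"                       # 11111xxx never appears in UTF-8


# Usage: inspect the two bytes Python produces for é (U+00E9).
for b in "é".encode("utf-8"):
    print(f"{b:08b} -> {classify_utf8_byte(b)}")
# 11000011 -> lead of 2-byte sequence
# 10101001 -> continuation
```

The number of leading 1s in a lead byte equals the total length of the sequence, which is why a decoder can tell how many bytes to read before it has seen the rest of them.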
After the 110 and 10 prefixes, the two bytes have 11 “slots” left for the code point bits. The U+00E9 code point only requires eight bits, so UTF-8 pads it with three leading 0s to fully “fill out” the remaining slots: 00011101001 is split into 00011 for the first byte and 101001 for the second.

The resulting UTF-8 representation of é (U+00E9) is 11000011 10101001 (0xC3 0xA9).

How does Twilio handle UTF-8 characters?

UTF-8 is the dominant encoding on the World Wide Web, so the text your code handles is likely encoded with this standard.

For SMS messages, Twilio uses the most compact encoding method available. Twilio defaults to GSM-7 and falls back to UCS-2 if your message contains any non-GSM-7 characters. Because a single-segment message holds 160 GSM-7 characters but only 70 UCS-2 characters, the choice between these encoding standards can affect the number of segments it takes to send your message.

Twilio Copilot’s Smart Encoding automatically detects easy-to-miss Unicode characters, such as a smart quote (”) or long dash (—), and replaces them with a similar character. This keeps your number of message segments, and pricing, as low as possible.

No need to worry whether your UTF-8 encoded string "Ooh là là" will arrive over SMS: Twilio's Programmable SMS has you covered.
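The bit distribution from the é example generalizes to all four sequence lengths. The minimal Python sketch below builds the bytes by hand; encode_code_point is a hypothetical helper written for illustration, since in real code str.encode("utf-8") performs this conversion. The 0x7F, 0x7FF, and 0xFFFF thresholds are where the one-, two-, and three-byte forms run out of payload bits (7, 11, and 16 bits respectively).

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 by distributing its bits.

    Hypothetical helper for illustration; use str.encode("utf-8") in practice.
    """
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0x7F:   # 7 payload bits: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp <= 0x7FF:  # 11 payload bits: 110xxxxx 10xxxxxx
        return bytes([0b1100_0000 | (cp >> 6),
                      0b1000_0000 | (cp & 0b11_1111)])
    if cp <= 0xFFFF:  # 16 payload bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b1110_0000 | (cp >> 12),
                      0b1000_0000 | ((cp >> 6) & 0b11_1111),
                      0b1000_0000 | (cp & 0b11_1111)])
    # 21 payload bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b1111_0000 | (cp >> 18),
                  0b1000_0000 | ((cp >> 12) & 0b11_1111),
                  0b1000_0000 | ((cp >> 6) & 0b11_1111),
                  0b1000_0000 | (cp & 0b11_1111)])


encoded = encode_code_point(0xE9)  # é
print(" ".join(f"{b:08b}" for b in encoded))  # 11000011 10101001
assert encoded == "é".encode("utf-8")
```

Comparing the result against str.encode confirms the hand-built bytes, and the single-byte branch shows directly why ASCII text is already valid UTF-8.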
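The way the GSM-7/UCS-2 choice changes the segment count can be estimated with a short Python sketch. It assumes the 3GPP TS 23.038 default alphabet and extension table (no national language shift tables) and the commonly used limits of 160/153 GSM-7 septets and 70/67 UCS-2 code units for single-part and multi-part messages. The function name sms_encoding_and_segments is hypothetical, and Twilio's actual segmenting logic may differ, for example in how it avoids splitting an escape sequence or surrogate pair across segment boundaries.

```python
import math

# GSM 03.38 / 3GPP TS 23.038 default alphabet (127 characters, ESC excluded).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters are sent as ESC + character, i.e. two septets each.
GSM7_EXTENDED = set("\f^{}\\[~]|€")


def sms_encoding_and_segments(text: str) -> tuple[str, int]:
    """Pick GSM-7 if every character fits, else UCS-2, and estimate segments.

    Hypothetical helper; a simplified model, not Twilio's implementation.
    """
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        single, multi, encoding = 160, 153, "GSM-7"
    else:
        # Count UTF-16 code units: characters outside the BMP (e.g. emoji) take two.
        units = len(text.encode("utf-16-le")) // 2
        single, multi, encoding = 70, 67, "UCS-2"
    if units <= single:
        return encoding, 1
    # Multi-part messages lose some capacity per segment to the concatenation header.
    return encoding, math.ceil(units / multi)


print(sms_encoding_and_segments("Ooh là là"))        # ('GSM-7', 1)
print(sms_encoding_and_segments("It\u2019s here"))   # ('UCS-2', 1)
```

In this model, "Ooh là là" stays in GSM-7 because à is part of the GSM-7 alphabet, while a single smart quote pushes the whole message into UCS-2 and cuts per-segment capacity from 160 to 70 characters, which is the kind of substitution Smart Encoding is described as avoiding.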