Db2 12 - Internationalization - UTFs
UTFs

Each Unicode code point can be expressed in several different formats. These formats are called Unicode transformation formats (UTFs).

For example, the letter M is the Unicode code point U+004D. In UTF-8, this code point is represented as X'4D'. In UTF-16, this code point can be represented as X'004D' (see footnote 1).

A UTF maps each Unicode code point to a unique code unit sequence. A code unit is the minimal bit combination that can represent a character. Each UTF uses a different code unit size. For example, UTF-8 is based on 8-bit code units. Therefore, each character can be 8 bits (1 byte), 16 bits (2 bytes), 24 bits (3 bytes), or 32 bits (4 bytes). Likewise, UTF-16 is based on 16-bit code units. Therefore, each character can be 16 bits (2 bytes) or 32 bits (4 bytes).

All UTFs include the full Unicode character repertoire, or set of characters. Each UTF can represent any Unicode character that you need to represent.

The following UTFs are defined by the Unicode Consortium:

UTF-8
UTF-8 is based on 8-bit code units. Each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8. These code points are the same as those in ASCII CCSID 367. Any other character is encoded with more than 1 byte in UTF-8. In IBM®, UTF-8 is also known as Unicode CCSID 1208.

Db2 uses UTF-8 to encode data in the following ways:
- Db2 uses UTF-8 to encode data in CHAR, VARCHAR, and CLOB columns in Unicode tables.
- Db2 parses SQL statements and precompiles source code in UTF-8.
- The Db2 catalog tables that have the Unicode encoding scheme are encoded in UTF-8.

UTF-16
UTF-16 is based on 16-bit code units. Each character is encoded as at least 2 bytes. Some characters that are encoded with a 1-byte code unit in UTF-8 are encoded with a 2-byte code unit in UTF-16. Characters that are surrogate or supplementary characters use 4 bytes and thus require additional storage. These characters can also be stored in UTF-8 or UTF-32, but, because they always require 4 bytes of storage, neither of these formats provide any space savings. In IBM, UTF-16 is also known as Unicode CCSID 1200.
Db2 uses UTF-16 to encode data in GRAPHIC, VARGRAPHIC, and DBCLOB columns in Unicode tables.

UTF-32
UTF-32 is based on 32-bit code units. Each character is encoded as exactly 4 bytes. Db2 does not store data in UTF-32.

The following table shows example UTF encodings for several characters.

Table 1. Example UTF encodings

| Character | Unicode code point | ASCII | UTF-8 | UTF-16 (Big Endian format) (note 1) | UTF-32 (Big Endian format) |
|---|---|---|---|---|---|
| A | U+0041 | X'41' | X'41' | X'0041' | X'00000041' |
| a | U+0061 | X'61' | X'61' | X'0061' | X'00000061' |
| 9 | U+0039 | X'39' | X'39' | X'0039' | X'00000039' |
| Å | U+00C5 | X'C5' | X'C385' (note 2) | X'00C5' | X'000000C5' |
| 顠 | U+9860 | X'CDDB' (CCSID 939) | X'E9A1A0' | X'9860' | X'00009860' |
| (supplementary character) | U+200D0 | Does not exist | X'F0A08390' | X'D840DCD0' | X'000200D0' |

Notes:
1. z/OS® uses Big Endian format only. Little Endian format is used in other operating systems.
2. X'C5' becomes double-byte in UTF-8.

Notice that for some characters, the UTF encodings are fairly predictable. For example, the character A, which is Unicode code point U+0041, is encoded as X'41' in ASCII and UTF-8, as X'0041' in UTF-16, and as X'00000041' in UTF-32. However, the UTF encodings for a character like Å or the supplementary character U+200D0 do not follow the same pattern.

The process of converting a value from its Unicode code point to its UTF hexadecimal value is called encoding. For example, Unicode code point U+0041 is encoded in UTF-8 as X'41'.

The reverse process, converting a UTF hexadecimal value to its Unicode code point, is called decoding. For example, suppose that you see the hexadecimal value X'00C5' in trace output and you know that the data is in UTF-16. You can decode the value to find that it corresponds to Unicode code point U+00C5. You can then look up this Unicode code point on the Unicode character code charts on the Unicode Consortium website and find that it corresponds to the character Å.

You can find the steps for how to manually encode and decode Unicode data on the Unicode Consortium website. Alternatively, you can use a converter tool to do the conversion for you.
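As a rough illustration of encoding and decoding, the following Python sketch reproduces the UTF columns of Table 1 with Python's built-in codecs and decodes the X'00C5' trace value from the example. It is not part of Db2; the helper names `utf_hex` and `decode_hex` are hypothetical and exist only for this example.

```python
def utf_hex(ch: str) -> dict[str, str]:
    """Return the code point and big-endian UTF encodings of one character as hex."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex().upper(),
        "utf-16-be": ch.encode("utf-16-be").hex().upper(),
        "utf-32-be": ch.encode("utf-32-be").hex().upper(),
    }


def decode_hex(hex_value: str, encoding: str) -> str:
    """Decode a hexadecimal string (for example, from trace output) into text."""
    return bytes.fromhex(hex_value).decode(encoding)


# The characters from Table 1, written as escapes so the code points are explicit.
for ch in ["A", "a", "9", "\u00C5", "\u9860", "\U000200D0"]:
    row = utf_hex(ch)
    print(row["code_point"], row["utf-8"], row["utf-16-be"], row["utf-32-be"])

# Decoding: X'00C5' in UTF-16 (big endian) is code point U+00C5.
decoded = decode_hex("00C5", "utf-16-be")
print(f"U+{ord(decoded):04X}")
```

The sketch uses the explicit big-endian codec names (`utf-16-be`, `utf-32-be`) so that no byte order mark is added and the output matches the big-endian columns of the table.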
Related concepts:
- Endianness

Related information:
- Unicode Consortium
- UTF-8, UTF-16, UTF-32 & BOM (on Unicode Consortium website)
- Unicode Character Code Charts (on Unicode Consortium website)

Footnote 1: X'004D' is the UTF-16 big endian representation. The UTF-16 little endian representation is X'4D00'. For more information about endianness, see Endianness.
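The byte-order difference in the footnote can be checked with a short Python sketch. It assumes nothing beyond the standard library, and the variable names are chosen only for this illustration.

```python
# The letter M is code point U+004D, which fits in a single 16-bit code unit.
code_unit = ord("M")  # 0x004D

# Serialize that one code unit in both byte orders.
big_endian = code_unit.to_bytes(2, "big")        # most significant byte first
little_endian = code_unit.to_bytes(2, "little")  # least significant byte first

# Python's explicit-endianness codecs produce the same bytes.
assert big_endian == "M".encode("utf-16-be")
assert little_endian == "M".encode("utf-16-le")

print(big_endian.hex().upper())     # 004D
print(little_endian.hex().upper())  # 4D00
```

The two results contain the same code unit; only the order of its two bytes differs, which is why trace output must be read with the platform's byte order in mind.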