python - What is a unicode string? - Stack Overflow
What is a unicode string? [closed]

Asked Feb 16, 2014 at 7:51 by Stevanus Iskandar; edited Jul 17, 2017 at 19:21 by evanhutomo. Viewed 100k times. Score: 35.

Closed. This question needs to be more focused. It is not currently accepting answers.

What exactly is a unicode string?
What's the difference between a regular string and unicode string?
What is utf-8?

I'm trying to learn Python right now, and I keep hearing this buzzword. What does the code below do?

i18n Strings (Unicode)

    > ustring = u'A unicode \u018e string \xf1'
    > ustring
    u'A unicode \u018e string \xf1'

    ## (ustring from above contains a unicode string)
    > s = ustring.encode('utf-8')
    > s
    'A unicode \xc6\x8e string \xc3\xb1'  ## bytes of utf-8 encoding
    > t = unicode(s, 'utf-8')  ## Convert bytes back to a unicode string
    > t == ustring  ## It's the same as the original, yay!
    True

Files Unicode

    import codecs

    f = codecs.open('foo.txt', 'rU', 'utf-8')
    for line in f:
        # here line is a *unicode* string

Tags: python, unicode, utf-8

Comments:

An internet search might be a good place to start....
– Mitch Wheat, Feb 16, 2014 at 7:54

possible duplicate of Unicode in Python
– tripleee, Feb 16, 2014 at 10:05

See also bit.ly/unipain
– tripleee, Feb 16, 2014 at 10:06

2 Answers

Answer (score 56)

Update: Python 3

In Python 3, Unicode strings are the default. The type str is a collection of Unicode code points, and the type bytes is used for representing collections of 8-bit integers (often interpreted as ASCII characters).

Here is the code from the question, updated for Python 3:

    >>> my_str = 'A unicode \u018e string \xf1'  # no need for "u" prefix
    # the escape sequence "\u" denotes a Unicode code point (in hex)
    >>> my_str
    'A unicode Ǝ string ñ'
    # the Unicode code points U+018E and U+00F1 were displayed
    # as their corresponding glyphs
    >>> my_bytes = my_str.encode('utf-8')  # convert to a bytes object
    >>> my_bytes
    b'A unicode \xc6\x8e string \xc3\xb1'
    # the "b" prefix means a bytes literal
    # the escape sequence "\x" denotes a byte using its hex value
    # the code points U+018E and U+00F1 were encoded as 2-byte sequences
    >>> my_str2 = my_bytes.decode('utf-8')  # convert back to str
    >>> my_str2 == my_str
    True

Working with files:

    >>> f = open('foo.txt', 'r')  # text mode (Unicode)
    >>> # the platform's default encoding (e.g. UTF-8) is used to decode the file
    >>> # to set a specific encoding, use open('foo.txt', 'r', encoding="...")
    >>> for line in f:
    ...     pass  # here line is a str object

    >>> f = open('foo.txt', 'rb')  # "b" means binary mode (bytes)
    >>> for line in f:
    ...     pass  # here line is a bytes object

Historical answer: Python 2

In Python 2, the str type was a collection of 8-bit characters (like Python 3's bytes type). The English alphabet can be represented using these 8-bit characters, but symbols such as Ω, и, ±, and ♠ cannot.

Unicode is a standard for working with a wide range of characters. Each symbol has a code point (a number), and these code points can be encoded (converted to a sequence of bytes) using a variety of encodings.

UTF-8 is one such encoding. The low code points are encoded using a single byte, and higher code points are encoded as sequences of bytes.
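The single-byte versus multi-byte behavior described above can be checked directly. The following is a minimal Python 3 sketch, not part of the original answer; the helper name utf8_length is hypothetical, and the sample characters are the ones the answer mentions.

```python
# Python 3 sketch: count how many bytes UTF-8 uses for individual characters.
# utf8_length is a hypothetical helper introduced only for this illustration.

def utf8_length(ch):
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))

# "A" is in the ASCII range; the other symbols have higher code points.
samples = ["A", "\u00b1", "\u03a9", "\u0438", "\u2660"]  # A, ±, Ω, и, ♠

for ch in samples:
    encoded = ch.encode("utf-8")
    print("U+%04X %s -> %d byte(s): %r" % (ord(ch), ch, utf8_length(ch), encoded))
```

Code points below U+0080 take one byte, those up to U+07FF take two, and ♠ (U+2660) takes three, which is why a UTF-8 encoded string can be longer in bytes than it is in characters.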
To allow working with Unicode characters, Python 2 has a unicode type which is a collection of Unicode code points (like Python 3's str type). The line ustring = u'A unicode \u018e string \xf1' creates a Unicode string with 20 characters.

When the Python interpreter displays the value of ustring, it escapes two of the characters (Ǝ and ñ) because they are not in the standard printable range.

The line s = ustring.encode('utf-8') encodes the Unicode string using UTF-8. This converts each code point to the appropriate byte or sequence of bytes. The result is a collection of bytes, which is returned as a str. The size of s is 22 bytes, because two of the characters have high code points and are encoded as a sequence of two bytes rather than a single byte.

When the Python interpreter displays the value of s, it escapes four bytes that are not in the printable range (\xc6, \x8e, \xc3, and \xb1). The two pairs of bytes are not treated as single characters like before because s is of type str, not unicode.

The line t = unicode(s, 'utf-8') does the opposite of encode(). It reconstructs the original code points by looking at the bytes of s and parsing byte sequences. The result is a Unicode string.

The call to codecs.open() specifies utf-8 as the encoding, which tells Python to interpret the contents of the file (a collection of bytes) as a Unicode string that has been encoded using UTF-8.

Answered Feb 16, 2014 at 8:48 by tom; edited Feb 7, 2021 at 5:44.

Comments:

More specifically, the above is true for Python v2. In Python v3, Unicode strings are the default.
– tripleee, Feb 16, 2014 at 10:49

thanks, ... but when will we be able to actually "see" those unicode characters? Will we kind of "inject" our python code into a system which is able to display those?
– aderchox, Apr 17, 2019 at 5:20

Usually nowadays if you simply print a string to console output, or write it to a file which you then view in an editor, you will be able to see any non-ascii characters. Since utf8 is mostly backwards compatible with ascii anyway, most systems should now assume utf8 encoding by default. (For the same reason you should be able to save unicode characters directly into your .py file, and skip the escaped representations.) @aderchox
– benjimin, Jan 28, 2020 at 3:29

Answer (score -6)

Python supports the string type and the unicode type. A string is a sequence of chars while a unicode is a sequence of "pointers". The unicode is an in-memory representation of the sequence and every symbol on it is not a char but a number (in hex format) intended to select a char in a map. So a unicode var does not have encoding because it does not contain chars.

Answered Feb 16, 2014 at 7:54 by Renjith Nair.

Comments:

You can have a detailed look into it on this blog carlosble.com/2010/12/understanding-python-and-unicode
– Renjith Nair, Feb 16, 2014 at 7:55

-1 Not an accurate answer. Those are not "pointers" and both types are used to represent strings.
– tripleee, Feb 16, 2014 at 8:18