Convert UTF-8 to string literals in Python - Stack Overflow
Asked 8 years, 3 months ago. Modified 8 years, 3 months ago. Viewed 38k times.

Question (score: 6)

I have a string in UTF-8 format but I am not sure how to convert this string to its corresponding character literal. For example, I have the string:

    'Entre\xc3\xa9'

Example one. This code:

    u'Entre\xc3\xa9'.encode('latin-1').decode('utf-8')

returns the result:

    u'Entre\xe9'

If I then continue by printing this:

    print u'Entre\xe9'

I get the result:

    Entreé

This is great and close to what I need. The problem is, I can't make 'Entre\xc3\xa9' a variable and pass it through the steps, as this now breaks. Any tips for getting this working?

Example:

    a = 'Entre\xc3\xa9'
    b = 'u' + a.encode('latin-1').decode('utf-8')
    c = 'u' + b

I would like the result of "c" to be:

    Entreé

Tags: python, string, utf-8, literals

asked Jul 4, 2014 at 10:05 by Tminer; edited Jul 4, 2014 at 10:13 by Martijn Pieters

1 Answer

Answer (score: 10)

The u'' syntax only works for string literals, e.g. defining values in source code. Using the syntax results in a unicode object being created, but that's not the only way to create such an object.
You cannot make a unicode value from a byte string by adding u in front of it. But if you called str.decode() with the right encoding, you get a unicode value. Vice-versa, you can encode unicode objects to byte strings with unicode.encode().

Note that when displaying a unicode object, Python represents it by using the Unicode string literal syntax again (so u'...'), to ease debugging. You can paste the representation back into a Python interpreter and get an object with the same value.

Your a value is defined using a byte string literal, so you only need to decode:

    a = 'Entre\xc3\xa9'
    b = a.decode('utf8')

Your first example created a Mojibake, a Unicode string containing Latin-1 codepoints that actually represent UTF-8 bytes. This is why you had to encode to Latin-1 first (to undo the Mojibake), then decode from UTF-8.

You may want to read up on Python and Unicode in the Unicode HOWTO. Other articles of interest are:

- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
- Pragmatic Unicode by Ned Batchelder

answered Jul 4, 2014 at 10:09 by Martijn Pieters; edited Jul 4, 2014 at 10:31

Comments

Many thanks! So now if I enter b into the python interpreter I get u'Entre\xe9'. If I enter print b I get Entreé. Is it possible to have a string variable that will automatically return Entreé without using the print statement?
– Tminer, Jul 4, 2014 at 10:29

@user3804963: I think you are confusing the representation (u'Entre\xe9') with the value. print shows you the value (as encoded for your terminal), while your python console shows you the representation (for debugging). No value change has taken place. Python is showing you a value that can be copied and pasted into your source code without having to declare a source code encoding beyond the default ASCII, so an escape sequence (\xe9) is shown for the U+00E9 Unicode codepoint. This is normal.
– Martijn Pieters, Jul 4, 2014 at 11:46
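For readers on Python 3, the two cases discussed in the answer might look roughly like the sketch below. It assumes Python 3, where bytes takes the place of Python 2's str and str takes the place of unicode. The variable names raw, text, garbled and repaired are illustrative and do not come from the original posts.

```python
# Python 3 sketch (an assumption: the original thread uses Python 2).

# Case 1: the value is a byte string holding UTF-8 data, so decoding is enough.
raw = b'Entre\xc3\xa9'
text = raw.decode('utf-8')

# Case 2: Mojibake, i.e. UTF-8 bytes that were wrongly treated as Latin-1 text.
# Encoding to Latin-1 recovers the original bytes, which are then decoded as UTF-8.
garbled = 'Entre\xc3\xa9'  # a text string containing U+00C3 and U+00A9
repaired = garbled.encode('latin-1').decode('utf-8')

# Both routes yield the same six-character string ending in U+00E9.
print(text == repaired)

# ascii() gives an ASCII-only representation with escape sequences,
# much like the u'Entre\xe9' display in the Python 2 console.
print(ascii(text))
```

The last line illustrates the point made in the comments: the escape sequence belongs to the representation, not to the value, which still holds a single U+00E9 character.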