Decoding UTF-8 strings in Python - Stack Overflow
Asked Oct 28, 2012 at 16:22 by user1624005 (edited Feb 15, 2018 at 23:16 by Zero Piraeus). Viewed 208k times. Tags: python, python-2.7

Question (score 30)

I'm writing a web crawler in Python, and it involves taking headlines from websites.

One of the headlines should have read: And the Hip's coming, too

But instead it said: And the Hipâ€™s coming, too

What's going wrong here?

Comment (jbowes, Oct 28, 2012 at 16:27): It would be easier to help you if you included the relevant code and the particular website you're parsing.

2 Answers

Answer (score 57, Zero Piraeus, answered Oct 28, 2012 at 16:36, edited Oct 28, 2012 at 16:44)

It's an encoding error (UTF-8 bytes decoded as Windows-1252), so if it's a unicode string, this ought to fix it:

    text.encode("windows-1252").decode("utf-8")

If it's a plain string, you'll need an extra step:

    text.decode("utf-8").encode("windows-1252").decode("utf-8")

Both of these will give you a unicode string.
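In Python 3 every str is already Unicode, so only the first form applies. The following is a minimal sketch of that round trip, assuming the text was mangled by decoding UTF-8 bytes as Windows-1252. The helper name repair_mojibake is hypothetical and not part of the original answer.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up (hypothetical helper)."""
    try:
        # Re-encode to recover the original bytes, then decode them as UTF-8.
        return text.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mangled in this particular way; leave it untouched.
        return text


# The mangled headline, written with escapes; it displays as "Hipâ€™s".
mangled = "And the Hip\u00e2\u20ac\u2122s coming, too"
repaired = repair_mojibake(mangled)
print(repaired)
```

The try/except is a design choice for a crawler that sees a mix of good and bad headlines: correctly decoded non-ASCII text usually fails the round trip and is returned unchanged rather than raising.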
By the way, to discover how a piece of text like this has been mangled by encoding issues, you can use chardet:

    >>> import chardet
    >>> chardet.detect(u"And the Hipâ€™s coming, too")
    {'confidence': 0.5, 'encoding': 'windows-1252'}

Comment (Rob Grant, Jan 13, 2019 at 15:25): Small warning: chardet is LGPL-licensed, so that's a consideration if it's going in something that's distributed to end users.

Answer (score 14, Mikko Ohtamaa, answered Oct 28, 2012 at 16:26)

You need to properly decode the source text. Most likely the source text is in UTF-8 format, not ASCII.

Because you do not provide any context or code for your question, it is not possible to give a direct answer.

I suggest you study how Unicode and character encodings are handled in Python: http://docs.python.org/2/howto/unicode.html

Comment (Eryk Sun, Oct 28, 2012 at 16:27): Yes, it's UTF-8 treated like Windows-1252: u'\N{RIGHT SINGLE QUOTATION MARK}'.encode('utf-8').decode('cp1252').
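The cause described in the last comment can be reproduced in Python 3, alongside the correct decoding that the second answer recommends. This is only a sketch: the raw bytes stand in for a fetched page body and are illustrative data, not taken from the question.

```python
# Bytes as they might arrive from a server that sends UTF-8 (illustrative data).
raw = "And the Hip\u2019s coming, too".encode("utf-8")

# Wrong: reading UTF-8 bytes as Windows-1252 turns the apostrophe into three characters.
wrong = raw.decode("cp1252")

# Right: decode with the encoding the bytes were actually written in.
right = raw.decode("utf-8")

# ascii() makes the difference visible regardless of terminal encoding.
print(ascii(wrong))
print(ascii(right))
```

Decoding the bytes correctly where they enter the program avoids the need for any later repair step.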