Handling encoding and decoding errors in Python

By John Lekberg on April 03, 2020.

This week's blog post is about handling errors when encoding and decoding data. You will learn 6 different ways to handle these errors, ranging from strictly requiring all data to be valid, to skipping over malformed data.

Codecs
Python uses coder-decoders (codecs) to

- Encode str objects into bytes objects.

    >>> "小島秀夫 (Hideo Kojima)".encode("shift_jis")
    b'\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)'

- Decode bytes objects into str objects.

    >>> b"\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)".decode("shift_jis")
    '小島秀夫 (Hideo Kojima)'

(Shift JIS is a codec for the Japanese language.)

What happens when a codec operation fails?
When a codec operation encounters malformed data, that's an error:

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii")
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

    >>> b"\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)".decode("ascii")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 0: ordinal not in range(128)

How can I deal with codec operation failures?
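Since the default behavior is to raise, the first way to deal with a failure is simply to catch the exception. The following is a minimal sketch of that idea, not code from the post: the helper name encode_or_none is hypothetical, and it relies only on the start, end, and encoding attributes that UnicodeEncodeError carries.

```python
def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if strict encoding fails."""
    try:
        # errors="strict" is the default, so a failure raises here.
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start and exc.end delimit the slice that could not be encoded.
        bad = text[exc.start:exc.end]
        print(f"cannot encode {bad!r} with the {exc.encoding!r} codec")
        return None


name = "小島秀夫 (Hideo Kojima)"
as_ascii = encode_or_none(name, "ascii")      # fails, so None
as_sjis = encode_or_none(name, "shift_jis")   # succeeds
```

Catching the exception keeps the strict guarantee (no silently altered data) while still letting the program decide what to do next.
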
Besides raising a UnicodeError exception, there are 5 other ways to deal with codec operation errors:

- When encoding and decoding, ignore malformed data:

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii", errors="ignore")
    b' (Hideo Kojima)'

    >>> b"\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)".decode("ascii", errors="ignore")
    'Gv (Hideo Kojima)'

- When encoding and decoding, replace malformed data with a replacement character ("�" when decoding and b"?" when encoding):

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii", errors="replace")
    b'???? (Hideo Kojima)'

    >>> b"\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)".decode("ascii", errors="replace")
    '�����G�v (Hideo Kojima)'

- When encoding and decoding, replace malformed data with backslashed escape sequences:

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii", errors="backslashreplace")
    b'\\u5c0f\\u5cf6\\u79c0\\u592b (Hideo Kojima)'

    >>> b"\x8f\xac\x93\x87\x8fG\x95v (Hideo Kojima)".decode("ascii", errors="backslashreplace")
    '\\x8f\\xac\\x93\\x87\\x8fG\\x95v (Hideo Kojima)'

- When encoding, replace malformed data with XML character references:

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii", errors="xmlcharrefreplace")
    b'&#23567;&#23798;&#31168;&#22827; (Hideo Kojima)'

- When encoding, replace malformed data with \N{...} (named unicode characters):

    >>> "小島秀夫 (Hideo Kojima)".encode("ascii", errors="namereplace")
    b'\\N{CJK UNIFIED IDEOGRAPH-5C0F}\\N{CJK UNIFIED IDEOGRAPH-5CF6}\\N{CJK UNIFIED IDEOGRAPH-79C0}\\N{CJK UNIFIED IDEOGRAPH-592B} (Hideo Kojima)'

(There is another error handler, "surrogateescape", that is out of the scope of this blog post.)

Different error handling strategies are useful in different contexts. Here's a table of the 6 different error handlers:

    errors=...             Do ... with malformed data
    "strict"               Raise UnicodeError
    "ignore"               Ignore and continue
    "replace"              Replace with replacement character
    "backslashreplace"     Replace with backslashed escape sequence
    "xmlcharrefreplace"    Replace with XML character reference
    "namereplace"          Replace with \N{...} (named unicode character)

"strict" is the default error handler.

Besides str.encode and bytes.decode, error handling is available ...

- With the built-in function open.
- With the pathlib module methods Path.open and Path.read_text.
- With the codecs module functions and classes decode, encode, open, EncodedFile, iterencode, iterdecode, Codec.encode, Codec.decode, IncrementalEncoder, IncrementalDecoder, StreamWriter, StreamReader, StreamReaderWriter, and StreamRecoder.
- With the io module functions and classes open and TextIOWrapper.

In conclusion...
In this post you learned 6 different ways to handle codec operation errors. The default strategy (errors="strict") raises an exception when an error occurs. But sometimes you want your program to continue processing data, either by omitting bad data (errors="ignore") or by replacing bad data with replacement characters (errors="replace"). If you are generating an HTML or an XML document, you can replace malformed data with XML character references (errors="xmlcharrefreplace").

My challenge to you:

This post discussed 6 different ways to handle codec operation errors. There is another way, "surrogateescape". Learn how to use "surrogateescape" and create an example of decoding-then-encoding a file using it.

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!

(If you spot any errors or typos on this post, contact me via my contact page.)
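
As a footnote to the list of places that accept errors=..., here is a minimal sketch of the same handlers applied to files through the built-in open and Path.read_text. The file names and the temporary directory are assumptions made for illustration and do not come from the post.

```python
import tempfile
from pathlib import Path

text = "小島秀夫 (Hideo Kojima)"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file containing Shift JIS bytes.
    source = Path(tmp) / "name.txt"
    source.write_bytes(text.encode("shift_jis"))

    # Built-in open: bytes that are not ASCII become U+FFFD.
    with open(source, encoding="ascii", errors="replace") as f:
        replaced = f.read()

    # pathlib: bytes that are not ASCII are dropped.
    ignored = source.read_text(encoding="ascii", errors="ignore")

    # Writing: characters that ASCII cannot represent become
    # XML character references.
    target = Path(tmp) / "name_ascii.txt"
    with open(target, "w", encoding="ascii", errors="xmlcharrefreplace") as f:
        f.write(text)
    written = target.read_bytes()

print(replaced)
print(ignored)
print(written)
```

The handler is chosen once, when the file is opened, and then applies to every read or write on that file object.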


