Handling encoding and decoding errors in Python
文章推薦指數: 80 %
Handling encoding and decoding errors in Python. By John Lekberg on April 03, 2020. This week's blog post is about handling errors when encoding and ... ReturntoBlog HandlingencodinganddecodingerrorsinPython ByJohnLekbergonApril03,2020. Thisweek'sblogpostisabouthandlingerrorswhenencodingand decodingdata. Youwilllearn6differentwaystohandletheseerrors,rangingfrom strictlyrequiringalldatatobevalid,toskippingovermalformed data. Codecs Pythonusescoder-decoders(codecs)to Encodestrobjectsintobytes objects. "小島秀夫(HideoKojima)".encode("shift_jis") b'\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)' Decodebytesobjectsintostrobjects. b"\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)".decode("shift_jis") '小島秀夫(HideoKojima)' (ShiftJISisacodecfortheJapaneselanguage.) Whathappenswhenacodecoperationfails? Whenacodecoperationencountersmalformeddata,that'sanerror: "小島秀夫(HideoKojima)".encode("ascii") UnicodeEncodeError:'ascii'codeccan'tencodecharactersin position0-1:ordinalnotinrange(128) b"\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)".decode("ascii") UnicodeDecodeError:'ascii'codeccan'tdecodebyte0x8fin position0:ordinalnotinrange(128) HowcanIdealwithcodecoperationfailures? BesidesraisingaUnicodeErrorexception, thereare5otherwaystodealwithcodecoperationerrors: Whenencodinganddecoding,ignoremalformeddata: "小島秀夫(HideoKojima)".encode("ascii",errors="ignore") b'(HideoKojima)' b"\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)".decode("ascii", errors="ignore") 'Gv(HideoKojima)' Whenencodinganddecoding,replacemalformeddatawitha replacementcharacter("�"andb"?"): "小島秀夫(HideoKojima)".encode("ascii",errors="replace") b'????(HideoKojima)' b"\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)".decode("ascii", errors="replace") '�����G�v(HideoKojima)' Whenencodinganddecoding,replacemalformeddatawith backslashedescapesequences: "小島秀夫(HideoKojima)".encode("ascii", errors="backslashreplace") b'\\u5c0f\\u5cf6\\u79c0\\u592b(HideoKojima)' b"\x8f\xac\x93\x87\x8fG\x95v(HideoKojima)".decode("ascii", errors="backslashreplace") '\\x8f\\xac\\x93\\x87\\x8fG\\x95v(HideoKojima)' Whenencoding,replacemalformeddatawithXMLcharacter references: "小島秀夫(HideoKojima)".encode("ascii", errors="xmlcharrefreplace") b'小島秀夫(HideoKojima)' Whenencoding,replacemalformeddatawith\N{...}(named unicodecharacters): "小島秀夫(HideoKojima)".encode("ascii",errors="namereplace") b'\\N{CJKUNIFIEDIDEOGRAPH-5C0F}\\N{CJKUNIFIEDIDEOGRAPH-5CF6} \\N{CJKUNIFIEDIDEOGRAPH-79C0}\\N{CJKUNIFIEDIDEOGRAPH-592B} (HideoKojima)' (Thereisanothererrorhandler,"surrogateescape",thatisoutof thescopeofthisblogpost.) Differenterrorhandlingstrategiesareusefulindifferentcontexts. Here'satableofthe6differenterrorshandlers: errors=...Do...withmalformeddata "strict"RaiseUnicodeError "ignore"Ignoreandcontinue "replace"Replacewithreplacementcharacter "backslashreplace"Replacewithbackslashedescapesequence "xmlcharrefreplace"ReplacewithXMLcharacterreference "namereplace"Replacewith\N{...}(namedunicodecharacter) "strict"isthedefaulterrorhandler. Besidesstr.encodeand bytes.decode,errorhandlingisavailable ... Withthebuilt-infunctionopen. Withthepathlibmodulefunctions Path.openand Path.read_text. Withthecodecsmodulefunctionsandclasses decode,encode, open,EncodedFile, iterencode, iterdecode, Codec.encode, Codec.decode, IncrementalEncoder, IncrementalDecoder, StreamWriter, Stream.Reader, StreamReaderWriter,and StreamRecoder. Withtheiomodulefunctionsandclassesopen, andTextIOWrapper. Inconclusion... Inthispostyoulearned6differentwaystohandlecodecoperation errors. Thedefaultstrategy(errors="strict")raisesanexceptionwhenan erroroccurs. But,sometimesyouwantyourprogramtocontinueprocessingdata, eitherbyomittingbaddata(errors="ignore")orbyreplacingbaddata withreplacementcharacters(errors="replace"). IfyouaregeneratingaHTMLoranXMLdocument,youcanreplace malformeddatawithXMLcharacterreferences (errors="xmlcharrefreplace"). Mychallengetoyou: Thispostdiscussed6differentwaystohandlecodecoperationerrors. Thereisanotherway, "surrogateescape". Learnhowtouse"surrogateescape"andcreateanexampleof decoding-then-encodingafileusingit. Ifyouenjoyedthisweek'spost,shareitwithyoufriendsandstay tunedfornextweek'spost. Seeyouthen! (Ifyouspotanyerrorsortyposonthispost,contactmeviamy contactpage.)
延伸文章資訊
- 1Handling encoding and decoding errors in Python
Handling encoding and decoding errors in Python. By John Lekberg on April 03, 2020. This week's b...
- 2python decode errors参数_Garb_v2的博客 - CSDN博客
decode的函数原型是decode([encoding], [errors='strict']),可以用第二个参数控制错误处理的策略,默认的参数就是strict,代表遇到非法字符时抛出 ...
- 3Python ASCII and Unicode decode error - Stack Overflow
Getting a decoding error when encoding seems like your string is not unicode. In this case IIRC p...
- 4Python decode()方法 - 菜鸟教程
Python decode() 方法以encoding 指定的编码格式解码字符串。默认编码为字符串编码。 语法. decode()方法语法: str.decode(encoding='UTF-8...
- 5Python中解码decode()与编码encode()与错误处理 ... - 博客园
errors may be given to set a different error handling scheme. The default for errors is 'strict' ...