Handling encoding and decoding errors in Python
By John Lekberg on April 03, 2020.

This week's blog post is about handling errors when encoding and decoding data. You will learn 6 different ways to handle these errors, ranging from strictly requiring all data to be valid, to skipping over malformed data.

Codecs

Python uses coder-decoders (codecs) to

- Encode str objects into bytes objects.

      >>> "小島 秀夫 (Hideo Kojima)".encode("shift_jis")
      b'\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)'

- Decode bytes objects into str objects.

      >>> b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)".decode("shift_jis")
      '小島 秀夫 (Hideo Kojima)'

(Shift JIS is a codec for the Japanese language.)

What happens when a codec operation fails?

When a codec operation encounters malformed data, that's an error:

    >>> "小島 秀夫 (Hideo Kojima)".encode("ascii")
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

    >>> b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)".decode("ascii")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 0: ordinal not in range(128)

How can I deal with codec operation failures?
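The simplest way to deal with a failure is to let the default behavior raise and catch the exception in the calling code. Here is a minimal sketch of that idea. The helper name `encode_or_none` is hypothetical (it is not from the standard library), and returning `None` on failure is just one possible policy.

```python
def encode_or_none(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        # errors="strict" is the default, so malformed data raises.
        return text.encode(encoding)
    except UnicodeError as exc:
        # UnicodeEncodeError is a subclass of UnicodeError.
        print(f"could not encode with {encoding!r}: {exc.reason}")
        return None


name = "小島 秀夫 (Hideo Kojima)"
print(encode_or_none(name, "shift_jis"))  # succeeds, prints the bytes
print(encode_or_none(name, "ascii"))      # fails, prints a message and None
```

Catching `UnicodeError` rather than `UnicodeEncodeError` means the same pattern would also cover decoding failures, since `UnicodeDecodeError` shares that base class.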
Besides raising a UnicodeError exception, there are 5 other ways to deal with codec operation errors:

- When encoding and decoding, ignore malformed data:

      >>> "小島 秀夫 (Hideo Kojima)".encode("ascii", errors="ignore")
      b'  (Hideo Kojima)'

      >>> b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)".decode("ascii", errors="ignore")
      ' Gv (Hideo Kojima)'

- When encoding and decoding, replace malformed data with a replacement character ("�" and b"?"):

      >>> "小島 秀夫 (Hideo Kojima)".encode("ascii", errors="replace")
      b'?? ?? (Hideo Kojima)'

      >>> b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)".decode("ascii", errors="replace")
      '���� �G�v (Hideo Kojima)'

- When encoding and decoding, replace malformed data with backslashed escape sequences:

      >>> "小島 秀夫 (Hideo Kojima)".encode("ascii", errors="backslashreplace")
      b'\\u5c0f\\u5cf6 \\u79c0\\u592b (Hideo Kojima)'

      >>> b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)".decode("ascii", errors="backslashreplace")
      '\\x8f\\xac\\x93\\x87 \\x8fG\\x95v (Hideo Kojima)'

- When encoding, replace malformed data with XML character references:

      >>> "小島 秀夫 (Hideo Kojima)".encode("ascii", errors="xmlcharrefreplace")
      b'&#23567;&#23798; &#31168;&#22827; (Hideo Kojima)'

- When encoding, replace malformed data with \N{...} (named unicode characters):

      >>> "小島 秀夫 (Hideo Kojima)".encode("ascii", errors="namereplace")
      b'\\N{CJK UNIFIED IDEOGRAPH-5C0F}\\N{CJK UNIFIED IDEOGRAPH-5CF6} \\N{CJK UNIFIED IDEOGRAPH-79C0}\\N{CJK UNIFIED IDEOGRAPH-592B} (Hideo Kojima)'

(There is another error handler, "surrogateescape", that is out of the scope of this blog post.)

Different error handling strategies are useful in different contexts. Here's a table of the 6 different error handlers:

    errors=...             Do ... with malformed data
    "strict"               Raise UnicodeError
    "ignore"               Ignore and continue
    "replace"              Replace with replacement character
    "backslashreplace"     Replace with backslashed escape sequence
    "xmlcharrefreplace"    Replace with XML character reference
    "namereplace"          Replace with \N{...} (named unicode character)

"strict" is the default error handler.

Besides str.encode and bytes.decode, error handling is available ...

- With the built-in function open.
- With the pathlib module functions Path.open and Path.read_text.
- With the codecs module functions and classes decode, encode, open, EncodedFile, iterencode, iterdecode, Codec.encode, Codec.decode, IncrementalEncoder, IncrementalDecoder, StreamWriter, StreamReader, StreamReaderWriter, and StreamRecoder.
- With the io module functions and classes open and TextIOWrapper.

In conclusion...

In this post you learned 6 different ways to handle codec operation errors. The default strategy (errors="strict") raises an exception when an error occurs. But sometimes you want your program to continue processing data, either by omitting bad data (errors="ignore") or by replacing bad data with replacement characters (errors="replace"). If you are generating an HTML or an XML document, you can replace malformed data with XML character references (errors="xmlcharrefreplace").

My challenge to you:

This post discussed 6 different ways to handle codec operation errors. There is another way, "surrogateescape". Learn how to use "surrogateescape" and create an example of decoding-then-encoding a file using it.

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!

(If you spot any errors or typos on this post, contact me via my contact page.)
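As a companion to the list of file APIs that accept an error handler, here is a minimal sketch of passing `errors=` to the built-in `open` and to `Path.read_text`. The file name `name.txt` and the temporary directory are assumptions made only for illustration, and the bytes are the Shift JIS data used in the examples earlier in the post.

```python
import tempfile
from pathlib import Path

# Shift JIS bytes from the earlier examples; they are not valid ASCII.
data = b"\x8f\xac\x93\x87 \x8fG\x95v (Hideo Kojima)"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "name.txt"  # hypothetical file name
    path.write_bytes(data)

    # Built-in open: each malformed byte becomes U+FFFD instead of raising.
    with open(path, encoding="ascii", errors="replace") as f:
        replaced = f.read()

    # pathlib: malformed bytes are silently dropped.
    ignored = path.read_text(encoding="ascii", errors="ignore")

    # Without errors=..., the default "strict" handler still raises.
    try:
        path.read_text(encoding="ascii")
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

print(replaced)
print(ignored)
print(strict_failed)
```

The handler names are the same strings used with str.encode and bytes.decode, so a strategy chosen for in-memory data can be reused unchanged when reading files.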