Decoding with decode(), encoding with encode(), and error handling in Python ... - 博客园 (cnblogs)

2025-01-25


Decoding with decode(), encoding with encode(), and handling UnicodeDecodeError: 'gbk' codec can't decode byte 0xab

The encoding method encode()

Description: the encode() method encodes a string using the specified encoding. The default encoding is 'utf-8'.

It converts the string from type str to type bytes.

The corresponding decoding method is bytes.decode().

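Syntax: str.encode([encoding='utf-8'][, errors='strict'])

Here str is the string to be encoded, an object of type str.

encoding -- optional; the encoding to use. The default is 'utf-8'.

errors -- optional; the error handling scheme. The default is 'strict', meaning that an encoding error raises a UnicodeError (specifically its subclass UnicodeEncodeError).

Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace', and any name registered via codecs.register_error().

Return value: the method returns the encoded string as a bytes object, which is what the decoding step below operates on.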
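As a rough illustration of the errors parameter, the sketch below encodes one string to ASCII under each built-in handler. The sample text and the variable names (text, results, strict_error) are chosen here for illustration and do not come from the original post.

```python
# Sample text with two characters that ASCII cannot represent.
text = "naïve 我"

# 'strict' (the default) raises UnicodeEncodeError, a subclass of UnicodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("strict:", exc)

# The other built-in handlers never raise; each substitutes differently.
results = {
    handler: text.encode("ascii", errors=handler)
    for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}
for handler, encoded in results.items():
    print(f"{handler:>18}: {encoded!r}")
```

'ignore' drops the unencodable characters, 'replace' substitutes a question mark, and the last two keep the information in an escaped form, so they are usually the safer choice when the data matters.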

Official documentation: str.encode(encoding="utf-8", errors="strict")

Return an encoded version of the string as a bytes object. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

Changed in version 3.1: Support for keyword arguments added.

------------------------------------------------------------

The decoding method decode()

The decode() method decodes a bytes object into a string using the encoding given by the encoding argument.

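The default is encoding='utf-8'.

Syntax: bytes.decode(encoding='utf-8', errors='strict')

Parameters: bytes is the byte representation of a string, as produced by the encode() method.

encoding -- the encoding to use when decoding, e.g. "UTF-8".

errors -- the error handling scheme. The default is 'strict', meaning that a decoding error raises a UnicodeError (specifically its subclass UnicodeDecodeError).

Other possible values are 'ignore', 'replace', 'backslashreplace', and any name registered via codecs.register_error(). Unlike encode(), decode() does not support 'xmlcharrefreplace', which applies to encoding only.

Return value: the method returns the decoded string.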
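The same idea on the decoding side can be sketched as follows. The input bytes are made up for illustration: an ASCII string with a single 0xff byte in the middle, which is never valid in UTF-8.

```python
# Illustrative input: 0xff can never appear in valid UTF-8.
data = b"abc\xffdef"

# 'strict' (the default) raises UnicodeDecodeError, a subclass of UnicodeError.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_message = str(exc)
    print("strict:", strict_message)

# Handlers that are valid when decoding.
decoded = {
    handler: data.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}
for handler, result in decoded.items():
    print(f"{handler:>16}: {result!r}")
```

Here 'ignore' silently drops the bad byte, 'replace' inserts U+FFFD (the Unicode replacement character), and 'backslashreplace' keeps the byte visible as an escape sequence.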

Official documentation: bytes.decode(encoding="utf-8", errors="strict") and bytearray.decode(encoding="utf-8", errors="strict")

Return a string decoded from the given bytes. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

Note: Passing the encoding argument to str allows decoding any bytes-like object directly, without needing to make a temporary bytes or bytearray object.

Changed in version 3.1: Added support for keyword arguments.

In short, encoding and decoding relate as follows:

str -> bytes: encode
bytes -> str: decode

A string becomes bytes through encoding, and bytes become a string again through decoding.

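Put another way, encoding converts a string into bytes, that is, into a concrete byte representation of its characters.

Decoding converts bytes back into a string, interpreting the raw bits as characters.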
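A minimal sketch of this round trip, assuming nothing beyond the built-in str and bytes types. It also shows that the same text yields different bytes under different encodings, and that str() with an encoding argument decodes bytes-like objects directly. The variable names are illustrative.

```python
text = "我是文本"

# str -> bytes: the same four characters produce different byte sequences.
utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
gbk_bytes = text.encode("gbk")      # 2 bytes per character here
print(len(text), len(utf8_bytes), len(gbk_bytes))

# bytes -> str: decoding with the matching encoding restores the text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)

# str(obj, encoding) decodes any bytes-like object, such as a memoryview,
# without first copying it into a temporary bytes object.
view = memoryview(utf8_bytes)
from_view = str(view, "utf-8")
print(from_view)
```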

For example:

    >>> text = '我是文本'
    >>> text
    '我是文本'
    >>> print(text)
    我是文本
    >>> bytesText = text.encode()
    >>> bytesText
    b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe6\x9c\xac'
    >>> print(bytesText)
    b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe6\x9c\xac'
    >>> type(text)
    <class 'str'>
    >>> type(bytesText)
    <class 'bytes'>
    >>> textDecode = bytesText.decode()
    >>> textDecode
    '我是文本'
    >>> print(textDecode)
    我是文本

Example 2:

    >>> text = '我好吗'
    >>> byteText = text.encode('gbk')
    >>> byteText
    b'\xce\xd2\xba\xc3\xc2\xf0'
    >>> strText = byteText.decode('gbk')
    >>> strText
    '我好吗'
    >>> byteText.decode('utf-8')
    Traceback (most recent call last):
      File "G:\softs\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 1, in
        byteText.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte

The final call, byteText.decode('utf-8'), fails because the text '我好吗' was encoded with 'gbk' but is being decoded with 'utf-8'. The bytes produced by the 'gbk' codec are not valid UTF-8, so decoding fails.

Always decode bytes with the same encoding that was used to encode them.
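When the encoding of some bytes is not known in advance, one possible approach is to try a short list of candidates in order. The helper below, decode_with_candidates, is a hypothetical name introduced for this sketch, and the default candidate list is an assumption rather than a recommendation from the original post.

```python
def decode_with_candidates(data, candidates=("utf-8", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes strictly.

    Hypothetical helper for illustration. A successful decode only proves
    the bytes are *valid* in that encoding, not that the result is the
    intended text, so the stricter candidates should come first.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {candidates!r} could decode the data")


gbk_data = "我好吗".encode("gbk")
text, used = decode_with_candidates(gbk_data)
print(text, used)
```

UTF-8 is tried first because it rejects most non-UTF-8 input, whereas multibyte encodings such as GBK accept many arbitrary byte pairs and can return garbled text without raising.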

A few ways of handling this kind of error are given below for reference.

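Suppose an error like the following appears:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 11126: illegal multibyte sequence

Text encoding and decoding problems come up often in Python, and the decoding error in the title is one of the most common. The fixes are described here for 'gbk'; they apply equally with 'utf-8' in its place.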
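To make the failure concrete, here is a small sketch that reproduces the same class of error when reading a file, in the mirrored direction (a GBK-encoded file read as UTF-8). The temporary file and its contents are made up for the example. It also shows how the encoding and errors arguments of open() change the outcome.

```python
import os
import tempfile

# Hypothetical sample file: GBK-encoded text written as raw bytes.
original = "我好吗\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(original.encode("gbk"))
    path = tmp.name

try:
    # Wrong encoding with the default errors='strict': reading raises.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError as exc:
        strict_failed = True
        print("strict read failed:", exc.reason)

    # Matching encoding: the text is recovered exactly.
    with open(path, encoding="gbk") as f:
        correct = f.read()

    # Wrong encoding with errors='ignore': no exception, but data is lost.
    with open(path, encoding="utf-8", errors="ignore") as f:
        lossy = f.read()
finally:
    os.remove(path)

print(repr(correct), repr(lossy))
```

The last read illustrates the trade-off: errors='ignore' makes the exception go away, but what comes back is no longer the original text.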

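(1) First, specify the encoding when opening the file, e.g. open('1.txt', encoding='gbk').

(2) If (1) does not help, the text may contain special symbols outside the range of gbk. Try 'gb18030', which covers a wider range, e.g. open('1.txt', encoding='gb18030').

(3) If (2) still fails, the file contains bytes that are not valid even in 'gb18030'. They can be skipped with errors='ignore', e.g. open('1.txt', encoding='gb18030', errors='ignore'). Note that this silently discards the undecodable bytes.

(4) Another common fix is to read the file as bytes and decode explicitly: open('1.txt', 'rb').read().decode('gb18030', 'ignore'). In Python 3 the file must be opened in binary mode for this, because read() on a text-mode file already returns a str, which has no decode() method.

The decoding error I hit in the naive Bayes code from Chapter 4 of Machine Learning in Action was solved along these lines, by passing an explicit encoding together with errors='ignore' to open(), as in (3):

    # Excerpt: textParse() is defined elsewhere in the book's Chapter 4 code.
    def spamTest():
        docList = []
        classList = []
        fillText = []
        for i in range(1, 26):
            # Spam messages are labeled 1. Undecodable bytes are skipped.
            wordList = textParse(open('D:/machinelearningdata/machinelearninginaction/Ch04/email/spam/%d.txt' % i, encoding='utf-8', errors='ignore').read())
            # print('%d word:' % i)
            docList.append(wordList)
            fillText.extend(wordList)
            classList.append(1)
            # Ham messages are labeled 0.
            wordList = textParse(open('D:/machinelearningdata/machinelearninginaction/Ch04/email/ham/%d.txt' % i, encoding='utf-8', errors='ignore').read())
            docList.append(wordList)
            fillText.extend(wordList)
            classList.append(0)

The book's original code failed because a decoding error occurred while parsing 23.txt in the ham folder, which stopped the whole script. Opening every file with encoding='utf-8' and ignoring the errors lets it run normally.

References:

1. https://www.cnblogs.com/tingyugetc/p/5727383.html
2. https://blog.csdn.net/shijing_0214/article/details/51971734

posted @ 2018-04-12 09:57 樟樟22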
