codecs — Codec registry and base classes — Python 3.10.7 ...
Source code: Lib/codecs.py

This module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry, which manages the codec and error handling lookup process. Most standard codecs are text encodings, which encode text to bytes (and decode bytes to text), but there are also codecs provided that encode text to text, and bytes to bytes. Custom codecs may encode and decode between arbitrary types, but some module features are restricted to be used specifically with text encodings or with codecs that encode to bytes.

The module defines the following functions for encoding and decoding with any codec:

codecs.encode(obj, encoding='utf-8', errors='strict')
    Encodes obj using the codec registered for encoding.

    Errors may be given to set the desired error handling scheme. The default error handler is 'strict', meaning that encoding errors raise ValueError (or a more codec specific subclass, such as UnicodeEncodeError). Refer to Codec Base Classes for more information on codec error handling.

codecs.decode(obj, encoding='utf-8', errors='strict')
    Decodes obj using the codec registered for encoding.

    Errors may be given to set the desired error handling scheme. The default error handler is 'strict', meaning that decoding errors raise ValueError (or a more codec specific subclass, such as UnicodeDecodeError). Refer to Codec Base Classes for more information on codec error handling.

The full details for each codec can also be looked up directly:

codecs.lookup(encoding)
    Looks up the codec info in the Python codec registry and returns a CodecInfo object as defined below.
    Encodings are first looked up in the registry’s cache. If not found, the list of registered search functions is scanned. If no CodecInfo object is found, a LookupError is raised. Otherwise, the CodecInfo object is stored in the cache and returned to the caller.

class codecs.CodecInfo(encode, decode, streamreader=None, streamwriter=None, incrementalencoder=None, incrementaldecoder=None, name=None)
    Codec details when looking up the codec registry. The constructor arguments are stored in attributes of the same name:

    name
        The name of the encoding.

    encode
    decode
        The stateless encoding and decoding functions. These must be functions or methods which have the same interface as the encode() and decode() methods of Codec instances (see Codec Interface). The functions or methods are expected to work in a stateless mode.

    incrementalencoder
    incrementaldecoder
        Incremental encoder and decoder classes or factory functions. These have to provide the interface defined by the base classes IncrementalEncoder and IncrementalDecoder, respectively. Incremental codecs can maintain state.

    streamwriter
    streamreader
        Stream writer and reader classes or factory functions. These have to provide the interface defined by the base classes StreamWriter and StreamReader, respectively. Stream codecs can maintain state.

To simplify access to the various codec components, the module provides these additional functions which use lookup() for the codec lookup:

codecs.getencoder(encoding)
    Look up the codec for the given encoding and return its encoder function.

    Raises a LookupError in case the encoding cannot be found.

codecs.getdecoder(encoding)
    Look up the codec for the given encoding and return its decoder function.

    Raises a LookupError in case the encoding cannot be found.

codecs.getincrementalencoder(encoding)
    Look up the codec for the given encoding and return its incremental encoder class or factory function.

    Raises a LookupError in case the encoding cannot be found or the codec doesn’t support an incremental encoder.

codecs.getincrementaldecoder(encoding)
    Look up the codec for the given encoding and return its incremental decoder class or factory function.
    Raises a LookupError in case the encoding cannot be found or the codec doesn’t support an incremental decoder.

codecs.getreader(encoding)
    Look up the codec for the given encoding and return its StreamReader class or factory function.

    Raises a LookupError in case the encoding cannot be found.

codecs.getwriter(encoding)
    Look up the codec for the given encoding and return its StreamWriter class or factory function.

    Raises a LookupError in case the encoding cannot be found.

Custom codecs are made available by registering a suitable codec search function:

codecs.register(search_function)
    Register a codec search function. Search functions are expected to take one argument, being the encoding name in all lower case letters with hyphens and spaces converted to underscores, and return a CodecInfo object. In case a search function cannot find a given encoding, it should return None.

    Changed in version 3.9: Hyphens and spaces are converted to underscore.

codecs.unregister(search_function)
    Unregister a codec search function and clear the registry’s cache. If the search function is not registered, do nothing.

    New in version 3.10.

While the builtin open() and the associated io module are the recommended approach for working with encoded text files, this module provides additional utility functions and classes that allow the use of a wider range of codecs when working with binary files:

codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=-1)
    Open an encoded file using the given mode and return an instance of StreamReaderWriter, providing transparent encoding/decoding. The default file mode is 'r', meaning to open the file in read mode.

    Note: Underlying encoded files are always opened in binary mode. No automatic conversion of '\n' is done on reading and writing. The mode argument may be any binary mode acceptable to the built-in open() function; the 'b' is automatically added.

    encoding specifies the encoding which is to be used for the file. Any encoding that encodes to and decodes from bytes is allowed, and the data types supported by the file methods depend on the codec used.

    errors may be given to define the error handling. It defaults to 'strict', which causes a ValueError to be raised in case an encoding error occurs.
    buffering has the same meaning as for the built-in open() function. It defaults to -1, which means that the default buffer size will be used.

codecs.EncodedFile(file, data_encoding, file_encoding=None, errors='strict')
    Return a StreamRecoder instance, a wrapped version of file which provides transparent transcoding. The original file is closed when the wrapped version is closed.

    Data written to the wrapped file is decoded according to the given data_encoding and then written to the original file as bytes using file_encoding. Bytes read from the original file are decoded according to file_encoding, and the result is encoded using data_encoding.

    If file_encoding is not given, it defaults to data_encoding.

    errors may be given to define the error handling. It defaults to 'strict', which causes ValueError to be raised in case an encoding error occurs.

codecs.iterencode(iterator, encoding, errors='strict', **kwargs)
    Uses an incremental encoder to iteratively encode the input provided by iterator. This function is a generator. The errors argument (as well as any other keyword argument) is passed through to the incremental encoder.

    This function requires that the codec accept text str objects to encode. Therefore it does not support bytes-to-bytes encoders such as base64_codec.

codecs.iterdecode(iterator, encoding, errors='strict', **kwargs)
    Uses an incremental decoder to iteratively decode the input provided by iterator. This function is a generator. The errors argument (as well as any other keyword argument) is passed through to the incremental decoder.

    This function requires that the codec accept bytes objects to decode. Therefore it does not support text-to-text encoders such as rot_13, although rot_13 may be used equivalently with iterencode().
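As a rough illustration of iterencode() and iterdecode(), the sketch below encodes a list of text chunks and then decodes the bytes again after splitting them in the middle of a multi-byte UTF-8 sequence. The sample strings and the split point are arbitrary choices made for this example, not something prescribed by the module.

```python
import codecs

# Text chunks, as they might be produced by a generator.
text_chunks = ["na\u00efve ", "caf\u00e9"]

# iterencode() passes every chunk through one incremental encoder.
encoded_chunks = list(codecs.iterencode(text_chunks, "utf-8"))

# Re-split the bytes so that the two-byte sequence for "é" (b"\xc3\xa9")
# is cut in half, mimicking arbitrary packet or buffer boundaries.
raw = b"".join(encoded_chunks)
byte_chunks = [raw[:-1], raw[-1:]]

# iterdecode() keeps the dangling b"\xc3" until the next chunk arrives.
decoded = "".join(codecs.iterdecode(byte_chunks, "utf-8"))

print(encoded_chunks)
print(decoded)
```

Decoding each byte chunk on its own with bytes.decode() would fail on the first chunk here, which is the situation the incremental interface is meant to handle.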
The module also provides the following constants which are useful for reading and writing to platform dependent files:

codecs.BOM
codecs.BOM_BE
codecs.BOM_LE
codecs.BOM_UTF8
codecs.BOM_UTF16
codecs.BOM_UTF16_BE
codecs.BOM_UTF16_LE
codecs.BOM_UTF32
codecs.BOM_UTF32_BE
codecs.BOM_UTF32_LE
    These constants define various byte sequences, being Unicode byte order marks (BOMs) for several encodings. They are used in UTF-16 and UTF-32 data streams to indicate the byte order used, and in UTF-8 as a Unicode signature. BOM_UTF16 is either BOM_UTF16_BE or BOM_UTF16_LE depending on the platform’s native byte order, BOM is an alias for BOM_UTF16, BOM_LE for BOM_UTF16_LE and BOM_BE for BOM_UTF16_BE. The others represent the BOM in UTF-8 and UTF-32 encodings.

Codec Base Classes

The codecs module defines a set of base classes which define the interfaces for working with codec objects, and can also be used as the basis for custom codec implementations.

Each codec has to define four interfaces to make it usable as codec in Python: stateless encoder, stateless decoder, stream reader and stream writer. The stream reader and writers typically reuse the stateless encoder/decoder to implement the file protocols. Codec authors also need to define how the codec will handle encoding and decoding errors.

Error Handlers

To simplify and standardize error handling, codecs may implement different error handling schemes by accepting the errors string argument:

>>> 'German ß, ♬'.encode(encoding='ascii', errors='backslashreplace')
b'German \\xdf, \\u266c'
>>> 'German ß, ♬'.encode(encoding='ascii', errors='xmlcharrefreplace')
b'German &#223;, &#9836;'

The following error handlers can be used with all Python Standard Encodings codecs:

'strict'
    Raise UnicodeError (or a subclass); this is the default. Implemented in strict_errors().

'ignore'
    Ignore the malformed data and continue without further notice. Implemented in ignore_errors().

'replace'
    Replace with a replacement marker. On encoding, use ? (ASCII character). On decoding, use � (U+FFFD, the official REPLACEMENT CHARACTER). Implemented in replace_errors().

'backslashreplace'
    Replace with backslashed escape sequences.
    On encoding, use hexadecimal form of Unicode code point with formats \xhh \uxxxx \Uxxxxxxxx. On decoding, use hexadecimal form of byte value with format \xhh. Implemented in backslashreplace_errors().

'surrogateescape'
    On decoding, replace byte with individual surrogate code ranging from U+DC80 to U+DCFF. This code will then be turned back into the same byte when the 'surrogateescape' error handler is used when encoding the data. (See PEP 383 for more.)

The following error handlers are only applicable to encoding (within text encodings):

'xmlcharrefreplace'
    Replace with XML/HTML numeric character reference, which is a decimal form of Unicode code point with format &#num;. Implemented in xmlcharrefreplace_errors().

'namereplace'
    Replace with \N{...} escape sequences; what appears in the braces is the Name property from Unicode Character Database. Implemented in namereplace_errors().

In addition, the following error handler is specific to the given codecs:

'surrogatepass' (codecs: utf-8, utf-16, utf-32, utf-16-be, utf-16-le, utf-32-be, utf-32-le)
    Allow encoding and decoding surrogate code point (U+D800 - U+DFFF) as normal code point. Otherwise these codecs treat the presence of surrogate code point in str as an error.

New in version 3.1: The 'surrogateescape' and 'surrogatepass' error handlers.

Changed in version 3.4: The 'surrogatepass' error handler now works with utf-16* and utf-32* codecs.

New in version 3.5: The 'namereplace' error handler.

Changed in version 3.5: The 'backslashreplace' error handler now works with decoding and translating.

The set of allowed values can be extended by registering a new named error handler:

codecs.register_error(name, error_handler)
    Register the error handling function error_handler under the name name. The error_handler argument will be called during encoding and decoding in case of an error, when name is specified as the errors parameter.
    For encoding, error_handler will be called with a UnicodeEncodeError instance, which contains information about the location of the error. The error handler must either raise this or a different exception, or return a tuple with a replacement for the unencodable part of the input and a position where encoding should continue. The replacement may be either str or bytes. If the replacement is bytes, the encoder will simply copy them into the output buffer. If the replacement is a string, the encoder will encode the replacement. Encoding continues on original input at the specified position. Negative position values will be treated as being relative to the end of the input string. If the resulting position is out of bound an IndexError will be raised.

    Decoding and translating works similarly, except UnicodeDecodeError or UnicodeTranslateError will be passed to the handler and that the replacement from the error handler will be put into the output directly.

Previously registered error handlers (including the standard error handlers) can be looked up by name:

codecs.lookup_error(name)
    Return the error handler previously registered under the name name.

    Raises a LookupError in case the handler cannot be found.

The following standard error handlers are also made available as module level functions:

codecs.strict_errors(exception)
    Implements the 'strict' error handling.

    Each encoding or decoding error raises a UnicodeError.

codecs.ignore_errors(exception)
    Implements the 'ignore' error handling.

    Malformed data is ignored; encoding or decoding is continued without further notice.

codecs.replace_errors(exception)
    Implements the 'replace' error handling.

    Substitutes ? (ASCII character) for encoding errors or � (U+FFFD, the official REPLACEMENT CHARACTER) for decoding errors.

codecs.backslashreplace_errors(exception)
    Implements the 'backslashreplace' error handling.

    Malformed data is replaced by a backslashed escape sequence. On encoding, use the hexadecimal form of Unicode code point with formats \xhh \uxxxx \Uxxxxxxxx. On decoding, use the hexadecimal form of byte value with format \xhh.

    Changed in version 3.5: Works with decoding and translating.
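The register_error() contract described above can be sketched with a small encoding-only handler. The handler name "demo-underscore" and the function underscore_errors are hypothetical names invented for this example; the handler replaces each unencodable run with underscores and tells the encoder to resume right after it.

```python
import codecs


def underscore_errors(exc):
    """Hypothetical handler: replace unencodable characters with '_'."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start and exc.end delimit the unencodable slice of exc.object.
        replacement = "_" * (exc.end - exc.start)
        # Return the replacement and the position where encoding resumes.
        return replacement, exc.end
    # This sketch does not handle decoding or translating errors.
    raise exc


codecs.register_error("demo-underscore", underscore_errors)

result = "caf\u00e9 \u266c".encode("ascii", errors="demo-underscore")
print(result)

# The handler can be retrieved again by the name it was registered under.
same_handler = codecs.lookup_error("demo-underscore")
```

Returning a str replacement means the encoder encodes it with the target codec, so the replacement itself has to be encodable; returning bytes would bypass that step.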
codecs.xmlcharrefreplace_errors(exception)
    Implements the 'xmlcharrefreplace' error handling (for encoding within text encoding only).

    The unencodable character is replaced by an appropriate XML/HTML numeric character reference, which is a decimal form of Unicode code point with format &#num;.

codecs.namereplace_errors(exception)
    Implements the 'namereplace' error handling (for encoding within text encoding only).

    The unencodable character is replaced by a \N{...} escape sequence. The set of characters that appear in the braces is the Name property from Unicode Character Database. For example, the German lowercase letter 'ß' will be converted to byte sequence \N{LATIN SMALL LETTER SHARP S}.

    New in version 3.5.

Stateless Encoding and Decoding

The base Codec class defines these methods which also define the function interfaces of the stateless encoder and decoder:

Codec.encode(input, errors='strict')
    Encodes the object input and returns a tuple (output object, length consumed). For instance, text encoding converts a string object to a bytes object using a particular character set encoding (e.g., cp1252 or iso-8859-1).

    The errors argument defines the error handling to apply. It defaults to 'strict' handling.

    The method may not store state in the Codec instance. Use StreamWriter for codecs which have to keep state in order to make encoding efficient.

    The encoder must be able to handle zero length input and return an empty object of the output object type in this situation.

Codec.decode(input, errors='strict')
    Decodes the object input and returns a tuple (output object, length consumed). For instance, for a text encoding, decoding converts a bytes object encoded using a particular character set encoding to a string object.

    For text encodings and bytes-to-bytes codecs, input must be a bytes object or one which provides the read-only buffer interface, for example buffer objects and memory mapped files.

    The errors argument defines the error handling to apply. It defaults to 'strict' handling.

    The method may not store state in the Codec instance. Use StreamReader for codecs which have to keep state in order to make decoding efficient.

    The decoder must be able to handle zero length input and return an empty object of the output object type in this situation.
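The (output object, length consumed) return value of the stateless interface can be observed through the functions returned by codecs.getencoder() and codecs.getdecoder(). The following is a small sketch using UTF-8; the sample string is arbitrary.

```python
import codecs

# Stateless encoder and decoder functions for UTF-8.
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")

# Encoding consumes 5 characters and produces 6 bytes ("é" needs two).
data, chars_consumed = encode("h\u00e9llo")

# Decoding consumes those 6 bytes and returns the original 5 characters.
text, bytes_consumed = decode(data)

# Zero-length input yields an empty object of the output type.
empty_result = encode("")

print(data, chars_consumed)
print(text, bytes_consumed)
print(empty_result)
```

Note that the consumed length is counted in units of the input: characters when encoding, bytes when decoding.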
Incremental Encoding and Decoding

The IncrementalEncoder and IncrementalDecoder classes provide the basic interface for incremental encoding and decoding. Encoding/decoding the input isn’t done with one call to the stateless encoder/decoder function, but with multiple calls to the encode()/decode() method of the incremental encoder/decoder. The incremental encoder/decoder keeps track of the encoding/decoding process during method calls.

The joined output of calls to the encode()/decode() method is the same as if all the single inputs were joined into one, and this input was encoded/decoded with the stateless encoder/decoder.

IncrementalEncoder Objects

The IncrementalEncoder class is used for encoding an input in multiple steps. It defines the following methods which every incremental encoder must define in order to be compatible with the Python codec registry.

class codecs.IncrementalEncoder(errors='strict')
    Constructor for an IncrementalEncoder instance.

    All incremental encoders must provide this constructor interface. They are free to add additional keyword arguments, but only the ones defined here are used by the Python codec registry.

    The IncrementalEncoder may implement different error handling schemes by providing the errors keyword argument. See Error Handlers for possible values.

    The errors argument will be assigned to an attribute of the same name. Assigning to this attribute makes it possible to switch between different error handling strategies during the lifetime of the IncrementalEncoder object.

    encode(object, final=False)
        Encodes object (taking the current state of the encoder into account) and returns the resulting encoded object. If this is the last call to encode(), final must be true (the default is false).

    reset()
        Reset the encoder to the initial state. The output is discarded: call .encode(object, final=True), passing an empty byte or text string if necessary, to reset the encoder and to get the output.

    getstate()
        Return the current state of the encoder, which must be an integer. The implementation should make sure that 0 is the most common state. (States that are more complicated than integers can be converted into an integer by marshaling/pickling the state and encoding the bytes of the resulting string into an integer.)
    setstate(state)
        Set the state of the encoder to state. state must be an encoder state returned by getstate().

IncrementalDecoder Objects

The IncrementalDecoder class is used for decoding an input in multiple steps. It defines the following methods which every incremental decoder must define in order to be compatible with the Python codec registry.

class codecs.IncrementalDecoder(errors='strict')
    Constructor for an IncrementalDecoder instance.

    All incremental decoders must provide this constructor interface. They are free to add additional keyword arguments, but only the ones defined here are used by the Python codec registry.

    The IncrementalDecoder may implement different error handling schemes by providing the errors keyword argument. See Error Handlers for possible values.

    The errors argument will be assigned to an attribute of the same name. Assigning to this attribute makes it possible to switch between different error handling strategies during the lifetime of the IncrementalDecoder object.

    decode(object, final=False)
        Decodes object (taking the current state of the decoder into account) and returns the resulting decoded object. If this is the last call to decode(), final must be true (the default is false). If final is true the decoder must decode the input completely and must flush all buffers. If this isn’t possible (e.g. because of incomplete byte sequences at the end of the input) it must initiate error handling just like in the stateless case (which might raise an exception).

    reset()
        Reset the decoder to the initial state.
    getstate()
        Return the current state of the decoder. This must be a tuple with two items; the first must be the buffer containing the still undecoded input. The second must be an integer and can be additional state info. (The implementation should make sure that 0 is the most common additional state info.) If this additional state info is 0 it must be possible to set the decoder to the state which has no input buffered and 0 as the additional state info, so that feeding the previously buffered input to the decoder returns it to the previous state without producing any output. (Additional state info that is more complicated than integers can be converted into an integer by marshaling/pickling the info and encoding the bytes of the resulting string into an integer.)

    setstate(state)
        Set the state of the decoder to state. state must be a decoder state returned by getstate().

Stream Encoding and Decoding

The StreamWriter and StreamReader classes provide generic working interfaces which can be used to implement new encoding submodules very easily. See encodings.utf_8 for an example of how this is done.

StreamWriter Objects

The StreamWriter class is a subclass of Codec and defines the following methods which every stream writer must define in order to be compatible with the Python codec registry.

class codecs.StreamWriter(stream, errors='strict')
    Constructor for a StreamWriter instance.

    All stream writers must provide this constructor interface. They are free to add additional keyword arguments, but only the ones defined here are used by the Python codec registry.

    The stream argument must be a file-like object open for writing text or binary data, as appropriate for the specific codec.

    The StreamWriter may implement different error handling schemes by providing the errors keyword argument. See Error Handlers for the standard error handlers the underlying stream codec may support.

    The errors argument will be assigned to an attribute of the same name. Assigning to this attribute makes it possible to switch between different error handling strategies during the lifetime of the StreamWriter object.

    write(object)
        Writes the object’s contents encoded to the stream.
    writelines(list)
        Writes the concatenated iterable of strings to the stream (possibly by reusing the write() method). Infinite or very large iterables are not supported. The standard bytes-to-bytes codecs do not support this method.

    reset()
        Resets the codec buffers used for keeping internal state.

        Calling this method should ensure that the data on the output is put into a clean state that allows appending of new fresh data without having to rescan the whole stream to recover state.

In addition to the above methods, the StreamWriter must also inherit all other methods and attributes from the underlying stream.

StreamReader Objects

The StreamReader class is a subclass of Codec and defines the following methods which every stream reader must define in order to be compatible with the Python codec registry.

class codecs.StreamReader(stream, errors='strict')
    Constructor for a StreamReader instance.

    All stream readers must provide this constructor interface. They are free to add additional keyword arguments, but only the ones defined here are used by the Python codec registry.

    The stream argument must be a file-like object open for reading text or binary data, as appropriate for the specific codec.

    The StreamReader may implement different error handling schemes by providing the errors keyword argument. See Error Handlers for the standard error handlers the underlying stream codec may support.

    The errors argument will be assigned to an attribute of the same name. Assigning to this attribute makes it possible to switch between different error handling strategies during the lifetime of the StreamReader object.

    The set of allowed values for the errors argument can be extended with register_error().

    read(size=-1, chars=-1, firstline=False)
        Decodes data from the stream and returns the resulting object.

        The chars argument indicates the number of decoded code points or bytes to return. The read() method will never return more data than requested, but it might return less, if there is not enough available.
        The size argument indicates the approximate maximum number of encoded bytes or code points to read for decoding. The decoder can modify this setting as appropriate. The default value -1 indicates to read and decode as much as possible. This parameter is intended to prevent having to decode huge files in one step.

        The firstline flag indicates that it would be sufficient to only return the first line, if there are decoding errors on later lines.

        The method should use a greedy read strategy, meaning that it should read as much data as is allowed within the definition of the encoding and the given size, e.g. if optional encoding endings or state markers are available on the stream, these should be read too.

    readline(size=None, keepends=True)
        Read one line from the input stream and return the decoded data.

        size, if given, is passed as size argument to the stream’s read() method.

        If keepends is false, line-endings will be stripped from the lines returned.

    readlines(sizehint=None, keepends=True)
        Read all lines available on the input stream and return them as a list of lines.

        Line-endings are implemented using the codec’s decode() method and are included in the list entries if keepends is true.

        sizehint, if given, is passed as the size argument to the stream’s read() method.

    reset()
        Resets the codec buffers used for keeping internal state.

        Note that no stream repositioning should take place. This method is primarily intended to be able to recover from decoding errors.

In addition to the above methods, the StreamReader must also inherit all other methods and attributes from the underlying stream.

StreamReaderWriter Objects

The StreamReaderWriter is a convenience class that allows wrapping streams which work in both read and write modes.

The design is such that one can use the factory functions returned by the lookup() function to construct the instance.

class codecs.StreamReaderWriter(stream, Reader, Writer, errors='strict')
    Creates a StreamReaderWriter instance. stream must be a file-like object. Reader and Writer must be factory functions or classes providing the StreamReader and StreamWriter interface respectively. Error handling is done in the same way as defined for the stream readers and writers.
StreamReaderWriter instances define the combined interfaces of StreamReader and StreamWriter classes. They inherit all other methods and attributes from the underlying stream.

StreamRecoder Objects

The StreamRecoder translates data from one encoding to another, which is sometimes useful when dealing with different encoding environments.

The design is such that one can use the factory functions returned by the lookup() function to construct the instance.

class codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors='strict')
    Creates a StreamRecoder instance which implements a two-way conversion: encode and decode work on the frontend (the data visible to code calling read() and write()), while Reader and Writer work on the backend (the data in stream).

    You can use these objects to do transparent transcodings, e.g., from Latin-1 to UTF-8 and back.

    The stream argument must be a file-like object.

    The encode and decode arguments must adhere to the Codec interface. Reader and Writer must be factory functions or classes providing objects of the StreamReader and StreamWriter interface respectively.

    Error handling is done in the same way as defined for the stream readers and writers.

StreamRecoder instances define the combined interfaces of StreamReader and StreamWriter classes. They inherit all other methods and attributes from the underlying stream.

Encodings and Unicode

Strings are stored internally as sequences of code points in range U+0000–U+10FFFF. (See PEP 393 for more details about the implementation.) Once a string object is used outside of CPU and memory, endianness and how these arrays are stored as bytes become an issue. As with other codecs, serialising a string into a sequence of bytes is known as encoding, and recreating the string from the sequence of bytes is known as decoding. There are a variety of different text serialisation codecs, which are collectively referred to as text encodings.
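To make the encoding/decoding terminology concrete, the sketch below serialises one short string with a few text encodings and decodes it again. The choice of string and of encodings is only for illustration; the byte values in the comments follow from the definitions of these encodings.

```python
text = "caf\u00e9"  # four code points; the last one is U+00E9

# The same string serialised by different text encodings.
as_latin1 = text.encode("latin-1")      # one byte per code point
as_utf8 = text.encode("utf-8")          # U+00E9 takes two bytes
as_utf16_le = text.encode("utf-16-le")  # two bytes each, low byte first
as_utf16_be = text.encode("utf-16-be")  # two bytes each, high byte first

for name, data in [
    ("latin-1", as_latin1),
    ("utf-8", as_utf8),
    ("utf-16-le", as_utf16_le),
    ("utf-16-be", as_utf16_be),
]:
    # Decoding with the matching codec recreates the original string.
    assert data.decode(name) == text
    print(f"{name:10} {len(data)} bytes  {data!r}")
```

The two UTF-16 variants contain the same byte pairs in opposite order, which is the endianness issue mentioned above.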
The simplest text encoding (called 'latin-1' or 'iso-8859-1') maps the code points 0–255 to the bytes 0x0–0xff, which means that a string object that contains code points above U+00FF can’t be encoded with this codec. Doing so will raise a UnicodeEncodeError that looks like the following (although the details of the error message may differ): UnicodeEncodeError: 'latin-1' codec can't encode character '\u1234' in position 3: ordinal not in range(256).

There’s another group of encodings (the so-called charmap encodings) that choose a different subset of all Unicode code points and how these code points are mapped to the bytes 0x0–0xff. To see how this is done simply open e.g. encodings/cp1252.py (which is an encoding that is used primarily on Windows). There’s a string constant with 256 characters that shows you which character is mapped to which byte value.

All of these encodings can only encode 256 of the 1114112 code points defined in Unicode. A simple and straightforward way that can store each Unicode code point is to store each code point as four consecutive bytes. There are two possibilities: store the bytes in big endian or in little endian order. These two encodings are called UTF-32-BE and UTF-32-LE respectively. Their disadvantage is that if e.g. you use UTF-32-BE on a little endian machine you will always have to swap bytes on encoding and decoding. UTF-32 avoids this problem: bytes will always be in natural endianness. When these bytes are read by a CPU with a different endianness, then bytes have to be swapped though. To be able to detect the endianness of a UTF-16 or UTF-32 byte sequence, there’s the so-called BOM (“Byte Order Mark”). This is the Unicode character U+FEFF. This character can be prepended to every UTF-16 or UTF-32 byte sequence. The byte swapped version of this character (0xFFFE) is an illegal character that may not appear in a Unicode text. So when the first character in a UTF-16 or UTF-32 byte sequence appears to be a U+FFFE the bytes have to be swapped on decoding. Unfortunately the character U+FEFF had a second purpose as a ZERO WIDTH NO-BREAK SPACE: a character that has no width and doesn’t allow a word to be split. It can e.g. be used to give hints to a ligature algorithm.
With Unicode 4.0 using U+FEFF as a ZERO WIDTH NO-BREAK SPACE has been deprecated (with U+2060 (WORD JOINER) assuming this role). Nevertheless Unicode software still must be able to handle U+FEFF in both roles: as a BOM it’s a device to determine the storage layout of the encoded bytes, and vanishes once the byte sequence has been decoded into a string; as a ZERO WIDTH NO-BREAK SPACE it’s a normal character that will be decoded like any other.

There’s another encoding that is able to encode the full range of Unicode characters: UTF-8. UTF-8 is an 8-bit encoding, which means there are no issues with byte order in UTF-8. Each byte in a UTF-8 byte sequence consists of two parts: marker bits (the most significant bits) and payload bits. The marker bits are a sequence of zero to four 1 bits followed by a 0 bit. Unicode characters are encoded like this (with x being payload bits, which when concatenated give the Unicode character):

Range | Encoding
U-00000000 … U-0000007F | 0xxxxxxx
U-00000080 … U-000007FF | 110xxxxx 10xxxxxx
U-00000800 … U-0000FFFF | 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 … U-0010FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The least significant bit of the Unicode character is the rightmost x bit.

As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in the decoded string (even if it’s the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.

Without external information it’s impossible to reliably determine which encoding was used for encoding a string. Each charmap encoding can decode any random byte sequence. However that’s not possible with UTF-8, as UTF-8 byte sequences have a structure that doesn’t allow arbitrary byte sequences. To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable that any charmap encoded file starts with these byte values (which would e.g. map to

    LATIN SMALL LETTER I WITH DIAERESIS
    RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    INVERTED QUESTION MARK

in iso-8859-1), this increases the probability that a utf-8-sig encoding can be correctly guessed from the byte sequence. So here the BOM is not used to be able to determine the byte order used for generating the byte sequence, but as a signature that helps in guessing the encoding. On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file. In UTF-8, the use of the BOM is discouraged and should generally be avoided.

Standard Encodings

Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.

CPython implementation detail: Some common encodings can bypass the codecs lookup machinery to improve performance. These optimization opportunities are only recognized by CPython for a limited set of (case insensitive) aliases: utf-8, utf8, latin-1, latin1, iso-8859-1, iso8859-1, mbcs (Windows only), ascii, us-ascii, utf-16, utf16, utf-32, utf32, and the same using underscores instead of dashes. Using alternative aliases for these encodings may result in slower execution.

Changed in version 3.6: Optimization opportunity recognized for us-ascii.
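The alias rule stated above (spellings that differ only in case, or in hyphen versus underscore, name the same codec) can be checked with codecs.lookup(). This is a small sketch; the particular spellings are taken from the alias columns of the table that follows, and "no-such-codec" is a made-up name used to trigger the error path.

```python
import codecs

# Several spellings and aliases that should all resolve to the UTF-8 codec.
utf8_spellings = ["utf-8", "UTF-8", "utf_8", "utf8", "U8"]
utf8_names = {s: codecs.lookup(s).name for s in utf8_spellings}

# The same for Latin-1 and some of its aliases.
latin1_spellings = ["latin-1", "latin1", "iso-8859-1", "L1"]
latin1_names = {codecs.lookup(s).name for s in latin1_spellings}

print(utf8_names)
print(latin1_names)

# An unknown name raises LookupError.
try:
    codecs.lookup("no-such-codec")
    unknown_found = True
except LookupError:
    unknown_found = False
```

The name attribute of the returned CodecInfo is the codec's canonical name, which need not match the spelling that was passed in.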
Manyofthecharactersetssupportthesamelanguages.Theyvaryinindividual characters(e.g.whethertheEUROSIGNissupportedornot),andinthe assignmentofcharacterstocodepositions.FortheEuropeanlanguagesin particular,thefollowingvariantstypicallyexist: anISO8859codeset aMicrosoftWindowscodepage,whichistypicallyderivedfroman8859codeset, butreplacescontrolcharacterswithadditionalgraphiccharacters anIBMEBCDICcodepage anIBMPCcodepage,whichisASCIIcompatible ascii 646,us-ascii English big5 big5-tw,csbig5 TraditionalChinese big5hkscs big5-hkscs,hkscs TraditionalChinese cp037 IBM037,IBM039 English cp273 273,IBM273,csIBM273 German Newinversion3.4. cp424 EBCDIC-CP-HE,IBM424 Hebrew cp437 437,IBM437 English cp500 EBCDIC-CP-BE,EBCDIC-CP-CH, IBM500 WesternEurope cp720 Arabic cp737 Greek cp775 IBM775 Balticlanguages cp850 850,IBM850 WesternEurope cp852 852,IBM852 CentralandEasternEurope cp855 855,IBM855 Bulgarian,Byelorussian, Macedonian,Russian,Serbian cp856 Hebrew cp857 857,IBM857 Turkish cp858 858,IBM858 WesternEurope cp860 860,IBM860 Portuguese cp861 861,CP-IS,IBM861 Icelandic cp862 862,IBM862 Hebrew cp863 863,IBM863 Canadian cp864 IBM864 Arabic cp865 865,IBM865 Danish,Norwegian cp866 866,IBM866 Russian cp869 869,CP-GR,IBM869 Greek cp874 Thai cp875 Greek cp932 932,ms932,mskanji,ms-kanji Japanese cp949 949,ms949,uhc Korean cp950 950,ms950 TraditionalChinese cp1006 Urdu cp1026 ibm1026 Turkish cp1125 1125,ibm1125,cp866u,ruscii Ukrainian Newinversion3.4. 
cp1140 | ibm1140 | Western Europe
cp1250 | windows-1250 | Central and Eastern Europe
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252 | windows-1252 | Western Europe
cp1253 | windows-1253 | Greek
cp1254 | windows-1254 | Turkish
cp1255 | windows-1255 | Hebrew
cp1256 | windows-1256 | Arabic
cp1257 | windows-1257 | Baltic languages
cp1258 | windows-1258 | Vietnamese
euc_jp | eucjp, ujis, u-jis | Japanese
euc_jis_2004 | jisx0213, eucjis2004 | Japanese
euc_jisx0213 | eucjisx0213 | Japanese
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean
gb2312 | chinese, csiso58gb231280, euc-cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso-ir-58 | Simplified Chinese
gbk | 936, cp936, ms936 | Unified Chinese
gb18030 | gb18030-2000 | Unified Chinese
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | Western Europe
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6 | iso-8859-6, arabic | Arabic
iso8859_7 | iso-8859-7, greek, greek8 | Greek
iso8859_8 | iso-8859-8, hebrew | Hebrew
iso8859_9 | iso-8859-9, latin5, L5 | Turkish
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages
iso8859_11 | iso-8859-11, thai | Thai languages
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe
johab | cp1361, ms1361 | Korean
koi8_r |  | Russian
koi8_t |  | Tajik (New in version 3.5.)
koi8_u |  | Ukrainian
kz1048 | kz_1048, strk1048_2002, rk1048 | Kazakh (New in version 3.5.)
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek | macgreek | Greek
mac_iceland | maciceland | Icelandic
mac_latin2 | maclatin2, maccentraleurope, mac_centeuro | Central and Eastern Europe
mac_roman | macroman, macintosh | Western Europe
mac_turkish | macturkish | Turkish
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese
utf_32 | U32, utf32 | all languages
utf_32_be | UTF-32BE | all languages
utf_32_le | UTF-32LE | all languages
utf_16 | U16, utf16 | all languages
utf_16_be | UTF-16BE | all languages
utf_16_le | UTF-16LE | all languages
utf_7 | U7, unicode-1-1-utf-7 | all languages
utf_8 | U8, UTF, utf8, cp65001 | all languages
utf_8_sig |  | all languages

Changed in version 3.4: The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800–U+DFFF) to be encoded. The utf-32* decoders no longer decode byte sequences that correspond to surrogate code points.

Changed in version 3.8: cp65001 is now an alias to utf_8.

Python Specific Encodings¶

A number of predefined codecs are specific to Python, so their codec names have no meaning outside Python. These are listed in the tables below based on the expected input and output types (note that while text encodings are the most common use case for codecs, the underlying codec infrastructure supports arbitrary data transforms rather than just text encodings). For asymmetric codecs, the stated meaning describes the encoding direction.

Text Encodings¶

The following codecs provide str to bytes encoding and bytes-like object to str decoding, similar to the Unicode text encodings.

Codec | Aliases | Meaning
idna |  | Implement RFC 3490, see also encodings.idna. Only errors='strict' is supported.
mbcs | ansi, dbcs | Windows only: Encode the operand according to the ANSI codepage (CP_ACP).
oem |  | Windows only: Encode the operand according to the OEM codepage (CP_OEMCP). (New in version 3.6.)
palmos |  | Encoding of PalmOS 3.5.
punycode |  | Implement RFC 3492. Stateful codecs are not supported.
raw_unicode_escape |  | Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.
undefined |  | Raise an exception for all conversions, even empty strings. The error handler is ignored.
unicode_escape |  | Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decode from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.

Changed in version 3.8: “unicode_internal” codec is removed.

Binary Transforms¶

The following codecs provide binary transforms: bytes-like object to bytes mappings. They are not supported by bytes.decode() (which only produces str output).

Codec | Aliases | Meaning | Encoder / decoder
base64_codec [1] | base64, base_64 | Convert the operand to multiline MIME base64 (the result always includes a trailing '\n'). Changed in version 3.4: accepts any bytes-like object as input for encoding and decoding | base64.encodebytes() / base64.decodebytes()
bz2_codec | bz2 | Compress the operand using bz2. | bz2.compress() / bz2.decompress()
hex_codec | hex | Convert the operand to hexadecimal representation, with two digits per byte. | binascii.b2a_hex() / binascii.a2b_hex()
quopri_codec | quopri, quotedprintable, quoted_printable | Convert the operand to MIME quoted printable. | quopri.encode() with quotetabs=True / quopri.decode()
uu_codec | uu | Convert the operand using uuencode. | uu.encode() / uu.decode()
zlib_codec | zip, zlib | Compress the operand using gzip. | zlib.compress() / zlib.decompress()

[1] In addition to bytes-like objects, 'base64_codec' also accepts ASCII-only instances of str for decoding

New in version 3.2: Restoration of the binary transforms.

Changed in version 3.4: Restoration of the aliases for the binary transforms.

Text Transforms¶

The following codec provides a text transform: a str to str mapping. It is not supported by str.encode() (which only produces bytes output).

Codec | Aliases | Meaning
rot_13 | rot13 | Return the Caesar-cypher encryption of the operand.

New in version 3.2: Restoration of the rot_13 text transform.

Changed in version 3.4: Restoration of the rot13 alias.
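Because the binary and text transforms are not reachable through bytes.decode() or str.encode(), they have to be driven through codecs.encode() and codecs.decode(). The following is a minimal sketch of that usage; the sample payload and the variable names are arbitrary choices made for illustration.

```python
import binascii
import codecs

payload = b"codec registry"

# bytes -> bytes transforms go through codecs.encode()/codecs.decode().
hex_form = codecs.encode(payload, "hex")
b64_form = codecs.encode(payload, "base64")
zlib_form = codecs.encode(payload, "zlib")

# The hex codec agrees with the encoder function listed for it in the table.
assert hex_form == binascii.b2a_hex(payload)

# Each transform decodes back to the original bytes.
round_trips = [
    codecs.decode(hex_form, "hex"),
    codecs.decode(b64_form, "base64"),
    codecs.decode(zlib_form, "zlib"),
]

# str -> str transform.
rot13_form = codecs.encode("Hello", "rot13")
print(rot13_form)  # Uryyb

# str.encode() only accepts text encodings, so it rejects rot13.
rejected = None
try:
    "Hello".encode("rot13")
except LookupError as exc:
    rejected = str(exc)
print(rejected)
```

The aliases from the tables (hex, base64, zlib, rot13) are used here instead of the full codec names; either spelling should resolve to the same codec.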
encodings.idna — Internationalized Domain Names in Applications¶

This module implements RFC 3490 (Internationalized Domain Names in Applications) and RFC 3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)). It builds upon the punycode encoding and stringprep.

If you need the IDNA 2008 standard from RFC 5891 and RFC 5895, use the third-party idna module.

These RFCs together define a protocol to support non-ASCII characters in domain names. A domain name containing non-ASCII characters (such as www.Alliancefrançaise.nu) is converted into an ASCII-compatible encoding (ACE, such as www.xn--alliancefranaise-npb.nu). The ACE form of the domain name is then used in all places where arbitrary characters are not allowed by the protocol, such as DNS queries, HTTP Host fields, and so on. This conversion is carried out in the application and, if possible, is invisible to the user: the application should transparently convert Unicode domain labels to IDNA on the wire, and convert ACE labels back to Unicode before presenting them to the user.

Python supports this conversion in several ways: the idna codec performs conversion between Unicode and ACE, separating an input string into labels based on the separator characters defined in section 3.1 of RFC 3490 and converting each label to ACE as required, and conversely separating an input byte string into labels based on the . separator and converting any ACE labels found into Unicode. Furthermore, the socket module transparently converts Unicode host names to ACE, so that applications need not be concerned about converting host names themselves when they pass them to the socket module. On top of that, modules that have host names as function parameters, such as http.client and ftplib, accept Unicode host names (http.client then also transparently sends an IDNA hostname in the Host field if it sends that field at all).

When receiving host names from the wire (such as in reverse name lookup), no automatic conversion to Unicode is performed: applications wishing to present such host names to the user should decode them to Unicode.
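The Unicode to ACE conversion described above can be tried directly with the idna codec. This is a small sketch using the example host name from the text; the variable names are arbitrary.

```python
hostname = "www.Alliancefrançaise.nu"

# Encoding splits the name into labels and converts each non-ASCII
# label to its ACE form; the result is a bytes object.
ace = hostname.encode("idna")
print(ace)  # b'www.xn--alliancefranaise-npb.nu'

# Decoding converts ACE labels back to Unicode. The name comes back
# lower-cased because nameprep normalizes case during encoding.
restored = ace.decode("idna")
print(restored)  # www.alliancefrançaise.nu

# Labels that are already plain ASCII pass through unchanged.
plain = "www.python.org".encode("idna")
print(plain)  # b'www.python.org'
```

An application receiving an ACE host name from the wire would apply the decode step itself before showing the name to a user, since no automatic conversion is performed in that direction.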
The module encodings.idna also implements the nameprep procedure, which performs certain normalizations on host names, to achieve case-insensitivity of international domain names, and to unify similar characters. The nameprep functions can be used directly if desired.

encodings.idna.nameprep(label)¶
Return the nameprepped version of label. The implementation currently assumes query strings, so AllowUnassigned is true.

encodings.idna.ToASCII(label)¶
Convert a label to ASCII, as specified in RFC 3490. UseSTD3ASCIIRules is assumed to be false.

encodings.idna.ToUnicode(label)¶
Convert a label to Unicode, as specified in RFC 3490.

encodings.mbcs — Windows ANSI codepage¶

This module implements the ANSI codepage (CP_ACP).

Availability: Windows only.

Changed in version 3.3: Support any error handler.

Changed in version 3.2: Before 3.2, the errors argument was ignored; 'replace' was always used to encode, and 'ignore' to decode.

encodings.utf_8_sig — UTF-8 codec with BOM signature¶

This module implements a variant of the UTF-8 codec. On encoding, a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). On decoding, an optional UTF-8 encoded BOM at the start of the data will be skipped.
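The BOM handling of the utf-8-sig codec can be observed with a few lines of code. The following is a minimal sketch; the sample text and variable names are arbitrary, and codecs.BOM_UTF8 is the standard-library constant for the three signature bytes.

```python
import codecs

text = "héllo"
plain = text.encode("utf-8")
signed = text.encode("utf-8-sig")

# The signed form is the plain UTF-8 form prefixed with the 3-byte BOM.
print(signed[:3] == codecs.BOM_UTF8)  # True

# On decoding, the BOM is optional: it is skipped when present and
# plain UTF-8 input is accepted as well.
from_signed = signed.decode("utf-8-sig")
from_plain = plain.decode("utf-8-sig")

# The plain utf-8 codec keeps the BOM as a leading U+FEFF character.
with_bom_char = signed.decode("utf-8")

# A stateful (incremental) encoder emits the BOM only on its first call.
encoder = codecs.getincrementalencoder("utf-8-sig")()
chunks = [encoder.encode("a"), encoder.encode("b", final=True)]
print(chunks)  # [b'\xef\xbb\xbfa', b'b']
```

Reading a file that may or may not start with a BOM is therefore usually simplest with encoding="utf-8-sig", while writing with it should be a deliberate choice, since the use of the BOM in UTF-8 is discouraged.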