pandas.read_csv — pandas 1.5.0 documentation


Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.
pandas.read_csv(filepath_or_buffer, sep=_NoDefault.no_default, delimiter=None, header='infer', names=_NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=_NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)

Additional help can be found in the online docs for IO Tools.

Parameters

filepath_or_buffer : str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
    If you want to pass in a path object, pandas accepts any os.PathLike.
    By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

sep : str, default ','
    Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

delimiter : str, default None
    Alias for sep.
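The two sep behaviours described above (separator sniffing with sep=None, and regular-expression separators) can be sketched as follows. This is a minimal illustration rather than part of the reference text: the sample strings and variable names are made up, and engine="python" is passed explicitly so that pandas does not have to fall back to it.

```python
import io

import pandas as pd

# Hypothetical sample data using a semicolon as the separator.
semicolon_data = "a;b;c\n1;2;3\n4;5;6\n"

# sep=None asks the Python engine to detect the separator via csv.Sniffer.
sniffed = pd.read_csv(io.StringIO(semicolon_data), sep=None, engine="python")

# Hypothetical sample data where fields are split by a pipe padded with spaces.
pipe_data = "a | b | c\n1 | 2 | 3\n"

# A separator longer than one character is treated as a regular expression.
split_on_regex = pd.read_csv(
    io.StringIO(pipe_data), sep=r"\s*\|\s*", engine="python"
)

print(sniffed)
print(split_on_regex)
```

As the sep description notes, regex separators are prone to ignoring quoted data, so a fixed single-character separator is usually the safer choice when the file format allows it.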
header : int, list of int, None, default 'infer'
    Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names : array-like, optional
    List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

index_col : int, str, sequence of int / str, or False, optional, default None
    Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.
    Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

usecols : list-like or callable, optional
    Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
    If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

squeeze : bool, default False
    If the parsed data only contains one column then return a Series.
    Deprecated since version 1.4.0: Append .squeeze("columns") to the call to read_csv to squeeze the data.

prefix : str, optional
    Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, ...
    Deprecated since version 1.4.0: Use a list comprehension on the DataFrame's columns after calling read_csv.

mangle_dupe_cols : bool, default True
    Duplicate columns will be specified as 'X', 'X.1', ... 'X.N', rather than 'X' ... 'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
    Deprecated since version 1.5.0: Not implemented, and a new argument to specify the pattern for the names of duplicated columns will be added instead.

dtype : Type name or dict of column -> type, optional
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
    New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.

engine : {'c', 'python', 'pyarrow'}, optional
    Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine.
    New in version 1.4.0: The "pyarrow" engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.

converters : dict, optional
    Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

true_values : list, optional
    Values to consider as True.

false_values : list, optional
    Values to consider as False.

skipinitialspace : bool, default False
    Skip spaces after delimiter.
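A short sketch of how header, names, usecols, dtype and converters interact, based on the descriptions above. The CSV text, the replacement column names and the variable names are invented for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV content with a header row.
csv_text = "id,name,score,notes\n1,ann,9.5,x\n2,bob,7.0,y\n"

# header=0 together with names replaces the column names found in the file.
renamed = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["key", "person", "points", "memo"],
)

# A callable usecols keeps only the columns for which it returns True.
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda col: col.upper() in ["ID", "SCORE"],
)

# dtype sets per-column types; 'Int64' is the nullable integer dtype.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "Int64", "score": "float64"},
)

# converters are applied to the raw string values instead of dtype conversion.
converted = pd.read_csv(io.StringIO(csv_text), converters={"name": str.upper})

print(renamed.columns.tolist())
print(subset.columns.tolist())
print(typed.dtypes)
print(converted["name"].tolist())
```

Because a converter takes the place of dtype conversion for its column, it is probably clearest to list any given column under only one of the two arguments.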
skiprows : list-like, int or callable, optional
    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
    If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter : int, default 0
    Number of lines at bottom of file to skip (Unsupported with engine='c').

nrows : int, optional
    Number of rows of file to read. Useful for reading pieces of large files.

na_values : scalar, str, list-like, or dict, optional
    Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

keep_default_na : bool, default True
    Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
    - If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
    - If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
    - If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.
    - If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
    Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.

na_filter : bool, default True
    Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.

verbose : bool, default False
    Indicate number of NA values placed in non-numeric columns.

skip_blank_lines : bool, default True
    If True, skip over blank lines rather than interpreting as NaN values.

parse_dates : bool or list of int or names or list of lists or dict, default False
    The behavior is as follows:
    - boolean. If True -> try parsing the index.
    - list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
    - list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
    - dict, e.g. {'foo': [1, 3]} -> parse columns 1, 3 as date and call result 'foo'.
    If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
    Note: A fast-path exists for iso8601-formatted dates.

infer_datetime_format : bool, default False
    If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

keep_date_col : bool, default False
    If True and parse_dates specifies combining multiple columns then keep the original columns.

date_parser : function, optional
    Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst : bool, default False
    DD/MM format dates, international and European format.

cache_dates : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.
    New in version 0.25.0.

iterator : bool, default False
    Return TextFileReader object for iteration or getting chunks with get_chunk().
    Changed in version 1.2: TextFileReader is a context manager.

chunksize : int, optional
    Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.
    Changed in version 1.2: TextFileReader is a context manager.

compression : str or dict, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.
    New in version 1.5.0: Added support for .tar files.
    Changed in version 1.4.0: Zstandard support.

thousands : str, optional
    Thousands separator.

decimal : str, default '.'
    Character to recognize as decimal point (e.g. use ',' for European data).

lineterminator : str (length 1), optional
    Character to break file into lines. Only valid with C parser.

quotechar : str (length 1), optional
    The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

quoting : int or csv.QUOTE_* instance, default 0
    Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

doublequote : bool, default True
    When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.

escapechar : str (length 1), optional
    One-character string used to escape other characters.
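The na_values / keep_default_na combinations, parse_dates, and chunksize described above can be tried out with a small in-memory file. This is only a sketch: the data, the "missing" marker and the variable names are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is one of the default NaN markers, "missing" is not.
csv_text = (
    "day,city,temp\n"
    "2022-01-01,NA,missing\n"
    "2022-01-02,Oslo,3.5\n"
    "2022-01-03,Rome,12.0\n"
)

# keep_default_na=True (the default): "missing" is added to the default markers.
with_defaults = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

# keep_default_na=False: only "missing" is treated as NaN, "NA" stays a string.
custom_only = pd.read_csv(
    io.StringIO(csv_text), na_values=["missing"], keep_default_na=False
)

# parse_dates with a list of names parses each listed column as dates.
dated = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["day"], na_values=["missing"]
)

# chunksize returns a TextFileReader, usable as a context manager since 1.2.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    chunk_lengths = [len(chunk) for chunk in reader]

print(with_defaults["city"].isna().tolist())
print(custom_only["city"].tolist())
print(dated["day"].dtype)
print(chunk_lengths)
```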
comment : str, optional
    Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.

encoding : str, optional
    Encoding to use for UTF when reading/writing (ex. 'utf-8'). List of Python standard encodings.
    Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".
    Changed in version 1.3.0: encoding_errors is a new argument. encoding has no longer an influence on how encoding errors are handled.

encoding_errors : str, optional, default "strict"
    How encoding errors are treated. List of possible values.
    New in version 1.3.0.

dialect : str or csv.Dialect, optional
    If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.

error_bad_lines : bool, optional, default None
    Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned.
    Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

warn_bad_lines : bool, optional, default None
    If error_bad_lines is False, and warn_bad_lines is True, a warning for each "bad line" will be output.
    Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

on_bad_lines : {'error', 'warn', 'skip'} or callable, default 'error'
    Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are:
    - 'error', raise an Exception when a bad line is encountered.
    - 'warn', raise a warning when a bad line is encountered and skip that line.
    - 'skip', skip bad lines without raising or warning when they are encountered.
    New in version 1.3.0.
    New in version 1.4.0:
    - callable, function with signature (bad_line: list[str]) -> list[str] | None that will process a single bad line. bad_line is a list of strings split by the sep. If the function returns None, the bad line will be ignored. If the function returns a new list of strings with more elements than expected, a ParserWarning will be emitted while dropping extra elements. Only supported when engine="python".

delim_whitespace : bool, default False
    Specifies whether or not whitespace (e.g. ' ' or '    ') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.

low_memory : bool, default True
    Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).

memory_map : bool, default False
    If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precision : str, optional
    Specifies which converter the C engine should use for floating-point values. The options are None or 'high' for the ordinary converter, 'legacy' for the original lower precision pandas converter, and 'round_trip' for the round-trip converter.
    Changed in version 1.2.

storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
    New in version 1.2.
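The on_bad_lines options described above can be compared on a small file that contains one line with too many fields. The sketch below assumes pandas 1.4 or later (for the callable form); the data and the helper keep_first_three are hypothetical names made up for this example.

```python
import io

import pandas as pd

# Hypothetical data: the second data line has four fields instead of three.
csv_text = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# 'skip' drops the bad line without raising or warning.
skipped = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")


def keep_first_three(bad_line):
    """Illustrative handler: truncate a bad line to its first three fields."""
    return bad_line[:3]


# The callable form is only supported when engine="python".
repaired = pd.read_csv(
    io.StringIO(csv_text), on_bad_lines=keep_first_three, engine="python"
)

print(skipped)
print(repaired)
```

Returning None from the handler would drop the line instead, which makes the callable form a superset of 'skip' for cases where only some bad lines can be repaired.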
Returns

DataFrame or TextParser
    A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes.

See also

DataFrame.to_csv
    Write DataFrame to a comma-separated values (csv) file.
read_csv
    Read a comma-separated values (csv) file into DataFrame.
read_fwf
    Read a table of fixed-width formatted lines into DataFrame.

Examples

>>> pd.read_csv('data.csv')
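For a version of the example that runs without an existing data.csv, the following sketch writes a small frame with DataFrame.to_csv and reads it back. The temporary directory, the file name data.csv.gz and the frame contents are assumptions for illustration; the .gz extension is there to exercise the default compression='infer'.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame to round-trip through a CSV file.
frame = pd.DataFrame({"name": ["ann", "bob"], "score": [9.5, 7.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative path; gzip compression is inferred from the '.gz' suffix.
    path = os.path.join(tmp_dir, "data.csv.gz")
    frame.to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(loaded)
```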


