15 ways to read CSV file with pandas - ListenData


By Deepanshu Bhalla

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without read_csv, importing a CSV file with plain Python is not as straightforward. Pandas is a powerful Python package for data manipulation and supports various functions to load and import data from different formats. Here we cover how to deal with common issues in importing CSV files.

Install and Load Pandas Package

Make sure you have the pandas package installed on your system. If you set up Python using Anaconda, pandas is included, so you don't need to install it again. Otherwise you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias for the pandas package; we will use it instead of the full name "pandas".

    import pandas as pd

Create Sample Data for Import

The program below creates a sample pandas DataFrame that is used for demonstration in the rest of the tutorial.

    dt = {'ID': [11, 12, 13, 14, 15],
          'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
          'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
          'salary': [74, 76, 96, 71, 78]}
    mydt = pd.DataFrame(dt, columns=['ID', 'first_name', 'company', 'salary'])

The sample data looks like this:

       ID first_name company  salary
    0  11      David     Aon      74
    1  12      Jamie     TCS      76
    2  13      Steve  Google      96
    3  14    Stevart     RBS      71
    4  15       John       .      78

Save data as CSV in the working directory

Check the working directory before you save your data file.

    import os
    os.getcwd()

If you want to change the working directory, you can specify it in the os.chdir() function. In a regular Python string the backslash is an escape character, so a single backslash in a Windows path may not be read literally. Use two backslashes when specifying the file location.
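The backslash point above can be checked with a small sketch. The path is the tutorial's own example; the raw-string and forward-slash spellings and the use of pathlib are additions for illustration, not part of the original tutorial.

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows folder.
escaped = "C:\\Users\\DELL\\Documents"   # doubled backslashes, as in the tutorial
raw = r"C:\Users\DELL\Documents"         # raw string: backslashes are kept literally
forward = "C:/Users/DELL/Documents"      # forward slashes are also accepted on Windows

# The doubled-backslash and raw-string forms are the same string.
print(escaped == raw)

# PureWindowsPath normalizes separators, so the forward-slash form
# compares equal to the backslash form (this check runs on any OS).
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Both comparisons should print True. Any of these spellings can be passed to os.chdir().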
    os.chdir("C:\\Users\\DELL\\Documents\\")

The following command tells Python to write the data in CSV format to your working directory.

    mydt.to_csv('workingfile.csv', index=False)

Example 1 : Read CSV file with header row

This is the basic syntax of the read_csv() function. You just need to mention the filename. It assumes you have column names in the first row of your CSV file.

    mydata = pd.read_csv("workingfile.csv")

It reads the data correctly because we have headers in the first row of our data file.

It is important to highlight that header=0 is the default value, so we don't need to mention the header= parameter. It means the header is taken from the first row, as indexing in Python starts from 0. The above code is equivalent to this line of code:

    pd.read_csv("workingfile.csv", header=0)

Inspect data after importing

    mydata.shape
    mydata.columns
    mydata.dtypes

It returns 5 rows and 4 columns. The column names are ['ID', 'first_name', 'company', 'salary'].

See the column types of the data we imported. first_name and company are character variables. The remaining variables are numeric.

    ID             int64
    first_name    object
    company       object
    salary         int64

Example 2 : Read CSV file with header in second row

Suppose you have column or variable names in the second row. To read this kind of CSV file, you can submit the following command.

    mydata = pd.read_csv("workingfile.csv", header=1)

header=1 tells pandas to pick the header from the second row. It's not a realistic example here, because our file has its header in the first row; I just used it for illustration so that you get an idea of how it works. To make it practical, you can add random values in the first row of the CSV file and then import it again.

       11    David     Aon  74
    0  12    Jamie     TCS  76
    1  13    Steve  Google  96
    2  14  Stevart     RBS  71
    3  15     John       .  78

Define your own column names instead of header row from CSV file

    mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])

skiprows=1 means we are ignoring the first row, and the names= option is used to assign variable names manually.
       CustID     Name Companies  Income
    0      11    David       Aon      74
    1      12    Jamie       TCS      76
    2      13    Steve    Google      96
    3      14  Stevart       RBS      71
    4      15     John         .      78

Example 3 : Skip rows but keep header

    mydata = pd.read_csv("workingfile.csv", skiprows=[1, 2])

In this case, we are skipping the second and third rows while importing. Don't forget that indexing starts from 0 in Python, so 0 refers to the first row, 1 refers to the second row and 2 refers to the third row.

       ID first_name company  salary
    0  13      Steve  Google      96
    1  14    Stevart     RBS      71
    2  15       John       .      78

Instead of [1, 2] you can also write range(1, 3). Both mean the same thing, but the range() function is very useful when you want to skip many rows, as it saves the time of listing each row position manually.

Hidden secret of skiprows option

skiprows=4 means skipping the first four rows from the top, header row included. skiprows=[1, 2, 3, 4] means skipping the second through fifth rows. This is because a list in the skiprows= option is read as row index positions, whereas a single integer is read as the number of rows to skip from the top.

Example 4 : Read CSV file without header row

If you specify header=None, pandas assigns a series of numbers from 0 to (number of columns - 1) as column names.

    mydata0 = pd.read_csv("workingfile.csv", header=None)

In this data file we do have column names in the first row, so that row is read as the first row of data.

Add prefix to column names

    mydata0 = pd.read_csv("workingfile.csv", header=None, prefix="var")

In this case, we are setting var as the prefix, which tells pandas to put this keyword before each column name. The prefix= option was removed in pandas 2.0; on newer versions the same result can be obtained with add_prefix():

    mydata0 = pd.read_csv("workingfile.csv", header=None).add_prefix("var")

      var0        var1     var2    var3
    0   ID  first_name  company  salary
    1   11       David      Aon      74
    2   12       Jamie      TCS      76
    3   13       Steve   Google      96
    4   14     Stevart      RBS      71
    5   15        John        .      78

Example 5 : Specify missing values

The na_values= option is used to treat certain values as blank/missing while importing a CSV file.

    mydata00 = pd.read_csv("workingfile.csv", na_values=['.'])

       ID first_name company  salary
    0  11      David     Aon      74
    1  12      Jamie     TCS      76
    2  13      Steve  Google      96
    3  14    Stevart     RBS      71
    4  15       John     NaN      78

Example 6 : Set Index Column

    mydata01 = pd.read_csv("workingfile.csv", index_col='ID')

       first_name company  salary
    ID
    11      David     Aon      74
    12      Jamie     TCS      76
    13      Steve  Google      96
    14    Stevart     RBS      71
    15       John       .      78

As you can see in the above output, the column ID has been set as the index column.
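The difference between an integer and a list in skiprows=, described under Example 3, can be checked without touching the disk. The sketch below is only an illustration: it assumes the sample data is supplied as an in-memory string through io.StringIO instead of workingfile.csv, and the variable names are made up for this example.

```python
import io
import pandas as pd

# In-memory copy of the tutorial's sample data, so no file on disk is needed.
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# List form: skip the lines at positions 1 and 2 (position 0 is the header line).
by_position = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

# Integer form: skip the first 2 lines from the top, header line included,
# so the line "12,Jamie,TCS,76" ends up being used as the header.
from_top = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# na_values= and index_col= (Examples 5 and 6) can be combined in one call.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=["."], index_col="ID")

print(by_position)
print(from_top.columns.tolist())
print(cleaned)
```

With the list form the original header survives and the rows for IDs 13 to 15 remain; with the integer form the header itself is lost, which is usually not what you want when the file has column names in its first row.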
Example 7 : Read CSV File from External URL

You can directly read data from a CSV file that is stored at a web link. It is very handy when you need to load publicly available datasets from GitHub, Kaggle and other websites.

    mydata02 = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

This DataFrame contains 2311 rows and 8 columns. You can get this summary using mydata02.shape.

Example 8 : Skip Last 5 Rows While Importing CSV

    mydata04 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python")

In the above code, we are excluding the bottom 5 rows using the skipfooter= parameter. Older pandas releases spelled this option skip_footer=; that spelling has since been removed. skipfooter= is not supported by the default C parser, so engine="python" is specified as well.

Example 9 : Read only first 5 rows

    mydata05 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5)

Using the nrows= option, you can load the top K rows.

Example 10 : Interpreting "," as thousands separator

    mydata06 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",")

The thousands= option tells pandas to treat the comma inside values such as "1,234" as a thousands separator, so the column is read as numeric instead of text.

Example 11 : Read only specific columns

    mydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1, 5, 7])

The above code reads only the columns at index positions 1, 5 and 7, which are the second, sixth and eighth columns.

Example 12 : Read some rows and columns

    mydata08 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1, 5, 7], nrows=5)

In the above command, we have combined the usecols= and nrows= options. It selects only the first 5 rows of the selected columns.

Example 13 : Read file with semicolon delimiter

    mydata09 = pd.read_csv("file_path", sep=';')

Using the sep= parameter in the read_csv() function, you can import a file with any delimiter other than the default comma. In this case, we are using a semicolon as the separator.

Example 14 : Change column type while importing CSV

Suppose you want to change a column's type from int64 to float64 while loading a CSV file into Python. We can use the dtype= option for this.

    mydf = pd.read_csv("workingfile.csv", dtype={"salary": "float64"})

Example 15 : Measure time taken to import big CSV file

With verbose=True, you can capture the time taken for tokenization, type conversion and parser memory cleanup.
    mydf = pd.read_csv("workingfile.csv", verbose=True)

Recent pandas releases mark the verbose= option as deprecated, so this call may emit a warning.

Example 16 : How to read CSV file without using Pandas package

To import a CSV file in pure Python, you can submit the following command:

    import csv
    # newline="" is the setting the csv module recommends when opening files
    with open("C:/Users/DELL/Downloads/nycflights.csv", newline="") as f:
        d = csv.DictReader(f)   # each row becomes a dict keyed by the header names
        l = list(d)

You can also download and load a CSV file from a URL or external webpage.

    import csv
    import requests

    response = requests.get('https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv').text
    lines = response.splitlines()
    d = csv.DictReader(lines)
    l = list(d)

End Note

After completing this tutorial, I hope you have gained confidence in importing CSV files into Python, along with ways to clean and manage the file. You can also check out this tutorial which explains how to import files of different formats to Python. Once done, you should learn how to perform common data manipulation or wrangling tasks like filtering, selecting and renaming columns, and identifying and removing duplicates on a pandas DataFrame.

About Author: Deepanshu founded ListenData with a simple objective: make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.


