Reading and Writing CSV Files in Python - Real Python


by Jon Fincher

Let's face it: you need to get information into and out of your programs through more than just the keyboard and console. Exchanging information through text files is a common way to share info between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let's get one thing clear: you don't have to (and you won't) build your own CSV parser from scratch. There are several perfectly acceptable libraries you can use. The Python csv library will work for most cases. If your work requires lots of data or numerical analysis, the pandas library has CSV parsing capabilities as well, which should handle the rest.
In this article, you'll learn how to read, process, and parse CSV from text files using Python. You'll see how CSV files work, learn the all-important csv library built into Python, and see how CSV parsing works using the pandas library.

So let's get started!

What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it's a plain text file, it can contain only actual text data: in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here's what that structure looks like:

    column 1 name,column 2 name,column 3 name
    first row data 1,first row data 2,first row data 3
    second row data 1,second row data 2,second row data 3
    ...

Notice how each piece of data is separated by a comma. Normally, the first line identifies each piece of data: in other words, the name of a data column. Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:), and semicolon (;) characters. Properly parsing a CSV file requires us to know which delimiter is being used.

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.
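Coming back to the point that parsing depends on knowing the delimiter: when you don't know it in advance, the standard library's csv.Sniffer can make an educated guess. The sketch below is only an illustration. The sample text and the list of candidate delimiters are made up for this example, and sniffing is a heuristic that can fail on short or irregular samples.

```python
import csv
import io

# Hypothetical sample: the same kind of data, but separated by semicolons.
sample = (
    "name;department;birthday month\n"
    "John Smith;Accounting;November\n"
    "Erica Meyers;IT;March\n"
)

# Ask the Sniffer to choose among a few likely delimiters (an assumed list).
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t:")

# Parse the sample with whatever dialect was detected.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

If you already know the delimiter, passing it explicitly is simpler and more predictable than sniffing.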
CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.

Parsing CSV Files With Python's Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.

Reading CSV Files With csv

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python's built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.

Here's the employee_birthday.txt file:

    name,department,birthday month
    John Smith,Accounting,November
    Erica Meyers,IT,March

Here's code to read it:

    import csv

    # newline='' lets the csv module handle line endings itself
    with open('employee_birthday.txt', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        line_count = 0
        for row in csv_reader:
            if line_count == 0:
                # The first row holds the column names
                print(f'Column names are {", ".join(row)}')
                line_count += 1
            else:
                print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
                line_count += 1
        print(f'Processed {line_count} lines.')

This results in the following output:

    Column names are name, department, birthday month
        John Smith works in the Accounting department, and was born in November.
        Erica Meyers works in the IT department, and was born in March.
    Processed 3 lines.

Each row returned by the reader is a list of string elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Reading CSV Files Into a Dictionary With csv

Rather than deal with a list of individual string elements, you can read CSV data directly into a dictionary as well (technically, an OrderedDict in Python 3.6 and 3.7, and a regular dict from Python 3.8 on).
Again, our input file, employee_birthday.txt, is as follows:

    name,department,birthday month
    John Smith,Accounting,November
    Erica Meyers,IT,March

Here's the code to read it in as a dictionary this time:

    import csv

    with open('employee_birthday.txt', mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        line_count = 0
        for row in csv_reader:
            if line_count == 0:
                # Joining a row dictionary joins its keys, the column names
                print(f'Column names are {", ".join(row)}')
                line_count += 1
            print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
            line_count += 1
        print(f'Processed {line_count} lines.')

This results in the same output as before:

    Column names are name, department, birthday month
        John Smith works in the Accounting department, and was born in November.
        Erica Meyers works in the IT department, and was born in March.
    Processed 3 lines.

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don't have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.

Optional Python CSV reader Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

- delimiter specifies the character used to separate each field. The default is the comma (',').
- quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').
- escapechar specifies the character used to escape the delimiter character, in case quotes aren't used. The default is no escape character.

These parameters deserve some more explanation. Suppose you're working with the following employee_addresses.txt file:

    name,address,date joined
    john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4
    erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2

This CSV file contains three fields: name, address, and date joined, which are delimited by commas. The problem is that the data for the address field also contains a comma to signify the zip code.
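To make the problem concrete, here's a small sketch that parses the same three lines with the default reader settings. It reads from an in-memory string (via io.StringIO) rather than from employee_addresses.txt, purely so the example is self-contained. The variable names are chosen for illustration.

```python
import csv
import io

# The contents of employee_addresses.txt, held in memory for this sketch.
raw = (
    "name,address,date joined\n"
    "john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4\n"
    "erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2\n"
)

rows = list(csv.reader(io.StringIO(raw)))
header, first = rows[0], rows[1]

# The header has three columns, but the unquoted comma in the address
# splits each data row into four fields.
print(len(header), len(first))
print(first)
```

The zip code ends up as a field of its own, so the data rows no longer line up with the column names.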
There are three different ways to handle this situation:

- Use a different delimiter
  That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.

- Wrap the data in quotes
  The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn't appear in the data, you're fine.

- Escape the delimiter characters in the data
  Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.

Writing CSV Files With csv

You can also write to a CSV file using a writer object and the .writerow() method:

    import csv

    # newline='' prevents extra blank lines between rows on some platforms
    with open('employee_file.csv', mode='w', newline='') as employee_file:
        employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

        employee_writer.writerow(['John Smith', 'Accounting', 'November'])
        employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

The quotechar optional parameter tells the writer which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by the quoting optional parameter:

- If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter, the quotechar, or a line terminator character. This is the default case.
- If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.
- If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all fields containing text data and leave numeric fields unquoted. (When the same setting is used on a reader, unquoted fields are converted to the float data type.)
- If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for the escapechar optional parameter.
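The four quoting settings above are easier to compare side by side. The following sketch writes one row with each setting into an in-memory buffer. The render() helper, the sample row, and the choice of a backslash as escapechar are all invented for this illustration and aren't part of the csv library.

```python
import csv
import io


def render(row, **writer_options):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n", **writer_options)
    writer.writerow(row)
    return buffer.getvalue()


# A sample row with plain text, text containing the delimiter, and a number.
row = ["Erica Meyers", "1234 Smith Lane Hoboken NJ, 07030", 7]

minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quote_all = render(row, quoting=csv.QUOTE_ALL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
escaped = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")

for label, text in [
    ("QUOTE_MINIMAL", minimal),
    ("QUOTE_ALL", quote_all),
    ("QUOTE_NONNUMERIC", nonnumeric),
    ("QUOTE_NONE", escaped),
]:
    print(f"{label:17}{text}", end="")
```

Only the address field is quoted under QUOTE_MINIMAL, while QUOTE_NONE leaves every field bare and puts the escape character in front of the embedded comma.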
Reading employee_file.csv back in plain text shows that the file is created as follows:

    John Smith,Accounting,November
    Erica Meyers,IT,March

Writing CSV File From a Dictionary With csv

Since you can read our data into a dictionary, it's only fair that you should be able to write it out from a dictionary as well:

    import csv

    # newline='' prevents extra blank lines between rows on some platforms
    with open('employee_file2.csv', mode='w', newline='') as csv_file:
        fieldnames = ['emp_name', 'dept', 'birth_month']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

        writer.writeheader()
        writer.writerow({'emp_name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November'})
        writer.writerow({'emp_name': 'Erica Meyers', 'dept': 'IT', 'birth_month': 'March'})

Unlike DictReader, the fieldnames parameter is required when writing a dictionary. This makes sense, when you think about it: without a list of fieldnames, the DictWriter can't know which keys to use to retrieve values from your dictionaries. It also uses the keys in fieldnames to write out the first row as column names.

The code above generates the following output file:

    emp_name,dept,birth_month
    John Smith,Accounting,November
    Erica Meyers,IT,March

Parsing CSV Files With the pandas Library

Of course, the Python CSV library isn't the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installing pandas and its dependencies in Anaconda is easily done:

    $ conda install pandas

As is using pip/pipenv for other Python installations:

    $ pip install pandas

We won't delve into the specifics of how pandas works or how to use it. For an in-depth treatment on using pandas to read and analyze large data sets, check out Shantnu Tiwari's superb article on working with large Excel files in pandas.
Reading CSV Files With pandas

To show some of the power of pandas CSV capabilities, I've created a slightly more complicated file to read, called hrdata.csv. It contains data on company employees:

    Name,Hire Date,Salary,Sick Days remaining
    Graham Chapman,03/15/14,50000.00,10
    John Cleese,06/01/15,65000.00,8
    Eric Idle,05/12/14,45000.00,10
    Terry Jones,11/01/13,70000.00,3
    Terry Gilliam,08/12/14,48000.00,7
    Michael Palin,05/23/13,66000.00,8

Reading the CSV into a pandas DataFrame is quick and straightforward:

    import pandas

    df = pandas.read_csv('hrdata.csv')
    print(df)

That's it: three lines of code, and only one of them is doing the actual work. pandas.read_csv() opens, analyzes, and reads the CSV file provided, and stores the data in a DataFrame. Printing the DataFrame results in the following output:

                 Name  Hire Date   Salary  Sick Days remaining
    0  Graham Chapman   03/15/14  50000.0                   10
    1     John Cleese   06/01/15  65000.0                    8
    2       Eric Idle   05/12/14  45000.0                   10
    3     Terry Jones   11/01/13  70000.0                    3
    4   Terry Gilliam   08/12/14  48000.0                    7
    5   Michael Palin   05/23/13  66000.0                    8

Here are a few points worth noting:

- First, pandas recognized that the first line of the CSV contained column names, and used them automatically. I call this Goodness.
- However, pandas is also using zero-based integer indices in the DataFrame. That's because we didn't tell it what our index should be.
- Further, if you look at the data types of our columns, you'll see pandas has properly converted the Salary and Sick Days remaining columns to numbers, but the Hire Date column is still a string. This is easily confirmed in interactive mode:

    >>> print(type(df['Hire Date'][0]))
    <class 'str'>

Let's tackle these issues one at a time. To use a different column as the DataFrame index, add the index_col optional parameter:

    import pandas

    df = pandas.read_csv('hrdata.csv', index_col='Name')
    print(df)

Now the Name field is our DataFrame index:

                    Hire Date   Salary  Sick Days remaining
    Name
    Graham Chapman   03/15/14  50000.0                   10
    John Cleese      06/01/15  65000.0                    8
    Eric Idle        05/12/14  45000.0                   10
    Terry Jones      11/01/13  70000.0                    3
    Terry Gilliam    08/12/14  48000.0                    7
    Michael Palin    05/23/13  66000.0                    8

Next, let's fix the data type of the Hire Date field. You can force pandas to read data as a date with the parse_dates optional parameter, which is defined as a list of column names to treat as dates:

    import pandas

    df = pandas.read_csv('hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
    print(df)

Notice the difference in the output:

                     Hire Date   Salary  Sick Days remaining
    Name
    Graham Chapman  2014-03-15  50000.0                   10
    John Cleese     2015-06-01  65000.0                    8
    Eric Idle       2014-05-12  45000.0                   10
    Terry Jones     2013-11-01  70000.0                    3
    Terry Gilliam   2014-08-12  48000.0                    7
    Michael Palin   2013-05-23  66000.0                    8

The date is now formatted properly, which is easily confirmed in interactive mode:

    >>> print(type(df['Hire Date'][0]))
    <class 'pandas._libs.tslibs.timestamps.Timestamp'>

If your CSV file doesn't have column names in the first line, you can use the names optional parameter to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tell pandas.read_csv() to ignore existing column names using the header=0 optional parameter:

    import pandas

    df = pandas.read_csv('hrdata.csv',
                         index_col='Employee',
                         parse_dates=['Hired'],
                         header=0,
                         names=['Employee', 'Hired', 'Salary', 'Sick Days'])
    print(df)

Notice that, since the column names changed, the columns specified in the index_col and parse_dates optional parameters must also be changed. This now results in the following output:

                         Hired   Salary  Sick Days
    Employee
    Graham Chapman  2014-03-15  50000.0         10
    John Cleese     2015-06-01  65000.0          8
    Eric Idle       2014-05-12  45000.0         10
    Terry Jones     2013-11-01  70000.0          3
    Terry Gilliam   2014-08-12  48000.0          7
    Michael Palin   2013-05-23  66000.0          8

Writing CSV Files With pandas

Of course, if you can't get your data out of pandas again, it doesn't do you much good. Writing a DataFrame to a CSV file is just as easy as reading one in. Let's write the data with the new column names to a new CSV file:

    import pandas

    df = pandas.read_csv('hrdata.csv',
                         index_col='Employee',
                         parse_dates=['Hired'],
                         header=0,
                         names=['Employee', 'Hired', 'Salary', 'Sick Days'])
    df.to_csv('hrdata_modified.csv')

The only difference between this code and the reading code above is that the print(df) call was replaced with df.to_csv(), providing the file name. The new CSV file looks like this:

    Employee,Hired,Salary,Sick Days
    Graham Chapman,2014-03-15,50000.0,10
    John Cleese,2015-06-01,65000.0,8
    Eric Idle,2014-05-12,45000.0,10
    Terry Jones,2013-11-01,70000.0,3
    Terry Gilliam,2014-08-12,48000.0,7
    Michael Palin,2013-05-23,66000.0,8

Conclusion

If you understand the basics of reading CSV files, then you won't ever be caught flat-footed when you need to deal with importing data. Most CSV reading, processing, and writing tasks can be easily handled by the basic csv Python library. If you have a lot of data to read and process, the pandas library provides quick and easy CSV handling capabilities as well.

Are there other ways to parse text files? Of course! Libraries like ANTLR, PLY, and PlyPlus can all handle heavy-duty parsing, and if simple string manipulation won't work, there are always regular expressions.

But those are topics for other articles…
About Jon Fincher

Jon taught Python and Java in two high schools in Washington State. Previously, he was a Program Manager at Microsoft.

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are: Aldren, Geir Arne, Joanna, and Jason.


