Comma Separated Values (CSV) Data Files - of Jay Summet

文章推薦指數: 80 %
投票人數:10人

Of course, the specific python code to accomplish this depends upon what ... When reading CSV files, we typically either use readline or readlines(). Navigation index next| previous| DMSI» CommaSeparatedValues(CSV)DataFiles¶ InthischapteryouwilllearnhowtowriteandreaddatatoandfromCSVfiles usingPython.Wewillfirstreviewbasicfileoutput,andthenmoveonto writingdatainaCSVformatthatcanbeusedbymanyotherprograms.After welearntowritedatatoCSVformat,wewillproceedtotheslightlymore involvedtopicofreadingCSVfiles. BasicFileOutput¶ Writingtexttoafileisathreestepprocess.First,youmustopenafileforwriting(usingthe“w”argument)withtheopenfunction.Afteryouhavecreatedafileobject,youcanwritetoit.Andofcourse,onceyouarefinishedwritingtext,youmustclosethefileobject. 1 2 3 4 5myFileObj=open("filename.txt","w") myFileObj.write("Thistextiswrittentothefile!") myFileObj.write("Somemoretexttobewritten...") myFileObj.write("\n")#Startanewline myFileObj.close() Afewthingstonoteaboutthecodeexampleabove.First,thenameofthevariableusedtorefertothefileobjectischosenbytheuser.Inline1ofthisexampleitisspecifiedtobemyFileObj,butyoucouldaseasilyuseadifferentvariablenamesuchasmyFile.IfyouusedmyFile,youwouldthenusemyFile.write(…)andmyFile.close()whenworkingwiththatfileobject.Second,unliketheprint(…)function,thewrite(…)methodDOESNOTaddanewlinecharacter(\n)tothestringitwritestothefile.Ifyouwantthefiletohavenewlines,youmustmanuallyaddthemasisdoneinline4oftheexample.Third,althoughwedonottakeadvantageofithere,thewrite(…)methodwillreturnanintegerrepresentingtheactualnumberofcharactersthatweresuccessfullywritten.Typicallythisisthelengthofthestringthatyougaveittowrite,unlesssomethinggoeswrong! CSVFileOutput¶ CSVstandsforcommaseparatedvalue(s),andisasimpletextbasedfileformatusedtostoretabulardatainaformatthatiseasyforhumanstoreadaswellasbeingeasytoloadforprograms.HereisanexampleCSVfilethatrecords thenumberofAsSeenonTVproductsafactoryproduceseachday. Downloadasotv.csv 1 2 3 4 5 6Date,ShamWow,CamiLace,InstantSwitch,Flowbee Monday,1232,3221,638,893 Tuesday,1532,2832,543,789 Wednesday,1132,3148,593,827 Thursday,1341,2944,601,832 Friday,1242,1234,621,794 Notethateachlineisadataentry(datarow),andinthisexamplethefirstlineisspecial,inthatithasatextdescriptorofeachcolumn.AlsonotethateventhoughthecolumnsarenotlinedupinthetextCSVfile,itisunderstoodthatthecommasshouldbeusedtoindicatecolumnbreaks,suchthatonFridaythefactoryproduced794flowbees. YoucanexportdatainCSVformatsimplybyputtingacommabetweeneachdataitem,andplacinganewlinencharacterattheendofeachrecord.Ofcourse,thespecificpythoncodetoaccomplishthisdependsuponwhatinternalformatyourdataisstoredin.Hereisanexamplestoredinalist: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28factoryOutput=[["Monday",1232,3221,638,893], ["Tuesday",1532,2832,543,789], ["Wednesday",1132,3148,593,827], ["Thursday",1341,2944,601,832], ["Friday",1242,1234,621,794]] defwriteOutput(filename,output): #Openthefile ourFile=open(filename,"w") #First,writethe'special'headerofcolumnlabels,followedbyanewline ourFile.write("Date,ShamWow,CamiLace,InstantSwitch,Flowbee") ourFile.write("\n") #Second,iteratethroughthedataelements(records)withaforloop forrecordinfactoryOutput: foriteminrecord: itemAsString=str(item) ourFile.write(itemAsString) ourFile.write(",") #Endeachfullrecordwithanewline. ourFile.write("\n") #Third,closethefile. ourFile.close() writeOutput("factoryOutput.csv",factoryOutput) Notethelinesofcodethatdotheactualwriting.Theyconsistofadoublynestedforloop.Line16,thefirstforrecordinfactoryOutputloopgoesthrougheachday’sfactoryoutput,whilethesecondforiteminrecordloop(line17)writesouteachitem.Notethatbecausewecannotdirectlywriteaninteger,wehavetoconverteachitemtoastringbeforewritingit,includingthestringthatrepresentstheday.Line18doesthisconversionforus.Itissafetoconvertastringtoastring,whichhasnorealeffect.Ifwefailedtodothis,pythonwouldgiveusanerrorsuchasthefollowingassoonaswetriedtowriteaninteger. 1 2 3 4 5File"factoryOutput2CSV.py",line27,in writeOutput("factoryOutput.csv",factoryOutput) File"factoryOutput2CSV.py",line19,inwriteOutput ourFile.write(item) TypeError:mustbestr,notint Line20placesacommaaftereachitem.However,ouroutputwillNOTexactlymatchourexample,aswewillhaveacommaafterEVERYdataitem,includingthelast!: 1 2 3 4 5 6Date,ShamWow,CamiLace,InstantSwitch,Flowbee Monday,1232,3221,638,893, Tuesday,1532,2832,543,789, Wednesday,1132,3148,593,827, Thursday,1341,2944,601,832, Friday,1242,1234,621,794, Luckilyforus,mostprogramsthatreadCSVformatteddatawillacceptthis,althoughsomeofthemmayhaveblankornullentriesasthelastitemineachrecord,andsomemaycomplainthatthefirstrow(theheader)doesnothaveasmanyentriesastherestunlessyoualsoplaceanextracommaattheendoftheheader. ItwillbeleftasanexerciseforthereadertowriteCSVfilesthatomit thetrailingcomma. BasicFileInput¶ Openingafileforreadingisverysimilartowriting,theonlythingthatchangesisthatyougivetheopen(…)functionamodeof“r”forreading. aFile=open("fileName.txt","r") Readingfilesisslightlymorecomplicatedthanwriting,becauseyouhavea choiceofthreedifferentmethodsforreadingfromafiledependinguponhow youprefertodealwiththedata.Themethodsare: read()-Thisfunctionwillreturnastringofcharactersthatmayinclude everycharacterinthefile.IfyougivetheRead()methodanoptional integerparameter,itwillonlyreturnthefirstNcharacters,allowingyou toreadanddealwithsmallersectionsofthefilesequentially.Whenyou arefinishedreadingthefile,itwillreturnanemptystring. 1 2 3 4 5 6myFile=open("asotv.csv","r") first10=myFile.read(10) print("Thefirsttencharactersofthefileare:",first10) rest=myFile.read() print("Therestofthefileis:",rest) myFile.close()#Closingthefilewhenyouaredoneisalwaysgoodpractice readline()-Thisfunctionwillreturnastringofcharactersincluding thenewline(\n)characterattheendoftheline.Youcallitrepeatedly toreadeachlineinthefile.Whenyouarefinishedreadingthefile,it willreturnanemptystring.WhenreadingCSVfiles,wetypicallyeitheruse readlineorreadlines(). 1 2 3 4 5aFile=open("asotv.csv","r")#NotethistimeweusedaFileinsteadofmyFile firstLine=aFile.readline() secondLine=aFile.readline() print("Firstlineis:",firstLine,"Secondlineis:",secondLine) aFile.close() readlines()-Thisfunctionwillreturnalistcontainingallofthelines oftextinthefile.YoucangetthesamebehaviorbycallingReadLine() multipletimesandappendingthestringsitreturnsintoalist.Theonly downsidetoReadLines()isthatitwillattempttoreadtheentirefileall atonce,whichisusuallywhatyouwant,unlessthefileisofunmanageable size. f=open("asotv.csv","r") allLines=f.readlines() f.close() print("Filecontents:",allLines)#Printsthelistofstrings Asyoucansee,readingatextfileisnotdifficult.However,wemustdoextraworktoprocesstherawtextintoanicelistofdataitems.Asanexample,thevariableallLinesabovereferencesalistasfollows: ['Date,ShamWow,CamiLace,InstantSwitch,Flowbee\n', 'Monday,1232,3221,638,893\n','Tuesday,1532,2832,543,789\n', 'Wednesday,1132,3148,593,827\n','Thursday,1341,2944,601,832\n', 'Friday,1242,1234,621,794\n','\n'] Youcanseethatthestringsincludealineofdataeach,andhavetrailingnewlinecharacters(\n)tackedontotheend.Weevenhaveoneemptylineattheveryendwhichhasnothinginitotherthananewlinecharacter! CSVFileInput¶ ReadingdatafromCSVfilesisnotadifficultprospectifyoubreakthe problemdownintosmallmanageablepiecesandsolveeachsub-problemindividually. Youneedtodothefollowing: Readthedatafromthefile. Removeanyblanklines. Spliteachlineatcommas,toseparateeachdataitem. Thefollowingfunctionwillreadindatafromafileasalistofstrings. 1 2 3 4 5defreadData(fileName): f=open(fileName,"r") data=f.readlines() f.close() returndata Thisfunctionwillprocessalistofstrings,andremoveanystringsthat havenothingbutwhitespaceinthem. 1 2 3 4 5 6 7 8 9 10 11#Dataisalistofstrings defremoveBlankLines(data): goodLines=[] forthisLineindata: thisLine=thisLine.rstrip() iflen(thisLine)!=0: goodLines.append(thisLine) return(goodLines) Thisfunctionwillsplitlinesupusingacommaasthedelimiter. 1 2 3 4 5 6defsplitLines(data): newLines=[] forlineindata: myList=line.split(",") newLines.append(myList) return(newLines) Finally,youcanuseallofthesefunctionstogethertogetalistthatitselfcontainoneormorelistsofdataelements. rawdata=readData("asotv.csv") newData=removeBlankLines(rawdata) csvData=splitLines(newData) print(csvData) Thisresultsinadatastructurethatlookslikethefollowing: 1 2 3 4 5 6 7 8csvData=[ ['Date','ShamWow','CamiLace','InstantSwitch','Flowbee'], ['Monday','1232','3221','638','893'], ['Tuesday','1532','2832','543','789'], ['Wednesday','1132','3148','593','827'], ['Thursday','1341','2944','601','832'], ['Friday','1242','1234','621','794'] ] Noteafewthingsaboutthisdatastructure: Everyitemisstillastring,eventhestringsofdigitsthatrepresenthowmanyitemswereproducedoneachday. Thefirstrowofdatacontainsthecolumnheaders. Wecanusestandardlistindexingtoaccessanyelementinthisdoublynestedlist.Forexample,thenumberofCamiLace’sproducedonThursdayisaccessedusing: csvData[4][2] Notethatthefirstindex(4)isaccessingtherow,andthesecondindex(2)isaccessingthecolumn,becausewehavestoredourdatainarow-majordatastructure.Alsonotethatindexingstartsfromzero,soindex4isactuallythefifth“row”includingtherowofheaderdata. ReadingCSVfilesusingtheCSVmodule¶ AlthoughitisgoodforyoutounderstandHOWtoreadCSVfiles,thisoperation isdonesofrequentlythatPythonincludesamodulethatwilldotheheavyliftingforyou.Thenameofthemoduleis“csv”.Hereishowyouwoulduseareaderobjectfromthecsvmoduletodothesameprocedure: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15defreadCSV(filename): importcsv lines=[] f=open(filename) csvReader=csv.reader(f,delimiter=",") forrowincsvReader: lines.append(row) returnlines data=readCSV("asotv.csv") print(data) Notethatweusedanamedparameter(delimiter)tospecifywhatcharacterourfileusedasit’sdelimiter.Alsonotethatthecsv.readerconstructorgeneratedanobject(referencedbythecsvReadervariableabove)whichisiterable.Eachtimeyoucallthe.__next()__methodontheobject,itreturnsanewrowfromthefile.[Inthecodeabove,theFORloopdoesthisforus.TheFORloopalsohandlestheStopIterationexceptionthatisraisedwhenthedatainthefilerunsout.]Ifwewantedtogettheheadersseparatelyfromthedata,wecouldmodifytheabovecodeasfollows: 1 2 3 4 5 6 7 8 9data=readCSV("asotv.csv") #separateoutthefirstrow: headers=data[0] deldata[0]#removetherowwejustcopiedout! print(headers) print("\n")#newlinebetweentheheadersandthedata. print(data) RealCSVFiles¶ Inourexampleasotv.csvfileabove,wewereluckyinthatthenumberswereallformattedusingonlydigits.Whatifwehadformattednumberssuchas4,382?Thenthecommainthenumberswouldhavecompletelymessedupoursystemofbreakingapartlinesatcommas! (Numberssuchas$432,232.08areverycommoninREALCSVfiles…) Tohandlethis,mostrealprogramsthatexportdatatoCSVfilesactually“quote”eachdatausingsomeOTHERcharacter,suchasadouble-quotemark.Forexample,alineofdatamayberepresentedasfollows: “234,423.83”,”443,212.83”,”434,324.77” Notethatthisdatahascommasinsidethenumbers,aswellascommasseparatingeachdataitem.Makingourjobharderistheedgeconditions.Eachlinestartswitha”thendataitemsareseparatedwitha“,”andfinallythelastdataitemendswithasingle“. Theuseofdouble-quotesaroundeach“column”ofdataallowstheCSVmoduletoparsethisdata.Theonlyextraworkwehavetodoistospecifythecharacterthatisusedto“quote”thedatawhenwecreatethecsv.reader,asfollows: 1 2 3 4 5importcsv file=open("ourFileName.csv","r") csvReader=csv.reader(file,delimiter=",",quotechar='"') forrowincsvReader: print(row) Notehowweusesinglequotestomakeastringthatcontainsadouble-quote. Handlingtheedgeconditions(starts,middle,endoftheline)isdoneforusautomaticallybythecsvmodule! Insteadofspecifyingthedelimiterandquotechareverytimeyouwanttoopenafile,theCSVfilemoduleallowsyoutospecifya“dialect”ofCSVfiles,whichisawaytoeasilyspecifyadelimiterandquotingcharacter.Commondialectsare“excel”or“excel_tab”.Ifyoudonotspecifyadialect,delimiterorquotechar,thecsvmoduledefaultstousingthestandardexceldialect. Youshouldreadmoreaboutthecsvmoduleatthepython.orgdocumentationsite. WritingCSVfilesusingtheCSVmodule¶ InadditiontoreadingCSVfiles,thecsv.writerobjectmakesiteasytowritetoCSVfiles!Insteadofopeningthefileforreading,weopenitforwriting(“w”)andthencreateacsv.writerobject.Thisobjectallowsustowriteasinglerowatatime,ortowritemultiplerowsallatonce.DuetoaweirdnesswithhowPythonopensfilesonthewindowsplatform,youneedtotelltheopenmethodtonotusenewlines(thenewline=””bitonline2below).ThatnewlinenamedparameterisoptionalonLinuxandMacplatforms,butdoesn’thurt,soIsuggestyouuseitregardlessofwhattypeofcomputeryouaredevelopingyourcodeon,sothatthecodewillworkcorrectlyifitisranonawindowscomputer.Ifyouleaveitout,youwouldgetextrablanklinesbetweeneachlineofdataifyourunthecodeonawindowscomputer. 1 2 3 4 5 6importcsv file=open("outFile.csv","w",newline="") csvWriter=csv.writer(file)#Defaultstotheexceldialect csvWriter.writerow(['item1','item2','item3','4','5','6']) csvWriter.writerow(['1stInSecondRow',"2","22",str(3)]) file.close()#Required,orthedatawon'tgetflushedtothefile! Notethatifyouhavealloftherowsatonce,youcanwritethemall atthesametimebyusingthe“writerows”methodinsteadofthe“writerow”method: 1 2 3 4 5 6importcsv file=open("outFile.csv","w",newline="") csvWriter=csv.writer(file)#Defaultstotheexceldialect csvWriter.writerows([['item1','2','3'],['1stInSecondRow',"2",str(3)]]) file.close()#Required,orthedatawon'tgetflushedtothefile! Notethatwearegivingthewriterowsmethodalistoflists. TableOfContents CommaSeparatedValues(CSV)DataFiles BasicFileOutput CSVFileOutput BasicFileInput CSVFileInput ReadingCSVfilesusingtheCSVmodule RealCSVFiles WritingCSVfilesusingtheCSVmodule Previoustopic IntroductiontoPython3 Nexttopic GraphicalUserInterfaces ThisPage ShowSource Quicksearch Navigation index next| previous| DMSI»



請為這篇文章評分?