Of course, the specific python code to accomplish this depends upon what ... When reading CSV files, we typically either use readline or readlines().
Navigation
index
next|
previous|
DMSI»
CommaSeparatedValues(CSV)DataFiles¶
InthischapteryouwilllearnhowtowriteandreaddatatoandfromCSVfiles
usingPython.Wewillfirstreviewbasicfileoutput,andthenmoveonto
writingdatainaCSVformatthatcanbeusedbymanyotherprograms.After
welearntowritedatatoCSVformat,wewillproceedtotheslightlymore
involvedtopicofreadingCSVfiles.
BasicFileOutput¶
Writingtexttoafileisathreestepprocess.First,youmustopenafileforwriting(usingthe“w”argument)withtheopenfunction.Afteryouhavecreatedafileobject,youcanwritetoit.Andofcourse,onceyouarefinishedwritingtext,youmustclosethefileobject.
1
2
3
4
5myFileObj=open("filename.txt","w")
myFileObj.write("Thistextiswrittentothefile!")
myFileObj.write("Somemoretexttobewritten...")
myFileObj.write("\n")#Startanewline
myFileObj.close()
Afewthingstonoteaboutthecodeexampleabove.First,thenameofthevariableusedtorefertothefileobjectischosenbytheuser.Inline1ofthisexampleitisspecifiedtobemyFileObj,butyoucouldaseasilyuseadifferentvariablenamesuchasmyFile.IfyouusedmyFile,youwouldthenusemyFile.write(…)andmyFile.close()whenworkingwiththatfileobject.Second,unliketheprint(…)function,thewrite(…)methodDOESNOTaddanewlinecharacter(\n)tothestringitwritestothefile.Ifyouwantthefiletohavenewlines,youmustmanuallyaddthemasisdoneinline4oftheexample.Third,althoughwedonottakeadvantageofithere,thewrite(…)methodwillreturnanintegerrepresentingtheactualnumberofcharactersthatweresuccessfullywritten.Typicallythisisthelengthofthestringthatyougaveittowrite,unlesssomethinggoeswrong!
CSVFileOutput¶
CSVstandsforcommaseparatedvalue(s),andisasimpletextbasedfileformatusedtostoretabulardatainaformatthatiseasyforhumanstoreadaswellasbeingeasytoloadforprograms.HereisanexampleCSVfilethatrecords
thenumberofAsSeenonTVproductsafactoryproduceseachday.
Downloadasotv.csv
1
2
3
4
5
6Date,ShamWow,CamiLace,InstantSwitch,Flowbee
Monday,1232,3221,638,893
Tuesday,1532,2832,543,789
Wednesday,1132,3148,593,827
Thursday,1341,2944,601,832
Friday,1242,1234,621,794
Notethateachlineisadataentry(datarow),andinthisexamplethefirstlineisspecial,inthatithasatextdescriptorofeachcolumn.AlsonotethateventhoughthecolumnsarenotlinedupinthetextCSVfile,itisunderstoodthatthecommasshouldbeusedtoindicatecolumnbreaks,suchthatonFridaythefactoryproduced794flowbees.
YoucanexportdatainCSVformatsimplybyputtingacommabetweeneachdataitem,andplacinganewlinencharacterattheendofeachrecord.Ofcourse,thespecificpythoncodetoaccomplishthisdependsuponwhatinternalformatyourdataisstoredin.Hereisanexamplestoredinalist:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28factoryOutput=[["Monday",1232,3221,638,893],
["Tuesday",1532,2832,543,789],
["Wednesday",1132,3148,593,827],
["Thursday",1341,2944,601,832],
["Friday",1242,1234,621,794]]
defwriteOutput(filename,output):
#Openthefile
ourFile=open(filename,"w")
#First,writethe'special'headerofcolumnlabels,followedbyanewline
ourFile.write("Date,ShamWow,CamiLace,InstantSwitch,Flowbee")
ourFile.write("\n")
#Second,iteratethroughthedataelements(records)withaforloop
forrecordinfactoryOutput:
foriteminrecord:
itemAsString=str(item)
ourFile.write(itemAsString)
ourFile.write(",")
#Endeachfullrecordwithanewline.
ourFile.write("\n")
#Third,closethefile.
ourFile.close()
writeOutput("factoryOutput.csv",factoryOutput)
Notethelinesofcodethatdotheactualwriting.Theyconsistofadoublynestedforloop.Line16,thefirstforrecordinfactoryOutputloopgoesthrougheachday’sfactoryoutput,whilethesecondforiteminrecordloop(line17)writesouteachitem.Notethatbecausewecannotdirectlywriteaninteger,wehavetoconverteachitemtoastringbeforewritingit,includingthestringthatrepresentstheday.Line18doesthisconversionforus.Itissafetoconvertastringtoastring,whichhasnorealeffect.Ifwefailedtodothis,pythonwouldgiveusanerrorsuchasthefollowingassoonaswetriedtowriteaninteger.
1
2
3
4
5File"factoryOutput2CSV.py",line27,in
writeOutput("factoryOutput.csv",factoryOutput)
File"factoryOutput2CSV.py",line19,inwriteOutput
ourFile.write(item)
TypeError:mustbestr,notint
Line20placesacommaaftereachitem.However,ouroutputwillNOTexactlymatchourexample,aswewillhaveacommaafterEVERYdataitem,includingthelast!:
1
2
3
4
5
6Date,ShamWow,CamiLace,InstantSwitch,Flowbee
Monday,1232,3221,638,893,
Tuesday,1532,2832,543,789,
Wednesday,1132,3148,593,827,
Thursday,1341,2944,601,832,
Friday,1242,1234,621,794,
Luckilyforus,mostprogramsthatreadCSVformatteddatawillacceptthis,althoughsomeofthemmayhaveblankornullentriesasthelastitemineachrecord,andsomemaycomplainthatthefirstrow(theheader)doesnothaveasmanyentriesastherestunlessyoualsoplaceanextracommaattheendoftheheader.
ItwillbeleftasanexerciseforthereadertowriteCSVfilesthatomit
thetrailingcomma.
BasicFileInput¶
Openingafileforreadingisverysimilartowriting,theonlythingthatchangesisthatyougivetheopen(…)functionamodeof“r”forreading.
aFile=open("fileName.txt","r")
Readingfilesisslightlymorecomplicatedthanwriting,becauseyouhavea
choiceofthreedifferentmethodsforreadingfromafiledependinguponhow
youprefertodealwiththedata.Themethodsare:
read()-Thisfunctionwillreturnastringofcharactersthatmayinclude
everycharacterinthefile.IfyougivetheRead()methodanoptional
integerparameter,itwillonlyreturnthefirstNcharacters,allowingyou
toreadanddealwithsmallersectionsofthefilesequentially.Whenyou
arefinishedreadingthefile,itwillreturnanemptystring.
1
2
3
4
5
6myFile=open("asotv.csv","r")
first10=myFile.read(10)
print("Thefirsttencharactersofthefileare:",first10)
rest=myFile.read()
print("Therestofthefileis:",rest)
myFile.close()#Closingthefilewhenyouaredoneisalwaysgoodpractice
readline()-Thisfunctionwillreturnastringofcharactersincluding
thenewline(\n)characterattheendoftheline.Youcallitrepeatedly
toreadeachlineinthefile.Whenyouarefinishedreadingthefile,it
willreturnanemptystring.WhenreadingCSVfiles,wetypicallyeitheruse
readlineorreadlines().
1
2
3
4
5aFile=open("asotv.csv","r")#NotethistimeweusedaFileinsteadofmyFile
firstLine=aFile.readline()
secondLine=aFile.readline()
print("Firstlineis:",firstLine,"Secondlineis:",secondLine)
aFile.close()
readlines()-Thisfunctionwillreturnalistcontainingallofthelines
oftextinthefile.YoucangetthesamebehaviorbycallingReadLine()
multipletimesandappendingthestringsitreturnsintoalist.Theonly
downsidetoReadLines()isthatitwillattempttoreadtheentirefileall
atonce,whichisusuallywhatyouwant,unlessthefileisofunmanageable
size.
f=open("asotv.csv","r")
allLines=f.readlines()
f.close()
print("Filecontents:",allLines)#Printsthelistofstrings
Asyoucansee,readingatextfileisnotdifficult.However,wemustdoextraworktoprocesstherawtextintoanicelistofdataitems.Asanexample,thevariableallLinesabovereferencesalistasfollows:
['Date,ShamWow,CamiLace,InstantSwitch,Flowbee\n',
'Monday,1232,3221,638,893\n','Tuesday,1532,2832,543,789\n',
'Wednesday,1132,3148,593,827\n','Thursday,1341,2944,601,832\n',
'Friday,1242,1234,621,794\n','\n']
Youcanseethatthestringsincludealineofdataeach,andhavetrailingnewlinecharacters(\n)tackedontotheend.Weevenhaveoneemptylineattheveryendwhichhasnothinginitotherthananewlinecharacter!
CSVFileInput¶
ReadingdatafromCSVfilesisnotadifficultprospectifyoubreakthe
problemdownintosmallmanageablepiecesandsolveeachsub-problemindividually.
Youneedtodothefollowing:
Readthedatafromthefile.
Removeanyblanklines.
Spliteachlineatcommas,toseparateeachdataitem.
Thefollowingfunctionwillreadindatafromafileasalistofstrings.
1
2
3
4
5defreadData(fileName):
f=open(fileName,"r")
data=f.readlines()
f.close()
returndata
Thisfunctionwillprocessalistofstrings,andremoveanystringsthat
havenothingbutwhitespaceinthem.
1
2
3
4
5
6
7
8
9
10
11#Dataisalistofstrings
defremoveBlankLines(data):
goodLines=[]
forthisLineindata:
thisLine=thisLine.rstrip()
iflen(thisLine)!=0:
goodLines.append(thisLine)
return(goodLines)
Thisfunctionwillsplitlinesupusingacommaasthedelimiter.
1
2
3
4
5
6defsplitLines(data):
newLines=[]
forlineindata:
myList=line.split(",")
newLines.append(myList)
return(newLines)
Finally,youcanuseallofthesefunctionstogethertogetalistthatitselfcontainoneormorelistsofdataelements.
rawdata=readData("asotv.csv")
newData=removeBlankLines(rawdata)
csvData=splitLines(newData)
print(csvData)
Thisresultsinadatastructurethatlookslikethefollowing:
1
2
3
4
5
6
7
8csvData=[
['Date','ShamWow','CamiLace','InstantSwitch','Flowbee'],
['Monday','1232','3221','638','893'],
['Tuesday','1532','2832','543','789'],
['Wednesday','1132','3148','593','827'],
['Thursday','1341','2944','601','832'],
['Friday','1242','1234','621','794']
]
Noteafewthingsaboutthisdatastructure:
Everyitemisstillastring,eventhestringsofdigitsthatrepresenthowmanyitemswereproducedoneachday.
Thefirstrowofdatacontainsthecolumnheaders.
Wecanusestandardlistindexingtoaccessanyelementinthisdoublynestedlist.Forexample,thenumberofCamiLace’sproducedonThursdayisaccessedusing:
csvData[4][2]
Notethatthefirstindex(4)isaccessingtherow,andthesecondindex(2)isaccessingthecolumn,becausewehavestoredourdatainarow-majordatastructure.Alsonotethatindexingstartsfromzero,soindex4isactuallythefifth“row”includingtherowofheaderdata.
ReadingCSVfilesusingtheCSVmodule¶
AlthoughitisgoodforyoutounderstandHOWtoreadCSVfiles,thisoperation
isdonesofrequentlythatPythonincludesamodulethatwilldotheheavyliftingforyou.Thenameofthemoduleis“csv”.Hereishowyouwoulduseareaderobjectfromthecsvmoduletodothesameprocedure:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15defreadCSV(filename):
importcsv
lines=[]
f=open(filename)
csvReader=csv.reader(f,delimiter=",")
forrowincsvReader:
lines.append(row)
returnlines
data=readCSV("asotv.csv")
print(data)
Notethatweusedanamedparameter(delimiter)tospecifywhatcharacterourfileusedasit’sdelimiter.Alsonotethatthecsv.readerconstructorgeneratedanobject(referencedbythecsvReadervariableabove)whichisiterable.Eachtimeyoucallthe.__next()__methodontheobject,itreturnsanewrowfromthefile.[Inthecodeabove,theFORloopdoesthisforus.TheFORloopalsohandlestheStopIterationexceptionthatisraisedwhenthedatainthefilerunsout.]Ifwewantedtogettheheadersseparatelyfromthedata,wecouldmodifytheabovecodeasfollows:
1
2
3
4
5
6
7
8
9data=readCSV("asotv.csv")
#separateoutthefirstrow:
headers=data[0]
deldata[0]#removetherowwejustcopiedout!
print(headers)
print("\n")#newlinebetweentheheadersandthedata.
print(data)
RealCSVFiles¶
Inourexampleasotv.csvfileabove,wewereluckyinthatthenumberswereallformattedusingonlydigits.Whatifwehadformattednumberssuchas4,382?Thenthecommainthenumberswouldhavecompletelymessedupoursystemofbreakingapartlinesatcommas!
(Numberssuchas$432,232.08areverycommoninREALCSVfiles…)
Tohandlethis,mostrealprogramsthatexportdatatoCSVfilesactually“quote”eachdatausingsomeOTHERcharacter,suchasadouble-quotemark.Forexample,alineofdatamayberepresentedasfollows:
“234,423.83”,”443,212.83”,”434,324.77”
Notethatthisdatahascommasinsidethenumbers,aswellascommasseparatingeachdataitem.Makingourjobharderistheedgeconditions.Eachlinestartswitha”thendataitemsareseparatedwitha“,”andfinallythelastdataitemendswithasingle“.
Theuseofdouble-quotesaroundeach“column”ofdataallowstheCSVmoduletoparsethisdata.Theonlyextraworkwehavetodoistospecifythecharacterthatisusedto“quote”thedatawhenwecreatethecsv.reader,asfollows:
1
2
3
4
5importcsv
file=open("ourFileName.csv","r")
csvReader=csv.reader(file,delimiter=",",quotechar='"')
forrowincsvReader:
print(row)
Notehowweusesinglequotestomakeastringthatcontainsadouble-quote.
Handlingtheedgeconditions(starts,middle,endoftheline)isdoneforusautomaticallybythecsvmodule!
Insteadofspecifyingthedelimiterandquotechareverytimeyouwanttoopenafile,theCSVfilemoduleallowsyoutospecifya“dialect”ofCSVfiles,whichisawaytoeasilyspecifyadelimiterandquotingcharacter.Commondialectsare“excel”or“excel_tab”.Ifyoudonotspecifyadialect,delimiterorquotechar,thecsvmoduledefaultstousingthestandardexceldialect.
Youshouldreadmoreaboutthecsvmoduleatthepython.orgdocumentationsite.
WritingCSVfilesusingtheCSVmodule¶
InadditiontoreadingCSVfiles,thecsv.writerobjectmakesiteasytowritetoCSVfiles!Insteadofopeningthefileforreading,weopenitforwriting(“w”)andthencreateacsv.writerobject.Thisobjectallowsustowriteasinglerowatatime,ortowritemultiplerowsallatonce.DuetoaweirdnesswithhowPythonopensfilesonthewindowsplatform,youneedtotelltheopenmethodtonotusenewlines(thenewline=””bitonline2below).ThatnewlinenamedparameterisoptionalonLinuxandMacplatforms,butdoesn’thurt,soIsuggestyouuseitregardlessofwhattypeofcomputeryouaredevelopingyourcodeon,sothatthecodewillworkcorrectlyifitisranonawindowscomputer.Ifyouleaveitout,youwouldgetextrablanklinesbetweeneachlineofdataifyourunthecodeonawindowscomputer.
1
2
3
4
5
6importcsv
file=open("outFile.csv","w",newline="")
csvWriter=csv.writer(file)#Defaultstotheexceldialect
csvWriter.writerow(['item1','item2','item3','4','5','6'])
csvWriter.writerow(['1stInSecondRow',"2","22",str(3)])
file.close()#Required,orthedatawon'tgetflushedtothefile!
Notethatifyouhavealloftherowsatonce,youcanwritethemall
atthesametimebyusingthe“writerows”methodinsteadofthe“writerow”method:
1
2
3
4
5
6importcsv
file=open("outFile.csv","w",newline="")
csvWriter=csv.writer(file)#Defaultstotheexceldialect
csvWriter.writerows([['item1','2','3'],['1stInSecondRow',"2",str(3)]])
file.close()#Required,orthedatawon'tgetflushedtothefile!
Notethatwearegivingthewriterowsmethodalistoflists.
TableOfContents
CommaSeparatedValues(CSV)DataFiles
BasicFileOutput
CSVFileOutput
BasicFileInput
CSVFileInput
ReadingCSVfilesusingtheCSVmodule
RealCSVFiles
WritingCSVfilesusingtheCSVmodule
Previoustopic
IntroductiontoPython3
Nexttopic
GraphicalUserInterfaces
ThisPage
ShowSource
Quicksearch
Navigation
index
next|
previous|
DMSI»