3 (Or More) Ways to Open a CSV in Python
Ah. What a classic. The one piece of code that I end up writing over and over again; you would think I would have stashed it away by now. Not going to lie, I usually have to Google it, while thinking: is this the right way? Should I just open the csv file and iterate it? Should I import the csv module? Should I just use Pandas? Does it matter? Probably not.

So, let's try them all. Not that it matters what's slower, but sometimes you do run across the 2.5GB csv file, so it's probably not a bad idea to check out the options.

We will be using an open source dataset, outstanding student loan debt by state. All my code and the file can be found on GitHub. Let's just open the file, read the rows, split the columns up and call that work.

Options for working with CSV files in Python.

- Just open it.
- Python standard csv module.
- Pandas.

Just Open the CSV File Already.

The first option is just to open a file like you would anything else, and then read the lines one at a time. There are some sub-options here.

- Read all the lines into a list. readlines()
- Read one line at a time. readline()
- Read into a single string. read() (doesn't apply in this situation for how we want to deal with data)

The idea behind just opening a file and calling readlines() or readline() is that it's simple. With readlines() you will get a list back that contains a row for each line or record in your csv file. Using readline() you can just get one line at a time. Seems like you would maybe want to use readline() if you wanted to keep memory down, but most of the time who cares?

Let's check out readlines() first.
```python
from time import time


def open_csv_file(file_location: str) -> None:
    # Read every line of the file into a list, then split each one.
    with open(file_location, 'r') as f:
        data = f.readlines()
        for line in data:
            split_line(line)


def split_line(line: str) -> None:
    # Naive column split on commas.
    column_data = line.split(',')
    print(column_data)


if __name__ == '__main__':
    t1 = time()
    open_csv_file(file_location='PortfoliobyBorrowerLocation-Table1.csv')
    t2 = time()
    print('The total time taken was {t} seconds'.format(t=str(t2 - t1)))
```

Let's try readline() next. We would expect it to be slightly slower because it just requires a touch more code.

```python
from time import time


def open_csv_file(file_location: str) -> None:
    with open(file_location, 'r') as f:
        while True:
            line = f.readline()
            # readline() returns an empty string at end of file;
            # stop there instead of splitting an empty line.
            if not line:
                break
            split_line(line)


def split_line(line: str) -> None:
    # Naive column split on commas.
    column_data = line.split(',')
    print(column_data)


if __name__ == '__main__':
    t1 = time()
    open_csv_file(file_location='PortfoliobyBorrowerLocation-Table1.csv')
    t2 = time()
    print('The total time taken was {t} seconds'.format(t=str(t2 - t1)))
```

I ran each method 3 times; below you can tell that readlines() is a little faster.

So using readlines() is a little faster, no surprise there.

Import csv... what could be easier?

Both those methods seem fairly straightforward. Let's check out the built-in csv module in Python. This should be easier to use in theory because we won't have to split out our own columns etc.

```python
from time import time
import csv


def open_csv_file(file_location: str) -> None:
    with open(file_location) as f:
        # csv.reader yields each row as a list of column strings.
        csv_reader = csv.reader(f)
        for row in csv_reader:
            print(row)


if __name__ == '__main__':
    t1 = time()
    open_csv_file(file_location='PortfoliobyBorrowerLocation-Table1.csv')
    t2 = time()
    print('The total time taken was {t} seconds'.format(t=str(t2 - t1)))
```

Interesting, faster than readline() but slightly slower than readlines() and splitting columns ourselves. This is a little strange to me; I just assumed that the csv module offered more than just convenience.

[Figure: Opening csv files in Python. Performance comparison.]

Who can say csv and Python in the same sentence and not think of Pandas? I have my complaints about it, but with the rise of data science, it's here to stay. I have to say, of all the options, reading a csv file with Pandas is the easiest to use and remember.
What makes Pandas nice is that to open a file into a dataframe all you have to do is call pandas.read_csv(). Also, as you can see, calling iterrows() will allow you to easily iterate over the rows.

```python
from time import time
import pandas


def open_csv_file(file_location: str) -> None:
    # Load the whole file into a dataframe, then walk it row by row.
    dataframe = pandas.read_csv(file_location)
    for index, row in dataframe.iterrows():
        print(row['Location'], row['Balance(inbillions)'], row['Borrowers(inthousands)'])


if __name__ == '__main__':
    t1 = time()
    open_csv_file(file_location='PortfoliobyBorrowerLocation-Table1.csv')
    t2 = time()
    print('The total time taken was {t} seconds'.format(t=str(t2 - t1)))
```

Oh boy, easy to use but performance-wise, yikes. Say goodbye to my nice looking chart!! Ha Ha!

Oh, and you can't forget that piece of junk Dask. I know it wasn't really made to read one csv file, but I have to poke at it anyways. If nothing else, to make myself feel better.

```python
from time import time
import dask.dataframe as dd


def open_csv_file(file_location: str) -> None:
    # Same loop as the Pandas version, using a Dask dataframe.
    df = dd.read_csv(file_location)
    for index, row in df.iterrows():
        print(row['Location'], row['Balance(inbillions)'], row['Borrowers(inthousands)'])


if __name__ == '__main__':
    t1 = time()
    open_csv_file(file_location='PortfoliobyBorrowerLocation-Table1.csv')
    t2 = time()
    print('The total time taken was {t} seconds'.format(t=str(t2 - t1)))
```

Bahaha!

Nice! It's always fun to go back to the simple stuff. Loading csv files might be for the birds, but any data engineer is probably going to have to do it a few times a year. My vote is for readlines(); it's fast and not that complicated.

I know some people might argue about the nuances of the different tools, and there are good reasons to use each one, I'm sure. But I think it's important to just look at the basics of loading and iterating a CSV file with all the different tools. Mostly because in the real world we might just pick something in the heat of the moment, especially as a data engineer, and thousands of files later, when things grow, come to the realization that tool choice and speed did matter after all.

Oh, by the way, in case you were curious and have heard a lot about the student loan debacle: you will notice we were using a dataset of federal student loans per state. Here it is. Classic, way to go Cali.
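One nuance worth a quick look, since the csv module came out a touch slower than splitting by hand: the module does buy you more than convenience when a field is quoted and contains a comma. The sketch below is only an illustration. The sample text, its values, and the helper names split_rows and csv_rows are made up for the example and are not part of the student loan dataset or the code above.

```python
import csv
import io

# Hypothetical sample data (not from the article's dataset): the first
# field of the second line is quoted because it contains a comma.
sample = 'Location,Balance\n"Washington, DC",1.0\nOhio,2.0\n'


def split_rows(text: str) -> list:
    """Split each line on commas by hand, like the readlines() version."""
    return [line.split(',') for line in text.splitlines()]


def csv_rows(text: str) -> list:
    """Parse the same text with csv.reader, which understands quoting."""
    return list(csv.reader(io.StringIO(text)))


naive = split_rows(sample)
parsed = csv_rows(sample)

print(naive[1])   # the quoted field is broken into two pieces
print(parsed[1])  # the quoted field stays together as one column
```

If your files never contain quoted commas, splitting by hand is fine. If they might, the small speed difference is probably a fair price for getting the columns right.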
Daniel, 2019-11-27