Learning Curves in Machine Learning - Baeldung
文章推薦指數: 80 %
A learning curve is just a plot showing the progress over the experience of a specific metric related to learning during the training of a ... StartHereAbout ▼▲ FullArchive Thehighleveloverviewofallthearticlesonthesite. WriteforBaeldung Becomeawriteronthesite. AboutBaeldung AboutBaeldung. AuthorsTopIfyouhaveafewyearsofexperienceinComputerScienceorresearch,andyou’reinterestedinsharingthatexperiencewiththecommunity,havealookatourContributionGuidelines. 1.Overview Inthistutorial,we’llstudywhatarelearningcurvesandwhytheyarenecessaryduringthetrainingprocessofamachinelearningmodel. We’llalsodiscoverdifferenttypesofcurves,whattheyareusedfor,andhowtheyshouldbeinterpretedtomakethemostoutofthelearningprocess. Bytheendofthearticle,we’llhavethetheoreticalandpracticalknowledgerequiredtoavoidcommonproblemsinreal-lifemachinelearningtraining.Ready?Let’sbegin! 2.LearningCurves 2.1.Introduction Contrarytowhatpeopleoftenthink,machinelearningisfarfrombeingfullyautomated.Itrequireslotsof“babysitting”;monitoring,datapreparation,andexperimentation,especiallyifit’sanewproject.Inallthatprocess,learningcurvesplayafundamentalrole. Alearningcurveisjustaplotshowingtheprogressovertheexperienceofaspecificmetricrelatedtolearningduringthetrainingofamachinelearningmodel.Theyarejustamathematicalrepresentationofthelearningprocess. Accordingtothis,we’llhaveameasureoftimeorprogressinthex-axisandameasureoferrororperformanceinthey-axis. Weusethesechartstomonitortheevolutionofourmodelduringlearningsowecandiagnoseproblemsandoptimizethepredictionperformance. 2.2.SingleCurves Themostpopularexampleofalearningcurveislossovertime.Loss(orcost)measuresourmodelerror,or“howbadourmodelisdoing”.So,fornow,thelowerourlossbecomes,thebetterourmodelperformancewillbe. Inthepicturebelow,wecanseetheexpectedbehaviorofthelearningprocess: Despitethefactithasslightupsanddowns,inthelongterm,thelossdecreasesovertime,sothemodelislearning. Otherexamplesofverypopularlearningcurvesareaccuracy,precision,andrecall.Allofthesecapturemodelperformance,sothehighertheyare,thebetterourmodelbecomes. Seebelowanexampleofatypicalaccuracycurveovertime: Themodelperformanceisgrowingovertime,whichmeansthemodelisimprovingwithexperience(it’slearning). Wealsoseeitgrowsatthebeginning,butovertimeitreachesaplateau,meaningit’snotabletolearnanymore. 2.3.MultipleCurves Oneofthemostwidelyusedmetricscombinationsistrainingloss+validationlossovertime. Thetraininglossindicateshowwellthemodelisfittingthetrainingdata,whilethevalidationlossindicateshowwellthemodelfitsnewdata. Wewillseethiscombinationlateron,butfornow,seebelowatypicalplotshowingbothmetrics: Anothercommonpracticeistohavemultiplemetricsinthesamechartaswellasthosemetricsfordifferentmodels. 2.4.TwoMainTypes Weoftenseethesetwotypesoflearningcurvesappearingincharts: OptimizationLearningCurves:Learningcurvescalculatedonthemetricbywhichtheparametersofthemodelarebeingoptimized,suchaslossorMeanSquaredError PerformanceLearningCurves:Learningcurvescalculatedonthemetricbywhichthemodelwillbeevaluatedandselected,suchasaccuracy,precision,recall,orF1score BelowyoucanseeanexampleinMachineTranslationshowingBLEU(aperformancescore)togetherwiththeloss(optimizationscore)fortwodifferentmodels(orangeandgreen): 3.HowtoDetectModelBehavior Wecandetectissuesinthebehaviorofamodelbywatchingtheevolutionofalearningcurve. Next,we’llseeeachofthedifferentscenarioswecanfindformodelbehaviordetection: 3.1.HighBias/Underfitting Let’squicklyreviewwhattheseconceptsare: Bias:Highbiasoccurswhenthelearningalgorithmisnottakingintoaccountalltherelevantinformation,becomingunabletocapturethemodel’srichnessandcomplexity Underfitting:Whenthealgorithmisnotabletomodeleithertrainingdataornewdata,consistentlyobtaininghigherrorvaluesthatdon’tdecreaseovertime Wecanseetheyarecloselytied,asthemorebiasedamodelis,themoreitunderfitsthedata. Let’simagineourdataarethebluedotsbelow,andwewanttocomeupwithalinearmodelforregressionpurposes: Supposewe’reverylazymachinelearningpractitionersandweproposethislineasamodel: Clearly,astraightlinelikethatdoesn’trepresentthepatternofourdots.Itlackssomecomplexitytodescribethenatureofthegivendata.Wecanseehowthebiasedmodeldoesn’ttakeintoaccountrelevantinformation,whichleadstounderfitting. It’sdoingaterriblejobwiththetrainingdataalready,sowhatwouldbetheperformanceforanewexample? It’sprettyobviousitperformsaspoorlywiththenewexampleasitdoeswiththetrainingdata: Now,howcanweuselearningcurvestodetectourmodelisunderfitting?Seeanexampleshowingvalidationandtrainingcost(loss)curves: Thecost(loss)functionishighanddoesn’tdecreasewiththenumberofiterations,bothforthevalidationandtrainingcurves Wecouldactuallyusejustthetrainingcurveandcheckthatthelossishighandthatitdoesn’tdecrease,toseethatit’sunderfitting 3.2.HighVariance/Overfitting Let’sbrieflyreviewthesetwoconcepts: Variance:Highvariancehappenswhenthemodelistoocomplexanddoesn’trepresentthesimplerrealpatternsexistinginthedata Overfitting:Thealgorithmcaptureswellthetrainingdata,butitperformspoorlyonnewdata,soit’snotabletogeneralize Thesearealsodirectlyrelatedconcepts:Thehigherthevarianceofamodel,themoreitoverfitsthetrainingdata. Let’stakethesameexampleasbefore,wherewewantedalinearmodeltoapproximatethesebluedots: Now,ontheotherextreme,let’simaginewe’reveryperfectionistmachinelearningpractitionersandweproposealinearmodelthatperfectlyexplainsourdatalikethis: Well,weunderstandintuitivelythatthislineisnotwhatwewanted,either.Indeed,itfitsthedata,butitdoesn’trepresenttherealpatterninit. Whenanewexampleappears,itwillstruggletomodelit.Seeanewexample(inorange): Usingtheoverfittedmodel,itwon’tpredictwellenoughthenewexample: Howcouldweuselearningcurvestodetectamodelisoverfitting?We’llneedboththevalidationandtraininglosscurves: Thetraininglossgoesdownovertime,achievinglowerrorvalues Thevalidationlossgoesdownuntilaturningpointisfound,andthereitstartsgoingupagain.Thatpointrepresentsthebeginningofoverfitting 3.3.FindingtheRightBias/VarianceTradeoff Thesolutiontothebias/varianceproblemistofindasweetspotbetweenthem. Intheexamplegivenabove: agoodlinearmodelforthedatawouldbealinelikethis: So,whenanewexampleappears: Wewillmakeabetterprediction: Wecanusethevalidationandtraininglosscurvestofindtherightbias/variancetradeoff: Thetrainingprocessshouldbestoppedwhenthevalidationerrortrendchangesfromdescendingtoascending Ifwestoptheprocessbeforethatpoint,themodelwillunderfit Ifwestoptheprocessafterthatpoint,themodelwilloverfit 4.HowtoDetectRepresentativeness 4.1.WhatRepresentativenessMeans Arepresentativedatasetreflectsproportionallystatisticalcharacteristicsinanotherdatasetfromthesamedomain. Wecouldfindthetrainingdatasetisnotrepresentativeinrelationtothevalidationdatasetandvice-versa. 4.2.UnrepresentativeTrainingDataset Thishappenswhenthedataavailableduringtrainingisnotenoughtocapturethemodel,relativetothevalidationdataset. Wecanspotthisissuebyshowingonelosscurvefortrainingandanotherforvalidation: Thetrainandvalidationcurvesareimproving,butthere’sabiggapbetweenthem,whichmeanstheyoperatelikedatasetsfromdifferentdistributions. 4.3.UnrepresentativeValidationDataset Thishappenswhenthevalidationdatasetdoesnotprovideenoughinformationtoevaluatetheabilityofthemodeltogeneralize. Thefirstscenariois: Aswecansee,thetrainingcurvelooksok,butthevalidationfunctionmovesnoisilyaroundthetrainingcurve. Itcouldbethecasethatvalidationdataisscarceandnotveryrepresentativeofthetrainingdata,sothemodelstrugglestomodeltheseexamples. Thesecondscenariois: Herewefindthevalidationlossismuchbetterthanthetrainingone,whichreflectsthevalidationdatasetiseasiertopredictthanthetrainingdataset. Anexplanationcouldbethevalidationdataisscarcebutwidelyrepresentedbythetrainingdataset,sothemodelperformsextremelywellonthesefewexamples. Anyway,thismeansthevalidationdatasetdoesnotrepresentthetrainingdataset,sothereisaproblemwithrepresentativeness. 5.Conclusion Inthistutorial,wereviewedsomebasicconceptsrequiredtounderstandtheconceptsbehindlearningcurvesandhowtousethem. Next,welearnedhowtointerpretlearningcurvesandthewaytheycanbeusedtoavoidcommonlearningproblemssuchasunderfitting,overfitting,orunrepresentativeness. AuthorsBottomIfyouhaveafewyearsofexperienceinComputerScienceorresearch,andyou’reinterestedinsharingthatexperiencewiththecommunity,havealookatourContributionGuidelines. 1Comment Oldest Newest InlineFeedbacks Viewallcomments ViewComments LoadMoreComments Commentsareclosedonthisarticle! wpDiscuzInsert
延伸文章資訊
- 1Why you should be plotting learning curves in your next ...
Learning curves show the relationship between training set size and your chosen evaluation metric...
- 2Learning Curves in Machine Learning - Baeldung
A learning curve is just a plot showing the progress over the experience of a specific metric rel...
- 3Learning curve (machine learning) - Wikipedia
In machine learning, a learning curve (or training curve) plots the optimal value of a model's lo...
- 4What is a Learning Curve in machine learning? - Stack Overflow
An ROC curve is a graphical depiction of classifier performance that shows the trade-off between ...
- 5Machine Learning學習日記— Coursera篇(Week 6.2 ... - Medium
大綱. Diagnosing Bias vs. Variance; Regularization and Bias/Variance; Learning Curves; Deciding Wha...