Why you should be plotting learning curves in your next machine learning project

2025-01-24

Published in Towards Data Science (Getting Started)

Spoiler: they will help you understand whether your model suffers from high variance or high bias, and I'll explain what you can do about it.

[Image by author: three charts showing a model that underfits, fits "just right" and overfits the same data]

The bias-variance dilemma is a widely known problem in the field of machine learning. It is so important that, if you don't get the trade-off right, no number of hours and no amount of money thrown at your model will make up for it.

In the illustration above, you can get a feel for what bias and variance are, as well as how they can affect your model's performance. The first chart shows a model (blue line) that is underfitting the training data (red crosses). This model is biased because it "assumes" the relationship between the independent variable and the dependent variable is linear when it is not. A scatter plot of the data is always helpful, as it reveals the true relationship between the variables: a quadratic function would fit the data "just right" (second chart).

The third chart is a clear example of overfitting. The high complexity of the model allows it to fit the data very closely; too closely. Although this model might perform really well on the training data, its performance on the test data (i.e. data it has never seen before) will be much worse. In other words, this model suffers from high variance: it has learned the noise in its particular training sample, so it will not be good at making predictions on new observations.

Because the main point of building a machine learning model is to make accurate predictions on new data, you should focus on making sure it will generalise well to unseen observations, rather than on maximising its performance on your training set.

What can you do if your model performance is not so good?

There are several things you can do:
- Get more data
- Try a smaller set of features (reduce model complexity)
- Try adding/creating more features (increase model complexity)
- Try decreasing the regularisation parameter λ (increase model complexity)
- Try increasing the regularisation parameter λ (decrease model complexity)

The question now is: "how do I know which of those things to try first?" The answer is: "well, it depends." And it mainly depends on whether your model is suffering from high bias or from high variance.

You might now be wondering: "OK, so my model is not performing as expected... but how do I know whether it has a bias problem or a variance problem?!" Learning curves!

Learning curves

Learning curves show the relationship between training set size and your chosen evaluation metric (e.g. RMSE, accuracy, etc.) on your training and validation sets. They can be an extremely useful tool when diagnosing your model's performance, as they can tell you whether your model is suffering from bias or variance.

[Image by author: learning curves for a high-bias model]

If your learning curves look like this, your model is suffering from high bias. Both the training and validation (or cross-validation) errors are high, and they don't seem to improve with more training examples. The fact that your model performs similarly badly on the training and validation sets suggests that it is underfitting the data and therefore has high bias.

[Image by author: learning curves for a high-variance model]

On the other hand, if your learning curves look like this, your model might have a high-variance problem. In this chart, the validation error is much higher than the training error, which suggests that you are overfitting the data.

What can you do if your model performance is not so good? (pt. II)

Cool, so you have now identified what's going on with your model and are in a great position to decide what to do next.

If your model has high bias, you should:
- Try adding/creating more features
- Try decreasing the regularisation parameter λ

These two things will increase your model's complexity and should therefore help solve your underfitting problem.

If your model has high variance, you should:
- Get more data
- Try a smaller set of features
- Try increasing the regularisation parameter λ

When your model is overfitting the training data, you can either try reducing its complexity or getting more data. As you can see above, the learning curves of a high-variance model suggest that, with enough data, the validation and training errors will end up closer to each other. An intuitive explanation is that, as you give your model more data, the gap between your model's complexity and the underlying complexity of your data gets smaller and smaller.

Python implementation and real life example

I wrote this function to plot the learning curves of a model. Feel free to use it in your own work!

I thought I would end this post by showing you a real-life example of a learning curves plot, which was created with the above code:

[Image by author: learning curves for the author's random forest model]

From the plot, it is very clear that my random forest model is suffering from high bias: the training and validation curves are very close together, and the accuracy is not great, at around 70%. Knowing this helped me decide on my next step to improve the model's performance. Because I had a high-bias problem, I knew that getting more training data wasn't going to help by itself, and that increasing the complexity of my model by engineering new and more relevant features was probably going to have the greatest impact.

Conclusion

Next time you have a badly performing model in front of you, remember to plot the learning curves, analyse them, and work out whether you have a bias or a variance problem. Knowing this will help you decide what your next steps should be, and it could save you countless headaches and hours wasted on work that is not going to help your model.
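The three-chart illustration at the start of the post (a straight line underfitting curved data, a quadratic fitting it "just right", a very flexible model overfitting it) can also be reproduced numerically. The sketch below is not from the original post: it assumes NumPy, generates a hypothetical noisy quadratic dataset, and uses `np.polyfit` to fit polynomials of degree 1, 2 and 10, comparing training and validation RMSE. The helper name `fit_and_score` is invented for this example.

```python
import numpy as np


def fit_and_score(degree, x_train, y_train, x_val, y_val):
    """Fit a polynomial of `degree` to the training data; return (train RMSE, validation RMSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)

    def rmse(x, y):
        return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

    return rmse(x_train, y_train), rmse(x_val, y_val)


# Hypothetical data: a quadratic relationship plus Gaussian noise (std 0.5).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=40)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.size)
x_train, y_train, x_val, y_val = x[:20], y[:20], x[20:], y[20:]

# Degree 1 underfits (high bias), degree 2 matches the data, degree 10 is flexible enough to overfit.
scores = {d: fit_and_score(d, x_train, y_train, x_val, y_val) for d in (1, 2, 10)}
for degree, (train_rmse, val_rmse) in scores.items():
    print(f"degree {degree:2d}: train RMSE {train_rmse:.3f}, validation RMSE {val_rmse:.3f}")
```

Because each polynomial family contains the lower-degree ones, training error can only go down as the degree rises; only the held-out validation error can reveal whether the extra flexibility is fitting noise rather than signal.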
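The function the author refers to under "Python implementation and real life example" is not reproduced here, so what follows is an independent sketch, not the original code. It assumes NumPy (plus Matplotlib, imported only inside the optional plotting helper) and computes learning curves exactly as the post defines them: training and validation error as a function of training set size, on hypothetical noisy quadratic data. Fitting a straight line is expected to give the high-bias pattern (both errors stay high), while a degree-8 polynomial is expected to give the high-variance pattern (a large gap at small sizes that narrows as data is added). The names `learning_curve_rmse` and `plot_learning_curves` are made up for this example; scikit-learn users would more likely reach for `sklearn.model_selection.learning_curve`, which computes cross-validated versions of the same curves for any estimator.

```python
import numpy as np


def learning_curve_rmse(degree, x_train, y_train, x_val, y_val, sizes):
    """For each m in `sizes`, fit a polynomial on the first m training examples
    and return arrays of (training RMSE, validation RMSE)."""
    train_rmse, val_rmse = [], []
    for m in sizes:
        coeffs = np.polyfit(x_train[:m], y_train[:m], degree)
        # Training error: measured only on the m examples the model was fit on.
        train_pred = np.polyval(coeffs, x_train[:m])
        train_rmse.append(np.sqrt(np.mean((train_pred - y_train[:m]) ** 2)))
        # Validation error: always measured on the same held-out set.
        val_pred = np.polyval(coeffs, x_val)
        val_rmse.append(np.sqrt(np.mean((val_pred - y_val) ** 2)))
    return np.array(train_rmse), np.array(val_rmse)


def plot_learning_curves(sizes, train_rmse, val_rmse, title="Learning curves"):
    """Draw both curves on one chart. Matplotlib is imported here so that the
    computation above does not depend on it."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(sizes, train_rmse, "o-", label="Training error")
    ax.plot(sizes, val_rmse, "o-", label="Validation error")
    ax.set_xlabel("Training set size")
    ax.set_ylabel("RMSE")
    ax.set_title(title)
    ax.legend()
    return fig


# Hypothetical data: noisy quadratic, 200 training and 100 validation points.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=300)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.size)
x_train, y_train, x_val, y_val = x[:200], y[:200], x[200:], y[200:]
sizes = [10, 25, 50, 100, 200]

# Straight line: expected high-bias pattern (both errors plateau at a high value).
bias_train, bias_val = learning_curve_rmse(1, x_train, y_train, x_val, y_val, sizes)
# Degree-8 polynomial: expected high-variance pattern (the gap shrinks with more data).
var_train, var_val = learning_curve_rmse(8, x_train, y_train, x_val, y_val, sizes)
```

For example, `plot_learning_curves(sizes, bias_train, bias_val, "Degree 1")` returns a figure with the two curves. Using one fixed validation set keeps the sketch short; cross-validating at each training size, as scikit-learn does, gives smoother and more reliable curves.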
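The reading rules from the "Learning curves" section, together with the remedies from "(pt. II)", can be written down as a small helper. This is a rough heuristic, not a rule stated in the post: it looks only at the errors at the largest training size, treats a training error above `target_error` as high bias and a validation-minus-training gap above `max_gap` as high variance. Both thresholds, and the names `diagnose` and `REMEDIES`, are hypothetical and would need choosing per problem. For an accuracy metric, pass `1 - accuracy` as the error.

```python
# Remedies as listed in the post for each diagnosis.
REMEDIES = {
    "high bias": [
        "Try adding/creating more features",
        "Try decreasing the regularisation parameter λ",
    ],
    "high variance": [
        "Get more data",
        "Try a smaller set of features",
        "Try increasing the regularisation parameter λ",
    ],
    "no obvious problem": [],
}


def diagnose(train_error, val_error, target_error, max_gap):
    """Classify the final point of a pair of learning curves (lower error is better).

    `target_error` and `max_gap` are hypothetical, problem-specific thresholds.
    """
    if train_error > target_error:
        # The model cannot even fit the data it was trained on: underfitting.
        # If the gap is also large, this sketch still reports bias first.
        return "high bias"
    if val_error - train_error > max_gap:
        # Good fit on the training data, much worse on unseen data: overfitting.
        return "high variance"
    return "no obvious problem"


# Hypothetical numbers resembling the post's random forest example:
# roughly 70% accuracy (30% error) on both curves, which sit close together.
label = diagnose(train_error=0.30, val_error=0.31, target_error=0.10, max_gap=0.05)
print(label, REMEDIES[label])
```

With those numbers the helper points to the high-bias remedies, which matches the author's conclusion that more data alone would not help and new features were the better bet.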
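The λ advice in both lists rests on one relationship: a larger regularisation parameter constrains the model more, so training error can only rise as λ grows, while validation error typically falls and then rises again. The post does not say which kind of regularisation it has in mind; the sketch below assumes an L2 penalty (closed-form ridge regression, minimising the squared error plus λ times the squared norm of the weights) on degree-8 polynomial features of hypothetical noisy quadratic data. It uses NumPy only, and `poly_features`, `fit_ridge` and `rmse` are names made up for this example.

```python
import numpy as np


def poly_features(x, degree):
    """Columns x, x**2, ..., x**degree (the intercept is handled in fit_ridge)."""
    return np.column_stack([x**d for d in range(1, degree + 1)])


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept.

    Features are standardised with the training statistics, so a single `lam`
    penalises every coefficient on the same scale. Returns a predict function.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    y_mean = y.mean()
    # Minimises ||Z w - (y - y_mean)||^2 + lam * ||w||^2.
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return lambda X_new: ((X_new - mu) / sigma) @ w + y_mean


def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))


# Hypothetical data: noisy quadratic, deliberately few training points.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=130)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.size)
X = poly_features(x, degree=8)
X_train, y_train, X_val, y_val = X[:30], y[:30], X[30:], y[30:]

lambdas = [0.01, 1.0, 100.0, 10000.0]
train_errors, val_errors = [], []
for lam in lambdas:
    predict = fit_ridge(X_train, y_train, lam)
    train_errors.append(rmse(predict(X_train), y_train))
    val_errors.append(rmse(predict(X_val), y_val))
    print(f"lambda={lam:>8}: train RMSE {train_errors[-1]:.3f}, validation RMSE {val_errors[-1]:.3f}")
```

Moving right along `lambdas` corresponds to "increasing the regularisation parameter λ (decrease model complexity)" and moving left to decreasing it; comparing the training and validation columns at each λ is the same bias/variance reading the learning curves give.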
Adrià Luz: Tales about data, statistics, machine learning, visualisation, and much more. By Adrià Luz (@adrialuz) and Sara Gaspar (@sargaspar).