Learning Curve to Identify Overfitting and Underfitting in Machine Learning


By KSV Muralidhar · Published in Towards Data Science

This article discusses overfitting and underfitting in machine learning, along with the use of learning curves to effectively identify overfitting and underfitting in machine learning models.

[Image by Chris Ried on Unsplash]

Overfitting and underfitting

Overfitting (aka variance): A model is said to be overfit if it is over-trained on the data, to the point that it learns even the noise in it. An overfit model learns each and every training example so perfectly that it misclassifies unseen/new examples. An overfit model has a perfect or near-perfect training-set score and a poor test/validation score.

Reasons behind overfitting:
- Using a complex model for a simple problem, which picks up the noise in the data. Example: fitting a neural network to the Iris dataset.
- Small datasets, as the training set may not be a representative sample of the underlying population.

Underfitting (aka bias): A model is said to be underfit if it is unable to learn the patterns in the data properly. An underfit model doesn't fully learn each and every example in the dataset. In such cases, we see a low score on both the training set and the test/validation set.

Reasons behind underfitting:
- Using a simple model for a complex problem, which doesn't learn all the patterns in the data. Example: using logistic regression for image classification.
- The underlying data has no inherent pattern. Example: trying to predict a student's marks from his father's weight.

Introduction to learning curves

Learning curves plot the training and validation loss as a function of the number of training examples, computed by incrementally adding new training examples. Learning curves help us identify whether adding more training examples would improve the validation score (the score on unseen data). If a model is overfit, adding more training examples might improve its performance on unseen data; if a model is underfit, adding more training examples doesn't help. The 'learning_curve' method can be imported from Scikit-Learn's 'model_selection' module.

In this article, we'll use logistic regression to predict the 'species' of the Iris data. We'll create a function named 'learn_curve' that fits a logistic regression model to the Iris data and returns the cross-validation scores, the training score and the learning-curve data.

Learning curve of a good fit model

We'll use the 'learn_curve' function to get a good fit model by setting the inverse regularization parameter 'c' to 1. (Note: in Scikit-Learn, C=1 is the default and still applies a moderate amount of regularization; it does not switch regularization off.)

[Image by author]

In the results above, the cross-validation accuracy and the training accuracy are close to each other.

[Image by author]

Interpreting the training loss: The learning curve of a good fit model has a moderately high training loss at the beginning, which gradually decreases as training examples are added and then flattens out, indicating that adding more training examples doesn't improve the model's performance on the training data.

Interpreting the validation loss: The learning curve of a good fit model has a high validation loss at the beginning, which gradually decreases as training examples are added and then flattens out, indicating that adding more training examples doesn't improve the model's performance on unseen data. We can also see that once a reasonable number of training examples has been added, the training and validation losses move close to each other.

Typical features of the learning curve of a good fit model:
- Training loss and validation loss are close to each other, with the validation loss slightly greater than the training loss.
- Initially decreasing training and validation loss, then a fairly flat training and validation loss from some point until the end.

Learning curve of an overfit model

We'll use the 'learn_curve' function to get an overfit model by setting the inverse regularization parameter 'c' to 10,000 (a high value of 'c' weakens the regularization and causes overfitting).

[Image by author]

The standard deviation of the cross-validation accuracies is high compared to the underfit and good fit models. The training accuracy is higher than the cross-validation accuracy, which is typical of an overfit model, but not high enough to reveal the overfitting on its own. The overfitting can, however, be detected from the learning curve.

[Image by author]

Interpreting the training loss: The learning curve of an overfit model has a very low training loss at the beginning, which increases very slightly as training examples are added and doesn't flatten.

Interpreting the validation loss: The learning curve of an overfit model has a high validation loss at the beginning, which gradually decreases as training examples are added and doesn't flatten, indicating that adding more training examples can improve the model's performance on unseen data. We can also see that the training and validation losses are far apart, and may come closer to each other if more training data is added.

Typical features of the learning curve of an overfit model:
- Training loss and validation loss are far away from each other.
- Gradually decreasing validation loss (without flattening) upon adding training examples.
- Very low training loss that increases very slightly upon adding training examples.

Learning curve of an underfit model

We'll use the 'learn_curve' function to get an underfit model by setting the inverse regularization parameter 'c' to 1/10,000 (a low value of 'c' strengthens the regularization and causes underfitting).

[Image by author]

The standard deviation of the cross-validation accuracies is low compared to the overfit and good fit models. However, the underfitting can be detected from the learning curve.

[Image by author]

Interpreting the training loss: The learning curve of an underfit model has a low training loss at the beginning, which gradually increases as training examples are added and then suddenly falls to an arbitrary minimum at the end (a minimum doesn't mean zero loss). This sudden fall at the end may not always happen. The image below also shows underfitting.

[Image by author]

Interpreting the validation loss: The learning curve of an underfit model has a high validation loss at the beginning, which gradually lowers as training examples are added and then suddenly falls to an arbitrary minimum at the end (this sudden fall may not always happen; the curve may instead stay flat), indicating that adding more training examples can't improve the model's performance on unseen data.

Typical features of the learning curve of an underfit model:
- Increasing training loss upon adding training examples.
- Training loss and validation loss are close to each other at the end.
- Sudden dip in the training loss and validation loss at the end (not always).

The illustrations above make it clear that learning curves are an efficient way of identifying overfitting and underfitting, even when the cross-validation metrics fail to reveal them.
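The code of the 'learn_curve' function described above did not survive extraction. A minimal sketch consistent with the description (fit logistic regression to the Iris data; return cross-validation scores, training score and learning-curve data) might look as follows. The scaling pipeline, the 5-fold CV, the log-loss scoring and the train-size grid are my assumptions, not the author's exact choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def learn_curve(X, y, c):
    """Fit a logistic regression with inverse regularization strength c
    and return cross-validation scores, training score and learning-curve data."""
    model = make_pipeline(
        StandardScaler(),                          # scale features before fitting
        LogisticRegression(C=c, max_iter=1000),
    )
    cv_scores = cross_val_score(model, X, y, cv=5)  # cross-validation accuracy per fold
    model.fit(X, y)
    train_score = model.score(X, y)                 # training accuracy
    # Log loss at increasing training-set sizes, for plotting the learning curve
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, scoring="neg_log_loss",
        train_sizes=np.linspace(0.3, 1.0, 8), shuffle=True, random_state=42)
    train_loss = -train_scores.mean(axis=1)         # flip sign back to a positive loss
    val_loss = -val_scores.mean(axis=1)
    return cv_scores, train_score, sizes, train_loss, val_loss

X, y = load_iris(return_X_y=True)
cv_scores, train_score, sizes, train_loss, val_loss = learn_curve(X, y, 1)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Train accuracy: {train_score:.3f}")
```

The returned `train_loss` and `val_loss` arrays are what the learning-curve plots in the article are drawn from.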

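The article compares three settings of the inverse regularization parameter: a good fit (c = 1), an overfit model (c = 10,000) and an underfit model (c = 1/10,000). This self-contained sketch reproduces that comparison on the Iris data; the scaling pipeline and the 5-fold CV are assumptions, not the author's exact code:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
results = {}
for label, c in [("good fit", 1.0), ("overfit", 10_000.0), ("underfit", 1 / 10_000)]:
    model = make_pipeline(StandardScaler(), LogisticRegression(C=c, max_iter=1000))
    acc = cross_val_score(model, X, y, cv=5)                            # accuracy per fold
    loss = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss")  # log loss per fold
    results[label] = (acc.mean(), acc.std(), loss.mean())
    print(f"{label:>9}: accuracy={acc.mean():.3f} (+/- {acc.std():.3f}), "
          f"log loss={loss.mean():.3f}")
```

The underfit model's cross-validation log loss sits near ln(3) ≈ 1.1 (close to uniform class probabilities), while the good fit model's is far lower, which is the gap the learning curves visualize.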

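The "Image by author" learning-curve figures did not survive extraction. A sketch of the kind of plot they showed (training vs. validation log loss as the training set grows, here for the good fit setting c = 1; the matplotlib styling and the train-size grid are my own) could be:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to view interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, scoring="neg_log_loss",
    train_sizes=np.linspace(0.3, 1.0, 8), shuffle=True, random_state=42)

fig, ax = plt.subplots()
ax.plot(sizes, -train_scores.mean(axis=1), "o-", label="Training loss")
ax.plot(sizes, -val_scores.mean(axis=1), "o-", label="Validation loss")
ax.set_xlabel("Number of training examples")
ax.set_ylabel("Log loss")
ax.set_title("Learning curve (good fit, C=1)")
ax.legend()
fig.savefig("learning_curve.png")
```

Re-running the same plot with C=10000 and C=1/10000 reproduces the overfit and underfit curves discussed above.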
