Learning Curves Tutorial: What Are Learning Curves?


Machine learning models are employed to learn patterns in data. The best models can generalize well when faced with instances that were not part of the initial training data. During the research phase, several experiments are conducted to find the solution that best solves the business's problem and reduces the error made by the model. An error may be defined as the difference between the model's prediction for an observation and the true value of that observation.

There are two major causes of error in machine learning models:

- Bias describes a model that makes simplified assumptions so the target function is easier to approximate; a model may learn that every 5'9" male in the world wears a size medium top - this is clearly biased.
- Variance describes the variability in the model's predictions: how much the predictions of the model change when we change the data used to train it.

To attain a more accurate solution, we seek to reduce the amount of bias and variance present in our model. This is not a straightforward task. Bias and variance are at odds with each other - reducing one will increase the other - because of a concept known as the bias-variance tradeoff.
In this article you'll learn:

- How to detect whether a model suffers from high bias or high variance
- How to diagnose a model suffering from either symptom
- How to build a good-fit model

Before we get into detecting the error symptoms, let's first go into more depth on the bias-variance tradeoff.

A look at the bias-variance tradeoff

All supervised learning algorithms strive to achieve the same objective: estimating the mapping function (f_hat) for a target variable (y) given some input data (X). We refer to the function that a machine learning model aims to approximate as the target function.

Changing the input data used to approximate the target function will likely result in a different target function, which may impact the outputs predicted by the model. How much our target function varies as the training data is changed is known as the variance. We don't want our model to have high variance because, while the algorithm may perform flawlessly during training, it fails to generalize to unseen instances.

[Image: an overfit decision boundary (green) vs. a smoother line of best fit (black). Source: Wikipedia]

In the above image, the approximated target function is the green line and the line of best fit is in black. Notice how well the model learns the training data with the green line: it does its best to ensure all red and blue observations are separated. If we trained this model on new observations, it would learn an entirely new target function and attempt to enact the same behavior.

Consider a scenario in which we use a linear method like linear regression to approximate the target function. The first thing to note about linear regression is that it assumes a linear relationship between the input data and the target we are trying to predict. Events in the real world are a lot more complex. At the cost of some flexibility, this simple assumption makes the target function much quicker to learn and easier to understand. We refer to this paradigm as bias.

[Image: a linear fit missing the structure of the data. Source: Wikipedia]

In the image above, the red line represents the learned target function. Many of the observations fall far away from the values predicted by the model.
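The two failure modes above can be made concrete with a small experiment. This is a minimal sketch, not part of the original tutorial: the synthetic data and the choice of polynomial degrees are illustrative assumptions, with a degree-1 fit standing in for high bias and a degree-15 fit for high variance.

```python
# Sketch: a degree-1 polynomial underfits (high bias), while a
# degree-15 polynomial chases the noise in the sample (high variance).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)  # noisy sine wave

train_mse = {}
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    # training error only: the flexible model will look far better here
    train_mse[degree] = mean_squared_error(y, model.predict(X))
    print(f"degree={degree:2d}  training MSE={train_mse[degree]:.4f}")
```

The flexible model fits the training sample far more closely, but, as the images above suggest, that closeness is exactly what fails to transfer to new data.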
We can reduce the bias in a model by making it more flexible, but this introduces variance. On the flip side, we can reduce the variance of a model by simplifying it, but this introduces bias. There's no way to escape this relationship. The best alternative is to choose a model and configure it such that it strikes a balance in the tradeoff between bias and variance.

[Image: bias, variance, and total error as a function of model complexity. Source: Wikipedia]

Due to unknown factors influencing the target function, there will always be some error present in the model, known as the irreducible error. This may be observed in the image above by noting the amount of error that remains under the lowest point of the Total Error plot. To build the ideal model, we must find a balance between bias and variance such that the total error is minimized. This is illustrated with the dotted line labeled Optimum Model Complexity.

Let's expand on bias and variance using learning curves.

The anatomy of a learning curve

Learning curves are plots used to show a model's performance as the training set size increases. They can also be used to show a model's performance over a defined period of time. We typically use them to diagnose algorithms that learn incrementally from data. The technique works by evaluating a model on the training and validation datasets, then plotting the measured performance.

For example, imagine we've modeled the relationship between some inputs and outputs using a machine learning algorithm. We start off by training the model on one instance and validating against one hundred instances. What do you think will happen? If you said the model will learn the training data perfectly, then you're correct - there would be no errors.

It's not hard to model the relationship of one input to one output; all you have to do is remember that relationship. The difficult part is making accurate predictions when presented with new, unseen instances. Since our model learned the training data so well, it would have a terrible time trying to generalize to data it hasn't seen before. The model will perform poorly on our validation data as a result, meaning there would be a large difference between the performance of our model on the training data and the validation data. We call this difference the generalization error.
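The one-instance thought experiment above is easy to verify in code. This is a hedged sketch on invented synthetic data (the linear relationship, noise level, and sizes are illustrative assumptions), assuming scikit-learn and NumPy are available:

```python
# Train a decision tree on 1 instance vs. 100 instances and compare
# the gap between training error and validation error.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (200, 1))
y = 3 * X.ravel() + rng.normal(0, 1, 200)
X_val, y_val = X[100:], y[100:]  # held-out validation set

def rmse_gap(n_train):
    """RMSE on the training subset and on the validation set."""
    model = DecisionTreeRegressor(random_state=0).fit(X[:n_train], y[:n_train])
    train_rmse = np.sqrt(mean_squared_error(y[:n_train], model.predict(X[:n_train])))
    val_rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    return train_rmse, val_rmse

train_1, val_1 = rmse_gap(1)        # perfect memorization, poor generalization
train_100, val_100 = rmse_gap(100)  # the gap shrinks with more data
```

With one training instance the tree memorizes it exactly (training RMSE of zero) while validation error is huge; with one hundred instances the generalization gap narrows considerably.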
If our algorithm is going to stand a chance of making better predictions on the validation dataset, we need to add more data. Introducing new instances to the training data will inevitably change the target function of our model. How the model performs as we grow the training dataset can be monitored and plotted to reveal the evolution of the training and validation error scores.

This means the graph will display two different results:

- Training curve: the curve calculated from the training data; used to inform us of how well the model is learning.
- Validation curve: the curve calculated from the validation data; used to inform us of how well the model is generalizing to unseen instances.

These curves show us how well the model is performing as the data grows, hence the name learning curves.

Note: The same process may be used to inform us of how our model learns over time. Instead of monitoring how the model is doing as the data gets larger, we monitor how well the model learns over time. For example, you may decide to learn a new language: your grasp of that language could be evaluated and assigned a numerical score to show how you've fared over the course of 52 weeks.

You've now learned the anatomy of a learning curve; let's put it into practice with a real-world dataset to give you a visual understanding.

Use case: Predicting real estate valuations

We will be using the market historical dataset of real estate valuation, collected from Sindian District, New Taipei City, Taiwan.

Our task is to predict the real estate valuation given the following features:

- X1 = the transaction date (for example, 2013.250 = March 2013, 2013.500 = June 2013, etc.)
- X2 = the house age (unit: years)
- X3 = the distance to the nearest MRT station (unit: meters)
- X4 = the number of convenience stores within walking distance (integer)
- X5 = the geographic coordinate, latitude (unit: degrees)
- X6 = the geographic coordinate, longitude (unit: degrees)

The target variable is defined as:

- Y = house price per unit area (10,000 New Taiwan Dollars per ping, where ping is a local unit: 1 ping = 3.3 square meters)

The target we are predicting is continuous, so the problem is going to require regression techniques.
Let's start by peeking at the data:

```python
import pandas as pd

data = pd.read_excel("/content/gdrive/MyDrive/real_estate_valuation_data.xlsx")
print(data.info())
data.head()
```

```
RangeIndex: 414 entries, 0 to 413
Data columns (total 8 columns):
 #   Column                                  Non-Null Count  Dtype
---  ------                                  --------------  -----
 0   No                                      414 non-null    int64
 1   X1 transaction date                     414 non-null    float64
 2   X2 house age                            414 non-null    float64
 3   X3 distance to the nearest MRT station  414 non-null    float64
 4   X4 number of convenience stores         414 non-null    int64
 5   X5 latitude                             414 non-null    float64
 6   X6 longitude                            414 non-null    float64
 7   Y house price of unit area              414 non-null    float64
dtypes: float64(6), int64(2)
memory usage: 26.0 KB
None
```

```
   No  X1 transaction date  X2 house age  X3 distance to the nearest MRT station  X4 number of convenience stores  X5 latitude  X6 longitude  Y house price of unit area
0   1          2012.916667          32.0                                84.87882                               10     24.98298     121.54024                        37.9
1   2          2012.916667          19.5                               306.59470                                9     24.98034     121.53951                        42.2
2   3          2013.583333          13.3                               561.98450                                5     24.98746     121.54391                        47.3
3   4          2013.500000          13.3                               561.98450                                5     24.98746     121.54391                        54.8
4   5          2012.833333           5.0                               390.56840                                5     24.97937     121.54245                        43.1
```

There's an extra feature called No which was not referenced in the documentation of the data. It's possible it refers to an index, but for simplicity's sake we are going to remove it. Also, the feature names do not match what was given in the documentation, so we are going to clean this up.
```python
# rename the columns to the short names used in the documentation
renamed_columns = [col.split()[0] for col in data.columns]
renamed_columns_map = {data.columns[i]: renamed_columns[i] for i in range(len(data.columns))}
data.rename(renamed_columns_map, axis=1, inplace=True)

# remove the No column
data.drop("No", axis=1, inplace=True)
print(data.head())

# separate the features and the target
features, target = data.columns[:-1], data.columns[-1]
X = data[features]
y = data[target]
```

This is how the final dataset looks before we split the features and target labels:

```
            X1    X2         X3  X4        X5         X6     Y
0  2012.916667  32.0   84.87882  10  24.98298  121.54024  37.9
1  2012.916667  19.5  306.59470   9  24.98034  121.53951  42.2
2  2013.583333  13.3  561.98450   5  24.98746  121.54391  47.3
3  2013.500000  13.3  561.98450   5  24.98746  121.54391  54.8
4  2012.833333   5.0  390.56840   5  24.97937  121.54245  43.1
```

To demonstrate bias, variance, and good-fit solutions, we are going to build three models: a decision tree regressor, a support vector machine for regression, and a random forest regressor. After building each model, we will plot its learning curves and share some diagnostic techniques.

Diagnosing learning curves

Learning curves are interpreted by assessing their shape. Once the shape and dynamics have been interpreted, we can use them to diagnose any problems in a machine learning model's behavior.

The learning_curve() function in scikit-learn makes it easy for us to monitor training and validation scores, which is what is required to plot a learning curve. The parameters we pass to the learning_curve() function are as follows:

- estimator: the model used to approximate the target function
- X: the input data
- y: the target
- cv: the cross-validation splitting strategy
- scoring: the metric used to evaluate the performance of the model
- train_sizes: the absolute numbers of training examples that will be used to generate the learning curve; the values we are using are arbitrary

Model 1: Decision tree regressor

A model with high variance is said to be overfit. It learns the training data and its random noise extremely well, resulting in a model that performs well on the training data but fails to generalize to unseen instances. We observe such behavior when the algorithm being used is too flexible for the problem being solved, or when the model is trained for too long.
For example, the decision tree regressor is a non-linear machine learning algorithm. Non-linear algorithms typically have low bias and high variance, which suggests that changes to the dataset will cause large variations in the target function.

Let's demonstrate high variance with our decision tree regressor:

```python
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

# overfitting: an unconstrained decision tree
decision_tree = DecisionTreeRegressor()

train_sizes, train_scores, test_scores = learning_curve(
    estimator=decision_tree,
    X=X,
    y=y,
    cv=5,
    scoring="neg_root_mean_squared_error",
    train_sizes=[1, 75, 165, 270, 331]
)

# the neg_* scorer returns negated errors, so flip the sign back
train_mean = -train_scores.mean(axis=1)
test_mean = -test_scores.mean(axis=1)

plt.subplots(figsize=(10, 8))
plt.plot(train_sizes, train_mean, label="train")
plt.plot(train_sizes, test_mean, label="validation")
plt.title("Learning Curve")
plt.xlabel("Training Set Size")
plt.ylabel("RMSE")
plt.legend(loc="best")
plt.show()
```

The model makes very few mistakes when it's required to predict instances it's seen during training, but performs terribly on new instances it hasn't been exposed to. You can observe this behavior by noticing how large the generalization error is between the training curve and the validation curve. One way to improve this behavior is to add more instances to our training dataset. Another is to add regularization to the model (i.e., restricting the tree from growing to its full depth), which introduces some bias.

Model 2: Support Vector Machine

A model with high bias is said to be underfit. It makes simplistic assumptions about the training data, which makes it difficult to learn the underlying patterns. This results in a model that has high error on both the training and validation datasets. We can observe such behavior when the model being used is too simple for the problem being solved, or when the model is not trained for long enough.
For example, the support vector machine for regression makes stronger assumptions about the form of the target function, which typically gives it higher bias and lower variance. To introduce even more bias into our model, we've increased the regularization by setting the C parameter to a small value.

Let's demonstrate high bias with our support vector machine:

```python
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

# underfitting: a heavily regularized SVR (small C)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

svm = SVR(C=0.25)

train_sizes, train_scores, test_scores = learning_curve(
    estimator=svm,
    X=X_scaled,
    y=y,
    cv=5,
    scoring="neg_root_mean_squared_error",
    train_sizes=[1, 75, 150, 270, 331]
)

train_mean = -train_scores.mean(axis=1)
test_mean = -test_scores.mean(axis=1)

plt.subplots(figsize=(10, 8))
plt.plot(train_sizes, train_mean, label="train")
plt.plot(train_sizes, test_mean, label="validation")
plt.title("Learning Curve")
plt.xlabel("Training Set Size")
plt.ylabel("RMSE")
plt.legend(loc="best")
plt.show()
```

The generalization gap between the training and validation curves becomes extremely small as the training dataset size increases. This indicates that adding more examples to our model is not going to improve its performance. A solution to this problem may be to create more features, or to make the model more flexible to reduce the number of assumptions being made.

Model 3: Random Forest Regressor

A good-fit model exists in the gray area between an underfit and an overfit model. The model may not be as good on the training data as it is in the overfit instance, but it will make far fewer errors when faced with unseen instances. This behavior can be observed when the training error rises, but only to the point of stability, as the validation error decreases to the point of stability.

To demonstrate this we are going to use a random forest, which is an ensemble of decision trees. This means the model is also non-linear, but bias is added to the model by creating several diverse trees and combining their predictions. We've also added more regularization by setting max_depth, which controls the maximum depth of each tree, to a value of three.
Let's see how this looks in code:

```python
from sklearn.ensemble import RandomForestRegressor

# a better fit: an ensemble of shallow trees
random_forest = RandomForestRegressor(max_depth=3)

train_sizes, train_scores, test_scores = learning_curve(
    estimator=random_forest,
    X=X,
    y=y,
    cv=5,
    scoring="neg_root_mean_squared_error",
    train_sizes=[1, 75, 150, 270, 331]
)

train_mean = -train_scores.mean(axis=1)
test_mean = -test_scores.mean(axis=1)

plt.subplots(figsize=(10, 8))
plt.plot(train_sizes, train_mean, label="train")
plt.plot(train_sizes, test_mean, label="validation")
plt.title("Learning Curve")
plt.xlabel("Training Set Size")
plt.ylabel("RMSE")
plt.legend(loc="best")
plt.show()
```

Now you can see we've reduced the error on the validation data. It came at the cost of weakened performance on the training data, but overall it's a better model.

The generalization error is much smaller, with a low number of errors being made. Also, both curves are stable beyond a training set size of about 250, which implies that adding more instances may not improve this model much further.

In summary, a model's behavior can be observed using learning curves. The ideal scenario when building machine learning models is to keep the error as low as possible. Two factors that result in high error are bias and variance, and being able to strike a balance between the two will result in a better-performing model.
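As a closing note, the three model examples above repeat the same plotting logic. In practice it's convenient to factor it into a helper; this is a sketch (not from the original tutorial, and the function name plot_learning_curve is our own), wrapping scikit-learn's learning_curve the same way the examples do:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

def plot_learning_curve(estimator, X, y, train_sizes, cv=5,
                        scoring="neg_root_mean_squared_error", ylabel="RMSE"):
    """Plot mean train/validation error against training-set size."""
    sizes, train_scores, test_scores = learning_curve(
        estimator=estimator, X=X, y=y, cv=cv,
        scoring=scoring, train_sizes=train_sizes,
    )
    # neg_* scorers return negated errors, so flip the sign back
    train_mean = -train_scores.mean(axis=1)
    test_mean = -test_scores.mean(axis=1)

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.plot(sizes, train_mean, label="train")
    ax.plot(sizes, test_mean, label="validation")
    ax.set_title("Learning Curve")
    ax.set_xlabel("Training Set Size")
    ax.set_ylabel(ylabel)
    ax.legend(loc="best")
    return train_mean, test_mean
```

Any of the three models could then be diagnosed in one line, e.g. `plot_learning_curve(RandomForestRegressor(max_depth=3), X, y, [1, 75, 150, 270, 331])` followed by `plt.show()`.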


