The truth about Bayesian priors and overfitting


Have you ever thought about how strong a prior is compared to observed data? It's not an entirely easy thing to conceptualize. In order to alleviate this trouble I will take you through some simulation exercises. These are meant as food for thought and not necessarily a recommendation. However, many of the considerations we will run through will be directly applicable to your everyday life of applying Bayesian methods to your specific domain.

We will start out by creating some data generated from a known process. The process features a cyclic component with one event represented by the variable d. There is only one observation of that event, which means that maximum likelihood will always assign to this variable everything that cannot be explained by other data. This is not always wanted, but that's just life. The data and the maximum likelihood fit look like the figure below.

[Figure: the simulated data y with the maximum likelihood fit overlaid.]

The first thing you can notice is that maximum likelihood overfits the parameter in front of d by 20.2 percent, since the true value is 5.

Now imagine that we do this the Bayesian way and fit the parameters of the generating process, but not the functional form. As such we will sample the beta parameters with no priors whatsoever and look at what comes out. In the plot below you will see the truth, which is y, and three lines corresponding to three independent samples from the resulting posterior distribution.

[Figure: y and three independent draws from the fitted posterior.]

Pretty similar to the maximum likelihood example, except that now we also know the credible intervals and all the other goodies that the Bayesian approach gives us. We can summarize this quickly for the beta parameters.

[Table: posterior summaries for the beta parameters.]

So we can see that we are still overfitting even though we have a Bayesian approach. A sketch of this whole setup, in code, follows below.
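The original post's code is not preserved in this copy, so here is a minimal stand-in in Python. It simulates a process of the kind described, a cyclic term plus a single-occurrence event d whose true coefficient is 5, and fits it by maximum likelihood (ordinary least squares, which coincides with maximum likelihood under Gaussian noise). Everything except the coefficient of 5 on d, i.e. the period, amplitudes, noise level, and event position, is an illustrative assumption rather than the author's exact setup.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the generating process: a cyclic (sinusoidal)
# component plus a one-off event d, observed exactly once. The true
# coefficient on d is 5 (from the text); all other constants are assumptions.
n = 100
t = np.arange(n)
x_sin = np.sin(2 * np.pi * t / 25)   # cyclic component (assumed period 25)
d = np.zeros(n)
d[70] = 1.0                          # the single event observation
y = 1.0 + 2.0 * x_sin + 5.0 * d + rng.normal(0.0, 1.0, size=n)

# Maximum likelihood fit of the same functional form via least squares.
X = np.column_stack([np.ones(n), x_sin, d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "beta_sin", "beta_d"], coef.round(2))))
```

Because d is non-zero at a single time step, the least-squares coefficient on d absorbs whatever noise happens to sit on that one observation, which is exactly the overfitting described above.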
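And here is a sketch of the "no priors" Bayesian fit, written with PyMC (my choice, not necessarily the author's original tooling). pm.Flat is an improper uniform prior over the whole real line, matching the uniform-between-minus-infinity-and-infinity description. It continues from the simulation snippet above.

```python
import arviz as az
import pymc as pm

# Same functional form as the maximum likelihood fit, with improper
# flat priors on all regression parameters.
with pm.Model() as flat_model:
    intercept = pm.Flat("intercept")
    beta_sin = pm.Flat("beta_sin")
    beta_d = pm.Flat("beta_d")
    sigma = pm.HalfNormal("sigma", sigma=2.0)
    mu = intercept + beta_sin * x_sin + beta_d * d
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(2000, tune=1000, random_seed=1)

# Posterior means land close to the ML estimates, credible intervals included.
print(az.summary(idata, var_names=["beta_sin", "beta_d"]))
```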
Now to the topic at hand! How strong are priors compared to data?

About weak priors and being ignorant

In order to analyze the strength of priors we will consistently set ever more restrictive priors and see what happens to the result. Remember that the happy situation is that we know the truth. We will start by building a model in which we only assign priors to the betas and not the intercept. This model conforms to the same process as before, but with weak priors introduced. The priors state that the beta parameters are all Gaussian distributions with a lot of variance around them, meaning that we are not very confident about what these values should be.

If you look at the table above, where we had no priors, which basically just means that our priors were uniform distributions between minus infinity and infinity, you can see that the inference is not much different at all. One thing to note is that the credible interval has not shrunk, which means that the model's uncertainty about each parameter is about the same. Now why is that? Well, for starters, in the first model, even if we "believed" that infinity was a reasonable guess for each parameter, the sampler found its way. The mean of the posterior distributions for each parameter is nearly identical between the models. So that's great: two infinitely different priors result in the same average inference.

Let's try to see at what scale the priors would change the average inference, by shrinking the prior standard deviation by a factor of 10. Still not a lot of difference, so let's do a scale-of-10 reduction again. Here we can totally see a difference. Look at the mean for parameter β[d]: it goes from 6.03 to 4.73, which is a change of 21 percent. This new average is only 5.4 percent different from the truth.

But let's take a while to think about why this happened. The reason is that your knowledge can be substantial. Sometimes a lot more substantial than data. So your experience about this domain SHOULD be taken into account and weighted against the evidence. Now it is up to you to mathematically state your experience, which is what we did in the last model. Before you start to argue with my reasoning, take a look at the plots where we plot the last prior against the posterior and the point estimate from our generating process. As you can see, the prior is in the vicinity of the true value but not really covering it. This is not necessarily a bad thing, since being ignorant allows data to move you in insane directions. An example of this is shown when we plot the prior from model three against the posterior of model three: it's apparent that the data was allowed to drive the value too high, meaning that we are overfitting. This is exactly why maximum likelihood suffers from the curse of dimensionality. We shouldn't be surprised by this, since we literally told the model that a value up to 10 is quite probable. We can formulate a learning from this: the weaker your priors are, the more you are simulating a maximum likelihood solution.

About strong priors and being overly confident

If the last chapter was about stating your mind and being confident in your knowledge about the domain, there is also a danger in overstating this and being overly confident. To illustrate, let's do a small example where we say that the betas swing around 0 with a standard deviation of 0.5, which is half the width of the previous prior. Take a look at the parameter estimates now: it's quite apparent that here we were overly confident, and the results are quite a bit off from the truth. However, I would argue that this is a rather sane prior still. Why? Because we had no relation to the problem at hand, and it's better in this setting to be a bit conservative. As such we were successful: we stated our mind, and the "one" data point updated it by a lot. Now imagine if we had had two? So maybe it's not so bad that one data point was able to update our opinion quite a bit, and maybe it wasn't such a bad idea to be conservative in the first place.

Naturally, whether or not it's recommended to be conservative is up to the application at hand. For an application determining whether a suspect is guilty of a crime in the face of evidence, it is perhaps quite natural to be skeptical of the "evidence", while for a potential investment it may pay off to be more risky and accept a higher error rate in the hope of a big win. The whole progression of priors is sketched in the code below.
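To make the progression concrete, here is a sketch that sweeps the prior scale downwards. The exact scales are my reading of the text ("a scale of 10 reduction", then "a standard deviation of 0.5, which is half the width of the previous"), so roughly Normal(0, 100) → Normal(0, 10) → Normal(0, 1) → Normal(0, 0.5) on the betas, with the intercept left prior-free throughout. It continues from the simulation snippet above.

```python
# Tighten the prior on the betas step by step and watch the posterior
# mean of beta_d move away from the overfit ML value of ~6 toward 0.
results = {}
for prior_sd in [100.0, 10.0, 1.0, 0.5]:
    with pm.Model():
        intercept = pm.Flat("intercept")   # no prior on the intercept
        beta_sin = pm.Normal("beta_sin", mu=0.0, sigma=prior_sd)
        beta_d = pm.Normal("beta_d", mu=0.0, sigma=prior_sd)
        sigma = pm.HalfNormal("sigma", sigma=2.0)
        mu = intercept + beta_sin * x_sin + beta_d * d
        pm.Normal("y", mu=mu, sigma=sigma, observed=y)
        results[prior_sd] = pm.sample(
            1000, tune=1000, random_seed=1, progressbar=False
        )
    mean_d = float(results[prior_sd].posterior["beta_d"].mean())
    print(f"prior sd {prior_sd:6.1f}: posterior mean of beta_d = {mean_d:.2f}")
```

With the weak priors the posterior mean of beta_d stays near the overfit maximum likelihood value; as the prior tightens it is pulled toward the prior mean, first toward the truth and then, at a standard deviation of 0.5, past it.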
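And a minimal version of the prior-versus-posterior picture discussed above, drawn here for the Normal(0, 1) model and using the hypothetical setup from the earlier snippets:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Prior vs. posterior for beta_d under the Normal(0, 1) prior,
# with the true value of 5 marked for reference.
draws = results[1.0].posterior["beta_d"].values.ravel()
grid = np.linspace(-4.0, 8.0, 400)
plt.hist(draws, bins=50, density=True, alpha=0.5, label="posterior")
plt.plot(grid, stats.norm.pdf(grid, 0.0, 1.0), label="prior N(0, 1)")
plt.axvline(5.0, linestyle="--", label="true value")
plt.legend()
plt.show()
```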
Conclusion

So what did we learn from all of this? Well, hopefully you learned that setting priors is not something you learn overnight. It takes practice to get a feel for it. However, the principles are exceedingly obvious. I will leave you with some hardcore advice on how to set priors.

- Always set the priors in the vicinity of what you believe the truth is.
- Always set the priors such that they reflect the same order of magnitude as the phenomenon you're trying to predict.
- Don't be overconfident; leave space for doubt.
- Never use completely uninformative priors.
- Whenever possible, refrain from using uniform distributions.
- Always sum up the consequence of all of your priors, such that if no data were available your model would still predict in the same order of magnitude as your observed response.
- Be careful, and be honest! Never postulate very informative priors on results you WANT to be true. It's OK if you BELIEVE them to be true. Don't rest your mind until you see the difference.

Happy hacking!

Originally published at doktormike.github.io.


