The truth about Bayesian priors and overfitting
By Michael Green. Published in Towards Data Science. Originally published at doktormike.github.io.

Have you ever thought about how strong a prior is compared to observed data? It's not an entirely easy thing to conceptualize. In order to alleviate this trouble I will take you through some simulation exercises. These are meant as food for thought and not necessarily as a recommendation. However, many of the considerations we will run through are directly applicable to your everyday life of applying Bayesian methods to your specific domain.

We will start out by creating some data generated from a known process. The process features a cyclic component plus one event represented by the variable d. There is only one observation of that event, which means that maximum likelihood will always assign to this variable everything that cannot be explained by the other data. This is not always wanted, but that's just life. The data and the maximum likelihood fit look like below.

The first thing you can notice is that maximum likelihood overfits the parameter in front of d by 20.2 percent, since the true value is 5.

Now imagine that we do this the Bayesian way and fit the parameters of the generating process but not the functional form. As such we will sample the beta parameters with no priors whatsoever and look at what comes out. In the plot below you will see the truth, which is y, and three lines corresponding to three independent samples from the resulting posterior distribution.

Pretty similar to the maximum likelihood example, except that now we also know the credibility intervals and all the other goodies that the Bayesian approach gives us. We can summarize this quickly for the beta parameters.

So we can see that we are still overfitting even though we have a Bayesian approach. Now to the topic at hand: how strong are priors compared to data?

About weak priors and being ignorant

In order to analyze the strength of priors we will set ever more restrictive priors and see what happens to the result. Remember that the happy situation here is that we know the truth. We will start by building a model, shown below, in which we assign priors to the betas but not to the intercept. This model conforms to the same process as before but with weak priors introduced. The priors state that the beta parameters are all Gaussian distributions with a lot of variance around them, meaning that we are not very confident about what these values should be.

If you compare against the table above where we had no priors, which basically just means that our priors were uniform distributions between minus infinity and infinity, you can see that the inference is not much different at all. One thing to note is that the credible interval has not shrunk, which means that the model's uncertainty about each parameter is about the same. Now why is that? Well, for starters, in the first model, even if we "believed" that infinity was a reasonable guess for each parameter, the sampler found its way. The mean of the posterior distributions for each parameter is nearly identical between the models. So that's great: two infinitely different priors result in the same average inference.

Let's try to see at what scale the priors would change the average inference. See the new model description here. Now what does that look like for our inference? It looks like this! Still not a lot of difference, so let's do a scale-of-10 reduction again. Here we can totally see a difference. Look at the mean for parameter β[d] in the table below. It goes from 6.03 to 4.73, which is a change of 21 percent. This new average is only 5.4 percent different from the truth.

But let's take a while to think about why this happened. The reason is that your knowledge can be substantial, sometimes a lot more substantial than the data. So your experience about the domain SHOULD be taken into account and weighted against the evidence. It is up to you to mathematically state that experience, which is what we did in the last model. Before you start to argue with my reasoning, take a look at the plots where we show the last prior against the posterior and the point estimate from our generating process. As you can see, the prior is in the vicinity of the true value but not really covering it. This is not necessarily a bad thing, as being ignorant allows data to move you in insane directions. An example of this is shown in the plot below, where we plot the prior from model three against the posterior of model three. It's apparent that the data was allowed to drive the value too high, meaning that we are overfitting. This is exactly why maximum likelihood suffers from the curse of dimensionality. We shouldn't be surprised by this, since we literally told the model that a value up to 10 is quite probable. We can formulate a lesson from this: the weaker your priors are, the more you are simulating a maximum likelihood solution.

About strong priors and being overly confident

If the last chapter was about stating your mind and being confident in your knowledge about the domain, there is also a danger in overstating this and being overly confident. To illustrate, let's do a small example where we say that the betas swing around 0 with a standard deviation of 0.5, which is half the width of the previous prior. Take a look at the parameter estimates now. It's quite apparent that here we were overly confident, and the results are now quite a bit off from the truth. However, I would argue that this is still a rather sane prior. Why? Because we had no prior relation to the problem at hand, and in this setting it's better to be a bit conservative. As such we were successful: we stated our mind, and the "one" data point updated it by a lot. Now imagine if we had had two? So maybe it's not so bad that one data point was able to update our opinion quite a bit, and maybe it wasn't such a bad idea to be conservative in the first place.

Naturally, whether or not it's recommended to be conservative depends on the application at hand. For an application determining whether a suspect is guilty of a crime in the face of evidence, it is perhaps quite natural to be skeptical of the "evidence", while for a potential investment it may pay off to be more risky and accept a higher error rate in the hope of a big win.

Conclusion

So what did we learn from all of this? Hopefully you learned that setting priors is not something you master overnight. It takes practice to get a feel for it. However, the principles are exceedingly simple. I will leave you with some hardcore advice on how to set priors.

- Always set the priors in the vicinity of what you believe the truth is.
- Always set the priors such that they reflect the same order of magnitude as the phenomenon you're trying to predict.
- Don't be overconfident; leave space for doubt.
- Never use completely uninformative priors.
- Whenever possible, refrain from using uniform distributions.
- Always sum up the consequences of all of your priors, such that if no data were available your model would still predict in the same order of magnitude as your observed response.
- Be careful, and be honest! Never postulate very informative priors on results you WANT to be true. It's OK if you BELIEVE them to be true. Don't rest your mind until you see the difference.

Happy hacking!
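The article does not reproduce its model code, but the effect of tightening Gaussian priors on the betas can be sketched numerically. Under a linear-Gaussian model, the MAP estimate with independent N(0, prior_sd²) priors on the coefficients is ridge regression with penalty λ = σ²/prior_sd², so shrinking the prior pulls the one-observation event coefficient away from its overfit maximum likelihood value toward 0. The generating process below (cyclic signal, single event d with true effect 5) is a hypothetical stand-in for the article's, not the author's actual data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in generating process: a seasonal signal plus one event d
# that is observed exactly once, with true effect 5.
n = 100
t = np.arange(n)
d = np.zeros(n)
d[50] = 1.0  # the event occurs a single time
X = np.column_stack([np.ones(n),
                     np.sin(2 * np.pi * t / 12),
                     np.cos(2 * np.pi * t / 12),
                     d])
true_beta = np.array([10.0, 2.0, -1.5, 5.0])
sigma = 1.0
y = X @ true_beta + rng.normal(0.0, sigma, n)

def map_estimate(X, y, prior_sd, sigma=1.0):
    """MAP estimate for y ~ N(X beta, sigma^2) with independent
    N(0, prior_sd^2) priors on every coefficient except the intercept.
    Equivalent to ridge regression with lambda = sigma^2 / prior_sd^2."""
    lam = (sigma / prior_sd) ** 2
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized, as in the post
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Ever more restrictive priors, mirroring the article's progression
for prior_sd in [100.0, 1.0, 0.5]:
    beta = map_estimate(X, y, prior_sd)
    print(f"prior sd {prior_sd:6.1f} -> beta[d] = {beta[3]:.2f}")
```

With the very weak prior the event coefficient sits near the overfit maximum likelihood value; with sd 0.5 it is pulled well below the truth, matching the "overly confident" case in the text. This is only a point-estimate sketch: it shows the shrinkage behavior but not the credible intervals a full posterior would give.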
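The advice to "sum up the consequences of all of your priors" is what is usually called a prior predictive check: draw parameters from the priors alone, simulate responses, and verify the simulated scale matches the order of magnitude of the observed response. The priors and seasonal structure below are hypothetical illustrations, not the article's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical priors: a wide Gaussian on the intercept, N(0, 1) on the betas.
n_sims, n_points = 1000, 100
t = np.arange(n_points)
season = np.column_stack([np.sin(2 * np.pi * t / 12),
                          np.cos(2 * np.pi * t / 12)])

intercept = rng.normal(10.0, 5.0, n_sims)      # prior draw per simulation
betas = rng.normal(0.0, 1.0, (n_sims, 2))      # prior draws for the betas
sigma = 1.0

# Simulate one response series per prior draw, using no data at all.
y_sim = intercept[:, None] + betas @ season.T \
        + rng.normal(0.0, sigma, (n_sims, n_points))

lo, hi = np.percentile(y_sim, [5, 95])
print(f"prior predictive 90% range: [{lo:.1f}, {hi:.1f}]")
```

If this range is wildly larger (or smaller) than the scale of your observed y, the priors are making a claim you probably do not believe, and you should revise them before fitting.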