The truth about Bayesian priors and overfitting
By Michael Green. Published in Towards Data Science. Originally published at doktormike.github.io.

Have you ever thought about how strong a prior is compared to observed data? It's not an entirely easy thing to conceptualize. In order to alleviate this trouble I will take you through some simulation exercises. These are meant as food for thought and not necessarily as a recommendation. However, many of the considerations we will run through are directly applicable to your everyday life of applying Bayesian methods to your specific domain.

We will start out by creating some data generated from a known process. The process features a cyclic component plus one event represented by the variable d. There is only one observation of that event, which means that maximum likelihood will always assign to this variable everything that cannot be explained by the other data. This is not always wanted, but that's just life. The data and the maximum likelihood fit look like below.

The first thing you can notice is that maximum likelihood overfits the parameter in front of d by 20.2 percent, since the true value is 5.

Now imagine that we do this the Bayesian way and fit the parameters of the generating process but not the functional form. As such we will sample the beta parameters with no priors whatsoever and look at what comes out. In the plot below you will see the truth, which is y, and three lines corresponding to three independent samples from the resulting posterior distribution.

Pretty similar to the maximum likelihood example, except that now we also know the credibility intervals and all the other goodies that the Bayesian approach gives us. We can summarize this quickly for the beta parameters.

So we can see that we are still overfitting even though we have a Bayesian approach. Now to the topic at hand: how strong are priors compared to data?

About weak priors and being ignorant

In order to analyze the strength of priors we will set ever more restrictive priors and see what happens to the result. Remember that the happy situation here is that we know the truth. We will start by building a model, shown below, in which we assign priors to the betas but not to the intercept. This model conforms to the same process as before but with weak priors introduced. The priors state that the beta parameters are all Gaussian distributions with a lot of variance around them, meaning that we are not very confident about what these values should be.

If you compare against the table above where we had no priors, which basically just means that our priors were uniform distributions between minus infinity and infinity, you can see that the inference is not much different at all. One thing to note is that the credible interval has not shrunk, which means that the model's uncertainty about each parameter is about the same. Now why is that? Well, for starters, in the first model, even if we "believed" that infinity was a reasonable guess for each parameter, the sampler found its way. The mean of the posterior distributions for each parameter is nearly identical between the models. So that's great: two infinitely different priors result in the same average inference.

Let's try to see at what scale the priors would change the average inference. See the new model description here. Now what does that look like for our inference? It looks like this! Still not a lot of difference, so let's do a scale-of-10 reduction again. Here we can totally see a difference. Look at the mean for parameter β[d] in the table below. It goes from 6.03 to 4.73, which is a change of 21 percent. This new average is only 5.4 percent different from the truth.

But let's take a while to think about why this happened. The reason is that your knowledge can be substantial, sometimes a lot more substantial than the data. So your experience about the domain SHOULD be taken into account and weighted against the evidence. It is up to you to mathematically state that experience, which is what we did in the last model. Before you start to argue with my reasoning, take a look at the plots where we show the last prior against the posterior and the point estimate from our generating process. As you can see, the prior is in the vicinity of the true value but not really covering it. This is not necessarily a bad thing, as being ignorant allows data to move you in insane directions. An example of this is shown in the plot below, where we plot the prior from model three against the posterior of model three. It's apparent that the data was allowed to drive the value too high, meaning that we are overfitting. This is exactly why maximum likelihood suffers from the curse of dimensionality. We shouldn't be surprised by this, since we literally told the model that a value up to 10 is quite probable. We can formulate a lesson from this: the weaker your priors are, the more you are simulating a maximum likelihood solution.

About strong priors and being overly confident

If the last chapter was about stating your mind and being confident in your knowledge about the domain, there is also a danger in overstating this and being overly confident. To illustrate, let's do a small example where we say that the betas swing around 0 with a standard deviation of 0.5, which is half the width of the previous prior. Take a look at the parameter estimates now. It's quite apparent that here we were overly confident, and the results are now quite a bit off from the truth. However, I would argue that this is still a rather sane prior. Why? Because we had no prior relation to the problem at hand, and in this setting it's better to be a bit conservative. As such we were successful: we stated our mind, and the "one" data point updated it by a lot. Now imagine if we had had two? So maybe it's not so bad that one data point was able to update our opinion quite a bit, and maybe it wasn't such a bad idea to be conservative in the first place.

Naturally, whether or not it's recommended to be conservative depends on the application at hand. For an application determining whether a suspect is guilty of a crime in the face of evidence, it is perhaps quite natural to be skeptical of the "evidence", while for a potential investment it may pay off to be more risky and accept a higher error rate in the hope of a big win.

Conclusion

So what did we learn from all of this? Hopefully you learned that setting priors is not something you master overnight. It takes practice to get a feel for it. However, the principles are exceedingly simple. I will leave you with some hardcore advice on how to set priors.

- Always set the priors in the vicinity of what you believe the truth is.
- Always set the priors such that they reflect the same order of magnitude as the phenomenon you're trying to predict.
- Don't be overconfident; leave space for doubt.
- Never use completely uninformative priors.
- Whenever possible, refrain from using uniform distributions.
- Always sum up the consequences of all of your priors, such that if no data were available your model would still predict in the same order of magnitude as your observed response.
- Be careful, and be honest! Never postulate very informative priors on results you WANT to be true. It's OK if you BELIEVE them to be true. Don't rest your mind until you see the difference.

Happy hacking!
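The article does not reproduce its model code, but the effect of tightening Gaussian priors on the betas can be sketched numerically. Under a linear-Gaussian model, the MAP estimate with independent N(0, prior_sd²) priors on the coefficients is ridge regression with penalty λ = σ²/prior_sd², so shrinking the prior pulls the one-observation event coefficient away from its overfit maximum likelihood value toward 0. The generating process below (cyclic signal, single event d with true effect 5) is a hypothetical stand-in for the article's, not the author's actual data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in generating process: a seasonal signal plus one event d
# that is observed exactly once, with true effect 5.
n = 100
t = np.arange(n)
d = np.zeros(n)
d[50] = 1.0  # the event occurs a single time
X = np.column_stack([np.ones(n),
                     np.sin(2 * np.pi * t / 12),
                     np.cos(2 * np.pi * t / 12),
                     d])
true_beta = np.array([10.0, 2.0, -1.5, 5.0])
sigma = 1.0
y = X @ true_beta + rng.normal(0.0, sigma, n)

def map_estimate(X, y, prior_sd, sigma=1.0):
    """MAP estimate for y ~ N(X beta, sigma^2) with independent
    N(0, prior_sd^2) priors on every coefficient except the intercept.
    Equivalent to ridge regression with lambda = sigma^2 / prior_sd^2."""
    lam = (sigma / prior_sd) ** 2
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized, as in the post
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Ever more restrictive priors, mirroring the article's progression
for prior_sd in [100.0, 1.0, 0.5]:
    beta = map_estimate(X, y, prior_sd)
    print(f"prior sd {prior_sd:6.1f} -> beta[d] = {beta[3]:.2f}")
```

With the very weak prior the event coefficient sits near the overfit maximum likelihood value; with sd 0.5 it is pulled well below the truth, matching the "overly confident" case in the text. This is only a point-estimate sketch: it shows the shrinkage behavior but not the credible intervals a full posterior would give.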
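The advice to "sum up the consequences of all of your priors" is what is usually called a prior predictive check: draw parameters from the priors alone, simulate responses, and verify the simulated scale matches the order of magnitude of the observed response. The priors and seasonal structure below are hypothetical illustrations, not the article's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical priors: a wide Gaussian on the intercept, N(0, 1) on the betas.
n_sims, n_points = 1000, 100
t = np.arange(n_points)
season = np.column_stack([np.sin(2 * np.pi * t / 12),
                          np.cos(2 * np.pi * t / 12)])

intercept = rng.normal(10.0, 5.0, n_sims)      # prior draw per simulation
betas = rng.normal(0.0, 1.0, (n_sims, 2))      # prior draws for the betas
sigma = 1.0

# Simulate one response series per prior draw, using no data at all.
y_sim = intercept[:, None] + betas @ season.T \
        + rng.normal(0.0, sigma, (n_sims, n_points))

lo, hi = np.percentile(y_sim, [5, 95])
print(f"prior predictive 90% range: [{lo:.1f}, {hi:.1f}]")
```

If this range is wildly larger (or smaller) than the scale of your observed y, the priors are making a claim you probably do not believe, and you should revise them before fitting.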