When to use Bayesian - Towards Data Science
5 Scenarios Where Bayesian Modeling Should be Considered

Introduction

Most statistical models have a frequentist and a Bayesian version. The decision between the two approaches is not just a choice between models; it is more a choice between statistical “languages”. Frequentist statistics is based on hypothetically resampling the data from an underlying true model. Take maximum likelihood estimation, in as plain English as possible. It says the likelihood is the probability of the observed data given (conditional on) the underlying model. Which of all the possible underlying models gives us the highest probability of seeing the data? The Bayesian equivalent of this question is “Which model, or set of parameter values, are we most certain is the true model?”

Philosophies aside, there are some practical reasons to choose one approach or the other. The frequentist approach is often the go-to, so in this article, I’ll go through some scenarios where it may make sense to go with Bayesian modeling.

Note: for a primer on the intuition of Bayesian Inference, see this piece.

[Photo by Carlos Muza on Unsplash]

Prior Information

Bayesian statistics is all about belief. We have some prior belief about the true model, and we combine that with the likelihood of our data to get our posterior belief about the true model. (I won’t go too into the math here, but again see here if interested.)

In some cases, we have knowledge about our domain before we see any of the data. Bayesian inference provides a straightforward way to encode that belief into a prior probability distribution. For example, say I am an economist predicting the effects of interest rates on tech stock price changes. My beliefs about economics say that low interest rates generally boost prices by some amount, probably by around 5% per interest point but probably not by more than 25% per interest point. We might use a normal distribution for our prior, centered at -5% with a variance of 40 (a standard deviation of about 6.3 percentage points). It would look something like this:

[Figure: Prior belief ~ Normal(-5, 40) about the effect of interest rates on stock prices. Image by author.]

If I’m worried about the effect that my potentially biased prior beliefs have on the posterior, I can either:

1. Use a “weaker prior”, e.g. a normal distribution with mean 0 and variance 1600 (aka standard deviation = 40).

[Figure: Weak prior belief ~ Normal(0, 1600) about the effect of interest rates on stock prices. Image by author.]

2. Do sensitivity analysis, where I try a whole bunch of different priors and see how much they actually affect the posterior distribution.

Limited Data

This scenario is related to the previous one. If our dataset is really small, some outliers could lead to an incorrect fit in a frequentist model (e.g. using sklearn ordinary least squares linear regression). However, if we use a Bayesian approach, a prior distribution over the parameters can act as a regularization to prevent unlikely extreme values.

In a situation where data is obtained over time, you can do the Bayesian inference with the data you have, obtain a posterior distribution, and then use that posterior as the prior when new data are obtained. This process, called “Bayesian Updating”, can be repeated over and over.

Uncertainty measurement

As alluded to earlier, Bayesian inference provides more interpretable confidence intervals, usually referred to as “credible intervals.” The posterior distribution is directly analogous to our belief about a parameter in the model or a prediction given new data. You can say “I am 95% sure that parameter θ is between 2.2 and 3.6.” Compare this to the frequentist confidence interval, which can only say “in a large number of repeated samples, intervals calculated in the same way as ours, between 1.7 and 3.4, would contain the true value 95% of the time.” If you care about conveying uncertainty (especially to non-statisticians), the Bayesian approach probably makes more sense.

Limited Test Data

If you are training and testing on two different distributions, a likely scenario is that the amount of data in the testing distribution is far smaller. Say we want to train a computer vision algorithm to locate a rare cancer type from a whole-body CT scan. We incorporate scans from patients with a more common cancer type for training and save some of the rare cancer patients for testing. We can get an accuracy point estimate from our evaluation, but a more appropriate method may be to use a Bayesian approach to evaluation [1]. We can create a relatively weak prior for the accuracy of our classifier, add evaluation trials as data, and obtain a posterior distribution where we can say “I am 95% confident that the classifier accuracy on the test distribution is between 65% and 82%”.

Hierarchical Models

Hierarchical models work well in the Bayesian framework. In a hierarchical model, there are multiple levels of random variables. For example, say you are modeling standardized test scores of students. Each student’s grade comes from a distribution of scores in their school district, and the parameters of the school district come from a more global distribution of school district average scores in the nation.

[Figure: Hierarchical model. ϕ represents global parameters. θ are group-specific parameters, coming from a global distribution defined by ϕ. y are specific data points belonging to one of the s groups. Image by author.]

A benefit of the hierarchical approach is that you can model properties of all of the clusters, even if there are very few data points from a given cluster. In this example, if we have very few data points from a given school, its distribution will be wider and more closely resemble the overall global distribution (ϕ). Again, a hierarchical model is more resistant to outliers and limited data than creating separate models for every cluster.

Frequentist approaches to hierarchical linear models might look for the mode of the posterior distribution, which, often in hierarchical models, can be on the edge or boundary of the posterior space [2]. This can give you an incorrect result when learning model parameters. With Bayesian inference, we obtain a whole posterior distribution, and we can compute statistics that are more appropriate for a complex distribution, like the mean, median, and 95% credible intervals.

With recent computational and algorithmic advances, Bayesian inference is more feasible for larger models and more data. While in practice frequentist approaches are often the default choice, there are some scenarios where a Bayesian approach can be a better option, most frequently when:

- You have quantifiable prior beliefs
- Data is limited
- Uncertainty is important
- The model (data-generating process) is hierarchical

Thank you for reading. Registering for Medium membership supports my work.

References:
[1] https://www.youtube.com/watch?v=5f-9xCuyZh4
[2] https://discourse.mc-stan.org/t/hierarchical-linear-models-bayes-vs-frequentist/12012/1

Max Reynolds | Researching ML and neuroimaging | maxwellreynolds.com
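To make the prior-as-regularizer and “Bayesian Updating” ideas concrete, here is a minimal sketch of a conjugate normal-normal update in Python. It is an illustration under stated assumptions, not the article's own method: the observation noise variance (`NOISE_VAR`) is assumed known, the observations are made up, and the `NormalBelief` class and its methods are hypothetical names invented for this example. Only the two priors, Normal(-5, 40) and Normal(0, 1600), come from the text.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class NormalBelief:
    """A normal belief about one parameter, stored as mean and variance."""
    mean: float
    var: float

    def update(self, observations, noise_var):
        """Conjugate normal-normal update, assuming a known noise variance."""
        n = len(observations)
        if n == 0:
            return self
        # Precisions (inverse variances) add; the posterior mean is the
        # precision-weighted average of the prior mean and the data.
        post_precision = 1.0 / self.var + n / noise_var
        post_mean = (self.mean / self.var + sum(observations) / noise_var) / post_precision
        return NormalBelief(post_mean, 1.0 / post_precision)

    def credible_interval(self, level=0.95):
        """Central credible interval of the normal belief."""
        dist = NormalDist(self.mean, sqrt(self.var))
        tail = (1.0 - level) / 2.0
        return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical observations of "% price change per interest point".
NOISE_VAR = 100.0                      # assumed known, for illustration only
first_batch = [-30.0, -2.0, -8.0]      # small sample containing one outlier
second_batch = [-6.0, -4.0, -9.0, -3.0]

informative = NormalBelief(mean=-5.0, var=40.0)   # prior from the text
weak = NormalBelief(mean=0.0, var=1600.0)         # weaker prior from the text

post_informative = informative.update(first_batch, NOISE_VAR)
post_weak = weak.update(first_batch, NOISE_VAR)

# Bayesian updating: yesterday's posterior becomes today's prior.
sequential = post_informative.update(second_batch, NOISE_VAR)
batch = informative.update(first_batch + second_batch, NOISE_VAR)

print(post_informative, post_weak)
print(sequential, batch)
print(sequential.credible_interval())
```

With these made-up numbers, the informative prior holds the estimate closer to -5 than the weak prior does, which is the regularizing effect on a small, outlier-prone sample. Updating in two steps gives the same posterior as using all the data at once.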
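The Bayesian evaluation idea from the Limited Test Data section can be sketched with a Beta prior on classifier accuracy. This is only a rough illustration: the Beta(2, 2) prior, the counts (37 correct out of 50), and the helper name `beta_credible_interval` are all assumptions made for this example, and the interval is approximated by Monte Carlo sampling with the standard library rather than computed exactly.

```python
import random


def beta_credible_interval(alpha, beta, level=0.95, draws=20000, seed=0):
    """Approximate a central credible interval of Beta(alpha, beta) by sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical evaluation on a small test set from the rare distribution.
correct, wrong = 37, 13

# Relatively weak prior on accuracy: Beta(2, 2), an assumption for this sketch.
prior_alpha, prior_beta = 2.0, 2.0

# Beta prior + binomial trials -> Beta posterior (conjugacy):
# add successes to alpha and failures to beta.
post_alpha = prior_alpha + correct
post_beta = prior_beta + wrong

posterior_mean = post_alpha / (post_alpha + post_beta)
lower, upper = beta_credible_interval(post_alpha, post_beta)

print(f"posterior mean accuracy: {posterior_mean:.3f}")
print(f"approx. 95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The posterior here is Beta(39, 15), so the interval can be read as a direct statement of belief about test-distribution accuracy, in the same style as the “I am 95% confident” sentence in that section.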
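The claim in the Hierarchical Models section that a school with few data points ends up with a wider distribution that sits closer to the global one can be illustrated with a small partial-pooling calculation. This sketch is deliberately simplified and is not a full hierarchical fit: it assumes the global mean, the between-school variance, and the within-school variance are already known (`GLOBAL_MEAN`, `BETWEEN_VAR`, `WITHIN_VAR`), whereas a real model would infer them too. The scores and the function name `school_posterior` are hypothetical.

```python
def school_posterior(scores, global_mean, between_var, within_var):
    """Posterior mean and variance of one school's average score,
    given fixed (assumed known) global parameters."""
    n = len(scores)
    data_precision = n / within_var       # information from this school's data
    prior_precision = 1.0 / between_var   # information from the global level
    post_var = 1.0 / (data_precision + prior_precision)
    sample_mean = sum(scores) / n
    post_mean = post_var * (data_precision * sample_mean + prior_precision * global_mean)
    return post_mean, post_var


# Assumed-known global parameters, for illustration only.
GLOBAL_MEAN = 70.0
BETWEEN_VAR = 25.0
WITHIN_VAR = 100.0

small_school = [95.0, 91.0]         # 2 students, sample mean 93
large_school = [90.0, 96.0] * 20    # 40 students, sample mean 93

small_mean, small_var = school_posterior(small_school, GLOBAL_MEAN, BETWEEN_VAR, WITHIN_VAR)
large_mean, large_var = school_posterior(large_school, GLOBAL_MEAN, BETWEEN_VAR, WITHIN_VAR)

print(f"small school: mean={small_mean:.2f}, var={small_var:.2f}")
print(f"large school: mean={large_mean:.2f}, var={large_var:.2f}")
```

Both schools have the same sample mean, but the two-student school is pulled much further toward the global mean and keeps a larger posterior variance, while the forty-student school stays close to its own data.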