Chapter 1 The Basics of Bayesian Statistics
An Introduction to Bayesian Thinking (published with bookdown)
Bayesian statistics mostly involves conditional probability, which is the probability of an event A given an event B, and it can be calculated using Bayes’ rule. The concept of conditional probability is widely used in medical testing, in which false positives and false negatives may occur. A false positive is a positive outcome on a medical test when the patient does not actually have the disease they are being tested for; the associated conditional probability is the probability of testing positive given no disease. Similarly, a false negative is a negative outcome on a medical test when the patient does have the disease; the associated conditional probability is that of testing negative given disease. Both indicators are critical for any medical decision.

To apply Bayes’ rule, we set up a prior and then calculate posterior probabilities based on the prior and the likelihood. That is to say, the prior probabilities are updated through an iterative process of data collection.

1.1 Bayes’ Rule

This section introduces how Bayes’ rule is applied to calculating conditional probabilities and demonstrates several real-life examples. Finally, we compare the Bayesian and frequentist definitions of probability.

1.1.1 Conditional Probabilities & Bayes’ Rule

Consider Table 1.1. It shows the results of a poll among 1,738 adult Americans. This table allows us to calculate probabilities.

Table 1.1: Results from a 2015 Gallup poll on the use of online dating sites by age group

| | 18-29 | 30-49 | 50-64 | 65+ | Total |
|---|---|---|---|---|---|
| Used online dating site | 60 | 86 | 58 | 21 | 225 |
| Did not use online dating site | 255 | 426 | 450 | 382 | 1513 |
| Total | 315 | 512 | 508 | 403 | 1738 |

For instance, the probability of an adult American using an online dating site can be calculated as
\[\begin{multline*}
P(\text{using an online dating site}) = \\
\frac{\text{Number that indicated they used an online dating site}}{\text{Total number of people in the poll}}
= \frac{225}{1738} \approx 13\%.
\end{multline*}\]
This is the overall probability of using an online dating site. Say we are now interested in the probability of using an online dating site if one falls in the age group 30-49. Similar to the above, we have
\[\begin{multline*}
P(\text{using an online dating site} \mid \text{in age group 30-49}) = \\
\frac{\text{Number in age group 30-49 that indicated they used an online dating site}}{\text{Total number in age group 30-49}}
= \frac{86}{512} \approx 17\%.
\end{multline*}\]
Here, the pipe symbol ‘|’ means conditional on. This is a conditional probability, as one can consider it the probability of using an online dating site conditional on being in age group 30-49.

We can rewrite this conditional probability in terms of ‘regular’ probabilities by dividing both the numerator and the denominator by the total number of people in the poll. That is,
\[\begin{multline*}
P(\text{using an online dating site} \mid \text{in age group 30-49}) \\
\begin{split}
&= \frac{\text{Number in age group 30-49 that indicated they used an online dating site}}{\text{Total number in age group 30-49}} \\
&= \frac{\frac{\text{Number in age group 30-49 that indicated they used an online dating site}}{\text{Total number of people in the poll}}}{\frac{\text{Total number in age group 30-49}}{\text{Total number of people in the poll}}} \\
&= \frac{P(\text{using an online dating site \& falling in age group 30-49})}{P(\text{Falling in age group 30-49})}.
\end{split}
\end{multline*}\]
It turns out this relationship holds true for any conditional probability and is known as Bayes’ rule:

Definition 1.1 (Bayes’ Rule) The conditional probability of the event \(A\) conditional on the event \(B\) is given by
\[
P(A \mid B) = \frac{P(A \,\&\, B)}{P(B)}.
\]

Example 1.1 What is the probability that an 18-29 year old from Table 1.1 uses online dating sites?

Note that the question is about 18-29 year olds. Therefore, it conditions on being 18-29 years old.
Bayes’ rule provides a way to compute this conditional probability:
\[\begin{multline*}
P(\text{using an online dating site} \mid \text{in age group 18-29}) \\
\begin{split}
&= \frac{P(\text{using an online dating site \& falling in age group 18-29})}{P(\text{Falling in age group 18-29})} \\
&= \frac{\frac{\text{Number in age group 18-29 that indicated they used an online dating site}}{\text{Total number of people in the poll}}}{\frac{\text{Total number in age group 18-29}}{\text{Total number of people in the poll}}} \\
&= \frac{\text{Number in age group 18-29 that indicated they used an online dating site}}{\text{Total number in age group 18-29}} = \frac{60}{315} \approx 19\%.
\end{split}
\end{multline*}\]

1.1.2 Bayes’ Rule and Diagnostic Testing

To better understand conditional probabilities and their importance, let us consider an example involving the human immunodeficiency virus (HIV). In the early 1980s, HIV had just been discovered and was spreading rapidly. There was major concern about the safety of the blood supply. Also, virtually no cure existed, making an HIV diagnosis basically a death sentence, in addition to the stigma that was attached to the disease.

These circumstances made false positives and false negatives in HIV testing highly undesirable. A false positive is when a test returns positive while the truth is negative. That would, for instance, be that someone without HIV is wrongly diagnosed with HIV, wrongly telling that person they are going to die and casting the stigma on them. A false negative is when a test returns negative while the truth is positive. That is when someone with HIV undergoes an HIV test which wrongly comes back negative. The latter poses a threat to the blood supply if that person is about to donate blood.

The probability of a false positive if the truth is negative is called the false positive rate. Similarly, the false negative rate is the probability of a false negative if the truth is positive. Note that both of these rates are conditional probabilities: the false positive rate of an HIV test is the probability of a positive result conditional on the person tested having no HIV.

The HIV test we consider is an enzyme-linked immunosorbent assay, commonly known as an ELISA.
We would like to know the probability that someone (in the early 1980s) has HIV if ELISA tests positive. For this, we need the following information. ELISA’s true positive rate (one minus the false negative rate), also referred to as sensitivity, recall, or probability of detection, is estimated as
\[
P(\text{ELISA is positive} \mid \text{Person tested has HIV}) = 93\% = 0.93.
\]
Its true negative rate (one minus the false positive rate), also referred to as specificity, is estimated as
\[
P(\text{ELISA is negative} \mid \text{Person tested has no HIV}) = 99\% = 0.99.
\]
Also relevant to our question is the prevalence of HIV in the overall population, which is estimated to be 1.48 out of every 1000 American adults. We therefore assume
\[\begin{equation}
P(\text{Person tested has HIV}) = \frac{1.48}{1000} = 0.00148.
\tag{1.1}
\end{equation}\]
Note that the above numbers are estimates. For our purposes, however, we will treat them as if they were exact.

Our goal is to compute the probability of HIV if ELISA is positive, that is \(P(\text{Person tested has HIV} \mid \text{ELISA is positive})\). In none of the above numbers did we condition on the outcome of ELISA. Fortunately, Bayes’ rule allows us to use the above numbers to compute the probability we seek. Bayes’ rule states that
\[\begin{equation}
P(\text{Person tested has HIV} \mid \text{ELISA is positive}) = \frac{P(\text{Person tested has HIV} \,\&\, \text{ELISA is positive})}{P(\text{ELISA is positive})}.
\tag{1.2}
\end{equation}\]
The numerator can be derived as follows. For someone to test positive and be HIV positive, that person first needs to be HIV positive and then, secondly, test positive. The probability of the first thing happening is \(P(\text{HIV positive}) = 0.00148\). The probability of then testing positive is \(P(\text{ELISA is positive} \mid \text{Person tested has HIV}) = 0.93\), the true positive rate. This yields for the numerator
\[\begin{multline}
P(\text{Person tested has HIV} \,\&\, \text{ELISA is positive}) \\
\begin{split}
&= P(\text{Person tested has HIV}) P(\text{ELISA is positive} \mid \text{Person tested has HIV}) \\
&= 0.00148 \cdot 0.93 = 0.0013764.
\end{split}
\tag{1.3}
\end{multline}\]
The first step in the above equation is implied by Bayes’ rule: by multiplying the left- and right-hand side of Bayes’ rule as presented in Section 1.1.1 by \(P(B)\), we obtain
\[
P(A \mid B) P(B) = P(A \,\&\, B).
\]
The denominator in (1.2) can be expanded as
\[\begin{multline*}
P(\text{ELISA is positive}) \\
\begin{split}
&= P(\text{Person tested has HIV} \,\&\, \text{ELISA is positive}) + P(\text{Person tested has no HIV} \,\&\, \text{ELISA is positive}) \\
&= 0.0013764 + 0.0099852 = 0.0113616
\end{split}
\end{multline*}\]
where we used (1.3) and
\[\begin{multline*}
P(\text{Person tested has no HIV} \,\&\, \text{ELISA is positive}) \\
\begin{split}
&= P(\text{Person tested has no HIV}) P(\text{ELISA is positive} \mid \text{Person tested has no HIV}) \\
&= \left(1 - P(\text{Person tested has HIV})\right) \cdot \left(1 - P(\text{ELISA is negative} \mid \text{Person tested has no HIV})\right) \\
&= \left(1 - 0.00148\right) \cdot \left(1 - 0.99\right) = 0.0099852.
\end{split}
\end{multline*}\]
Putting this all together and inserting into (1.2) reveals
\[\begin{equation}
P(\text{Person tested has HIV} \mid \text{ELISA is positive}) = \frac{0.0013764}{0.0113616} \approx 0.12.
\tag{1.4}
\end{equation}\]
So even when the ELISA returns positive, the probability of having HIV is only 12%. An important reason why this number is so low is the low prevalence of HIV. Before testing, one’s probability of HIV was 0.148%, so the positive test changes that probability dramatically, but it is still below 50%. That is, it is more likely that one is HIV negative rather than positive after one positive ELISA test.

Questions like the one we just answered (What is the probability of a disease if a test returns positive?) are crucial for making medical diagnoses. As we saw, the true positive and true negative rates of a test do not tell the full story on their own; a disease’s prevalence also plays a role. Bayes’ rule is a tool to synthesize such numbers into a more useful probability of having a disease after a test result.

Example 1.2 What is the probability that someone who tests positive does not actually have HIV?
We found in (1.4) that someone who tests positive has a \(0.12\) probability of having HIV. That implies that the same person has a \(1 - 0.12 = 0.88\) probability of not having HIV, despite testing positive.

Example 1.3 If the individual is at a higher risk for having HIV than a randomly sampled person from the population considered, how, if at all, would you expect \(P(\text{Person tested has HIV} \mid \text{ELISA is positive})\) to change?

If the person has a priori a higher risk for HIV and tests positive, then the probability of having HIV must be higher than for someone not at increased risk who also tests positive. Therefore, \(P(\text{Person tested has HIV} \mid \text{ELISA is positive}) > 0.12\), where \(0.12\) comes from (1.4).

One can derive this mathematically by plugging a larger number than 0.00148 into (1.1), as that number represents the prior risk of HIV. Changing the calculations accordingly shows \(P(\text{Person tested has HIV} \mid \text{ELISA is positive}) > 0.12\).

Example 1.4 If the false positive rate of the test is higher than 1%, how, if at all, would you expect \(P(\text{Person tested has HIV} \mid \text{ELISA is positive})\) to change?

If the false positive rate increases, the probability of a wrong positive result increases. That means that a positive test result is more likely to be wrong and thus less indicative of HIV. Therefore, the probability of HIV after a positive ELISA goes down, such that \(P(\text{Person tested has HIV} \mid \text{ELISA is positive}) < 0.12\).

1.1.3 Bayes Updating

In the previous section, we saw that one positive ELISA test yields a probability of having HIV of 12%. To obtain a more convincing probability, one might want to do a second ELISA test after a first one comes up positive. What is the probability of being HIV positive if the second ELISA test also comes back positive?

To solve this problem, we will assume that the correctness of this second test is not influenced by the first ELISA, that is, the tests are independent of each other. This assumption probably does not hold true, as it is plausible that if the first test was a false positive, the second one is more likely to be one as well. Nonetheless, we stick with the independence assumption for simplicity.
In the last section, we used \(P(\text{Person tested has HIV}) = 0.00148\), see (1.1), to compute the probability of HIV after one positive test. If we repeat those steps but now with \(P(\text{Person tested has HIV}) = 0.12\), the probability that a person with one positive test has HIV, we obtain exactly the probability of HIV after two positive tests. Repeating the maths from the previous section, involving Bayes’ rule, gives
\[\begin{multline}
P(\text{Person tested has HIV} \mid \text{Second ELISA is also positive}) \\
\begin{split}
&= \frac{P(\text{Person tested has HIV}) P(\text{Second ELISA is positive} \mid \text{Person tested has HIV})}{P(\text{Second ELISA is also positive})} \\
&= \frac{0.12 \cdot 0.93}{
\begin{split}
&P(\text{Person tested has HIV}) P(\text{Second ELISA is positive} \mid \text{Has HIV}) \\
&+ P(\text{Person tested has no HIV}) P(\text{Second ELISA is positive} \mid \text{Has no HIV})
\end{split}
} \\
&= \frac{0.1116}{0.12 \cdot 0.93 + (1 - 0.12) \cdot (1 - 0.99)} \approx 0.93.
\end{split}
\tag{1.5}
\end{multline}\]
Since we are considering the same ELISA test, we used the same true positive and true negative rates as in Section 1.1.2. We see that two positive tests make it much more probable for someone to have HIV than when only one test comes up positive.

This process, of using Bayes’ rule to update a probability based on an event affecting it, is called Bayes’ updating. More generally, what one tries to update can be considered ‘prior’ information, sometimes simply called the prior. The event providing information about this can also be data. Then, updating this prior using Bayes’ rule gives the information conditional on the data, also known as the posterior, as in the information after having seen the data. Going from the prior to the posterior is Bayes updating.

The probability of HIV after one positive ELISA, 0.12, was the posterior in the previous section, as it was an update of the overall prevalence of HIV, (1.1). However, in this section we answered a question where we used this posterior information as the prior. This process of using a posterior as the prior in a new problem is natural in the Bayesian framework of updating knowledge based on the data.
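The updating step described above can be sketched in R. This is a minimal illustration rather than code from the book: the function name `update_on_positive` and the variable names are assumptions made here, and the code carries the unrounded posterior forward instead of the rounded 0.12 used in (1.5), so the results agree with the text only after rounding to two decimals.

```r
# One Bayes update: P(HIV | positive test), computed from the prior P(HIV),
# the sensitivity P(+ | HIV) and the specificity P(- | no HIV).
# (Hypothetical helper; defaults are the ELISA rates from Section 1.1.2.)
update_on_positive <- function(prior, sensitivity = 0.93, specificity = 0.99) {
  true_pos  <- prior * sensitivity              # P(HIV & positive)
  false_pos <- (1 - prior) * (1 - specificity)  # P(no HIV & positive)
  true_pos / (true_pos + false_pos)             # Bayes' rule
}

prevalence <- 0.00148                          # prior from (1.1)
after_one  <- update_on_positive(prevalence)   # posterior after one positive test
after_two  <- update_on_positive(after_one)    # that posterior reused as the prior

# Rounded to two decimals, these match (1.4) and (1.5)
round(c(one_positive = after_one, two_positives = after_two), 2)
```

Calling `update_on_positive` with a larger `prior` or a smaller `specificity` is one way to check the direction of the changes discussed in Examples 1.3 and 1.4.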
Example 1.5 What is the probability that one actually has HIV after testing positive 3 times on the ELISA? Again, assume that all three ELISAs are independent.

Analogous to what we did in this section, we can use Bayes’ updating for this. However, now the prior is the probability of HIV after two positive ELISAs, that is \(P(\text{Person tested has HIV}) = 0.93\). Analogous to (1.5), the answer follows as
\[\begin{multline}
P(\text{Person tested has HIV} \mid \text{Third ELISA is also positive}) \\
\begin{split}
&= \frac{P(\text{Person tested has HIV}) P(\text{Third ELISA is positive} \mid \text{Person tested has HIV})}{P(\text{Third ELISA is also positive})} \\
&= \frac{0.93 \cdot 0.93}{\begin{split}
&P(\text{Person tested has HIV}) P(\text{Third ELISA is positive} \mid \text{Has HIV}) \\
+ &P(\text{Person tested has no HIV}) P(\text{Third ELISA is positive} \mid \text{Has no HIV})
\end{split}} \\
&= \frac{0.8649}{0.93 \cdot 0.93 + (1 - 0.93) \cdot (1 - 0.99)} \approx 0.999.
\end{split}
\end{multline}\]

1.1.4 Bayesian vs. Frequentist Definitions of Probability

The frequentist definition of probability is based on observation of a large number of trials. Let \(P(E)\) denote the probability that an event \(E\) occurs, and suppose we observe \(n_E\) successes out of \(n\) trials. Then we have
\[\begin{equation}
P(E) = \lim_{n \rightarrow \infty} \dfrac{n_E}{n}.
\end{equation}\]
On the other hand, the Bayesian definition of probability \(P(E)\) reflects our prior beliefs, so \(P(E)\) can be any probability distribution, provided that it is consistent with all of our beliefs. (For example, we cannot believe that the probability of a coin landing heads is 0.7 and that the probability of getting tails is 0.8, because they are inconsistent.)
The two definitions result in different methods of inference. Using the frequentist approach, we describe the confidence level as the proportion of random samples from the same population that produce confidence intervals which contain the true population parameter. For example, if we generated 100 random samples from the population, and 95 of the resulting confidence intervals contained the true parameter, then the confidence level would be 95%. Note that each interval either contains the true parameter or does not, so the confidence level is NOT the probability that a given interval includes the true population parameter.

Example 1.6 Based on a 2015 Pew Research poll on 1,500 adults: “We are 95% confident that 60% to 64% of Americans think the federal government does not do enough for middle class people.”

The correct interpretation is: 95% of random samples of 1,500 adults will produce confidence intervals that contain the true proportion of Americans who think the federal government does not do enough for middle class people.

Here are two common misconceptions:

1. There is a 95% chance that this confidence interval includes the true population proportion.
2. The true population proportion is in this interval 95% of the time.

The probability that a given confidence interval captures the true parameter is either zero or one. To a frequentist, the problem is that one never knows whether a specific interval contains the true value with probability zero or one. So a frequentist says that “95% of similarly constructed intervals contain the true value”.

The second (incorrect) statement sounds like the true proportion is a value that moves around, sometimes falling in the given interval and sometimes not. Actually the true proportion is constant; it is the various intervals constructed from new samples that differ.

The Bayesian alternative is the credible interval, which has a definition that is easier to interpret. Since a Bayesian is allowed to express uncertainty in terms of probability, a Bayesian credible interval is a range for which the Bayesian thinks that the probability of including the true value is, say, 0.95. Thus a Bayesian can say that there is a 95% chance that the credible interval contains the true parameter value.
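The frequentist reading of a confidence level, as a long-run proportion of intervals rather than a statement about one interval, can be illustrated with a small simulation. The sketch below is not from the book and rests on assumptions chosen here for illustration: a hypothetical true proportion of 0.62, 2,000 repeated samples, an arbitrary seed, and the normal-approximation (Wald) interval.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
true_p <- 0.62     # hypothetical true proportion (unknown in a real poll)
n      <- 1500     # sample size, as in the poll of Example 1.6
n_rep  <- 2000     # number of repeated samples (arbitrary choice)

# Sample proportion observed in each repeated sample
p_hat <- rbinom(n_rep, size = n, prob = true_p) / n

# Normal-approximation (Wald) 95% confidence interval for each sample
half_width <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)
covered    <- (p_hat - half_width <= true_p) & (true_p <= p_hat + half_width)

# Each interval either contains true_p or it does not;
# the confidence level describes the share of intervals that do.
coverage <- mean(covered)
coverage
```

Each entry of `covered` is simply TRUE or FALSE for one interval, which mirrors the point that a given interval captures the true parameter with probability zero or one; only the average over many intervals should be close to 0.95.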
Example 1.7 The posterior distribution yields a 95% credible interval of 60% to 64% for the proportion of Americans who think the federal government does not do enough for middle class people.

We can say that there is a 95% probability that the proportion is between 60% and 64% because this is a credible interval; more details will be introduced later in the course.

1.2 Inference for a Proportion

1.2.1 Inference for a Proportion: Frequentist Approach

Example 1.8 RU-486 is claimed to be an effective “morning after” contraceptive pill, but is it really effective?

Data: A total of 40 women came to a health clinic asking for emergency contraception (usually to prevent pregnancy after unprotected sex). They were randomly assigned to RU-486 (treatment) or standard therapy (control), 20 in each group. In the treatment group, 4 out of 20 became pregnant. In the control group, 16 out of 20 became pregnant.

Question: How strongly do these data indicate that the treatment is more effective than the control?

To simplify the framework, let’s make it a one-proportion problem and just consider the 20 total pregnancies, because the two groups have the same sample size. If the treatment and control are equally effective, then the probability that a pregnancy comes from the treatment group (\(p\)) should be 0.5. If RU-486 is more effective, then the probability that a pregnancy comes from the treatment group (\(p\)) should be less than 0.5.

Therefore, we can form the hypotheses as below:

\(p =\) probability that a given pregnancy comes from the treatment group

\(H_0: p = 0.5\) (no difference, a pregnancy is equally likely to come from the treatment or control group)

\(H_A: p < 0.5\) (treatment is more effective, a pregnancy is less likely to come from the treatment group)

A p-value is needed to make an inference decision with the frequentist approach. The p-value is defined as the probability of observing something at least as extreme as the data, given that the null hypothesis (\(H_0\)) is true. “More extreme” means in the direction of the alternative hypothesis (\(H_A\)).
Since \(H_0\) states that the probability of success (a pregnancy coming from the treatment group) is 0.5, we can calculate the p-value from 20 independent Bernoulli trials where the probability of success is 0.5. The outcome of this experiment is 4 successes in 20 trials, so we need the probability of obtaining 4 or fewer successes in the 20 Bernoulli trials.

This probability can be calculated exactly from a binomial distribution with \(n = 20\) trials and success probability \(p = 0.5\). If \(k\) is the number of successes observed, the p-value is
\[P(k \leq 4) = P(k = 0) + P(k = 1) + P(k = 2) + P(k = 3) + P(k = 4).\]

sum(dbinom(0:4, size = 20, p = 0.5))
## [1] 0.005908966

According to \(\mathsf{R}\), the probability of getting 4 or fewer successes in 20 trials is 0.0059. Therefore, given that pregnancy is equally likely in the two groups, the chance of observing 4 or fewer pregnancies in the treatment group is 0.0059. With such a small probability, we reject the null hypothesis and conclude that the data provide convincing evidence for the treatment being more effective than the control.

1.2.2 Inference for a Proportion: Bayesian Approach

This section uses the same example, but this time we make the inference for the proportion from a Bayesian approach. Recall that we still consider only the 20 total pregnancies, 4 of which come from the treatment group. The question we would like to answer is how likely it is for 4 pregnancies to occur in the treatment group. Also remember that if the treatment and control are equally effective, and the sample sizes for the two groups are the same, then the probability (\(p\)) that a pregnancy comes from the treatment group is 0.5.

Within the Bayesian framework, we need to make some assumptions on the models which generated the data. First, \(p\) is a probability, so it can take on any value between 0 and 1. However, let’s simplify by using discrete cases: assume \(p\), the chance that a pregnancy comes from the treatment group, can take on nine values, from 10%, 20%, 30%, up to 90%. For example, \(p = 20\%\) means that among 10 pregnancies, it is expected that 2 of them will occur in the treatment group. Note that we consider all nine models, in contrast to the frequentist paradigm, where we consider only one model.
Table 1.2 specifies the prior probabilities that we want to assign to each model. There is no unique correct prior, but any prior probability should reflect our beliefs prior to the experiment. The prior probabilities should incorporate the information from all relevant research before we perform the current experiment.

Table 1.2: Prior, likelihood, and posterior probabilities for each of the 9 models

| Model (\(p\)) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| Prior \(P(\text{model})\) | 0.0600 | 0.0600 | 0.0600 | 0.0600 | 0.5200 | 0.0600 | 0.0600 | 0.0600 | 0.0600 |
| Likelihood \(P(\text{data} \mid \text{model})\) | 0.0898 | 0.2182 | 0.1304 | 0.0350 | 0.0046 | 0.0003 | 0.0000 | 0.0000 | 0.0000 |
| \(P(\text{data} \mid \text{model}) \times P(\text{model})\) | 0.0054 | 0.0131 | 0.0078 | 0.0021 | 0.0024 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Posterior \(P(\text{model} \mid \text{data})\) | 0.1748 | 0.4248 | 0.2539 | 0.0681 | 0.0780 | 0.0005 | 0.0000 | 0.0000 | 0.0000 |

This prior incorporates two beliefs: the probability of \(p = 0.5\) is highest, and the benefit of the treatment is symmetric. The second belief means that the treatment is equally likely to be better or worse than the standard treatment. It is natural to ask how this prior was chosen; prior specification will be discussed in detail later in the course.

Next, let’s calculate the likelihood, the probability of the observed data under each model considered. In mathematical terms, we have
\[P(\text{data} \mid \text{model}) = P(k = 4 \mid n = 20, p).\]
The likelihood can be computed as a binomial probability with 4 successes in 20 trials, with \(p\) equal to the assumed value in each model. The values are listed in Table 1.2.

After setting up the prior and computing the likelihood, we are ready to calculate the posterior using Bayes’ rule, that is,
\[P(\text{model} \mid \text{data}) = \frac{P(\text{model}) P(\text{data} \mid \text{model})}{P(\text{data})}.\]
The posterior probability values are also listed in Table 1.2, and the highest probability occurs at \(p = 0.2\), which is 42.48%. Note that the priors and posteriors across all models both sum to 1.
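The rows of Table 1.2 can be reproduced with a few lines of R. The following is a minimal sketch, assuming only the prior values listed in the table; the variable names are chosen here for illustration, and \(P(\text{data})\) is obtained by summing prior times likelihood over the nine models.

```r
p     <- (1:9) / 10                           # the nine candidate models
prior <- c(rep(0.06, 4), 0.52, rep(0.06, 4))  # prior row of Table 1.2

likelihood <- dbinom(4, size = 20, prob = p)  # P(k = 4 | n = 20, p)
numerator  <- prior * likelihood              # P(data | model) x P(model)
posterior  <- numerator / sum(numerator)      # divide by P(data)

# Rows in the same order as Table 1.2, rounded to four decimals
round(rbind(p, prior, likelihood, numerator, posterior), 4)

# Model with the highest posterior probability
p[which.max(posterior)]
```

Because the normalising constant is just `sum(numerator)`, the posterior row sums to 1 by construction, which matches the closing remark above.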
In decision making, we choose the model with the highest posterior probability, which is \(p = 0.2\). In comparison, the highest prior probability is at \(p = 0.5\) with 52%, and the posterior probability of \(p = 0.5\) drops to 7.8%. This demonstrates how we update our beliefs based on observed data. Note that the calculation of posterior, likelihood, and prior is unrelated to the frequentist concept (data “at least as extreme as observed”).

Here are the histograms of the prior, the likelihood, and the posterior probabilities:

Figure 1.1: Original: sample size \(n = 20\) and number of successes \(k = 4\)

We started with the high prior at \(p = 0.5\), but the data likelihood peaks at \(p = 0.2\). We then updated our prior based on the observed data to find the posterior. The Bayesian paradigm, unlike the frequentist approach, allows us to make direct probability statements about our models. For example, we can calculate the probability that RU-486, the treatment, is more effective than the control as the sum of the posteriors of the models where \(p < 0.5\). Adding up the relevant posterior probabilities in Table 1.2, we find that the chance that the treatment is more effective than the control is 92.16%.

1.2.3 Effect of Sample Size on the Posterior

The RU-486 example is summarized in Figure 1.1. Let’s look at what the posterior distribution would look like if we had more data.

Figure 1.2: More data: sample size \(n = 40\) and number of successes \(k = 8\)

Suppose our sample size was 40 instead of 20, and the number of successes was 8 instead of 4. Note that the ratio of the number of successes to the sample size is still 20%. We start with the same prior distribution. We then calculate the likelihood of the data, which is also centered at 0.20 but is less variable than the original likelihood we had with the smaller sample size. Finally, we put these two together to obtain the posterior distribution. The posterior also has a peak at \(p = 0.20\), but the peak is taller, as shown in Figure 1.2. In other words, there is more mass on that model, and less on the others.
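The effect of sample size on the posterior can also be checked numerically. The sketch below is an illustration under the same nine-model prior as Table 1.2; the helper name `posterior_for` is hypothetical, and the third data set (40 successes in 200 trials) is included simply as a still larger sample with the same 20% success ratio.

```r
p     <- (1:9) / 10                           # the nine candidate models
prior <- c(rep(0.06, 4), 0.52, rep(0.06, 4))  # same prior as in Table 1.2

# Posterior over the nine models after k successes in n trials
# (hypothetical helper written for this illustration)
posterior_for <- function(k, n) {
  unnormalised <- prior * dbinom(k, size = n, prob = p)
  unnormalised / sum(unnormalised)
}

post_20  <- posterior_for(k = 4,  n = 20)   # original data
post_40  <- posterior_for(k = 8,  n = 40)   # twice the data, same 20% ratio
post_200 <- posterior_for(k = 40, n = 200)  # ten times the data, same ratio

round(rbind(p, post_20, post_40, post_200), 4)
```

Comparing the column for \(p = 0.2\) across the three posterior rows shows how the posterior mass concentrates on that model as the sample grows, while the prior is held fixed.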
Figure 1.3: More data: sample size \(n = 200\) and number of successes \(k = 40\)

To illustrate the effect of the sample size even further, we keep increasing our sample size but still maintain the 20% ratio of successes to sample size. So let’s consider a sample with 200 observations and 40 successes. Once again, we use the same prior; the likelihood is again centered at 20%, and almost all of the probability mass in the posterior is at \(p = 0.20\). The other models do not have zero probability mass, but their posterior probabilities are very close to zero.

Figure 1.3 demonstrates that as more data are collected, the likelihood ends up dominating the prior. This is why, while a good prior helps, a bad prior can be overcome with a large sample. However, it’s important to note that this will only work as long as we do not place zero probability mass on any of the models in the prior.

1.3 Frequentist vs. Bayesian Inference

1.3.1 Frequentist vs. Bayesian Inference

In this section, we will solve a simple inference problem using both frequentist and Bayesian approaches. Then we will compare the decisions reached with the two methods, to see whether we get the same answer or not. If we do not, we will discuss why that happens.

Example 1.9 We have a population of M&M’s, and in this population the percentage of yellow M&M’s is either 10% or 20%. You have been hired as a statistical consultant to decide whether the true percentage of yellow M&M’s is 10% or 20%.

Payoffs/losses: You are being asked to make a decision, and there are associated payoffs/losses that you should consider. If you make the correct decision, your boss gives you a bonus. On the other hand, if you make the wrong decision, you lose your job.

Data: You can “buy” a random sample from the population. You pay $200 for each M&M, and you must buy in $1,000 increments (5 M&Ms at a time). You have a total of $4,000 to spend, i.e., you may buy 5, 10, 15, or 20 M&Ms.

Remark: Remember that the cost of making a wrong decision is high, so you want to be fairly confident of your decision. At the same time, though, data collection is also costly, so you don’t want to pay for a sample larger than you need. If you believe that you could actually make a correct decision using a smaller sample size, you might choose to do so and save money and resources.
Let’s start with the frequentist inference.

Hypothesis: \(H_0\) is 10% yellow M&Ms, and \(H_A\) is >10% yellow M&Ms.

Significance level: \(\alpha = 0.05\).

One sample: red, green, yellow, blue, orange

Observed data: \(k = 1, n = 5\)

P-value: \(P(k \geq 1 \mid n = 5, p = 0.10) = 1 - P(k = 0 \mid n = 5, p = 0.10) = 1 - 0.90^5 \approx 0.41\)

Note that the p-value is the probability of the observed or a more extreme outcome given that the null hypothesis is true.

Since the p-value exceeds the significance level, we fail to reject \(H_0\) and conclude that the data do not provide convincing evidence that the proportion of yellow M&M’s is greater than 10%. This means that if we had to pick between 10% and 20% for the proportion of yellow M&M’s, even though this hypothesis testing procedure does not actually confirm the null hypothesis, we would likely stick with 10%, since we couldn’t find evidence that the proportion of yellow M&M’s is greater than 10%.

The Bayesian inference works differently, as below.

Hypotheses: \(H_1\) is 10% yellow M&Ms, and \(H_2\) is 20% yellow M&Ms.

Prior: \(P(H_1) = P(H_2) = 0.5\)

Sample: red, green, yellow, blue, orange

Observed data: \(k = 1, n = 5\)

Likelihood:
\[\begin{aligned}
P(k = 1 \mid H_1) &= \left(\begin{array}{c} 5 \\ 1 \end{array}\right) \times 0.10 \times 0.90^4 \approx 0.33 \\
P(k = 1 \mid H_2) &= \left(\begin{array}{c} 5 \\ 1 \end{array}\right) \times 0.20 \times 0.80^4 \approx 0.41
\end{aligned}\]

Posterior:
\[\begin{aligned}
P(H_1 \mid k = 1) &= \frac{P(H_1) P(k = 1 \mid H_1)}{P(k = 1)} = \frac{0.5 \times 0.33}{0.5 \times 0.33 + 0.5 \times 0.41} \approx 0.45 \\
P(H_2 \mid k = 1) &= 1 - 0.45 = 0.55
\end{aligned}\]

The posterior probabilities of \(H_1\) and \(H_2\) are close to each other. As a result, with equal priors and a small sample size, it is difficult to make a decision with strong confidence given the observed data. However, \(H_2\) has a higher posterior probability than \(H_1\), so if we had to make a decision at this point, we should pick \(H_2\), i.e., the proportion of yellow M&Ms is 20%. Note that this decision contradicts the decision based on the frequentist approach.
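Both calculations for the sample with \(k = 1\) yellow M&M out of \(n = 5\) can be reproduced in R. This is a minimal sketch with variable names chosen here for illustration. Note that it works with unrounded likelihoods, so the posterior of \(H_1\) comes out near 0.445; the value 0.45 above arises from using the likelihoods rounded to 0.33 and 0.41.

```r
n <- 5   # M&Ms bought
k <- 1   # yellow M&Ms observed

# Frequentist: p-value P(k or more yellow | p = 0.10)
p_value <- 1 - pbinom(k - 1, size = n, prob = 0.10)

# Bayesian: two competing models with equal prior probability
models     <- c(H1 = 0.10, H2 = 0.20)
prior      <- c(H1 = 0.5,  H2 = 0.5)
likelihood <- dbinom(k, size = n, prob = models)   # P(k = 1 | H1), P(k = 1 | H2)
posterior  <- prior * likelihood / sum(prior * likelihood)

round(p_value, 2)     # compare with the significance level 0.05
round(posterior, 3)   # posterior probabilities of H1 and H2
```

Changing `n` and `k` (for example to 10 and 2) gives the corresponding quantities for larger samples with the same share of yellow M&Ms.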
Table 1.3 summarizes what the results would look like if we had chosen larger sample sizes. Under each of these scenarios, the frequentist method yields a p-value higher than our significance level, so we would fail to reject the null hypothesis with any of these samples. On the other hand, the Bayesian method always yields a higher posterior for the second model, where \(p\) is equal to 0.20. So the decisions that we would make contradict each other.

Table 1.3: Frequentist and Bayesian probabilities for larger sample sizes

| Observed data | \(P(k \text{ or more} \mid 10\% \text{ yellow})\) | \(P(10\% \text{ yellow} \mid n, k)\) | \(P(20\% \text{ yellow} \mid n, k)\) |
|---|---|---|---|
| \(n = 5, k = 1\) | 0.41 | 0.45 | 0.55 |
| \(n = 10, k = 2\) | 0.26 | 0.39 | 0.61 |
| \(n = 15, k = 3\) | 0.18 | 0.34 | 0.66 |
| \(n = 20, k = 4\) | 0.13 | 0.29 | 0.71 |

However, if we had set up our framework differently in the frequentist method, with the null hypothesis \(p = 0.20\) and the alternative \(p < 0.20\), we would obtain different results. This shows that the frequentist method is highly sensitive to the choice of null hypothesis, while in the Bayesian method our results would be the same regardless of the order in which we evaluate our models.

1.4 Exercises

1. Conditioning on dating site usage. Recall Table 1.1. What is the probability that an online dating site user from this sample is 18-29 years old?

2. Probability of no HIV. Consider the ELISA test from Section 1.1.2. What is the probability that someone has no HIV if that person has a negative ELISA result? How does this compare to the probability of having no HIV before any test was done?

3. Probability of no HIV after contradictory tests. Consider the ELISA test from Section 1.1.2. What is the probability that someone has no HIV if that person first tests positive on the ELISA and then tests negative? Assume that the tests are independent of each other.