Bayesian inference


Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

Introduction to Bayes' rule

[Figure: A geometric visualisation of Bayes' theorem. In the table, the values 2, 3, 6 and 9 give the relative weights of each corresponding condition and case. The figures denote the cells of the table involved in each metric, the probability being the fraction of each figure that is shaded. This shows that P(A|B) P(B) = P(B|A) P(A), i.e. P(A|B) = P(B|A) P(A) / P(B). Similar reasoning can be used to show that P(¬A|B) = P(B|¬A) P(¬A) / P(B), etc.]
Main article: Bayes' theorem
See also: Bayesian probability

Formal explanation

Contingency table

                    Satisfies hypothesis H            Violates hypothesis ¬H              Total
Has evidence E      P(H|E)·P(E) = P(E|H)·P(H)         P(¬H|E)·P(E) = P(E|¬H)·P(¬H)        P(E)
No evidence ¬E      P(H|¬E)·P(¬E) = P(¬E|H)·P(H)      P(¬H|¬E)·P(¬E) = P(¬E|¬H)·P(¬H)     P(¬E) = 1 − P(E)
Total               P(H)                              P(¬H) = 1 − P(H)                    1

Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability and a "likelihood function" derived from a statistical model for the observed data. Bayesian inference computes the posterior probability according to Bayes' theorem:

P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}

where

- H stands for any hypothesis whose probability may be affected by data (called evidence below). Often there are competing hypotheses, and the task is to determine which is the most probable.
- P(H), the prior probability, is the estimate of the probability of the hypothesis H before the data E, the current evidence, is observed.
- E, the evidence, corresponds to new data that were not used in computing the prior probability.
- P(H | E), the posterior probability, is the probability of H given E, i.e., after E is observed. This is what we want to know: the probability of a hypothesis given the observed evidence.
- P(E | H) is the probability of observing E given H, and is called the likelihood. As a function of E with H fixed, it indicates the compatibility of the evidence with the given hypothesis. The likelihood function is a function of the evidence, E, while the posterior probability is a function of the hypothesis, H.
- P(E) is sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered (as is evident from the fact that the hypothesis H does not appear anywhere in the symbol, unlike for all the other factors), so this factor does not enter into determining the relative probabilities of different hypotheses.

For different values of H, only the factors P(H) and P(E | H), both in the numerator, affect the value of P(H | E): the posterior probability of a hypothesis is proportional to its prior probability (its inherent likeliness) and the newly acquired likelihood (its compatibility with the new observed evidence).

Bayes' rule can also be written as follows:

P(H \mid E) = \frac{P(E \mid H) P(H)}{P(E)}
            = \frac{P(E \mid H) P(H)}{P(E \mid H) P(H) + P(E \mid \neg H) P(\neg H)}
            = \frac{1}{1 + \left(\frac{1}{P(H)} - 1\right) \frac{P(E \mid \neg H)}{P(E \mid H)}}

because

P(E) = P(E \mid H) P(H) + P(E \mid \neg H) P(\neg H)

and

P(H) + P(\neg H) = 1,

where ¬H is "not H", the logical negation of H.

One quick and easy way to remember the equation is the rule of multiplication:

P(E \cap H) = P(E \mid H) P(H) = P(H \mid E) P(E)
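To make the two-hypothesis form above concrete, here is a minimal Python sketch of Bayes' rule for H versus ¬H; the function name and the example numbers (a test with 1% prior probability, 99% sensitivity, 95% specificity) are illustrative assumptions, not from the article:

```python
def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Bayes' rule for a hypothesis H versus its negation.

    prior_h           : P(H)
    lik_e_given_h     : P(E | H)
    lik_e_given_not_h : P(E | not H)
    Returns P(H | E).
    """
    # P(E) = P(E|H)P(H) + P(E|not H)P(not H), the marginal likelihood
    evidence = lik_e_given_h * prior_h + lik_e_given_not_h * (1 - prior_h)
    return lik_e_given_h * prior_h / evidence

# Illustrative numbers: a test with 99% sensitivity and 95% specificity
# applied to a condition with 1% prior probability.
print(posterior(0.01, 0.99, 0.05))  # ~0.167
```

Note how the small prior keeps the posterior modest even though the likelihood ratio strongly favours H.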
Alternatives to Bayesian updating

Bayesian updating is widely used and computationally convenient. However, it is not the only updating rule that might be considered rational.

Ian Hacking noted that traditional "Dutch book" arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. Hacking wrote:[1][2] "And neither the Dutch book argument nor any other in the personalist arsenal of proofs of the probability axioms entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."

Indeed, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics") following the publication of Richard C. Jeffrey's rule, which applies Bayes' rule to the case where the evidence itself is assigned a probability.[3] The additional hypotheses needed to uniquely require Bayesian updating have been deemed to be substantial, complicated, and unsatisfactory.[4]

Formal description of Bayesian inference

Definitions

- x, a data point in general. This may in fact be a vector of values.
- θ, the parameter of the data point's distribution, i.e., x ~ p(x | θ). This may be a vector of parameters.
- α, the hyperparameter of the parameter distribution, i.e., θ ~ p(θ | α). This may be a vector of hyperparameters.
- X is the sample, a set of n observed data points, i.e., x_1, …, x_n.
- x̃, a new data point whose distribution is to be predicted.

Bayesian inference

- The prior distribution is the distribution of the parameter(s) before any data is observed, i.e. p(θ | α). The prior distribution might not be easily determined; in such a case, one possibility may be to use the Jeffreys prior to obtain a prior distribution before updating it with newer observations.
- The sampling distribution is the distribution of the observed data conditional on its parameters, i.e. p(X | θ). This is also termed the likelihood, especially when viewed as a function of the parameter(s), sometimes written L(θ | X) = p(X | θ).
- The marginal likelihood (sometimes also termed the evidence) is the distribution of the observed data marginalized over the parameter(s):

p(\mathbf{X} \mid \alpha) = \int p(\mathbf{X} \mid \theta) \, p(\theta \mid \alpha) \, d\theta

- The posterior distribution is the distribution of the parameter(s) after taking into account the observed data. This is determined by Bayes' rule, which forms the heart of Bayesian inference:

p(\theta \mid \mathbf{X}, \alpha) = \frac{p(\theta, \mathbf{X}, \alpha)}{p(\mathbf{X}, \alpha)} = \frac{p(\mathbf{X} \mid \theta, \alpha) \, p(\theta, \alpha)}{p(\mathbf{X} \mid \alpha) \, p(\alpha)} = \frac{p(\mathbf{X} \mid \theta, \alpha) \, p(\theta \mid \alpha)}{p(\mathbf{X} \mid \alpha)} \propto p(\mathbf{X} \mid \theta, \alpha) \, p(\theta \mid \alpha)

This is expressed in words as "posterior is proportional to likelihood times prior", or sometimes as "posterior = likelihood times prior, over evidence".
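The proportionality "posterior ∝ likelihood × prior" suggests a simple numerical recipe: evaluate likelihood times prior on a grid over θ and normalize. The following sketch assumes a Bernoulli model with 7 successes in 10 trials and a uniform prior; the data and grid size are illustrative:

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)           # grid over the parameter space
prior = np.ones_like(theta)                   # p(theta | alpha): uniform prior
likelihood = theta**7 * (1 - theta)**3        # p(X | theta): 7 successes, 3 failures
unnorm = likelihood * prior                   # posterior ∝ likelihood × prior
posterior = unnorm / np.trapz(unnorm, theta)  # normalize by the evidence

print(theta[np.argmax(posterior)])            # posterior mode ~0.7
```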
In practice, for almost all complex Bayesian models used in machine learning, the posterior distribution p(θ | X, α) is not obtained in closed form, mainly because the parameter space for θ can be very high-dimensional, or because the Bayesian model retains a hierarchical structure formulated from the observations X and the parameter θ. In such situations, we need to resort to approximation techniques.[5]

Bayesian prediction

The posterior predictive distribution is the distribution of a new data point, marginalized over the posterior:

p(\tilde{x} \mid \mathbf{X}, \alpha) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta

The prior predictive distribution is the distribution of a new data point, marginalized over the prior:

p(\tilde{x} \mid \alpha) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \alpha) \, d\theta

Bayesian theory calls for the use of the posterior predictive distribution to do predictive inference, i.e., to predict the distribution of a new, unobserved data point. That is, instead of a fixed point as a prediction, a distribution over possible points is returned. Only this way is the entire posterior distribution of the parameter(s) used. By comparison, prediction in frequentist statistics often involves finding an optimum point estimate of the parameter(s) — e.g., by maximum likelihood or maximum a posteriori estimation (MAP) — and then plugging this estimate into the formula for the distribution of a data point. This has the disadvantage that it does not account for any uncertainty in the value of the parameter, and hence will underestimate the variance of the predictive distribution.

In some instances, frequentist statistics can work around this problem. For example, confidence intervals and prediction intervals in frequentist statistics, when constructed from a normal distribution with unknown mean and variance, are constructed using a Student's t-distribution. This correctly estimates the variance, due to the facts that (1) the average of normally distributed random variables is also normally distributed, and (2) the predictive distribution of a normally distributed data point with unknown mean and variance, using conjugate or uninformative priors, has a Student's t-distribution. In Bayesian statistics, however, the posterior predictive distribution can always be determined exactly — or at least to an arbitrary level of precision when numerical methods are used.

Both types of predictive distributions have the form of a compound probability distribution (as does the marginal likelihood). In fact, if the prior distribution is a conjugate prior, such that the prior and posterior distributions come from the same family, it can be seen that both prior and posterior predictive distributions also come from the same family of compound distributions. The only difference is that the posterior predictive distribution uses the updated values of the hyperparameters (applying the Bayesian update rules given in the conjugate prior article), while the prior predictive distribution uses the values of the hyperparameters that appear in the prior distribution.
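As a concrete instance of the posterior predictive distribution, the conjugate Beta–Bernoulli case can be written in a few lines; the prior hyperparameters and the data below are illustrative assumptions:

```python
def predictive_success(successes, failures, a=1.0, b=1.0):
    """Posterior predictive P(next observation is a success | X) under a
    Beta(a, b) prior and a Bernoulli likelihood. The conjugate update gives
    a Beta(a + s, b + f) posterior, whose mean is the predictive probability."""
    return (a + successes) / (a + b + successes + failures)

# After 7 successes and 3 failures under a uniform Beta(1, 1) prior, the
# predictive probability is 8/12 ~0.667, not the plug-in MLE 7/10 = 0.7:
print(predictive_success(7, 3))
```

The gap between 8/12 and 7/10 is exactly the parameter uncertainty that the plug-in approach discards.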
Inference over exclusive and exhaustive possibilities

If evidence is simultaneously used to update belief over a set of exclusive and exhaustive propositions, Bayesian inference may be thought of as acting on this belief distribution as a whole.

General formulation

[Figure: Diagram illustrating event space Ω in the general formulation of Bayesian inference. Although this diagram shows discrete models and events, the continuous case may be visualized similarly using probability densities.]

Suppose a process is generating independent and identically distributed events E_n, n = 1, 2, 3, …, but the probability distribution is unknown. Let the event space Ω represent the current state of belief for this process. Each model is represented by event M_m. The conditional probabilities P(E_n | M_m) are specified to define the models. P(M_m) is the degree of belief in M_m. Before the first inference step, {P(M_m)} is a set of initial prior probabilities. These must sum to 1, but are otherwise arbitrary.

Suppose that the process is observed to generate E ∈ {E_n}. For each M ∈ {M_m}, the prior P(M) is updated to the posterior P(M | E). From Bayes' theorem:[6]

P(M \mid E) = \frac{P(E \mid M)}{\sum_m P(E \mid M_m) P(M_m)} \cdot P(M)

Upon observation of further evidence, this procedure may be repeated.

[Figure: Venn diagram for the fundamental sets frequently used in Bayesian inference and computations.[5]]

Multiple observations

For a sequence of independent and identically distributed observations \mathbf{E} = (e_1, \dots, e_n), it can be shown by induction that repeated application of the above is equivalent to

P(M \mid \mathbf{E}) = \frac{P(\mathbf{E} \mid M)}{\sum_m P(\mathbf{E} \mid M_m) P(M_m)} \cdot P(M)

where

P(\mathbf{E} \mid M) = \prod_k P(e_k \mid M).
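A direct transcription of this update over a discrete set of models, applied sequentially to a stream of observations, might look as follows; the two coin models and the observation sequence are illustrative assumptions:

```python
def update(priors, likelihoods):
    """One Bayesian update over discrete models M_m.
    priors: P(M_m); likelihoods: P(E | M_m) for the observed event E."""
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    total = sum(joint)                    # sum_m P(E | M_m) P(M_m)
    return [j / total for j in joint]

# Two models of a coin: M1 fair (P(heads) = 0.5), M2 biased (P(heads) = 0.9).
beliefs = [0.5, 0.5]                      # initial priors, summing to 1
for outcome in "HHTH":                    # observe heads, heads, tails, heads
    lik = [0.5, 0.9] if outcome == "H" else [0.5, 0.1]
    beliefs = update(beliefs, lik)
print(beliefs)                            # ~[0.46, 0.54] after these data
```

Because the observations are independent given the model, updating one event at a time gives the same posterior as a single update with the product likelihood, as the induction above states.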
Parametric formulation

By parameterizing the space of models, the belief in all models may be updated in a single step. The distribution of belief over the model space may then be thought of as a distribution of belief over the parameter space. The distributions in this section are expressed as continuous, represented by probability densities, as this is the usual situation. The technique is, however, equally applicable to discrete distributions.

Let the vector θ span the parameter space. Let the initial prior distribution over θ be p(θ | α), where α is a set of parameters to the prior itself, or hyperparameters. Let \mathbf{E} = (e_1, \dots, e_n) be a sequence of independent and identically distributed event observations, where all e_i are distributed as p(e | θ) for some θ. Bayes' theorem is applied to find the posterior distribution over θ:

p(\theta \mid \mathbf{E}, \alpha) = \frac{p(\mathbf{E} \mid \theta, \alpha)}{p(\mathbf{E} \mid \alpha)} \cdot p(\theta \mid \alpha) = \frac{p(\mathbf{E} \mid \theta, \alpha)}{\int p(\mathbf{E} \mid \theta, \alpha) \, p(\theta \mid \alpha) \, d\theta} \cdot p(\theta \mid \alpha)

where

p(\mathbf{E} \mid \theta, \alpha) = \prod_k p(e_k \mid \theta)

Mathematical properties

Interpretation of factor

\frac{P(E \mid M)}{P(E)} > 1 \Rightarrow P(E \mid M) > P(E). That is, if the model were true, the evidence would be more likely than is predicted by the current state of belief. The reverse applies for a decrease in belief. If the belief does not change, \frac{P(E \mid M)}{P(E)} = 1 \Rightarrow P(E \mid M) = P(E). That is, the evidence is independent of the model. If the model were true, the evidence would be exactly as likely as predicted by the current state of belief.

Cromwell's rule

Main article: Cromwell's rule

If P(M) = 0, then P(M | E) = 0. If P(M) = 1, then P(M | E) = 1. This can be interpreted to mean that hard convictions are insensitive to counter-evidence.

The former follows directly from Bayes' theorem. The latter can be derived by applying the first rule to the event "not M" in place of "M", yielding "if 1 − P(M) = 0, then 1 − P(M | E) = 0", from which the result immediately follows.
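A short experiment makes Cromwell's rule tangible: a prior of exactly 0 or 1 never moves, while even a tiny positive prior eventually recovers under favourable evidence. The likelihoods and the number of observations are illustrative:

```python
def posterior(prior, lik_m, lik_not_m):
    # Bayes' rule for model M versus its complement
    return lik_m * prior / (lik_m * prior + lik_not_m * (1 - prior))

for p0 in (0.0, 1e-6, 1.0):
    p = p0
    for _ in range(20):                 # 20 observations, each favouring M 9:1
        p = posterior(p, lik_m=0.9, lik_not_m=0.1)
    print(p0, "->", p)                  # 0.0 stays 0.0, 1.0 stays 1.0;
                                        # 1e-6 climbs to nearly 1.0
```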
Asymptotic behaviour of posterior

Consider the behaviour of a belief distribution as it is updated a large number of times with independent and identically distributed trials. For sufficiently nice prior probabilities, the Bernstein–von Mises theorem gives that in the limit of infinite trials, the posterior converges to a Gaussian distribution independent of the initial prior, under some conditions first outlined and rigorously proven by Joseph L. Doob in 1948, namely if the random variable in consideration has a finite probability space. More general results were obtained later by the statistician David A. Freedman, who established in two seminal research papers in 1963[7] and 1965[8] when and under what circumstances the asymptotic behaviour of the posterior is guaranteed. His 1963 paper treats, like Doob (1949), the finite case and comes to a satisfactory conclusion. However, if the random variable has an infinite but countable probability space (i.e., corresponding to a die with infinitely many faces), the 1965 paper demonstrates that for a dense subset of priors the Bernstein–von Mises theorem is not applicable. In this case there is almost surely no asymptotic convergence. Later in the 1980s and 1990s, Freedman and Persi Diaconis continued to work on the case of infinite countable probability spaces.[9] To summarise, there may be insufficient trials to suppress the effects of the initial choice, and especially for large (but finite) systems the convergence might be very slow.

Conjugate priors

Main article: Conjugate prior

In parameterized form, the prior distribution is often assumed to come from a family of distributions called conjugate priors. The usefulness of a conjugate prior is that the corresponding posterior distribution will be in the same family, and the calculation may be expressed in closed form.

Estimates of parameters and predictions

It is often desired to use a posterior distribution to estimate a parameter or variable. Several methods of Bayesian estimation select measurements of central tendency from the posterior distribution.

For one-dimensional problems, a unique median exists for practical continuous problems. The posterior median is attractive as a robust estimator.[10]

If there exists a finite mean for the posterior distribution, then the posterior mean is a method of estimation:[11]

\tilde{\theta} = \operatorname{E}[\theta] = \int \theta \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta

Taking a value with the greatest probability defines maximum a posteriori (MAP) estimates:[12]

\{\theta_{\text{MAP}}\} \subset \arg\max_\theta p(\theta \mid \mathbf{X}, \alpha).

There are examples where no maximum is attained, in which case the set of MAP estimates is empty.

There are other methods of estimation that minimize the posterior risk (expected posterior loss) with respect to a loss function, and these are of interest to statistical decision theory using the sampling distribution ("frequentist statistics").[13]

The posterior predictive distribution of a new observation x̃ (that is independent of previous observations) is determined by[14]

p(\tilde{x} \mid \mathbf{X}, \alpha) = \int p(\tilde{x}, \theta \mid \mathbf{X}, \alpha) \, d\theta = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta.
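For a posterior available in closed form, these point estimates are one-liners. Here is a sketch for a Beta(8, 4) posterior (e.g. a uniform Beta(1, 1) prior updated with 7 successes and 3 failures); the numbers are illustrative assumptions:

```python
from scipy import stats

a, b = 8, 4
post = stats.beta(a, b)                  # closed-form posterior

posterior_mean = post.mean()             # E[theta] = a / (a + b) ~0.667
posterior_median = post.median()         # attractive as a robust estimator
map_estimate = (a - 1) / (a + b - 2)     # mode of Beta(a, b) for a, b > 1: 0.7

print(posterior_mean, posterior_median, map_estimate)
```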
Examples

Probability of a hypothesis

Contingency table

             Bowl #1 (H1)    Bowl #2 (H2)    Total
Plain, E     30              20              50
Choc, ¬E     10              20              30
Total        40              40              80

P(H1 | E) = 30/50 = 0.6

Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let H_1 correspond to bowl #1, and H_2 to bowl #2. It is given that the bowls are identical from Fred's point of view, thus P(H_1) = P(H_2), and the two must add up to 1, so both are equal to 0.5. The event E is the observation of a plain cookie. From the contents of the bowls, we know that P(E | H_1) = 30/40 = 0.75 and P(E | H_2) = 20/40 = 0.5. Bayes' formula then yields

P(H_1 \mid E) = \frac{P(E \mid H_1) \, P(H_1)}{P(E \mid H_1) \, P(H_1) + P(E \mid H_2) \, P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6

Before we observed the cookie, the probability we assigned for Fred having chosen bowl #1 was the prior probability, P(H_1), which was 0.5. After observing the cookie, we must revise the probability to P(H_1 | E), which is 0.6.
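The cookie calculation is easy to check in code; this is just the arithmetic above, with no assumptions beyond the example itself:

```python
priors = {"bowl1": 0.5, "bowl2": 0.5}             # Fred picks a bowl at random
lik_plain = {"bowl1": 30 / 40, "bowl2": 20 / 40}  # P(plain cookie | bowl)

evidence = sum(lik_plain[h] * priors[h] for h in priors)   # P(E) = 0.625
posterior = {h: lik_plain[h] * priors[h] / evidence for h in priors}
print(posterior["bowl1"])                          # 0.6, as derived above
```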
Making a prediction

[Figure: Example results for the archaeology example. This simulation was generated using c = 15.2.]

An archaeologist is working at a site thought to be from the medieval period, between the 11th and the 16th century. However, it is uncertain exactly when in this period the site was inhabited. Fragments of pottery are found, some of which are glazed and some of which are decorated. It is expected that if the site were inhabited during the early medieval period, then 1% of the pottery would be glazed and 50% of its area decorated, whereas if it had been inhabited in the late medieval period then 81% would be glazed and 5% of its area decorated. How confident can the archaeologist be in the date of inhabitation as fragments are unearthed?

The degree of belief in the continuous variable C (century) is to be calculated, with the discrete set of events \{GD, G\bar{D}, \bar{G}D, \bar{G}\bar{D}\} as evidence. Assuming linear variation of glaze and decoration with time, and that these variables are independent,

P(E = GD \mid C = c) = \left(0.01 + \frac{0.81 - 0.01}{16 - 11}(c - 11)\right)\left(0.5 - \frac{0.5 - 0.05}{16 - 11}(c - 11)\right)

P(E = G\bar{D} \mid C = c) = \left(0.01 + \frac{0.81 - 0.01}{16 - 11}(c - 11)\right)\left(0.5 + \frac{0.5 - 0.05}{16 - 11}(c - 11)\right)

P(E = \bar{G}D \mid C = c) = \left((1 - 0.01) - \frac{0.81 - 0.01}{16 - 11}(c - 11)\right)\left(0.5 - \frac{0.5 - 0.05}{16 - 11}(c - 11)\right)

P(E = \bar{G}\bar{D} \mid C = c) = \left((1 - 0.01) - \frac{0.81 - 0.01}{16 - 11}(c - 11)\right)\left(0.5 + \frac{0.5 - 0.05}{16 - 11}(c - 11)\right)

Assume a uniform prior of f_C(c) = 0.2, and that trials are independent and identically distributed. When a new fragment of type e is discovered, Bayes' theorem is applied to update the degree of belief for each c:

f_C(c \mid E = e) = \frac{P(E = e \mid C = c)}{P(E = e)} f_C(c) = \frac{P(E = e \mid C = c)}{\int_{11}^{16} P(E = e \mid C = c) \, f_C(c) \, dc} f_C(c)

A computer simulation of the changing belief as 50 fragments are unearthed is shown on the graph. In the simulation, the site was inhabited around 1420, or c = 15.2. By calculating the area under the relevant portion of the graph for 50 trials, the archaeologist can say that there is practically no chance the site was inhabited in the 11th and 12th centuries, about 1% chance that it was inhabited during the 13th century, 63% chance during the 14th century and 36% during the 15th century. The Bernstein–von Mises theorem asserts here the asymptotic convergence to the "true" distribution because the probability space corresponding to the discrete set of events \{GD, G\bar{D}, \bar{G}D, \bar{G}\bar{D}\} is finite (see the above section on the asymptotic behaviour of the posterior).
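Here is a sketch of the kind of simulation behind the figure: a grid posterior over the century c, updated fragment by fragment. The random seed, grid resolution, and simulated data are illustrative assumptions; only the likelihood model and the uniform prior come from the example:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.linspace(11, 16, 501)                 # grid over the century variable
glaze = 0.01 + (0.81 - 0.01) / 5 * (c - 11)  # P(glazed | c), linear in time
dec = 0.50 - (0.50 - 0.05) / 5 * (c - 11)    # P(decorated | c)

belief = np.full_like(c, 0.2)                # uniform prior density f_C(c)
true_g = np.interp(15.2, c, glaze)           # site actually from c = 15.2
true_d = np.interp(15.2, c, dec)
for _ in range(50):                          # unearth 50 simulated fragments
    g = rng.random() < true_g
    d = rng.random() < true_d
    lik = (glaze if g else 1 - glaze) * (dec if d else 1 - dec)
    belief = belief * lik                    # posterior ∝ likelihood × prior
    belief /= np.trapz(belief, c)            # renormalize to a density

mask = c >= 15                               # 15th century and later
print(np.trapz(belief[mask], c[mask]))       # posterior mass near c = 15.2
```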
In frequentist statistics and decision theory

A decision-theoretic justification of the use of Bayesian inference was given by Abraham Wald, who proved that every unique Bayesian procedure is admissible. Conversely, every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.[15]

Wald characterized admissible procedures as Bayesian procedures (and limits of Bayesian procedures), making the Bayesian formalism a central technique in such areas of frequentist inference as parameter estimation, hypothesis testing, and computing confidence intervals.[16][17][18] For example:

- "Under some conditions, all admissible procedures are either Bayes procedures or limits of Bayes procedures (in various senses). These remarkable results, at least in their original form, are due essentially to Wald. They are useful because the property of being Bayes is easier to analyze than admissibility."[15]
- "In decision theory, a quite general method for proving admissibility consists in exhibiting a procedure as a unique Bayes solution."[19]
- "In the first chapters of this work, prior distributions with finite support and the corresponding Bayes procedures were used to establish some of the main theorems relating to the comparison of experiments. Bayes procedures with respect to more general prior distributions have played a very important role in the development of statistics, including its asymptotic theory." "There are many problems where a glance at posterior distributions, for suitable priors, yields immediately interesting information. Also, this technique can hardly be avoided in sequential analysis."[20]
- "A useful fact is that any Bayes decision rule obtained by taking a proper prior over the whole parameter space must be admissible."[21]
- "An important area of investigation in the development of admissibility ideas has been that of conventional sampling-theory procedures, and many interesting results have been obtained."[22]

Model selection

Main article: Bayesian model selection
See also: Bayesian information criterion

Bayesian methodology also plays a role in model selection, where the aim is to select one model from a set of competing models that represents most closely the underlying process that generated the observed data. In Bayesian model comparison, the model with the highest posterior probability given the data is selected. The posterior probability of a model depends on the evidence, or marginal likelihood, which reflects the probability that the data is generated by the model, and on the prior belief of the model. When two competing models are a priori considered to be equiprobable, the ratio of their posterior probabilities corresponds to the Bayes factor. Since Bayesian model comparison is aimed at selecting the model with the highest posterior probability, this methodology is also referred to as the maximum a posteriori (MAP) selection rule[23] or the MAP probability rule.[24]
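For two fully specified models, the Bayes factor is just a ratio of marginal likelihoods. Below is a sketch with two fixed-parameter models of a coin; the data (65 heads in 100 flips) and the models themselves are illustrative assumptions:

```python
from scipy.stats import binom

heads, n = 65, 100
evidence_m1 = binom.pmf(heads, n, 0.5)    # marginal likelihood under M1
evidence_m2 = binom.pmf(heads, n, 0.7)    # marginal likelihood under M2

bayes_factor = evidence_m2 / evidence_m1  # roughly 50 here, favouring M2
print(bayes_factor)
# With equal prior model probabilities, the posterior odds equal this ratio.
```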
Probabilistic programming

Main article: Probabilistic programming

While conceptually simple, Bayesian methods can be mathematically and numerically challenging. Probabilistic programming languages (PPLs) implement functions to easily build Bayesian models together with efficient automatic inference methods. This helps separate the model building from the inference, allowing practitioners to focus on their specific problems and leaving PPLs to handle the computational details for them.[25][26][27]

Applications

Statistical data analysis

See the separate Wikipedia entry on Bayesian statistics, specifically the statistical modeling section of that page.

Computer applications

Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s.[28] There is also an ever-growing connection between Bayesian methods and simulation-based Monte Carlo techniques, since complex models cannot be processed in closed form by a Bayesian analysis, while a graphical model structure may allow for efficient simulation algorithms like Gibbs sampling and other Metropolis–Hastings algorithm schemes.[29] Bayesian inference has also gained popularity among the phylogenetics community for these reasons; a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously.

As applied to statistical classification, Bayesian inference has been used to develop algorithms for identifying e-mail spam. Applications which make use of Bayesian inference for spam filtering include CRM114, DSPAM, Bogofilter, SpamAssassin, SpamBayes, Mozilla, XEAMS, and others. Spam classification is treated in more detail in the article on the naïve Bayes classifier.

Solomonoff's inductive inference is the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution. It is a formal inductive framework that combines two well-studied principles of inductive inference: Bayesian statistics and Occam's razor.[30] Solomonoff's universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.[31][32]

Bioinformatics and healthcare applications

Bayesian inference has been applied in different bioinformatics applications, including differential gene expression analysis.[33] Bayesian inference is also used in a general cancer risk model, called CIRI (Continuous Individualized Risk Index), where serial measurements are incorporated to update a Bayesian model which is primarily built from prior knowledge.[34][35]

In the courtroom

Main article: Jurimetrics § Bayesian analysis of evidence

Bayesian inference can be used by jurors to coherently accumulate the evidence for and against a defendant, and to see whether, in totality, it meets their personal threshold for "beyond a reasonable doubt".[36][37][38] Bayes' theorem is applied successively to all evidence presented, with the posterior from one stage becoming the prior for the next. The benefit of a Bayesian approach is that it gives the juror an unbiased, rational mechanism for combining evidence. It may be appropriate to explain Bayes' theorem to jurors in odds form, as betting odds are more widely understood than probabilities. Alternatively, a logarithmic approach, replacing multiplication with addition, might be easier for a jury to handle.

[Figure: Adding up evidence.]

If the existence of the crime is not in doubt, only the identity of the culprit, it has been suggested that the prior should be uniform over the qualifying population.[39] For example, if 1,000 people could have committed the crime, the prior probability of guilt would be 1/1000.
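The odds and logarithmic formulations mentioned above can be sketched directly; the likelihood ratios below are invented for illustration, not taken from any case:

```python
import math

prior_odds = 1 / 999                  # prior P(guilt) = 1/1000, as above
log_odds = math.log(prior_odds)

# One likelihood ratio P(evidence | guilty) / P(evidence | innocent) per
# piece of evidence; multiplication of odds becomes addition of logs.
for lr in (1000, 50, 0.5):
    log_odds += math.log(lr)

posterior_odds = math.exp(log_odds)
print(posterior_odds / (1 + posterior_odds))   # posterior P(guilt) ~0.96
```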
The use of Bayes' theorem by jurors is controversial. In the United Kingdom, a defence expert witness explained Bayes' theorem to the jury in R v Adams. The jury convicted, but the case went to appeal on the basis that no means of accumulating evidence had been provided for jurors who did not wish to use Bayes' theorem. The Court of Appeal upheld the conviction, but it also gave the opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task."

Gardner-Medwin[40] argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent (akin to a frequentist p-value). He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial. Consider the following three propositions:

A. The known facts and testimony could have arisen if the defendant is guilty.
B. The known facts and testimony could have arisen if the defendant is innocent.
C. The defendant is guilty.

Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free. See also Lindley's paradox.

Bayesian epistemology

Bayesian epistemology is a movement that advocates for Bayesian inference as a means of justifying the rules of inductive logic.

Karl Popper and David Miller have rejected the idea of Bayesian rationalism, i.e. using Bayes' rule to make epistemological inferences:[41] it is prone to the same vicious circle as any other justificationist epistemology, because it presupposes what it attempts to justify. According to this view, a rational interpretation of Bayesian inference would see it merely as a probabilistic version of falsification, rejecting the belief, commonly held by Bayesians, that high likelihood achieved by a series of Bayesian updates would prove the hypothesis beyond any reasonable doubt, or even with likelihood greater than 0.

Other

- The scientific method is sometimes interpreted as an application of Bayesian inference. In this view, Bayes' rule guides (or should guide) the updating of probabilities about hypotheses conditional on new observations or experiments.[42] Bayesian inference has also been applied to treat stochastic scheduling problems with incomplete information by Cai et al. (2009).[43]
- Bayesian search theory is used to search for lost objects.
- Bayesian inference in phylogeny
- Bayesian tool for methylation analysis
- Bayesian approaches to brain function investigate the brain as a Bayesian mechanism.
- Bayesian inference in ecological studies[44][45]
- Bayesian inference is used to estimate parameters in stochastic chemical kinetic models[46]
- Bayesian inference in econophysics for currency or stock market prediction[47][48]
- Bayesian inference in marketing
- Bayesian inference in motor learning
- Bayesian inference is used in probabilistic numerics to solve numerical problems

Bayes and Bayesian inference

The problem considered by Bayes in Proposition 9 of his essay, "An Essay towards solving a Problem in the Doctrine of Chances", is the posterior distribution for the parameter a (the success rate) of the binomial distribution.

History

Main article: History of statistics § Bayesian statistics

The term Bayesian refers to Thomas Bayes (1701–1761), who proved that probabilistic limits could be placed on an unknown event. However, it was Pierre-Simon Laplace (1749–1827) who introduced (as Principle VI) what is now called Bayes' theorem and used it to address problems in celestial mechanics, medical statistics, reliability, and jurisprudence.[49] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes[50]). After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.[50]

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objective or "non-informative" current, the statistical analysis depends on only the model assumed, the data analyzed,[51] and the method assigning the prior, which differs from one objective Bayesian practitioner to another. In the subjective or "informative" current, the specification of the prior depends on the belief (that is, propositions on which the analysis is prepared to act), which can summarize information from experts, previous studies, etc.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications.[52] Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics.[53] Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.[54]

See also

- Bayes' theorem
- Bayesian Analysis, the journal of the ISBA
- Bayesian epistemology
- Bayesian hierarchical modeling
- Bayesian probability
- Bayesian regression
- Bayesian structural time series (BSTS)
- Richard James Boys (1960–2019), statistician known for contributions to Bayesian inference
- Inductive probability
- Information field theory
- International Society for Bayesian Analysis (ISBA)
- Jeffreys prior
- Monty Hall problem

References

Citations

1. Hacking, Ian (December 1967). "Slightly More Realistic Personal Probability". Philosophy of Science. 34 (4): 316. doi:10.1086/288169.
2. Hacking (1988, p. 124).
3. "Bayes' Theorem (Stanford Encyclopedia of Philosophy)". Plato.stanford.edu. Retrieved 2014-01-05.
4. van Fraassen, B. (1989). Laws and Symmetry. Oxford University Press. ISBN 0-19-824860-1.
5. Lee, Se Yoon (2021). "Gibbs sampler and coordinate ascent variational inference: A set-theoretical review". Communications in Statistics - Theory and Methods. 51 (6): 1549–1568. arXiv:2008.01006. doi:10.1080/03610926.2021.1921214.
6. Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). Bayesian Data Analysis (Third ed.). Chapman and Hall/CRC. ISBN 978-1-4398-4095-5.
7. Freedman, D. A. (1963). "On the asymptotic behavior of Bayes' estimates in the discrete case". The Annals of Mathematical Statistics. 34 (4): 1386–1403. doi:10.1214/aoms/1177703871. JSTOR 2238346.
8. Freedman, D. A. (1965). "On the asymptotic behavior of Bayes estimates in the discrete case II". The Annals of Mathematical Statistics. 36 (2): 454–456. doi:10.1214/aoms/1177700155. JSTOR 2238150.
9. Robins, James; Wasserman, Larry (2000). "Conditioning, likelihood, and coherence: A review of some foundational concepts". JASA. 95 (452): 1340–1346. doi:10.1080/01621459.2000.10474344.
10. Sen, Pranab K.; Keating, J. P.; Mason, R. L. (1993). Pitman's Measure of Closeness: A Comparison of Statistical Estimators. Philadelphia: SIAM.
11. Choudhuri, Nidhan; Ghosal, Subhashis; Roy, Anindya (2005). "Bayesian Methods for Function Estimation". Handbook of Statistics. Vol. 25. pp. 373–414. doi:10.1016/s0169-7161(05)25013-7. ISBN 9780444515391.
12. "Maximum A Posteriori (MAP) Estimation". www.probabilitycourse.com. Retrieved 2017-06-02.
13. Yu, Angela. "Introduction to Bayesian Decision Theory" (PDF). cogsci.ucsd.edu. Archived from the original (PDF) on 2013-02-28.
14. Hitchcock, David. "Posterior Predictive Distribution Stat Slide" (PDF). stat.sc.edu.
15. Bickel & Doksum (2001, p. 32).
16. Kiefer, J.; Schwartz, R. (1965). "Admissible Bayes Character of T2-, R2-, and Other Fully Invariant Tests for Multivariate Normal Problems". Annals of Mathematical Statistics. 36 (3): 747–770. doi:10.1214/aoms/1177700051.
17. Schwartz, R. (1969). "Invariant Proper Bayes Tests for Exponential Families". Annals of Mathematical Statistics. 40: 270–283. doi:10.1214/aoms/1177697822.
18. Hwang, J. T. & Casella, George (1982). "Minimax Confidence Sets for the Mean of a Multivariate Normal Distribution" (PDF). Annals of Statistics. 10 (3): 868–881. doi:10.1214/aos/1176345877.
19. Lehmann, Erich (1986). Testing Statistical Hypotheses (Second ed.). (See p. 309 of Chapter 6.7 "Admissibility", and pp. 17–18 of Chapter 1.8 "Complete Classes".)
20. Le Cam, Lucien (1986). Asymptotic Methods in Statistical Decision Theory. Springer-Verlag. ISBN 978-0-387-96307-5. (From "Chapter 12 Posterior Distributions and Bayes Solutions", p. 324.)
21. Cox, D. R.; Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall. p. 432. ISBN 978-0-04-121537-3.
22. Cox, D. R.; Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall. p. 433. ISBN 978-0-04-121537-3.
23. Stoica, P.; Selen, Y. (2004). "A review of information criterion rules". IEEE Signal Processing Magazine. 21 (4): 36–47. doi:10.1109/MSP.2004.1311138.
24. Fatermans, J.; Van Aert, S.; den Dekker, A. J. (2019). "The maximum a posteriori probability rule for atom column detection from HAADF STEM images". Ultramicroscopy. 201: 81–91. arXiv:1902.05809. doi:10.1016/j.ultramic.2019.02.003. PMID 30991277.
25. Bessiere, P.; Mazer, E.; Ahuactzin, J. M.; Mekhnacha, K. (2013). Bayesian Programming (1st ed.). Chapman and Hall/CRC.
26. Daniel Roy (2015). "Probabilistic Programming". probabilistic-programming.org. Archived from the original on 2016-01-10. Retrieved 2020-01-02.
27. Ghahramani, Z. (2015). "Probabilistic machine learning and artificial intelligence". Nature. 521 (7553): 452–459. doi:10.1038/nature14541. PMID 26017444.
28. Fienberg, Stephen E. (2006). "When did Bayesian inference become "Bayesian"?". Bayesian Analysis. 1 (1). doi:10.1214/06-BA101.
29. Jim Albert (2009). Bayesian Computation with R (Second ed.). New York, Dordrecht, etc.: Springer. ISBN 978-0-387-92297-3.
30. Rathmanner, Samuel; Hutter, Marcus; Ormerod, Thomas C. (2011). "A Philosophical Treatise of Universal Induction". Entropy. 13 (6): 1076–1136. arXiv:1105.5721. doi:10.3390/e13061076.
31. Hutter, Marcus; He, Yang-Hui; Ormerod, Thomas C. (2007). "On Universal Prediction and Bayesian Confirmation". Theoretical Computer Science. 384 (2007): 33–48. arXiv:0709.1516. doi:10.1016/j.tcs.2007.05.016.
32. Gács, Peter; Vitányi, Paul M. B. (2 December 2010). "Raymond J. Solomonoff 1926–2009". CiteSeerX 10.1.1.186.8268.
33. Robinson, Mark D.; McCarthy, Davis J.; Smyth, Gordon K. "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics.
34. "CIRI". ciri.stanford.edu. Retrieved 2019-08-11.
35. Kurtz, David M.; Esfahani, Mohammad S.; Scherer, Florian; Soo, Joanne; Jin, Michael C.; Liu, Chih Long; Newman, Aaron M.; Dührsen, Ulrich; Hüttmann, Andreas (2019-07-25). "Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction". Cell. 178 (3): 699–713.e19. doi:10.1016/j.cell.2019.06.011. PMC 7380118. PMID 31280963.
36. Dawid, A. P. and Mortera, J. (1996). "Coherent Analysis of Forensic Identification Evidence". Journal of the Royal Statistical Society, Series B, 58, 425–443.
37. Foreman, L. A.; Smith, A. F. M., and Evett, I. W. (1997). "Bayesian analysis of deoxyribonucleic acid profiling data in forensic identification applications (with discussion)". Journal of the Royal Statistical Society, Series A, 160, 429–469.
38. Robertson, B. and Vignaux, G. A. (1995). Interpreting Evidence: Evaluating Forensic Science in the Courtroom. John Wiley and Sons. Chichester. ISBN 978-0-471-96026-3.
39. Dawid, A. P. (2001). Bayes' Theorem and Weighing Evidence by Juries. Archived 2015-07-01 at the Wayback Machine.
40. Gardner-Medwin, A. (2005). "What Probability Should the Jury Address?". Significance, 2 (1), March 2005.
41. Miller, David (1994). Critical Rationalism. Chicago: Open Court. ISBN 978-0-8126-9197-9.
42. Howson & Urbach (2005), Jaynes (2003).
43. Cai, X. Q.; Wu, X. Y.; Zhou, X. (2009). "Stochastic scheduling subject to breakdown-repeat breakdowns with incomplete information". Operations Research. 57 (5): 1236–1249. doi:10.1287/opre.1080.0660.
44. Ogle, Kiona; Tucker, Colin; Cable, Jessica M. (2014). "Beyond simple linear mixing models: process-based isotope partitioning of ecological processes". Ecological Applications. 24 (1): 181–195. doi:10.1890/1051-0761-24.1.181. PMID 24640543.
45. Evaristo, Jaivime; McDonnell, Jeffrey J.; Scholl, Martha A.; Bruijnzeel, L. Adrian; Chun, Kwok P. (2016). "Insights into plant water uptake from xylem-water isotope measurements in two tropical catchments with contrasting moisture conditions". Hydrological Processes. 30 (18): 3210–3227. doi:10.1002/hyp.10841.
46. Gupta, Ankur; Rawlings, James B. (April 2014). "Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology". AIChE Journal. 60 (4): 1253–1268. doi:10.1002/aic.14409. PMC 4946376. PMID 27429455.
47. Fornalski, K. W. (2016). "The Tadpole Bayesian Model for Detecting Trend Changes in Financial Quotations" (PDF). R&R Journal of Statistics and Mathematical Sciences. 2 (1): 117–122.
48. Schütz, N.; Holschneider, M. (2011). "Detection of trend changes in time series using Bayesian inference". Physical Review E. 84 (2): 021120. arXiv:1104.3448. doi:10.1103/PhysRevE.84.021120. PMID 21928962.
49. Stigler, Stephen M. (1986). "Chapter 3". The History of Statistics. Harvard University Press. ISBN 9780674403406.
50. Fienberg, Stephen E. (2006). "When did Bayesian Inference Become 'Bayesian'?". Bayesian Analysis. 1 (1): 1–40 [p. 5]. doi:10.1214/06-ba101.
51. Bernardo, José-Miguel (2005). "Reference analysis". Handbook of Statistics. Vol. 25. pp. 17–90.
52. Wolpert, R. L. (2004). "A Conversation with James O. Berger". Statistical Science. 19 (1): 205–218. doi:10.1214/088342304000000053. MR 2082155.
53. Bernardo, José M. (2006). "A Bayesian mathematical statistics primer" (PDF). Icots-7.
54. Bishop, C. M. (2007). Pattern Recognition and Machine Learning. New York: Springer. ISBN 978-0387310732.

Sources

- Aster, Richard; Borchers, Brian; Thurber, Clifford (2012). Parameter Estimation and Inverse Problems (Second ed.). Elsevier. ISBN 978-0123850485.
- Bickel, Peter J. & Doksum, Kjell A. (2001). Mathematical Statistics, Volume 1: Basic and Selected Topics (Second ed., updated printing 2007). Pearson Prentice-Hall. ISBN 978-0-13-850363-5.
- Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Wiley. ISBN 0-471-57428-7.
- Edwards, Ward (1968). "Conservatism in Human Information Processing". In Kleinmuntz, B. (ed.). Formal Representation of Human Judgment. Wiley.
- Edwards, Ward (1982). "Conservatism in Human Information Processing (excerpted)". In Daniel Kahneman; Paul Slovic; Amos Tversky (eds.). Judgment under Uncertainty: Heuristics and Biases. Science. 185 (4157): 1124–1131. doi:10.1126/science.185.4157.1124. PMID 17835457.
- Jaynes, E. T. (2003). Probability Theory: The Logic of Science. CUP. ISBN 978-0-521-59271-0.
- Howson, C. & Urbach, P. (2005). Scientific Reasoning: The Bayesian Approach (3rd ed.). Open Court Publishing Company. ISBN 978-0-8126-9578-6.
- Phillips, L. D.; Edwards, Ward (October 2008). "Chapter 6: Conservatism in a Simple Probability Inference Task (Journal of Experimental Psychology (1966) 72: 346–354)". In Jie W. Weiss; David J. Weiss (eds.). A Science of Decision Making: The Legacy of Ward Edwards. Oxford University Press. p. 536. ISBN 978-0-19-532298-9.

Further reading

For a full report on the history of Bayesian statistics and the debates with frequentist approaches, read Vallverdu, Jordi (2016). Bayesians Versus Frequentists: A Philosophical Debate on Statistical Reasoning. New York: Springer. ISBN 978-3-662-48638-2.

Elementary

The following books are listed in ascending order of probabilistic sophistication:

- Stone, J. V. (2013). Bayes' Rule: A Tutorial Introduction to Bayesian Analysis. Sebtel Press, England.
- Dennis V. Lindley (2013). Understanding Uncertainty (Revised 2nd ed.). John Wiley. ISBN 978-1-118-65012-7.
- Colin Howson & Peter Urbach (2005). Scientific Reasoning: The Bayesian Approach (3rd ed.). Open Court Publishing Company. ISBN 978-0-8126-9578-6.
- Berry, Donald A. (1996). Statistics: A Bayesian Perspective. Duxbury. ISBN 978-0-534-23476-8.
- Morris H. DeGroot & Mark J. Schervish (2002). Probability and Statistics (Third ed.). Addison-Wesley. ISBN 978-0-201-52488-8.
- Bolstad, William M. (2007). Introduction to Bayesian Statistics (Second ed.). John Wiley. ISBN 0-471-27020-2.
- Winkler, Robert L. (2003). Introduction to Bayesian Inference and Decision (2nd ed.). Probabilistic. ISBN 978-0-9647938-4-2. Updated classic textbook. Bayesian theory clearly presented.
- Lee, Peter M. (2012). Bayesian Statistics: An Introduction (Fourth ed.). John Wiley. ISBN 978-1-1183-3257-3.
- Carlin, Bradley P. & Louis, Thomas A. (2008). Bayesian Methods for Data Analysis (Third ed.). Boca Raton, FL: Chapman and Hall/CRC. ISBN 978-1-58488-697-6.
- Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). Bayesian Data Analysis (Third ed.). Chapman and Hall/CRC. ISBN 978-1-4398-4095-5.

Intermediate or advanced

- Berger, James O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics (Second ed.). Springer-Verlag. ISBN 978-0-387-96098-2.
- Bernardo, José M.; Smith, Adrian F. M. (1994). Bayesian Theory. Wiley.
- DeGroot, Morris H. (2004). Optimal Statistical Decisions. Wiley Classics Library. (Originally published 1970 by McGraw-Hill.) ISBN 0-471-68029-X.
- Schervish, Mark J. (1995). Theory of Statistics. Springer-Verlag. ISBN 978-0-387-94546-0.
- Jaynes, E. T. (1998). Probability Theory: The Logic of Science.
- O'Hagan, A. and Forster, J. (2003). Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. Arnold, New York. ISBN 0-340-52922-9.
- Robert, Christian P. (2001). The Bayesian Choice – A Decision-Theoretic Motivation (Second ed.). Springer. ISBN 978-0-387-94296-4.
- Glenn Shafer and Pearl, Judea, eds. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
- Pierre Bessière et al. (2013). Bayesian Programming. CRC Press. ISBN 9781439880326.
- Francisco J. Samaniego (2010). A Comparison of the Bayesian and Frequentist Approaches to Estimation. Springer, New York. ISBN 978-1-4419-5940-9.

External links

- "Bayesian approach to statistical problems", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
- Bayesian Statistics from Scholarpedia
- Introduction to Bayesian probability from Queen Mary University of London
- Mathematical Notes on Bayesian Statistics and Markov Chain Monte Carlo
- Bayesian reading list, categorized and annotated by Tom Griffiths
- A. Hajek and S. Hartmann: Bayesian Epistemology, in: J. Dancy et al. (eds.), A Companion to Epistemology. Oxford: Blackwell 2010, 93–106.
- S. Hartmann and J. Sprenger: Bayesian Epistemology, in: S. Bernecker and D. Pritchard (eds.), Routledge Companion to Epistemology. London: Routledge 2010, 609–620.
- Stanford Encyclopedia of Philosophy: "Inductive Logic"
- Bayesian Confirmation Theory
- What Is Bayesian Learning?


