A Gentle Introduction to Bayes Theorem for Machine Learning


By Jason Brownlee on October 4, 2019 in Probability. Last updated on December 4, 2019.

Bayes Theorem provides a principled way for calculating a conditional probability.

It is a deceptively simple calculation, although it can be used to easily calculate the conditional probability of events where intuition often fails.

Although it is a powerful tool in the field of probability, Bayes Theorem is also widely used in the field of machine learning. This includes its use in a probability framework for fitting a model to a training dataset, referred to as maximum a posteriori or MAP for short, and in developing models for classification predictive modeling problems such as the Bayes Optimal Classifier and Naive Bayes.

In this post, you will discover Bayes Theorem for calculating conditional probabilities and how it is used in machine learning.

After reading this post, you will know:

What Bayes Theorem is and how to work through the calculation on a real scenario.
What the terms in the Bayes theorem calculation mean and the intuitions behind them.
Examples of how Bayes theorem is used in classifiers, optimization and causal models.

Let's get started.

Update Oct/2019: Join the discussion about this tutorial on HackerNews.
Update Oct/2019: Expanded to add more examples and uses of Bayes Theorem.

Photo by Marco Verch, some rights reserved.

Overview

This tutorial is divided into six parts; they are:

1. Bayes Theorem of Conditional Probability
2. Naming the Terms in the Theorem
3. Worked Example for Calculating Bayes Theorem
   - Diagnostic Test Scenario
   - Manual Calculation
   - Python Code Calculation
   - Binary Classifier Terminology
4. Bayes Theorem for Modeling Hypotheses
5. Bayes Theorem for Classification
   - Naive Bayes Classifier
   - Bayes Optimal Classifier
6. More Uses of Bayes Theorem in Machine Learning
   - Bayesian Optimization
   - Bayesian Belief Networks

Bayes Theorem of Conditional Probability

Before we dive into Bayes theorem, let's review marginal, joint, and conditional probability.

Recall that marginal probability is the probability of an event, irrespective of other random variables. If the random variable is independent, then it is the probability of the event directly; otherwise, if the variable is dependent upon other variables, then the marginal probability is the probability of the event summed over all outcomes for the dependent variables, called the sum rule.

Marginal Probability: The probability of an event irrespective of the outcomes of other random variables, e.g. P(A).

The joint probability is the probability of two (or more) simultaneous events, often described in terms of events A and B from two dependent random variables, e.g. X and Y. The joint probability is often summarized as just the outcomes, e.g. A and B.

Joint Probability: Probability of two (or more) simultaneous events, e.g. P(A and B) or P(A, B).

The conditional probability is the probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables, e.g. X and Y.

Conditional Probability: Probability of one (or more) event given the occurrence of another event, e.g. P(A given B) or P(A | B).
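To make these definitions concrete, here is a minimal sketch in plain Python that works through a tiny, made-up joint distribution over two binary events A and B. The numbers are invented purely for illustration; the sketch recovers the marginals with the sum rule and previews the conditional probability relationship that is formalized next.

# marginal, joint, and conditional probability from a tiny made-up joint distribution
# the probabilities below are illustrative only
p_joint = {
    ('A', 'B'): 0.20,
    ('A', 'notB'): 0.10,
    ('notA', 'B'): 0.25,
    ('notA', 'notB'): 0.45,
}
# marginal probability via the sum rule: P(A) = sum over outcomes of B of P(A, B)
p_a = p_joint[('A', 'B')] + p_joint[('A', 'notB')]
# marginal probability of B
p_b = p_joint[('A', 'B')] + p_joint[('notA', 'B')]
# conditional probability from the joint: P(A | B) = P(A, B) / P(B)
p_a_given_b = p_joint[('A', 'B')] / p_b
print('P(A)   = %.2f' % p_a)          # 0.30
print('P(B)   = %.2f' % p_b)          # 0.45
print('P(A|B) = %.3f' % p_a_given_b)  # 0.444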
The joint probability can be calculated using the conditional probability; for example:

P(A, B) = P(A | B) * P(B)

This is called the product rule. Importantly, the joint probability is symmetrical, meaning that:

P(A, B) = P(B, A)

The conditional probability can be calculated using the joint probability; for example:

P(A | B) = P(A, B) / P(B)

The conditional probability is not symmetrical; for example:

P(A | B) != P(B | A)

We are now up to speed with marginal, joint and conditional probability. If you would like more background on these fundamentals, see the tutorial:

A Gentle Introduction to Joint, Marginal, and Conditional Probability

An Alternate Way To Calculate Conditional Probability

Now, there is another way to calculate the conditional probability.

Specifically, one conditional probability can be calculated using the other conditional probability; for example:

P(A | B) = P(B | A) * P(A) / P(B)

The reverse is also true; for example:

P(B | A) = P(A | B) * P(B) / P(A)

This alternate approach of calculating the conditional probability is useful either when the joint probability is challenging to calculate (which is most of the time), or when the reverse conditional probability is available or easy to calculate.

This alternate calculation of the conditional probability is referred to as Bayes Rule or Bayes Theorem, named for Reverend Thomas Bayes, who is credited with first describing it. It is grammatically correct to refer to it as Bayes' Theorem (with the apostrophe), but it is common to omit the apostrophe for simplicity.

Bayes Theorem: Principled way of calculating a conditional probability without the joint probability.

It is often the case that we do not have access to the denominator directly, e.g. P(B).

We can calculate it an alternative way; for example:

P(B) = P(B | A) * P(A) + P(B | notA) * P(notA)

This gives a formulation of Bayes Theorem that uses this alternate calculation of P(B) in the denominator:

P(A | B) = P(B | A) * P(A) / (P(B | A) * P(A) + P(B | notA) * P(notA))

Note: the denominator is simply the expansion we gave above.

As such, if we have P(A), then we can calculate P(notA) as its complement; for example:

P(notA) = 1 - P(A)

Additionally, if we have P(notB | notA), then we can calculate P(B | notA) as its complement; for example:

P(B | notA) = 1 - P(notB | notA)

Now that we are familiar with the calculation of Bayes Theorem, let's take a closer look at the meaning of the terms in the equation.

Naming the Terms in the Theorem

The terms in the Bayes Theorem equation are given names depending on the context in which the equation is used.

It can be helpful to think about the calculation from these different perspectives to help map your problem onto the equation.

Firstly, in general, the result P(A | B) is referred to as the posterior probability and P(A) is referred to as the prior probability.

P(A | B): Posterior probability.
P(A): Prior probability.

Sometimes P(B | A) is referred to as the likelihood and P(B) is referred to as the evidence.

P(B | A): Likelihood.
P(B): Evidence.

This allows Bayes Theorem to be restated as:

Posterior = Likelihood * Prior / Evidence

We can make this clear with a smoke and fire case.

What is the probability that there is fire given that there is smoke?

Where P(Fire) is the Prior, P(Smoke | Fire) is the Likelihood, and P(Smoke) is the Evidence:

P(Fire | Smoke) = P(Smoke | Fire) * P(Fire) / P(Smoke)

You can imagine the same situation with rain and clouds.
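As a quick numeric check of this restatement, the short sketch below plugs made-up numbers for the smoke-and-fire case into Posterior = Likelihood * Prior / Evidence, computing the evidence P(Smoke) with the expansion of the denominator given above. The probabilities are assumptions for illustration only and are separate from the worked example that follows.

# Bayes Theorem restated as Posterior = Likelihood * Prior / Evidence
# made-up smoke-and-fire probabilities, for illustration only
p_fire = 0.01                  # prior P(Fire)
p_smoke_given_fire = 0.90      # likelihood P(Smoke | Fire)
p_smoke_given_no_fire = 0.05   # P(Smoke | notFire), e.g. barbecues or fog
# evidence P(Smoke) via the expansion of the denominator
p_smoke = p_smoke_given_fire * p_fire + p_smoke_given_no_fire * (1 - p_fire)
# posterior P(Fire | Smoke)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print('P(Smoke)        = %.4f' % p_smoke)              # 0.0585
print('P(Fire | Smoke) = %.3f' % p_fire_given_smoke)   # 0.154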
Now that we are familiar with Bayes Theorem and the meaning of the terms, let's look at a scenario where we can calculate it.

Worked Example for Calculating Bayes Theorem

Bayes theorem is best understood with a real-life worked example with real numbers to demonstrate the calculations.

First we will define a scenario, then work through a manual calculation, a calculation in Python, and a calculation using the terms that may be familiar to you from the field of binary classification. They are:

Diagnostic Test Scenario
Manual Calculation
Python Code Calculation
Binary Classifier Terminology

Let's go.

Diagnostic Test Scenario

An excellent and widely used example of the benefit of Bayes Theorem is in the analysis of a medical diagnostic test.

Scenario: Consider a human population that may or may not have cancer (Cancer is True or False) and a medical test that returns positive or negative for detecting cancer (Test is Positive or Negative), e.g. like a mammogram for detecting breast cancer.

Problem: If a randomly selected patient has the test and it comes back positive, what is the probability that the patient has cancer?

Manual Calculation

Medical diagnostic tests are not perfect; they have error.

Sometimes a patient will have cancer, but the test will not detect it. This capability of the test to detect cancer is referred to as the sensitivity, or the true positive rate.

In this case, we will contrive a sensitivity value for the test. The test is good, but not great, with a true positive rate or sensitivity of 85%. That is, of all the people who have cancer and are tested, 85% of them will get a positive result from the test.

P(Test=Positive | Cancer=True) = 0.85

Given this information, our intuition would suggest that there is an 85% probability that the patient has cancer.

Our intuitions of probability are wrong.

This type of error in interpreting probabilities is so common that it has its own name; it is referred to as the base rate fallacy.

It has this name because the error in estimating the probability of an event is caused by ignoring the base rate. That is, it ignores the probability of a randomly selected person having cancer, regardless of the results of a diagnostic test.

In this case, we can assume the probability of breast cancer is low, and use a contrived base rate value of one person in 5,000, or 0.0002 (0.02%).

P(Cancer=True) = 0.02%

We can correctly calculate the probability of a patient having cancer given a positive test result using Bayes Theorem.

Let's map our scenario onto the equation:

P(A | B) = P(B | A) * P(A) / P(B)
P(Cancer=True | Test=Positive) = P(Test=Positive | Cancer=True) * P(Cancer=True) / P(Test=Positive)

We know the probability of the test being positive given that the patient has cancer is 85%, and we know the base rate or the prior probability of a given patient having cancer is 0.02%; we can plug these values in:

P(Cancer=True | Test=Positive) = 0.85 * 0.0002 / P(Test=Positive)

We don't know P(Test=Positive); it is not given directly.

Instead, we can estimate it using:

P(B) = P(B | A) * P(A) + P(B | notA) * P(notA)
P(Test=Positive) = P(Test=Positive | Cancer=True) * P(Cancer=True) + P(Test=Positive | Cancer=False) * P(Cancer=False)

Firstly, we can calculate P(Cancer=False) as the complement of P(Cancer=True), which we already know:

P(Cancer=False) = 1 - P(Cancer=True)
= 1 - 0.0002
= 0.9998

We can plug in our known values as follows:

P(Test=Positive) = 0.85 * 0.0002 + P(Test=Positive | Cancer=False) * 0.9998

We still do not know the probability of a positive test result given no cancer.

This requires additional information.

Specifically, we need to know how good the test is at correctly identifying people that do not have cancer. That is, returning a negative result (Test=Negative) when the patient does not have cancer (Cancer=False), called the true negative rate or the specificity.

We will use a contrived specificity value of 95%.

P(Test=Negative | Cancer=False) = 0.95

With this final piece of information, we can calculate the false positive or false alarm rate as the complement of the true negative rate.
P(Test=Positive | Cancer=False) = 1 - P(Test=Negative | Cancer=False)
= 1 - 0.95
= 0.05

We can plug this false alarm rate into our calculation of P(Test=Positive) as follows:

P(Test=Positive) = 0.85 * 0.0002 + 0.05 * 0.9998
P(Test=Positive) = 0.00017 + 0.04999
P(Test=Positive) = 0.05016

Excellent, so the probability of the test returning a positive result, regardless of whether the person has cancer or not, is about 5%.

We now have enough information to calculate Bayes Theorem and estimate the probability of a randomly selected person having cancer if they get a positive test result.

P(Cancer=True | Test=Positive) = P(Test=Positive | Cancer=True) * P(Cancer=True) / P(Test=Positive)
P(Cancer=True | Test=Positive) = 0.85 * 0.0002 / 0.05016
P(Cancer=True | Test=Positive) = 0.00017 / 0.05016
P(Cancer=True | Test=Positive) = 0.003389154704944

The calculation suggests that if the patient is informed they have cancer with this test, then there is only a 0.339% chance that they have cancer.

It is a terrible diagnostic test!

The example also shows that the calculation of the conditional probability requires enough information.

For example, if we have the values used in Bayes Theorem already, we can use them directly.

This is rarely the case, and we typically have to calculate the bits we need and plug them in, as we did here. In our scenario we were given three pieces of information: the base rate, the sensitivity (or true positive rate), and the specificity (or true negative rate).

Sensitivity: 85% of people with cancer will get a positive test result.
Base Rate: 0.02% of people have cancer.
Specificity: 95% of people without cancer will get a negative test result.

We did not have P(Test=Positive), but we calculated it given what we already had available.

We might imagine that Bayes Theorem allows us to be even more precise about a given scenario. For example, if we had more information about the patient (e.g. their age) and about the domain (e.g. cancer rates for age ranges), we could offer an even more accurate probability estimate.

That was a lot of work.

Let's look at how we can calculate this exact scenario using a few lines of Python code.

Python Code Calculation

To make this example concrete, we can perform the calculation in Python.

The example below performs the same calculation in vanilla Python (no libraries), allowing you to play with the parameters and test different scenarios.

# calculate the probability of cancer given a positive diagnostic test

# calculate P(A|B) given P(A), P(B|A), P(B|notA)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(notA)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b

# P(A)
p_a = 0.0002
# P(B|A)
p_b_given_a = 0.85
# P(B|notA)
p_b_given_not_a = 0.05
# calculate P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# summarize
print('P(A|B) = %.3f%%' % (result * 100))

Running the example calculates the probability that a patient has cancer given that the test returns a positive result, matching our manual calculation.

P(A|B) = 0.339%

This is a helpful little script that you may want to adapt to new scenarios.
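For example, a couple of hypothetical variations of the same script are sketched below: one assumes a far more specific test (a false positive rate of 0.1%) and the other assumes a higher-risk group with a 1% base rate. Both sets of numbers are invented for illustration.

# adapt the same calculation to hypothetical alternative scenarios
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    return (p_b_given_a * p_a) / p_b

# scenario 1: same sensitivity, assumed near-perfect specificity of 99.9% (FPR = 0.001)
print('P(A|B) = %.3f%%' % (bayes_theorem(0.0002, 0.85, 0.001) * 100))  # about 14.5%
# scenario 2: same test, assumed higher-risk group with a 1% base rate
print('P(A|B) = %.3f%%' % (bayes_theorem(0.01, 0.85, 0.05) * 100))     # about 14.7%

Even under these made-up improvements the posterior stays well under 50%, which is exactly the point of the base rate fallacy.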
Now, it is common to describe the calculation of Bayes Theorem for a scenario using the terms from binary classification. It provides a very intuitive way of thinking about a problem. In the next section we will review these terms and see how they map onto the probabilities in the theorem and how they relate to our scenario.

Binary Classifier Terminology

It may be helpful to think about the cancer test example in terms of the common terms from binary (two-class) classification, i.e. where the notions of specificity and sensitivity come from.

Personally, I find these terms help everything make sense.

Firstly, let's define a confusion matrix:

                    | Positive Class       | Negative Class
Positive Prediction | True Positive (TP)   | False Positive (FP)
Negative Prediction | False Negative (FN)  | True Negative (TN)

We can then define some rates from the confusion matrix:

True Positive Rate (TPR) = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
True Negative Rate (TNR) = TN / (TN + FP)
False Negative Rate (FNR) = FN / (FN + TP)

These terms are called rates, but they can also be interpreted as probabilities.

Also, it might help to notice that:

TPR + FNR = 1.0, or:
FNR = 1.0 - TPR
TPR = 1.0 - FNR

TNR + FPR = 1.0, or:
TNR = 1.0 - FPR
FPR = 1.0 - TNR

Recall that in a previous section we calculated the false positive rate as the complement of the true negative rate, or FPR = 1.0 - TNR.

Some of these rates have special names, for example:

Sensitivity = TPR
Specificity = TNR

We can map these rates onto familiar terms from Bayes Theorem:

P(B | A): True Positive Rate (TPR).
P(notB | notA): True Negative Rate (TNR).
P(B | notA): False Positive Rate (FPR).
P(notB | A): False Negative Rate (FNR).

We can also map the base rates for the condition (class) and the treatment (prediction) onto familiar terms from Bayes Theorem:

P(A): Probability of a Positive Class (PC).
P(notA): Probability of a Negative Class (NC).
P(B): Probability of a Positive Prediction (PP).
P(notB): Probability of a Negative Prediction (NP).

Now, let's consider Bayes Theorem using these terms:

P(A | B) = P(B | A) * P(A) / P(B)
P(A | B) = (TPR * PC) / PP

Where we often cannot calculate P(B), so we use an alternative:

P(B) = P(B | A) * P(A) + P(B | notA) * P(notA)
P(B) = TPR * PC + FPR * NC

Now, let's look at our scenario of cancer and a cancer detection test.

The class or condition would be "Cancer" and the treatment or prediction would be the "Test".

First, let's review all of the rates:

True Positive Rate (TPR): 85%
False Positive Rate (FPR): 5%
True Negative Rate (TNR): 95%
False Negative Rate (FNR): 15%

Let's also review what we know about base rates:

Positive Class (PC): 0.02%
Negative Class (NC): 99.98%
Positive Prediction (PP): 5.016%
Negative Prediction (NP): 94.984%

Plugging things in, we can calculate the probability of a positive test result (a positive prediction) as the probability of a positive test result given cancer (the true positive rate) multiplied by the base rate for having cancer (the positive class), plus the probability of a positive test result given no cancer (the false positive rate) multiplied by the probability of not having cancer (the negative class).

The calculation with these terms is as follows:

P(B) = P(B | A) * P(A) + P(B | notA) * P(notA)
P(B) = TPR * PC + FPR * NC
P(B) = 85% * 0.02% + 5% * 99.98%
P(B) = 5.016%

We can then calculate Bayes Theorem for the scenario, namely the probability of cancer given a positive test result (the posterior) is the probability of a positive test result given cancer (the true positive rate) multiplied by the probability of having cancer (the positive class rate), divided by the probability of a positive test result (a positive prediction).
The calculation with these terms is as follows:

P(A | B) = P(B | A) * P(A) / P(B)
P(A | B) = TPR * PC / PP
P(A | B) = 85% * 0.02% / 5.016%
P(A | B) = 0.339%

It turns out that in this case, the posterior probability that we are calculating with Bayes theorem is equivalent to the precision, also called the Positive Predictive Value (PPV), of the confusion matrix:

PPV = TP / (TP + FP)

Or, stated in our classifier terms:

P(A | B) = PPV
PPV = TPR * PC / PP

So why do we go to all of the trouble of calculating the posterior probability?

Because we don't have a confusion matrix for a population of people, with and without cancer, who have and have not been tested. Instead, all we have is some priors and probabilities about our population and our test.

This highlights when we might choose to use the calculation in practice.

Specifically, when we have beliefs about the events involved, but we cannot perform the calculation by counting examples in the real world.

Bayes Theorem for Modeling Hypotheses

Bayes Theorem is a useful tool in applied machine learning.

It provides a way of thinking about the relationship between data and a model.

A machine learning algorithm or model is a specific way of thinking about the structured relationships in the data. In this way, a model can be thought of as a hypothesis about the relationships in the data, such as the relationship between input (X) and output (y). The practice of applied machine learning is the testing and analysis of different hypotheses (models) on a given dataset.

If this idea of thinking of a model as a hypothesis is new to you, see this tutorial on the topic:

What is a Hypothesis in Machine Learning?

Bayes Theorem provides a probabilistic model to describe the relationship between data (D) and a hypothesis (h); for example:

P(h | D) = P(D | h) * P(h) / P(D)

Breaking this down, it says that the probability of a given hypothesis holding or being true given some observed data can be calculated as the probability of observing the data given the hypothesis multiplied by the probability of the hypothesis being true regardless of the data, divided by the probability of observing the data regardless of the hypothesis.

Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
— Page 156, Machine Learning, 1997.

Under this framework, each piece of the calculation has a specific name; for example:

P(h | D): Posterior probability of the hypothesis (the thing we want to calculate).
P(h): Prior probability of the hypothesis.

This gives a useful framework for thinking about and modeling a machine learning problem.

If we have some prior domain knowledge about the hypothesis, this is captured in the prior probability. If we don't, then all hypotheses may have the same prior probability.

If the probability of observing the data P(D) increases, then the probability of the hypothesis holding given the data P(h | D) decreases. Conversely, if the probability of the hypothesis P(h) and the probability of observing the data given the hypothesis increase, the probability of the hypothesis holding given the data P(h | D) increases.

The notion of testing different models on a dataset in applied machine learning can be thought of as estimating the probability of each hypothesis (h1, h2, h3, ... in H) being true given the observed data.

The optimization or search for the hypothesis with the maximum posterior probability in modeling is called maximum a posteriori, or MAP for short.

Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis. We can determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis.
— Page 157, Machine Learning, 1997.
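A minimal sketch of this idea, under assumed values: a small, made-up set of candidate hypotheses, each with a prior P(h) and a likelihood P(D|h), from which the posterior of each hypothesis is computed and the MAP hypothesis is selected. The hypotheses and probabilities are hypothetical, not taken from the book.

# calculate the posterior for each candidate hypothesis and select the MAP hypothesis
# each entry maps a hypothesis name to (prior P(h), likelihood P(D|h)); numbers are illustrative
hypotheses = {
    'h1': (0.2, 0.10),
    'h2': (0.5, 0.05),
    'h3': (0.3, 0.20),
}
# P(D) = sum over h of P(D|h) * P(h)
p_d = sum(likelihood * prior for prior, likelihood in hypotheses.values())
# posterior for each hypothesis: P(h|D) = P(D|h) * P(h) / P(D)
for h, (prior, likelihood) in hypotheses.items():
    print('P(%s|D) = %.3f' % (h, likelihood * prior / p_d))
# the MAP hypothesis has the largest posterior (h3 in this contrived example)
map_h = max(hypotheses, key=lambda h: hypotheses[h][1] * hypotheses[h][0])
print('MAP hypothesis: %s' % map_h)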
Under this framework, the probability of the data (D) is constant as it is used in the assessment of each hypothesis. Therefore, it can be removed from the calculation to give the simplified unnormalized estimate as follows:

max h in H: P(h | D) = P(D | h) * P(h)

If we do not have any prior information about the hypotheses being tested, they can be assigned a uniform probability, and this term too will be a constant and can be removed from the calculation to give the following:

max h in H: P(h | D) = P(D | h)

That is, the goal is to locate a hypothesis that best explains the observed data.

Fitting models like linear regression for predicting a numerical value, and logistic regression for binary classification, can be framed and solved under the MAP probabilistic framework. This provides an alternative to the more common maximum likelihood estimation (MLE) framework.

Bayes Theorem for Classification

Classification is a predictive modeling problem that involves assigning a label to a given input data sample.

The problem of classification predictive modeling can be framed as calculating the conditional probability of a class label given a data sample, for example:

P(class | data) = (P(data | class) * P(class)) / P(data)

Where P(class | data) is the probability of the class given the provided data.

This calculation can be performed for each class in the problem, and the class that is assigned the largest probability can be selected and assigned to the input data.

In practice, it is very challenging to calculate the full Bayes Theorem for classification.

The priors for the class and the data are easy to estimate from a training dataset, if the dataset is suitably representative of the broader problem.

The conditional probability of the observation based on the class, P(data | class), is not feasible to estimate unless the number of examples is extraordinarily large, e.g. large enough to effectively estimate the probability distribution for all different possible combinations of values. This is almost never the case; we will not have sufficient coverage of the domain.

As such, the direct application of Bayes Theorem becomes intractable, especially as the number of variables or features (n) increases.

Naive Bayes Classifier

The solution to using Bayes Theorem for a conditional probability classification model is to simplify the calculation.

The Bayes Theorem assumes that each input variable is dependent upon all other variables. This is a cause of complexity in the calculation. We can remove this assumption and consider each input variable as being independent from each other.

This changes the model from a dependent conditional probability model to an independent conditional probability model and dramatically simplifies the calculation.

This means that we calculate P(data | class) for each input variable separately and multiply the results together, for example:

P(class | X1, X2, ..., Xn) = P(X1 | class) * P(X2 | class) * ... * P(Xn | class) * P(class) / P(data)

We can also drop the probability of observing the data as it is a constant for all calculations, for example:

P(class | X1, X2, ..., Xn) = P(X1 | class) * P(X2 | class) * ... * P(Xn | class) * P(class)

This simplification of Bayes Theorem is common and widely used for classification predictive modeling problems and is generally referred to as Naive Bayes.

The word "naive" is French and typically has a diaeresis over the "i", which is commonly left out for simplicity, and "Bayes" is capitalized as it is named for Reverend Thomas Bayes.

For tutorials on how to implement Naive Bayes from scratch in Python, see:

How to Develop a Naive Bayes Classifier from Scratch in Python
Naive Bayes Classifier From Scratch in Python
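The sketch below illustrates this independence simplification on a toy, made-up problem with two binary input variables. The class priors and per-variable conditional probabilities are assumed values, and the class with the largest unnormalized score is selected; it is only an illustration of the calculation, not a substitute for the from-scratch tutorials linked above.

# naive bayes scoring on a tiny made-up example with two binary input variables
# assumed class priors P(class)
priors = {'class0': 0.6, 'class1': 0.4}
# assumed per-class conditional probabilities P(Xi=1 | class) for each input variable
likelihoods = {
    'class0': [0.3, 0.8],   # P(X1=1|class0), P(X2=1|class0)
    'class1': [0.9, 0.4],   # P(X1=1|class1), P(X2=1|class1)
}

def naive_bayes_score(x, label):
    # unnormalized score: P(class) * product of P(Xi|class), assuming independent inputs
    score = priors[label]
    for value, p_one in zip(x, likelihoods[label]):
        score *= p_one if value == 1 else (1 - p_one)
    return score

# classify a new sample x = [X1=1, X2=0] by picking the class with the largest score
x = [1, 0]
scores = {label: naive_bayes_score(x, label) for label in priors}
print(scores)                       # approximately {'class0': 0.036, 'class1': 0.216}
print(max(scores, key=scores.get))  # class1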
Bayes Optimal Classifier

The Bayes optimal classifier is a probabilistic model that makes the most likely prediction for a new example, given the training dataset.

This model is also referred to as the Bayes optimal learner, the Bayes classifier, the Bayes optimal decision boundary, or the Bayes optimal discriminant function.

Bayes Classifier: Probabilistic model that makes the most probable prediction for new examples.

Specifically, the Bayes optimal classifier answers the question:

What is the most probable classification of the new instance given the training data?

This is different from the MAP framework that seeks the most probable hypothesis (model). Instead, we are interested in making a specific prediction.

The equation below demonstrates how to calculate the conditional probability for a new instance (vj) given the training data (D) and a space of hypotheses (H):

P(vj | D) = sum {hi in H} P(vj | hi) * P(hi | D)

Where vj is a new instance to be classified, H is the set of hypotheses for classifying the instance, hi is a given hypothesis, P(vj | hi) is the posterior probability for vj given hypothesis hi, and P(hi | D) is the posterior probability of the hypothesis hi given the data D.

Selecting the outcome with the maximum probability is an example of a Bayes optimal classification.

Any model that classifies examples using this equation is a Bayes optimal classifier, and no other model can outperform this technique, on average.

We have to let that sink in. It is a big deal.

Because the Bayes classifier is optimal, the Bayes error is the minimum possible error that can be made.

Bayes Error: The minimum possible error that can be made when making predictions.

It is a theoretical model, but it is held up as an ideal that we may wish to pursue.

The Naive Bayes classifier is an example of a classifier that adds some simplifying assumptions and attempts to approximate the Bayes Optimal Classifier.

For more on the Bayes optimal classifier, see the tutorial:

A Gentle Introduction to the Bayes Optimal Classifier

More Uses of Bayes Theorem in Machine Learning

Developing classifier models may be the most common application of Bayes Theorem in machine learning.

Nevertheless, there are many other applications. Two important examples are optimization and causal models.

Bayesian Optimization

Global optimization is a challenging problem of finding an input that results in the minimum or maximum cost of a given objective function.

Typically, the form of the objective function is complex and intractable to analyze and is often non-convex, nonlinear, high-dimensional, noisy, and computationally expensive to evaluate.

Bayesian Optimization provides a principled technique based on Bayes Theorem to direct a search of a global optimization problem that is efficient and effective. It works by building a probabilistic model of the objective function, called the surrogate function, that is then searched efficiently with an acquisition function before candidate samples are chosen for evaluation on the real objective function.

Bayesian Optimization is often used in applied machine learning to tune the hyperparameters of a given well-performing model on a validation dataset.

For more on Bayesian Optimization, including how to implement it from scratch, see the tutorial:

How to Implement Bayesian Optimization from Scratch in Python

Bayesian Belief Networks

Probabilistic models can define relationships between variables and be used to calculate probabilities.

Fully conditional models may require an enormous amount of data to cover all possible cases, and probabilities may be intractable to calculate in practice. Simplifying assumptions such as the conditional independence of all random variables can be effective, such as in the case of Naive Bayes, although it is a drastically simplifying step.

An alternative is to develop a model that preserves known conditional dependence between random variables and conditional independence in all other cases. Bayesian networks are probabilistic graphical models that explicitly capture the known conditional dependence with directed edges in a graph model. All missing connections define the conditional independencies in the model.

As such, Bayesian Networks provide a useful tool to visualize the probabilistic model for a domain, review all of the relationships between the random variables, and reason about causal probabilities for scenarios given available evidence.
The networks are not exactly Bayesian by definition, although given that both the probability distributions for the random variables (nodes) and the relationships between the random variables (edges) are specified subjectively, the model can be thought of as capturing the "belief" about a complex domain.

For more on Bayesian Belief Networks, see the tutorial:

A Gentle Introduction to Bayesian Belief Networks

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Related Tutorials

A Gentle Introduction to Joint, Marginal, and Conditional Probability
What is a Hypothesis in Machine Learning?
How to Develop a Naive Bayes Classifier from Scratch in Python
Naive Bayes Classifier From Scratch in Python
How to Implement Bayesian Optimization from Scratch in Python
A Gentle Introduction to Bayesian Belief Networks

Books

Pattern Recognition and Machine Learning, 2006.
Machine Learning, 1997.
Pattern Classification, 2nd Edition, 2001.
Machine Learning: A Probabilistic Perspective, 2012.

Articles

Conditional probability, Wikipedia.
Bayes' theorem, Wikipedia.
Maximum a posteriori estimation, Wikipedia.
False positives and false negatives, Wikipedia.
Base rate fallacy, Wikipedia.
Sensitivity and specificity, Wikipedia.
Taking the Confusion out of the Confusion Matrix, 2016.

Summary

In this post, you discovered Bayes Theorem for calculating conditional probabilities and how it is used in machine learning.

Specifically, you learned:

What Bayes Theorem is and how to work through the calculation on a real scenario.
What the terms in the Bayes theorem calculation mean and the intuitions behind them.
Examples of how Bayes theorem is used in classifiers, optimization and causal models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


