Talk:Bayes' theorem/Archive 2

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Substantial Revision

Hello everyone. There was a recent substantial revision of Bayes' theorem [1]. I'm afraid it doesn't look like an improvement to me. Here are some points to consider.

In the introduction, Bayes' theorem is described in terms of random variables. This isn't necessary to clarify Bayes' theorem, and introduces a whole raft of heavy baggage (to mix metaphors) that is going to be well-nigh incomprehensible to the general readership.
The two-line derivation of Bayes' theorem is put off for several paragraphs by a lengthy digression which introduces some unnecessary notation and includes a verbal statement of Bayes' theorem which is much less clear than the algebraic statement which it displaces.
The example is somewhat problematic. It is formally correct, but it's not very compelling, as it doesn't make use of any relevant prior information; the medical test example and even the cookies example, which were moved to Bayesian inference a while ago, were superior in that respect. Perhaps if an example is needed, we can restore the medical test or cookies (I vote for the medical test fwiw). The example is also misplaced (coming before the algebraic statement) although that's easy to remedy.

Given the difficulties of the recent revision, I'm tempted to revert. Perhaps someone wants to talk me out of it. Regards, Wile E. Heresiarch 22:39, 11 Jul 2004 (UTC)

Perhaps you are right that the really simple material should come first. However, that's not a reason to throw away the example on political opinion polling. That example is in many respects typical of the simplest applications in Bayesian statistical inference. I for one find it compelling for that reason. To say that the simple statement that is followed by the words "... which is Bayes' theorem" is more than just a simple special case is misleading. Michael Hardy 23:41, 11 Jul 2004 (UTC)

Also, the "verbal" version is very useful; in some ways it brings out a simple and memorable idea that the formula in mathematical notation expresses less clearly. The role of the likelihood and the role of the prior are extremely important ideas. Michael Hardy 23:44, 11 Jul 2004 (UTC)

I've moved the example farther down in the article, as it interrupts the exposition. I've also reverted the section "Statement of Bayes' theorem" to its previous form; the newer version did not introduce any new material, and was less clear. I put a paraphrase using the words posterior, prior, likelihood, & normalizing constant into the "Statement" section. -- I'm still not entirely happy with "random variable" in the introduction, but I haven't found a suitable replacement. I'd favor "proposition", but it is likely not familiar to general readers. Fwiw & happy editing, Wile E. Heresiarch 14:49, 20 Jul 2004 (UTC)

Hello, I've moved the existing content of this page (last edit April 12, 2004) to Talk:Bayes' theorem/Archive1. I used the "move" function (instead of cut-n-paste) so the edit history is now with the archive page. Regards, Wile E. Heresiarch 14:30, 8 Jul 2004 (UTC)

Bayes' theorem vs Bayesian inference

It seems to me that the current version of the Bayes' theorem article contains a little too much Bayesian inference. This is not to detract from the importance of Bayesian inference as the premier application of Bayes' theorem, but as far as I can see:

The section explaining terms such as posterior, likelihood, etc. is more appropriate to the Bayesian inference article. None of it is taught with Bayes' theorem in courses on elementary probability (unless, I assume, Bayesian inference is also taught).
The example is one of Bayesian inference, not simply Bayes' theorem. Somewhat ironically, the Bayesian inference article contains some simple examples of Bayes' Theorem that are not Bayesian in nature, and that were moved there from an older version of the Bayes' theorem article!

Some of these things are noted in other posts to this talk page and the talk page of the Bayesian inference article, but I can't see that the current version of either article is a satisfactory outcome of the discussions. The current versions of the articles appear to muddy the distinction between Bayes' theorem and Bayesian inference/probability.

Hence, I propose to change these articles by

swapping the cookie jar and false positive examples from the Bayesian inference article for the example from the Bayes' theorem article;
deleting the section on conventional names of terms in the theorem from the Bayes' theorem article (but noting that there are such conventions as detailed in the Bayesian inference article);
revising the description of the theorem to refer to probabilities of events, since this is the most elementary way of expressing Bayes' theorem, and is consistent with identities given in (for instance) the conditional probability article.

Since this has been a topic of some discussion on the talk pages of both articles, I would like to invite further comment from others before I just go ahead and make these changes. In the absence of such discussion, I'll make the proposed changes in a few days.

Cheers, Ben Cairns 07:55, 23 Jan 2005 (UTC).

Well, I agree the present state of affairs isn't entirely satisfactory. About (1), if you want to move the medical test to Bayes' theorem in exchange for the voters example, I'm OK with that. I'd rather not clutter up Bayes' theorem with the cookies; it's no less complicated than the medical test, and a lot less interesting. (2) I'm OK with cutting the conventional terms from Bayes' theorem. (3) I guess I'm not entirely happy with stating Bayes' theorem as a theorem about events, since "events" has some baggage. I'd be happiest to say something like P(B|A) = P(A|B) P(B)/P(A) whenever A and B are objects for which P(A), P(B), etc., make sense, and that might be OK for mathematically-minded readers but maybe not as friendly to the general readership. Any other thoughts about that? Anyway, thanks for reopening the discussion. Now that we've all had several months to think about it, I'm sure we'll make quick progress. 8^) Regards & happy editing, Wile E. Heresiarch 21:56, 23 Jan 2005 (UTC)

Thanks for the quick response! I also prefer the medical test example. Perhaps the cookies can be returned home and then deleted. It's not so complicated a theorem that it needs many examples.

I also take your point about events, but it's just that event has a particular meaning. Perhaps a brief, layman's definition would be appropriate, for example:

"Bayes' theorem is a result in probability theory, which gives the conditional probability of an event (an outcome to which we may assign a probability) A given another event B in terms of the conditional probability of B given A and the (marginal) probabilities of A and B alone."

I don't believe this is a foolish consistency; a precise definition of an event is an important component of elementary probability theory, and anyone who would study the area (even in the kind of detail provided by Wikipedia) should come to appreciate that we cannot go around assigning probabilities to just anything. The article Event (probability theory) explains this quite well. It seems to me that the greater danger lies in obscuring the concept with an array of vaguer terms for which we do not have articles explaining the matter. Thanks again, Ben Cairns 22:43, 23 Jan 2005 (UTC).

Well, we seem to have reached an impasse. I'm quite aware that "event" has a prescribed meaning; that's why I want to omit it from the article. Technical difficulties with strange sets never arise in practical problems and for this reason are at most a curiosity -- this is the pov of Jaynes the uber-Bayesian. From what I can tell, Bayesians are in fact happy to assign probability to "just anything" and this is pretty much the defining characteristic of their school. Let me see if I can find some textbook statements from Bayesians to see what is permitted for A and B. Wile E. Heresiarch 16:02, 24 Jan 2005 (UTC)

I don't think we've reached an impasse yet, but perhaps we (presently) disagree on what this article is about. Bayes' theorem is not about Bayesian-anything. It is a simple consequence of the definition of conditional probability. I don't think that this article should be about Bayesian decision theory, inference, probability or any other such approach to the analysis of uncertainty.

Even if my assertion that people "should come to appreciate that we cannot go around assigning probabilities to just anything" is misplaced (and I'm happy to agree that it is), the word 'event' is what probabilists use to denote things to which we can assign probabilities. I cannot speak for Bayesian statisticians, as (despite doing my undergraduate degree in the field) I now do so little statistics that I can avoid declaring my allegiance. But, again, I don't believe that this article is about that at all. (I am aware of strong Bayesian constructions of probability theory, but they are not considered standard, by any means.)

What do you think of: "Bayes' theorem is a result in probability theory, which gives the conditional probability of A given B (where these are events, or simply things to which we may assign probabilities) in terms of the conditional probability of B given A and the (marginal) probabilities of A and B alone."

The main problem I have with the event business is that it's not necessary, and not helpful, in this context. Being told that A and B are elements of a sigma-algebra simply won't advance the understanding of the vast majority of readers -- this is the "not helpful" part. One can make a lot of progress in probability without introducing sigma-algebras until much later in the game -- this is the "not necessary" part. I'd prefer to say A and B are variables -- this avoids unnecessary assumptions. "A and B are simply things to which we may assign probabilities" is OK by me too. For what it's worth, Wile E. Heresiarch 16:24, 25 Jan 2005 (UTC)

The events article isn't that bad; the majority of it concerns a set of simple examples corresponding to the "things to which we may assign probabilities" definition. Of course, it also mentions the definition of events in the context of sigma algebras, but that is as it should be, too (after all, the term is in common use in that context). If you have qualms with the way the events article is presented, perhaps that needs attention, but I don't see that this should be a problem for Bayes' theorem. It seems a little POV to avoid use of the conventional term for "things to which we may assign probabilities" on the grounds that its formal definition, which does not appear in this article and is not the focus of the article on the term itself, may be difficult for some (even many) people to understand. Cheers, Ben Cairns 05:54, 26 Jan 2005 (UTC).

OK, so you saw the "not helpful" part. Can you address the "not necessary" part? Btw I don't have any desire or intent to change the event article. Wile E. Heresiarch 00:31, 27 Jan 2005 (UTC)

I think my comment above covers this to some extent, but to clarify... While the topic can certainly be explained without reference to events, we could just as easily discuss apes without calling them by that name (or worse, by calling them 'monkeys'), but that would obscure the facts that apes are (a) called 'apes' and (b) not monkeys.

I have to say that I don't understand your resistance to using the word 'events', when you are satisfied with the (essentially) equivalent phrase, "things to which we may assign probabilities." How does adding the word detract from its elementary meaning? I don't deny that one can make a lot of progress without worrying about the details of constructing probability spaces, but providing a link which eventually leads to a discussion of those details hardly requires the reader to assimilate it all in one sitting.

Could you perhaps suggest, as a compromise, a way to present the material that (a) is clear even to the casual reader, and (b) at least hints that these things are called 'events'? Ben Cairns 04:25, 27 Jan 2005 (UTC).

Spelling of possessive ending in 's'

Sorry to be a prude, but I thought that names ending in 's' should be spelt 's'-apostrophe-'s', as in "Jones's", and should not end in an apostrophe unless the name is a plural. Shouldn't this page be "Bayes's", or is this rule particular to the UK? --Oniony 15:17, 25 July 2005 (UTC)

The Wikipedia Manual of Style says either is acceptable. I usually see "Bayes' theorem" instead of "Bayes's Theorem." I honestly don't know if this is a US/UK thing or just a matter of taste. (Personally I prefer the former.) --Kzollman 17:56, July 25, 2005 (UTC)

The lower case initial t in theorem is prescribed by Wikipedia's style manual, I think; certainly it's the usual practice here. I titled an article Ewens's sampling formula and created redirects from the various other conventional ways of dealing with possessives and eponymous adjectives, etc. I'm not sure what the style manual says, nor do I have settled preferences on this one. Michael Hardy 20:37, 25 July 2005 (UTC)

Googling for "Bayes' theorem" yields about 144 k hits, while "Bayes's theorem" yields about 6 k. Restricting the search to site:en.wikipedia.org yields 154 and 10, respectively. Searching newsgroups yields about 2500 and 150, respectively. Since both forms are acceptable, let's use "Bayes' theorem", which has much more currency than "Bayes's theorem". Wile E. Heresiarch 03:16, 26 July 2005 (UTC)

Plagiarism

The medical test example (Example I) seems to be plagiarized from Sheldon Ross's "A First Course in Probability". Thank you.

Example #1: False positives in a medical test

Suppose that a test for a particular disease has a very high success rate:

if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).

Suppose also, however, that only 0.1% of the population have that disease (i.e. with probability 0.001). We now have all the information required to use Bayes' theorem to calculate the probability that a positive test result is a false positive. This problem is discussed at greater length in Bayesian inference.

Let D be the event that the patient has the disease, and T be the event that the test returns a positive result. Then, using the second alternative form of Bayes' theorem (above), the probability of a positive result is

P(T) = P(T \mid D)\,P(D) + P(T \mid D^{C})\,P(D^{C})

P(T) is the probability that a given person tests positive. This depends on the two populations: those with the disease (who correctly test positive, 0.99 × 0.001) and those without the disease (who incorrectly test positive, 0.05 × 0.999). The probability that a person has the disease, given that the patient tested positive, is determined by dividing the probability of a true positive result by the probability of any positive result, which is the sum of the probabilities of a true positive and a false positive:

P(\text{disease} \mid \text{test}+) = \frac{P(\text{test}+ \mid \text{disease})\,P(\text{disease})}{P(\text{test}+ \mid \text{disease})\,P(\text{disease}) + P(\text{test}+ \mid \text{no disease})\,P(\text{no disease})}

P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D^{C})\,P(D^{C})}

P(D \mid T) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} = \frac{11}{566} \approx 0.019,

and hence the probability that a positive result is a false positive is about (1 − 0.019) = 0.981.
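The arithmetic above can be checked with exact fractions. This is a minimal Python sketch, assuming only the figures stated in the example (P(T|D) = 0.99, P(not T | not D) = 0.95, P(D) = 0.001); the function and parameter names are illustrative, not taken from any source.

```python
from fractions import Fraction

def posterior_given_positive(sensitivity, specificity, prevalence):
    """Return P(D|T) for a binary test via Bayes' theorem.

    sensitivity = P(T|D), specificity = P(not T | not D), prevalence = P(D).
    """
    false_positive_rate = 1 - specificity                # P(T | D^C)
    true_pos = sensitivity * prevalence                  # P(T|D) P(D)
    false_pos = false_positive_rate * (1 - prevalence)   # P(T|D^C) P(D^C)
    p_positive = true_pos + false_pos                    # P(T), law of total probability
    return true_pos / p_positive

p = posterior_given_positive(Fraction(99, 100), Fraction(95, 100), Fraction(1, 1000))
print(p)                       # prints 11/566
print(f"{float(p):.3f}")       # prints 0.019
print(f"{float(1 - p):.3f}")   # prints 0.981 (probability the positive is false)
```

Exact fractions are used only so that the result can be compared directly with 11/566; ordinary floats give the same answer to within rounding.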

Despite the apparently high accuracy of the test, the prevalence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (98 in a hundred) do not have the disease. This is quite common in screening tests. In many or most cases it is more important to have a very low false positive rate than a high true positive rate. Another strategy for dealing with this problem is to screen a selected population in which the prevalence of the disease is higher. For example, it would be senseless to screen the whole population for cancer (with extremely costly and invasive tests), since, as shown above, this would produce an enormous number of false positives. On the other hand, if you select a part of the population (e.g. those who have lost 10% of their weight in the last couple of months without having gone on a diet), the prevalence of cancer is higher and the probability of a false positive will be lower. The more characteristics you look for before you apply the test (this raises the pre-test probability, or, simply put, the prevalence), the more likely it is that a positive result is a true positive.
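The effect of screening a higher-prevalence group can be illustrated by holding the test's accuracy fixed and varying only the prior P(D). This is a minimal Python sketch, assuming the same 99% sensitivity and 95% specificity as in the example; the prevalence values in the loop are arbitrary illustrative choices, not data about any real disease.

```python
def prob_disease_given_positive(prevalence, sensitivity=0.99, specificity=0.95):
    """P(D|T) from Bayes' theorem for a test whose accuracy is held fixed."""
    true_pos = sensitivity * prevalence                # P(T|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(T|D^C) P(D^C)
    return true_pos / (true_pos + false_pos)

# Hypothetical prevalences: whole population vs. increasingly selected groups.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    p = prob_disease_given_positive(prevalence)
    print(f"prevalence {prevalence:>5}: P(D|T) = {p:.3f}, P(false positive) = {1 - p:.3f}")
```

Only the prior changes from row to row, so any improvement in P(D|T) comes entirely from selecting the population, not from the test itself.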

The Statement of Bayes' Theorem

The Statement of Bayes' Theorem section is correct but confusing. I had to re-read this section several times before I remembered from graduate school that "likelihood" has a counter-intuitive technical definition. To the average math-oriented reader, you can't just pop P(A|B) = L(B|A) and not explain that likelihood is an unfortunate technical phrase. Most non-statisticians would not equate "Probability of A | B" with "Likelihood of B | A". If someone doesn't already know Bayes' theorem (the reason they're reading the article), they probably don't know what statisticians mean when they say "likelihood function" either. I'd suggest eliminating everything about likelihood functions entirely from this section and just sticking with probability-oriented terms.--Toms2866 13:06, 28 March 2006 (UTC)
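The technical point can be made concrete with the numbers from the medical test example. For a fixed observation (a positive test), the likelihood P(T | hypothesis), viewed as a function of the hypothesis, need not sum to 1, whereas the posterior P(hypothesis | T) does. This is a minimal Python sketch, assuming the 0.99 / 0.05 / 0.001 figures from that example; the dictionary names are illustrative.

```python
# Likelihood of each hypothesis given a positive test: P(T | hypothesis).
likelihood = {"disease": 0.99, "no disease": 0.05}
# Prior probability of each hypothesis: P(hypothesis).
prior = {"disease": 0.001, "no disease": 0.999}

# The likelihoods do not form a probability distribution over hypotheses.
print(f"{sum(likelihood.values()):.2f}")   # prints 1.04

# Multiplying by the prior and dividing by P(T) gives the posterior P(hypothesis | T).
unnormalized = {h: likelihood[h] * prior[h] for h in prior}
evidence = sum(unnormalized.values())      # P(T)
posterior = {h: v / evidence for h, v in unnormalized.items()}
print(f"{sum(posterior.values()):.6f}")    # prints 1.000000
```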

"Nontechnical explanation" and cookies example

Hello. I've cut the "nontechnical explanation" and the cookies example for the following reasons. (1) "Nontechnical explanation" is mistaken. Bayes' theorem isn't limited to observable physical events, as suggested by the repeated use of the word "occurring". The author has been misled by the suggestive term "event". (2) The verbiage about the term likelihood is void of meaning: This measure is sometimes called the likelihood, since it is the likelihood of A occurring given that B occurred. It is important not to confuse the likelihood of A given B and the probability of A given B. Even though both notions may seem similar and are related, they are quite different. Uh huh. (3) Descriptions of each term P(A), P(B), etc are covered elsewhere in the article. (4) P(A), P(B), etc are called "measures" in the "nontechnical explanation" but they're not; I suppose the author intended "quantities". (5) The description of P(B) is mistaken: This measure is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying. No, it is not called a normalizing constant because it is always the same. (6) The cookies example doesn't illustrate anything interesting. (7) The cookies example already appears on the Bayesian inference page. -- The article needs work, and it can be improved, but not by pasting random stuff into it. Wile E. Heresiarch 07:17, 28 November 2005 (UTC)

I agree with some of the points that you raise, but I also believe that there was some good information in the "non-technical" section that you removed. Furthermore, I believe that many math-related articles on Wikipedia, this one included, tend to start immediately with highly technical explanations that only Ph.D. mathematicians can understand. Yes, the articles do need to include the formal mathematical definitions, but I believe that it would be helpful to begin each article with a simple, non-technical explanation that is accessible to the more general reader. Most of these math-related articles have important applications well beyond mathematics -- including physics, chemistry, biology, engineering, economics, finance, accounting, manufacturing, forensics, medicine, etc. You need to consider your audience when you write articles for Wikipedia. The audience is far broader than the population of Ph.D. mathematicians. -- Metacomet 14:37, 28 November 2005 (UTC)

One other point: in my opinion, it is not a good idea in general for articles to point out that they are starting with a non-technical explanation, and that the full technical discussion will come later, as this article originally did. It is better simply to start with simple, non-technical descriptions and then smoothly to transition to the more formal, technical discussion. Sophisticated readers will know immediately that they can skim over the non-technical parts, and read the more advanced section in greater detail. Non-sophisticated readers will appreciate that you have tried to take them by the hand and bring them to a deeper level of understanding. -- Metacomet 14:50, 28 November 2005 (UTC)

Hi, I wrote the non-technical explanation, so I'll chip in with my thoughts. First, the reason I wrote it is that this article is too technical. If you check back the history before I first added the section, you'll see there was a "too technical, please simplify" warning on the page. Hell, I'm a computer engineer, I use Bayes' theorem every day, and even I couldn't figure out what the page was talking about. People who don't have a strong (grad level) mathematical background will be completely lost on this page. There is a definite, undeniable need for a simpler, non-technical explanation of Bayes' Theorem.

That said, the vision I had for the non-technical explanation was for it to be a stand-alone text. The technical explanation seemed complete and coherent, if too advanced for regular readers, so I did not want to mess around with it. I thought it would be both simpler and better to instead begin the page with a complete non-technical text, which regular readers could limit themselves to while advanced readers could skip completely to get to the more technical stuff. That is why, as Heresiarch pointed out, the definitions of Pr(A), Pr(B) etc. are there twice.

So I vote that we restore the non-technical explanation. Heresiarch, if you have a problem with some terms used, such as "occur" or "measure", you should correct those terms, not delete the entire section. But keep in mind when doing those corrections that the people who'll be reading it will have little to no formal background in mathematics – keep it sweet and simple! -- Ritchy 15:11, 28 November 2005 (UTC)

I think there is room for a compromise solution that will make everyone happy and improve the article substantially. Basically, I think Ritchy is correct, the non-technical explanation needs to go back in at the beginning, but it needs to be cleaned up a bit and the transitions need to be a bit smoother. The truth is, the so-called non-technical discussion is not even all that simplified -- it happens to be pretty well written and provides a very good introduction to the topic. Again, I think it just needs a bit of cleaning-up, and it needs to be woven into the article more smoothly. -- Metacomet 15:54, 28 November 2005 (UTC)

As a first step, I have added the simple "cookies" example back, but this time I grouped it with the other example in a single section entitled "Examples." Each example has its own sub-section with its own header. I think it improves the flow of articles when you put all of the examples together in a single section, and begin with simple examples before proceeding to more complicated ones. -- Metacomet 16:11, 28 November 2005 (UTC)

The next step is to figure out a way to weave the non-technical explanation back in near the beginning of the article without sounding too repetitious and with smooth transitions. -- Metacomet 16:11, 28 November 2005 (UTC)

I am not opposed to some remarks that are less technical. I am opposed to restoring the section "Non-technical explanation", as it was seriously flawed. If you want to write something else, go ahead, but please don't just restore the previous "Non-technical explanation". Please bear in mind that just making the article longer doesn't necessarily make it any clearer. Wile E. Heresiarch 02:22, 29 November 2005 (UTC)

Actually, I think it is pretty good as written. You say that it is "seriously flawed." I am confused: what are your specific objections or concerns? -- Metacomet 03:36, 29 November 2005 (UTC)

See items (1) through (5) above under "Nontechnical explanation" and cookies example. Wile E. Heresiarch 07:04, 29 November 2005 (UTC)

I have pasted a copy of the text below for reference. -- Metacomet 04:03, 29 November 2005 (UTC)

I have edited the "Nontechnical explanation" according to criticisms (1) and (4). (2) and (3) are meaningless – it seems Heresiarch just doesn't like things explained too clearly to people who don't know math. (5) seems to be a misunderstanding. Pr(B) is the probability of B, regardless of A. Meaning, if we're computing Pr(A|B), or Pr(C|B), or Pr(D|B), the term Pr(B) will always be the same. That's what I meant by "it will always be the same, regardless of which event A one is studying." If the statement isn't clear enough, I'm open to ideas on how to improve it. -- Ritchy 20:10, 29 November 2005 (UTC)

Non-technical explanation

Simply put, Bayes’ theorem gives the probability of a random event A given that we know that a related event B has occurred. This probability is denoted Pr(A|B), and is read "probability of A given B". This quantity is sometimes called the "posterior", since it is computed after all other information on A and B is known.

According to Bayes’ theorem, the probability of A given B will be dependent on three things:

The probability of A on its own, regardless of B. This is denoted Pr(A) and read "probability of A". This quantity is sometimes called the "prior", meaning it precedes any other information – as opposed to the posterior, defined above, which is computed after all other information is known.
The probability of B on its own, regardless of A. This is denoted Pr(B) and read "probability of B". This quantity is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying.
The probability of B given A. This is denoted Pr(B|A) and is read "probability of B given A". This quantity is sometimes called the likelihood, since it is the likelihood of A given B. It is important not to confuse the likelihood of A given B with the probability of A given B. Even though both notions may seem similar and are related, they are quite different.

Given these three quantities, the probability of A given B can be computed as

\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}.
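As a rough illustration of this formula, here is a minimal Python sketch. It assumes (beyond what the text above states) that A ranges over a finite set of mutually exclusive, exhaustive alternatives, so that Pr(B) can be computed by the law of total probability. The same Pr(B) divides every term, which is why the resulting posteriors sum to 1. The alternatives A1 to A3 and their numbers are hypothetical.

```python
from fractions import Fraction

def bayes(prior, likelihood):
    """Return (posterior, pr_b), where posterior[a] = Pr(A=a | B).

    prior[a] = Pr(A=a) and likelihood[a] = Pr(B | A=a); the alternatives are
    assumed to be mutually exclusive and exhaustive.
    """
    # Pr(B) by the law of total probability; one value shared by every A.
    pr_b = sum(likelihood[a] * prior[a] for a in prior)
    posterior = {a: likelihood[a] * prior[a] / pr_b for a in prior}
    return posterior, pr_b

# Hypothetical numbers, chosen only for illustration.
prior = {"A1": Fraction(1, 2), "A2": Fraction(3, 10), "A3": Fraction(1, 5)}
likelihood = {"A1": Fraction(1, 10), "A2": Fraction(2, 5), "A3": Fraction(4, 5)}

posterior, pr_b = bayes(prior, likelihood)
print(pr_b)                      # prints 33/100
print(posterior)
print(sum(posterior.values()))   # prints 1
```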