Talk:Bayes estimator

From Wikipedia, the free encyclopedia

old comments

I was surprised to see this topic doesn't yet have an article. I created a stub, but a lot more content is needed: for example, how such estimators are constructed. Anyone interested in having a go? --Zvika 18:21, 13 September 2006 (UTC)[reply]

I think there is a mistake. We should not integrate over the prior, but over P(theta | x). Otherwise, the estimator would not depend on the data at all and would simply be ∫ θ π(θ) dθ. -- Shay

If you are referring to the estimator formula, then the integral is performed with respect to the posterior probability π(θ|x), not the prior π(θ). I think the formula is correct. --Zvika 16:31, 8 April 2007 (UTC)[reply]
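The distinction debated in this exchange can be checked numerically. A minimal sketch under squared-error loss, where the Bayes estimator is the posterior mean (the Beta-Bernoulli model, uniform prior, and counts below are illustrative assumptions, not taken from the article):

```python
# Under squared-error loss, the Bayes estimator is the posterior mean
# E[theta | x]: it integrates theta against the posterior pi(theta | x),
# not the prior pi(theta), so it depends on the data.
# Illustrative conjugate setup: Bernoulli data with a Beta(a, b) prior,
# giving posterior Beta(a + s, b + n - s) after s successes in n trials.

def bayes_estimate_bernoulli(successes, trials, a=1.0, b=1.0):
    """Posterior mean of theta: (a + s) / (a + b + n)."""
    return (a + successes) / (a + b + trials)

# With a uniform Beta(1, 1) prior, 7 successes in 10 trials give 8/12:
print(bayes_estimate_bernoulli(7, 10))  # 0.666...
# With no data, the estimate falls back to the prior mean 1/2:
print(bayes_estimate_bernoulli(0, 0))   # 0.5
```

Integrating against the prior alone, as the comment notes, would return the prior mean regardless of the data.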

Shouldn't something be said about the fact that the Bayesian estimation approach was developed by Bruno de Finetti? —Preceding unsigned comment added by 129.242.236.138 (talk) 22:44, 29 May 2008 (UTC)[reply]

Is the equation starting with missing the squared part? —Preceding unsigned comment added by 99.150.238.109 (talk) 03:03, 7 June 2008 (UTC)[reply]

Yes. Thanks for pointing this out. --Zvika (talk) 16:29, 7 June 2008 (UTC)[reply]

Is this paper relevant?

Uncertainty Analysis and Other Inference Tools for Complex Computer Codes. Anthony O'Hagan, Marc C. Kennedy and Jeremy E. Oakley [1] If so, is there a place for adding this to the article?

Should Gaussian process emulator redirect to this page? crandles (talk) 11:30, 15 March 2009 (UTC)[reply]

It seems to me like a very specific application, whereas this article is about the general technique. In other words, I think the answer to all questions is "no". --Zvika (talk) 14:47, 15 March 2009 (UTC)[reply]

Definition

I have some trouble understanding the definition given here. Is the sentence 'The estimator which minimizes the posterior expected loss for each x also minimizes the Bayes risk and therefore is a Bayes estimator.' part of the definition? It doesn't sound like it's defining anything. So maybe it should be moved to a different part of the article. Besides that, where do these 'each x' come from? Thanks for clarifying this. --Saraedum (talk) 20:55, 9 July 2009 (UTC)[reply]

It is an equivalent definition. I have tried to clarify, hope this helps. --Zvika (talk) 06:31, 10 July 2009 (UTC)[reply]
Yes it does. Thanks. Saraedum (talk) 20:16, 11 July 2009 (UTC)[reply]

generalized Bayes estimator

At present the text implies that everyone uses "generalized Bayes estimator" if improper priors are used. I think it is probably unusual to do so. For example, the book "Bayesian Theory" by Bernardo & Smith does not do so: it does not (seem to) make a distinction for improper priors. It would be misleading to leave this as it stands if major sources do not all use the same terminology. Melcombe (talk) 17:10, 18 November 2009 (UTC)[reply]

Are you sure they don't make the distinction? That sounds very strange, since there are essential differences between proper and improper priors. For example, a (proper-prior) Bayes estimator is typically admissible, whereas an improper-prior generalized Bayes estimator often isn't (e.g. the ordinary least squares estimator in the multivariate Gaussian case). --Zvika (talk) 19:02, 18 November 2009 (UTC)[reply]
I went down to the library to take a look at some statistics textbooks. Bernardo and Smith indeed do not seem to use the term generalized Bayes estimator, but neither do they refer to improper prior estimators as Bayes estimators, as far as I could tell, and they explicitly point out the fact that these are different from estimators based on proper priors. Two other standard references I looked at (Zacks's Theory of Statistical Inference and Berger's Statistical Decision Theory and Bayesian Analysis) both use the term generalized Bayes estimator. So I think the article is fine as it is. --Zvika (talk) 08:41, 19 November 2009 (UTC)[reply]
It may be that the term is only used when the distinction is relevant. We could leave things as they stand and see if anyone else has a preference. In looking at literature I found the following. Dodge's Oxford Dictionary of Statistical Terms has "generalized Bayes' decision rule" which points immediately to "Bayes' decision rule", where it is not actually clear about what distinction is being made but might be interpreted to agree with what is in the article. In the same place, there is a reference to "an extended Bayes' decision rule" which looks to be making an entirely different distinction ... this is attributed to DeGroot(1970). Is this something else that needs to be brought into this article (I only looked for the term "extended" which does not appear)? Melcombe (talk) 10:20, 19 November 2009 (UTC)[reply]
I don't think I've heard of an extended Bayes estimator. --Zvika (talk) 14:41, 19 November 2009 (UTC)[reply]
I do not have access to DeGroot, but according to the dictionary, an extended rule is defined as follows, partly retaining the notation used in the dictionary. Let d(p) be the Bayes rule for prior distribution p. Let r(p, e) be the risk function for prior p and rule e. A rule d0 is an extended Bayes rule if, for every ε > 0, there is a prior p such that r(p, d0) ≤ r(p, d(p)) + ε.
I am not clear what this concept actually relates to. Melcombe (talk) 16:30, 20 November 2009 (UTC)[reply]

why the hidden note?

Melcombe, I don't understand why you added that note. Many pages redirect here and you can get a list of all of them here. --Zvika (talk) 17:03, 26 November 2009 (UTC)[reply]

I did so because I had just changed the redirect of "Bayesian decision theory" to point here rather than "Bayesian inference" (which did not seem to serve). There is advice about marking sections of articles with a hidden message where they are targets for links, but not necessarily whole articles. The present content has a lot about Bayesian decision theory rather than just "estimation" and the message was meant to be a reminder to anyone thinking of cutting-out the more general stuff that this would leave a substantial topic without coverage. Of course there may be scope for having two articles at some stage. Melcombe (talk) 14:01, 27 November 2009 (UTC)[reply]
Noting section redirects is important because the redirects must be updated if the section name is changed, and there is no simple way to get a list of redirects to that section. This is not the case for redirects to an article, since you can get a 'what links here' page. As for cutting out material, I don't think people would delete good material just because it doesn't belong in a specific article - they would rather find a better place and cut-and-paste. On the other hand, I think it's important to keep beginnings of articles as lightweight as possible to make it easier for newcomers to get to the text and edit it. --Zvika (talk) 19:37, 28 November 2009 (UTC)[reply]

Risk function

I added the wikilink to the already-existing article risk function, but it is not immediately clear that the term is actually being used for the same things. Is anyone able to clarify this? Melcombe (talk) 11:41, 21 December 2009 (UTC)[reply]

It seems to me that they are indeed two different things. Risk function is about the frequentist risk, and is a function of θ, whereas here the discussion is about the Bayes risk (currently a redirect to this article). Thus, IMO the link you added is misleading, and should be removed. --Zvika (talk) 15:20, 21 December 2009 (UTC)[reply]
OK... and I have removed the wikilink to risk function from this article. But this raises the question of whether "risk function" is the right terminology here ... would it be better to use "expected loss function"? Only one stats dictionary I have has an entry for "risk function" and that only gives the "frequentist version". The limited books I have looked at seem to avoid "risk function" for the Bayesian approach. Melcombe (talk) 11:52, 22 December 2009 (UTC)[reply]
"Risk function" probably usually refers to the frequentist risk, but there is definitely also a "Bayesian risk function" with the meaning in this article. It is used for example in Lehmann and Casella. --Zvika (talk) 12:03, 22 December 2009 (UTC)[reply]

IMDB

This section is not very clear to me.

'Comparing this formula with one in the preceding section, one can see that m must have been related to the relative weight of the prior information in units of the new information given by one vote. Hence C must be the mean vote across the movies with more than 3000 votes, and m should be related to the deviation of votes in this pool.'

It first refers to an unclear previous formula, and implies that anyone can deduce the conclusions the writer draws. Can the writer clarify? The example that is given is quite abstract as well. Perhaps an example that reflects the shortcomings of IMDB's approach, or even better a superior alternative, would be helpful? — Preceding unsigned comment added by 82.136.222.138 (talk) 21:02, 19 June 2011 (UTC)[reply]

  • This section appears to be original research. I propose it be deleted, unless someone finds reliable sources which discuss this "misapplication" of bayesian statistics by IMDB. -- X7q (talk) 20:27, 20 June 2011 (UTC)[reply]
    • I am not an expert, but I disagree with the conclusion the author of this section draws. The goal of IMDB is to use a Bayesian estimate to ensure that (for example) a "show x" with 4 votes/ratings of 10 doesn't out-rank "show y" with 400 votes/ratings of ten and 1 vote/rating of 9, even though the straight average for show Y would be lower than that of Show X. Once a show gets much more than 3,000 votes/ratings, IMDB wants its score or rank to be closer to the straight Average, but at lower numbers of votes/ratings, they want the score or rank to be weighted close to the average across all shows. I don't see the flaw in their approach, but given that my organization models its rankings after IMDB's approach (using a similar equation for ranking shows at denveropenmedia.org), I'd love to know if it is flawed. From my math, shows with over 3,000 votes don't get a boost at all, their "W" (weighted rating) just approaches "R" (pure average). Deproduction (talk) 20:45, 1 December 2011 (UTC)[reply]
      • IMDB … use a Bayesian estimate: The whole point is that IMDB does not use the Bayesian estimate. They use something they call “true Bayesian estimate”, but this claim has no relation to reality. How do you think this number (25000) could enter into a formula for the Bayesian estimate?
      • The goal of IMDB is to use: who cares about the goals of some companies here? “Bayesian estimate” is a mathematical notion; it does not pay attention to people’s goals.
      • a "show x" with 4 votes/ratings of 10 doesn't out-rank "show y" with 400 votes/ratings of ten and 1 vote/rating of 9: your numbers are very misleading. The formula used by IMDB should be illustrated by analyzing a show with 40,000 votes vs one with 400,000 votes. Both numbers of votes are so overwhelmingly large that there should be no trace of prior knowledge left in the result.
      • my math, shows with over 3,000 votes don't get a boost at all: care to write a WikiPedia article about your math? Even with 300,000 votes, a movie is penalized by 7% compared to a movie with many millions of votes (a movie 2.0 above the average would be counted as 1.86 above the average). With 3,000 votes, a movie is penalized 833% (= (25+3)/3 - 1). (Hmm, I see that you use an older number 3,000 instead of 25,000; still, your statement is very wrong — see below). 76.218.120.86 (talk) 03:12, 2 August 2014 (UTC)[reply]
          • Not sure why you're being combative, but my statement wasn't wrong. As the number of votes increases, W approaches R. It's inaccurate to say that movies are "penalized" because of low vote counts, because if a film with low vote counts has an average score/rating (R) below the average of all shows (C), it will be boosted. I don't blame you for questioning IMDB's approach, I just don't think this article is a space for that. This is an example of where Bayesian Estimates are used in a practical way. The criticisms of IMDB or suggestions of how perhaps you'd design the calculations if *you* ran IMDB are irrelevant. Deproduction (talk) 00:07, 6 February 2015 (UTC)[reply]
    • the last sentence of this section said "As used, all that the formula does is give a major boost to films with significantly more than 3000 votes." This is clearly an inaccurate statement. A Boost in relation to what? to its average? the formula results in a weighted rating that would be "boosted" only if its average ("R") was lower than the mean of all shows("C"), and that "boost" actually shrinks the more votes the show has. I find it hard to believe that IMDB's use of Bayesian is flawed, considering that organizations with some pretty smart people, including Rotten Tomatoes and the Open Media Foundation have emulated it. IMDB has one of the most successful voting/rating systems on the web, and this is the first I've seen their use of Bayesian Estimates criticized. I'd like for them to chime-in here, because again, if there are flaws, I want to learn from their mistakes and ensure that my organization isn't making the same mistakes. Deproduction (talk) 03:29, 2 December 2011 (UTC)[reply]
      • A Boost in relation to what? Thanks, this was not clear: to the majority of films deserving to be noted. These films typically have lower vote count than blockbusters, so are majorly penalized w.r.t. blockbusters. And this is equivalent to boosting blockbusters.
      • "boosted" only if its average ("R") was lower than the mean: same as above; this “boost” is only relative; it is due to penalization of the rest. 76.218.120.86 (talk) 03:12, 2 August 2014 (UTC)[reply]

Some more clarity: The Videos in IMDB's top 250 have generally well over 100,000 votes/ratings, making the impact of "C" minimal in their bayesian equation. When you look at the IMDB Rating of a film like "The Godfather", with 500,000 votes/ratings, its rating is practically just "R", which is what they want. Their use of Bayesian ensures that a film with only a few hundred votes, all a perfect 10, does not rank above "The Godfather"'s 9.2 average. They arbitrarily chose 3,000 as a number of votes, above which the show's "W" (Weighted Rating) should be getting closer to its pure "R" (Average rating). At 3,000 votes, a film's Weighted Rating (W) would be exactly between its straight Average (R) and the average across all shows (C). Beneath 3,000 votes, W approaches C (W=C when the # of votes is 0), while above 3,000, W approaches R the more votes are received. I do believe that IMDB deserves mention in this article, as they were a leader in using a form of Bayesian Estimation for creating one of the earliest and most popular voting systems on the web, but on first glance, I think the author of this section was mistaken. Deproduction (talk) 03:28, 2 December 2011 (UTC)[reply]
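The behaviour described in this comment can be verified with a few lines of code. A sketch of the weighted-rating formula under discussion, W = (v/(v+m))·R + (m/(v+m))·C, where the global mean C and the old m = 3000 threshold are illustrative assumptions, not IMDb's actual data:

```python
# Sketch of the weighted-rating formula discussed in this thread.
# Parameter names R, C, v, m follow the discussion; values are illustrative.

def weighted_rating(R, v, C, m):
    """Shrink a film's raw average R (from v votes) toward the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

C, m = 6.9, 3000.0  # assumed global mean and the old m = 3000 threshold

# At exactly v = m votes, W is the midpoint of R and C, as noted above:
print(weighted_rating(9.2, 3000, C, m))    # (9.2 + 6.9) / 2 = 8.05

# With many votes, W approaches the raw average R (the "Godfather" case):
print(weighted_rating(9.2, 500000, C, m))  # ~9.19

# With few votes, W stays near C, so a handful of perfect tens can't top it:
print(weighted_rating(10.0, 4, C, m))      # ~6.90
```

This confirms the claim in the comment: W is a convex combination of R and C, sitting exactly halfway between them at v = m.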

  • IMDB's top 250 have generally well over 100,000 votes/ratings: of course. The formula they use makes it pretty hard for a movie with much fewer votes to enter the list. So it is a corollary of their formula, and not something to support the validity of their formula. Checking… With a correct Bayesian estimate, 90 of 250 have below 100,000 votes — and this is with movies with fewer than 25,000 votes cut off the list! 76.218.120.86 (talk) 03:12, 2 August 2014 (UTC)[reply]
  • while above 3,000, W Approaches R: Even assuming m=3000 (which is long as changed to 25000), a movie with 3,001 votes is penalized 100% w.r.t. a blockbuster. So such a movie has no chance to beat a blockbuster with an average vote 8.5 — even if all of 3,001 votes are 10. (Recall that, for a movie with “true” average grade below 9.99, a probability to get such a vote is less than 1/1,000,000,000; so, with a pool of 1,000,000 movies, this voting has no chance to be a fluke.) 76.218.120.86 (talk) 03:12, 2 August 2014 (UTC)[reply]
        • Your statement that "a movie with 3,001 votes is penalized 100% w.r.t. a blockbuster" is 100% wrong. A movie with 3,000 votes with an average rating (R) that is equal to the average across IMDB (C) would be neither penalized nor boosted. Their weighted average (W) would equal their straight average (R). If their average rating (R) were above the average across IMDB (C), their weighted average (W) would be lower than their straight average (R). If their average rating (R) were below the average across IMDB (C), their weighted average (W) would be higher than their straight average (R). So, a movie with 3001 votes is *not* penalized, its rating on IMDB (W) is just weighted towards C. IMDB could explain their logic behind the number of votes required, but this isn't the place for it. I imagine they're protecting against "stuffing the ballot box", or recognizing that some films are distributed among certain alternative audiences who are very biased towards liking a film... Like there are some "Pro scientology" films that essentially only scientologists see. There are tens of thousands of scientologists, and they all LOVE the film. They could even be encouraged to log onto IMDB and give the film a perfect 10, to encourage evangelism of Scientology... While we can debate it here (on the talk page) or elsewhere, I don't think this wikipedia article is the right place for people to state their opinion that IMDB favors blockbusters, when that is just conjecture, and even if it were true, is irrelevant to this article. Deproduction (talk) 00:23, 6 February 2015 (UTC)[reply]
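The arithmetic the two sides are disputing can be made explicit. Since W - C = (v/(v+m))·(R - C), the disagreement comes down to the shrinkage factor v/(v+m) applied to a film's deviation from the global mean. A quick check of the figures quoted in the 2014 comments, assuming m = 25,000 as stated there:

```python
# With W = (v*R + m*C) / (v + m), the deviation from the global mean C
# shrinks by the factor v / (v + m):  W - C = (v/(v + m)) * (R - C).
# m = 25000 is assumed here, matching the 2014 comments in this thread.

m = 25000

def shrink(v):
    """Factor by which a film's deviation (R - C) is multiplied."""
    return v / (v + m)

# A movie 2.0 above the mean with 300,000 votes counts as ~1.85 above,
# the "penalized by 7%" figure quoted above:
print(2.0 * shrink(300000))  # ~1.846

# With 3,000 votes the deviation is divided by (25 + 3)/3 ≈ 9.33,
# the "(25+3)/3 - 1 = 833%" figure quoted above:
print(1 / shrink(3000))      # ~9.33
```

Both quoted numbers follow from the same factor; whether shrinking toward C counts as a "penalty" or a "weighting" is the terminological point the two editors are arguing over.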

Look back before May 2011, and you'll see this section was initially a "Practical Example of Bayesian Estimates" for several years. I think we should revert — Preceding unsigned comment added by Deproduction (talkcontribs) 03:38, 2 December 2011 (UTC)[reply]

      • This section has remained unchanged for too long. I'm removing it now, and replacing it with a common-sense explanation of why IMDB uses this Bayesian approach. Deproduction (talk) 04:09, 25 November 2012 (UTC)[reply]

serious errors

There are serious errors in the section summarizing admissibility of Bayes procedures. Only the first result (about unique Bayes procedures) is correct. Both subsequent results are false, in general: the procedures must be Bayes with respect to a prior with full support. In the continuous case, the parameter space has to be an open subset of a nice topological space. Indeed, the discrete example is a special case of the third one because the general result (Blyth's method) is about the admissibility among procedures with continuous risk functions with respect to (sequences of possibly improper) priors with full support. — Preceding unsigned comment added by 2601:644:400:ABDA:F049:8882:98B1:550E (talk) 06:26, 20 April 2017 (UTC)[reply]