Talk:Bayesian average

Statistics Mid‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mid	This article has been rated as Mid-importance on the importance scale.

Mathematics Low‑priority

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-priority on the project's priority scale.

In its present state the section titled "calculation" seems to consist of things that might make sense ONLY if the prior distribution and the conditional distribution of the observations given the parameter are both normal. Michael Hardy (talk) 03:46, 18 July 2008 (UTC)

Messy

A Bayesian average is a method of calculating the mean of a data set, where there is a known prior probability of the value being estimated.

What does that mean? I have a Ph.D. in statistics and I'm good at deciphering opaque writing, and here I need to guess. Where it says "prior probability of the value being estimated", does it mean "prior probability distribution of the value being estimated"? "Values being estimated" are not things to which one assigns probabilities! The "mean of a data set"??? Really? Why should one need a prior distribution if the thing whose mean one wants is a data set? And what does calculating the "mean of a data set" have to do with "estimating" something?? My guess is that someone wants to estimate a population mean (not the "mean of a data set") and that estimate is to be based on a data set.

This is a badly written article in its present form. Michael Hardy (talk) 05:59, 30 June 2009 (UTC)

Hello Michael Hardy. I agree with you, even though the article has been revised in the 13 years since you wrote your comment. Here is a two-paragraph example of a web browser safety service that claims to use "Bayesian averages" to determine website reputation: Biased Average by MyWot (which doesn't have a great reputation itself; see the archived talk page if curious). I will have a look at the article, although I don't know if I can help much. -- FeralOink (talk) 13:15, 2 February 2023 (UTC)

What is a "height" of an occupation? Michael Hardy (talk) 06:02, 30 June 2009 (UTC)

The example ends without saying what is done with the data! Michael Hardy (talk) 06:04, 30 June 2009 (UTC)

Perhaps this needs to be related to pseudocount, and broadened. Bayesian estimates made using conjugate priors quite often resemble, in form, the addition of fictitious data.

As for usage of the term, I believe that IMDB says it applies a "Bayesian mean" to its user ratings, essentially meaning the formula on this page.

IMO, if it is going to call the method Bayesian, the article needs to be much more explicit as to how the adjustment can arise in a properly Bayesian setting; and to identify that, even if it is true that this calculation may sometimes be called "the Bayesian mean" (citation needed), nevertheless it is only actually the Bayesian estimate of the (population) mean if particular modelling choices have been made. Jheald (talk) 09:44, 30 June 2009 (UTC)

The context here is related to that of a Shrinkage estimator, and it would probably also be possible to present it as a type of Empirical Bayes estimate. However, the basis of the estimator need not be Bayesian in any formal sense, as such estimators can be derived from an MVUE approach ... and thus no distributional assumptions are needed. A simple linear model involving group means could be set up and the theory worked out, which would yield an optimal estimate for a group mean that weights the observed mean for the chosen group together with the overall mean, assuming the relevant variances are known. But the main questions are ... should this be called a Bayesian average (or who calls it a Bayesian average), and is it important enough for a separate article? Perhaps something could be added to Shrinkage estimator. Melcombe (talk) 10:26, 30 June 2009 (UTC)

There's nothing particularly Bayesian about this article. I'd delete it; the article is confusing and badly motivated. Bill Jefferys (talk) 20:12, 10 July 2009 (UTC)

In a bid to prevent this article from being deleted, I have entirely rewritten the introduction based on my own understanding of the subject. The language is entirely lay (and there are no citations), but I thought I should start by making the article understandable at least; then we can improve from there. I really don't know what to do with the sections, though. They're in pretty bad shape. --Mizst (talk) 14:03, 22 July 2009 (UTC)

To address the notability issue: the Bayesian average is used mostly on review sites, the most popular of which (that I've seen) is probably IMDB, as mentioned earlier. I have a few more examples: www.thebroth.com, www.mangaupdates.com, and www.boardgamegeek.com. These sites pad out the reviews with arbitrary scores until a certain number of reviews is reached, in order to prevent a lopsided computed average caused by the small number of initial reviews, and they call this method the Bayesian Average. If one considers that the probabilist is imposing his prior experience/belief (of scores), which lies outside the data at hand (the actual reviews), on the representative statistic (the average), then it could be considered Bayesian. --Mizst (talk) 17:08, 22 July 2009 (UTC)
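The padding scheme described above can be sketched in Python. This is only a minimal illustration using a fixed number of fictitious reviews, as in the formula discussed on this page; it is not confirmed to be what any of the named sites actually run. The function name `bayesian_average` and the parameters `prior_mean` (the score given to each fictitious review) and `prior_weight` (how many fictitious reviews are added) are hypothetical names chosen here.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Average of `ratings` padded with `prior_weight` fictitious ratings
    that each equal `prior_mean`.

    Both prior values are assumptions chosen by whoever runs the site;
    they are not derived from the item's own reviews.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# A single enthusiastic review barely moves the score away from the prior...
few = bayesian_average([10], prior_mean=6.0, prior_weight=5)
# ...while a large number of reviews dominates it.
many = bayesian_average([10] * 95, prior_mean=6.0, prior_weight=5)
print(round(few, 3), round(many, 3))
```

With no reviews at all the function simply returns `prior_mean`, which is the behaviour the padding is meant to provide for new items.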

What a mess....

This article still begins as follows:

A Bayesian average is a method of calculating the mean of a data set[...]

That is nonsense. Obviously this is an attempt to ESTIMATE a mean of a POPULATION by using a DATA SET. It is NOT an attempt to calculate the mean of the DATA SET. Michael Hardy (talk) 15:41, 22 July 2009 (UTC)

Thanks for pointing that out. I actually added the later paragraphs before modifying the existing top, so that got lost on me. I also have a tendency to keep whatever's there, a Wikipedian habit which is hard to shake. Btw, you can also fix any errors you spot yourself, which is encouraged. You're probably more qualified than me, as you said you have a Ph.D. in statistics. --Mizst (talk) 16:12, 22 July 2009 (UTC)

Factual Accuracy

Let's clean up this article step by step until it is a quality article. Michael, would you kindly start by stating which facts in the article are currently disputed (as it was you who put the {accuracy} tag there)? This will enable us or other people to start cleaning them up. --Mizst (talk) 17:14, 22 July 2009 (UTC)

I've removed the accuracy tag. I've had to spend some effort at guessing what this was trying to say. There's a question of what is Bayesian about this. Bayesianism is about probability as degree-of-belief in propositions that are uncertain. This would coincide with posterior expected value if both the prior and the data were normally distributed, so in those circumstances it could be considered Bayesian. But the article doesn't say that. There is also a question of whether this sort of shrinkage estimator should be considered desirable independently of that sort of consideration, and then only afterwards one should address the question of probability distributions. But I'm not sure how one would argue for such a thing. Michael Hardy (talk) 19:05, 22 July 2009 (UTC)
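For reference, the normal case mentioned above can be written out. This is the standard conjugate normal-normal result with a known sampling variance; the symbols $m$, $\tau^2$, $\sigma^2$ and $C$ are notation introduced here for illustration and are not taken from the article.

```latex
\mu \sim N(m, \tau^2), \qquad x_i \mid \mu \sim N(\mu, \sigma^2) \quad (i = 1, \dots, n)

E[\mu \mid x_1, \dots, x_n]
  = \frac{\dfrac{m}{\tau^2} + \dfrac{\sum_{i=1}^{n} x_i}{\sigma^2}}
         {\dfrac{1}{\tau^2} + \dfrac{n}{\sigma^2}}
  = \frac{C\,m + \sum_{i=1}^{n} x_i}{C + n},
\qquad C = \frac{\sigma^2}{\tau^2}
```

Under these assumptions the constant $C$ plays the role of a number of fictitious observations located at the prior mean $m$. Under other modelling choices the posterior mean need not take this form.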

Hmm ... I think I may have confused Bayes' Theorem with Bayesian Interpretation when I rewrote the sentence in the article. Actually the way "Bayesian Average" is employed is specifically the subjectivist view of Bayesian Interpretation. In a way, the person computing the statistic believes that the arithmetic mean does not represent the population, so he adds other information into that mean to get closer to what he believes the population is, which doesn't have to be a normal distribution. Since it is subjective, whether it is desirable depends on how much you agree with him. --Mizst (talk) 19:56, 22 July 2009 (UTC)

The real reasons for the overly simplistic model are probably just simplicity, essentially zero-cost calculation, and not giving conspicuously bogus numbers (as opposed to the simple mean). It reminds me of the use of the naive Bayes classifier in Bayesian spam filtering - I think the reasons for the choice are essentially the same, and that the people doing these things are similar (programmers who want a quick 80% solution, not statisticians). For people who really care about accuracy there are plenty of more serious approaches, see e.g. the Netflix prize.

That said, what this article really needs is reliable sources. I hadn't heard of this article's topic before either, and am not sure if it's really notable. -- Coffee2theorems (talk) 01:13, 26 July 2009 (UTC)

Example has incorrect calculations

The table with the data for basketball players, students, and the actor, has incorrect Bayesian Averages.

The calculations result in this:

Basketball Players

= ((average amount of data per set * average height) + (amount of data per set * average height per set)) / (average amount of data per set + amount of data per set)

= ((8.666666667 * 190.333333333333) + (15 * 191)) / (8.666666667 + 15)

= 190.7558685

Students

= ((average amount of data per set * average height) + (amount of data per set * average height per set)) / (average amount of data per set + amount of data per set)

= ((8.666666667 * 190.333333333333) + (10 * 179)) / (8.666666667 + 10)

= 184.2619048

Actors

= ((average amount of data per set * average height) + (amount of data per set * average height per set)) / (average amount of data per set + amount of data per set)

= ((8.666666667 * 190.333333333333) + (1 * 201)) / (8.666666667 + 1)

= 191.4367816

Scarborough Res (talk) 06:11, 15 December 2011 (UTC)

The text above the table makes it clear that the average height of the population is 176 cm, for which the values in the table are approximately correct. --Brilliand (talk) 06:20, 17 January 2012 (UTC)

--Polzme (talk) 11:48, 9 October 2014 (UTC)

(15*191 + 10*179 + 1*201)/(15+10+1) ≈ 186.77, not 190.33.
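The figures in this thread can be checked with a short Python sketch. It uses only the set sizes and set means quoted in the calculations above; the function name `bayesian_average` and the variable names are hypothetical. It takes the prior weight to be the average number of data points per set, as in those calculations, and computes the prior mean both ways that have been proposed here (unweighted mean of the set means, and mean weighted by set size). A different prior mean, such as the 176 cm mentioned in the reply, could be passed in the same way.

```python
# Set sizes and mean heights (cm) as quoted in the calculations above.
groups = {
    "Basketball players": (15, 191.0),
    "Students": (10, 179.0),
    "Actors": (1, 201.0),
}

sizes = [n for n, _ in groups.values()]
means = [h for _, h in groups.values()]

# Prior weight C: the average number of data points per set.
C = sum(sizes) / len(sizes)

# Two candidate prior means: the unweighted mean of the three set means,
# and the mean weighted by set size (the mean over all 26 people).
unweighted = sum(means) / len(means)
weighted = sum(n * h for n, h in groups.values()) / sum(sizes)


def bayesian_average(n, set_mean, prior_mean, prior_weight):
    """(C * m + n * set_mean) / (C + n), the formula used in this thread."""
    return (prior_weight * prior_mean + n * set_mean) / (prior_weight + n)


print(f"C = {C:.4f}, unweighted = {unweighted:.4f}, weighted = {weighted:.4f}")
for name, (n, h) in groups.items():
    print(name, round(bayesian_average(n, h, unweighted, C), 4))
```

The gap between the two candidate prior means comes entirely from the single actor, who counts for a third of the unweighted mean but only 1/26 of the weighted one.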

Citations?

As far as I can tell, this technique is being used in place of collaborative filtering, which typically requires building profiles of user ratings before a recommendation can be made. Given the lack of user profiles, it looks similar to techniques used in reputation systems. I was able to find a paper on computing expected ratings (in a similar way) for the multinomial Dirichlet case here. Benjaminbishop (talk) 20:28, 8 December 2009 (UTC)
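For comparison with the padding formula, here is a rough Python sketch of an expected rating under a Dirichlet prior. It is the textbook Dirichlet-multinomial posterior mean and is not necessarily the method of the paper linked above; the function name `expected_rating` and the example pseudocounts are hypothetical.

```python
def expected_rating(counts, pseudocounts):
    """Posterior expected star rating under a Dirichlet prior.

    counts[k] is the number of observed ratings of k+1 stars, and
    pseudocounts[k] is the Dirichlet parameter for that star level
    (an assumption chosen by the modeller, acting as fictitious ratings).
    """
    if len(counts) != len(pseudocounts):
        raise ValueError("counts and pseudocounts must have the same length")
    total = sum(counts) + sum(pseudocounts)
    return sum(
        (star + 1) * (c + a) / total
        for star, (c, a) in enumerate(zip(counts, pseudocounts))
    )


# A single 5-star rating against one fictitious rating at each star level.
print(expected_rating([0, 0, 0, 0, 1], [1, 1, 1, 1, 1]))
```

If the pseudocounts sum to C and have mean rating m, this expression reduces to (C * m + sum of observed ratings) / (C + n), which is the same shape as the formula discussed on this page.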

Missing information

Even in its limited form, this article is incomplete. It would be useful to have all terms of the equation clearly defined. --Japarthur (talk) 08:19, 28 April 2017 (UTC)

Potential?

I've thought for a while that this article has potential, if it said more. I've occasionally thought of trying to see if I could do something with it. I see that someone's proposed a merger. The subject of the Additive smoothing article seems to be quite similar. Michael Hardy (talk) 02:08, 5 September 2018 (UTC)