Talk:Binomial test

From Wikipedia, the free encyclopedia

Critical significance level[edit]

Dear author, Are you sure we cannot reject the null hypothesis at the 5% significance level ? Maybe at 1% since p-value is .027?

It's best not to address questions using 'Dear author' - articles are written collaboratively. Richard001 08:29, 23 May 2007 (UTC)[reply]

Working at critical significance level 0.05, we may not reject the null hypothesis in this case. The null hypothesis was that the die is fair (two-tailed), not that the die is loaded towards 6 (a one-tailed statement), but to calculate the binomial test statistic we run a one-tailed test. The result (P = 0.027) must then be multiplied by 2 before comparing with the critical significance level, 0.05. Thus, 2(0.027) = 0.054 > 0.05, so we cannot reject the null hypothesis.

Sorry - when the two-sided statistic is computed in R, the p-value is less than twice the one-tailed value. Hence binom.test(51,235,(1/6),alternative="two.sided") returns p-value = 0.04375, and binom.test(51,235,(1/6),alternative="greater") returns p-value = 0.02654. —Preceding unsigned comment added by Achristoffersen (talkcontribs) 09:43, 19 December 2008 (UTC)[reply]

Integrated explanation of how the two-tailed test works and why it is not the same as doubling the one-tailed test. Page now agrees with R :] —Preceding unsigned comment added by 192.150.186.97 (talk) 23:09, 27 May 2009 (UTC)[reply]
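For readers wanting to check the two numbers above outside R, here is a minimal stdlib-only Python sketch (the function and variable names are mine). It uses the same rule R's binom.test documents for the two-sided case: sum the probabilities of every outcome no more likely than the one observed.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, k_obs = 235, 1/6, 51

# One-tailed ("greater") p-value: P(X >= 51)
one_tailed = sum(binom_pmf(k, n, p) for k in range(k_obs, n + 1))

# Two-tailed p-value the way R's binom.test computes it: sum the
# probabilities of all outcomes no more likely than the observed one
cutoff = binom_pmf(k_obs, n, p)
two_tailed = sum(binom_pmf(k, n, p) for k in range(n + 1)
                 if binom_pmf(k, n, p) <= cutoff)

print(one_tailed)   # ≈ 0.02654, as R reports
print(two_tailed)   # ≈ 0.04375, less than twice the one-tailed value
```

Because the distribution is skewed (p = 1/6, far from 0.5), the two-sided value comes out below twice the one-sided one, which is exactly the discrepancy discussed above.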

Equal-effect size[edit]

I see the logic of the two-tailed test of "equal effect size", but I am not sure it is right. At least it seems arguable to me. The effect size can be seen in probabilistic terms or it could be seen in absolute terms. It really is a question of how you frame the question. Is it: What is the probability of getting Xroll(6) +/- Yroll(6)? Or is it: What is the probability of getting a value so far down the tail of the distribution? —Preceding unsigned comment added by 85.6.223.142 (talk) 14:33, 16 October 2010 (UTC)[reply]

It's important to point out that good statistics programs, like R, do not calculate the two-tailed result using an "equal effect size" based on pure distance from the mean. That's a heuristic used for back-of-the-envelope calculations. Unfortunately, that heuristic has also been picked up by some less-than-R statistics packages. The correct way to calculate the two-tailed binomial test is to (assuming the outcome x is less than the mean, for this example) sum Pr(X ≤ x) + Pr(X ≥ y), where the "special" y is the lowest outcome with Pr(X = y) ≤ Pr(X = x). That is, R cuts a horizontal line across the distribution at a height equal to the probability of the observed outcome. It then sums the area underneath that horizontal line (thus capturing both the left and right tails). If you simply double the one-sided result or use an absolute effect size, your two-sided result will be more and more incorrect as your distribution becomes more and more skewed (i.e., as the expected probability strays farther away from 0.5). (Side note: to see how R calculates the two-sided result, type "binom.test" alone (without quotes) in R.) —TedPavlic (talk/contrib/@) 22:32, 19 February 2012 (UTC)[reply]
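To illustrate the point about skew, here is a small Python sketch (my own toy numbers, n = 20 and x = 14; the function names are mine) comparing the minimum-likelihood rule described above with the doubling heuristic. The two agree when p = 0.5 and drift apart as p moves away from it:

```python
from math import comb

def pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_min_likelihood(k_obs, n, p):
    """Sum P(X = k) over all k no more likely than the observed outcome
    (a small relative tolerance absorbs floating-point ties, as R does)."""
    cutoff = pmf(k_obs, n, p) * (1 + 1e-7)
    return sum(pmf(k, n, p) for k in range(n + 1) if pmf(k, n, p) <= cutoff)

def doubled_one_sided(k_obs, n, p):
    """Heuristic: twice the smaller tail probability, capped at 1."""
    lower = sum(pmf(k, n, p) for k in range(0, k_obs + 1))
    upper = sum(pmf(k, n, p) for k in range(k_obs, n + 1))
    return min(1.0, 2 * min(lower, upper))

n, k_obs = 20, 14
results = {}
for p in (0.5, 0.3, 0.1):
    results[p] = (two_sided_min_likelihood(k_obs, n, p),
                  doubled_one_sided(k_obs, n, p))
    print(p, results[p])  # values agree at p = 0.5, diverge as p shrinks
```

At p = 0.5 the distribution is symmetric, so cutting a horizontal line captures exactly the mirror image of the observed tail and the two methods coincide; at p = 0.1 the opposite tail contributes almost nothing at the cutoff height, so doubling roughly doubles the true two-sided value.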

An equal "effect size" would suggest a symmetrical distribution, which this is not. I have deleted the relevant paragraph. — Preceding unsigned comment added by 2.25.223.112 (talk) 00:19, 8 May 2012 (UTC)[reply]

In statistical software packages[edit]

The text in this section that describes how to carry out the binomial test in Matlab is incorrect. Specifically, this statement is wrong:

  • In MATLAB, use binofit:
    • [phat,pci]=binofit(51, 235,0.05) (generally two-tailed, one-tailed for the extreme cases "0 out of n" and "n out of n"). You will get back the probability for the dice to roll a six (phat) as well as the confidence interval (pci) for the confidence level of 95% = (1-0.05), respectively a significance of 5%.

In fact, binofit only returns a maximum likelihood estimate of the bias of the coin, and does not perform the hypothesis test. As far as I know, there is no built-in binomial test function in Matlab. I suggest that we delete the binofit line from the article. Paresnah (talk) 22:54, 26 July 2016 (UTC)[reply]

Done. Paresnah (talk) 22:01, 27 July 2016 (UTC)[reply]

Very unclear text in section "Example Binomial test"[edit]

First we have the sentence "One method is to sum the probability that the total deviation in numbers of events in either direction from the expected value is either more than or less than the expected value." I have no clue what that sentence could mean. One should clearly spell out the computation. Mathematical formalism is much clearer than such vague language.

Then we have the sentence "The second method involves computing the probability that the deviation from the expected value is as unlikely or more unlikely than the observed value, i.e. from a comparison of the probability density functions." Again I have no clue what that could mean. One should again spell out exactly what that means, by stating clearly how to compute this value (as it is done in the second paragraph of this section).

It should also be made clear how the computation can be reproduced in R. The "one-tailed test" for R in the section "In statistical software packages" appears twice, while only the second one ("greater") corresponds to the one-tailed test as applied in the Example section (apparently using the "first method", while the "second method" is apparently not provided in R?). If the reader doesn't already know exactly what this page is about, the explanations won't help, nor are there calculations to help. The R help doesn't help either, since it does not say precisely what is calculated, but just refers to the literature instead.

Then there is the problem with statements like "In this case, the probability of getting 51 or more 6s on a fair die is 0.02654." The value here is rounded, and in my opinion one should say so (as stated, the claim is simply not true). I believe that providing the exact values in a footnote would be helpful. Oliver Kullmann (talk) 01:43, 22 May 2019 (UTC)[reply]
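On the rounding point: the value behind 0.02654 can even be computed exactly, as a rational number, with nothing beyond Python's standard library (a sketch; the variable names are mine):

```python
from fractions import Fraction
from math import comb

n, k_obs = 235, 51
p = Fraction(1, 6)  # fair die: probability of rolling a six

# Exact rational P(X >= 51); no floating point is involved,
# so the result is the true value, not an approximation
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(k_obs, n + 1))

print(float(exact))  # ≈ 0.02654 once rounded to four significant figures
```

The exact value is an integer divided by 6^235, which is unwieldy to print in full but could back a footnote giving more digits than the rounded figure in the article.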

Definition of the p-value[edit]

A p-value has to have the uniform distribution under the null hypothesis. The reason is that you can then use it to reject the null with probability exactly equal to alpha whenever p < alpha. This should be added to the explanation, I think.
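One caveat if this is added: exact uniformity holds only for continuous test statistics. For a discrete test like the binomial, the p-value is merely "super-uniform" under the null, meaning P(p ≤ α) ≤ α, so the test is conservative rather than exact. A small stdlib-only Python sketch (my own construction, using the die example from the article) checks this:

```python
from math import comb

def pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, alpha = 235, 1/6, 0.05

def p_value(k):
    """One-sided ("greater") p-value for an observed count k: P(X >= k)."""
    return sum(pmf(j, n, p) for j in range(k, n + 1))

# Probability, under the null, that the test produces p <= alpha:
# sum the null probabilities of all counts whose p-value clears alpha
reject_prob = sum(pmf(k, n, p) for k in range(n + 1) if p_value(k) <= alpha)

print(reject_prob)  # at most alpha: the discrete p-value is conservative
```

Because the attainable p-values form a discrete set, the rejection probability lands on the largest attainable tail probability not exceeding α, rather than on α itself.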