Talk:False discovery rate


Domain of applicability[edit]

Many practitioners are not even aware that controlling the FDR = E[V/R] means little or nothing when the distribution of V/R has non-negligible dispersion. The asymptotic variance of V/R depends on the nominal FDR, the prior probability that a statistic comes from the null distribution, and the expected rejection fraction, while the standard error of V/R shrinks as m, the number of simultaneous tests, grows. So an appropriate combination of large enough m and high enough expected rejection fraction is the situation in which the FDR adequately summarizes V/R. Otherwise one is better off using a procedure which directly controls the tail probability of V/R (the false discovery exceedance), such as Romano's method, or a method based upon asymptotic approximation.
*Lehmann, E. L.; Romano, Joseph P. (2005), "Generalizations of the familywise error rate", Annals of Statistics, vol. 33, pp. 1138–1154, doi:10.1214/009053605000000084, ISSN 0090-5364
*Izmirlian, Grant (2020), "Strong consistency and asymptotic normality for quantities related to the Benjamini–Hochberg false discovery rate procedure", Statistics & Probability Letters, vol. 160, doi:10.1016/j.spl.2020.108713
Izmirlig (talk)
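(A minimal Monte Carlo sketch in Python of the dispersion point above; this is an editor's illustration, not taken from the cited papers, and the mixture proportion, effect size, number of tests, and nominal level are arbitrary choices.)

```python
import numpy as np
from scipy.stats import norm

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule: boolean mask of rejected hypotheses."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
    k = below[-1] + 1 if below.size else 0      # number of rejections
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

rng = np.random.default_rng(0)
m, pi0, alpha, reps = 100, 0.8, 0.1, 2000       # arbitrary illustrative settings
fdp = np.zeros(reps)                            # realized V/R per replicate (0 when R = 0)
for i in range(reps):
    is_null = rng.random(m) < pi0               # which hypotheses are truly null
    z = rng.normal(loc=np.where(is_null, 0.0, 3.0))
    p = norm.sf(z)                              # one-sided p-values
    rej = bh_reject(p, alpha)
    R, V = rej.sum(), (rej & is_null).sum()     # rejections and false discoveries
    fdp[i] = V / R if R > 0 else 0.0

# The mean of V/R is controlled (near pi0 * alpha), but single realizations of
# V/R can sit far from that mean when m or the rejection fraction is small.
print("E[V/R] estimate:", fdp.mean(), " SD of V/R:", fdp.std())
```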

Classification of multiple hypothesis tests[edit]

The section "Classification of multiple hypothesis tests" refers to a main article which is the exact copy of it. I think the main article should be more comprehensive. Otherwise there's not a point in mentioning main article. Blue down quark (talk) 19:14, 27 June 2019 (UTC)[reply]

Methods[edit]

Should probably include reference to at least the Bonferroni correction, Fisher's LSD (as a possible exposition of the problem) and the Tukey method. While these methods are not as advanced as those listed, they do form a good basis for the area. HyDeckar 14:07, 17 March 2007 (UTC)[reply]

To me, these seem more relevant for the multiple comparison article than here. Tal Galili (talk) 10:44, 17 February 2013 (UTC)[reply]

Algorithms[edit]

This article needs sections covering the different FDR methods such as step-up, step-down control, and something about adaptive versus non-adaptive control --Zven 02:10, 21 September 2006 (UTC)[reply]

The statement “proportion of incorrectly rejected type I errors,” i.e. “proportion of incorrectly rejected false positives”, does not make much sense. Should it read, perhaps, “proportion of incorrectly rejected null hypotheses in a list of rejected hypotheses”? Why not use the definition by Benjamini and Hochberg: “expected proportion of errors among the rejected hypotheses”? Or, in other words: the expected ratio of the false positives to the number of rejected null hypotheses. Ref.: Benjamini and Hochberg (1995), J. R. Statist. Soc. B 57, pp. 289–300. Jarda Novak 16:23, 1 November 2006 (UTC)

For the purposes of clarity, something like your statement, “proportion of incorrectly rejected null hypotheses (type 1 errors) in a list of rejected hypotheses” would be an improvement --Zven 23:08, 2 November 2006 (UTC)[reply]

The variables in the table under 'classification of m hypothesis tests' seem to have been incorrectly defined. According to Benjamini & Hochberg (1995) the proportion of errors committed by falsely rejecting the null is Q=V/(V+S), thus V is the number of nulls falsely rejected (false positives) and V+S is the total number of nulls rejected. Therefore S is the number of false nulls rejected i.e. true negatives. However, in the table variable S has been defined as a true positive and conversely U has been defined as a true negative, when in fact it's a true positive.

I didn't want to edit the page without checking first. Does anyone have any thoughts on this? Have I totally lost my mind or am I right? Natashia muna 14:05, 25 October 2007 (UTC)[reply]
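(An editor's illustration of how the four counts in the standard classification table are usually tallied; the "ground truth" and "decisions" below are arbitrary random placeholders, used only to show the bookkeeping. Rejecting a false null is counted as a true positive, S, consistent with Benjamini & Hochberg's Q = V/(V+S).)

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
is_true_null = rng.random(m) < 0.5      # ground truth (unknown in practice)
rejected = rng.random(m) < 0.3          # placeholder decisions, not a real test procedure

V = np.sum(rejected & is_true_null)      # true nulls rejected      -> false positives
S = np.sum(rejected & ~is_true_null)     # false nulls rejected     -> true positives
U = np.sum(~rejected & is_true_null)     # true nulls not rejected  -> true negatives
T = np.sum(~rejected & ~is_true_null)    # false nulls not rejected -> false negatives
R = V + S                                # total number of rejections

print(f"U={U} V={V} T={T} S={S}  R={R}  FDP = V/R = {V / R if R else 0:.2f}")
```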

Copyrighted material[edit]

The pdf file pointed to by reference [1] (Benjamini and Hochberg) is copyrighted material taken from the jstor.org archive and posted contrary to jstor.org's explicitly stated policies. The pointer to the pdf file should be removed; however the citation can stay. Bill Jefferys 21:49, 22 December 2006 (UTC)[reply]

The link in question was on Yoav Benjamini's (one of the primary authors) homepage, so he is at fault for not adhering to any copyright on jstor. This is an interesting issue as he is the person in breach of copyright. Is creating a link to his breach also in breach, or just further incriminating the author in question? Anyone who wants to can always use a search engine for 'Benjamini & Hochberg' anyway, since it is in Google and others.

Hmm, interesting legal quandary here; as noted above, this apparent violation of the publisher's copyright is on the homepage of its author. In most cases, actually, the manuscript belongs to the author(s) until the author(s) transfer copyright to the journal. Therefore, depending on what documents were signed by their respective authors, it is entirely possible that one paper could legally be posted on the homepage of its author while another from the same journal would be in violation. We do not know what exact legalese Benjamini and Hochberg signed and therefore cannot determine whether Benjamini is violating that legalese by posting a copy of his paper on his website. In any case, I don't really see how Wikipedia is in breach of the law either way, because the protected content isn't uploaded; all the article does is point readers at Benjamini's page, which could readily be found by many other means anyway. 71.235.75.86 (talk) 00:09, 11 February 2008 (UTC)[reply]

It is quite common for journals to allow authors to post their papers on personal websites, especially after a certain amount of time has elapsed. As mentioned above, it's absurd to make accusations about violating the terms of a contract that we haven't seen. It also seems very unlikely that Benjamini would blatantly violate his agreement with the journal. That said, perhaps the link should direct to the official version of record, leaving those interested in a free version to find one for themselves. — Preceding unsigned comment added by 23.242.207.48 (talk) 00:47, 3 June 2015 (UTC)[reply]

Precision needed[edit]

I noticed in the Dependent tests part that c(m)=1 for positively correlated tests, i.e. the same value as for independent tests. As someone unfamiliar with the issue, this seems surprising to me. Thus I think it should be explicitly written that the value of c(m) for positively correlated tests is the same as for independent tests. At the moment it looks like an error in the article. —The preceding unsigned comment was added by 129.199.1.20 (talk) 09:31, 15 February 2007 (UTC).[reply]
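(A small sketch, added for illustration: c(m) = 1 under independence or positive dependence, while the Benjamini–Yekutieli correction for arbitrary dependence uses the harmonic sum, which is considerably more conservative.)

```python
import numpy as np

def c(m, dependence="independent"):
    """Scaling factor in the step-up thresholds alpha * i / (m * c(m)).

    c(m) = 1 for independent or positively correlated (PRDS) test statistics;
    c(m) = sum_{i=1}^{m} 1/i for arbitrary dependence (Benjamini-Yekutieli).
    """
    if dependence in ("independent", "positive"):
        return 1.0
    return float(np.sum(1.0 / np.arange(1, m + 1)))

print(c(100, "positive"))    # 1.0 -- same thresholds as the independent case
print(c(100, "arbitrary"))   # about 5.19 -- a much more conservative correction
```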

Title does not match content[edit]

The title of the article is "False discovery rate", which is a rate, not a statistical method. False discovery rate control is a statistical method. The content should be moved to a new article, called "False discovery rate control" or something appropriate. The current article, "False discovery rate", should be re-written about the rate of false discoveries. -Pgan002 03:51, 10 May 2007 (UTC)[reply]

False discovery rate (FDR) is an accepted name for this technique in the statistics literature. --Zvika 07:39, 30 August 2007 (UTC)[reply]
I agree with Zvika. Tal Galili (talk) 20:16, 15 February 2013 (UTC)[reply]

There was a time when there was only one known procedure for controlling the FDR and it might have been reasonable to speak of "the FDR procedure." That time has long since passed. It is not appropriate to use the term "false discovery rate" to refer to a technique for controlling the false discovery rate, even if some researchers have done so in the past. No statistician would make that mistake these days. There are error rates and there are procedures for controlling those rates. The two concepts should not be confused. — Preceding unsigned comment added by 108.184.177.134 (talk) 07:35, 29 May 2015 (UTC)[reply]

I agree as well. Consider for example the Benjamini–Hochberg FDR procedure. I actually came here looking for a term to refer generally to all procedures which control a multiple-testing generalization of the type I error. Izmirlig (talk) 22:05, 17 April 2020 (UTC)[reply]

FDR is not expected FPR[edit]

Changed text to state that FDR is the expected proportion of false positives among all significant hypotheses. Previously it stated that it is the expected FPR, which is quite wrong. The false positive rate (FPR) is not a Bayesian measure, whereas the FDR is (i.e. it incorporates the prior probabilities of the hypotheses). — Preceding unsigned comment added by Brianbjparker (talkcontribs) 15:06, 9 December 2010 (UTC)[reply]
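(A small numeric illustration of the distinction, using made-up counts: the false positive rate divides V by the number of true nulls, while the false discovery proportion divides V by the number of rejections; the FDR is the expectation of the latter.)

```python
# Hypothetical counts from m = 1000 tests (illustrative numbers only):
V, S = 5, 45        # V false positives and S true positives among R = 50 rejections
U, T = 895, 55      # U true negatives and T false negatives among the non-rejections

FPR = V / (V + U)   # false positive rate: errors among the 900 true nulls
FDP = V / (V + S)   # false discovery proportion: errors among the 50 discoveries
print(f"FPR = {FPR:.4f}, FDP = {FDP:.2f}")   # 0.0056 vs 0.10 -- very different quantities
```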

Lead[edit]

Could somebody with knowledge of this subject add something to the lead that explains how FDR is a basic statistical measure (FP/(TP+FP)) as well as a method to control for multiple hypothesis testing? –CWenger (^@) 06:31, 26 July 2011 (UTC)[reply]

Definition of the FDR and some other points[edit]

The FDR is defined as the expectation of the ratio V/R, and it is said that one might want to keep this expectation under a threshold value α: E(V/R) ≤ α. However, this inequality makes sense only if we know with respect to which probability measure the expectation on the LHS is taken. The problem is that we have many null hypotheses and hence many ways to choose this probability measure. For the FDR to be a relevant indicator to control in hypothesis testing, I guess the expectation should be evaluated for probability distributions that satisfy the nulls which are assumed to be true. This ambiguity about the expectation should be removed from the definition of the FDR.

Next, V (for instance) is defined as the number of type one errors (i.e. the number of true nulls that are rejected). It is said it is an unobserved random variable. Everyone would agree it is unobserved, as we don't know the number of true nulls (and hence which ones are true). Now consider the "event" {V=k} (exactly k true nulls are rejected). I claim this is not an event in the sense that it cannot be written as the inverse image of a measurable function of the data. To tell whether {V=k} has been realized, we need information about the probability distribution of the data (namely whether this probability distribution satisfies the rejected nulls or not). If this is so, V is not a random variable for the random experiment under consideration. --194.57.219.138 (talk) 08:20, 17 February 2012 (UTC)[reply]

Do you have a reference for a paper discussing these issues? Tal Galili (talk) 20:15, 15 February 2013 (UTC)[reply]

Implicit in the definition of any multiple testing procedure is a model. The most appropriate is one whose marginals are equal in distribution to a mixture of the null and a common alternative. In this case the FDR can be expressed in terms of the nominal FDR and the prior probability that a statistic has the null distribution. Izmirlig (talk) 22:09, 17 April 2020 (UTC)[reply]
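(An editor's sketch of the two-group mixture model described above; the null proportion, mean shift, and cutoff are arbitrary choices for illustration only.)

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, pi0 = 10_000, 0.9
is_null = rng.random(m) < pi0                    # null with prior probability pi0
z = rng.normal(loc=np.where(is_null, 0.0, 3.0))  # common alternative: mean shifted by 3
p = norm.sf(z)                                   # marginally, p is a mixture of Uniform(0,1)
                                                 # (null) and a stochastically smaller component

# Expectations such as E[V/R] are taken with respect to this joint model.
rejected = p < 0.001                             # an arbitrary fixed cutoff, for illustration
V, R = np.sum(rejected & is_null), np.sum(rejected)
print(f"realized FDP at this cutoff: {V / max(R, 1):.3f}")
```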

Moving parts of "Classification of m hypothesis tests"[edit]

I just want to state that this section's content may be useful in other articles, but it should not (IMHO) be removed from the current article, since it is so basic to the way FDR is defined/explained. Cheers, Tal Galili (talk) 20:19, 15 February 2013 (UTC)[reply]

Adding "related statistics" section[edit]

My colleague noted to me that the following two quantities (as defined in the article):

  • the Fdr
  • the local fdr

are not error rates, but in fact statistics which control some other error rates.

A contribution by future editors may be to check this claim carefully and, if it holds, to see which error rates they relate to.

Also, it is worth checking how the different error rates relate to one another.

Another note from my colleague: recall that Fdr = mFDR (marginal FDR); see Genovese & Wasserman.

Tal Galili (talk) 18:13, 8 March 2013 (UTC)[reply]

RFDR - removed (until proper citation is added)[edit]

The following section was removed from the article since it does not include proper citation:

Note that the mean α for these m tests is α(m+1)/(2m), which could be used as a rough FDR, or RFDR, "α adjusted for m independent (or positively correlated, see below) tests". The RFDR calculation shown here provides a useful approximation and is not part of the Benjamini and Hochberg method; see AFDR below.

When asking an expert in the field about the above quote, he wrote:

I have no idea where this appeared. I do not understand in what sense it may be argued to be correct. It is also not related to the adaptive FDR

If someone can give proper citation then please restore the above paragraph to the main article. Tal Galili (talk) 14:59, 20 September 2013 (UTC)[reply]


Another paragraph, removed from the "Benjamini–Hochberg–Yekutieli procedure" section:

Using the RFDR and the second formula above, an approximate FDR, or AFDR, is the min(mean α) for m dependent tests = RFDR / c(m).

No citation is needed. The first test is at (1/m)α, the last is at (m/m)α, and the mean is simply the midpoint of that list of test criteria, i.e.,

[α(1/m) + α(m/m)]/2 = [α(1 + m)/m]/2 = α(m + 1)/(2m). -Attleboro (talk) 18:18, 11 October 2013 (UTC)[reply]


Attleboro - since the sentence says this is a "rough estimate", I would like to see it given with a citation. I will add a "citation needed" instead of removing the text.
No single value can be a "rough estimate" for FDR. Call it what it is, the mean value of alpha. Attleboro (talk) 19:37, 14 October 2013 (UTC)[reply]
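(A quick numeric check of the arithmetic above, for illustration only; as noted earlier in this section, the "RFDR" quantity itself is not part of the Benjamini–Hochberg method.)

```python
import numpy as np

alpha, m = 0.05, 20
thresholds = alpha * np.arange(1, m + 1) / m    # BH step-up thresholds alpha * i / m
rfdr_mean = thresholds.mean()                   # arithmetic mean of those thresholds
closed_form = alpha * (m + 1) / (2 * m)         # = [alpha/m + alpha] / 2
print(f"{rfdr_mean:.5f} {closed_form:.5f}")     # both 0.02625 for these values
```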

External links modified[edit]

Hello fellow Wikipedians,

I have just modified one external link on False discovery rate. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 01:07, 28 September 2017 (UTC)[reply]

Move discussion in progress[edit]

Just letting people know, there is a move discussion in progress on Talk:FDR (disambiguation) that you may be interested in participating in. Dr. Vogel (talk) 11:49, 30 June 2019 (UTC)[reply]

Storey's work[edit]

Reading this page, I wonder whether it would benefit from a paragraph on JD Storey's extensions to FDR? The case against is that Storey's work is already mentioned in the "related concepts" section, where his pFDR is equated to Benjamini and Hochberg's FDR+1, with the comment that it can't be strictly controlled; additionally, most people are probably using B&H's original FDR because it's easy to calculate and well known. The case for including Storey is that his extensions have been taken up by others, reviewed, and (perhaps irrelevantly as such things don't qualify as reliable sources) many/most academics who've written teaching-pages on FDR include Storey's approach, or at least reference it. If they think it's worth teaching, there must be some community consensus that it's a notable part of the subject. Typical reviews frequently mention both Benjamini and Hochberg, and Storey in their first paragraphs. I'm happy to write something. But I get a slight impression of a divide in approaches between B&H and Storey? I am not a dedicated statistician, just a user, so (1) I don't want to put my foot in it, and get embroiled in a big theoretical fight about what definitions of FDR can be proven to be controllable, and (2) I am liable to bias what I write towards what a practical data-analyst might wish to do (i.e. many of us don't care whether an FDR is algebraically perfect; we care about whether it practically gives us a good balance between wasting time on false positives, and missing genuine ones). So, I'd be grateful if anyone who's following this page could let me know whether I'm blundering into a catastrophe. I certainly don't want to get into an acrimonious fight about statistics. If no one comments, I'll produce some text in the next week or two. Thanks! Elemimele (talk) 21:17, 10 May 2021 (UTC)[reply]

I've now added a very short paragraph at the bottom of the literature section, referring to Storey's work. I wondered about a more extensive explanation later on, but it sat uneasily with the reference to pFDR in the 'Related Concepts' section, and I couldn't work out where to put it. I think the mention fits in the literature section, because although Benjamini and Hochberg are obviously the central figures in FDR, their work rested on some literature ideas that are already in the literature section, and their work gave rise to an extensive follow-up literature (as all good ideas do). Of all those who've followed up, Storey leaps out as by far the most notable. He's developed the concept extensively, perhaps not always in ways that B&H's followers would like (not always quite so rigorous; pFDR cannot be controlled in the same way as FDR, being undefined when there are no discoveries), but certainly with a lot of influence. As a crude measure of statistical consensus, almost everyone who teaches FDR seems to mention Storey and q-values in their teaching. I hope what I've added is both correct, and in proportion (i.e. I haven't over-played his role). Sorry if I've messed things up. Elemimele (talk) 11:40, 10 June 2021 (UTC)[reply]
Hey, sorry for responding only a month later. I saw what you've added. I think that general statements that quantify contribution should be referenced. For example, saying "The FDR has proved particularly useful because it quantifies exactly what the researcher needs to know" is a rather subjective statement. Is it what the researcher needs to know? It depends. If looking for potential candidates for gene research - sure. If it's for a drug trial on which dose should be used - less so. So it's a rather imprecise statement (which I'm about to remove).
Also, the use and estimation of m0 was discussed by BH before Storey. And saying that he contributed to its dissemination is worth explaining more. The paper you cite, "A direct approach to false discovery rates", was cited 5k times. The BH paper (Controlling the false discovery rate: a practical and powerful approach to multiple testing) was cited 76k times. So I think the phrasing appears a bit biased and requires more citation/grounding in references.
I'll change the text a bit. Please take a look to see if you agree. Tal Galili (talk) 11:45, 10 June 2021 (UTC)[reply]
I'm struggling a bit with the changes you've made to the text I suggested.
Dear Elemimele, you've raised several issues here - so let's try to decouple them. I'll add responses per thing you wrote :) Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
(1) I agree that my statement about FDR being useful is subjective, but I think your replacement of my sentence is basically the same thing, and equally subjective. I have no objection to your choice of wording; let's just keep your version and see if anyone else feels it needs citation?
(1) I don't want either your or my sentence to be subjective. They should describe what is either self-evident, or comes from a direct statement in the literature. If you want - you can copy-paste a sentence you'd like us to debate into a new section and respond to it piece by piece - and I'm sure we can reach an agreement. Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
(2) Of course B&H has been cited a lot more than the Storey reference that I inserted in literature; it's more important, which is why it's the prevailing theme of the overall article, which correctly dwells on B&H's work. But that shouldn't preclude any other author being allowed mention: the concept of the False Discovery Rate isn't 'owned' by B&H, and an on-going article has to reflect that.
My point about the citation counts wasn't that it's not worth mentioning. It was to say that the original paper doesn't "owe" its popularity to the other paper. It might have contributed, but I wouldn't emphasize a claim that paper X helped paper Y - without a clear citation/reference that says that this was the contribution of paper X to paper Y in the literature.
(2) "But that shouldn't preclude any other author being allowed mention: the concept of the False Discovery Rate isn't 'owned' by B&H..." I FULLY agree with that statement. FDR is a property. And there are methods that preserve this property. For example, Simes test is exactly the same test as BH. But it is a different "test" in the sense that it was devised (and proved) to have a different property (FWE in the weak sense, i.e.: an omnibus test). So it's worth a mention in an article on FDR, especially because it's basically the same calculation (though use slightly different in practice, as mentioned above). In light of that, I'm happy to extend as much as possible this article. It includes adding more details about the FDR property itself. Expending about different methods that preserve the property. And mentioning (from my stand point, at length), related concepts (such as pFDR and q value). It's obvious that these works are serious, and I'm happy to have them be part of this article (or have their own article, such as the q value, and have them summarized here, with a link to their main article) Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
But my struggle is that the paragraph I wrote is worded completely inappropriately for the location in which you've put it.
o.k., so it may need to be relocated to another section. Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
To be honest, the whole 'Related Concepts' section is awkward as a section because it creates an assumption that 'False Discovery Rate' means 'FDR as defined by B&H' and all other definitions are related-but-not-really-false-discovery-rates. That's not how it is. Actually, there was ground-breaking stuff in which B&H brought to birth the idea of false discovery rates, and defined FDR in a way that allowed them to develop the maths. Since then, many others have worked on false discovery rates, defining them differently according to purpose, and relating their definitions to that of B&H.
These other papers together have combined to develop the overall concept of false discovery rates. Now there are two things to decide: (1) Is this page about false discovery rates in general, or Benjamini and Hochberg's false discovery rate in particular? If the latter, we should come clean, and change its title to 'Benjamini and Hochberg's false discovery rate'.
What you're describing is a deep issue. To my knowledge, there is a term "False discovery rate", there is a paper that defined it in a specific way, and that paper became popular - which cemented the meaning of that term to a specific definition (the one preserved by BH and other methods). Following it, we've had other papers trying different definitions from the same (let's say) family of "false discovery rate" properties. So Storey doesn't say pFDR is FDR; he calls it pFDR directly, and his argument in the paper is that pFDR is the quantity of interest over FDR. That doesn't mean pFDR should be called FDR; it's still a different quantity (nor can I see that in that paper he said "from now on, let's call pFDR FDR instead").
It sounds like what you're claiming here is that in reality there is no single "FDR" definition (i.e.: B.H. definition). But in fact, there are many different definitions, under the "family" of FDR. And BH was just the first.
So in a similar way, the article on "Entropy" for example (see Entropy (information theory)), shouldn't be called Entropy, but we should change the name of that article to "Shannon's entropy". And have a more general article about "Entropy", that would include cross entropy, etc (I mean, they're all entropy, aren't they?).
To be honest, that doesn't sound reasonable to me with regards to FDR. There is a specific term someone (BH) coined. He gave a specific definition. It caught on. Others have proposed similar definitions. Each of them has its place, and should be called the same way as their authors proposed they should be called. And, AFAIK, none of them called their alternative definitions "FDR". At least not in a way that caught on (if you have a reference to show it is not true - please share). Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
(2) What do we do about the text I wrote? If no one else apart from us two comments, I shall remove it completely in a week, because it just reads wrongly in the 'Related concepts' section, making itself ripe for removal.
I think it would be great to add more "meat" around the pFDR and the q-value. I think more specific information that is based on citations would be wonderful. If you want to write such a paragraph in the talk page, and we can improve it together - that sounds good to me. Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
Incidentally, the citation-needed thing could be replaced by precisely the same citation as I used after my third sentence. All three sentences describe material in that reference, but it looks to me rather over-fussy to cite the same paper in successive sentences.
I think that a statement should be easily "proven" from the citation. If there is a paper with (say) 20 pages, and you say something that is mentioned on pages 2, 14 and 18, then I wouldn't just give the overall citation at the end of the paragraph. Instead, I suggest defining the citation once (using ref name = "storey2002"), and then re-using the same reference, but adding page numbers each time. See: Help:References_and_page_numbers. Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
Can I just add a non-encyclopaedic comment as an outsider to this area of statistics (I have no link either to B&H or Storey): I feel, somehow, that those who publish in FDR have sorted themselves into two teams: the Original-is-Best team who defend the authenticity of B&H and cannot tolerate the existence of anyone else, and the Wild Side, who don't care about whether statistics can be controlled or not, like to express themselves graphically, and are afraid of integral signs (and therefore enjoy Storey's papers). This isn't a very healthy split (it really doesn't reflect well on anyone. Why is it that statistics, of all areas of mathematics, seems to spawn feuds?), and it makes it hard to write a balanced encyclopaedic article as the field is made up of two groups, one of which won't acknowledge the existence of the other. But we should try. Can you think of a better way to word what I wrote, so it might fit in the 'related concepts' section? Any other editors got a viewpoint? Elemimele (talk) 13:26, 10 June 2021 (UTC)[reply]
My non-encyclopaedic response: I happen to know Benjamini (he was my PhD advisor). He's a wonderful human being, and I love him. On a personal/professional level, I don't know of the "two teams" you mention. They may exist, but I'm unaware of them. I think what I wrote above is not trying to exclude any academic work. I'm trying to keep our joint (encyclopedic) vocabulary "clean" and "consistent". I want terms to have a consistent meaning, when we can. This of course breaks here and there. For example, the phi coefficient (statistics) is also called the Matthews correlation coefficient. They are EXACTLY the same quantity. Should they have two articles, or should they be merged? And if merged, what should be their name? (I have my own opinion) The point being that this issue comes up here and there, and decisions need to be made.
In this specific case, I think FDR is a specific term, coined in a specific paper, in a specific way. And this is what (IMHO) this article should be about. Part of this term is that it was later expanded in many ways - by Benjamini, Yekutieli, Ruth Heller, and others from Tel Aviv University. But also by many other people and teams, from Stanford to many other universities. I think, as much as they relate to FDR, they are worth a mention in this article (or being described at length). I think specific statements about what the follow-up contribution was (e.g.: they created another definition, another formula, another procedure, etc.) can be (and are worth being) mentioned in this article (and maybe also have their own articles as well).
The thing to be sensitive about is subjective terms like "X popularized Y" - because these are qualitative statements that are worth being grounded in specific numbers. Something like "X introduced Y [ref 1], which has been widely used as well [ref 2]" is a better phrasing, since ref 1 can indicate the paper and ref 2 can link to the Google Scholar citation count (which is a clear indication of the statement; for example, 5k citations can easily be termed "widely used" by whichever standard of popularity I'm aware of).
I hope what I wrote helps. Tal Galili (talk) 12:57, 11 June 2021 (UTC)[reply]
@Talgalili: Hi, yes, it helps a lot. I need to go away and let my thoughts settle as you've given me a lot of good stuff to think about. I'm sorry I didn't check back earlier! My initial feeling is that we've made some headway in that I can understand why you'd want to keep this article's definition of FDR "clean" of alternative definitions on the grounds that they're not the same thing; i.e. if someone else cares to define a false discovery rate differently to the original, then their work should be separate on its own page (if it's notable enough). I hope that's fair? Actually the question of the definition of FDR is really interesting. In the lead, we're starting out with the intuitive definition FDR = FP / (FP + TP). In the Definitions section we say the same thing: Q = V/R, followed by FDR = E[Q]. I think this is right: it's the fundamental concept. But of course it's not mathematically tractable because it's undefined in the event that R = 0, and that's where the other definitions come in. My initial perception was that this page related to FDR under its fundamental definition of E[V/R], so I expected it to cover all significant work that grew out of that definition. Of course the first, and by far the most significant, development of this theme was B&H's, where they got round the problem of R = 0 by using a functional definition of FDR = E[V/R | R>0]·Pr(R>0); this is not as fundamental as the original definition, but it's necessary because it works, and it is also a very rigorous, satisfactory version. My feeling is that when other people have used other functional definitions such as pFDR, they are still talking about the fundamental FDR (E[V/R]), just dealing with the R = 0 problem in different ways. I certainly don't think pFDR deserves a page of its own. If it's a 'related concept', it's so closely related that it simply makes no sense without the background of FDR, and it wouldn't have happened were it not for B&H's original work. Also, while the difference matters a lot to a statistician developing the theory, it's meaningless to the end user. No one is going to calculate their false discovery rate when they have no discoveries! In fact they'll only do it when they've got an excess of discoveries and need to draw the line, i.e. in situations where Pr(R=0) is very small, and FDR is approximately equal to pFDR (which is also one of the things Storey relies on, as he then develops the idea of FDR approximately equal to E[V]/E[R]; he's doing this not as a redefinition of FDR, but as an approximation for practical purposes). So that was my thinking in putting Storey in the literature section, rather than as a related concept - I felt that he was representative of the literature that has grown up around the fundamental concept of E[V/R]. Incidentally, he's not a special case; I'm sure the Bayesian people deserve good mention too, but I don't have the expertise to handle them. Just let me add: I'm glad we're all friends! I've never doubted that Benjamini is a good bloke, and for what it's worth, I'm grateful to him and Hochberg on a near-daily basis, for giving me such a valuable tool to define a limit on how far down the rabbit-hole of ever-decreasing significance I'm prepared to plunge. I'll have a think again about how the Storey bit could be reworded if it's to be in 'related concepts', or how else to handle it. q-values only need a brief mention as they've got their own page (which I got renamed to q instead of Q, with welcome assistance from someone!).
I agree with you about phrasing and citation; I'll go and think, and come back if I come up with any useful extra text. Best wishes, and thanks! Elemimele (talk) 13:14, 15 June 2021 (UTC)[reply]
@Elemimele: Hey Elemimele,
It sounds like we've reached an alignment on the situation :)
The literature section is inside the history section. I believe that part should relate specifically to the FDR concept (as we've just discussed).
i.e.: what was its origin, in the same way that in articles about probability distributions there is a clear distinction between the history of the distribution and a "related distributions" section.
I do think that the "related concepts" section is currently very thin. I can imagine it growing to be "Followup works and related concepts" (or something like that). And use that section to discuss things as you've mentioned in your reply. This just requires a bunch of work. But I think a simple first step is just use something similar to the paragraph I've moved there, and use that to add more depth. Similar text could probably made to each of the other concepts. And if you, or others, could tie these concepts together in a more coherent way - the better.
I'm a big believer in incremental work. Whatever you can do to move us all one step further would be great :)
Cheers, Tal Galili (talk) 18:56, 15 June 2021 (UTC)[reply]
Yup, @Talgalili: I very much like your idea of gradual expansion of the "related concepts" section to become more than just the rather bald list that it currently is. I was worried about my little paragraph ending up there because it looked out of place in a list, but if more such little paragraphs appear, the whole section will be better. I will try to reword what I wrote so it's appropriate to that location. It might take me a few days. I'll reword it directly there, in hopes that if I get anything wrong, incomplete or inappropriate, you and other editors will feel able to step in. Many thanks! Elemimele (talk) 08:50, 17 June 2021 (UTC)[reply]
@Elemimele: Sounds good :) Tal Galili (talk) 10:20, 17 June 2021 (UTC)[reply]
It is hard to follow your long discussion; however, to me the Storey procedure is qualitatively different to the B-H procedure. B-H controls an upper bound for the FDR while Storey aims at estimating the (p)FDR. I think this should be reflected in the entry. lukall (talk) 13:56, 7 November 2022 (UTC)[reply]
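(To make the distinction discussed in this section concrete, an editor's sketch comparing the two functionals under a simulated sparse-signal setting: FDR = E[V/R | R>0]·Pr(R>0), which the Benjamini–Hochberg procedure controls, versus pFDR = E[V/R | R>0]; the settings are arbitrary and chosen so that Pr(R=0) is not negligible.)

```python
import numpy as np
from scipy.stats import norm

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule: boolean mask of rejections."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
    k = below[-1] + 1 if below.size else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

rng = np.random.default_rng(3)
m, pi0, alpha, reps = 50, 0.95, 0.05, 5000        # few signals -> Pr(R = 0) non-negligible
fdp, any_rejection = [], []
for _ in range(reps):
    is_null = rng.random(m) < pi0
    p = norm.sf(rng.normal(loc=np.where(is_null, 0.0, 2.0)))
    rej = bh_reject(p, alpha)
    R, V = rej.sum(), (rej & is_null).sum()
    any_rejection.append(R > 0)
    fdp.append(V / R if R > 0 else 0.0)

fdp, any_rejection = np.array(fdp), np.array(any_rejection)
FDR = fdp.mean()                                  # E[V/R] with V/R := 0 when R = 0
pFDR = fdp[any_rejection].mean()                  # E[V/R | R > 0]
print(f"Pr(R>0) ~ {any_rejection.mean():.2f}, FDR ~ {FDR:.3f}, pFDR ~ {pFDR:.3f}")
```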

Q values and expansion[edit]

Can somebody expand it with, for example, https://www.mv.helsinki.fi/home/mjxpirin/HDS_course/material/HDS3_Qvalue.html Biggerj1 (talk) 13:21, 6 January 2023 (UTC)[reply]