Talk:Independent and identically distributed random variables

Untitled

The "Generalizations" section does not mention pairwise/k-wise independence (i.e. any pair/k-tuple in the sequence is independent, but larger subsets are not necessarily independent). Pairwise/k-wise independence is used in theoretical CS. --David Pal
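To illustrate the distinction raised above, here is a minimal sketch of the textbook XOR construction: two fair independent bits plus their XOR are pairwise independent but not mutually independent. The function names are hypothetical, chosen only for this example; the probabilities are computed exactly by enumeration rather than by simulation.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair, independent bits (x, y); the third coordinate is x XOR y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
p_outcome = Fraction(1, len(outcomes))  # each of the 4 outcomes has probability 1/4


def prob(event):
    """Exact probability of the set of outcomes satisfying `event`."""
    return sum(p_outcome for o in outcomes if event(o))


def pairwise_independent():
    """Check P(A=a, B=b) == P(A=a) * P(B=b) for every pair of coordinates."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            if joint != prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b):
                return False
    return True


def mutually_independent():
    """Check the product rule for all three coordinates at once."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda o: o == (a, b, c))
        marginals = (prob(lambda o: o[0] == a)
                     * prob(lambda o: o[1] == b)
                     * prob(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return True


print(pairwise_independent())   # True
print(mutually_independent())   # False
```

The triple (0, 0, 1) is impossible, so its joint probability is 0 while the product of the marginals is 1/8; that single mismatch is what breaks mutual independence.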

Wiki Education Foundation-supported course assignment

This article was the subject of a Wiki Education Foundation-supported course assignment, between 27 August 2021 and 19 December 2021. Further details are available on the course page. Student editor(s): Hanshenli. Peer reviewers: Yibeiiiii, C.Hua Wang, Joannetsai.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 22:56, 17 January 2022 (UTC)

Autocorrelation

I think the article can simply say "correlation" when describing how to determine whether observations are IID. Autocorrelation is simply the special case for a time domain or similar; correlation is more general, I think. — Preceding unsigned comment added by Chrisparker126 (talk • contribs) 18:40, 7 October 2019 (UTC)

Link to German Version

Looks like this would be the corresponding article in German Wikipedia

http://de.wikipedia.org/w/index.php?title=Unabh%C3%A4ngig_und_identisch_verteilt&redirect=no

It links to:

http://de.wikipedia.org/wiki/Zufallsvariable#unabh.C3.A4ngig_und_identisch_verteilt

There in the text you will find "i.i.d. (für independent and identically distributed)".

I am leaving this on the 'talk' page in case my edit is sloppy and gets removed, but I aim to include some important information I learned today about IUDs and female anatomy, which is very mundane but little known: 'uterine malformation' is a common occurrence in women. We are not informed of its likelihood when purchasing a potentially expensive IUD.

According to Wikipedia's Uterine malformation page, an estimated 7% of women (other sources report as many as one fifth of women) are born with this condition.

When a uterus has an unusual shape, it cannot always accommodate an IUD in such a way that it is effective. The uterus may be cleft in half, making 2 uteri. Some women have 2 cervixes, or 2 vaginas.

These unusually common malformations (described at https://en.wikipedia.org/wiki/Uterine_malformation) absolutely must have a link in this article, for people considering the use and potential functionality of an intrauterine device.

A uterus with 2 chambers cannot be protected from pregnancy by this contraception as well as a normal uterus can, and I had never heard of the prevalence of this condition until today. It's taken me 25 years to hear about it. It would be better for consumers if this practical information were more common knowledge.

Any consumer of this product who is unaware of the link, or of the structure of their uterus, runs a risk of pregnancy and of wasting money.

In short, an informational relationship between the IUD page and the Uterine Malformation page would be a very helpful one. https://en.wikipedia.org/wiki/Uterine_malformation

^ This seems to be on the wrong page. This is IID, not IUD. 203.91.225.198 (talk) 23:23, 25 January 2021 (UTC)

Not sure how to add the language link to this page. — Preceding unsigned comment added by 95.208.167.145 (talk) 17:59, 8 May 2013 (UTC)

How about explaining 'independent but not identically distributed' variables? The meaning of independence and identical distribution, and its implication, should be more explicitly stated... in my opinion, that is. — Preceding unsigned comment added by 182.216.110.134 (talk) 04:52, 21 April 2016 (UTC)

IID consistency

After noticing the lead contained a mixture of both, I made a bold edit in favour of IID, which I personally find less visually distracting than the dots in i.i.d. when the term is dropped into every second sentence. However, IID is not exactly beautiful, either, and typographically I would advise small-caps IID (i.e. {{sc2|IID}}), except that this is apparently discouraged in the MOS. This article might be the one where it makes sense to go against the recommended-style grain, though it's above my pay grade to decide this unilaterally. — MaxEnt 00:57, 20 March 2017 (UTC)

Your decision has my support, since dropping the full stops (or periods) from initialisms (and other abbreviations) has been throughout the last century or more, and continues to be, a productive process in English writing, as seen in the following usages, for example:

International Business Machines → I.B.M. → IBM
Company → Co. → Co
Proprietary Limited → Pty. Ltd. → Pty Ltd
et caetera (or et cetera) → etc. → etc

As an aside, it seems to me a pity that the equivalent, but more euphonious, IDI – standing, obviously, for Identically Distributed and Independent – did not become the standard usage, as advocated by one of my lecturers in my youth. Oh well, c'est la guerre! yoyo (talk) 15:16, 19 November 2017 (UTC)

I saw "IID" on another page, and did not know what it meant. In contrast, "i.i.d." would have been immediately clear. Hence, I much prefer "i.i.d.".

What I prefer, however, matters little. Similarly for what you prefer. Rather, Wikipedia should generally follow what is most commonly used by reliable sources. For this, "i.i.d." is used far more commonly than "IID". Hence, I have changed the article to use the former. SolidPhase (talk) 12:45, 23 March 2019 (UTC)

Usage of the phrase "random variables"

"An element in the sequence is independent of the random variables that came before it" ... "the probability distribution for the nth random variable is a function of the previous random variable in the sequence"

Shouldn't we use "element" or "value" instead of the phrase "random variables" in those sentences? Is each value in the sequence a random variable? Or is the whole sequence represented by a single random variable? Sarmadys (talk) 05:46, 12 June 2017 (UTC)

There's a bigger problem. The lead asserts this:

Note that IID refers to sequences of random variables. "Independent and identically distributed" implies an element in the sequence is independent of the random variables that came before it.

The reference given supporting the definition of IID rvv is to Professor Aaron Clauset's notes on a probability primer for a complex systems modelling course. They're fine for their stated purpose, but don't pretend to be a rigorous mathematical treatment of the underlying probability theory. Even so, nowhere in that ref is there a mention of a sequence of random variables. What is there is an indexed set of observations, the index values i coming from an initial segment of positive integers [math]1, 2, \ldots, n[/math]. And that's all that IID talks about - a set of observations, each assumed to come from the same underlying probability distribution.

This is the first time I've seen IID defined in terms of a sequence of rvv. Arguably, one (informal) usage of the term sequence in maths is as a set indexed by the first so many (non-zero) "counting numbers", as above. But the usual connotations of the word sequence include that the ordering of the elements is essential - that's what most general readers would expect and possibly infer. However, I assert that the order of the elements is not of the essence in defining IID rvv! To say that it is essential, we need a better source. yoyo (talk) 23:29, 19 November 2017 (UTC)

I agree that there are big problems here. As well as those identified, the word "sequence" usually implies "countable", but sets of uncountably many iid rvs are often defined in the literature (e.g. here). The notion of sequence also adds a point of confusion for the reader when the article comes to "independent of the random variables that came before it". What about "after it"? What if there is no natural order? It has to be rewritten without the concept of the rvs coming in some order at all, which isn't so difficult. Then there is the section "Definition", which only defines pairwise independence, and I strongly suspect that definition is wrong. McKay (talk) 03:01, 20 November 2017 (UTC)
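For reference, an order-free definition of the kind asked for above can be sketched as follows. The notation (an index set I, real-valued random variables, Borel sets A) is an assumption made for this sketch and is not taken from the article or its cited source.

```latex
% A family (X_i)_{i \in I} of real-valued random variables, with I an
% arbitrary (possibly uncountable) index set, is i.i.d. if both conditions hold.

% Identically distributed: all members share one law.
\[
  P(X_i \in A) = P(X_j \in A)
  \quad \text{for all } i, j \in I \text{ and all Borel sets } A .
\]

% Independent: the product rule holds for every finite subfamily.
\[
  P\bigl(X_{i_1} \in A_1, \ldots, X_{i_k} \in A_k\bigr)
  = \prod_{m=1}^{k} P\bigl(X_{i_m} \in A_m\bigr)
\]
% for every finite set of distinct indices i_1, \ldots, i_k \in I
% and all Borel sets A_1, \ldots, A_k.
```

Nothing here refers to an ordering of the index set, so "before" and "after" never enter the definition. The product rule is required for every finite subfamily, not only for pairs, which is what separates mutual independence from pairwise independence.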

I think white noise is not IID

White noise implies constant mean and variance and zero autocorrelation. Correlation only measures linear relationships, and hence does not imply independence, nor does it imply an identical probability distribution for all the random variables in the sequence, since it concerns itself only with the first two moments of the distribution. IntelligentET (talk) 22:13, 10 November 2018 (UTC)
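A small exact calculation may make the point concrete. The sketch below uses a hypothetical toy process, chosen only for illustration: Z0, Z1, Z2 are i.i.d. uniform on {-1, 0, 1}, and X1 = Z0*Z1, X2 = Z1*Z2 are two consecutive terms of X_t = Z_{t-1} * Z_t. Both terms have the same mean and variance and zero covariance, yet their squares are correlated, so they are not independent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy process: Z0, Z1, Z2 i.i.d. uniform on {-1, 0, 1}.
support = (-1, 0, 1)
outcomes = list(product(support, repeat=3))
weight = Fraction(1, len(outcomes))  # 27 equally likely outcomes


def expect(f):
    """Exact expectation of f(z0, z1, z2) over the 27 outcomes."""
    return sum(weight * f(*z) for z in outcomes)


def cov(f, g):
    """Exact covariance of f and g."""
    return expect(lambda *z: f(*z) * g(*z)) - expect(f) * expect(g)


def x1(z0, z1, z2):
    return z0 * z1  # term X_1 = Z_0 * Z_1


def x2(z0, z1, z2):
    return z1 * z2  # term X_2 = Z_1 * Z_2


mean1, mean2 = expect(x1), expect(x2)
var1, var2 = cov(x1, x1), cov(x2, x2)
cov_x = cov(x1, x2)  # lag-1 covariance
cov_sq = cov(lambda *z: x1(*z) ** 2, lambda *z: x2(*z) ** 2)

print(mean1, mean2)  # 0 0
print(var1, var2)    # 4/9 4/9
print(cov_x)         # 0
print(cov_sq)        # 8/81
```

If X1 and X2 were independent, every function of X1 would be uncorrelated with every function of X2, so the nonzero covariance of the squares rules independence out. Terms two or more steps apart share no Z and are independent, so the process has zero autocorrelation at every nonzero lag without being i.i.d.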

Machine Learning section

The "In machine learning" section has a number of serious issues. First, it is much too detailed and specific for an article on as general a statistical concept as i.i.d. random variables. Even inclusion of a sentence that amounts to something like "In machine learning, each vector of variables in a dataset is often assumed to be an i.i.d. random vector" would be of doubtful value in this article. The assumption of i.i.d. sampling is pervasive across various applications of statistical analysis of data, serving as the simplest assumption about a data-generating process. Mentioning machine learning can mislead a reader by giving the impression the assumption is particular to machine learning (while in fact it's independent(!) of it). Second, there are claims that are normative (e.g. "currently acquired massive quantities of data to deliver faster, more accurate results") and ill-defined (e.g. "The computer is very efficient to calculate multiple additions, but it is not efficient to calculate the multiplication"). Third, of the two URLs linked as references in the section, one no longer works and the other is not in English, which is not suitable for the English language version of Wikipedia. Fourth, the section is written as an answer to the question posed at its beginning: "Why assume the data in machine learning are independent and identically distributed?". The gist of the answer provided is that the log-likelihood function is additive, a simplification that makes for a more tractable optimization problem. But this again isn't particular to machine learning; it is particular to maximum likelihood estimation. Moreover, i.i.d. sampling does not mean that the distribution function is known, so this is implicitly being assumed by the section. And then there's the fact that most machine learning methods are quite different from maximum likelihood.
Finally, a good answer to this question would tackle the numerous problems with the assumption of independence in many real-world datasets, due to sample selection, autocorrelation, unobserved confounders, etc.

Undsoweiter (talk) 09:00, 13 January 2022 (UTC)

I fully agree with @Undsoweiter. There is no reason for explicitly mentioning machine learning. Moreover, I also do not understand the first reason at the end. Why is the central limit theorem of any relevance at this point? One is not adding together random variables during likelihood optimization. Nmdwolf (talk) 15:01, 26 May 2022 (UTC)
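For readers following this thread, the additivity point discussed above can be written out as a short worked equation. The notation is assumed for this sketch: observations x_1, ..., x_n and a parametric density f(x; θ), neither of which is fixed by the article.

```latex
% Assumed notation: observations x_1, \ldots, x_n and a parametric density f(x; \theta).
% Independence lets the joint density factor into marginals; identical
% distribution makes every factor the same function f.
\[
  L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)
  \quad\Longrightarrow\quad
  \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)
\]
```

The step depends on a parametric family f(·; θ) having been specified, which the i.i.d. assumption alone does not supply. The sum is over log-density values at fixed observations, a property of maximum likelihood estimation in general rather than of machine learning specifically.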

Degraded quality of article due to edits since November 2021

While I understand the value of having students edit Wikipedia articles for a class, numerous issues have been introduced into the article. These include: the use of the pronoun "we"; the inconsistent math fonts for independence of events (a passage which also has other issues); the machine learning section, as detailed above in a separate comment; the elimination of important examples illustrating how i.i.d. sampling can be a flawed assumption; the unnecessary mention of "data mining" and "signal processing"; etc. With the semester already finished, it is doubtful the issues will be remedied by members of the class. I think all edits since November 2021 should be undone. Undsoweiter (talk) 09:12, 13 January 2022 (UTC)