Talk:Identifiability

The previous content of the “Identifiability condition” article:

In mathematics, the identifiability condition says that if a function takes the same value at two arguments, then the arguments must be the same. That is, a function is "identifiable" iff it is one-to-one. It is defined as

$$f(x)=f(y)\Leftrightarrow x=y\quad \forall x,y$$

The article on injective functions deals with this same topic more abstractly.

Example 1

Let the function be the sine function. This function does not satisfy the condition when the parameter is allowed to take any real number.

$$\sin(0)=0=\sin(2\pi )=\sin(4\pi )=\cdots$$

However, if the parameter is restricted to $[-\pi /2,\pi /2]$ then it satisfies the condition.

Example 2

Let the function be $y=f(x)=x^{3}$. This function clearly satisfies the condition as it is a one-to-one function.
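Examples 1 and 2 can be spot-checked numerically. The following is only a minimal sketch: the helper `is_injective_on`, the sample grids, and the tolerance are all hypothetical choices made for illustration, and a check on finitely many points can expose a failure of the condition but cannot prove that it holds.

```python
import math

def is_injective_on(f, points, tol=1e-9):
    """Return True if no two distinct sample points give (approximately) the same value of f."""
    # Sort by function value so that any collision shows up as an adjacent pair.
    values = sorted((f(x), x) for x in points)
    for (v1, x1), (v2, x2) in zip(values, values[1:]):
        if abs(v1 - v2) <= tol and abs(x1 - x2) > tol:
            return False
    return True

# Assumed sample grid on the real line; it contains 0, 2*pi and 4*pi.
whole_line = [k * math.pi / 4 for k in range(-16, 17)]
# Assumed sample grid restricted to [-pi/2, pi/2].
restricted = [-math.pi / 2 + k * math.pi / 32 for k in range(33)]

sine_on_line = is_injective_on(math.sin, whole_line)        # condition fails here
sine_restricted = is_injective_on(math.sin, restricted)     # no collision found
cube_on_line = is_injective_on(lambda x: x ** 3, whole_line)  # no collision found

print(sine_on_line, sine_restricted, cube_on_line)
```

Sorting by function value keeps the check to a single pass over neighbouring pairs instead of comparing every pair of points.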

Example 3

Let the function be the density of the normal distribution with zero mean. For a fixed value of the random variable and an unknown variance, this function satisfies the condition

$$f(x;\sigma _{1}^{2})=f(x;\sigma _{2}^{2})\Leftrightarrow \sigma _{1}^{2}=\sigma _{2}^{2}$$
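A small numerical illustration of Example 3, under the assumption that the equality is read as holding for every x rather than at one point (at a single fixed nonzero x, two different variances can give the same density value, since the density there is not monotone in the variance). The function names, the grid, and the tolerance are hypothetical, chosen only for this sketch.

```python
import math

def normal_pdf(x, var):
    """Density of the normal distribution with mean 0 and variance var, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def same_density(var1, var2, grid, tol=1e-12):
    """True if the two zero-mean normal densities agree at every point of the grid."""
    return all(abs(normal_pdf(x, var1) - normal_pdf(x, var2)) <= tol for x in grid)

# Assumed evaluation grid on [-5, 5].
grid = [k / 10.0 for k in range(-50, 51)]

equal_vars_agree = same_density(2.0, 2.0, grid)     # same variance, same density
distinct_vars_agree = same_density(1.0, 2.0, grid)  # different variances are told apart

print(equal_vars_agree, distinct_vars_agree)
```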

· Category:Elementary mathematics · Category:Articles lacking sources (Erik9bot)

The section that reads, "(for example two functions $f_{1}(x)=\mathbf{1}_{0\leq x<1}$ and $f_{2}(x)=\mathbf{1}_{0\leq x\leq 1}$ differ only at a single point x = 1, a set of measure zero, and thus cannot be considered as distinct pdfs)", does not make sense mathematically. As written, f1(x) and f2(x) are identical. They do not differ at all. It's not clear to me what the author's intent was.

Lead discussion

I am reverting the edit by User:Baccyak4H, and apologize for doing so, but it seems to me the edit had the opposite of the intended effect: the lead has become less clear, not clearer. The definition suggested by Baccyak4H:

“A model is identifiable if for two formulations of the model which yield identical descriptions of the dependent variables, then the formulations themselves must be identical as well.”

raises several immediate questions: what is a “formulation of the model” (in particular, what does it mean to have several alternative formulations), what is the “identical description of the dependent variables”, and what does it mean for formulations to “be identical as well”?

The definition which is in the lead now

“A parametric model is identifiable, if it is theoretically possible to learn the true value of the model’s underlying parameter after obtaining an infinite number of observations from the model.”

appears to me much clearer: it talks about being able to find the true value of the parameter of a parametric model. Of course, this is more like defining identifiability through one of its properties; in particular, the “Definition” section states a different mathematical definition, closer to the one suggested by Baccyak4H. However, it is my belief that the lead section has to be as clear as possible, and therefore a definition which simultaneously tells the reader why this property is important is more welcome. ... stpasha » talk » 18:02, 19 September 2009 (UTC)

Per WP:BRD I'll respond here and discuss before reverting. However, I wanted to point out there were several other changes I made which improved (IMO) readability. I may go ahead with some of those other (noncontested) changes, but if you object to all of my edit, let me know, and I'll discuss first.

I'll point out that both the current reading, and my replacement, read quite poorly. But I am willing to discuss things here. To wit, "model’s underlying parameter" implies a parametric model with one parameter. "after obtaining an infinite number of observations from the model" is at best a heuristic that may not even be logically possible, depending on one's philosophical slant toward probability.

As a general statement, there are two major schools of thought I have encountered about this topic. 1) It's no big deal, just add some ad hoc arbitrary constraints and full steam ahead. 2) The presence of this issue demonstrates that the practitioner does not know what they are doing and suggests they have a talk with some of the people on the front lines before resuming modeling. Now I know that is all technically WP:OR, but it may help channel the discussion. And I admit it does not help with the unsourced issue. Baccyak4H (Yak!) 03:42, 20 September 2009 (UTC)

How about the following: "Identifiability is a property of a statistical model, which is defined formally below. Roughly speaking, unless a model is identifiable, statistical inference is not possible."

Incidentally, concerning Baccyak's general statement, here is a quote from the textbook of Lehmann and Casella: ".... Unless this is done, the parameters are statistically meaningless; they are unidentifiable" (italics in the original). This seems to say that identifiability is a required condition for meaningfulness. --Zvika (talk) 05:05, 20 September 2009 (UTC)

There was something in WP:LEAD about being able to stand alone as a concise article, and also about not teasing the reader. So “defined formally below” is not a good option. As for the second sentence, I like it; “statistical inference” sounds better than “being able to estimate the model”. ... stpasha » talk » 05:19, 20 September 2009 (UTC)

Great; I added that in. Inference is also more precise since it includes both estimation and hypothesis testing. --Zvika (talk) 17:34, 20 September 2009 (UTC)

This now looks much better. My suggestion: the parenthetical "(or semiparametric, or nonparametric)" is unattractive, and since it is joined with "parametric", one could point out that the qualifiers are practically exhaustive and thus essentially tautological. I would suggest rewording that part significantly, perhaps even dropping all three "*parametric" qualifiers. Baccyak4H (Yak!) 03:30, 21 September 2009 (UTC)

“Model’s underlying parameter” means that the model is defined through some kind of parameter; it can be a parametric model (in which case the parameter is a k-dimensional vector), or non-parametric (in which case the “parameter” is infinite-dimensional), or semi-parametric, or semi-nonparametric. In all cases the model does have a parameter which we want to identify; otherwise the notion of identifiability is not defined.

“After obtaining an infinite number of observations from the model” indeed leans towards one of the several possible interpretations of the notion of probability; however, this claim is not heuristic. If $\{X_{t}\}$ is a sequence of observations from the model, then for any set $A\subseteq \mathcal{X}$ (where $\mathcal{X}$ is the space in which the $X_{t}$ take values), by the law of large numbers

$${\frac {1}{T}}\sum _{t=1}^{T}\mathbf {1} _{\{X_{t}\in A\}}\ \xrightarrow{\text{a.s.}}\ \operatorname {Pr} [X\in A],$$

and thus, with an infinite number of observations, we would be able to recover the probability distribution of X.
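The convergence above can be illustrated with a short simulation. This is a sketch under assumptions that are not part of the discussion: independent draws, a hypothetical model X ~ Uniform(0, 1), and the set A = [0, 0.3), for which Pr[X ∈ A] = 0.3. The function name and the seed are likewise made up for the example.

```python
import random

def empirical_frequency(draw, in_A, T, seed=0):
    """Fraction of T draws that land in A, i.e. the sample average of the indicator 1{X_t in A}."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(T) if in_A(draw(rng)))
    return hits / T

def draw(rng):
    """One observation from the assumed model X ~ Uniform(0, 1)."""
    return rng.random()

def in_A(x):
    """Membership in the assumed set A = [0, 0.3)."""
    return x < 0.3

freq_small = empirical_frequency(draw, in_A, T=100)
freq_large = empirical_frequency(draw, in_A, T=200_000)

print(freq_small, freq_large)
```

With the larger T the printed frequency should sit close to 0.3, which is the sense in which "sufficiently many observations" stands in for infinitely many.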

It is my belief that this “∞ number of observations”-ish definition is better at explaining the concept of identifiability at layman level than the more standard “invertibility of P_θ mapping” approach. This definition is intuitive: if the model is defined as having an unknown parameter, then we kinda want to know what that parameter is, and identifiability is the property of being able to know that. Infinity here appeals to the fact that if we have sufficiently many observations, then this could be thought of as “approximately ∞”, and therefore we should be close to the situation where we can approximately find the true value of the parameter. At the same time, the definition where we say that two distinct values of the parameter must generate distinct probability distributions is convenient mathematically, but does a poor job of explaining why we would want that.
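For comparison, the “invertibility of P_θ mapping” definition mentioned above can be written out as follows. This is a sketch in assumed notation: Θ for the parameter space and P_θ for the distribution of the observations under the parameter value θ are labels chosen here, not quoted from the article.

```latex
\theta \mapsto P_{\theta} \ \text{is one-to-one on } \Theta:
\qquad
P_{\theta_{1}} = P_{\theta_{2}} \;\Longrightarrow\; \theta_{1} = \theta_{2}
\quad \text{for all } \theta_{1}, \theta_{2} \in \Theta .
```

Read in the contrapositive, this says that distinct parameter values must generate distinct probability distributions.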

Besides, my definition is easily extended to partially identifiable models: we simply say that the model is partially identifiable if it is possible to learn the values of some of the parameters (but not all) after having observed an infinite number of draws from the model.

As for your other edits (you edited only 3 sentences), I do not object to them strongly, but they just don’t read smoothly enough… First you say “Identifiability is a property of a statistical model regarding its formulation or parameterization”. So basically this sentence adds the info that identifiability is a property of the model’s parametrization, while removing the info that the property is in fact essential and is required of basically any model. It also introduces the terms “formulation” and “parametrization”, which ideally should be wikilinked, whereas the current parametrization article says nothing about statistical models. Your next sentence, “A model is identifiable if for two formulations of the model which yield identical descriptions of the dependent variables, then the formulations themselves must be identical as well”, we basically discussed already. Then you remove for some reason the sentence saying that the model is estimable only when it is identifiable. And lastly, “Often a model is identifiable only under certain technical restrictions on its formulation; in such a case the set of these requirements is called the identification conditions for the model.” Basically you add here “on its formulation”, and I do not know if that is good or bad, since I am not sure what the “model’s formulation” is. Sometimes the technical restrictions are restrictions on the parameters, sometimes on the so-called “independent variables” (like E[xx']≠0), and sometimes they are so monstrous and so technical that one couldn't even hope to comprehend what the restriction means.

Now as for the general question about identifiability, I believe that it IS actually a big deal. Every single model used in practice is identifiable simply because non-identifiable models are not very interesting and do not get published in the journals. Also, every single model in use has its own identification conditions, although most often we are simply interested in the fact that those conditions exist and are plausible enough to assume they hold in practice. However, deriving these ID conditions is frequently quite tricky… Often when some new model arises it is not really known whether it is identified or not, so people may do some Monte Carlo studies and such, but at least there should be some reason to believe that the model is in fact identifiable, even if we cannot prove that. However, if the model is known to be not identifiable, then it’s a very serious problem: such a model cannot be estimated and therefore is not of much interest. ... stpasha » talk » 05:11, 20 September 2009 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Identifiability. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://archive.is/20130113035515/http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp358?ijkey=iYp4jPP50F5vdX0&keytype=ref to http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp358?ijkey=iYp4jPP50F5vdX0&keytype=ref

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. InternetArchiveBot (Report bug) 11:33, 11 November 2017 (UTC)