Talk:Theil–Sen estimator

Theil–Sen estimator has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
Review: October 20, 2018. (Reviewed version).

A fact from Theil–Sen estimator appeared on Wikipedia's Main Page in the Did you know column on 8 July 2011. The text of the entry was as follows:

Did you know... that the Theil–Sen estimator can accurately fit a line to a set of sample points even when up to 29% of the points have been arbitrarily corrupted?

A record of the entry may be seen at Wikipedia:Recent additions/2011/July.


Statistics

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the importance scale.

Mathematics

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's priority scale.

tau

Quote: "As Sen observed, this estimator is the value that makes the Kendall tau rank correlation coefficient comparing the sample data values y_i with their estimated values mx_i + b become approximately zero."

Really? Then the method gives an estimate (mx_i + b) that is completely uncorrelated with the estimated variable (y_i)? Olaf (talk) 00:49, 27 April 2014 (UTC)

No, it means that roughly half the y_i are greater than the corresponding mx_i + b, and roughly half are less. Deltahedron (talk) 19:49, 27 April 2014 (UTC)

No, it's not the median error that is supposed to be zero, as it would be in your interpretation; it's Kendall's tau rank correlation. Counterexample: if y_i = x_i, then the estimate is mx_i + b = 1·x_i + 0 = x_i = y_i, so the tau correlation between the estimate mx_i + b and the original value y_i is one, not zero. Olaf (talk) 20:07, 27 April 2014 (UTC)

That's not a particularly good counterexample, since the number of concordant and the number of discordant pairs are both zero, and hence tau=0. Deltahedron (talk) 20:11, 27 April 2014 (UTC)

Let's check: y₁=1, y₂=2, y₃=3.

Estimates: Y₁=1, Y₂=2, Y₃=3

Concordant pairs:

y₁ < y₂ and Y₁ < Y₂

y₁ < y₃ and Y₁ < Y₃

y₂ < y₃ and Y₂ < Y₃

Tied pairs: none

Discordant pairs: none.

Tau = 1

In the absence of tied ranks, the tau correlation has the same property as Pearson's correlation: tau(A, A) = 1. There are no tied ranks when a_i ≠ a_j for all i ≠ j.

Olaf (talk) 20:23, 27 April 2014 (UTC)

No, it's the residuals that are all equal and hence uncorrelated. Deltahedron (talk) 20:37, 27 April 2014 (UTC)

Yes, and the article said it was the estimated values, not their residuals. It's fixed now ([1]). Thank you for the references. Olaf (talk) 20:43, 27 April 2014 (UTC)

However, what's important is what independent reliable sources say. Searching "Theil Sen" "Kendall tau" in Google Books gave me: [2], [3], [4] which support the assertion of the text (unlike the reference to Rousseeuw & Leroy (2003), pp. 67, 164 which did not). Deltahedron (talk) 20:19, 27 April 2014 (UTC)

OK, so it's the tau correlation between the estimation error and the x value that equals zero, not the one between the estimator and the estimated value (per the second reference)! Olaf (talk) 20:26, 27 April 2014 (UTC)

Thanks for clearing this up. —David Eppstein (talk) 22:36, 27 April 2014 (UTC)
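For anyone rereading this thread, the two readings can be checked numerically. The following is only a minimal sketch, not taken from any of the cited sources: the helper names (theil_sen, kendall_tau_a, tau_report) are made up for illustration, the tau used is the simple tau-a (ties count as neither concordant nor discordant), and the second data set is an arbitrary toy example.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Slope = median of pairwise slopes; intercept = median of y - m*x."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    m = median(slopes)
    b = median(y - m * x for x, y in zip(xs, ys))
    return m, b


def kendall_tau_a(a, b):
    """(concordant - discordant) / number of pairs; tied pairs add nothing."""
    pairs = list(combinations(zip(a, b), 2))
    score = 0
    for (a1, b1), (a2, b2) in pairs:
        prod = (a2 - a1) * (b2 - b1)
        score += (prod > 0) - (prod < 0)
    return score / len(pairs)


def tau_report(xs, ys):
    """Return (tau between y and fitted values, tau between x and residuals)."""
    m, b = theil_sen(xs, ys)
    fitted = [m * x + b for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return kendall_tau_a(ys, fitted), kendall_tau_a(xs, residuals)


# The y_i = x_i example from this thread: perfect fit, all residuals tied.
print(tau_report([1, 2, 3], [1, 2, 3]))

# A small toy data set (hypothetical values) with a median pairwise slope of 1.
print(tau_report([1, 2, 3, 4, 5], [1.0, 2.5, 2.0, 4.5, 5.0]))
```

In both cases the tau between y_i and the fitted values is clearly positive, while the tau between x_i and the residuals is zero, which matches the corrected wording: a pair is concordant in the residuals exactly when its pairwise slope exceeds m, and discordant when it is below m.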

Bias

The statement on unbiasedness,

The Theil–Sen estimator is an unbiased estimator of the true slope in simple linear regression

is unfounded. The corresponding source explicitly states that Sen's claim to that effect is incorrect. It should be removed. Muhali (talk) 08:38, 14 February 2017 (UTC)

Just dug a little deeper. Their counterexample is built on asymmetric noise, which is somewhat rare, so maybe we should just keep the statement as it is now. Muhali (talk) 09:04, 14 February 2017 (UTC)

Accuracy of the estimated slope

The description seems to be of a kind of percentile bootstrap, but as far as I can see, this is incorrect. The procedure described here would yield a 95% interval for the sampled slopes, not (as it should) for their median. A reference for the described procedure is missing. Maybe someone has a good reference to a good way of doing this? (I don't have one handy now.) --Han691 (talk) 17:23, 19 August 2019 (UTC)
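To make the distinction in the comment above concrete, here is a rough sketch, not a sourced procedure. It compares the central 95% range of the pairwise slopes themselves with a percentile bootstrap of their median, where whole data points are resampled and the median slope is recomputed each time. The function names, the synthetic data (true slope 2, Gaussian noise), the crude order-statistic percentiles, and the choice of 200 resamples are all assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import median


def pairwise_slopes(points):
    """Slopes through every pair of points with distinct x-coordinates."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in combinations(points, 2)
            if x2 != x1]


def percentile_interval(values, level=0.95):
    """Crude percentile interval read off the sorted values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


def bootstrap_median_slope_interval(points, n_boot=200, level=0.95, seed=0):
    """Resample data points with replacement; collect the median slope each time."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        resample = [rng.choice(points) for _ in points]
        slopes = pairwise_slopes(resample)
        if slopes:  # skip the degenerate case where every x is identical
            medians.append(median(slopes))
    return percentile_interval(medians, level)


# Synthetic data (assumed for illustration): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(1)
points = [(x, 2.0 * x + 1.0 + data_rng.gauss(0.0, 3.0)) for x in range(30)]

spread = percentile_interval(pairwise_slopes(points))
boot = bootstrap_median_slope_interval(points)

print("median slope:", median(pairwise_slopes(points)))
print("central 95% of pairwise slopes:", spread)
print("bootstrap 95% interval for the median slope:", boot)
```

The first interval describes how much individual pairwise slopes scatter, so it is much wider than the second, which describes the uncertainty of the estimate itself. Whether this particular bootstrap is the best way to get a confidence interval for the slope is a separate question that still needs a reference.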