Talk:Johnson–Lindenstrauss lemma

What is epsilon? and other suggestions

The symbol ε is not defined before the first time it is used. It would probably be useful to expand the "one proof of the lemma takes ƒ to be a suitable multiple of the orthogonal projection onto a random subspace...." sentence into a full-fledged example application of the lemma. Intuitively, I'm pretty sure that it would say that the distances between data points will be compressed by n/N, within a factor of 1±ε. AySz88 ^-^ 01:46, 13 February 2010 (UTC)

Answer: ε is a bound on the admissible distortion, measured by the ratio of the squared pairwise Euclidean distance between two samples in the projected dataset to the squared distance between the same two samples in the original dataset.
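
For reference, the guarantee is usually written as follows. This is a sketch in assumed notation: $u,v$ range over the $m$ data points, $f$ is the map from the lemma, and $0<\varepsilon <1$.

$(1-\varepsilon )\|u-v\|^{2}\leq \|f(u)-f(v)\|^{2}\leq (1+\varepsilon )\|u-v\|^{2}.$

Equivalently, the ratio $\|f(u)-f(v)\|^{2}/\|u-v\|^{2}$ lies in $[1-\varepsilon ,1+\varepsilon ]$ for every pair of distinct points.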

Where does the log(1/ε) come from?

My understanding is that the exact bound (from the referenced papers) is k >= 4 * log(m) / (ε^2 / 2 - ε^3 / 3); hence O(ln(m) / ε^2) looks more correct to me (maybe I don't understand the big-omega notation). Also, what about stating the exact bound on k rather than using the big-omega notation? Preceding unsigned comment added by Ogrisel (talk • contribs) 10:12, 4 January 2012 (UTC)
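
To make the comparison concrete, here is a minimal Python sketch that evaluates the bound quoted above next to the simplified expression 8 ln(m) / ε^2. The function names are hypothetical, chosen only for this illustration, and the restriction 0 < ε < 1 is an assumption that keeps the denominator positive.

```python
import math


def jl_min_dim_exact(m: int, eps: float) -> int:
    """Smallest integer k with k >= 4 ln(m) / (eps^2/2 - eps^3/3)."""
    if m < 2 or not 0.0 < eps < 1.0:
        raise ValueError("need m >= 2 and 0 < eps < 1")
    return math.ceil(4.0 * math.log(m) / (eps**2 / 2.0 - eps**3 / 3.0))


def jl_dim_simplified(m: int, eps: float) -> float:
    """The simplified expression 8 ln(m) / eps^2 (not rounded)."""
    return 8.0 * math.log(m) / eps**2


if __name__ == "__main__":
    m = 10_000
    for eps in (0.5, 0.1, 0.01):
        exact = jl_min_dim_exact(m, eps)
        simple = jl_dim_simplified(m, eps)
        # The ratio equals 1 / (1 - 2*eps/3) up to rounding, so it tends to 1 as eps -> 0.
        print(f"eps={eps}: exact k={exact}, simplified={simple:.1f}, ratio={exact / simple:.3f}")
```

The two expressions differ only by the factor 1 / (1 - 2ε/3), which is why both scale like ln(m) / ε^2 for small ε.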

Hobsonlane (talk) 00:34, 26 September 2015 (UTC): Another interpretation includes an additional factor on the (1 +/- ε) bounds, namely the square root of the ratio of the dimensions of the two spaces, giving (1 +/- ε) * sqrt(k/d):

https://www.cs.berkeley.edu/~satishr/cs270/sp11/rough-notes/measure-concentration.pdf https://www.cs.berkeley.edu/~satishr/cs270/sp13/slides/lec-18.handout-nup.pdf https://www.cs.berkeley.edu/~satishr/cs270/sp13/slides/lec-18.pdf

But I think this Rao interpretation is wrong. It is inconsistent at best, inverting the dimension ratio to sqrt(d/k) in other places. So I didn't add this factor to the formula in the article, but Rao seems to be onto something that makes sense. The ratio of dimension reduction should have an impact on the distance preservation bounds that the lemma guarantees. It also makes sense that there is a probability that those distance bounds are satisfied, as in the Berkeley/Rao slides and handouts. Hobsonlane (talk) 00:33, 26 September 2015 (UTC)
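
One possible way to see how both sqrt(k/d) and sqrt(d/k) can show up is the following small simulation. It is only a sketch under stated assumptions: instead of an exact orthogonal projection it uses a k x d Gaussian matrix with i.i.d. N(0, 1/d) entries, whose rows have roughly unit length, as a stand-in. The function names and the values d=500, k=200 are hypothetical, and this is not taken from the linked notes.

```python
import math
import random


def gaussian_map(x, k, rng):
    """Apply a k x d matrix with i.i.d. N(0, 1/d) entries to the vector x.

    Assumption: this approximates an orthogonal projection onto a random
    k-dimensional subspace; it is not an exact projection.
    """
    d = len(x)
    sigma = 1.0 / math.sqrt(d)
    return [sum(rng.gauss(0.0, sigma) * xi for xi in x) for _ in range(k)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def length_ratios(d=500, k=200, seed=0):
    """Return (unscaled, rescaled) ratios of mapped length to original length."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = gaussian_map(x, k, rng)
    unscaled = norm(y) / norm(x)                # expected to be near sqrt(k/d)
    rescaled = math.sqrt(d / k) * unscaled      # expected to be near 1
    return unscaled, rescaled


if __name__ == "__main__":
    d, k = 500, 200
    unscaled, rescaled = length_ratios(d, k)
    print(f"sqrt(k/d) = {math.sqrt(k / d):.3f}, unscaled ratio = {unscaled:.3f}")
    print(f"after multiplying by sqrt(d/k): {rescaled:.3f}")
```

In this sketch the unscaled map shrinks lengths by about sqrt(k/d), and multiplying the map by sqrt(d/k) brings the ratio back to about 1, which is where a (1 +/- ε) statement without an extra factor would apply.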

Does anyone have a reference for the 8(ln m)/(eps^2) bound?

Does anyone have a reference for the $n>8(\ln m)/\varepsilon ^{2}$ bound? In e.g. Dubhashi and Panconesi's Concentration of Measure for the Analysis of Randomized Algorithms (2009, Cambridge University Press; Theorem 2.1) the bound is $n\geq 4(\ln m)/(\varepsilon ^{2}/2-\varepsilon ^{3}/3).$ We do have $4(\ln m)/(\varepsilon ^{2}/2-\varepsilon ^{3}/3)\geq 8(\ln m)/\varepsilon ^{2},$ so if $1<m$ then any $n$ satisfying the Dubhashi and Panconesi condition also satisfies $n>8(\ln m)/\varepsilon ^{2}$ ; the assumption $n>8(\ln m)/\varepsilon ^{2}$ , however, is weaker and hence would give a stronger theorem.

I've seen some recent references say $n>8(\ln m)/\varepsilon ^{2}$ . However, every one of these references makes me wary; every single one states just the result without proof, and there's no telling where they found the bound $n>8(\ln m)/\varepsilon ^{2}$ -- did they just copy it from Wikipedia? Every reference I've seen that actually has a proof has stated a bound that is (in some cases) more restrictive.
Noting that, for $0<\varepsilon <1$ , $4(\ln m)/(\varepsilon ^{2}/2-\varepsilon ^{3}/3)=8(\ln m)/(\varepsilon ^{2}(1-2\varepsilon /3))\leq 24(\ln m)/\varepsilon ^{2}$ yields that the theorem holds for any $n\geq 24(\ln m)/\varepsilon ^{2}.$
Mohri et al. (Foundations of Machine Learning, 2018, MIT Press; Lemma 15.4) give the condition $n>20(\ln m)/\varepsilon ^{2},$ though they also assume $4<m$ .
Matousek (Lectures on Discrete Geometry, 2001, Springer; page 300, Proof of Theorem 15.2.1) gives the condition $n>200(\ln m)/\varepsilon ^{2}.$

Thatsme314 (talk) 10:06, 3 June 2022 (UTC)

Section 3.1.3 of Duchi's Statistics 311 course derives this lemma (see "Lecture Notes" link under https://web.stanford.edu/class/stats311/). They derive a bound of $n\geq 16\log(m)/\epsilon ^{2}$ and the proof is pretty straightforward. Since this is consistent with all of the other bounds you've found, perhaps this bound could be used instead and we remove the dispute tag?

Otherwise it is not obvious to me how to get the prefactor down to 8, but it would be entirely unsurprising if that could be done. It might be that chi-squared can be subexponential with a tighter factor? 38.64.135.102 (talk) 15:24, 21 October 2022 (UTC)
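
For context, here is a sketch of how a chi-squared tail bound leads to the $\varepsilon ^{2}/2-\varepsilon ^{3}/3$ denominator discussed above. It assumes the Gaussian-projection version of the proof, with $A$ an $n\times d$ matrix of i.i.d. $N(0,1/n)$ entries and $x$ a fixed unit vector (the symbols $A$, $x$, $d$ are introduced only for this sketch, and the cited references may argue differently). Then $n\|Ax\|^{2}\sim \chi _{n}^{2}$ , and a Chernoff bound gives

$\Pr[\|Ax\|^{2}\geq 1+\varepsilon ]\leq ((1+\varepsilon )e^{-\varepsilon })^{n/2}\leq \exp(-{\tfrac {n}{2}}(\varepsilon ^{2}/2-\varepsilon ^{3}/3)),$

using $\ln(1+\varepsilon )\leq \varepsilon -\varepsilon ^{2}/2+\varepsilon ^{3}/3$ . The lower tail $\Pr[\|Ax\|^{2}\leq 1-\varepsilon ]$ satisfies the same bound. A union bound over the ${\tbinom {m}{2}}$ pairs and both tails bounds the failure probability by

$m(m-1)\exp(-{\tfrac {n}{2}}(\varepsilon ^{2}/2-\varepsilon ^{3}/3)),$

which is at most $1-1/m<1$ once $n\geq 4(\ln m)/(\varepsilon ^{2}/2-\varepsilon ^{3}/3)$ . This sketch does not by itself reach the prefactor 8.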

I do not claim that this applies here, but in general Concentration of Measure should be used with care as a source. It contains a lot of typos and sometimes misrepresents the conditions needed for a result to hold to an extent that makes the statement wrong. Best, 80.109.197.172 (talk) 15:35, 6 February 2023 (UTC)

Lemma 2.6 from https://cims.nyu.edu/~cfgranda/pages/OBDA_spring16/material/random_projections.pdf appears to have the desired bound. 148.64.99.246 (talk) 19:39, 19 June 2023 (UTC)

Looks legit on a very quick skim; I added that ref and removed the dispute. Joanico (talk) 17:29, 3 March 2024 (UTC)