Talk:Logistic regression

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-priority on the project's priority scale.

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the importance scale.



Why would the sum be one?

The lead currently ends with the sentence "Each object being detected in the image would be assigned a probability between 0 and 1 and the sum adding to one."

Why would the sum be one? Couldn't the probability of a cat appearing in the image and the probability of a dog appearing in the image be estimated independently if there seem to be several animals in the image? - Tournesol (talk) 07:51, 21 April 2020 (UTC)
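The distinction raised here can be illustrated with a minimal sketch. It assumes two common setups that the quoted sentence does not name: a multinomial (softmax) model, where the classes are mutually exclusive and the probabilities sum to one, and independent binary logistic models, one per label, where they need not. The scores and label names below are hypothetical.

```python
import math

def softmax(scores):
    """Multinomial logistic: mutually exclusive classes, probabilities sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Binary logistic: one independent probability per label."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical raw scores for three labels in one image (illustrative only).
scores = {"cat": 2.0, "dog": 1.5, "bird": -1.0}

exclusive = softmax(list(scores.values()))             # "exactly one animal"
independent = [sigmoid(s) for s in scores.values()]    # "any number of animals"

print("softmax sum:", sum(exclusive))    # 1 up to rounding
print("sigmoid sum:", sum(independent))  # not constrained to 1
```

Under the first setup the sum is one by construction; under the second, a cat and a dog can both receive a high probability in the same image.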

Pseudo-R-squared

As of 2020-08-19 this article contains a section titled "Pseudo-$R$²s".

A reference to "Logistic regression#$R$²s" from within "Coefficient of determination#R² in logistic regression" directed to this article but NOT to the desired section. I'm changing the section heading here to "Pseudo-R-squared" and then changing the link accordingly, so the interested reader can more easily find it.

I should perhaps make comparable changes to the similar section heading in Coefficient of determination, but I don't know that I'll do that right now. DavidMCEddy (talk) 14:35, 19 August 2020 (UTC)

simple algebraic manipulation

Please explain. — Preceding unsigned comment added by 2001:1C02:C08:4D00:435:AC3A:BB96:176B (talk) 17:34, 15 October 2020 (UTC)
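The comment does not say which step is meant. Assuming it refers to going from the log-odds equation to the formula for the probability, the manipulation can be sketched as follows. The symbols $p$ (the probability of the outcome) and $\beta_0 + \beta_1 x$ (the linear predictor) are chosen here for illustration.

```latex
\begin{align*}
\ln\frac{p}{1-p} &= \beta_0 + \beta_1 x \\
\frac{p}{1-p} &= e^{\beta_0 + \beta_1 x} \\
p &= (1-p)\, e^{\beta_0 + \beta_1 x} \\
p \left(1 + e^{\beta_0 + \beta_1 x}\right) &= e^{\beta_0 + \beta_1 x} \\
p &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
   = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{align*}
```

The last equality divides the numerator and denominator by $e^{\beta_0 + \beta_1 x}$.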

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Trant22t. Peer reviewers: Trant22t.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 02:47, 17 January 2022 (UTC)

Utility theory / Elections example is irrelevant

The elections example must be improved or removed. The example in its current state is irrelevant. Much of the text is devoted to explaining an abstract political system and to the point that rational actors choose to act rationally. The example culminates in the utility table, followed by redundant truisms like "Different choices have different effects on net utility" and mentions of "regression coefficients", "complex ways", "polynomial regression" and other smart-ish words. Importantly, none of this says how the utility model is related to logistic regression, nor does it use any notation or equations from the article. I am placing the "original research" tag. — Preceding unsigned comment added by AVM2019 (talk • contribs) 15:28, 13 May 2022 (UTC)

Looks like the section in question was added a decade ago, in one of the last major expansions this page received. I agree that it should be reworded in accordance with WP:V and cited, or else removed. I know it was a long time ago, but maybe the author of the content, benwing, has an idea about where to start? - Astrophobe (talk) 17:05, 13 May 2022 (UTC)

As an option, perhaps this material can be removed and replaced with a 1-paragraph explanation accompanying a reference to Discrete choice, which talks a lot about logit regression. AVM2019 (talk) 11:42, 30 May 2022 (UTC)

Exam pass/fail is a bad example

Andrew Gelman argues, specifically about such examples, that a continuous model should be used so as not to throw data away (19:15). Now, while it's merely a toy example, it solidifies bad practice. People tend to take things literally. It could easily be swapped for something more appropriate. 149.117.159.29 (talk) 15:42, 31 May 2022 (UTC)

Importance in retrospective studies

I like how the article has been developed since I reviewed it some years ago. An important thing lacking from the intro and table of contents, and I think not mentioned at all, is the importance of the logit regression model in medical and other retrospective studies. It is really one of the main reasons why this model, rather than probit regression, is used. If one assumes logistically distributed errors in the occurrence of some event (which is a little bit weird; it might be more natural to assume normally distributed errors), then logistic regression is the proper way to estimate the parameters, while probit regression applies if the errors are normally distributed. I may be stating this imperfectly, but this is for a process where the probability of the event (Y=1 rather than 0) is a function of a linear combination of variables. That is, the outcome is Y=1 if X*beta + e > 0, and Y=0 otherwise, where e is the error term. You could randomly sample from a population of data generated that way, and then logistic regression or probit regression (depending on the form of the error distribution) will "reliably" estimate the parameters in the vector beta; they are the maximum likelihood estimates.

The amazing thing is that if that is the generating process AND the error term is logistic, then even for the non-random sampling of a retrospective study with over-sampling of the rare outcome, the logit regression estimation is "correct". That is, a medical researcher could randomly sample from observed Y=1 outcomes, possibly very rare, or take all such observations, and randomly sample from possibly very numerous Y=0 outcomes, e.g. occurrences of death from a certain cancer out of all patients at a hospital. Then, if a logistic error distribution is assumed, analysis by logistic regression is proper, and you get the parameter estimates, the log-odds type interpretations, etc. But if the errors are assumed normal, then probit regression is not valid for this retrospective-study type of non-random sample. I believe this is huge in why medical studies use logistic regression: errors are assumed logistic effectively for convenience in retrospective studies, I think more so than for the sake of the log-odds interpretation.

Have I explained this clearly enough? Could this please be explained in the intro and body of this article? There are tons of textbook sources for this. (Is or was this already covered in versions of this article?) --Doncram (talk) 16:34, 1 July 2022 (UTC)
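The point about retrospective sampling can be checked numerically with a small sketch. It assumes a single covariate, hypothetical coefficients, and outcome-dependent sampling in which cases (Y=1) are kept with probability `s1` and controls (Y=0) with probability `s0`. By Bayes' rule, the probability of Y=1 among sampled units is s1*p / (s1*p + s0*(1-p)). The sketch compares how far this moves the linear index under a logit link and under a probit link; it is an illustration of the claim, not a proof.

```python
import math
from statistics import NormalDist

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def sampled_prob(p, s1, s0):
    """P(Y=1 | x, unit was selected), from Bayes' rule."""
    return s1 * p / (s1 * p + s0 * (1.0 - p))

# Hypothetical values, chosen only for illustration.
beta0, beta1 = -3.0, 1.0   # rare outcome in the population
s1, s0 = 1.0, 0.01         # keep all cases, 1% of controls
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

normal = NormalDist()

# Shift of the linear index caused by the sampling, at each x.
logit_shift = []
probit_shift = []
for x in xs:
    z = beta0 + beta1 * x
    logit_shift.append(logit(sampled_prob(logistic(z), s1, s0)) - z)
    probit_shift.append(normal.inv_cdf(sampled_prob(normal.cdf(z), s1, s0)) - z)

# Logit: the shift is the same constant, log(s1/s0), at every x, so only
# the intercept changes and the slope beta1 is untouched.
print("logit shifts: ", [round(v, 6) for v in logit_shift])
print("log(s1/s0):   ", round(math.log(s1 / s0), 6))
# Probit: the shift varies with x, so the sampled data no longer follow
# a probit model with the original slope.
print("probit shifts:", [round(v, 3) for v in probit_shift])
```

In this sketch the logit shift equals log(s1/s0) for every x, which is why the slope survives case-control sampling while the intercept is offset by a known constant. The probit shift depends on x, which is consistent with the statement above that probit regression is not valid for this kind of sample.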