Talk:Scoring rule

	This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
???	This article has not yet received a rating on the project's importance scale.

Game theory

	This article is part of WikiProject Game theory, an attempt to improve, grow, and standardize Wikipedia's articles related to Game theory. We need your help!
???	This article has not yet received a rating on the importance scale.

Statistics Low‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the importance scale.

Untitled

The binary decision scoring rule notation U(x,q) does not lend itself to multiclass scoring. I would like to integrate the notation for binary and multiclass scoring rules better, as the division does not need to be so stark. PurpleMage (talk) 03:13, 16 November 2010 (UTC)

The introduction should explain the usage in terms of not just human forecasting, but also pattern classifier calibration. This article is tricky since p, which is our optimal probability, is called the 'forecaster's personal probability belief' for forecasting, which does not make sense for a machine algorithm that we still desire honesty from. PurpleMage (talk) 05:04, 16 November 2010 (UTC)

Yes! If so, should make comparisons with estimation theory, for example maximum likelihood. Should also include some proofs. Kjetil Halvorsen 05:43, 2 August 2011 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)

I agree. Also, a proof specifically showing that a rule is proper would be a good addition. 199.46.199.232 (talk) 01:21, 5 March 2012 (UTC)
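As an illustration of the kind of proof being requested, here is a minimal sketch for the binary Brier (quadratic) score, treated as a loss to be minimized. The symbols p, q and S are assumptions made for this sketch and are not the article's U(x,q) notation.

```latex
% Assumed notation (not from the article):
%   p = the forecaster's believed probability of the event,
%   q = the reported probability,
%   S(q, y) = (y - q)^2 with outcome y \in \{0, 1\}.
\begin{align*}
\mathbb{E}_{y \sim p}\,[S(q, y)] &= p\,(1 - q)^2 + (1 - p)\,q^2, \\
\frac{\partial}{\partial q}\,\mathbb{E}_{y \sim p}\,[S(q, y)] &= -2p\,(1 - q) + 2\,(1 - p)\,q = 2\,(q - p), \\
\frac{\partial^2}{\partial q^2}\,\mathbb{E}_{y \sim p}\,[S(q, y)] &= 2 > 0 .
\end{align*}
```

The expected score is strictly convex in q and its derivative vanishes only at q = p, so reporting the believed probability is the unique minimizer. Under these assumptions, that is what it means for the rule to be strictly proper.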

Would it be possible to write the lead section of the article in a way that lets it be understood by common human beings (as opposed to mathematicians)? The third phrase alone contains at least three nontrivial concepts with which the reader needs to be familiar in order to understand just that one single sentence, not to speak of the rest of the lead. The same sentence in addition (!) mentions that probabilities of all possible outcomes need to sum to one. Given that one knows what a probability is, how does mentioning the fact that sum = 1 help add anything useful to letting the reader understand the subject topic? If one does not know what probabilities are, then again how does that help? See what a lead should be. I assert that the lead is impenetrable already to common human beings and after that the reader gets drowned in math without any more ado. As with many other science related articles this article's target audience seems to be mathematicians AFAICS. I assert that that's not the purpose of Wikipedia. Mathematicians have their own publishing universe that serves as their reference. Wikipedia's main target is the general public and therefore the aim should as far as possible (!) be to allow the general public to understand the writing. I am aware that I am criticizing without improving the article. I guess I would if I felt that I am competent. Thanks TomasPospisek (talk) 21:56, 24 May 2020 (UTC)

Over three years later, and TomasPospisek's statement still applies. This article is not comprehensible to people who do not have a deep understanding of statistics, and it need not be that way, nor is it useful to keep it so.

Further, weird residual text remains: "A poorly calibrated forecaster might be encouraged to do better by a bonus system. A bonus system designed around a proper scoring rule will incentivize the forecaster to report probabilities equal to his personal beliefs." This is a statement about the psychology of motivation for (weather) forecasters, which is likely quite wrong, and to which the article (Bickel, E.J. (2007)) cited as support is actually irrelevant. Given the topic of the article, these two sentences have nothing useful to say about forecasting or scoring rules and should be deleted. This text comes from long ago, when the surrounding text was different, and although it wasn't useful then either, it made a bit more sense. 38.147.235.238 (talk) 22:07, 19 August 2023 (UTC)

What is a forecast scheme?

This term is used in the definition section without explanation. Charles Stewart (talk) 13:31, 24 February 2017 (UTC)

First external link outdated?

I can't see a video under the link for "Video comparing spherical, quadratic and logarithmic scoring rules" MathieuPutz (talk) 22:12, 2 January 2023 (UTC)

Add a proper paragraph "Comparison of scoring rules"

This paragraph should discuss the gif in depth and explain what the graphs shown there represent. Biggerj1 (talk) 12:44, 1 September 2023 (UTC)

Also, when to use which scoring function is interesting; see the discussion in https://doi.org/10.1287/deca.1070.0089 Biggerj1 (talk) 21:51, 1 September 2023 (UTC)

Discuss the problem of extremely imbalanced datasets

Biggerj1 (talk) 06:38, 24 September 2023 (UTC)

https://stats.stackexchange.com/questions/489106/brier-score-and-extreme-class-imbalance Biggerj1 (talk) 06:39, 24 September 2023 (UTC)

Discussion on possible merging of this page

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.

To not merge; capable of expansion and scope different. Klbrain (talk) 10:51, 10 November 2024 (UTC)

On the top of this page, it has been suggested to merge this page together with Loss functions for classification, so I want to open a discussion about that.

I personally do not agree with this, as there's plenty of interesting research that has been done on continuous scoring rules (both univariate and multivariate). Until recently, only CRPS was briefly mentioned as a continuous scoring rule.

In the past week, I added a variety of material on continuous scoring rules to this page, and I plan to summarize a variety of comparison papers in order to create a full-fledged comparison of scoring rules section, including an expansion of the applications section, since scoring rules are often applied in machine learning applications. In my opinion, this is enough to warrant a separate page. CuriousDataScientist (talk) 13:31, 11 May 2024 (UTC)

I agree. In my pov scoring rules are first and foremost about forecast verification, and should be treated separately from loss functions. It is clear that the topics do overlap, but it would be misleading to merge because 1) they stem from different branches of science, and it is good to acknowledge contributions of different fields, at least for the sake of history. 2) Losses and metrics are, imho, distinct notions. You might want to minimize a loss (meaning you would study the way it behaves in a minimization algorithm), while you expect a metric to give you information about a phenomenon/system. Scoring rules can be both, they are not "only" losses. 90.55.188.103 (talk) 12:51, 16 July 2024 (UTC)

Losses can be both as well. RMS loss is a common example of an easily-interpretable rule. Closed Limelike Curves (talk) 15:23, 18 July 2024 (UTC)

I agree with you that there's a lot of research on continuous scoring rules, so I think that any merge should go in the opposite direction (from loss functions for classification into this page). Loss functions for classification are a specific kind of scoring rule; specifically, they are an application of scoring to classification tasks (usually with binary/categorical predictions rather than probabilistic ones). Closed Limelike Curves (talk) 15:28, 18 July 2024 (UTC)

I oppose the merge. Loss functions aren't the same as scoring problems; for instance, the loss can have a regularization term. The emphasis of scoring functions should be on the goodness of fit. For loss functions, it should be on training. Earlsofsandwich (talk) 16:50, 4 November 2024 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Unclear citation

The section "Interpretation of proper scoring rules" starts with the claim "All proper scoring rules are equal to weighted sums (integral with a non-negative weighting functional) of the losses in a set of simple two-alternative decision problems that use the probabilistic prediction, each such decision problem having a particular combination of associated cost parameters for false positive and false negative decisions. A strictly proper scoring rule corresponds to having a nonzero weighting for all possible decision thresholds. Any given proper scoring rule is equal to the expected losses with respect to a particular probability distribution over the decision thresholds; thus the choice of a scoring rule corresponds to an assumption about the probability distribution of decision problems for which the predicted probabilities will ultimately be employed, with for example the quadratic loss (or Brier) scoring rule corresponding to a uniform probability of the decision threshold being anywhere between zero and one."

I have been unable to verify this claim using the citations. I think a more explicit pointer to where this claim is made needs to be inserted. Niklas V Lehmann (talk) 13:17, 22 November 2024 (UTC)
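Independently of the sourcing question, the quoted claim can at least be sanity-checked numerically for the binary case. The sketch below assumes a particular cost structure that is not taken from the article's citations: at threshold c the event is predicted when the forecast q exceeds c, a false positive costs c, and a false negative costs 1 - c. All function names are hypothetical helpers. Under these assumptions, integrating the threshold losses with a constant weight reproduces the Brier score (the factor 2 is only a normalization), and a weight of 1/(c(1-c)) reproduces the logarithmic loss.

```python
import math


def cost_weighted_loss(y, q, c):
    """Loss of acting on forecast q at decision threshold c (assumed cost structure).

    The event is predicted when q > c. A false positive costs c,
    a false negative costs 1 - c, and a correct decision costs nothing.
    """
    if y == 1 and q <= c:
        return 1.0 - c  # false negative
    if y == 0 and q > c:
        return c  # false positive
    return 0.0


def integrated_loss(y, q, weight, n=100_000):
    """Midpoint-rule approximation of the weighted integral over c in (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        c = (i + 0.5) * h
        total += weight(c) * cost_weighted_loss(y, q, c)
    return total * h


def uniform_weight(c):
    """Constant weight over thresholds; the value 2 only fixes the scale."""
    return 2.0


def log_weight(c):
    """Weight that puts more mass on thresholds near 0 and 1."""
    return 1.0 / (c * (1.0 - c))


def brier(y, q):
    return (y - q) ** 2


def log_loss(y, q):
    return -math.log(q if y == 1 else 1.0 - q)


if __name__ == "__main__":
    for y in (0, 1):
        for q in (0.1, 0.7):
            print(y, q,
                  integrated_loss(y, q, uniform_weight), brier(y, q),
                  integrated_loss(y, q, log_weight), log_loss(y, q))
```

Each printed row should show the integrated values agreeing with the closed-form scores up to discretization error. This only illustrates that the statement is plausible for two standard rules; it is not a substitute for a citation that states the general result.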