Talk:Statistics/Archive 5

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Proposed merge with Mathematical statistics

No consensus. Power~enwiki (talk) 05:45, 27 July 2017 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

As per rationale via Wikipedia:Articles for deletion/Mathematical statistics (2nd nomination). Appears to be the third suggestion to merge. Nordic Nightfury 16:10, 29 November 2016 (UTC)

Support. ~~The book on non-mathematical statistics must be very thin.~~ Remarkably, the article on Mathematical statistics is not actually very mathematical, though it certainly discusses mathematical concepts. Isambard Kingdom (talk) 17:10, 29 November 2016 (UTC)

Support. Statistics is a mathematical discipline. Jmc200 (talk) 14:06, 15 December 2016 (UTC)

Oppose. Sorry guys, the above reasons for support won't wash. It's obvious, innit, that statistics is a subset of mathematics. However, there are particular areas & methods within statistics that rely more directly on mathematical techniques than others, and these areas can be grouped and called 'mathematical statistics'. Look, for example, at the modules available from the Open University: among a number of statistical modules there is a level 3 one called 'mathematical statistics'. So don't deny that there is such a field within statistics, and don't claim that is the whole field. People using statistics recognise the validity of the description. Gravuritas (talk) 16:59, 15 December 2016 (UTC)

PS: having just looked at the Mathematical statistics article, I find the reigning confusion understandable. It starts off with some very wobbly definitions, and proceeds to fill much of the article with stuff which I would call stats, not mathematical stats. Please suspend the "merge" request for a month and I'll try to change the math stat article so that at least it does what it says on the tin. Gravuritas (talk) 17:14, 15 December 2016 (UTC)

Oppose. If there was no more to statistics than mathematical statistics, the 'mathematical' prefix would be redundant. But there is much more, such as the non-mathematical aspects of the design of surveys, experiments and observational studies, graphical and tabular display of data and results, the philosophy of statistics, plus the whole body of knowledge about how to choose which mathematical method to use. Qwfp (talk) 17:49, 15 December 2016 (UTC)

Support. "Statistics" as a scientific field is synonymous to "mathematical statistics", which should be distinct from the everyday use of "statistics" that refers to "summary statistics". The "Statistics" article should be either redirected to mathematical statistics or become a disambiguation page for "mathematical statistics" and "summary statistics". Delafé (talk) 22:35, 16 December 2016 (UTC)

Comment: the statement ' "Statistics" as a scientific field is synonymous to "mathematical statistics", ' is a numpty statement. Let's take two examples: 1. Much work with the normal distribution involves the user in nothing more than simple arithmetic, and so is not considered to be part of mathematical statistics. 2. Some work with likelihood functions involves the frequent use of differentiation, and so is considered mathematical statistics. The inability of pure mathematicians and non-mathematicians to recognise the difference is, frankly, irrelevant. Ask a non-academic statistician and s/he will understand the difference. Gravuritas (talk) 01:17, 17 December 2016 (UTC)

I'm not sure what you mean by "user making simple arithmetic on the normal distribution" but this sounds like a very personal point of view based on a very personal perception of statistics and mathematics in general. You seem to imply that differentiation is allegedly closer to mathematics than some other analytical process just because it is (in your opinion) more difficult to apply on paper. If that is the case then I must say you haven't understood at all what mathematics is about (and it is probably why concepts that are common knowledge to a mathematician are "numpty" to you). The fact is that any trained statistician or specialised mathematician knows that the principles of statistical inference are pure mathematics in nature and have been established by means of mathematical proof, something which cannot be said for, e.g., the algorithmic inference made by predictive models in machine learning. Delafé (talk) 10:04, 21 December 2016 (UTC)

Response. Let me try one more time. The sub-field of mathematics, which is statistics, is useful to a great number of people, of a very wide range of ability, training, and experience. Many elements of statistical thinking and reasoning are sufficiently easy to use that lots of people use them. Some elements of statistical reasoning demand a higher level of ability and/or training in maths than most people can cope with or have been educated in, and this disparate set of techniques can conveniently be called 'mathematical statistics'. If you wish, I can list the topics that the Open University consider to be mathematical statistics. There is absolutely no assertion on my part that the rest of stats is 'non-mathematical'. If you like, we could effectively split stats into 'easy stats' and 'less easy stats', with mathematical stats being the latter. @MaxEnt yes, non-academic statisticians would draw approximately the same line: for instance, none of the stats used by Six Sigma practitioners industrially would be mathematical statistics.

Gravuritas (talk) 17:33, 5 April 2017 (UTC)

Support. Generally I prefer splitting over lumping, but the cuts need to be helpful, rather than obstructive. Gravuritas says "Ask a non-academic statistician and s/he will understand the difference." Unfortunately, the standard isn't "understand", it's explicate. Would any two of these intuitive non-specialists draw essentially the same line? If not, what we have here is a noddable of nonagreement, where everyone agrees internally that such a line exists, but when pressed no two people wish to carve it in the same place.

On the other side, does non-mathematical statistics even make sense? Imagine telling some math-hating child "oh, this isn't math, it's least squares". Statistics prior to least squares has about the relation to modern statistics that alchemy has to modern chemistry (defined error estimator about as central to the reformation as the periodic table).

For my own purposes, I might be tempted to lasso traditional statistics (distilled facts about national populations useful to government) under a page titled "statistics (public administration)". But I'm sure not going to invent an arbitrary wall to partition statistics into "statistics (alchemy)" and "statistics (chemistry)", as if that usefully aids the great unwashed who visit here.

After checking out Darwin, Galton and the Statistical Enlightenment, another page name for old-school statistics would apparently be "statistics (unenlightened)", and then I guess old-old-school statistics would be "statistics (applied phlebotium)" — look ma, no math at all! — MaxEnt 18:31, 11 January 2017 (UTC)

Oppose There is a lot of statistics that is outside the scope of mathematics, you can see my examples in Talk:Statistics#Definition_of_.22statistics.22. Mcshuffles (talk) 14:38, 4 April 2017 (UTC)

Oppose These are very different topics, and the combined article would be too varied to be useful to many people. Elliot321 (talk) 17:39, 4 April 2017 (UTC)

Support We should combine the articles because the articles are so similar and basically are the same. Not-a-parted-haired-libertarian (talk) 14:28, 10 April 2017 (UTC)

The preceding comment (by Not-a-parted-haired-libertarian) was moved from the next section to this one, because I think the editor simply made a mistake when they placed it at the bottom of the page. I added the "Support" label for the convenience of readers. Note that this user has since been blocked for sock puppetry (not, AFAIK, involving any other user who has commented here) and disruptive editing, but this comment seems reasonable, so… - dcljr (talk) 10:02, 23 April 2017 (UTC)

Long comment. (First off, please note that the proposal is for Mathematical statistics to be merged into Statistics, not the other way around. Also, I should point out upfront that I created the Mathematical statistics article back in August 2004, when there was very little statistical information in Wikipedia.) This has come up repeatedly over the past decade (4 different links). Most of the arguments I've seen that MS should not be merged here seem to be based on a hypothetical MS article that has never actually existed — namely, a well developed one containing much material not already covered (or not more appropriately covered) in Statistics (or elsewhere… more on that in a moment). OTOH, many objections to having MS as a separate article seem to be based on a misunderstanding (or mischaracterization) of what the word "mathematical" implies about the topic.

Addressing the second point first: The distinction between statistics and mathematical statistics is quite similar to that between physics and mathematical physics. A lot of physics requires mathematical calculations, of course. And almost all of the physics in use nowadays (being learned by students and being applied by scientists and engineers) was developed after — and is in some way a direct result of — Newton's application of then-cutting-edge mathematics to the subject. But that doesn't mean all of "today's physics" can rightly be called "mathematical physics". Instead, the term (as I understand it) refers to current research in physics that employs various techniques from applied mathematics (especially from mathematical analysis and abstract algebra), as well as to upper-level undergraduate and graduate physics courses taught from the same perspective. (Note that as a math major in college I took 3 semesters of physics alongside physics majors, but none of what I learned was what I'd call "mathematical physics"!) Similarly, just because most statistics being used nowadays is mathematical in nature and follows the mathematical work of Legendre, Galton, Pearson, etc., doesn't make it all "mathematical statistics". Instead, that term is used mainly in academia to refer to research and undergrad/grad courses based on techniques of (mostly) mathematical analysis. There is a contrast to be made with physics, however, in that I think I would count a lot of calculus-based introductory statistics classes as (elementary) mathematical statistics, whereas I don't think most calculus-based introductory physics classes count as mathematical physics. (But perhaps that could be attributed to my own bias or ignorance.)

In any case, note that we also have a Statistical theory article, as well as a redirect at Theoretical statistics that points not to that article but to Mathematical statistics. With this in mind, it is perhaps instructive to consider other fields of study X that not only have an article or redirect at "Mathematical X" but also at "Theoretical X" or "X theory". These include (omitting variations that use "theory" in a different sense than the one under discussion):

Chemistry: Mathematical chemistry, Theoretical chemistry
Biology: Mathematical biology = Theoretical biology = Biological theory (all redirect to Mathematical and theoretical biology)
Economics: Mathematical economics, Economic theory (latter redirects to Economics#Theory)
Psychology: Mathematical psychology, Theoretical psychology

Looking through these, it would seem that "Mathematical X" ("MX") is typically described as involving the application of mathematical methods to the field X, whereas "Theoretical X" or "X theory" ("TX") is more concerned with providing theoretical explanations of observed phenomena in the field X. The distinction is a subtle one, but IP editor 5.151.82.74 explained it this way (paraphrasing remarks made at Talk:Theoretical physics): "MX" tends to be a branch of applied mathematics of interest to mathematicians, whereas "TX" tends to be a subfield of X. One might take the contrast to the extreme and say that "MX" investigates the properties of, and relationships between, mathematical objects that just happen to be inspired by the field X (i.e., to find mathematical truths), whereas "TX" exists to find better explanations of observations collected in field X (i.e., to find scientific "truths" [explanations]).

Now, despite the fact that many people call statistics a science, there isn't really the same level of interplay between theory and experimental observation as in the other sciences listed above. So any distinction between statistical theory and mathematical statistics is perhaps not a useful one for our purposes. Therefore, if we're going to merge and redirect one of them to this article, I would say merge and redirect both; and if we're going to keep them separate from this article, then I would say merge and redirect one to the other. (Note, BTW, that both ST and MS are linked to by a similar number of other articles. By my count, not counting template transclusions: 449 links to Statistical theory and 365 links to Mathematical statistics.)

Perhaps a way of moving forward on this is for one user to take it upon themselves to implement one solution (i.e., in their userspace, with the help of other interested parties) and another user to implement the other solution, and then it will come down to a more concrete decision: which solution seems better. (Although keep in mind that some interested parties might not be able to devote much time to this until the summer…) - dcljr (talk) 17:51, 23 April 2017 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Definition of "statistics"

I take issue with the following that was written in this Wikipedia article (introduction section, fourth line from top):

  Some popular definitions are:
   Merriam-Webster dictionary defines statistics as "classified facts representing the conditions of a people in a state – especially the facts that can be stated in numbers or any other tabular or classified arrangement[3]".

I've probably read the definition of "statistics" once or twice before, but I've never seen it specifically (yet so vaguely) attributed to "people in a state." So, I decided to pay Merriam-Webster.com a visit (the source cited) and this is what it actually has listed for the definition of "statistics":

  Definition of statistics
   1: a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data 
   2: a collection of quantitative data

Statistics is confusing enough as it is. Perhaps we shouldn't complicate it any further than it needs to be. I'm not trying to be rude, so I hope this is taken as a friendly piece of criticism. At the very least, though, citations should be used accurately. Perhaps the definition you used was true at the time of writing, although I find it hard to believe Merriam-Webster would use such a definition for statistics. Stranger things have happened, I suppose. Either way, I just thought I should let you know. Emerald Evergreen 18:14, 27 January 2017 (UTC)

Whether statistics is a branch of mathematics depends on your definition of mathematics. I'd argue that certain aspects of statistics are outside the scope of mathematics, like:

certain principles that relate to real-world data (for example the Likelihood principle)
visualising data
biases
statistical algorithms — Preceding unsigned comment added by Mcshuffles (talk • contribs) 14:05, 4 April 2017 (UTC)

Statistics is also treated as separate from mathematics in scientific literature. For example, on arXiv it is not considered a branch of mathematics. In the world of scientific journals, math and statistics tend to be treated separately.

It would be better to say "statistics is a science" than "statistics is a branch of mathematics", since the latter takes sides in an ongoing dispute (you can for example see this dispute on CrossValidated https://stats.stackexchange.com/questions/78579/stats-is-not-maths).

I much prefer the Oxford dictionary's definition:

The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
— https://en.oxforddictionaries.com/definition/statistics

---Mcshuffles (talk) 13:59, 4 April 2017 (UTC)

It looks like that Merriam-Webster definition is talking about plural form of statistic rather than a field of study and research. — Preceding unsigned comment added by 2600:1012:B059:C76A:6C05:6933:637E:DC6A (talk) 19:38, 14 August 2017 (UTC)

Claiming that statistics is not a branch of mathematics is like claiming that energy is unrelated to mass. Clearly the basis of statistics is mathematical logic and mathematical operators. It is doubtful that Pearson, Gauss, or any other progenitor of statistics would claim that it is independent of mathematics and such assertions are self-important at best and ludicrous at worst. 99.181.60.231 (talk) 14:04, 25 December 2017 (UTC)

As history goes on, scientific disciplines get more specialized. It is doubtful that Newton would regard physics and math as distinct disciplines, but now we do. Mgnbar (talk) 21:22, 25 December 2017 (UTC)

Normal distribution picture is not correct

The first picture posted on this page as representative of statistics is extremely flawed. The upwards axis posted next to the graph should not be labeled as probability, but probability density, because the normal is a continuous distribution and a probability on that distribution is an integral across a region, not the upwards axis itself. The dashed lines supposedly indicating 1, 2, and 3 standard deviations are clearly wrongly placed. The two points of inflection on the true normal gaussian curve are by definition at ±1*sigma, so even visually it is easy to tell those lines are not correct. In line with the last point, the lines indicating 95% and 99% density very clearly do not cover intervals corresponding to 95% and 99% of the area under that bell curve. There is also an x-bar symbol above the density plot, which is not really appropriate as sample variance is only normally distributed with a known population variance, which is almost never the case.

It seems disingenuous to represent all of statistics with this particular picture given its inaccuracies. I could render a replacement in R in a few minutes, but I wanted to post to the talk page before apparently arbitrarily replacing the main image. — Preceding unsigned comment added by 132.177.238.68 (talk) 15:39, 11 April 2018 (UTC)
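For anyone wanting to check the figures under discussion, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The helper names `central_coverage` and `central_half_width` are made up for illustration; this is a quick sanity check under the assumption of an exactly normal curve, not a proposed replacement image.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def central_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def central_half_width(level):
    """Half-width, in standard deviations, of the central interval holding `level` probability."""
    return std_normal.inv_cdf(0.5 + level / 2.0)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within +/-{k} sigma: {central_coverage(k):.4f}")
    for level in (0.95, 0.99):
        print(f"central {level:.0%} interval: +/-{central_half_width(level):.3f} sigma")
```

On an exact normal curve this gives roughly 68.3%, 95.4% and 99.7% for 1, 2 and 3 standard deviations, and half-widths of about 1.96 and 2.58 standard deviations for the central 95% and 99% intervals.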

First: I don't have a problem with using a normal distribution picture to represent statistics. I mean, no picture is going to represent everything, and the normal distribution is as good as anything else. But you are quite right that this particular picture of the normal distribution is wildly inaccurate. Thanks for pointing it out.

The next issue is: I think that the picture is intentionally "schematic" and not to-scale. If you make the picture realistically scaled, then are the horizontal lines for "95%" and "99%" too close to be visually distinguished? If so, then let's replace them with the 68% and 95% lines at 1σ and 2σ? Mgnbar (talk) 17:40, 11 April 2018 (UTC)

The x-bar symbol above the y-axis seems to be marking the mean of the distribution, not claiming that this is the distribution of the sample mean. Accordingly, it should be marked with the symbol μ or just the word "mean" instead of x-bar. (Note, BTW, that the OP presumably wanted to say "sample mean is only normally distributed with a known population variance" [emphasis added], which is not actually true, as the distribution of the sample mean does not depend on knowledge of the population variance, only on its value, whether known or not. But that's kind of beside the point in the present discussion.) Oh, and "T-score" seems to be an "educational assessment" thing, but I would rather not see it in this diagram because of the potential for confusion with Student's t. But that's just my personal preference. - dcljr (talk) 02:56, 12 April 2018 (UTC)
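To spell out the point about the sample mean, here is a sketch in standard notation. The symbols are assumptions introduced for illustration: independent observations X_1, …, X_n drawn from a normal population with mean μ and variance σ², with S the sample standard deviation.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right),
\qquad
\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1),
\qquad
\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}.
```

The first statement holds whether or not σ is known. Knowledge of σ only determines which standardized quantity can be computed from the data: the normal one if σ is known, the Student's t one if it has to be estimated by S.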

As for the 95%/99% vs. 68%/95% issue, I would try to keep 95%/99% in there, if possible, since those are common confidence levels (for example), and no one cares what precise range actually contains 68% of the probability (as they do with 95%). - dcljr (talk) 03:05, 12 April 2018 (UTC)

File:Galton-height-regress.jpg

Francis Galton's 1889 graph of offspring height by parent height

Good spot about the problems with the image. Without checking, I suspect it's been there an awfully long time. I think it's time to reconsider the lead image for Statistics entirely -- personally I'd like one that includes some real data with some form of statistical fit. After a quick browse around Commons I quite like this one as it's of considerable historical importance yet reasonably clear and visually appealing at a limited size, and it shows real data plus a fitted line. Linear regression is familiar to many so not too scary, but on the other hand it could seem old-fashioned and dull... Comments? Better suggestions? Qwfp (talk) 06:52, 12 April 2018 (UTC)

Note also File:Galton's_correlation_diagram_1875.jpg, a different (older) version also purportedly by Galton, although that one seems a bit too "technical" for a topmost image. Personally, I think a normal distribution image is ideal, if we can get one that's clear and correct. - dcljr (talk) 18:15, 12 April 2018 (UTC)

Or perhaps something like this if we want to illustrate that statistics has moved on a bit since simple linear regression? Opinion polls are a use of statistics that's familiar to most people. Qwfp (talk) 07:17, 12 April 2018 (UTC)

@Qwfp: I recommend being somewhat conservative with the "lead image". Something simple and recognizable to someone who knows a little statistics (hence my preference for a normal distribution), and yet visually interesting enough to attract the interest of someone who doesn't. Note that, as of a few days ago, all logged-out users are (by default) now seeing the image we're talking about when they hover on links to this article (log out or change your "Page previews" preference to check this). So, it's time to make a change, I think. I don't really like any of the images in commons:Category:Normal distribution. I wish I could just edit the SVG to correct the mistakes, but I can't (currently). File:Standard deviation diagram micro.svg, while deadly dull, is at least correct looking. How 'bout that as an interim choice, until a better option can be found? Alternatively, since you like real data, File:Fisher iris versicolor sepalwidth.svg, while somewhat cartoonish, refers to a historical dataset. That image is already being used all over the place as an "icon" for statistics related stuff (e.g., Portal:Statistics). - dcljr (talk) 08:37, 19 April 2018 (UTC)

Hey, I just noticed Template:Statistics topics sidebar. If we put this at the top, I think the iris-data image will be used for "page previews". - dcljr (talk) 08:48, 19 April 2018 (UTC)

incomplete

Needs a section on applied statistics in medicine and credit to Florence Nightingale for pioneering this field. 100.15.129.3 (talk) 12:15, 14 August 2018 (UTC)

STATISTIC A FILE OF FORMAT — Preceding unsigned comment added by 115.111.223.59 (talk) 14:12, 5 September 2018 (UTC)

Note on future expansion

There needs to be more information on statistical modelling as well as Bayesian statistics. Esquivalience (talk) 18:21, 12 February 2019 (UTC)

I think this article is fuzzy

The article today reads like a general conversation about statistics, and does not give the reader a clear answer to the question of what statistics is. The article on Sample (statistics) is very fuzzy too. Another question is how the statistical topics in Wikipedia should be organised and subdivided.

I think this Statistics article should start something like this:

Statistics is a branch of mathematical Scientific methods and Scholarly methods for processing data of a specific kind, from a defined population, to obtain scientifically descriptive results. Statistics brings with it some sub-activities that depend on the demands of the statistical methods, such as data collection, organization, analysis, interpretation and presentation.

Statistical methods are generally used in two specific cases:

Full populations (all Londoners, all German men over 2 m tall, etc.), including data from every individual of the whole
Survey methodology or Sample (statistics), which depend on a few very specific mathematical conditions/coincidences that allow the results from a small sample of a huge population to represent (be projected onto) the whole population

Full population statistics

Full population statistics are produced by governments and authorities, most of the time from data in governmental administrative systems, and are a side effect of administering a country, town or other unit. This is normally referred to as National Demography. It is generally used in governmental planning of services and in Land-use planning to make working town bodies.

Sample statistics

Sample statistics are commonly used by the authorities in planning the future society, but one of the major users is private marketing, which uses them to get data about customers' preferences. It is a relatively cheap way of getting very good quality information.

Sample statistics are possible because of a few very specific mathematical conditions/coincidences, and these conditions must be completely fulfilled in order to take advantage of this beautiful stroke of mathematical luck. Otherwise the sample is not a sample but a very tiny population, and it says nothing about the intended huge population. In short, the result would be scientific trash. However, when the mathematical conditions/coincidences are fulfilled, the sample can be projected onto a population of any size, millions or billions of individuals.

The basic Sample statistics conditions are:

There must be a sample of at least 1000 individuals answering specific questions (a Questionnaire, but these could also be instances of scientific tests/observations of things, as in physics)
The sample must be selected with absolute Randomness from the full population that the sample represents
At least 90% must answer, and preferably 95%
Analysis of the non-answers must not reveal a tendency (such as all non-respondents being communists or religious; non-answers are often given for philosophical reasons). Even if all other conditions are fulfilled, if it fails here, the result is scientific trash. There is no way of repairing a failure of the previous conditions by analysing the non-answers.

In general, making samples larger than 1000 has very little effect on accuracy; this is a consequence of the nature of the mathematical conditions/coincidences. Reading about larger samples always raises the question of whether the manager of the survey really knows statistics. The reason is that large samples are very expensive and deliver almost nothing.
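The arithmetic behind the sample-size remark can be sketched with the textbook margin-of-error formula for an estimated proportion. This is a rough illustration that assumes a simple random sample, full response and the normal approximation; the function name `margin_of_error` and the sample sizes are chosen here for illustration and do not come from any cited source.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate half-width of a confidence interval for a proportion.

    Assumes a simple random sample of size n and the normal approximation;
    p=0.5 is the worst case (largest margin).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for n in (100, 1000, 4000, 10000):
        print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

Under these assumptions the margin is about 3.1 percentage points at n = 1000 and about 1.5 at n = 4000: quadrupling the sample only halves the margin, because it shrinks with the square root of n. The formula says nothing about non-response, which is a separate source of error.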

One of the most interesting aspects of Sample statistics is that its very specific mathematical conditions/coincidences are the connection between Quantum mechanics and Classical physics, the world as we see it. This is because Quantum mechanics is heavily dependent on Probability distribution and Probability theory. This is what makes gold always look like gold and water like water; there are no Quantum mechanics surprises in the common human world. In a group of many particles, the group statistically always behaves as expected.

Put simply, the world behaves in the classical way humans expect as long as the basic Sample statistics conditions are fulfilled. This means that if we are watching at least 1000 physical particles (governed by Quantum mechanics) and 950 of them are observed, they behave as a group in the classical way. The particles are very small, extremely numerous and observed, so in normal life the conditions are always fulfilled. It is not until the individual particles are studied that they behave according to Quantum mechanics. If these very specific mathematical coincidences did not exist, we would most likely see our world as a thick soup. And if the conditions were not fulfilled, we would see our world as very fuzzy and with no answers: black, blind. Because it is a set of very specific mathematical coincidences, the sharpness decreases extremely rapidly when the fulfilment of the conditions worsens even slightly. This is why a statistical report with 30% non-answers is pure scientific trash.

Conversely, if we start to study the answers of the individuals in a sample, they say nothing about anything but the individual. And the individual can behave in any way. We will get crazy pictures, perhaps not Quantum mechanics but still quite crazy. Individuals and populations are indeed not the same, and never can be.

In other words, the basic Sample statistics conditions have a huge impact on the ordinary daily life of humans. They are also processed by the human mind in the form of experience and expectations. Having seen something 1000 times, most people have a pretty strong feeling for what to expect. We use Sample statistics every day without being aware of it. But when the sample in ordinary human experience is not random, people can become very prejudiced indeed. For example, watching a town square and estimating the share of immigrants in the country will give the wrong impression, because immigrants have poorer housing and tend to be away from home more than native-born residents. In short, the sample is biased, non-random. Yet we unknowingly use sample methods in forming our views. Being aware of this makes us recognise misjudgements.

Statistical methods of processing data

A typical statistical mathematical Scholarly method for processing data: the result of data collection is often represented by something that looks like a swarm of dots on a two-dimensional data chart. To be able to use the data, it has to be converted to a mathematical trend line, and statistical mathematical equations are used for that. These methods also give quality measurements of the correlation.
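As an illustration of turning a swarm of dots into a trend line, here is a minimal sketch of ordinary least squares for one predictor, together with the Pearson correlation coefficient as the "quality measurement". Least squares is assumed here as the fitting method since the post does not name one, and the helper `fit_line` and the data points are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson's r."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # correlation between x and y
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical data: five (x, y) dots lying close to a straight line.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept, r = fit_line(xs, ys)
    print(f"trend line: y = {intercept:.2f} + {slope:.2f} x, r = {r:.4f}")
```

A value of r close to +1 or -1 means the dots lie close to the fitted line; a value near 0 means the line explains little of the scatter.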

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).[3] Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

A standard statistical procedure involves the test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false positive") and Type II errors (null hypothesis fails to be rejected and an actual difference between populations is missed giving a "false negative").[4] Multiple problems have come to be associated with this framework: ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
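To make the two error types concrete, here is a small sketch for the simplest textbook case: a one-sided z-test of a mean with known standard deviation. The test, the function name `z_test_error_rates` and the numbers are assumptions chosen for illustration; the paragraph above does not prescribe any particular test.

```python
import math
from statistics import NormalDist


def z_test_error_rates(effect, sigma, n, alpha=0.05):
    """Error rates of a one-sided z-test of H0: mean = 0 against H1: mean > 0.

    `effect` is the true mean, `sigma` the known standard deviation,
    `n` the sample size, and `alpha` the chosen Type I error rate.
    Returns (type_i_rate, type_ii_rate).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # reject H0 when z > z_crit
    shift = effect * math.sqrt(n) / sigma      # mean of the z statistic under the true effect
    type_ii = std_normal.cdf(z_crit - shift)   # chance of failing to reject when the effect is real
    return alpha, type_ii


if __name__ == "__main__":
    for n in (10, 25, 50, 100):
        type_i, type_ii = z_test_error_rates(effect=0.5, sigma=1.0, n=n)
        print(f"n = {n:>3}: Type I = {type_i:.2f}, Type II = {type_ii:.3f}")
```

The Type I rate is fixed by the choice of alpha, while the Type II rate falls as the sample size or the true effect grows, which is one way to see why obtaining a sufficient sample size is part of the framework.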

Data collection is often the most expensive part

In full-population statistics, the problems with data collection depend mainly on the accuracy of the data in governmental administrative systems. The main issue is whether the collected data is really what it is expected to be, and that depends on how the data is produced in the administrative systems. Side effects like tax evasion might lead the population to fool the administrative system, and so the statistics. Another common problem is that public opinion and the General Data Protection Regulation limit the available data.

While the math can be processed on ordinary computers, getting the data, especially in sample statistics, can be very tricky and very expensive. In most surveys, minimizing the non-answers (which is essential for the usability of the result) costs more than the rest of the entire survey.

In the 1970s and 1980s, telephone interviews were a very efficient means in countries like Sweden, where everyone had a fixed phone line from the governmental monopoly telecom operator. The operator could sell phone lists to the statistical operators. People, being used to the good society with rapid growth and welfare, gladly answered any questions. Today, however, the telecom network structures are fragmented, the mobile phone world is harder to draw a correct sample from, and people are reluctant to answer, no longer seeing how answering benefits their lives. In fact, getting the non-answers below 10% is almost impossible today; in short, the results are often nothing but trash. The problem for many customers of statistics today is that they are not aware enough to judge the quality of the surveys they are buying. They want the results badly and often don't want to know the downside.

The need for data and the money in the business always attract suppliers when jobs are offered for good money. The news media and politicians especially are heavily dependent on surveys of political opinion. Commercial marketing also has a very strong need and good economic resources. As long as there is someone willing to pay for the impossible, there are always people offering fraud, and a lot of the statistical reports we see today are in fact scientific humbug, mainly because the non-answers are far too many. It is a mathematical coincidence with tight conditions that makes something very cheap and usable possible, but only if the conditions are met. This is why predictions in politics seem to be less good these days.

--Zzalpha (talk) 01:22, 10 June 2019 (UTC)

You have made similar remarks at Talk:Mathematics. And there you were told that Wikipedia editors need to avoid Wikipedia:Original research and support their edits with Wikipedia:Reliable sources. Please continue your interest in helping Wikipedia, in accordance with its policies and with due respect for the consensuses and compromises that already exist. Mgnbar (talk) 01:29, 10 June 2019 (UTC)

Concur with Mgnbar. If you have no reliable sources to support your proposed changes, the article should stay as it is. Rollidan (talk) 02:11, 10 June 2019 (UTC)

Nomination of Portal:Statistics for deletion

A discussion is taking place as to whether Portal:Statistics is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.

The page will be discussed at Wikipedia:Miscellany for deletion/Portal:Statistics until a consensus is reached, and anyone is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the page during the discussion, including to improve the page to address concerns raised in the discussion. However, do not remove the deletion notice from the top of the page. North America¹⁰⁰⁰ 13:23, 1 October 2019 (UTC)