Talk:Statistics/Archive 3

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Number of data points

Was wondering if there was a name for the statistical principle that maintains that the more data points you have, the more reliable your dataset will be... Thanks. Jefferson61345 02:30, 8 August 2007 (UTC)

Yes it's the central limit theorem. —Preceding unsigned comment added by 82.32.9.240 (talk) 20:03, 30 September 2007 (UTC)

Are there any theorems or definitions related to a small number of data points? In particular, I'm wondering if there is a definition of the term "poor statistics" (or "weak statistics"), which is sometimes used by scientists when describing the statistical analysis of experimental data sets. Usually, this term is accompanied by the statement that "more data" are needed to improve the statistics. What is the limit in number of data points below which statistics are "poor"? Are there other factors to be taken into account? Is "weak statistics" equal to "poor statistics"? --Uxh (talk) 17:27, 2 May 2008 (UTC)

Questions

Question: What is the procedure for finding the number of standard n × n Latin square designs? Question: What is the definition of non-trivial sufficient statistics? Please solve these questions if possible. Thanks a lot. —Preceding unsigned comment added by 164.100.6.9 (talk) 05:40, 5 April 2008 (UTC)

Fallacy?

Statistics can be easily deemed a fallacy. If statistics say that kids whose parents don't talk to them about not smoking are more likely to smoke (you know the common argument), that is a fallacy. Yes, it may be a true statement, but it does not follow that kids whose parents tell them not to smoke will never find smoking cool, or that kids whose parents say nothing cannot decide for themselves that it is disgusting. Statistics as a field tends to treat all people as equal in all regards when that is clearly not true. Not everybody can throw 49 touchdown passes in an NFL season like Peyton Manning did in 2004 or be the leading goal scorer at the Soccer World Cup. I just figured this might be an idea to consider discussing in the article, even though it may be difficult to find a decent source. 205.166.61.142 00:31, 31 August 2006 (UTC)

You make some sweeping generalizations. One of the purposes of statistics is to attempt to explain an outcome with the most explanatory variables. If a certain type of person is more likely to have a certain kind of outcome (for example, black men tend to have more cardiovascular problems), it is in the best interest of such research to treat everyone differently, not the same. Statistics such as the t-test and ANOVA often differentiate people more than treat them the same. I think your football analogy may be one of the fallacies you are talking about. Football statistics are descriptive statistics--they only describe those people to which they apply (in your case, professional football players and nobody else). Inferential statistics, such as the t-test, often group people according to like kinds based on particular variables, like incidence rate of cardiovascular health problems. Chris53516 13:43, 31 August 2006 (UTC)

Let me add to that answer in case the poser of the question returns. Statistical methods are not (correctly) used to prove cause and effect or to make claims that something is always true. Statistics is more of an art of educated guessing where mathematical methods are used to make best decisions about what is most likely or what tends to be related. In fact, built into the methods of statistics are ways of determining how likely you are to make an error in your "educated guessing". Typically, someone using statistical methods correctly will say, "I am 99% sure that these two factors (such as not smoking and parents telling the child not to smoke) are related to each other." Then qualifiers will be added. Even in that case, a good statistician wouldn't claim that one factor causes the other. It could be that both items are caused by some third, unidentified, factor. But, of course, those types of misinterpretations of statistical results are made all the time. That doesn't mean, however, that cause and effect is not logically the best interpretation of the situation. Suppose, for example, that a large number of people get sick who mostly all ate spinach. We might make a best guess that spinach caused the illness. But, really it might be something else like a common salad dressing used by spinach lovers or the fact that spinach stuck in their teeth chased away potential romantic relationships leaving the spinach-eaters in a heart-sick condition which eventually led to real illness. Of course, those alternatives are ridiculous. I guess they COULD be true, but most people would go with the theory that the spinach was tainted. And even if the spinach was the problem, it could be that, for some, there was another unidentified cause. So, we are left with concluding, "Probably this is the cause most of the time." --Newideas07 21:48, 3 November 2006 (UTC)

Need Link to Reliability (statistics) page

This page needs links to the pages on Reliability (statistics) and Factor Analysis. I'm not sure if these should be put under Statistical Techniques or See Also. I'm also wondering if there should be a link to Cronbach's Alpha (which is one type of reliability estimate).

It seems to me that there are probably quite a few statistical techniques that are not linked from this page. Perhaps it would be helpful to create a hierarchical index of statistical techniques. I see that something like this can be done in the Table of contents. Kbarchard 22:24, 16 September 2006 (UTC)

This page is not a list of statistical topics (which we link to in the "See also" section), and not every statistical technique or estimator needs to be listed here. The ones you mention seem a bit too specialised for a general article on statistics, but could be usefully added to articles like multivariate analysis and social statistics. -- Avenue 01:34, 18 September 2006 (UTC)

Standardized coefficient for DYK

I wrote an article on Standardized coefficient, but I am no expert in statistics. If this could be quickly vetted by an editor more experienced with this field, we could have a statistical WP:DYK. -- Piotr Konieczny aka Prokonsul Piotrus | talk 20:25, 7 October 2006 (UTC)

Name of Etymology subsection

Etymology here is the study of the history of the word statistics, not the history of statistics itself. The first paragraph or so of the current Etymology subsection is etymology, but the later paragraphs go beyond etymology to actual history of statistics. That's why I think there are many better, broader titles for this subsection. Or maybe I am interpreting etymology too narrowly? Joshua Davis 15:11, 21 October 2006 (UTC)

I think Etymology works, even if it does go beyond simple etymology. It's still related to the word's history. -- Chris53516 16:04, 22 October 2006 (UTC)

I agree that Etymology was not an accurate description here. I've tried to remedy the situation somewhat by moving some of this material to the Statistics Today section. I also removed a reference to Michel Foucault, which does not seem to me to belong here at all. Thefellswooper 22:06, 31 March 2007 (UTC)

Criticism

I would like to propose we change the name of this section to "The Misuse and Limitations of Statistics" or something similar as Joshua suggested. I also would like to make big revisions to it if no one is working on it or attached to it as it is. I'm a statistician (M.S.) and educator. If anyone objects or has a better idea or is already working away hard on this, speak soon or I'll do it. --Newideas07 22:04, 3 November 2006 (UTC)

I think that is a good topic, but for a separate article. There are certainly lots of abuses of statistics, but this page seems fine to me, needing only minor edits. Plf515 02:34, 24 November 2006 (UTC)plf515

I agree with the opening comment. Statistics is one of the three primary branches of mathematics (Pure, Applied and Statistics), and at the moment Pure and Applied seem to get more attention. Go for it Newideas07 David —Preceding unsigned comment added by 82.32.9.240 (talk) 20:01, 30 September 2007 (UTC)

Note about archives

I used a method that others may not like. If someone else wants to change the archive, find and copy any new comments, and begin at this page to do so: Start of archiving. Thanks for being patient while I made these archives. -- Chris53516 (Talk) 23:01, 3 November 2006 (UTC)

Merge from applied statistics

There was a suggestion at Talk:Applied statistics to merge into this article - it's only a stub but it may have some potential. I'll leave it for the statisticians here to decide. Richard001 19:53, 6 February 2007 (UTC)

Merge. In my opinion, "applied statistics" is a redundant phrase. To me it appears that statistics are often applied somehow. So, the article can be merged as a new section or integrated into this article. — Chris53516 (Talk) 20:27, 6 February 2007 (UTC)

I am not a statistician and cannot really comment on the material, so I won't "formally" vote. But the long-standing stubbiness and infrequent editing suggest a merge to me. I'd add that Mathematical statistics is similarly meager, covering nothing that isn't already covered here. Joshua R. Davis 13:54, 8 February 2007 (UTC)

Merge. I don't quite agree with Chris53516 when he asserts that "applied statistics" is pleonastic, but this article already covers the distinction between "applied statistics" and "theoretical statistics" adequately, in the introduction. I looked through the applied statistics article carefully, and in my opinion a merger is overkill. Applied statistics should simply be deleted. DavidCBryant 15:48, 8 February 2007 (UTC)

(Note. If someone deletes the page, be sure to redirect it to this article. — Chris53516 (Talk) 16:00, 8 February 2007 (UTC))

Having heard no objections, I have gone ahead and changed Applied statistics into a redirect page. Don't give up on Mathematical statistics quite yet, though. I'm trying to get hold of Dcljr, who had quite a few ideas on that score. I'm sure the theoretical article can be turned into something better pretty soon. DavidCBryant 01:50, 14 February 2007 (UTC)

misconceptions

not a statistician here but maybe the article ought to have a section addressing those. statistical mechanics has nothing to do with mathematical statistics. many areas are related to rigorous formulation of statistical mechanics: probability and analysis, topology, number theory, etc., but not statistics. i also removed the reference to "sports statistics". to call computing, say, slugging percentages or ERA's or free throw percentages doing statistics seems rather abhorrent, IMHO. Mct mht 07:09, 10 February 2007 (UTC)

Thanks for taking those (See also) links out. I concur with your decisions. Do you mean to tell me that Maxwell and Boltzmann aren't just two guys who played for the Yankees? ;^> DavidCBryant 12:36, 10 February 2007 (UTC)

who's on first base, Dave? :-) Mct mht 07:26, 11 February 2007 (UTC)

I don't mind losing "statistical mechanics", but in my view removing "sports statistics" is going too far. Sure, the routine collection of free throw percentages etc is not exactly groundbreaking statistical work, but it is a (small) part of statistics. I've seen several articles on aspects of sports statistics in reputable statistical journals. They're admittedly more common in lighter fare (e.g. the ASA's Chance magazine has a regular column titled A Statistician Reads the Sports Pages), but they demonstrate that professional statisticians view sports statistics as within their ambit. -- Avenue 03:21, 11 February 2007 (UTC)

i am certainly in no position to object if that's the consensus of professional statisticians. Mct mht 07:26, 11 February 2007 (UTC)

Statistical mechanics is indeed probabilistic mechanics, but I'd be inclined to leave the link here. Sports statistics, as pointed out, is deeper than people may realize. (There was a great article on this in the WSJ around August or Sept. of last year.) There is legitimate inferential statistics going on there, e.g. attempts to correct for the effects of luck on a player's stats. JJL 03:47, 11 February 2007 (UTC)

I don't much care if sports statistics are listed in this article. At least they're comparable (in quantity) to the other kinds of data regular statisticians deal with. But let's keep the references to physics out of the "see also" list ... the meaning of "statistics" in the context of physics and thermodynamics is substantially different from the meaning this article deals with. I guess I could say I use a result from statistical mechanics (a measurement of the ambient temperature) to "make an informed decision" (whether to wear a flannel shirt, or not). But that really seems like stretching the point, to me. Oh – what's on second, and who's on third. ;^> DavidCBryant 17:25, 11 February 2007 (UTC)

Statistics and Accuracy

Can an expert out there please discuss the topic of statistics and accuracy. For example, do statistics HAVE to be accurate? Or can statistics be a general indication of a trend, reality, etc.

In general, the data from which statistics are derived are as accurate as the observers/experimenters/statisticians can make them. I suppose that observational errors are possible (I might think the lights are off when they're really on ... maybe I just went blind, and haven't realized that yet), but in practice observational errors are fairly rare, and easily controlled.

Even though the observations are accurate, the statistics themselves may be imprecise. In general, the larger the number of observations that can be made, the more precise the statistical estimates that emerge. This tendency of the collected data in a small sample to diverge somewhat from the true characteristics of a sampled population is analyzed, in the first instance, by the statistical variance of the data collected.

Notice that certain kinds of data (mostly relating to people's opinions, and similar subjective measurements) are inherently less reliable than the measurements that can be made in fields like chemistry and physics. Such data can easily be manipulated to reach misleading conclusions, no matter how carefully statistical procedures are carried out (for example, by asking biased questions, or by limiting the allowed responses on a questionnaire, etc.) DavidCBryant 04:33, 10 August 2007 (UTC)
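
The point above, that more observations give more precise estimates, can be illustrated with a small simulation. This is only a sketch under assumed conditions: the uniform population, the sample sizes 25 and 400, the repetition count, and the helper name `spread_of_sample_means` are all illustrative choices and do not come from the discussion.

```python
import random
import statistics

def spread_of_sample_means(n, repetitions=1000, seed=0):
    """Standard deviation of the sample mean over many repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(repetitions):
        # Assumed population: uniform on [0, 1); any population with finite variance would do.
        sample = [rng.random() for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)

small = spread_of_sample_means(25)
large = spread_of_sample_means(400)
print(f"n=25:  spread of the sample mean = {small:.4f}")
print(f"n=400: spread of the sample mean = {large:.4f}")
print(f"ratio = {small / large:.2f}")
```

With sixteen times as many observations, the spread of the sample mean should shrink by a factor of about four, because the standard error of a mean scales as 1/sqrt(n). The individual observations are no more accurate; only the estimate built from them becomes more precise.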

Actually, to qualify as a measurement, a set of observations only has to result in a reduction in uncertainty, not necessarily elimination of uncertainty. In other words, if the accuracy is greater than the accuracy of your previous uncertain estimate, then it told you something you didn't know. I just wrote a book about it called "How to Measure Anything". Hubbardaie 22:35, 10 August 2007 (UTC)

Three types of lies

Lies, damn lies and statistics —Preceding unsigned comment added by 70.80.220.247 (talk) 14:46, 28 October 2007 (UTC)

Misuse of statistics

Currently the Misuse of statistics section contains a quote from Dennis Lindley that is not referred to in the text and has nothing to do with misuse, as far as I can tell. I think that this section is also disproportionately large (roughly 20% of the text), in danger of giving the casual reader the impression that statistics as a discipline is inherently untrustworthy or controversial. It's also loaded with weasel words.

I propose that we shorten this section dramatically and leave the details to the Misuse of statistics article (so that it's similar to the short History of statistics section, with its accompanying History of statistics article).

In fact, I think that the misuse/misinterpretation paragraph of the Overview section is itself sufficient, without a Misuse section at all, but probably I'm in the minority there? Joshua R. Davis (talk) 16:42, 20 January 2008 (UTC)

I do think that a misuse of stats. section is valuable here, and a longer article on it elsewhere is also useful. While there may be a case for some rebalancing, I am fine with the section as is. Certainly, many people coming to this page will be familiar with "lying with statistics" and with the perception that stats. is, as you say, inherently controversial, and this section both addresses that and puts it in a more formal context. I agree that the Lindley quote is misplaced here and should be (re)moved. But the current section nicely transitions from the general perception of lying with stats. to the more scientific concerns over hypothesis testing, p-values, etc. JJL (talk) 17:28, 20 January 2008 (UTC)

I agree that the section is worth having, but there's also a lot of room for improvement. I've made a few changes to the second paragraph. I think the part about hypothesis testing could be reduced to a simple statement that CIs are preferable to p-values. The Bayesian bit should either be expanded or removed; just saying it's another option, but has its own critics, gives the reader very little information. Mentioning publication bias might be useful. The paragraph on the Abelson perspective is interesting, but does it really deserve this much prominence? -- Avenue (talk) 23:40, 20 January 2008 (UTC)

I have tried to make the section more concise in a manner compatible with these opinions. It still has a lot of weasel words, since I haven't verified any of the information. Joshua R. Davis (talk) 00:12, 31 January 2008 (UTC)

Statistics As Principled Argument, by Robert P. Abelson

I think that the following is interesting and deserves to be in Wikipedia. But I do not think that it should be in this article. Maybe in a more specialized (new) article on the foundations/philosophy of statistics.

In his book Statistics As Principled Argument, Robert P. Abelson articulates the position that statistics serves as a standardized means of settling disputes between scientists who could otherwise each argue the merits of their own positions ad infinitum. From this point of view, statistics is a form of rhetoric; as with any means of settling disputes, statistical methods can succeed only as long as all parties agree on the approach used.

So I have put the paragraph here, and deleted it from the article. —Preceding unsigned comment added by 86.156.222.165 (talk) 10:22, 31 January 2008 (UTC)

I have created a new article Foundations of statistics, which incorporates the above quoted paragraph. The article is currently a stub. TheSeven (talk) 11:19, 31 January 2008 (UTC)

I think the point of view that statistics is rhetoric is valid and merits inclusion in the main statistics article. The "Misuse" section may not have been the optimal place for it but I'd like to see a statement to that effect somewhere here. Abelson is an obvious reference for that viewpoint but not the only one. JJL (talk) 12:48, 31 January 2008 (UTC)

What I would ideally like is the article Foundations of statistics expanded from a stub into a real article. Then the Statistics article could include a paragraph that summarized the foundations, and linked to that as the main article on the topic. The former should be done anyway, I think; it is an important topic. TheSeven (talk) 14:34, 31 January 2008 (UTC)

Is "Foundations of statistics" a term that statisticians use to talk about this stuff, or did we just make it up? When I hear it (as a non-statistician) I think probability theory. The Abelson stuff seems better described as "Philosophy of statistics". Is there a lot to say about the philosophy of statistics? (I'm honestly asking.) Joshua R. Davis (talk) 14:38, 31 January 2008 (UTC)

"Foundations of statistics", or "foundations of mathematical statistics", are common terms. I just tried googling and got 110,000 results. There are also books with that title. There is a substantial philosophical component to this though. Googling for "philosophy of statistics" gave me 88,000 results. So perhaps there should be an article with that title, which redirects to the foundations article—? TheSeven (talk) 15:01, 31 January 2008 (UTC)

I think "Mathematical Statistics" and "Philosophy of Probability" are more common. You don't see many (separate) phil. of stat. courses; for example, try searching for it at Amazon. The viewpoint Abelson discusses at length isn't his own theory; in my experience it's reasonably common among statisticians--like a formal mathematical proof, an hypothesis test is a form of argumentation (a practical form, a la Peirce, say). I do think that the article must address the fact that an hypothesis test (etc.) is a way of settling disputes as well as a way of finding things out. When the FDA asks for statistical arguments, that's what it wants--an argument that the drug is effective and safe. JJL (talk) 15:14, 31 January 2008 (UTC)

I have not heard the phrase "Philosophy of Probability" before, as far as I can recall. I just tried googling for it, and got 766 results. Compared with 110,000 for "foundations of statistics". TheSeven (talk) 15:27, 31 January 2008 (UTC)

Wait, let's go apples-to-apples! For "Foundations of stats." I think one more commonly sees "Math. stats." as the foundations are in analysis and probability. For "Philosophy of stats." one more commonly sees it as part of a "Philosophy of Probability" course/book than on its own. Here are a few Phil. of Prob. books: [1], [2], [3], [4]. The word chance commonly appears in its place (again, Peirce is an example), and of course it also can be studied in a modern physics context. I can't find a book entitled "Philosophy of Statistics" there; Foundations of stats. does make an appearance [5]. JJL (talk) 15:41, 31 January 2008 (UTC)

This is interesting, especially because several people are claiming that the topic is too insignificant to merit a Wikipedia article of its own. See here—you can vote if you wish. TheSeven (talk) 17:38, 31 January 2008 (UTC)

Would a better name be centred around "statistical inference", rather than just "statistics" ? Melcombe (talk) 15:09, 12 February 2008 (UTC)

The discussion in this section is effectively closed. The right place would now be the discussion in Foundations of statistics. (According to that article though, this is the standard name for the topic. Moreover, it has far more Google hits.) TheSeven (talk) 21:53, 12 February 2008 (UTC)

Considered picture add to history section

I am considering adding the following picture to the history section. Any objections or comments should be made now, before the picture is added. —Preceding unsigned comment added by TeH nOmInAtOr (talk • contribs) 18:42, 12 June 2008 (UTC)

What is the difference between F(x) and f(x)?

Can somebody please explain to me with an example the difference between F(x) and f(x) for a continuous random variable? As far as I understand f(x) is a derivative of F(x), please correct me if I am wrong, but that is not sufficient enough for understanding the whole process. Many thanks. -Chetan. —Preceding unsigned comment added by Chetanpatel13 (talk • contribs)

Those two should be interchangeable, as far as I know. By the way, use four ~ to sign with your user ID. Chris53516 17:07, 18 October 2006 (UTC)

Chris, thanks for the response, BTW they are very different. Thanks for the tip and hopefully I am doing it right this time. -- Chetan M Patel 18:24, 18 October 2006 (UTC)

How are they different? Please use 4 ~ to sign your name. It's easier than what you did. Chris53516 18:31, 18 October 2006 (UTC)

f(x) is probability density function (PDF) whereas, F(x) is cumulative distribution function (CDF). Chetan M Patel 18:58, 18 October 2006 (UTC)

The names of the functions are a convention, widely used in statistics. Perhaps a better question is: what's the difference between a PDF and a CDF? It's probably easiest to understand if you know about integration, with

F(u)=\int _{x=-\infty }^{x}f(x)dx

As we are working over a continuous domain, the chance of a random variable taking a particular real value, 0.123456789 say, is zero, so it only makes sense to talk of probabilities calculated over a range of values, and it's a convention to use the range

(-\infty ,x]

giving the CDF. So yes,

f(x)={dF \over dx}

What is the meaning of the PDF? Well, if you consider a discrete probability distribution like the binomial distribution, then the PDF is just the probability of a particular number; here the probability of a particular number 0, 1, 2, 3 occurring is nonzero. Furthermore, the PDF is useful for visualising the shape of a distribution: for the normal distribution it gives the familiar bell-shaped curve, whereas the CDF would be S-shaped and it's harder to see what's happening. --Salix alba (talk) 20:45, 18 October 2006 (UTC)

Correction: that should be

F(u)=\int _{x=-\infty }^{u}f(x)dx

The upper bound of integration must be u if F(u) is what you're evaluating. Michael Hardy 22:47, 18 October 2006 (UTC)
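
A worked case may make the relation between f and F concrete. The exponential distribution is used here purely as an illustrative assumption (it is not mentioned in the discussion), with an assumed rate parameter \lambda > 0:

```latex
% Density (PDF): zero for negative x, decaying exponential otherwise
f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}

% Cumulative distribution (CDF): integrate the density up to u, for u >= 0
F(u) = \int_{x=-\infty}^{u} f(x)\,dx
     = \int_{0}^{u} \lambda e^{-\lambda x}\,dx
     = 1 - e^{-\lambda u}

% Differentiating the CDF recovers the density
{dF \over du} = \lambda e^{-\lambda u} = f(u)

% Probability of a range, as a difference of CDF values, for 0 <= a < b
P(a < X \le b) = F(b) - F(a) = e^{-\lambda a} - e^{-\lambda b}
```

The last line is the practical use of F: probabilities belong to ranges of values, and each one is a difference of two CDF values.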

In case anyone wants a "Statistics for Dummies" explanation of all that: f(x) is the drawing of a curve that defines a certain probability density function (pattern). For example, a bell-shaped curve has an equation, f(x), and represents a situation in which falling in the middle of some range is most likely, with tapering probabilities as you go to the left or right. Most measurements of objects fall in this category. But probabilities of having x in some range are found by calculating the area under the curve. To find the area under the curve, you have to integrate f(x) to get F(x). Sometimes that is impossible or just really hard, and so approximation techniques are used instead, which is one reason why you usually get probabilities out of tables instead of using equations. There are other theoretical uses for the two functions. I'm not sure if that clarified things for anyone. --Newideas07 21:23, 3 November 2006 (UTC)

In case that didn't clarify things for some people, the 'statistics for dummies for dummies' version is that the pdf is the height of the density at a given point, whereas the cdf is the area under the curve for a range of points. For example, if we want to know the probability of a person being 5'9" tall, that's a question for a pdf (f(x)); if we want to know the probability of being 5'9" or less, that's a cdf (F(x)). Plf515 02:09, 24 November 2006 (UTC)plf515

Note: the Y axis in the Gaussian distribution chart is labeled as "probability". That is NOT correct, since the probability of each given point is zero. The Y axis stands rather for probability density. —Preceding unsigned comment added by 146.107.217.52 (talk) 11:47, 30 June 2008 (UTC)
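
The relationship discussed in this thread can also be checked numerically. The sketch below assumes a standard normal distribution and uses only Python's standard library; the function names, the evaluation point u = 1, and the lower bound of -10 standing in for minus infinity are illustrative choices and do not come from the discussion.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x): the height of the bell curve at x (not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative F(x) = P(X <= x), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def integrate_pdf(a, b, steps=10_000):
    """Trapezoid-rule area under the standard normal density between a and b."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, steps):
        total += normal_pdf(a + i * h)
    return total * h

u = 1.0
area = integrate_pdf(-10.0, u)  # -10 stands in for minus infinity
slope = (normal_cdf(u + 1e-5) - normal_cdf(u - 1e-5)) / 2e-5  # numerical dF/du

print("area under f up to u:", area, " F(u):", normal_cdf(u))
print("slope of F at u:     ", slope, " f(u):", normal_pdf(u))
```

The two numbers on each printed line should agree closely: integrating f up to u gives F(u), and the slope of F at u gives back f(u). The value f(u) itself is a density, which is why it is not the probability of observing exactly u.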

External links—recent changes

Reasons for most of the changes were given in the edit summaries. The changes should not be reverted without addressing those reasons. I also do not see that a section "Resources at educational institutions" is more helpful for readers than a section "Online courses and textbooks": better to have a section that tells people what is at the link instead of where the link is located. TheSeven (talk) 19:49, 6 September 2008 (UTC)

Where are the "reasons" in your edit summary??? In your edit you reverted "3 E digest links, links to products and link to personal website by non-academic" and deleted the invisible editing comment to prevent future spamming. Please explain why the links you push should be exempted from the restrictions established at WP:ELNO? The external link policy page states: "Except for a link to an official page of the article subject one should avoid: [...] 4. Links mainly intended to promote a website. 5. Links to sites that primarily exist to sell products or services, or to sites with objectionable amounts of advertising [...] 9. Links to the results pages of search engines, search aggregators, or RSS feeds [...] 11. Links to blogs and personal web pages, except those written by a recognized authority. This is meant to be a very limited exception. As a minimum standard, recognized authorities always meet Wikipedia's notability criteria for biographies." [my emphasis]

- About your links: The link to the commercial enterprise StatSoft,Inc. is inappropriate as per 4 and 5 above. The self-published, personal webpage informath is inappropriate as per 4 and 11 above, and it is by a non-authoritative source as per 11 above. The free download site www.freestatistics.info/en/about.php is iffy for several reasons, including 11 above, but also related to this statement by the author: "All programs listed on the Free Statistics Web Site at freestatistics.info are the sole property of their respective authors. [...] I don't accept responsibility about all the sofware listed in the Free Statistics Web Site. My goal is however to list only software that in no way could damage the Pc of users." The site www.ericdigest.org is a great search engine resource where students can conduct searches for scholarly publications in every single subject we cover on Wikipedia - however, it would appear inappropriate as per 9 above. I'll still leave it among the links here now, for further review.

- About the subheads: "Online courses and textbooks" is an invitation to commercial enterprises that sell these products to spam us. Limiting this section to university related websites ensures that we will not involuntarily be issuing spam-invitations to textbook & software companies and companies offering online "diploma mill" courses. The subhead "Other resources" opens the door for sites by non-authoritative sources with self-promotional interests, who are here primarily for the joy of seeing their own personal webpage featured on Wikipedia (please read WP:SPAMMER). As the now deleted entries showed, it was also seen as a spam invitation by companies offering free software sections on their sites as a small part of the main commercial section.

- About the links to the international organizations established to promote the study of statistics: I find them appropriate here, and since that section is not violating any policy, I see no reason to delete it. However, I have not reverted your deletion of that section. 71.106.254.126 (talk) 23:50, 7 September 2008 (UTC)

Taking your points in turn....

Regarding edit summaries, I made many edits, and included a reason with most of them.

I deleted your "invisible editing comment" in error--oops.

After reading point 5 at WP:ELNO#Links_normally_to_be_avoided, I now agree with deleting the StatSoft link.

The informath site is non-commercial, and is written by a mathematician who used to work on Wall Street, now studies independently, and has several peer-reviewed publications [6][7]. And the link is useful/informative. So I do not see how 4 or 11 applies.

I do not understand your objection to freestatistics.

Glad you agree on the ERIC link.

Regarding subheadings, they seem to have attracted only one link that is considered spam (StatSoft--and I think that the link is actually quite useful, and would greatly prefer to include it if it did not violate policy). The subheadings are also helpful for readers. So we do not agree on this.

Regarding links to the international organizations, if you had read my edit summaries you would know that I deleted them because List of academic statistical associations is linked to under "See also".

Two other points....

What do you think of [8]? The content seems good, but there already is a link to something by the same author.

I put a new discussion topic at List of basic statistics topics, which you might like to comment on.

TheSeven (talk) 04:12, 8 September 2008 (UTC)

The external links policy states that personal web pages, except those written by "a recognized authority", are to be avoided. The authors of the sites you have now reintroduced in the article are not "recognized authorities" that fit into the definition on the policy page, which states that "recognized authorities always meet Wikipedia's notability criteria for biographies". Being a "mathematician who used to work on Wall Street" does not exactly automatically elevate you to the status of "authoritative source in statistics" (the same seems to be valid for the software reviewer who writes the articles on the self-published site freestatistics). A quick search of scholarly publications in statistics and math for the past 10 years reveals no peer-reviewed articles by the creator of the site informath in those subjects (although he appears to have published a couple of articles in other fields), and there seems to be no academic institutions teaching statistics that could vouch for his competence as an authority in the subject. All that aside, the page you push has very little substance and is not about the important methodological disputes that have taken place in this subject, which the title seems to imply. It instead makes the following, rather self-obvious and simple, observation: "The assumption-making phase of a statistical analysis can be disputed, unlike the calculation phase." Having that link in the external links section sets a bad example which makes it harder to explain to other users who want to add their own personal webpages that it is inappropriate to do so. May I ask why the mentioned sites are so important for you to include here that you seem ready to engage in an edit war over them? Also: Why are you removing the name of the universities in the other external links? Regarding your question about Prof. David Lane's Rice Virtual Lab in Statistics: that page links to his "HyperStat Online Statistics Textbook". 
No reason to link to both individually. Will update the external links section accordingly. 71.106.254.126 (talk) 08:57, 8 September 2008 (UTC)

On informath, I did not know about "recognized authorities always meet Wikipedia's notability criteria for biographies"; so I agree with removing this.

You did not address my objection to removing a subheading for "Online courses and textbooks".

You removed freestatistics, after I restored it (with explanation), without explanation.

I do not see why the two "web" links benefit from information like "by Gordon K Smyth, The Walter and Eliza Hall Institute of Medical Research"; how does this help the reader?--for me, it just distracts.

TheSeven (talk) 10:50, 8 September 2008 (UTC)

No, the subhead "Online courses and textbooks" has continuously attracted spam links (as is evident from the long-running clean-up efforts in the article's history, for example, a few of many: [9], [10], [11], [12]). The reason is that it is formulated in a way that seems to invite people who sell online courses and textbooks to enter their products, which makes it a poorly formulated headline. To say that it attracted only one spam link is a bit of an understatement.

Freestatistics was commented on in both posts: please reread. In short: The same rules apply to that site as those used to explain why the other personal, self-published website was inappropriate.

About giving the source in the external links, please see WP:EL#External links section. It states: "a concise description of the contents and a clear indication of its source is more important than the actual title of the page". I'll shorten the description of Gordon Smyth's link to make the source of the link more concise, but sources are needed for all of the external links in that section. If the sources distract you, I'd suggest you take that issue up for discussion in a more general discussion of links, for example on the talk page of the policy or style guide pages. 71.106.254.126 (talk) 00:34, 9 September 2008 (UTC)

Regarding "long-running clean-up efforts", you might note that I am the editor for one of the edits that you cite. In fact, I have been cleaning the External links for some time now: most recent four-- [13], [14], [15], [16] (and there are priors). So I am well familiar with the issue.

I still believe that readers benefit from having the subheading, and I have been watching the External links section to keep improper things out. My familiarity with WP policies is obviously not perfect though: you have rightly cited policies with which I was unfamiliar--including here about "a clear indication of [a link's] source". Perhaps you can think of another wording for the subheading that does not, in your view, invite spam. In the meantime, I would prefer to keep the subheading and monitor it--even if my monitoring is not perfect.

What is the reason for mentioning "David Lane", "Gordon K Smyth", and "Statistics Community"? The last is also obviously inaccurate.

The OnlineStat book that I cited above [17] is not the same as the HyperStat book at Rice. For example, compare the discussion of the t distribution in the former [18] to that in the latter [19]. Based on an admittedly-cursory look, the former is the best (free) online intro stats book there is. Yet the Statistics article does not currently link to it.

TheSeven (talk) 10:59, 9 September 2008 (UTC)

The definition of Personal web page, which is linked to from WP:EL, is a web page "created by an individual to contain content of a personal nature". So your criticism of freestatistics and informath is not valid. I have also added "OnLine Statistics", mentioned above. TheSeven (talk) 18:38, 10 September 2008 (UTC)

Self-published ruminations online by an individual who is a non-authoritative source are personal in nature, regardless of subject matter (or, as per the article you refer to: "The content of personal web pages varies and can, depending on the hosting server, contain anything that any other websites do."). The external links policy refers to "personal web pages" in the context of a self-published site by a non-authoritative source, the opposite of a site "published by a reliable source" and a personal site "published by a recognized authority". As for the argument that the sites are "non-commercial", see common spammer strawmen. Please also read the reply at Wikipedia_talk:External_links#Blogs Afv2006 (talk) 20:30, 10 September 2008 (UTC)

Please cite a reference for your claim about "self-published site by a non-authoritative source". Note that if your claim were valid, it would also require removal of other links, such as that for Dallal's Statistical Practice. And your link to Wikipedia_talk:External_links#Blogs makes no sense. I do not believe that you have read what you are editing. What is your purpose? TheSeven (talk) 20:57, 10 September 2008 (UTC)

The source for the statement about self-published sites by non-authorities is the policy page on verifiability: "Anyone can create a website or pay to have a book published, then claim to be an expert in a certain field. For that reason, self-published books, newsletters, personal websites, open wikis, blogs, knols, forum postings, and similar sources are largely not acceptable." See also WP:reliable sources, and point 2 on the List of sites normally to be avoided. Afv2006 (talk) 00:06, 11 September 2008 (UTC)

Using the personal web pages as sources instead

Following some discussion at Wikipedia_talk:External_links#Blogs, I have some additional thoughts on what should and should not be included in External links here. My opinion is that all the current links are appropriate except one, informath. The informath link should probably be removed. On the other hand, I think that the link provides useful information about statistics for non-experts (which will include most readers of the WP article). WP:EL says that if the "page to which you want to link includes information that is not yet a part of the article, consider using it as a source for the article". My suggestion is that that is what should be done: remove that one link and incorporate its content either in this article or perhaps in Misuse of statistics.

Your thoughts? TheSeven (talk) 23:06, 10 September 2008 (UTC)

Drive-by comment. I was shocked to see this article only had one note/reference. Most often there should be zero external links in such a case. If this largish group of external links can be used to reference the article, great. If none of these are reliable sources, then get rid of them all. Presumably that is not the case, so the mission here should be to source this article properly, and then look at whatever external links are left. For a broad topic like this I would think there would be a lot of references, and then only have one single Dmoz external link. 2005 (talk) 23:21, 10 September 2008 (UTC)

Agree 2005, it needs to be properly sourced. As per established policy, personal web pages cannot be used as sources in Wikipedia articles. The ideas presented above, i.e. having a self-published, non-peer-reviewed article from one of these sites used instead as a source in this article, or using the article to create and base an entire, individual Wikipedia article on, are a bit absurd. That is likely to set off even more bells for those who are concerned when it comes to Wikipedia:Conflict of interest#How to avoid COI edits. Afv2006 (talk) 00:06, 11 September 2008 (UTC)

I never proposed "to create and base an entire, individual Wikipedia article" on anything. I was proposing incorporating the content of the informath page, either in this article or in Misuse of statistics, and my comment stated that clearly. Where are you getting your claims from?

Are you also insinuating that I have some CoI with the External links??

Including a link to DMOZ sounds good to me. DMOZ has a large number of links though. Including a select few--the best--in WP might still be nice. What do you think of a separate WP "List of non-commercial online textbooks"?

TheSeven (talk) 12:44, 11 September 2008 (UTC)

Personal webpages definitely can be used as sources in Wikipedia articles. But they can only be used "in very limited circumstances". Einstein's personal web page could be cited on a number of topics. In general though, the bar is high. If a person is not notable enough to have a Wikipedia article, then their personal website normally would not be a valid source. 2005 (talk) 20:36, 11 September 2008 (UTC)

Disambiguate "statistics" (set of functions of data, lower case) from "Statistics" (field of study, upper case)

(Thread moved from user talk page)

Re ""statistics" plural is already treated, two paragraphs down" -

Oops. Sorry, forgot to delete below. I put it at the top for a quick find by a very general reader, who might wonder why use of "statistics" can sometimes refer to the data itself, or, say, a bunch of averages, but not be interested in "Statistics". I will move it back to top unless I misunderstood the reason for its present location. Tautologist (talk) 16:53, 25 September 2008 (UTC)

Ah, I understand. My opinion is that your new material is equivalent to the stuff already there, except that the stuff already there adheres to the Wikipedia:Manual of Style better. So I vote for keeping the version already there. On a separate issue, if you think that this stuff should be placed higher in the intro, then I recommend that you simply switch the second and third paragraphs of the intro. This stuff should be kept in its own little paragraph, I think, because it is not actually what this article is about; it should be kept obviously distinct. Cheers -- Mgnbar (talk) 17:07, 25 September 2008 (UTC)

Yes, it is more of a disambiguation than an article topic.

1. "Statistics" (caps) is singular, "statistics" (lower case) is plural.
2. "statistics" (lower case) can refer to informal scatter plots, or the raw data itself, (see 3. below, as the distinction is nontrivial), like in sports (except in the classic Sci Am article and papers by Efron), as well as to crude representations of data, which are not algorithmically produced. So I put in "raw data" and changed "algorithm" to "function". But a better word might be "transformation". What do you think?
3. "statistics" (lower case) is not entirely trivial to think about. Under the Bohr atom model (lots of space with points of stuff between) all of physical reality could be regarded as "statistics" (lower case), everything is a giant scatter plot, whereby any perceptual apparatus (like touch or vision) "interprets" an underlying model as being solid reality, such as the existence of a body (or a table). Bertrand Russell famously commented on this idea, but did not use "scatter plot" or this wording, talking about a table in front of him and knocking on it. Less philosophically, what were "stars" (before Hubble?), turned out to be tight scatter plots of stars, now called galaxies. In fact, our perceptual apparatus can be interpreted as subconsciously doing "Statistics" (and is so interpreted in some areas of cognitive neuroscience), and any sense data, or reality itself, as "statistics". So the expression "statistics" would belong in the field of phenomenology and ontology as a subfield of metaphysics.
4. "statistics" can be used interchangeably with "statistical proof", and I cite examples of such use in the article of that name. I never even thought about it until writing Statistical proof, and found that this expression is in no textbook index in "Statistics", but is everywhere else (including in number theory!). I got the article examples by Googling news stories, where the expression occurred in fields all over, just in the one single last week.
I am moving this thread to Statistics article, so please respond there. Thnx. Tautologist (talk) 18:27, 25 September 2008 (UTC)

I do not disagree with what you're saying, but the current appearance of the article is highly irregular; there is simply too much verbiage at the top of the page, before the article begins. I have three recommendations, in decreasing order of importance: Mgnbar (talk) 20:31, 25 September 2008 (UTC)

Shortened. Tautologist (talk) 22:07, 25 September 2008 (UTC)

To handle statistics we just need a concise disambiguation message of the usual kind, such as "For statistics in the sense of the numerical results of a statistical procedure, see Statistic." Mgnbar (talk) 20:31, 25 September 2008 (UTC)

I tried to do a normal disambiguation, but it was either too lengthy and top heavy for that format, or inaccurately confusing. It now amounts to shortening and moving the stuff from inside the article to the top, and includes statistics that are not numerical, such as color-coded symbols in a tally picture, or a cluster picture of qualitative data.

The recently added material about mathematical and statistical proof does not belong in a disambiguation message. Move it into the text of the article somewhere. Mgnbar (talk) 20:31, 25 September 2008 (UTC)

You are right. I will do so with a brief sentence in the appropriate article context, with a link. Tautologist (talk) 22:07, 25 September 2008 (UTC)

I vote that we have Statistical science redirect to the journal (which sends users interested in statistics back to this page). Mgnbar (talk) 20:31, 25 September 2008 (UTC)

Half of one, six dozen of the other. Tautologist (talk) 22:07, 25 September 2008 (UTC)

Alright folks. The current disambiguation message is far too long. Move all of this out to the proper disambiguation page Statistics (disambiguation). The fact that statistic (singular) refers to a function of a data set can be mentioned in situ, but hardly needs a prominent message at the very top of the article. siℓℓy rabbit (talk) 22:41, 25 September 2008 (UTC)

(You've probably been asked this a thousand times in different joke formats, but did you get your name from Phd?) Tautologist (talk) 22:48, 25 September 2008 (UTC)

How's this for a terse disambiguation, "There is Statistics and there are statistics." (See next section.) Tautologist (talk) 22:55, 25 September 2008 (UTC)

I agree with Mgnbar and Silly Rabbit - the top of this article is a mess. I suggest (a) removing the sentence about math proof and statistical proof, (b) moving the comment about plural and singular back where it was, (c) adding the usual {{otheruses}} template at the top of the article, and (d) directing Statistical science to the journal (come to think of it, we should move Statistical Science (journal) to Statistical Science to avoid the redirect entirely). —G716 <^T·_C> 17:17, 27 September 2008 (UTC)

BTW The difference between statistic and statistics is singular vs. plural, not lower case vs. capitalized—G716 <^T·_C> 17:23, 27 September 2008 (UTC)

Rubbish - the distinction is not singular vs plural either, it's between the singular word "statistics" and the singular word "statistic", which happens to have a plural form "statistics". Even so, it's still not cap vs. lower case—G716 <^T·_C> 18:22, 27 September 2008 (UTC)

Apparently Statisticians need to go back to grammar school. Tautologist (talk) 01:40, 28 September 2008 (UTC)

Statistics Jokes Section

. Tautologist (talk) 23:10, 25 September 2008 (UTC)

Q: What does Road-Rage have to do with Normality?

A: It's mean at the median. Tautologist (talk) 23:10, 25 September 2008 (UTC)

"There are lies, Damn lies, and there's Statistics" - Twain/Disraeli

"I am a Statistician" - Tautologist (talk) 23:08, 25 September 2008 (UTC)

"There are lies, Damn lies, and there's Statistics" - Twain/Disraeli

"There's Statistics, and there are statistics" Tautologist (talk) 23:08, 25 September 2008 (UTC)

But does anyone have any that are actually funny? Tautologist (talk) 23:08, 25 September 2008 (UTC)

Bayesian and Frequentist / Modern history

The role of, and controversy between, Bayesian probability and Frequency probability is very important and warrants fronting. Also, this article and the History of statistics make virtually no mention of 20th century developments. Nils (talk) 22:50, 17 August 2008 (UTC)

Agree on history comment. Also on the Bayesian and frequentist perspective. Both schools are pretty fuzzy. Meaningfully defining the "long run" is pretty hard to get at for the latter, and as to the former, Bayesianism is almost a religion for some, with as many "sects" as a major world religion. A problem with the history of 20th century statistics is Galton's influence on Hitler, and Fisher's overt racism and supremacism, which make these topics ones to shy away from. It is interesting that unlike most fields, with a brief bio of the founders in most basic texts, there is none of this in Statistics texts. I am plodding through an original edition (which I picked up recently for nothing) of Pearson right now. Tautologist (talk) 04:26, 28 September 2008 (UTC)

Non-inferential statistics, Statistics applied to mathematics or the arts

I tried to restrict the examples to those where the methodology arises in Statistics, rather than strictly in mathematics (e.g. group theory – post Schoenberg, or strictly within probability or chaos theory). For example, I did not use examples in modern art where the psychological effect of clustering, as in Markov processes, though similar to methods in exploratory data analysis, is still strictly within the realm of probability theory. But there is still a natural overlap, as these examples are often considered neither mathematics, which traditionally has an essential “proof-theorem” aspect, nor Statistics, which has an inferential aspect, though also a predictive component, though not always, even traditionally, “testing, estimation, AND prediction.” There is also overlap with physics, statistical mechanics now only marginally overlapping with traditional statistics, but with Fisher information now being discovered in (or put in, depending on point of view) exotic places as far away as black holes.

The rewrite is way too long, as I went into trying to explain why stats is not strict math, per the edit summary comment in the edit history, then tried to chop it down, to the point of it becoming unintelligible and still too long. Best to discuss here for consensus, then chop down into a much smaller exposition. Maybe an article on the application of Statistics outside of the Statistics article is better, with a very brief description here. Tautologist (talk) 01:38, 28 September 2008 (UTC)

This section is one gigantic paragraph. Can you please split it up into sensible parts?

As it is, it seems to begin with inference/discovery of patterns and then transition to art and card tricks. There are several words in "scare quotes", which (correctly) indicate vagueness inappropriate to Wikipedia. There is some stuff such as "predicatively create art" that needs explanation. Mgnbar (talk) 21:25, 29 September 2008 (UTC)

Now rewritten and organized. Tautologist (talk) 21:29, 29 September 2008 (UTC)

Forecasting and prediction

This article needs a section on forecasting and prediction. Tautologist (talk) 01:05, 28 September 2008 (UTC)

Calling participants "patients"

I have a concern which might warrant a discussion. I believe it is misleading to call participants of clinical trials "patients." General dictionary definitions list the first entry for patient as something along the lines of "somebody who receives medical treatment." Further, the Merriam-Webster Medical Dictionary says, "a sick individual especially when awaiting or under the care and treatment of a physician or surgeon." In some of the clinical trials I've worked in, it was the opinion of many to stay consistent and use the terms "subject" and "participant" or "study participant." Although the doctor (principal investigator) has a responsibility to care for the participants, the consensus was that it is misleading to refer to them as patients. I have held this belief for some time and have seen sites' standard operating procedures reflect this belief, while some pharmaceutical companies' protocols for study refer to participants as "patients" throughout the entire document. Any insight? There may be some precedent in the Code of Federal Regulations, or in ICH-GCP (good clinical practice) or other related relevant documents. Just an item I think bears discussion... Zach99998 (talk) 05:23, 5 October 2008 (UTC)

In memory of all those who were killed as the result of studies, I think it's a good idea also not to call them patients, and merely 'subjects' of study. —Preceding unsigned comment added by 67.174.157.126 (talk) 19:08, 27 November 2008 (UTC)

hi

May I have an example, like a problem and graph? I will use it for my research paper. Thank you.

(adding datetime stamp to facilitate archive bot —G716 <^T·_C> 03:42, 11 December 2008 (UTC))

External Link Removal

26/01/2009: I fail to see the irrelevance of the rev. 266510578 contribution. It was deemed spam by G716, and I think without due consideration. Obviously the section is there to point to literature relevant to the subject, which it obviously was. I've included the link here for inspection: http://bookboon.com/us/student/statistics "3 Statistics Ebooks for Students", provided by Bookboon.com also Bookboon.com (wiki) —Preceding unsigned comment added by N.j.hansen (talk • contribs) 16:00, 26 January 2009 (UTC)

List of Classical Books and Contributors to the Statistics field

I read this article with interest, and missed a list of classical books on Statistics or a list of contributors who have done the most for the field of Statistics in the past century. 76.30.100.77 (talk) 15:31, 14 March 2009 (UTC) Narayana Subramaniam

These may be what you are looking for:

—G716 <^T·_C> 22:49, 14 April 2009 (UTC)

Introduction: design and sampling need greater emphasis (equal to "data analysis" imho)

Statistics is better described as the methodology of scientific practice, which is concerned with both "making sense of data" and "producing data that makes sense".

(The practical concern with scientific practice distinguishes statistics from the philosophy of science.)

The introduction emphasizes the (passive) analysis of data, not the (active) production of useful data-sets, using the design of experiments and sampling.

The current description over-emphasizes modelling data---abduction, rather than inductive procedures whose properties are studied with deduction (to use the Peircean trilogy of deduction, induction, and abduction).

The title of C.R. Rao's book gives a better synopsis of our discipline: "Statistics: Putting Chance to Work". A greater emphasis on experiments and sampling appears in the best works by the greatest statisticians---Fisher, Rao, Cochran, Peirce, Neyman, Cox, etc. —Preceding unsigned comment added by Kiefer.Wolfowitz (talk • contribs) 13:22, 24 May 2009 (UTC)

A suggested description of statistics

Statisticians (working as part of a research project) "create data that makes sense" with random sampling and with randomized experiments; the design of a statistical sample or experiment specifies the analysis of the data (before the data are available). When reconsidering data from experiments and samples or when analyzing data from observational studies, statisticians "make sense of the data" using the art of modelling and the theory of inference---with model selection and estimation; the estimated models and consequential predictions should be tested on new data.

From my contribution to description of statistics under mathematics

(UPDATED Kiefer.Wolfowitz (talk) 14:25, 26 May 2009 (UTC)) Kiefer.Wolfowitz (talk) 00:48, 26 May 2009 (UTC)

Discussion

This comment was made in response to a previous description of statistics, which had even more flaws. My thanks are due to the criticisms of this colleague ( Kiefer.Wolfowitz (talk) 14:25, 26 May 2009 (UTC) ).

"Statisticians help scientists..." is misleading. First, statisticians are scientists, and second, statistics has many applications outside the fields of "science" (law, public policy, business decision making, etc). Furthermore, "when scientists produce data from observational studies or bad experiments, statisticians "make sense of the data" " is nonsense: statisticians "make sense of data" (modelling, estimation, prediction, hypothesis testing, inference, etc) regardless of the source: observational or designed studies, good or bad experiments alike. What is a "bad" experiment anyway? —G716 <^T·_C> 02:45, 26 May 2009 (UTC)

REPLY: My earlier reply seems no longer relevant, since the updated revision was motivated to meet the above objections. Kiefer.Wolfowitz (talk) 10:49, 26 May 2009 (UTC) Kiefer.Wolfowitz (talk) 14:25, 26 May 2009 (UTC)

I do agree that design of experiments is a very big part of the subject. However I don't think we need a complete rewrite of the lead. The current lead does a good job of summarising a lot in a few paragraphs; a minor tweak might be more appropriate. --Salix (talk): 11:43, 26 May 2009 (UTC)

In the lead paragraph, I inserted the following sentence [which I hope falls under (permissible) "tweaking"]: "Statisticians improve the quality of data with the design of experiments and survey sampling." Kiefer.Wolfowitz (talk) 17:14, 7 June 2009 (UTC)

Statistics is a mathematical science but not part of applied mathematics

The mathematics article annexed statistics as a field of applied mathematics.

My correction follows:

Applied mathematics has significant overlap with the discipline of statistics, whose theory is formulated mathematically, especially with probability theory.^[1]

Kiefer.Wolfowitz (talk) 00:48, 26 May 2009 (UTC) Kiefer.Wolfowitz (talk) 14:25, 26 May 2009 (UTC)

what does statistics mean? i still don't get it. a teacher from our school asked us to explain it, but i don't understand. please help me. —Preceding unsigned comment added by 121.97.157.25 (talk) 11:43, 2 June 2009 (UTC)

^ Like the mathematical sciences physics and computer science, statistics is an autonomous discipline, and not a branch of applied mathematics. Like physicists and computer scientists, research statisticians are mathematical scientists. Many statisticians have a degree in mathematics, and some statisticians are also mathematicians.
