User:Anomie/Sourcing

This essay is a collection of views relating to WP:V, WP:OR, WP:RS, and related policies and guidelines, and how they are misapplied or misinterpreted in practice. Everything here is the opinion of the author; if you agree or disagree, please discuss it on the talk page. Who knows, you may just change my mind.

Notability

WP:N is sometimes misapplied, despite the explicit statement in the guideline itself that "These guidelines pertain to the suitability of article topics but do not directly limit the content of articles." Within the article, the criteria are whether the statement is verifiable and whether the statement is relevant. Part of the confusion here is likely due to two factors: "notable" is used where "relevant" is meant, and "relevant" in the context of Wikipedia articles is not and should not be explicitly defined.

Primary sources

Some editors take a position that amounts to "Primary sources should be forbidden!", and demand secondary sources be used for everything. This is not supported by Wikipedia's policies; secondary sources are preferred, but primary sources are perfectly acceptable in certain contexts.

The main concern with primary sources is that they will often not contain the sort of analysis that should form the bulk of a Wikipedia article, and trying to use them for this purpose is likely to lead to original synthesis. However, unoriginal synthesis is perfectly acceptable, and primary sources can sometimes be used for this purpose. Another major concern is that primary sources alone cannot establish the notability of a topic, but this is a reason to require secondary sources as well, not a reason to forbid primary sources.

More importantly, though, primary sources are ideal to establish certain facts. If it must be established that something is patented in the US, there is no need to find a secondary source that discusses the matter if the patent is available from the USPTO. If it is necessary to establish that a character in a book is frightened by dogs, it is sufficient to cite the page in the book where the character is described as being terrified when confronted by a dog. If it is necessary to establish the fact that some cows are brown, a picture of a group of brown cows is more straightforward than any secondary source.

Many Wikipedia sourcing policies do not make a clear distinction between primary and secondary sources, and many of these policies apply mainly to secondary sources. The distinction itself is simple: "X is true" uses a secondary source and requires that we have cause to believe that the source is accurate, while "S says X is true" uses a primary source and only requires that we believe the source is not fake. On the other hand, "X is true" is likely to be relevant, while "S says X is true" is much more likely to be irrelevant.

Self-published sources

Wikipedia's whole policy on self-published sources regards their suitability as secondary sources, and should be read in that light. A self-published source can also be used as a primary source, most obviously by changing the statement from "X is true" to "Author believes X", or even "Someone believes X". The challenge then is to determine why Author's belief is relevant.

This leads naturally into forum and blog posts as sources, as the objections against their use are based in WP:SPS. If a statement "Some people believe X" needs citation, it should be perfectly acceptable to reference as a primary source a forum thread where people are discussing X, unless there is credible concern that people are misrepresenting their actual views in their own posts.

Note though that a statement "No one believes X" cannot be supported in this way, as absence of proof is not proof of absence and anyone's statement that no one believes X would have to be used as a secondary source. Similarly, "Few believe X", "Most believe X", and similar semi-quantitative statements as well as quantitative statements like "50% believe X" cannot be supported, as the self-selected sample of forum posters is certainly biased in numerous ways and the rigorous statistical analysis required to account for this would quickly become original research. This could be avoided by rephrasing the statement to "50% of posters to Forum believe X", but such a statement is highly likely to be irrelevant.

Original research and synthesis

Wikipedia:No original research is an important policy, although it greatly overlaps with Wikipedia:Verifiability. Original synthesis is also contained in WP:OR; a possible distinction is that "original research" is information that is not supported by sources, while "original synthesis" is the use of information from sources to support a statement that the sources do not actually support. Often, though, everything is lumped under "original research".

Some, however, set the bar for original synthesis too low. Everything in Wikipedia involves synthesis, and unoriginal synthesis is encouraged. There is no clear threshold between "original" and "unoriginal", but the following should be uncontroversial:

  • If the source says 12 kilometers, we can say 7.5 miles. If the source says 100.0°F, we can say 37.78°C or 310.9 K.
  • If source A says one thing and source B says another, we can say that there is disagreement. If the first edition of the book contains a line of dialog and the second edition has this line altered, we can say that the line was changed without having to find a secondary source that points out the obvious.
  • If we have "Socrates is a man" and "All men are mortal", we can use simple logic to say "Socrates is mortal". When using logic, though, be careful of crossing the line into originality.
  • The selection of what is relevant and what is not relevant to the article is not "original research" or "original synthesis"; WP:OR affects what sources are required to have a statement in the article and not how to choose which statements belong or their layout. WP:NPOV, WP:WEIGHT, and the like affect that.
  • Sometimes, someone will attempt to reject a source because the source contains original research. The source may be rejectable based on WP:V or WP:RS grounds, but WP:OR does not apply.
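The unit conversions in the first item above are routine arithmetic that anyone can check. As a minimal sketch (the function names are my own, chosen only for illustration), the figures quoted there can be reproduced as follows:

```python
# Illustrative helpers; the names are hypothetical, not from any library.
KM_PER_MILE = 1.609344  # exact, by the international definition of the mile


def km_to_miles(km: float) -> float:
    """Convert kilometers to statute miles."""
    return km / KM_PER_MILE


def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


miles = km_to_miles(12)
celsius = fahrenheit_to_celsius(100.0)
kelvin = celsius_to_kelvin(celsius)

# Round to the precision used in the list above.
print(f"{miles:.1f} mi, {celsius:.2f} °C, {kelvin:.1f} K")
# prints "7.5 mi, 37.78 °C, 310.9 K"
```

Nothing here goes beyond what the source states; the only choice being made is how many digits to keep.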

Part of the problem comes from the example currently used in WP:SYN:[1]

Here is an example from a Wikipedia article, with the names changed. The article was about Jones:

Smith says that Jones committed plagiarism by copying references from another book. Jones denies this, and says it's acceptable scholarly practice to use other people's books to find new references.

That much is fine. Now comes the unpublished synthesis of published material. The following material was added to that same Wikipedia article just after the above two sentences:

If Jones's claim that he consulted the original sources is false, this would be contrary to the practice recommended in the Chicago Manual of Style, which requires citation of the source actually consulted. The Chicago Manual of Style does not call violating this rule "plagiarism." Instead, plagiarism is defined as using a source's information, ideas, words, or structure without citing them.

This entire paragraph is original research, because it expresses the editor's opinion that, given the Chicago Manual of Style's definition of plagiarism, Jones did not commit it. To make the paragraph consistent with this policy, a reliable source is needed that specifically comments on the Smith and Jones dispute and makes the same point about the Chicago Manual of Style and plagiarism. In other words, that precise analysis must have been published by a reliable source in relation to the topic before it can be published in Wikipedia.

Original synthesis is not the actual issue here! The problem is that the CMS definition of plagiarism is irrelevant unless someone else brings it up (be it Jones, a relevant third party, or a general sentiment used in public discussion of the matter). Unfortunately, discussion of the issues with this example tends to be stonewalled by OR hardliners, so we're stuck with it.

Reliable sources

Wikipedia has many issues with defining just what is a "reliable" secondary source, and in fact there is a great difference of opinion among Wikipedia editors on this issue. This is reflected in the fact that WP:RS is only a guideline rather than a policy, and one of the major objections to WP:ATT is that it elevates aspects of WP:RS to policy.

Part of the problem is that reliability depends on context. Any source that is not fake is an accurate statement of what the author believes. But unless we trust the author or the editorial oversight, we cannot be sure that the source is an accurate statement of fact. This is the distinction that WP:SPS attempts to make.

Further, reliability depends on context in that a source may be reliable for claims it makes in one field and not another. Take, for example, a prominent physicist: his writings related to physics may be completely reliable, while his writings on sociology may be no more reliable than those of anyone else.

Reliability and POV are related, as a source that is distinctly POV is likely to have damaged reliability because it will favor positive reflection of supporting viewpoints and negative reflection of the opposition. Sometimes, though, we can create an NPOV article by contrasting the viewpoints of both sides of the issue; in this case, POV sources used carefully can be helpful.

In popular culture

The reliability guidelines (including WP:RS and parts of WP:V and WP:OR) are geared towards fields where people make a living studying the field and publishing their findings, either for the sake of the knowledge itself or for the sake of gaining readership for their publication. Some fields, often those described as "popular culture", do not get this level of attention. If the field is notable enough to have some mention in reliable secondary sources, we will accept a Wikipedia article on it. But once notability is established, what can we use to write the article when only a minimum of information exists in traditional sources?

Wikipedia:Reliable sources/Examples, which was originally split out of WP:RS per WP:SUMMARY and is still directly referenced from WP:RS, gives some guidance. Forums with active and expert moderation can be acceptable, as can posts or blogs when the identity of the poster can be verified and the poster would be acceptable under WP:SPS. Wikipedia articles are generally considered unreliable, but a well-sourced article (especially a version recently promoted to Featured Article status) could be acceptable if for some reason the sources used in the article cannot be used directly. Unfortunately, many are unaware of or are unwilling to accept this sort of argument.

Some editors have a bias against "popular culture" articles being in Wikipedia at all, and many will apply higher standards to certain types of sources in certain situations. For example, a statement by a Fortune 500 company top executive relating to the company and published on the company website may be accepted without question, while a statement by a leader of a notable pop-culture group relating in the same way to the group and published on the group's website will be questioned.

Conclusions

I now understand why WP:IAR is policy and why WP:NOT#BUREAUCRACY instructs us to avoid instruction creep. Trying to make rules for every situation is hopeless, as is trying to address every acceptable exception to every rule. It's no surprise that WP:IAR is a controversial policy, as the whole point of the rule is to prevent people from using the rules to obstruct progress.