User:Pascal666/check

From Wikipedia, the free encyclopedia

Though this project is still active, it has been mostly superseded by Wikipedia:WikiProject Check Wikipedia, a much larger set of tests that extends across many languages. I am currently working to better document the validity checks I have been performing so they can be merged into the larger project.

Reports currently posted on Wikipedia

History

This project dates from June 2006, when the only test I was running looked for template elements inside articles (as documented much more thoroughly on my userpage) that indicate a substituted template. I have expanded it over the years to include many more checks. I fix most of the problems found myself, but a couple of reports are so large that I have not attempted to fix everything in them. Some of these reports, as well as others requested by other Wikipedians, can be found above.

Technical details

All of my checks are run in Perl on en.wiki database dumps. When analyzing enwiki-xxxxxxxx-pages-articles.xml.bz2 I use Parse::MediaWikiDump and IO::Uncompress::Bunzip2. The latter allows me to analyze the dump without having to expand it to disk; it is instead decompressed in memory on the fly. For the rest of the dump files I use IO::Uncompress::Gunzip to decompress on the fly and access the data directly with my own algorithms.
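
The streaming approach can be sketched as follows. This is a Python illustration, not the Perl code actually used: it substitutes the standard bz2 and xml.etree.ElementTree modules for IO::Uncompress::Bunzip2 and Parse::MediaWikiDump, and the function names iter_pages and iter_dump are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> read from a binary file-like object."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the page's contents once it has been handled


def iter_dump(path):
    """Decompress a .xml.bz2 dump on the fly; nothing is expanded to disk."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Each check can then loop over iter_dump("enwiki-xxxxxxxx-pages-articles.xml.bz2") and inspect one page at a time.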

Many of the checks below include Perl regexes.

Substituted templates

The original purpose of this project was to find templates that had been substituted into articles when they should have been simply included.

Wikimarkup

Certain wikimarkup should never be found on pages that are not included in other pages. This wikimarkup is usually only found in templates that should never be subst'd.

Interwikis

Templates often exist in multiple languages. When a template gets subst'd its interwikis get placed into the article as well. Since an article should never contain an interwiki pointing to the template namespace in a foreign language, the existence of these interwikis can be indicative of a subst'd template. Example: [1]. They can also be caused by a template not having noincludes around its interwikis. Example: [2]. Both problems need to be fixed.
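
A minimal sketch of this check, written in Python for illustration rather than the Perl actually used. The list of template namespace names is an assumption covering only "Template" and a few localized names (German, French, Spanish); a real run would need the name for every language. The function name find_template_interwikis is hypothetical.

```python
import re

# Illustrative subset only: "Template" plus a few localized namespace names.
TEMPLATE_NAMESPACES = ("Template", "Vorlage", "Modèle", "Plantilla")

# Matches [[xx:Namespace:Title]] where xx looks like a language code.
INTERWIKI_TEMPLATE = re.compile(
    r"\[\[\s*([a-z]{2,3}(?:-[a-z]+)*)\s*:\s*(?:%s)\s*:\s*([^\]|]+?)\s*\]\]"
    % "|".join(map(re.escape, TEMPLATE_NAMESPACES)),
    re.IGNORECASE,
)


def find_template_interwikis(wikitext):
    """Return (language, template title) for each interwiki into a template namespace."""
    return [
        (m.group(1).lower(), m.group(2))
        for m in INTERWIKI_TEMPLATE.finditer(wikitext)
    ]
```

An article for which this returns a non-empty list is only a candidate: as noted above, the cause may be a subst'd template or a template missing noincludes.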

Categories

Certain elements should never appear inside a "[[Category:xxx]]".

Least wanted categories

Special:WantedCategories only includes the top 1000 wanted categories. Many categories on this list have simply not yet been created. The bottom of the wanted categories list (the least wanted categories) contains mostly typos. That is, these categories are not really wanted; they are the result of pages that have simply been miscategorized. This report was requested by a user who planned to fix these typos (but didn't realize how many there are) and is currently posted at User:Pascal666/cats.
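
One way to build such a list, sketched in Python rather than the Perl actually used. The inputs are assumptions: page_categories maps each page title to the categories it uses, and existing is the set of categories that have a page. The function name least_wanted is hypothetical.

```python
from collections import Counter


def least_wanted(page_categories, existing, limit=10):
    """Return non-existent categories with the fewest member pages first."""
    wanted = Counter(
        cat
        for cats in page_categories.values()
        for cat in set(cats)  # count each page once per category
        if cat not in existing
    )
    # Sort by member count, then by name so ties come out in a stable order.
    return sorted(wanted.items(), key=lambda item: (item[1], item[0]))[:limit]
```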

Wrong case cats

Categories are case sensitive. If a page is in a non-existent category whose name differs from that of an existing category only in capitalization, the user probably intended to put the page in the existing category. Example: [3]

Wrong hyphenation cats

If a page is in a non-existent category whose name differs from that of an existing category only in hyphenation, the user probably intended to put the page in the existing category. Example: [4]
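
Both the capitalization check and the hyphenation check can be sketched as a lookup on a normalized name. This is a Python illustration, not the Perl actually used; the normalization rule (fold case, drop hyphens, dashes, underscores and spaces) is an assumption and deliberately loose, so its output is a list of candidates to review. The names normalize and suggest_existing are hypothetical.

```python
import re


def normalize(name):
    """Fold case and drop hyphens, dashes, underscores and whitespace."""
    return re.sub(r"[\s_\-\u2010-\u2015]+", "", name).casefold()


def suggest_existing(missing, existing):
    """Map each non-existent category to existing ones with the same normalized name."""
    index = {}
    for name in existing:
        index.setdefault(normalize(name), []).append(name)
    return {
        name: sorted(index[normalize(name)])
        for name in missing
        if normalize(name) in index
    }
```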

Duplicate categories

Users will sometimes create a new category not knowing that the category already exists, just with a different capitalization or hyphenation.
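
A sketch of how such duplicates might be found among existing categories, in Python rather than the Perl actually used. The grouping key (case folded, with hyphens, underscores and spaces removed) is an assumption, and find_duplicate_categories is a hypothetical name.

```python
import re
from collections import defaultdict


def find_duplicate_categories(existing):
    """Group existing categories that differ only in case or hyphenation."""
    groups = defaultdict(list)
    for name in existing:
        key = re.sub(r"[\s_\-]+", "", name).casefold()
        groups[key].append(name)
    # Keep only keys shared by more than one category.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```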

Included non-templates

Users often accidentally include a page when they intend to instead create a link to it, or place an article into a category. This check started as simply a text search for "{{Category:" but turned into examining enwiki-xxxxxxxx-templatelinks.sql.gz for any includes outside template space (though many of these are valid anyway so this has required many exceptions). Example: [5]
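
The original text search can be sketched as below. This is a Python illustration, not the Perl actually used, and it covers only the "{{Category:" case, not the later templatelinks analysis; the tolerance for extra spaces and mixed case is an assumption, and find_included_categories is a hypothetical name.

```python
import re

# "{{Category:Name}}" or "{{Category:Name|...}}", allowing stray spaces.
INCLUDED_CATEGORY = re.compile(
    r"\{\{\s*Category\s*:\s*([^|{}]+?)\s*(?=\||\}\})",
    re.IGNORECASE,
)


def find_included_categories(wikitext):
    """Return category names that were included with {{...}} instead of [[...]]."""
    return [m.group(1) for m in INCLUDED_CATEGORY.finditer(wikitext)]
```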

Living people

Birth cats

Anyone in Category:Living people should also be in a births category for a year within the last 123 years. This report is currently posted at User:Pascal666/living.
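
A minimal sketch of this rule, in Python rather than the Perl actually used. It assumes births categories are named "YYYY births" and ignores decade or "missing" variants; the function name missing_recent_birth_cat is hypothetical.

```python
import re
from datetime import date

BIRTH_CAT = re.compile(r"^(\d{4}) births$")


def missing_recent_birth_cat(categories, current_year=None, max_age=123):
    """True if a living person has no births category within the last max_age years."""
    if current_year is None:
        current_year = date.today().year
    if "Living people" not in categories:
        return False  # the rule only applies to living people
    for cat in categories:
        m = BIRTH_CAT.match(cat)
        if m and current_year - max_age <= int(m.group(1)) <= current_year:
            return False
    return True
```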

Death cats

Anyone in Category:Living people should not be in a deaths category as well. Example: [6]
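
This check can be sketched as follows, in Python rather than the Perl actually used. It assumes deaths categories are named like "2005 deaths" or "1990s deaths", and that page_categories maps each page title to its set of categories; living_with_death_cat is a hypothetical name.

```python
import re

DEATH_CAT = re.compile(r"^\d{1,4}s? deaths$")


def living_with_death_cat(page_categories):
    """Return titles that are in Living people and also in a deaths category."""
    return sorted(
        title
        for title, cats in page_categories.items()
        if "Living people" in cats and any(DEATH_CAT.match(c) for c in cats)
    )
```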

Template parameters

By scanning templates for "{{{\w+", a list of the parameters each template accepts can be created. Articles that include each template can then be scanned for parameters it is not designed to accept. In many cases the parameter simply has the wrong case. PascalBot (talk · contribs) was created to fix many of these. Example: [7]
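
The two steps can be sketched as below, in Python rather than the Perl actually used. The regex is the one quoted above; extracting the parameter names from each article's template call is left out, and the function names declared_params and unknown_params are hypothetical.

```python
import re

DECLARED = re.compile(r"\{\{\{(\w+)")


def declared_params(template_text):
    """Collect every parameter name the template source refers to."""
    return set(DECLARED.findall(template_text))


def unknown_params(call_params, declared):
    """Map each unaccepted parameter to a same-name, different-case match, or None."""
    by_lower = {}
    for name in sorted(declared):
        by_lower.setdefault(name.lower(), name)
    return {
        param: by_lower.get(param.lower())
        for param in call_params
        if param not in declared
    }
```

A parameter mapped to a name is a likely wrong-case fix; one mapped to None needs a manual look.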

External links

Links between Wikipedia articles should be accomplished using wikilinks instead of external links to "http://en.wikipedia.org". This report is currently posted at User:Pascal666/external. Example: [8]
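
A sketch of how such links could be found and turned into suggested wikilinks, in Python rather than the Perl actually used. It assumes only URLs of the form "http://en.wikipedia.org/wiki/Title" (or https) are of interest; suggest_wikilinks is a hypothetical name.

```python
import re
from urllib.parse import unquote

WIKI_URL = re.compile(r"https?://en\.wikipedia\.org/wiki/([^\s\[\]|<>]+)")


def suggest_wikilinks(wikitext):
    """Return (url, suggested wikilink) for each external link to en.wikipedia.org."""
    return [
        (m.group(0), "[[" + unquote(m.group(1)).replace("_", " ") + "]]")
        for m in WIKI_URL.finditer(wikitext)
    ]
```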