Wikipedia:Overcategorization

Categorization is a Wikipedia feature used to group pages for ease of navigation and to correlate similar information. However, not every verifiable fact (or the intersection of two or more such facts) in an article requires creating an associated category. For some article topics, this could potentially result in hundreds of categories, most of which aren't particularly relevant. This may also make it more difficult to find any particular category for a specific article. Such overcategorization is also known as "category clutter".

To address these concerns, this page lists types of categories that should generally be avoided. Based on existing guidelines and precedent at Wikipedia:Categories for discussion, such categories, if created, are likely to be deleted.

Non-defining characteristics

See also: Wikipedia:Categorizing articles about people#Defining and Wikipedia:Defining

One of the central goals of the categorization system is to categorize articles by their defining characteristics:

The defining characteristics of an article's topic are central to categorizing the article. A defining characteristic is one that reliable sources commonly and consistently refer to[1] in describing the topic, such as the nationality of a person or the geographic location of a place.

Categorization by non-defining characteristics should be avoided. It is sometimes difficult to know whether or not a particular characteristic is "defining" for any given topic, and there is no one definition that can apply to all situations. However, the following suggestions or rules-of-thumb may be helpful:

  • a defining characteristic is one that reliable, secondary sources commonly and consistently define, in prose, the subject as having. For example: "Subject is an adjective noun ..." or "Subject, an adjective noun, ...". If such examples are common, each of adjective and noun may be deemed to be "defining" for subject.
  • if the characteristic would not be appropriate to mention in the lead section of an article (determined without regard to whether it is mentioned in the lead), it is probably not defining;
  • if the characteristic falls within any of the forms of overcategorization mentioned on this page, it is probably not defining.

Often, users can become confused between the standards of notability, verifiability, and "definingness". Notability is the test that is used to determine whether a topic should have its own article. This test, combined with the test of verifiability, is used to determine whether particular information should be included in an article about a topic. Definingness is the test that is used to determine whether a category should be created for a particular attribute of a topic. In general, it is much easier to verifiably demonstrate that a particular characteristic is notable than to prove that it is a defining characteristic of the topic. In cases where a particular attribute about a topic is verifiable and notable but not defining, or where doubt exists, creation of a list is often the preferred alternative.

It is recommended to name or rename categories to have as little vagueness as possible, discouraging non-defining articles from being added. If you have just invented a subcategory on the spot that lacks a main article, it may not be a defining attribute. Examples include:

In disputed cases, the categories for discussion process may be used to determine whether a particular characteristic is defining or not. For example, there is consensus that places should not be categorized as established in the year of the earliest surviving historical record of the place.

Trivial characteristics

Avoid categorizing topics by characteristics that are unrelated or wholly peripheral to the topic's notability.

For biographical articles, it is usual to categorize by such aspects as the subject's career, origins, and major accomplishments. In contrast, someone's tastes in food, their favorite holiday destination, or the number of tattoos they have would be considered trivial. An item that is appropriate to include in an article may still be inappropriate for categorization. In general, if something could be easily left out of a biography, it is likely that it is a trivial characteristic.

Also avoid categorizing people by information associated with a person's death, such as the age at which the person died, the place of the person's death, or by whether the person still had unreleased or unpublished work at the time of their death.

Subjective inclusion criteria

Adjectives which imply a subjective, vague, or inherently non-neutral inclusion criterion should not be used in naming/defining a category. Examples include subjective descriptions (famous, popular, notable, great, important), any reference to relative size (large, small, tall, short), relative distance (near, far), or personal trait (beautiful, evil, friendly, greedy, honest, intelligent, old, ugly, young).

Arbitrary inclusion criteria

There is no particular reason for choosing "7%", "$30,000", or the 100th episode as cutoff points in these cases. Likewise, a school district with 3,800 students is not meaningfully different from one with 4,100 students. A better way of representing this kind of information is to make it a list, either in an existing article, or as a separate list, such as "List of school districts in (region) by size". Note that Wikipedia allows a table to be made sortable by any column.
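
As a rough illustration of the list alternative, a sortable table could be written in wikitext along the following lines. This is only a sketch: the district names are hypothetical placeholders, the enrollment figures simply reuse the numbers from the paragraph above, and the `data-sort-type` attribute is optional.

```wikitext
<!-- Sketch only: district names are hypothetical placeholders. -->
{| class="wikitable sortable"
|+ School districts in (region) by size
! District
! data-sort-type="number" | Students
|-
| Example District A
| 3,800
|-
| Example District B
| 4,100
|}
```

Readers can then sort by either column, so no cutoff point needs to be chosen.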

Intersection by year or time period

Categorizing by year (or group of years, such as by decade, by century, or even by historical era) is not generally considered an #ARBITRARY division for categorization.

However, avoid creating a category tree of individual by-year categories with very few members (see also #NARROW). In that situation, consider grouping them by the next tier up: for example, group by decade instead of by year, and diffuse the by-decade categories by year only when necessary. This applies to any time period, such as months grouped into years, or years or decades grouped into centuries.

Similarly, if two or more by-year categories have a large #OVERLAP (for example, because many athletes participate in multiple all-star games, or because religious leadership does not usually change from year to year), it is generally better to (up)merge to the (non-year) parent category of the topic, and then diffuse as appropriate.

In addition, people are categorized by time period only if their activity in that time period is a #DEFINING characteristic.

For example:

  • a writer who lived from 1850 to 1910 and wrote their only work in 1908 should be categorized under Category:20th-century writers. They did no notable writing in the 19th century, so should not be included in Category:19th-century writers;
  • an English soldier born in 1590 and notable for military service in the 1620s should not be categorized in Category:People of the Tudor period, since their defining characteristic relates to years after the Tudor period ended in 1603.

While people may be categorized by the year of their birth and year of death, do not categorize people by day or month of birth or death. (See also list of CFD examples here.)

When categorizing by time period, clearly state the inclusion criteria at the top of the category. For example, "This category is for politicians who were active in the 19th century" is not the same as "This category is for politicians who were born in the 19th century".

Intersection by location

Categorizing by the geographic boundary of a polity can be a way to divide subjects into regions that are directly related to the subjects' characteristics. Location may also be used as a way to diffuse a large category into subcategories, for example, Category:American writers by state.

However, avoid sub-categorizing subjects by location if that location does not have any relevant bearing on the subjects' other characteristics. For example, quarterbacks' careers are not defined merely by the specific state that they once lived in (unless they played for a team within that state).

People should not be categorized by place of residence, if the person has never resided in that place. The place of residence of parents and relatives is never #DEFINING and rarely notable.

And while the place of a person's birth may seem significant from the perspective of local studies, it is rarely defining from the perspective of the individual. The place of death is not normally categorized; consider using a list if this relates to a specific place or event. If it is relevant to identify the place of burial (either from the perspective of the person or the burial place), then someone buried in a less notable cemetery, or in a place with just a few notable burials, should be recorded in a list within the article about the burial place. However, if the burial place is notable in its own right and has too many other notable people to list, then such burials may be categorized.

Narrow intersection

Categories which intersect two (or more) topics or characteristics can result in very narrow categories with few members. Such categories should only be created when both parent categories are large enough for diffusion to be an option, and when similar intersections can be made for related categories. A common way to address such narrow categorization is to selectively "Up-Merge" the contents of the category to its parent categories.

For example, if an article is in category "A" and in category "B", a category "A and B" does not necessarily need to be created for this article.
Similarly, while an article in categories A, B, and C could potentially be placed in categories "A and B", "B and C", and "A and C", creating a "triple intersection" category "A, B, and C" should generally be avoided.

Miscellaneous categories

It is not necessary to completely empty every parent category into sub-categories. So do not categorize articles into "miscellaneous", "other", "not otherwise specified", or "remainder" categories. Such articles will have little in common. If there are some articles that don't fit appropriately into any of the sub-categories, then leave the articles in the parent category.

Mostly overlapping or duplicative

If a category is mostly duplicative or overlapping with another category (such as the coverage of "crime" and "crime history"), or if two categories' names are similar enough to have nearly identical inclusion criteria (such as "denial" and "skepticism"), it is generally better to merge the subjects into a single category, and re-categorize any articles or categories which might no longer meet the criteria of the unified target category.

It might also be appropriate to create lists to provide clarity and to detail each of the instances.

Unrelated subjects with shared names

Avoid categorizing by a subject's name when it is a non-defining characteristic of the subject, or by characteristics of the name rather than the subject itself.

For example, a category for unrelated people who happen to be named "Jackson" would be inappropriate. However, categorization may be appropriate if the categorized subjects are directly related, as in a category grouping articles about a specific Jackson family, such as Category:Jackson family (show business).

When considering grouping subjects that share a name, a disambiguation page might be a possible alternative solution.

By being associated with

The problem with saying that something is "associated" with something else is that it can be a #SUBJECTIVE and vague determination. Determining what degree or nature of "association" with a particular subject is necessary to qualify for inclusion in such a category can also be subjective and vague, and any threshold set may fail #ARBITRARY.

However, it may be appropriate to have categories whose title clearly conveys a specific and defined relationship to a specific subject, such as Category:Obama family or Category:Obama administration personnel.

By opinion or preference of an issue or topic

Avoid categorizing people by their personal opinions, even if a reliable source can be found for the opinions. This includes supporters or critics of an issue, personal preferences (such as liking or disliking green beans), and opinions or allegations about the person by other people (e.g. "alleged criminals").

Please note, however, the distinction between holding an opinion and being an activist, as the latter may be a defining characteristic (see Category:Activists).

Potential candidates and nominees

Example: Potential 2008 Republican U.S. Presidential Candidates (deleted in November 2006)

Wikipedia is not a crystal ball. A candidate not yet nominated for public office, the possible next CEO of a certain corporation, a potential member of a sports team, an actor on the short list to play a role, or an award nominee (just to name a few examples) should not be grouped by category. Lists may sometimes be appropriate for such groupings, especially after the passage of the events to which they relate.

Award recipients

A category of award recipients should exist only if receiving the award is a #DEFINING characteristic for the large majority of its notable recipients. And a recipient of an award should be added to a category of award recipients only if receiving the award is a defining characteristic of the recipient.

Per Wikipedia:Categories, lists, and navigation templates, the existence of lists and categories is determined by separate criteria. So regardless of whether a category is created, a list of the recipients may be created (presuming that the list meets the notability criteria). If both a category and a list are viable on the same topic, such a list may make a suitable main article for the category, indicated with the {{Cat main}} template.[2]
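
For instance, if a recipients category and a recipients list both exist, the category page might begin with something like the following. The award name here is a hypothetical placeholder, not a real category or list.

```wikitext
<!-- Hypothetical example: "Example Award" is a placeholder name. -->
{{Cat main|List of Example Award recipients}}
```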

Published list

Books, magazines, websites, and other such publications regularly publish lists of the "top 10" (or some other number) in any particular field. Such lists tend to be #SUBJECTIVE and may be somewhat arbitrary. Some particularly well-known and unique lists such as the Billboard charts may constitute exceptions, although creating categories for them may risk violating the publisher's copyright or trademark.

Venues by event

Avoid categorizing locations by the events or event types that have been held there, such as arenas that have hosted specific sports events or concerts, convention centers that have hosted specific conventions or meetings, or cities featured in specific television shows that film at multiple locations.

Likewise, avoid categorizing events by their hosting locations. Many notable locations (e.g. Madison Square Garden) have hosted so many sports events and conventions over time that categories listing all such events would not be readable.

However, categories that indicate how a specific facility is regularly used in a specific and notable way for some or all of the year (such as Category:National Basketball Association venues) may sometimes be appropriate.

Performers by performance

Avoid categorizing performers by their performances. Examples of "performers" include (but are not limited to) actors/actresses (including pornographic actors), comedians, dancers, models, orators, singers, etc.

This includes categorizing a production by performers' performances. For example, just as we shouldn't categorize a performer by action or appearance, we shouldn't categorize a production by a performer's action or appearance in that production.

Performers by action or appearance

Avoid categorizing performers by some action they may have performed (such as a "pirouette", a "runway walk", a "spit take", a "sword fight", "anal sex", etc.); some method of performance (such as while standing on their head, left-handed, etc.); or how they may have chosen to appear (such as bald, veiled, etc.).

Performers by role or composition

Avoid categories which categorize performers by their portrayal of a role. This includes:

  • Performers who have portrayed <character name>
  • Performers who have portrayed <a type of character>
  • Performers who have performed <a specific work>

This also includes voicing or dubbing characters, whether in live action (such as Darth Vader or Ultraman) or in animation (such as Bugs Bunny or Donald Duck), even if the "voice" in question is animal sounds or other specific sound effects.

Similarly, avoid categorizing artists based on producers, film directors or other artists they have worked with (such as "George Martin musicians" or "Steven Spielberg actors"). Performers are defined by their body of work, not by the people they have #ASSOCIATED with professionally. For example, Tom Hanks is distinguished by his performances as an actor, not by the fact that he has appeared in Steven Spielberg's films.

Performers by production or performance venue

  • Performers who have performed at <location>
  • Performers who have performed on <production>

Avoid categorizing performers by an appearance at an event or other performance venue. This also includes categorization by performance—even for permanent or recurring roles—in any specific radio, television, film, or theatrical production (such as The Jack Benny Program, M*A*S*H, Star Wars, or The Phantom of the Opera).

Note also that performers should not be categorized into a general category which groups topics about a particular performance venue or production (e.g. Category:Star Trek), when the specific performance category would be deleted (e.g. Category:Star Trek script writers).

Role or composition by performer

  • <Characters> who have been portrayed by a specific performer
  • <Types of characters> which have been portrayed by a specific performer
  • <Works> which have been portrayed by a specific performer

Avoid categorizing characters or specific works by the performers who have portrayed them or appeared in them. A typical film or television series has many actors in various roles, so categorizing by actor results in needless clutter. Similarly, some roles, particularly animated ones like Woody Woodpecker and historical/mythological figures like Hercules, have been performed by multiple actors, and being performed by a particular actor is seldom a defining trait for such roles.

Notes

  1. ^ in declarative statements, rather than table or list form
  2. ^ Per this RfC

See also