Wikipedia:WikiProject Copyright Cleanup/2023 backlog drive

Instructions

For new users

Simplified flowchart (please also read the pages below!)

Firstly, thank you for taking the time to help clear the backlog! Your efforts are appreciated. Copyright is a complex and nuanced topic, so we recommend that you start with the easier backlogs, linked below. For CCI, these are mainly pages that involve copying from non-free websites, so no offline research is required. For the categories, all suspected violations should have a source URL.

An exhaustive list of instructions for handling text-based copyright violations is available at the top of the copyright problems page. A good guide on how to start editing at CCI is User:Moneytrees/CCI guide. A brief rundown of handling CCIs follows; it is no substitute for reading the relevant pages:

  • Check for dead links; if there are any, use IABot to restore them
  • Run the page through Earwig's copyright detector to get a cursory score. Mirrors often copy from Wikipedia, so make sure to identify and ignore them.
  • Check the article's sources and compare them to the existing text. WP:REX may be helpful for hard-to-access sources.
  • If you have identified any possibly infringing content with a source
    • Check the page's licence: is it compatible per WP:COMPLIC?
    • If the content is not compatible, remove or rewrite it with a link to the source material in the edit summary
    • Remove the diff from the CCI page and mark it with {{y}}. Mark the article talk with {{CCI}}
  • If you have identified any possibly infringing content without a source
    • In case of content added by repeat copyright violators at CCI, the content may be presumptively removed
      • Please note this in your edit summary, linking to the CCI page if applicable
    • Otherwise, if you still suspect the content of being plagiarised from a non-free source, removing it under other policies (e.g. if it's unreferenced) may be appropriate.
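
The rundown above can be read as a small decision procedure. The sketch below is only an illustration: the `Finding` record and the `next_steps` function are hypothetical names invented for this example, not an existing tool, and the branch for compatibly licensed sources is an assumption because the rundown does not spell it out. It is no substitute for the judgement calls described on the relevant pages.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record of what a reviewer found in one diff."""
    has_source: bool                   # a source for the suspect text was identified
    licence_compatible: bool = False   # source licence is compatible per WP:COMPLIC
    repeat_violator: bool = False      # content was added by a repeat violator at CCI


def next_steps(finding: Finding) -> list[str]:
    """Return the steps from the rundown that apply to this finding."""
    if finding.has_source:
        if finding.licence_compatible:
            # Assumption: the rundown does not cover this branch, so defer to WP:COMPLIC.
            return ["see WP:COMPLIC for how to handle compatibly licensed content"]
        return [
            "remove or rewrite the content, linking the source in the edit summary",
            "remove the diff from the CCI page and mark it with {{y}}",
            "mark the article talk with {{CCI}}",
        ]
    if finding.repeat_violator:
        return [
            "presumptively remove the content",
            "note this in the edit summary, linking to the CCI page if applicable",
        ]
    return ["if still suspected, consider removal under other policies (e.g. unreferenced)"]


for step in next_steps(Finding(has_source=True)):
    print("-", step)
```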

Please do not hesitate to ask any experienced editors for help.

For returning users

Welcome back, and thanks for taking part. This drive mainly focuses on CCI; the rewards system is set out below.

Rewards system

For articles at CCI...

  • Handling a diff <1k bytes - one point
  • Handling a diff >1k bytes - two points

For everything else...

  • Handling any article - two points
  • Reviewing all diffs of an article - four points
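
As a worked example of the arithmetic, the point rules above could be tallied as in the sketch below. The function names are hypothetical. Two details are assumptions, since the rules do not state them: "1k" is taken to mean 1,000 bytes, and a diff of exactly 1,000 bytes is scored as two points.

```python
def cci_diff_points(diff_bytes: int) -> int:
    """Points for handling one CCI diff of the given size in bytes."""
    # Assumption: "1k" means 1,000 bytes, and a diff of exactly 1,000 bytes
    # (a case the rules do not cover) counts as the larger class.
    return 1 if diff_bytes < 1000 else 2


def drive_points(cci_diff_sizes, other_articles=0, full_reviews=0):
    """Total points: CCI diffs plus the two 'everything else' categories."""
    cci = sum(cci_diff_points(size) for size in cci_diff_sizes)
    return cci + 2 * other_articles + 4 * full_reviews


# Three CCI diffs (two small, one large), two other articles, one full review.
total = drive_points([250, 800, 4200], other_articles=2, full_reviews=1)
print(total)  # 1 + 1 + 2 + 4 + 4 = 12
```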

Awards

Minimum                     Template
5 points                    The Invisible Barnstar
10 points                   The Working Wikipedian's Barnstar
25 points                   The Tireless Contributor Barnstar
50 points                   The Cleanup Barnstar
100 points                  The Copyright Cleanup Barnstar
200 points                  The Great Copyright Drive Barnstar
500 points                  The Order of the Superior Scribe of Wikipedia
Re-reviewing 25 articles    The Teamwork Barnstar
In addition, the person who accumulates the most points during the backlog elimination drive will receive the Copyright Review Medal of Merit.
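
The point thresholds above amount to a simple lookup. The sketch below uses the hypothetical names `AWARD_TIERS` and `highest_award`, and it assumes that a participant receives only the highest tier reached, which the awards list does not actually state. The Teamwork Barnstar and the Medal of Merit are left out because they are not awarded on a points threshold.

```python
# (minimum points, award) pairs copied from the awards list, lowest first.
AWARD_TIERS = [
    (5, "The Invisible Barnstar"),
    (10, "The Working Wikipedian's Barnstar"),
    (25, "The Tireless Contributor Barnstar"),
    (50, "The Cleanup Barnstar"),
    (100, "The Copyright Cleanup Barnstar"),
    (200, "The Great Copyright Drive Barnstar"),
    (500, "The Order of the Superior Scribe of Wikipedia"),
]


def highest_award(points):
    """Return the highest award whose minimum is met, or None below 5 points."""
    earned = None
    for minimum, name in AWARD_TIERS:
        if points >= minimum:
            earned = name  # tiers are sorted, so the last match is the highest
    return earned


print(highest_award(120))  # The Copyright Cleanup Barnstar
```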

Beginner friendly CCIs

Category backlogs to clear


Construction

Currently, there are significant backlogs in the three principal queues of copyright cleanup: CCI, CP and CopyPatrol. Other parts of the project have made significant progress in clearing their backlogs by gamifying reviews and offering rewards for reaching a certain number of points. Whilst a backlog drive is appealing, a gamified approach may not be effective with respect to copyright.

The Backlog (August 2023)

Based on rough estimates and database counts, copyright backlogs on Wikipedia are:

  • CCI currently has over 100,000 remaining diffs to be reviewed
  • CopyPatrol has roughly 70 open reports at any given time
  • CP is at a manageable level for now

Rough ideas

  • Backlog drive where we reward points for older CCIs
  • Focus on a large CCI that's easier for beginners to tackle (rtkat3, werldwayd, etc.)
  • Tackle low-risk stuff towards the end of CCIs
  • Clear out Category:Copied and pasted articles and sections with url provided, so it doesn't have to be listed at CP
    • It is not too big, so we could evaluate each one like a CCI review
  • Bot to collate number of articles fixed
  • ?

Development

Rewards system

Most backlog drives make use of a point/article system, and this would make sense here: barnstars, etc. could be given out for meeting certain criteria, in a similar manner to the GAN drive. Tallying points can be automated relatively easily: the NPP drive made use of bots to collect data such as the backlog size and user points.

The main problem is quality. Unlike in the drives above, it is much more difficult to review individual users' work, not only because of the sheer number of pages but also because far fewer editors have sufficient copyright experience than have GAN or NPP experience. However, we could still probably maintain a relatively high standard by checking a set sample, the size of which will have to be decided. One per 25 pages may be a good starting point, but if this proves to be an issue we can amend it as appropriate.
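
To make the one-per-25 suggestion concrete, here is a minimal sketch of how a spot-check sample might be drawn from the pages a participant handled. The function name `pick_sample`, the random selection, and the choice to round up (so that anyone who handled at least one page gets at least one check) are all assumptions for illustration; none of them has been decided.

```python
import math
import random


def pick_sample(pages, rate=25, seed=None):
    """Pick roughly one page per `rate` handled pages for a second look."""
    pages = list(pages)
    if not pages:
        return []
    rng = random.Random(seed)  # pass a seed to make the selection reproducible
    # Assumption: round up, so 1 to 25 pages gives one check, 26 to 50 gives two.
    sample_size = math.ceil(len(pages) / rate)
    return rng.sample(pages, sample_size)


handled = [f"Page {i}" for i in range(1, 61)]  # 60 hypothetical page titles
print(len(pick_sample(handled, seed=1)))  # 3
```

If the sample turns out to be too small or too large, only the `rate` argument needs to change.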