Wikipedia:Moderator Tools/Automoderator

Tracked in Phabricator
Task T336934

The Moderator Tools team is building an anti-vandalism 'automoderator' tool for Wikimedia projects. It will allow moderators to configure automated reversion of bad edits based on scoring from a machine learning model. In simpler terms, we're building software which performs a similar function to ClueBot NG, but making this available to all language communities. Below you'll find a summary of this project, as well as some English Wikipedia-specific questions we have.

Further details and centralised discussion can be found on MediaWiki, but we wanted to also create a discussion venue on the English Wikipedia to discuss how Automoderator might be used here, particularly because of the existence of ClueBot NG. We recognise that ClueBot NG has been used here for a long time, and has the trust of the community. If English Wikipedia editors don't want to use Automoderator, that's fine! Because ClueBot is specifically trained on English Wikipedia, we may find that Automoderator simply cannot be as accurate or comprehensive. But in the event that we find Automoderator is more effective or accurate than ClueBot NG, we want to ensure the door is open for the community to evaluate either transitioning to Automoderator or having it run in parallel. We might also want to explore building shared features, such as false positive reporting and review, which ClueBot NG could leverage even if Automoderator isn't enabled as a full system.

Please share your thoughts on the talk page here or on MediaWiki. We also have an infrequent newsletter, which you can sign up to here.

Current status: We're looking for input into our measurement plan and invite users to test out Automoderator. We plan to pilot Automoderator on the Indonesian Wikipedia in May.

Summary

A substantial number of edits are made to Wikimedia projects which should unambiguously be undone, reverting a page back to its previous state. Patrollers and administrators have to spend a lot of time manually reviewing and reverting these edits, which contributes to a feeling on many larger wikis that there is an overwhelming amount of work requiring attention compared to the number of active moderators. We would like to reduce these burdens, freeing up moderator time to work on other tasks.

Our hypothesis is: If we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.

Our goals are:

Reduce moderation backlogs by preventing bad edits from entering patroller queues.
Give moderators confidence that automoderation is reliable and is not producing significant false positives.
Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.

We will be researching and exploring this idea during the rest of 2023, and expect to be able to start engineering work by the start of the 2024 calendar year.

We recently presented this project, and other moderator-focused projects, at Wikimania. You can find the session recording here.

Potential solution

We are envisioning a tool which administrators could configure to automatically prevent or revert edits. Reverting edits is the more likely scenario - preventing an edit requires high performance so as not to impact edit save times. Prevention also provides less oversight of which edits are being blocked, and may make it easier for vandals to evade the tool. Moderators should be able to configure whether the tool is active, and have options for how strict the model should be.

A lower threshold would revert more edits, but with a higher false positive rate, while a higher threshold would revert fewer edits, but with greater confidence.
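The threshold trade-off can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `AutomodConfig` fields, the `should_revert` function, and the example scores are assumptions made for illustration and do not reflect Automoderator's actual configuration or model output. It only assumes that the model gives each edit a score between 0 and 1, where a higher score means the edit is more likely to be bad.

```python
from dataclasses import dataclass


@dataclass
class AutomodConfig:
    """Hypothetical community-level settings (names are illustrative only)."""
    enabled: bool = True
    revert_threshold: float = 0.95  # minimum score at which an edit is reverted


def should_revert(score: float, config: AutomodConfig) -> bool:
    """Revert only when the tool is switched on and the score meets the threshold."""
    return config.enabled and score >= config.revert_threshold


# Made-up scores for four edits; higher means "more likely to be vandalism".
scores = [0.50, 0.92, 0.97, 0.99]

cautious = AutomodConfig(revert_threshold=0.98)    # fewer reverts, higher confidence
aggressive = AutomodConfig(revert_threshold=0.90)  # more reverts, more false positives
disabled = AutomodConfig(enabled=False)

reverted_cautious = [s for s in scores if should_revert(s, cautious)]
reverted_aggressive = [s for s in scores if should_revert(s, aggressive)]
reverted_disabled = [s for s in scores if should_revert(s, disabled)]
```

With these made-up scores, the cautious setting reverts one edit and the aggressive setting reverts three, which is the trade-off a community would be weighing when choosing how strict the tool should be.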

We are considering a wide range of possible configuration options so that communities can feel confident in how and when Automoderator would take action.

The technical specifics of this project, such as whether it lives in an Extension or not and what the user account performing reverts would look like, haven't been determined yet. We'll update our project pages with details as we make decisions on these aspects.

Further details on the solution we're exploring are available on MediaWiki.

ClueBot NG

On English Wikipedia this function is currently performed by ClueBot NG, a volunteer-maintained tool which automatically reverts vandalism based on a long-running machine learning model. The bot is configured to have a 0.1% false positive rate, and enables editors to report false positives for review by the community.
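To make the idea of a target false positive rate concrete, here is a rough Python sketch of how a threshold could be chosen to stay within such a target. This is not a description of how ClueBot NG is actually calibrated. The function names, candidate thresholds, and scores are all invented for illustration, and the toy data uses a far looser target (15%) than ClueBot NG's 0.1% so that ten example edits are enough to show the effect.

```python
from typing import Optional, Sequence


def false_positive_rate(good_edit_scores: Sequence[float], threshold: float) -> float:
    """Share of known-good edits that would be reverted at this threshold."""
    if not good_edit_scores:
        return 0.0
    wrongly_reverted = sum(1 for s in good_edit_scores if s >= threshold)
    return wrongly_reverted / len(good_edit_scores)


def lowest_threshold_within(
    good_edit_scores: Sequence[float],
    candidates: Sequence[float],
    max_fpr: float,
) -> Optional[float]:
    """Return the lowest candidate threshold whose false positive rate
    does not exceed max_fpr, or None if no candidate qualifies."""
    for threshold in sorted(candidates):
        if false_positive_rate(good_edit_scores, threshold) <= max_fpr:
            return threshold
    return None


# Made-up model scores for ten edits that human reviewers judged to be good.
good_edit_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.70, 0.85, 0.91, 0.96, 0.99]
candidates = [0.50, 0.80, 0.90, 0.95, 0.98]

chosen = lowest_threshold_within(good_edit_scores, candidates, max_fpr=0.15)
```

Picking the lowest qualifying threshold reverts as many edits as possible while keeping the measured false positive rate within the target. A real evaluation would need a much larger labelled sample to measure a rate as small as 0.1% reliably.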

Based on our analysis, ClueBot NG currently reverts approximately 150-200 edits per day, though it previously reverted considerably more - as many as 1500-2500 per day in 2010.

Questions

We'd like to know more about your experiences with ClueBot NG. Below are some questions we have, but any thoughts you want to share are welcome:

Do you think ClueBot NG has a substantial impact on the volume of edits you need to review?
Do you review ClueBot NG's edits or false positive reports? If so, how do you find this process?
Are there any feature requests you have for ClueBot NG?

Open questions

How would English Wikipedia evaluate Automoderator to decide whether to use it?
Would the community rather test this software early (e.g. in a log-only mode), or wait until later in development, after other communities have trialled it?
What configuration options would you want to see in the tool?