User:Smalljim/AddBad

This user subpage is currently inactive and is retained for historical reference.
It was last substantively updated 21 April 2016.
If you want to revive discussion regarding the subject, you might try contacting the user in question or seeking broader input via a forum such as the village pump.

This page in a nutshell: A new generation of applications for recent changes patrol can be created, making better use of the information that's available in the recent changes feeds. A prototype application is described.

Recent changes patrol could be better!

It's possible to create a new generation of applications for recent changes patrol (RCP) on Wikipedia. The current tools don't make optimal use of all the information that's available in Wikipedia's IRC recent changes feed.

Available information

The information in a recent changes feed includes:

editor name or IP address
page edited, created or deleted
page size change
edit summaries, including standard ones like "page blanking"
edit filter hits
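
As a rough illustration, the fields listed above could be pulled out of a single feed line as follows. This is a minimal Python sketch (the prototype itself is written in Perl), and it assumes a simplified line layout of `[[Page]] flags url * editor * (+size) summary` once the IRC colour codes are stripped; the real feed's layout should be checked against a live sample. The names `RcEvent` and `parse_rc_line` are hypothetical, and log-type entries such as edit filter hits would need patterns of their own.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Optional

# mIRC colour and formatting control codes, which IRC feeds commonly embed.
CONTROL_CODES = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")

# ASSUMED simplified layout: [[Page]] flags url * editor * (+size) summary
LINE = re.compile(
    r"^\[\[(?P<page>.+?)\]\]\s+(?P<flags>\S*)\s+(?P<url>\S*)\s+\*\s+"
    r"(?P<editor>.+?)\s+\*\s+(?:\((?P<delta>[+-]\d+)\)\s*)?(?P<summary>.*)$"
)


@dataclass
class RcEvent:
    page: str
    flags: str
    editor: str
    editor_is_ip: bool
    size_change: Optional[int]  # None when the line carries no size change
    summary: str


def is_ip_address(name: str) -> bool:
    """Return True if the editor name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def parse_rc_line(raw: str) -> Optional[RcEvent]:
    """Strip control codes and split one feed line into its fields."""
    line = CONTROL_CODES.sub("", raw).strip()
    match = LINE.match(line)
    if match is None:
        return None  # not an edit line in the assumed layout
    delta = match.group("delta")
    editor = match.group("editor")
    return RcEvent(
        page=match.group("page"),
        flags=match.group("flags"),
        editor=editor,
        editor_is_ip=is_ip_address(editor),
        size_change=int(delta) if delta is not None else None,
        summary=match.group("summary"),
    )
```

Returning `None` for lines that don't fit lets the caller skip or log them instead of guessing at fields.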

Configuration

We can have configuration files such as:

whitelisted editors
an IP address – ASN match file^[1]
an IP geolocation file^[1]
watchlists of page names
watchlists of usernames, IPs, ASNs
watchlists of edit summaries

To be most useful the watchlists should allow regular expression matches.
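
A regex watchlist could be as simple as a text file with one pattern per line. The Python sketch below assumes that file format (blank lines and `#` comments ignored, case-insensitive matching); the format and the names `load_watchlist` and `first_match` are illustrative assumptions, not a description of AddBad's own files.

```python
import re
from typing import Iterable, List, Optional


def load_watchlist(lines: Iterable[str]) -> List[re.Pattern]:
    """Compile one regular expression per non-blank, non-comment line."""
    patterns = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        patterns.append(re.compile(text, re.IGNORECASE))
    return patterns


def first_match(patterns: List[re.Pattern], value: str) -> Optional[str]:
    """Return the source of the first pattern found in value, else None."""
    for pattern in patterns:
        if pattern.search(value):
            return pattern.pattern
    return None
```

Because `load_watchlist` accepts any iterable of lines, an open file can be passed in directly, and the same two functions serve for page names, editor names and edit summaries.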

Detection

By using no more than the above information and configuration files we can detect many potentially unwanted actions made by non-whitelisted editors. These include:

edits by those who have had edit filter hits
further edits by editors who have already been reverted (and warned)
further edits to pages that have recently had reverts on them
large additions or removals of content, or blanking of pages
the creation of new pages by editors who have already been reverted or had pages deleted
unusually fast or prolific editing
edits to frequently-vandalised pages
IP editors making similar edits using the same ISP or from the same area
matches on edit summaries
and several others

Although one of these actions in isolation may not be problematic, repeated actions or a combination of more than one of them is much more likely to be.
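
To take just one item from the list, "unusually fast or prolific editing" can be detected with a sliding time window per editor. The Python sketch below is only an illustration: the class name `EditRateMonitor` and the default thresholds are assumptions, and sensible values would have to be found by experiment.

```python
from collections import defaultdict, deque


class EditRateMonitor:
    """Flag editors who make more than max_edits edits within window_seconds."""

    def __init__(self, max_edits: int = 5, window_seconds: float = 60.0):
        # Both thresholds are ASSUMED defaults, not values from AddBad.
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # editor -> timestamps of recent edits

    def record(self, editor: str, timestamp: float) -> bool:
        """Record one edit; return True if the editor is now over the limit."""
        times = self._times[editor]
        times.append(timestamp)
        # Discard edits that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_edits
```

A hit from a detector like this would be one more event to combine with reverts, warnings and filter hits, not a verdict on its own.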

Making use of edit filter hits is possibly the most significant improvement that can be made (not least because the edit filters have access to the text diffs). I think all the large Wikipedias have extensive sets of edit filters that can detect many forms of vandalism and other inappropriate edits.^[2] One can envisage a closer association between the edit filters and a new generation of RCP applications, with the filters being adjusted more interactively.

User interface

After detecting potentially unwanted actions or vandalism, we have to decide how to present the information to the user. This could be minimal: simply presenting the most likely events one after another, as the current applications do. Or we could use an information-rich interface that shows details of all the recent events that pass a threshold, highlighted according to the program's assessment of how bad they are, so that the user can select the events he's most interested in. I prefer this approach.

AddBad

What follows is a description of a prototype application, provisionally called AddBad, that I have been developing to demonstrate the above principles.^[3] As a way of prioritising actions that may be worth looking at, the application awards "badness" points to editors based on events such as reverts, warnings and edit filter hits. AddBad has an information-rich interface and uses colour to highlight edits according to the badness accumulated by the editor.^[4] When running, around one or two potentially bad edits per second are notified (depending on activity, of course), making for a set of easily followed, constantly updating lists which, as can be seen, form a colourful display that is packed with relevant information.

As an example, an editor might accumulate 30 badness points for hitting an edit filter that warns that it has detected swear words in the edit. If the editor persists in posting the edit (despite the automatic warning), it will appear as a relatively low-priority bad edit. A revert and a level 1 warning from another editor (or ClueBot NG) would award, say, 10 + 50 more badness points to the vandal editor. If the vandal then makes another edit, we will be alerted with a brighter highlight reflecting the 90 badness points (30 + 10 + 50) he now has. Further edits that result in reverts or warnings will add more badness, resulting in even brighter highlighting, and so on. If we ourselves revert or warn, a lot of badness is awarded to ensure that we can easily follow his subsequent edits. In the case of false alerts, we can easily zero an editor's badness, add him to an "ignore today" list, or even add him to the whitelist. Alternatively, we can add badness to editors whose actions look suspicious.
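
The arithmetic in this example can be sketched in a few lines of Python. The three scores are the ones quoted above; the class `BadnessLedger`, the highlight thresholds and the CSS class names are hypothetical, chosen only to show the idea of brighter highlighting as badness accumulates.

```python
from collections import defaultdict

# Scores taken from the worked example in the text.
SCORES = {"filter_hit": 30, "revert": 10, "warning_level_1": 50}


class BadnessLedger:
    """Accumulate badness points per editor (hypothetical sketch)."""

    def __init__(self, scores):
        self.scores = dict(scores)
        self.badness = defaultdict(int)
        self.whitelist = set()

    def award(self, editor: str, event: str) -> int:
        """Add the score for an event and return the editor's new total."""
        if editor in self.whitelist:
            return 0  # whitelisted editors never accumulate badness
        self.badness[editor] += self.scores[event]
        return self.badness[editor]

    def points(self, editor: str) -> int:
        return self.badness.get(editor, 0)

    def reset(self, editor: str) -> None:
        """Zero an editor's badness after a false alert."""
        self.badness.pop(editor, None)


def highlight_class(points: int) -> str:
    """Map a badness total to a CSS class; thresholds are ASSUMED."""
    if points <= 0:
        return "bad-none"
    if points < 50:
        return "bad-low"
    if points < 100:
        return "bad-medium"
    return "bad-high"
```

Following the example, a filter hit, a revert and a level 1 warning take an editor to 30, 40 and then 90 points, which moves his edits from the "low" to the "medium" highlight under these assumed thresholds.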

The configuration files in AddBad add a significant aspect that is not utilised by the present generation of anti-vandalism (AV) programs. For example, if an editor name is regex-matched in a config file, then every edit made by that editor is alerted using a distinctive highlight. If that editor hits an edit filter or gets reverted, badness is awarded as above, increasing the highlighting. Or he can be easily ignored if appropriate. Some vandals repeatedly hit the same page or range of pages, using different IP addresses or account names: edits by non-whitelisted editors to these pages can be notified too, with extra highlighting if there is a regex match on the editor name or IP, or on the ASN. Because the configuration files are persistent and changes to them can be applied immediately, there's a decent chance that vandalism like this can be tracked over long periods if necessary, even if it evolves.
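
Matching on the ASN means first mapping an IP editor's address to the network that announces it. A minimal Python sketch of that lookup follows, assuming the IP-to-ASN file has already been read into (CIDR, ASN) pairs; the function names are hypothetical and the addresses and AS numbers are reserved documentation values, not real data.

```python
import ipaddress
from typing import Optional


def build_table(rows):
    """Turn (cidr, asn) pairs into a table sorted most-specific first."""
    table = [(ipaddress.ip_network(cidr), asn) for cidr, asn in rows]
    # Longest prefix first, so that a narrower block wins over a wider one.
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table


def lookup_asn(table, editor: str) -> Optional[int]:
    """Return the ASN for an IP editor, or None for usernames and unknown IPs."""
    try:
        address = ipaddress.ip_address(editor)
    except ValueError:
        return None  # the editor is a username, not an IP address
    for network, asn in table:
        if address.version == network.version and address in network:
            return asn
    return None
```

The linear scan keeps the sketch short; a full-size table would want a faster structure such as a prefix tree. The resulting ASN can then be checked against an ASN watchlist in the same way as an editor name.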

Customisation of the config files also allows AddBad to be adapted to focus on particular aspects that the user is interested in; and further tailoring can be achieved by adjusting the badness points awarded for each type of action (each edit filter can have a different score, for example). This customisation would be beneficial when several recent changes patrollers are online at the same time, since it would reduce the likelihood that they are all chasing the same bad edits: a phenomenon with which every Huggle user will be familiar.

In addition to all the above, new page creations by non-whitelisted editors are displayed, as are speedy deletion requests and deletions of those pages. Reverts, warnings and blocks are shown as they happen too, as well as other relevant events such as AIV reports. It's quite possible to leave the application running in the background while working on something else and only take action when vividly-highlighted edits appear, or to sit back and just watch as the seamier side of Wikipedia is acted out before your eyes. It's reassuring to see how much vandalism is quickly reverted by the dedicated band of recent changes patrollers using the existing tools – but AddBad regularly reveals unwanted edits that have been missed by others.

Application details

In its prototype form, AddBad is a set of Perl scripts, with a bit of jQuery to make the web interface work. One script collects, massages and stores the IRC recent changes feed. A second script tails the output of the first, analyses each line to determine if, where, and how it should be shown, and uses Ajax to regularly update the scrolling lists on its webpage, as shown above (which is served from a local web server). Clicks on individual entries on the webpage can show diffs, or (at present) send an edit history or page history to the program user's logged-in Wikipedia session for processing.^[5] An integrated front end for reverting etc. (like Huggle's) would convert it into a fully-fledged anti-vandalism program.
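
The "tailing" step of the second script can be approximated by polling the stored feed file for newly appended lines. The sketch below is in Python, not the prototype's Perl, and only shows the idea: `read_new_lines` is a hypothetical helper, and the assumption is that the collector appends one complete event per line to a plain text file.

```python
def read_new_lines(path: str, offset: int = 0):
    """Return (complete lines appended since offset, new byte offset)."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            raw = handle.readline()
            if not raw.endswith(b"\n"):
                break  # end of file, or a line the collector is still writing
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = handle.tell()  # advance only past complete lines
    return lines, offset
```

A driver loop would call this every second or so, pass each returned line to the analysis step, and keep the returned offset for the next call. Leaving half-written lines for the next poll means an event is never analysed before it is complete.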

At present I don't plan to put in the additional work that would make AddBad suitable for wider use, but could be persuaded if there's enough interest. However, I hope these notes describe some useful principles for anyone interested in creating a new-generation recent changes patrol or anti-vandalism program (or enhancing the existing applications). I'd be happy to discuss these principles with any bona fide editors.

Notes

[1] These are freely available from MaxMind.
[2] As of May 2015, en.wikipedia had 128 live filters, including the then recently-added "chicken fucker" filter.
[3] Development has proceeded intermittently since February 2014, and I'm still tweaking it as of April 2016. AddBad is short for "Adds Badness", which is how the application prioritises events.
[4] The highlighting is CSS-based, so it's simple to tweak or change completely.
[5] I use Twinkle to help with this, which explains the great increase in my edits tagged (TW) since March 2014.
