User talk:Crispy1989/Archives/Archive 1

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Constructive training

Hi, I have put some entries on the Constructive and vandalism pages. One idea that you may (or not) have considered is to identify reliable editors known not to vandalise. All changes made by these users would be constructive. This would give a large population for training the bot. A large set of reliable vandalism is harder to obtain, but it might be possible to identify some by using the reverse edits made by these same reliable editors rather than manually identify the original vandalism. --Brian R Hunter (talk) 23:14, 1 April 2008 (UTC)

The problem with this is that a huge difference in the number of vandalism vs nonvandalism edits could cause unreliable operation of the neural network. It's a good idea, and if we can find a way to reliably identify vandalism en masse as well, it could work. Crispy1989 (talk) 20:32, 10 April 2008 (UTC)

I agree that it would be good to balance the numbers. There are two approaches which combined might give useful results in mass identifying vandalism.

1) As above, look for undos self-identified as vandalism performed by our reliable editors. Take the reverse of this edit to be vandalism.
2) Identify editors whose only contribution is vandalism, possibly by following the contribs link of the originator of the vandalism identified in 1 above.

--Brian R Hunter (talk) 20:46, 10 April 2008 (UTC)
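
The two approaches listed above could be sketched roughly as follows. This is only an illustration, not code from the bot: the `Edit` record, the `trusted` set of user names, and the check for "vandal" in the edit summary are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edit:
    # Hypothetical record; field names are assumptions for this sketch.
    rev_id: int
    user: str
    summary: str
    reverts_rev_id: Optional[int] = None  # revision undone by this edit, if any


def vandalism_candidates(edits, trusted):
    """Revision ids undone by trusted editors whose summary mentions vandalism."""
    candidates = set()
    for e in edits:
        if (e.user in trusted and e.reverts_rev_id is not None
                and "vandal" in e.summary.lower()):
            candidates.add(e.reverts_rev_id)
    return candidates


def vandalism_only_users(edits, candidates):
    """Users whose every edit in the sample was flagged as a candidate."""
    flags_by_user = {}
    for e in edits:
        flags_by_user.setdefault(e.user, []).append(e.rev_id in candidates)
    return {user for user, flags in flags_by_user.items() if all(flags)}
```

As noted elsewhere on this page, reverts can themselves be mistaken, so the output would still be a list of candidates for checking rather than a finished dataset.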

I've been considering this too; I think a major problem with using trusted editors as a source for good edits is that it'll bias it hugely. Trusted editors tend to type coherent sentences, have good spelling, not make so many markup errors because they're experienced and because they preview, use edit summaries, and all the other things that tend to be good practice, but the absence of which is not vandalism - we don't want an automatic-newbie-biting-machine ;)

I like the idea of going through reversions, possibly reversions accompanied by suitably strong warnings, to get a vandalism set, though. Pseudomonas_(talk) 10:25, 19 May 2008 (UTC)

Finding vandalism

How about contacting someone like Gurch and asking him if huggle can compile a list for you of all the reverts? That would be quite accurate. Tiddly-Tom 13:01, 3 April 2008 (UTC)

A possible idea but do not blindly follow all reverts as many are done in error and are themselves reverted. Even manual reverts by trusted editors can be wrong - people and bots make mistakes. Brian R Hunter (talk) 13:36, 3 April 2008 (UTC)

Compiling Dataset

Hello Crispy1989

I read your request for examples of vandalism. This part of the request has me puzzled:

"Part of the preprocessor groups words into categories (based on wiktionary categories) for processing by the neural network. If there are any additional wiktionary categories that you think might be pertinent to vandalism (ie, vandalism will show a marked difference in words from those categories that normal edits in a reasonable number of cases), add it to the bottom of the list:"

Is this saying you are looking for categories of articles that are likely to be vandalized?

Or are you looking for something more specific? Wanderer57 (talk) 17:28, 10 April 2008 (UTC)

No, basically there are three things we want.

Thousands of examples of vandalism (diffs).
Thousands of examples of good edits (diffs).
Possibly more Wiktionary (not Wikipedia) categories of words likely to be used in vandalism.

Thanks for your interest in helping, and I hope I've answered your questions. -- Cobi^(t|c|b) 17:54, 10 April 2008 (UTC)

To expand on that, the list of categories isn't only a list where words in the categories make it more likely to be vandalism, but also words that make the edit less likely to be vandalism. The neural network will "figure out" which it is, and by how much. Even if a category is completely arbitrary, it will figure that out as well. Ideally, we would use all categories, but the more categories that we use, the more computing power is needed to process the neural network, and we only have enough computing power for a few select categories. Crispy1989 (talk) 20:32, 10 April 2008 (UTC)
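
As a rough illustration of how word categories might become numeric inputs, here is a minimal sketch. The category names and word lists are made up for the example (the real lists come from Wiktionary categories), and using the fraction of added words per category is an assumption; the actual preprocessor may compute something different.

```python
import re

# Hypothetical category word lists, for illustration only.
CATEGORIES = {
    "profanity": {"damn", "crap"},
    "pronouns": {"i", "you", "he", "she"},
}


def category_features(added_text, categories=CATEGORIES):
    """Return, per category, the fraction of added words that fall in it."""
    words = re.findall(r"[a-z']+", added_text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        name: sum(w in vocab for w in words) / total
        for name, vocab in categories.items()
    }
```

Each value would be one input to the network, which is why every extra category adds to the processing cost mentioned above.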

What about sneaky vandals?

Is it possible to train a neural network to detect vandalism such as this one? If I understand correctly, your approach would fail outright because the vandal just changed a number. Fireice (talk) 02:08, 11 April 2008 (UTC)

I guess it depends on the factors included for consideration by the neural network. In this case a number of factors combine to aid identification. In themselves they do not indicate vandalism but the combination of the changed text and these may be sufficient to allow a bot to revert with a 'possible vandalism' justification.

The user is not registered.
There is no edit summary.
The user has made few other edits
Other edits by the user have been identified as vandalism.
Other recent edits by the user have been reverted.
The replacement number format (a low number with two decimals) compared with the previous number.
The number value combined with the words 'film' and 'budget'.

A neural network will typically discover a lot of hidden factors given sufficient examples.

--Brian R Hunter (talk) 22:12, 12 April 2008 (UTC)

Checking

Hey. On User:Crispy1989/Dataset/Vandalism, do you want me to go through the list like someone has started doing and check the diffs to make sure they are vandalism? And for any that some people would say no to, have a discussion or something? Please let me know what you think. ·Add§hore· ^Talk/_Cont 21:40, 12 April 2008 (UTC)

I think they should be checked, the bot should only work with blatant vandalism. More subtle cases of what is generally thought to be vandalism, like removing deletion tags or discussions on talk page can be easily handled by humans. Fireice (talk) 23:57, 13 April 2008 (UTC)

Sure, go through both lists and remove any wrongly added entries, however, please do not remove subtle vandalism. -- Cobi^(t|c|b) 07:56, 14 April 2008 (UTC)

What about if many people disagree over whether it is vandalism or not? ·Add§hore· ^Talk/_Cont 15:34, 14 April 2008 (UTC)

The Wikipedia vandalism policy and "assume good faith" should be followed. In the cases where it's ambiguous from that standpoint, it shouldn't be listed in either list. Crispy1989 (talk) 04:33, 18 April 2008 (UTC)

Randomness of samples

Hmm, shouldn't the sampling be random? I mean, I started to collect links and realized that if I reported every (obvious) vandalism to the list, I could report every revert of that vandalism as constructive. But that wouldn't be a random sampling of constructive edits, it would only be vandalism reverts. Thinking a bit more on this: it's probably reasonable to assume the constructive, as well as the vandalism links reported actually won't represent a random sample at all?

I don't really know how this works, but I would suspect that is a problem?

Suggestion: Have the bot collect random edit samples, and let editors wanting to help classify them as vandalism/constructive/uncertain or something. Then you could also have more than one editor check every link, to prevent a vandal from marking valid links as vandalism and so on. Then, if you identified someone making bad faith reports you could remove his input from the dataset. And if different users make different classifications then that would mark the edit as uncertain...

I hope this makes sense? :) Apis (talk) 04:19, 18 April 2008 (UTC)
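
The suggestion above (several reviewers per edit, disagreement treated as uncertain, and the option to discard a bad-faith reviewer) could look something like this sketch. The label strings, the `min_votes` parameter, and the data layout are assumptions for illustration, not an existing interface.

```python
from collections import Counter


def consensus_label(votes, min_votes=2):
    """Combine several reviewers' labels for one edit.

    votes: list of "vandalism" / "constructive" / "uncertain" strings.
    The shared label is returned only if enough reviewers voted and all agree.
    """
    if len(votes) < min_votes:
        return "uncertain"
    counts = Counter(votes)
    if len(counts) == 1:
        return votes[0]
    return "uncertain"


def drop_reviewer(all_votes, bad_reviewer):
    """Remove one reviewer's input; all_votes maps edit id -> {reviewer: label}."""
    return {
        edit_id: {r: label for r, label in per_edit.items() if r != bad_reviewer}
        for edit_id, per_edit in all_votes.items()
    }
```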

Maybe it didn't sound like one but it was actually more of a question than a suggestion (that part was just an idea). So is it ok to add a lot of samples that are not in the slightest random? --Apis (talk) 03:13, 20 April 2008 (UTC)

Yes, nonrandom samples are OK, as long as it's not too outrageously nonrandom. Crispy1989 (talk) 03:43, 20 April 2008 (UTC)

Bayesian filtering

Hi Crispy1989 - I'm happy that someone is finally working on an ANN vandalism solution. I've long since thought about undertaking such a project, but the time requirements were just too much for me. Have you considered adding a Naive Bayes classifier also? This is how some spam filters work and it was another idea I had for fighting vandalism. Once you guys collect a large dataset, I might be able to use it to create a Bayes vandalism detector to run in addition to your ANN one. Oh, and I've got experience with neural nets, adaptive algorithms, and Bayes nets/classifiers, so I would be glad to help in any capacity. --CapitalR (talk) 17:09, 23 April 2008 (UTC)

Hi - I was also thinking about undertaking a similar project, not with neural networks probably but with support vector machines, but wasn't sufficiently ambitious. It seems like once a database is compiled, it would be useful to mess around with all kinds of classification techniques, see what works best. I'll go do a little manual adding to the edit lists now. I don't know PHP, but I do have some experience with machine learning, so if there's any way I can help further, let me know. Kalkin (talk) 21:12, 11 May 2008 (UTC)

Before I considered an ANN, I did consider a Bayesian Classifier, but decided against it for many reasons. One is that it only takes content into account. It also doesn't consider relationships between words, only proportions of words, which would leave a lot of error when detecting vandalism. The neural network as it is does have a component which considers content (although less specifically than a Bayesian Classifier). Crispy1989 (talk) 21:43, 11 May 2008 (UTC)
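
For readers unfamiliar with the technique under discussion, here is a minimal bag-of-words Naive Bayes sketch with add-one smoothing. It is a generic textbook version, not code from ClueBot or from any spam filter, and the function names are invented for the example. It also shows the limitation mentioned above: only word counts matter, so word order has no effect on the result.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (list_of_words, label). Returns word counts and doc counts per label."""
    word_counts = {}
    doc_counts = Counter()
    for words, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    return word_counts, doc_counts


def classify(words, word_counts, doc_counts):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in words:
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```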

Spread the word

Most editors are not aware of the rewrite. I was thinking that we should post a message to the village pump looking for ways to spread the word around to experienced editors, perhaps on a notice board. --209.244.31.53 (talk) 20:29, 2 May 2008 (UTC)

It won't work

Seriously, I tried this years ago. Neural networks are great in theory but useless for anything other than trivial classification problems in the real world. You're better off going after specific characteristics as is done with ClueBot at the moment -- Gurchzilla (talk) 08:02, 11 May 2008 (UTC)

Actually, neural networks are a lot more flexible than that. In this specific instance, the neural network performs so well that it's even identifying errors in its own training dataset. The key in getting a neural network to work in instances like this is to be able to effectively convert the input data into a format that will work with the neural network, and that part forms the majority of the code. I'm not sure what you used to convert edits into the neural network's input layer, but if you didn't carefully consider that, it may be the reason why your attempt failed. If you give more specifics on the approach you used, I might be able to shed some light on why it didn't work. But like I said, my approach is already outperforming the existing Cluebot to the extent that it's identifying classification errors in its own training dataset, so it would appear that this is indeed a superior method. Crispy1989 (talk) 21:51, 11 May 2008 (UTC)

Features

Beyond wiktionary categories, what else is likely to get into the feature vectors for a given edit? I ask mainly from curiosity - I'd be interested to see, for instance, whether using POS tagging + WordNet synset groupings gives any interesting results, since one could get useful category information from previously unseen words, and potentially use it to build custom features like "replacement with antonym". I do realise that this isn't a collaborative effort, so you may save yourself work by not letting the likes of me interject... Pseudomonas_(talk) 10:00, 19 May 2008 (UTC)

It's definitely a collaborative effort, and thanks for the suggestions, I'll see what I can do. The problem with adding too many inputs is that the precision limitations on floating point values may limit the effect of the more important values when in the input layer alongside many less important ones. The key is finding which values are the most important. Currently, the limited set of Wiktionary categories was hand-picked because I thought those categories might be important. After I figure out the accuracy of the current network (after there's enough manually picked training data), I can experiment with what might make it more accurate. Thanks. Crispy1989 (talk) 19:54, 26 May 2008 (UTC)

How are we doing?

Well, quite a few of us have been adding our contributions for a while now. It would be nice to know how useful they've been. Are you able, on the basis of what's been submitted so far, to sharpen up your requirements? What do you need more of?

One thing that occurs to me is that it isn't really easy to visualise whether a diff will add any value or not - I tend to add ones which the current bots missed but which I feel I could hope that a really, really good bot would catch. Might a better way be to publish a list of (potentially, thousands of) edits which the current network rates as 'borderline', and have a team of volunteers adjudicate? I can see some disadvantages with that, but wouldn't it make better use of the available brainpower? Philip Trueman (talk) 11:36, 22 May 2008 (UTC)

Good idea - I'll work on a way of doing that. My eventual plan is to train the base neural network with the output of the current ClueBot (still running the old code) to get a sort-of average baseline training. Then, I'll retrain it with all of the manually selected edits, to fine tune it. After the training is finished, I'll run the entire training set through the network again, and group the incorrectly classified or borderline edits into a category for further review, and if remaining, more intensive training. At that point, I'll run a set of random (untrained) edits through the bot and post them here for manual review for correctness. Thanks for the suggestion. Crispy1989 (talk) 19:51, 26 May 2008 (UTC)
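
The review step described above (re-running the training set and pulling out misclassified or borderline edits) might be sketched as follows. The score range, the `threshold` and `margin` values, and the tuple layout are assumptions chosen for the example, not the bot's actual settings.

```python
def review_queue(samples, threshold=0.5, margin=0.1):
    """samples: list of (edit_id, score, is_vandalism) with score in [0, 1].

    Returns edit ids that are misclassified at `threshold` or lie within
    `margin` of it, i.e. candidates for further human review.
    """
    queue = []
    for edit_id, score, is_vandalism in samples:
        predicted = score >= threshold
        borderline = abs(score - threshold) < margin
        if predicted != is_vandalism or borderline:
            queue.append(edit_id)
    return queue
```

Edits the network scores confidently but against their label are also the ones most likely to be errors in the dataset itself.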

I'm also looking at building a tool for curating a corpus of unclassified edits for this - I'll let you know how I get on. Pseudomonas_(talk) 13:13, 27 May 2008 (UTC)

general comment

Edits that are suicide threats should probably not be reverted by bot, even if they are/contain vandalism (although from the definition of vandalism, they aren't vandalism). 69.140.152.55 (talk) 21:23, 29 May 2008 (UTC)

A user's edits to their own user page should mostly not count as good or bad unless they are a revert of trolling and bad behavior, vandalizing a sock/ban tag, et cetera. 209.244.31.53 (talk) 04:29, 31 May 2008 (UTC)

You have more messages

USER TALK:Crispy1989/Dataset

All subpages have talk pages. Maybe it was a bad idea to start it, but this one may be used for dataset discussion and the main talk page for the neural net/engine. 209.244.31.53 (talk) 04:29, 31 May 2008 (UTC)

A few more questions

What aspects, such as the edit window contents, user rights, edit summary, reference websites, page type or a special page (something like WP:AIV needs to be treated differently than other pages, if it is opted in), and user pattern will the engine detect? Should edit wars or trivial good faith edits be on the constructive list? What about accumulated vandalism by one user where each edit by itself is not enough to be reverted? Diffs for nominating a page for deletion, when the page is afterwards deleted? --209.244.31.53 (talk) 19:09, 14 June 2008 (UTC)

Things that the bot should revert as vandalism should be classified as vandalism. Things that the bot should not revert and are not classified as vandalism should be on the good edit list. If these criteria are ambiguous, the edit should go on neither list. Crispy1989 (talk) 17:53, 1 September 2008 (UTC)

If only the page itself is checked, there will be many reverts left to human reverters. If user rights were used, such as treating admins as incapable of vandalism, there would be fewer mistaken reverts and the threshold could be raised. There are no diffs in the dataset for adding a deletion tag; these pages are deleted and diffs that may have been in the dataset don't work, though for some reason they work for a short while. As for accumulated vandalism, comparing across multiple revisions may produce more reverts. 209.244.31.53 (talk) 06:01, 30 October 2008 (UTC)

Related work

Hi Crispy, there is ongoing research (Research on social software misuse) that might be interesting for your approach. An overview is provided in this paper: Automatic Vandalism Detection in Wikipedia (pdf). I'm a former member of this research group and I know that they are willing to share information and data with interested people. In particular they already have a training set (wikipedia-vandalism-corpora), including currently (as far as I know) 3200 tagged edits. Download can be found here (zip). Don't be confused: the figure of 940 edits given on the webpage is not up to date.
Feel free to contact them or me, if you're interested in any kind of cooperation.

Greetz Simor MSSimor (talk) 21:23, 24 June 2008 (UTC)

Toolset For New Cluebot Engine?

I'm curious about the tools you're using to develop this ANN engine. Languages? Compilers?--Isaac R (talk) 17:37, 3 August 2008 (UTC)

It's being written in C using the gcc compiler. The ANN toolkit name is annutils, but the stock annutils version contains a number of flaws. The svn version has these corrected. Crispy1989 (talk) 17:54, 1 September 2008 (UTC)

Cluebot rewrite

I noticed that cluebot is written in PHP. I am a decent PHP coder and I have experience with signal/noise heuristics. Is the new version also going to be in PHP? Is there anything I can do to help out? -- Preceding unsigned comment added by U0000 (talk • contribs) 18:07, 20 August 2008 (UTC)

Most of the new cluebot is going to be written in C. There is far too much intense numerical processing for the core to be written in a scripting language. However, the Wikipedia interface code will still be written in PHP, and will use most of that portion of the code from the current Cluebot. Crispy1989 (talk) 17:55, 1 September 2008 (UTC)

cluebot implementer help?

Hi, I'm helping maintain the OLPC wiki and we are trying to set up cluebot to monitor its changes. Is there a mailing list or other group of cluebot maintainers and implementers I could write to for help? Thanks! +sj + 21:55, 6 January 2009 (UTC)

Also, should we be using the old or the new cluebot, if we're trying to get something working this month? +sj +

Dataset interface

Hi, I was trying to register to Cluenet to contribute to the dataset, but their authentication system doesn't work. Is there any other way to help? (meanwhile, I've left a proposal here). Thank you. --CristianCantoro (talk) 17:33, 5 April 2009 (UTC)

Idea

"The word categories (as opposed to a blacklist) allows the bot to recognize that certain words may be acceptable if similar words are already used in the article"

This is good, but what I think is better is to look for similar words not only in the article but in all articles that cover the same main topic.

The first step to achieve this is to categorize articles based on their main topic. The second step is to do an analysis on a data dump to determine the frequency of "bad words" appearing on each main topic.

If you think this idea has a chance then I can go into details on how I think it can be done. Sole Soul (talk) 02:22, 27 October 2010 (UTC)

I've considered something similar to this, but with a more general (albeit more complex) method not relying upon topic distinctions. It's on my TODO list, but fairly low down. Very few of the currently incorrectly classified edits would be helped by this, so I'm focusing on the more effective improvements first. Crispy1989 (talk) 02:27, 27 October 2010 (UTC)

Bot training interface

I would be willing to train the bot some through the interface you mentioned.

All those classifications will go straight into training though wouldn't they? Would it also generate a new trial dataset as well? Gigs (talk) 02:00, 3 November 2010 (UTC)

Training is done asynchronously, not real-time. The interface automatically groups edits together, so they can be combined into whatever dataset is currently needed. Currently, the interface is loaded with edits from the existing dataset which need to be reviewed for accuracy, which is currently very important in training. After those are almost finished, we'll add some of the live Wikipedia edits, in a different group - we'll likely use this first as a trial dataset, to gauge more precisely the bot's effectiveness on live edits, then we'll merge it with the training set to improve that accuracy.

The training interface uses google accounts - all I need is your google account name and I can give you access. You can email it or send it in any other private manner you'd like. Crispy1989 (talk) 02:08, 3 November 2010 (UTC)

I think I've found some bugs in the training interface. PleaseStand ^(talk) 00:47, 6 November 2010 (UTC)

1) http://en.wikipedia.org/w/index.php?action=view&diff=394519572 is a page not in the main namespace
2) http://en.wikipedia.org/w/index.php?action=view&diff=394519609 is an edit by ClueBot NG
3) Every once in a while, I get an error message and have to refresh the page. Often, but not always, these error messages include the text "no more edits." I am using Firefox 3.6.12 on Windows.

Also, I see many edits from RjwilmsiBot adding "persondata" (and also some edits from other bots). PleaseStand ^(talk) 00:50, 6 November 2010 (UTC)

1) Classify it normally, even if it is not in the main namespace.
2) Classify it, still.
3) Say every 1/80 or so? It's on my todo list to fix.

A lot of the edits from the interface are random edits picked as a random sampling. The bot needs all kinds of edits to be able to classify them correctly. Also, in the future, let's keep ClueBot NG stuff on its talk page. Thanks. :) -- Cobi^(t|c|b) 00:57, 6 November 2010 (UTC)

Should I mark spam as vandalism or constructive? On one hand, I would revert such an edit if I stumbled upon it, and on the other hand, it is not the bot's job to fight spam. Sole Soul (talk) 10:05, 7 November 2010 (UTC)
Hi - in the future, we should keep general Cluebot-NG stuff on the Cluebot-NG talk page. In response to your question - isn't spam a subclass of vandalism? Crispy1989 (talk) 11:26, 7 November 2010 (UTC)

ClueBot NG review interface

Hey, I just requested access to the ClueBot NG review interface. My Google username is the same as this one. :-) --Ixfd64 (talk) 02:05, 14 November 2010 (UTC)

Your review interface account has been created. Thanks for your help! Crispy1989 (talk) 07:45, 14 November 2010 (UTC)

Talkback

Hello, Crispy1989. You have new messages at Hamtechperson's talk page.
You can remove this notice at any time by removing the {{Talkback}} or {{Tb}} template.

20:24, 17 November 2010 (UTC)

ClueBot NG

Hi, Crispy1989

I've noticed ClueBot NG starts a new November 2010 section to warn users about vandalism when there's already one on the talk page. Is that expected? --John KB (talk) 18:24, 20 November 2010 (UTC)

That's a bug in the bot, but it's not my area. I work with the core that does vandalism detection. Cobi wrote and maintains the interface to Wikipedia, which includes the warning logic. Crispy1989 (talk) 18:31, 20 November 2010 (UTC)

Ok. Thanks, Crispy. --John KB (talk) 18:32, 20 November 2010 (UTC)

Barnstar

The DaVinci Barnstar

I hereby award you, Crispy1989, the DaVinci Barnstar for creating the amazing vandal-fighting ClueBot NG. It seems to always revert vandalism within two seconds! If I had a nickel for every time it beat me to the revert... Excellent work! :) --Meaghan [talk] ≈ 00:42, 29 November 2010 (UTC)

Anti-vandalism bot operator's barnstar

		Anti-vandalism bot operator's barnstar
		For creating the ClueBot NG! For the thought and effort that has gone into creating, training, and maintaining this awesome project. Thank you for keeping the WP clean(er). Keep up the great work! — HELLKNOWZ ▎TALK 13:50, 10 December 2010 (UTC)

ANN inputs

I'm interested in knowing the list of inputs/stats that go into the ANN and may suggest adding a few. I read the subpages of User:Crispy1989/CluebotNG Metrics and made some guesses about what they represent, as they are not documented and seem outdated. Sole Soul (talk) 01:24, 17 December 2010 (UTC)

Here are the input config files, and here is a list of input names, usually somewhat descriptive of what they are. -- Cobi^(t|c|b) 01:27, 17 December 2010 (UTC)

Also important is this. It's what actually generates the list of inputs, and includes the formulas for generating them. Crispy1989 (talk) 01:29, 17 December 2010 (UTC)

I have a few suggestions, some of them I'm positive are not already included in your code, some I'm not so sure. The Edit Filters inspired some of these suggestions:

1) Appended text: text added at the end of the page after categories or interwiki links. An exception for templates and external links should be added. This is highly indicative of vandalism.
2) Name of the user in the added text or the added external link. Make an exception if the user name is the same as the article title (like User:foo_ fan in an article named foo).
3) Whether an article is a featured article. For example, a large change to a featured article by a new user may be indicative of vandalism.
4) Whether an article is about a BLP. The community has high tolerance for reverting a questionable edit if the subject is a BLP. I think the community would support lowering the threshold and ignoring 1RR in this case.
5) Non-English contributions. Calculated: (total characters - (English letters + numbers + special characters + white space)) / (total characters + 1)
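
The non-English formula in the last item could be computed along these lines. This is a sketch only: reading "English letters" as ASCII letters and "special characters" as ASCII punctuation is an assumption, and the function name is invented for the example.

```python
import string


def non_english_ratio(text):
    """(total - (letters + digits + special + whitespace)) / (total + 1)."""
    known = set(string.ascii_letters + string.digits
                + string.punctuation + string.whitespace)
    total = len(text)
    recognised = sum(ch in known for ch in text)
    return (total - recognised) / (total + 1)
```

The `+ 1` in the denominator keeps the value defined for empty text and means the ratio approaches, but never reaches, 1.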

Also other suggestions not related to the ANN inputs:

Do not revert if the suspected edit is the user's fifth or more in a row. The logic: if a user has made 4 edits without being reverted by a human then there is a good chance he is not a vandal.
Rollbacking a vandal makes sense, but rollbacking a newbie who inserted '''Bold text''' by mistake doesn't. In this case only the last edit should be reverted.

Sole Soul (talk) 03:16, 17 December 2010 (UTC)

In response:

1) This can be added after I finish the wikimarkup parser, which is what I'm working on right now.
2) I'll consider adding this in the future, but I'm not sure if it's a reliable method of determining vandalism versus constructive. I'll test it to see if it works.
3) This should be possible to add, but it could take some time. In particular, we'll have to regenerate our downloaded cache of the dataset after updating the code to detect past featured articles. Because this is a time and bandwidth consuming process, we'll likely hold off on this for a bit, until we make other similar improvements that require dataset regeneration, so we can do it all at once.
4) If there's a reliable way of detecting whether or not an article is a BLP, I can add it as an input to the ANN. Removing the 1RR restriction and/or lowering the threshold is a somewhat different matter, and would require a bit of reworking. I'll put "flexible threshold" on my TODO list, but it's fairly low down compared to other issues. What's a good way of detecting whether or not an article is a BLP?
5) This isn't really easy to implement. Many of the bot's core statistics involving characters are done via table lookups into a table of 256 characters, for efficiency. Right now, unicode is simply stripped. As far as I can see, foreign characters occur pretty much equally in vandalism and non-vandalism, so I doubt it would be a terribly good metric anyway. I'll see if it's possible to count skipped unicode characters though.

This is a question for Cobi, but I don't think it's an infrequent occurrence that a user vandalizes many times in rapid succession, before they get reverted. Cobi may have different input, though.
There's no way to determine the "type" or "severity" of vandalism with the current neural network. I could modify it to provide additional outputs, but then it would require the entire dataset to be rebuilt from scratch, with humans not only classifying every edit as vandalism/constructive, but also with a "severity" score.

Crispy1989 (talk) 03:55, 17 December 2010 (UTC)

2) I should add that this is only for registered users. It may be an indicator of COI or vandalism (for example: User:foo adds "foo was here")

3) Check for {{featured article}} in the text

4) Check for [[Category:Living people]] in the text

5) If an edit is a large addition that consists mostly of foreign characters then it is reverted by humans most of the time [1] [2].

Hence I said 5 or more. I remember seeing examples of FPs like this. If it is a FP then it is highly bity to revert, because the user had put in a large effort. If it is vandalism then the user will eventually be caught and reverted with one click by a human. Sole Soul (talk) 11:48, 17 December 2010 (UTC)
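
Points 3) and 4) above amount to simple text checks on the article's wikitext. A minimal sketch, assuming the markers appear literally as quoted above; the regular expressions (which tolerate extra spaces, letter case, and a sort key) and the function name are assumptions, not the bot's code.

```python
import re

# Patterns are assumptions: they allow flexible spacing and letter case.
FEATURED_RE = re.compile(r"\{\{\s*featured article\s*(\||\}\})", re.IGNORECASE)
BLP_RE = re.compile(r"\[\[\s*Category\s*:\s*Living people\s*(\||\]\])", re.IGNORECASE)


def article_flags(wikitext):
    """Return two 0/1 inputs, is-featured and is-BLP, from raw wikitext."""
    return {
        "featured": int(bool(FEATURED_RE.search(wikitext))),
        "blp": int(bool(BLP_RE.search(wikitext))),
    }
```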

2) I'll add it and see if it helps at all.

3) Great, I didn't know it was that easy. Will do.

4) Also easy. Will do (add as ANN input).

5) I'll see if there's a good way to track the unicode characters that are being stripped out. It may or may not be possible/easy. If it is, I'll add it.

1) This might be possible, but it falls into Cobi's domain (the Wikipedia interface). The ideal way to implement this would be as an ANN input (number of recent edits that weren't reverted). However, I suspect it may take several additional HTTP queries to implement, and may make it quite slow (I could be wrong about this). The other possible issue I can see with this is that it could actually lead to increased false positives, instead of reducing them. Because the majority of constructive edits will have at least 5 (and usually more) prior unreverted edits, the ANN will learn that users with less than this number are highly suspect. One possible modification that could improve this is to, instead of including the number of sequential recent edits that weren't reverted, include the percentage of the previous 5 edits that *were* reverted. But again, this may not even be practical to fetch. Crispy1989 (talk) 12:15, 17 December 2010 (UTC)
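
The alternative input proposed above, the share of a user's previous 5 edits that were reverted, could be computed like this once the history has been fetched. The list-of-booleans representation and the choice to return 0.0 for users with no history are assumptions for the sketch; fetching the history is the expensive part and is not shown.

```python
def recent_revert_fraction(history, window=5):
    """history: list of booleans, oldest first; True means the edit was reverted.

    Returns the fraction of the last `window` edits that were reverted,
    or 0.0 when the user has no prior edits.
    """
    recent = history[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)
```

Because a brand-new user and a long-standing clean user both score 0.0 here, a separate input for the number of prior edits would probably be needed to tell them apart.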

Regarding your last point, what about adding it as part of the post-processing filters rather than the ANN inputs?

I have more suggestions for your consideration:

1) Articles about schools are very highly vandalized. Can you make an input that looks for the word "schools" in categories?

2) Look for the word "plot" in the edit summary inside the syntax "/* */ ". Reason: good edits under plot sections in articles about movies often have a high resemblance to bad edits, resulting in FPs.

3) This one is for Cobi and not related to the ANN inputs. Why not make the bot remove the talk page warning if a user reported it as a false positive? Yes, a vandal can also make a false false positive report, but he can also remove the warning. The edit summary of the removal should be something like "Remove warning. Reason: user reported it as false positive". Sole Soul (talk) 23:01, 18 December 2010 (UTC)

Highly-vandalized articles are already covered in a different way. One input to the ANN is the number of times in recent history that an article has been vandalized. This has the effect of applying not only to schools, but also to any other frequently targeted articles.

A planned addition after I finish the wikitext parser is to add a Bayesian database for edit summaries. This should be able to pick up on "plot" as well as other similar keywords.

In general, I'm trying to avoid as much as possible hardcoding words into the bot. There's almost always a better solution which involves learning which words are significant rather than hardcoding them in. The only times it's sometimes a good idea to hardcode a string to search for in an input is for certain types of templates or categories that, independently, could drastically affect the result, as opposed to representing a trend. (For example, looking for "schools" in a category would have been an acceptable use of a hardcoded string, but that particular example is already covered by other mechanisms as I explained)

We've considered causing false positive reporting to cause effects such as removing the warning or reverting the revert, and we still haven't come to a decision on this. We're trying to consider all possible effects. But with the new, easier false positive reporting interface, we're getting a surprising number of false false positives. Crispy1989 (talk) 23:41, 18 December 2010 (UTC)

ClueBot NG down ?

Hi Crispy1989, Since 12:39, 21 December 2010 I see ClueBot no longer makes reverts. Also the IRC-channel on cluenet.org no longer reports stuff. Krinkle (talk) 02:09, 22 December 2010 (UTC)

ClueBot NG is being moved from a home workstation to a more stable hosted, dedicated server. Crispy1989 (talk) 18:43, 22 December 2010 (UTC)

Speak

I attempted to make a report about User:ClueBot NG false positive on the Speak article

I can't figure out how to find the post. How do I find and review my post?
I kept seeing the word "anonymous" even though I typed out my user name. I take my privacy dead serious and if my IP is revealed on the report then I want the revision history to be deleted immediately. Like right now! Slightsmile (talk) 17:36, 3 January 2011 (UTC)

The report is here. Report comments etc. will always say anonymous unless you are logged in with a report interface account. As far as I am aware, the IP of the reporting user is not displayed to anyone in the report interface, not even when I log in with an administrative account. I cannot confirm if this is stored in the back end which is a possibility. DamianZaremba ^{(talk • contribs)} 17:48, 3 January 2011 (UTC)

Thanks for the fast reply. btw the false positive report had the wrong link and now I can't find the correct one so just cancel that report. One of those days. FWIW the edit I was referring to: 14-year-old Kristen Stewart was allergic to grass and so had to do a scene differently in the film Speak. Again thanks and oops. Slightsmile (talk) 18:01, 3 January 2011 (UTC)

I just noticed this: "I cannot confirm if this is stored in the back end which is a possibility." I don't understand what that means. Is this something I should be concerned about? Slightsmile (talk) 18:11, 3 January 2011 (UTC)

Reports are stored in a MySQL database (as is a lot of ClueBot NG data), and as I don't have the source code to hand I cannot verify whether anything like your IP gets stored in the table of comments etc. I do not believe anything like this is stored, and looking through the scripts which utilize the report data, nothing relevant to the reporting user, such as your IP address, is ever used. So in short, no, there is nothing to worry about. I will ask Cobi to make clear on the report page exactly what is stored before you submit entries, so this is clearer in future. DamianZaremba ^{(talk • contribs)} 18:19, 3 January 2011 (UTC)

ClueBot NG 2

Hi Crispy

Thought I'd make you aware of this edit on Cobi's page. As you can see a new editor has shut off ClueBot. I thought I'd post on your talk page because I don't know if you can restart ClueBot NG? It's just in case you get online before Cobi. Am I right though that the mistake by ClueBot isn't sufficient enough to close the Bot down, it should've been reported as a false positive? Personally that's what I would've done.--5 albert square (talk) 23:38, 5 January 2011 (UTC)

OK, I think someone's got it up and running again now.--5 albert square (talk) 23:43, 5 January 2011 (UTC)

Someone reverted the bot shutoff about a minute after it was shut off. In addition, it seems you've found the ruckus I've raised on the editor's user page. Also, I am about to suggest something to Cobi to hopefully prevent incidents like this. -- SnoFox^(t|c) 23:52, 5 January 2011 (UTC)

Yeah, it would be a good idea if newbies weren't allowed to switch off the Bot. I don't know if a change to the coding would maybe stop that?--5 albert square (talk) 00:31, 6 January 2011 (UTC)

Well, it's already semi-protected. If you see over at the ClueBot Commons talk page, specifically my edit, you can give other alternatives. ClueBot NG's current code just takes the value of the page and sets it into a variable. I just looked, and it simply checks to see if it is true or false. Gathering info on the last editor might cause the Wikipedia interface to be even slower than it is... -- SnoFox^(t|c) 00:56, 6 January 2011 (UTC)

The Cluebot knowledgebase

Hi Crispy, I have seen Cluebot do some pretty nice reverts. In terms of where it should be in 5 years, I suggest looking at it as an expert system application with comprehensive rules. I have a few ideas. It may be easier if I suggest the ideas and you add them, as appropriate, rather than start coding myself.

For instance, this edit included a "random string" and could be caught by a rule of the type:

IF

user is an IP and

confidence in randomness of string > 75% and

familiarity with IP < 20%

THEN

confidence in vandalism > 70%

Eventually the final confidence should be derived using inexact reasoning. It is pretty easy actually, if approached the right way. These rules can then be gradually fine-tuned over the years and in 5 years you will have a very comprehensive knowledgebase.
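For readers who want to see the shape of such a rule in code, here is a minimal sketch. The thresholds come from the rule above; everything else is an assumption made for illustration: the `Edit` fields, the function names, and in particular the use of MYCIN-style certainty-factor combination as one possible form of "inexact reasoning".

```python
from dataclasses import dataclass


@dataclass
class Edit:
    is_ip: bool         # True if the editor is an unregistered (IP) user
    randomness: float   # confidence (0..1) that the added string is random
    familiarity: float  # familiarity with this IP (0..1)


def random_string_rule(edit: Edit) -> float:
    """Return the rule's confidence in vandalism, or 0.0 if it does not fire."""
    if edit.is_ip and edit.randomness > 0.75 and edit.familiarity < 0.20:
        return 0.70
    return 0.0


def combine(cf_a: float, cf_b: float) -> float:
    """Combine two positive certainty factors (MYCIN-style, an assumption here)."""
    return cf_a + cf_b * (1.0 - cf_a)
```

With this combination, two rules firing at 0.70 and 0.50 yield a joint confidence of about 0.85, so supporting evidence accumulates without ever exceeding 1.0.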

I would also suggest tapping into wordnet eventually and that will open a totally new door.

How do we start this conversation? Cheers. History2007 (talk) 10:53, 8 January 2011 (UTC)

Hey History2007. ClueBot NG uses an artificial neural network, which I do not believe allows for a rule exactly like that in the core of the bot, which "measures" for vandalism. If memory serves, the bot already takes into account if an edit is made by an IP user or a registered user. It also already checks against known words. In regards to that specific edit, I can't figure out why ClueBot NG did not revert it. I cannot find a log of it on IRC nor was the IP editor reverted before on that article. Since I'm just a simple talk page stalker who thinks ClueBot NG's core is mostly a black box, you can pick up a conversation with Cobi/Crispy best by hopping onto IRC on ClueIRC's #cluebotng. If you do not have an IRC client, check out Mibbit. -- SnoFox^(t|c) 18:07, 8 January 2011 (UTC)

Hi, do you have a direct link to the neural net system used? In general, neural nets and rules complement each other - they use very different approaches and have very different success rates on different applications. Neural nets are actually somewhat harder to get going than simple rule systems. Second question, is there a simple, very simple, example of a Perl-based bot out there that I can experiment with to generate reports for myself in my user space without affecting Wikipedia? If you know of one, I can just start playing with that too. Thanks. History2007 (talk) 18:27, 8 January 2011 (UTC)

Simple rule-based systems such as the one you suggest perform very poorly in this context. The original ClueBot used a rule-based system like this, and even with Cobi's impeccable maintenance and frequent updates and improvements, it caught, at best, a small fraction of the vandalism that ClueBot-NG catches. The future of vandalism detection on Wikipedia is machine learning.

You're right that ANNs don't work well for all applications - but we carefully considered alternatives before starting. The closest machine learning technique to your rule-based system is probably a form of learning decision tree, but we decided that an ANN would be more effective in determining the complex relationships often demonstrated between statistics on vandal edits.

You're also right that ANNs are harder to understand, code, and get working than rule-based systems. But in writing CBNG, we aren't aiming for what's easiest - we're aiming for what will work the best. Also note that we have indeed gotten this ANN-based system to work very well.

Improvements to our current approach utilizing an Artificial Neural Network come mostly in the form of adding additional inputs, or modifying existing ones. We already have a few inputs which help to identify random additions, and we've considered adding an additional input based on Markov chains (on letters).
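A letter-level Markov chain input of the kind mentioned above could be sketched roughly as follows. This is an illustrative assumption, not ClueBot NG code: the function names, the restriction to lowercase letters and spaces, and the add-one smoothing are all choices made for the example.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def _normalise(text: str) -> str:
    """Lowercase the text and drop characters outside ALPHABET."""
    return "".join(c for c in text.lower() if c in ALPHABET)


def train_bigrams(corpus: str) -> Counter:
    """Count letter-to-letter transitions in known-good text."""
    text = _normalise(corpus)
    return Counter(zip(text, text[1:]))


def avg_log_prob(text: str, counts: Counter) -> float:
    """Mean log-probability per transition; lower means more random-looking."""
    text = _normalise(text)
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0
    # Total number of observed transitions leaving each letter.
    totals = Counter()
    for (first, _), n in counts.items():
        totals[first] += n
    total = 0.0
    for first, second in pairs:
        # Add-one smoothing so unseen transitions get a small non-zero probability.
        p = (counts[(first, second)] + 1) / (totals[first] + len(ALPHABET))
        total += math.log(p)
    return total / len(pairs)
```

The resulting score is a single number per added string, so it could be fed to the network as one more input alongside the existing statistics.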

We have already considered including additional word-based measures, but something like wordnet would have limited effectiveness by only using synonyms. We're looking into developing our own system that forms a database by scanning existing Wikipedia articles, and instead of collecting synonyms, collecting related words. Using this sort of method, an addition's relation to the existing article content could be determined.

The current neural network library used by ClueBot NG is called libfann. It used to be annutils, but that was no longer maintained. The exact library used is interchangeable, as the CBNG core is modular. However, most of the work is not actually running through the ANN - most of the work is calculating the inputs using various simple statistics as well as NLP techniques. libfann is open source, but I'm not sure if it has a Perl interface. CBNG code is also open source, and the core is written in C/C++. Perl is just too slow to practically calculate all of the statistics necessary to get a well-performing ANN. Crispy1989 (talk) 19:04, 8 January 2011 (UTC)

Thanks for your detailed response. So the one factor I had not thought about was execution speed. I guess you have to run through many edits. I was thinking of Perl because it is so portable and easy to use all over the place. As for Wordnet, I was not thinking of using synonyms but their ontology/category structure. It is a rich system that has much hidden value. I have used it in other applications by extracting suitable items.

My guess is that learning decision trees from reverted vandalisms would not be trivial. In general, if one is to use machine learning in this application, parameter adjustment systems, including neural nets, would work better. However, those systems often get much better results if coupled with a semantically driven system, which complements their behavior with reasoning. In the very long term, I think a blackboard-type architecture will eventually emerge.

Now my specific questions are:

Do I understand that Cluebot is learning by observing reverted vandalism?

Is there a description of the neural net input fields etc.?

How many edits per minute does Cluebot check?

What I would like to do is try a simple Perl-based bot on my own that just prints reports for myself, then see what happens. So the last question:

Is there a generic "Perl bot" link you know of that I can copy and modify and play with?

If I get it working, then you guys can decide if it is worth using, else I will leave it on the shelf - like 70% of AI. But, as Google Translate has shown, AI is eventually making a come back. So we will see. But anyway, if you know of a link to a simple Perl-bot, I will appreciate it. Thanks. History2007 (talk) 21:15, 8 January 2011 (UTC)

We already have a word category mechanism. Our planned system would be much more flexible.
Learning decision trees would be a viable alternative to an ANN - definitely better than fixed rules. In fact, another piece of software, STiki, uses such trees for machine learning vandalism detection - although to less success than ClueBot NG. But as stated, our analysis has concluded that an ANN is more effective in this situation.
As stated, ClueBot NG already uses some NLP techniques ("semantically driven" in your terminology) as inputs to its neural network. As long as each NLP technique produces a value relatively independent from other values, this method works well. It also has the advantage of being able to recognize complex relationships (as an ANN), whereas most blackboard systems would not.
ClueBot learns by training from human-classified edits. Simply observing reverts isn't sufficiently accurate.
A list of neural net inputs is part of the source. It's read from a configuration file containing formulas. See us on IRC for access.
ClueBot NG analyzes every edit to the main namespace, potentially several hundred a minute. Additionally, during training, it analyzes our entire dataset (currently 30,000 edits, and growing) in rapid succession.
CBNG's core currently processes an edit in approximately 0.03 seconds. Even with this time, it gets tedious waiting for it to retrain and trial after a modification. We don't want it to increase much beyond this.
I'm sure Perl Wikipedia bots exist, but I don't know of any in particular. Try asking the BAG.
Using Perl for any sort of intensive data processing is a case of using the wrong tool for the job. If your reason for choosing Perl is familiarity, I highly suggest you become familiar with a more appropriate language for tasks such as this.

Crispy1989 (talk) 21:49, 8 January 2011 (UTC)

Thanks for the info. Now the picture of how Cluebot works is emerging. Now:

I hate Perl. I think it is a terrible language - but highly portable, does reasonable pattern matching and has many libraries. That is why I use it once in a while. Even then I hardly ever write Perl directly, I have top level calls to it that do not let me see much Perl. I just like the many libraries it has.

My idea of blackboard system would be that multiple bots eventually talk to each other, one neural net, the other rules, etc. Blackboards can use any strategy, they do best when multiple strategies combine.

As for execution speed, how many computers out there run Cluebot now? 20? 200? 2,000? Eventually I think after it totally stabilizes in 3-5 years you could get 20,000 computers to run it and speed will no longer be a problem. And given that Perl runs almost everywhere, portability for the rule-based parts will not be a problem.

I would not really try to learn decision trees yet. Just for fun, I will try to hand code a set of rules, then eventually open the door to established users at large (who are not programmers) suggesting rules. Then increase it. The idea is a multi-user developed expert system against vandalism. The key to getting it to work is to get the chemists to suggest the rules about what might be vandalism on a chemistry page, etc. Just as users contribute content, they can contribute rules. That will be a very different approach. Best results are often obtained by combining very different approaches.

Thanks for mentioning BAG. I had not thought of that. Anyway, next step is for me to ask at BAG for a very simple Perl bot. Cheers History2007 (talk) 00:02, 9 January 2011 (UTC)

ClueBot NG runs on one, considerably low-powered server. :) It is trained on a more powerful machine, but it can be trained anywhere, really. I'm with you there on Perl, though. -- SnoFox^(t|c) 00:06, 9 January 2011 (UTC)

Why not run the trained version on 20 computers? History2007 (talk) 00:11, 9 January 2011 (UTC)

Because that's not necessary. ClueBot NG is not some super monstrous program that requires a super computer to run. In fact, it's quite the opposite. Small and modular, it can run off of my machine built for Windows 98 -- and still probably beat you at reverting vandalism. :) -- SnoFox^(t|c) 00:31, 9 January 2011 (UTC)

I'd rather not have another programming language debate, but in short - C/C++ is at least as portable as Perl, and has at least as many libraries. Most Perl libraries are just interfaces to C/C++ libraries. For pattern matching, most other often-used languages support Perl-compatible regular expressions. For CBNG, we made a conscious choice not to do PCRE-style pattern matching, because it's slow.

The concept of a blackboard system requires each approach to contribute to a defined area of knowledge about the subject. It cannot be effectively used to combine multiple full-out approaches. Such an approach was already suggested by the STiki creator, to whom I explained how any intelligently designed system to combine arbitrary overlapping inputs would effectively reduce down to picking whichever input was most reliable.

ClueBot NG runs on a single computer. There's no reason to run it on multiple, because it's efficient enough to run on a single one. It is a very poor approach to programming to allow oneself to be lazy and write inefficient code because you might be able to get enough hardware to run it.

As I already stated, the original ClueBot used exactly the same type of rule-based system that you suggest, complete with allowing users to suggest rules. In fact, several anti-vandal bots previous to ClueBot also used a similar system. It can be shown that a 2-layer ANN is very similar to a rule-based system, with the exception that the ANN optimally determines scores/weights, whereas a rule-based system only uses guesses. An ANN with hidden layers improves even further.
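The similarity between a 2-layer ANN and a scored rule system can be illustrated with a small sketch. A network with only an input and an output layer is a weighted sum of features passed through a squashing function, which has the same shape as a rule scorer, except that the weights are fitted to classified examples instead of guessed. The feature names and the four training samples below are invented for illustration, and the plain gradient-descent loop is an assumption, not how CBNG trains.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features) -> float:
    """Weighted sum of features squashed to 0..1, like a scored rule set."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, epochs: int = 2000, lr: float = 0.5):
    """Fit the weights by gradient descent instead of hand-picking them."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for features, label in samples:
            err = score(weights, bias, features) - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias


# Made-up toy data: features are (is_ip, has_random_string); label 1 = vandalism.
SAMPLES = [([1, 1], 1), ([1, 0], 0), ([0, 1], 1), ([0, 0], 0)]
```

In this toy set only the second feature predicts the label, and training discovers that on its own: the learned weight for `has_random_string` ends up large while the weight for `is_ip` stays comparatively small.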

If you'd like to try yet another rule-based bot, feel free - but it's highly unlikely to beat an intelligent machine-learning approach. Crispy1989 (talk) 00:37, 9 January 2011 (UTC)

Ok, I will play with it anyway, and see what happens. Cheers. History2007 (talk) 00:48, 9 January 2011 (UTC)

ClueBot

Hi Crispy

Sorry to trouble you, could you please look on ClueBot's talk page when you get a second? I came across something tonight that worried me a little: it looks like ClueBot removed warnings from a user's page and then went back to giving the user a level 1 warning instead of a level 3. I've posted full details and links on ClueBot's talk, really not sure how it's happened!

Still love ClueBot though, congratulations on such a great bot!--5 albert square (talk) 02:28, 13 January 2011 (UTC)

Hi, thanks for noticing and pointing this out. For future reference, we check CBNG's talk page at least as often as our own, so leaving stuff there is perfectly alright. This particular issue falls into Cobi's domain, and he is looking into it as I type this. Thanks! Crispy1989 (talk) 09:38, 13 January 2011 (UTC)

Ah thanks Crispy, and I will note that for future reference :)--5 albert square (talk) 23:36, 13 January 2011 (UTC)

Thanks

Thanks so much for your intervention on page Women in Judaism. --Geneviève (talk) 16:09, 21 January 2011 (UTC)

User talk:ClueBot NG/Run

Hi Crispy

Someone has set up this page, is this a page to do with ClueBot? I know it's related to the Run page of ClueBot, just thought that was a weird message to post on the talk! If it can be deleted then let me know and I'll delete the page :) --5 albert square (talk) 21:11, 12 April 2011 (UTC)

User talk:ClueBot NG/Run is not used by the bot only User:ClueBot_NG/Run DamianZaremba ^{(talk • contribs)} 21:15, 12 April 2011 (UTC)

Ah so it can be just deleted as nonsense? Thought so but thought I'd check it wasn't a test of some kind. Thanks!--5 albert square (talk) 21:42, 12 April 2011 (UTC)

Cluebot NG API (or Logs)

Hi,

maybe you can help me out: I'm a PhD student doing research on Wikipedia, and I wonder if there is also an API for Cluebot NG that you can send a RevID to and get back the vandalism score (or, if not available, maybe just a log of the vandalism scores already computed in the past). I couldn't find anything in that regard on the user:Cluebot NG page or elsewhere. The background is that for my research, I analyze a local dump of the English Wikipedia and I need to filter out revisions that are most probably vandalism (and need to do so also for the time when Cluebot NG was not yet active and for revisions where it wasn't fast enough and got 'beat' by others). And Cluebot NG seems to do the best job so far marking such revisions with high (and proven) accuracy. I would set up ClueBot NG myself on my machine to let it go through the dumps. But this seems fairly complicated and I don't have the proficiency to do that.

Thanks 129.13.72.198 (talk) —Preceding undated comment added 17:04, 28 April 2011 (UTC).

Blender edit...

I have been trying to update the Blender (software) page.

Making an article, getting revisions from various editors; some helpful, some not.

Got a new message, from "bot".

Not meaning to be malicious or useless. Hoping to add to the community. Apparently links are a bad thing :)

Help.

Thank you.

Jambay (talk) 10:06, 28 May 2011 (UTC)

Jambay

Noelia,

Please don't revert information on NOELIA. You keep reverting accurate information. We are the record label of Noelia; also, her birthday is 1979, and there are plenty of articles that support that information. Please help to contribute, not to misinform. Thanks, Daniela. Pink Star Music SM/ Universal — Preceding unsigned comment added by 207.239.157.38 (talk) 15:28, 1 July 2011 (UTC)

The Wikipedia Game

A discussion about improving the help documentation inspired an idea--Wikipedia tutorials would be best if they were interactive and immersive. The thought of a learning-teaching game came up, one based on a real interface with realistic 'missions'. Would you be interested in providing some feedback or helping work on it, or know some editors who might? The idea is just getting started and any assistance with the help/policy side, the experienced-editor side, or the coding/game-making side would be great. Cheers, Ocaasi ^c 01:08, 28 May 2011 (UTC)

Please put your responses at User_talk:Ocaasi/The_Wikipedia_Game to consolidate discussion. Dcoetzee 11:10, 29 May 2011 (UTC)

Yo D Bag, stop f***ing deleting my edits to the Glen macnow page, your a dope, Im not saying anything mean, its all s**t he says about himself on the show as is more relevant to his on air personality than anything else on the page your a bleeping a**hole. — Preceding unsigned comment added by 68.81.172.214 (talk) 15:41, 5 August 2011 (UTC)

ClueBot

Hey man. That bot of yours, it's awesome, it is very rare to get false positives, and it can somehow detect vandalism or not. I made some tests and it's awesome, but just wondering, HOW DO YOU DO THAT!? — Preceding unsigned comment added by 186.213.214.216 (talk) 05:34, 4 July 2011 (UTC)

Not a false positive, but a shared IP

Hi, I saw a notification of vandalism, ID 17990 (Alternative Fuels article). I do occasionally make edits but didn't make the "offending" one (a gratuitous example of human silliness, from what I read (and I'd change my name if it was Brad, too). FYI the IP address of the buffoon that made the edit is a pooled/shared one, since I got the IP and hence the message earlier today (I don't know whether I still have it, my ISP has been somewhat flaky today and the IP has changed more than once). These IPs are part of a pool that gets shared around (and we get a different one sometimes several times a day) and there's never an IP that is "mine"!

I'm not quite sure of the value of IP-related admonishments since it just brasses off legitimate contributors such as I and doesn't concern the buffoons at all (different for fixed IPs, but not for pool users like most of us). If I wanted to make an edit and found the IP blocked by someone else's foolishness, I'd simply not contribute. — Preceding unsigned comment added by 89.242.229.180 (talk) 15:25, 22 July 2011 (UTC)

ClueBot has a sense of Humour...

See this and this over at Jerry Meals. It's all fixed since then, but after his apparently bad call recently, his article got created initially as a method of attacking him. ClueBot reverted vandalism back to...vandalism. XD CycloneGU (talk) 13:52, 27 July 2011 (UTC)

It should be noted that my examples were RevDelled among many more. CycloneGU (talk) 18:49, 27 July 2011 (UTC)

Request to access ClueBot NG Dataset

Hi, I've submitted a request to work with ClueBot NG's dataset, and was wondering if you could possibly go through the backlog of accounts to be approved and let me in. My email address is c****.e******@gmail.com. Thanks. Phuzion (talk) 02:41, 7 August 2011 (UTC)

I will be reviewing all account requests later today. - DamianZaremba ^{(talk • contribs)} 02:43, 7 August 2011 (UTC)

A barnstar for you!

	The Anti-Vandalism Barnstar
	for your work on ClueBotNG. I was looking for vandalism on the recent changes page, and I found some, but when I went to the page to remove it, your trusty sidekick ;) had already removed it. I think anyone who operates a bot anyway deserves either the bot barnstar, the random acts of kindness barnstar, or the Anti-vandalism barnstar. Happy editing! pluma Ø 00:14, 25 August 2011 (UTC)

A barnstar for you!

	The Anti-Vandalism Barnstar
	For operating ClueBot. Namtar 17:04, 26 August 2011 (UTC)

Bot-edit tag

Could the ClueBot NG be modified to use the bot-edit tag? Having never worked with a bot, I have no idea how this is done. I just know that this will tag edits listed in the Watchlist like the edits by Lucas-bot, JAnDbot and Helpful Pixie Bot (among others). Thanks... Cbbkr (talk) 22:45, 31 August 2011 (UTC)

Protection of bot-shutoff page for Cluebot NG?

I'd appreciate your comments at the "Protection of User:ClueBot NG/Run" section of User talk:ClueBot Commons. Nyttend (talk) 04:07, 23 September 2011 (UTC)

ClueBot NG

I'm sick of this bot. It's the second time that it has reverted perfectly reasonable edits: first here and now here. This is serious. Wikipedia shouldn't be reverting massively this way. This will only alienate new users. --190.133.1.3 (talk) 19:12, 25 September 2011 (UTC)

I didn't realise this question had also been asked here. There is a reply already at User talk:Cobi--5 albert square (talk) 01:02, 26 September 2011 (UTC)

I'm also sick of this bot. It keeps undoing what I have edited, and so I need it blocked. I don't know how to prevent it from hitting me, but it alienates me!!!!!!!! Just like 190.133.1.3, we hate the bot, and it looks like putting spam on our Gmail accounts. Well, if you don't delete or block the bot, I will NEVER sign up to Wikipedia again with the IP address I am using!!!!! (125.163.158.31 (talk) 09:20, 14 June 2012 (UTC))

A barnstar for you!

	The Original Barnstar
	Thank you for cluebot! I noticed it while perusing an article I'd edited for vandalism a while back. It appears your bot is now doing a better job than I ever could. Zhoulikan (talk) 06:44, 18 November 2011 (UTC)

A kitten for you!

hi

Kpsp8 (talk) 16:05, 13 December 2011 (UTC)

Hi! Cute kitten! I love the kitten!!!!(125.163.158.31 (talk) 09:37, 14 June 2012 (UTC))

Dataset

Hi Crispy1989, I control Salebot (an antivandalbot) on pt.wiki, 12.000 edits per month. I was looking for the dataset of ClueBotNG to improve that of Salebot. Is it possible to have it? Greats, and good job for ClueBotNG. Kim richard (talk) 09:24, 24 May 2012 (UTC)

'bot penis vandalism

ClueBot NG has learned to be a penis vandal. It needs re-training. Uncle G (talk) 09:18, 3 July 2012 (UTC)

The Wikipedia Adventure: Request for feedback on Community Fellowship proposal

Hi! I'm contacting you because you have participated or discussed The Wikipedia Adventure learning tutorial/game idea. I think you should know about a current Community Fellowship proposal to create the game with some Wikimedia Foundation support. Your feedback on the proposal would be very much appreciated. I should note that the feedback is for the proposal, not the proposer, and even if the Fellowship goes forward it might be undertaken by presently not-mentioned editors. Thanks again for your consideration.

Proposal: http://meta.wikimedia.org/wiki/Wikimedia_Fellowships/Project_Ideas/The_Wikipedia_Adventure

Cheers, User:Ocaasi 16:40, 27 July 2012 (UTC)

Request

I put the message below to the bot's talkpage, but the edit summary still remains. It should be deleted. Thanks,Egeymi (talk) 09:37, 15 September 2012 (UTC)

209.119.38.226

Thanks for reverting the edits of the user on Recep Tayyip Erdogan page. However, edit summary should be deleted, since it includes defamation in Turkish. There are two separate edit summary of this user, thanks dear bot.Egeymi (talk) 05:20, 14 September 2012 (UTC)

ClueBot NG must go!!!

Your bot is racist and against women's rights. The page on Christie Blatchford is defamatory against Canadian Aboriginals and does not reflect the considerable controversy she generates, and erased references that weren't overtly pro-Blatchford. You allow references from the National Post (who has obvious ties to Blatchford) but not Huffington post? 207.34.139.42 (talk) 00:21, 20 August 2013 (UTC)

Listen up git, I heard you run ClueBot NG eh? Well I have a bone to pick with this so called 'bot'. This automated piece of rubbish is ruining Wikipedia by blocking legitimate edits. I am currently leading the petitions to get rid of ClueBot NG for which, by the way, support is mounting fast. I love helping Wikipedia out by sharing my wealth of knowledge on the lower leagues of British football (or 'soccer' to you yanks), however I am now absolutely fuming at the way in which I have been treated by this ridiculously strict 'bot', which will block any (and I mean any) perfectly legitimate information I so kindly donate to Wikipedia. I have a wealth of knowledge on the field of lower league footballers and lesser known teams, information which, if allowed to share, would significantly improve the current appalling state of Wikipedia's pages on lower league English football. — Preceding unsigned comment added by Boomage (talk • contribs) 22:35, 24 December 2012 (UTC)

I agree. — Preceding unsigned comment added by Boyboy13 (talk • contribs) 21:41, 31 January 2013 (UTC)

Hello. There is currently a discussion at Wikipedia:Administrators' noticeboard/Incidents regarding an issue with which you may have been involved. Thank you. - Rich(MTCD)^T|C|E-Mail 14:21, 25 December 2012 (UTC)

Sorry Boomage, ClueBot is exactly what is right with Wikipedia. It is the best thing around. You, on the other hand.... Let me not say... History2007 (talk) 00:57, 26 December 2012 (UTC)

Urgent -- ClueBot NG not editing

Hello Crispy:

ClueBot NG is currently not editing, and has not done so in the last few hours. The IRC feed says "run disabled" yet the run page is set to True. If you could look into this, that would be excellent. Thanks, It's a Fox! (What did I break) 01:15, 19 March 2013 (UTC)

A kitten for you!

hey

how are (talk) 14:34, 7 May 2013 (UTC)

Bot blocked

I had pinged you on the bot's talk - but see you've not been around much lately. I've blocked the bot for 48 hours and will try to find someone active to fix it. See: User talk:ClueBot Commons — Ched : ? 20:39, 22 May 2013 (UTC)

bot idiot

must go its stupid — Preceding unsigned comment added by Teiganezzy (talk • contribs) 10:06, 27 August 2013 (UTC)

Thanks, Crispy. I was just surfing Special:Contributions/Teiganezzy and felt the need to chime in. The only thing better than being able to revert vandalism with the click of a button is not having to. Radiodef (talk) 00:02, 28 August 2013 (UTC)

YOUR BOT IS DEFECTIVE. IT HAS FALSELY SENT ME THREE MESSAGES ALLEGING THAT I "VANDALIZED" WIKIPEDIA SIMPLY BECAUSE I POSTED REQUESTS THAT ARTICLES REGARDING STOCKS TRADED ON THE NYSE, NASDAQ, AND LONDON STOCK EXCHANGE, WHICH ARE GIVEN SOLELY IN MANDARIN, INCLUDE ENGLISH TRANSLATIONS FOR ENGLISH-SPEAKING WIKIPEDIA USERS--AFTER ALL, THE NYSE, NASDAQ AND LONDON STOCK EXCHANGE ARE ALL ENGLISH-SPEAKING STOCK EXCHANGES. TO CALL SUCH REQUESTS "VANDALISM" OF WIKIPEDIA IS ABSOLUTELY ABSURD. YOU NEED TO FIX YOUR "BOT."

Curse that ClueBot NG of yours

	The Anti-Vandalism Barnstar
	As much as I'm in favour of bots that crack down on vandalism on lightning, that ClueBot NG of yours does it a bit too fast. Anytime I head over to the Abuse Log and see brand new vandalism edits, I go over to undo them just to find that ClueBot NG beat me to it. Well played, good sir, well played. Have a nice day, mate. --Matthew (talk) 19:13, 1 September 2013 (UTC)

AV HOF

	Anti-Vandalism Hall Of Fame
	For creating Cluebot NG, I invite you into the Anti-Vandalism Hall Of Fame! buff bills 7701 21:12, 9 September 2013 (UTC)

What's wrong with ClueBot NG?

How many times do I have to report a "false positive" before I get some explanation about it? Is it so, that the bot doesn't read edit remarks or talk pages? Could you explain? Rump Bass (talk) 11:46, 3 January 2014 (UTC)

ClueBot removes my posts when I try to insert correct terms instead of the current incorrect ones. Please make it stop

-Guns 'n' Coffee — Preceding unsigned comment added by Guns n' coffee (talk • contribs) 12:28, 12 October 2014 (UTC)

A cheeseburger for you!

What does the algorithm for the bot look like? Adam Perkinson (talk) 22:11, 30 May 2014 (UTC)