Wikipedia talk:Counter-Vandalism Unit/Vandalism studies/Archive 1

From Wikipedia, the free encyclopedia

To do list

Anyone feel free to edit the below to do list to come up with more ideas. Remember 14:17, 4 January 2007 (UTC)

  • Gather other editors of interest into the project
  • Come up with proposed studies to conduct and implement them
  • Revise project page to make it more accessible
  • Figure out what Study 2 will be
  • Finish Obama study

Typology

Note: I'm not putting my name on the items below; feel free to edit directly. If there are disagreements, let's note that in a discussion section. Or split this off into a subpage? John Broughton | Talk 00:22, 4 January 2007 (UTC)

Targets of vandalism

There are a variety of targets that vandalism can hit:

  • Articles:
    • Main page article
    • Other featured articles (FA)
    • Good articles (formally so categorized - GA)
    • Other articles (not GA, not FA)
  • Templates
  • Wikipedia namespace (including Wikipedia talk)
  • User namespace (including User talk)
  • Categories
  • Redirects
  • External links
  • References
  • Media (pictures, music files, etc.)
  • Hehe :)

Sources of vandalism

Vandalism comes from:

  • Anonymous IP addresses
  • Newly registered users (typically vandal-only accounts)
  • Disruptive editors (limited but some constructive work)
  • Trolls, sock puppets, etc. - disgruntled "power users"

Types of vandalism

The IBM study identified five major types:

  • Articles:
    • Mass deletion: deletion of all contents on a page
    • Offensive copy: insertion of vulgarities or slurs
    • Phony copy: insertion of text unrelated to the page topic
    • Idiosyncratic copy: adding text that is related to the topic of the page but which is clearly one-sided, not of general interest, or inflammatory
  • Redirects:
    • Phony redirection (a redirect with a misleading pipe)

Methods of vandalism

How do we feel about a Methods of vandalism section? Will it do more harm than good to be specific? JoeSmack Talk 05:43, 4 January 2007 (UTC)
I think it would be helpful to create categories, but we don't necessarily have to state how one can do the more reckless types of vandalism. Remember 14:02, 4 January 2007 (UTC)
Alright, identifiable but not a diagram on how to do them, I think that is sound. JoeSmack Talk 14:17, 4 January 2007 (UTC)

Impact of Vandalism

Proposed studies

In order to study vandalism and the response to vandalism on Wikipedia I had a couple of ideas. First, we can try to gather data on a random sample of vandalism that occurred during a particular time period or on particular pages. The second idea, which would be much more controversial, would be to engage in some small systematic vandalism on random pages and see what gets reverted first and what manages to stay on and why. I believe that this idea would face lots of controversy, but I think it may yield interesting data and it seems like a controlled way in which to study certain aspects of vandalism. Remember 14:07, 4 January 2007 (UTC)

I think the more random we can make the sample pool the better. The first idea I think should be tried first before the second; many have done 'experiments' like the second and I don't think we need to go that route until other avenues prove not to be as useful. If we make the first study empirical, focused, vigorous and detailed I think it could provide a lot of useful information. JoeSmack Talk 14:20, 4 January 2007 (UTC)
Here's a link to a tool that shows the most popular articles. Another way of selecting a sample would be random pages, of course.
As for the second study, that's absolutely a violation of WP:POINT as proposed. I for one am not particularly interested in being blocked for a week (or whatever) for screwing around with Wikipedia. An acceptable alternative would be to IDENTIFY a sample of vandalism from a stream of edits (rather than, say, from an article's history of reverts), but not revert the vandalism. We'd need to keep the data offline in order to avoid anyone tampering with it, but it's doable. John Broughton | Talk 22:43, 4 January 2007 (UTC)
The toolserver.org/ link is broken. travb (talk) 19:13, 21 December 2008 (UTC)

Proposed Study 1

Each member randomly chooses a certain number of pages (as chosen by the random article link), then goes through the whole history of each article and categorizes what vandalism occurred, who was responsible for the vandalism and how long each vandalism remained. Remember 21:03, 4 January 2007 (UTC)

Any study should start small, and be iterative - that is, avoid doing a lot of work and then realize that it was done wrong. It's better to spend more time planning than a lot of time regretting.
So, first, we should restrict this to (say) the last three months of edits. Second, we should do a test run with a very small number of pages. Third, we should put together the table(s)/lists/whatever where we'll record the data before we actually start grabbing data. (For example, do we want to count non-vandalizing edits; and how do we want to categorize the type of vandalism?) Which means there probably should be a WikiProject Vandalism studies/Study1 subpage set up before anyone starts any type of counting. Assuming, of course, we have consensus on moving forward. John Broughton | Talk 22:43, 4 January 2007 (UTC)
I think this is a great idea. Move forward, by all means; you have my help. JoeSmack Talk 05:56, 6 January 2007 (UTC)

Discussion

First, we probably need to think about how to set up the data gathering. Here is a proposed table that others should feel free to edit. Remember 16:18, 6 January 2007 (UTC)

Page | Date of edits examined | Total number of vandalisms | Each vandalism with reference | Comments
Wiki page | Oct 2006-Dec 2006 | 15 | ?? | ??
...Let's choose what we are going to hit before we design the hammer. ;) Which articles are we going to monitor, how many, and how long? And what is our workable definition of vandalism here? Anything reverted? Anyone with a history that doesn't look like a good contrib? We need a solid, solid definition first and foremost. JoeSmack Talk 17:36, 6 January 2007 (UTC)

Proposal 2

Alright how about this. 1. Which articles are we going to monitor, how many, and how long?

We take a random sampling of articles as chosen by the random article link on wikipedia. We look at all the edits for one month of the year for 2004, 2005, 2006. That will help us see how vandalism has changed each year. Remember 17:48, 6 January 2007 (UTC)
I like this idea. For some reason I thought we were all going to put 100 articles on our talk pages and watch them for a month. Let's pick something like November of 2004, 2005, 2006 and look through the history for vandalism, and record it. How big should our sample size be? Should we go until we reach 1,000? More? Less?

2. And what is our workable definition of vandalism here? Anything reverted? Anyone with a history that doesn't look like a good contrib?

I vote that we split up vandalism into several different categories similar to the IBM study: obvious added vandalism (adding various curse words and other nonsense), deleted information, and subtle vandalism (other additions that are intended to push a POV or otherwise harm the article). Remember 17:48, 6 January 2007 (UTC)

Further discussion

I'm down with the IBM study's analysis; the only questionable aspect I'd say approach with caution is "Idiosyncratic copy: adding text that is related to the topic of the page but which is clearly one-sided, not of general interest, or inflammatory". We want to have clear boundaries between Offensive (Max tucker is a dickbox!), phony (Max tucker is actually a computer programmer with too much time on his hands) and blatantly POV (Max Tucker has been demonstrated to be an inflexible human being throughout all walks of life). I think we should separate all these classes too, and have the blatantly POV (idiosyncratic); i believe this one will be the most, uh, subjectively defined class of vandalism. How does this sound?
We also should set up an examples page so we can get some of our definitions straight on some instances. This is where we can decide if we want to classify vandalism by one or multiple classes. For instance, example.com vandalism should be defined as Phony vandalism, but should it also be Deletion vandalism when it replaces a section/page? Similar for vandals who replace the 'criticisms' section of their favorite film director with the word 'shitcock'; is this Offensive vandalism or deletion? Or both? We need to decide if we're going to run a multiple class system on single instances or not, and then follow by having a subpage with examples of vandalisms that are classed for our sake and everyone else's who are looking in on the work. JoeSmack Talk 18:19, 6 January 2007 (UTC)
On another note, we should have other users go through and do a second classification for reliability. We don't want the results from the previous classification to be apparent because it'll taint the second assessment, but we can either work a second table out or maybe put data in black text over black background or something clever like that to help. If we can show our reliability is high, it'll really help the credibility of our study. JoeSmack Talk 18:27, 6 January 2007 (UTC)
And again, on another note: this won't include article creation vandalism. They usually get reverted right off the bat. If the random article button clicks to an obvious db-, it should be db-ed and moved on, right? Also, in terms of linkspam: should this be counted as vandalism? Lots of links to places to get viagra etc are added and removed all the time, should that be counted? And if so, what about less encyclopedic links that are added like to youtube copyvios and inappropriate user blogs? JoeSmack Talk 18:56, 6 January 2007 (UTC)
I vote to not count linkspam. I have set up the first study page at Wikipedia:WikiProject Vandalism studies/Study1. I think we can work through these issues as we go through the study there. I am going to try to start setting up a table there. Remember 18:22, 11 January 2007 (UTC)
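
The second-classification idea raised above is commonly summarized with an agreement statistic. Below is a minimal sketch, assuming each rater assigns exactly one class per edit and using Cohen's kappa, a statistic nobody in this thread named; it is picked here purely for illustration. The label lists are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six vandalism edits, classified twice independently.
first_pass = ["offensive", "offensive", "phony", "deletion", "pov", "offensive"]
second_pass = ["offensive", "phony", "phony", "deletion", "pov", "offensive"]
kappa = cohens_kappa(first_pass, second_pass)
print(f"kappa = {kappa:.3f}")
```

A value near 1 would suggest the class definitions are being applied consistently; a low value would point back to the examples page as the place to tighten definitions. If single edits end up carrying multiple classes, this one-label-per-edit sketch would no longer apply as written.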

Proposal 3: The viability of allowing anyone to edit

Cost-benefit analysis: Try to weigh the benefits of allowing anyone to edit (readers fixing typos, new registered users that wouldn't have gotten interested if it weren't for anon editing etc) versus the costs (vandalism, and good, particularly expert contributors leaving due to said vandalism (remember two contributors aren't necessarily equal - you must evaluate the cost of the loss of an expert versus the gain of another contributor to TV show articles)).

Further discussion

Although this proposal probably seems quite daunting, if the study was properly performed, it could make a significant difference to the outcome of the continuing viability of the anon' editing policy. --Seans Potato Business 02:24, 19 February 2007 (UTC)


Proposal for Study 2

It is common not to semi-protect Featured Articles when they're on the frontpage, with the idea that people new to Wikipedia, who visit a FA, will get a good idea of how Wikipedia is open for anyone to edit.

I propose we conduct a study to look at a number of factors for FAs on the day they are on the front page:

  • how many edits, vandalist edits, sorts of vandalist edits, reverts and time to revert
  • how many edits have been made by new users (use "User contributions")
  • Compare FA's by day of the week.

JackSparrow Ninja 11:43, 23 February 2007 (UTC)

This is what I started to do last December, although I had not seen this edit at the time, but it became too much of a chore: each article required about 8 hours of work, plus I became very disenchanted with the reactions to vandalism. I captured edits to a FA during its time as FA. I captured the date and time of the edit; the editor; the nature of the edit -- beneficial, neutral (such as copyedits) or harmful; and, if the edit was harmful, the consequences of the edit -- date and time of reversion, reverting editor and personal consequence to the "vandal".

The disenchantment mainly came from the last, the personal consequence to the "vandal". On almost every article that I studied there were multiple Final Warnings (This is your final warning. If you continue to make destructive edits . . . you will be blocked from editing.) issued to the same editor for successive edits but the editor was never blocked.

Another source of disenchantment was the edits that were constructive in that they provided additional information, sometimes with references, that, while pertinent, were contrary to some editor's POV and were reverted by an editor as violating NPOV. Somewhere around the third or fourth time I encountered this I realized that assumed ownership of articles was possibly a larger problem than vandalism. By that time I knew that this page was here and had read part of the entries so I knew that vandalism was recognized as a problem. I have not been able to find any group that is concerned about editors who adopt articles or groups of articles and act as gatekeepers to edits on the grounds of misapplication of WP Guidelines.

So I will continue to use WP as a resource since there are a lot of very good articles on WP on subjects of serious academic interest, well sourced and either unbiased or with a very obvious bias, and do a little copy editing here and there where I find need for it. I will simply ignore those articles where I find an editor who reverts edits that are not personally acceptable. I read the Talk page of any article that seems interesting. An editor who has assumed ownership of an article is easily discovered on the Talk page.

Good luck with your efforts.

JimCubb (talk) 22:59, 20 April 2008 (UTC)

Categorizing vandalism

I've set up a page at Wikipedia:WikiProject Vandalism studies/Types of vandalism. If you have proposed changes, I suggest you just edit over what is there, rather than doing a threaded discussion, and post your comments here. John Broughton | Talk 19:33, 7 January 2007 (UTC)

Notifying the community

Any ideas how to go about letting people know that this project is going on? I have a feeling that there are other interested people out there, but that our project may not be the easiest to find. Remember 18:36, 7 January 2007 (UTC)

I've got some links, but not time to follow-up on them: Wikipedia:WikiProject (for general info, I think); Wikipedia:WikiProject Council/Guide (best practices), and Wikipedia:WikiProject Council/Directory. If someone else would take a look, that would be great. John Broughton | Talk 18:50, 7 January 2007 (UTC)
Any talk pages where you find an active discussion of the problem of vandalism and the counter vandalism unit. --Seans Potato Business 02:25, 19 February 2007 (UTC)

Bayesian spam filtering

Hey folks, it just crossed my mind that we already have a solution to vandalism. Take how spam was dealt with in the world of email: bayesian filtering. Theoretically speaking, we should be able to apply the same thing to vandalism. Even if we don't have access to the text people are putting into articles, we have enough input to make reasonable assertions (anonymous IP, number of lines changed, edit summary, article edited, etc). Over the next month I'm going to try to implement this into WikiGuard. I'll let you know how it goes.  :) --Brad Beattie (talk) 18:48, 7 January 2007 (UTC)

Bayesian and other methods for determining what is and isn't vandalism could be very useful as front-end tools for editors who are fighting spam, since they presumably reduce scanning work by human beings. I don't, however, think that they are particularly good at coming up with robust, defensible numbers on who is doing vandalism and what type of vandalism they're doing. So while I encourage your efforts, I don't think this WikiProject should wait to see what happens with them. John Broughton | Talk 18:56, 7 January 2007 (UTC)
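
To make the metadata-only idea above concrete, here is a minimal sketch of a Bernoulli naive Bayes classifier with add-one smoothing. It is not WikiGuard's implementation; the class name, the three feature names and the training rows are all hypothetical, chosen to mirror the kinds of inputs mentioned (anonymous IP, edit summary, size of change).

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Bernoulli naive Bayes over boolean edit-metadata features (illustrative only)."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature] = training edits of that label with the feature set
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def train(self, features, label):
        self.class_counts[label] += 1
        for name in self.feature_names:
            if features.get(name, False):
                self.feature_counts[label][name] += 1

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        n_label = self.class_counts[label]
        # Smoothed class prior, then one smoothed likelihood term per feature.
        score = math.log((n_label + 1) / (total + len(self.class_counts)))
        for name in self.feature_names:
            p_true = (self.feature_counts[label][name] + 1) / (n_label + 2)
            score += math.log(p_true if features.get(name, False) else 1 - p_true)
        return score

    def predict(self, features):
        return max(self.class_counts, key=lambda label: self.log_score(features, label))


# Hypothetical feature names and hand-made training rows, not real edit data.
FEATURES = ["anonymous", "no_edit_summary", "large_removal"]
model = NaiveBayes(FEATURES)
training = [
    ({"anonymous": True, "no_edit_summary": True, "large_removal": True}, "vandalism"),
    ({"anonymous": True, "no_edit_summary": True, "large_removal": False}, "vandalism"),
    ({"anonymous": False, "no_edit_summary": True, "large_removal": True}, "vandalism"),
    ({"anonymous": False, "no_edit_summary": False, "large_removal": False}, "ok"),
    ({"anonymous": True, "no_edit_summary": False, "large_removal": False}, "ok"),
    ({"anonymous": False, "no_edit_summary": False, "large_removal": False}, "ok"),
    ({"anonymous": False, "no_edit_summary": True, "large_removal": False}, "ok"),
]
for feats, label in training:
    model.train(feats, label)

suspicious = {"anonymous": True, "no_edit_summary": True, "large_removal": True}
routine = {"anonymous": False, "no_edit_summary": False, "large_removal": False}
print(model.predict(suspicious), model.predict(routine))
```

A filter like this only ranks edits for human review; as noted in the reply above, it would not by itself produce defensible counts of who vandalizes or how.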

How to conduct a study - an example

Here's a modest but useful example of what I think a useful study looks like. The following comes from here.

  • Study goal: To evaluate if it is true that positive contributions from anonymous users far outweigh vandalism.
  • Data sampling approach: an informal tally of anonymous contributions i come across in my daily vandal-cleaning efforts from june 12 to august 15 2006
  • Results:
    • vandalism - 116
    • non-vandalism - 505

One could criticize the study for its non-random approach, for not clearly defining how vandalism was determined, for not evaluating whether non-vandal edits were significantly useful or not, but still, the evaluation did provide useful information - it appears that anonymous IP edits are constructive far more often than not. And that in turn is actionable - that since the benefits to allowing anonymous IP addresses to edit Wikipedia articles seem to outweigh the costs, the current approach should continue. John Broughton | Talk 20:00, 7 January 2007 (UTC)

I think this is the general case, but the study should be more nuanced. Some articles are very much vandalized. So much that the edits of the vandal and the reverts take up more than half of the history. I've been working on The Simpsons for a long time and it's a daily struggle to keep it vandal free. --Maitch 14:45, 11 January 2007 (UTC)
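
As a worked version of the arithmetic in the example tally above (116 vandalism edits, 505 non-vandalism edits), the sketch below computes the vandalism share and a Wilson score interval around it. The interval is an addition for illustration and assumes the edits behave like a random sample, which the informal tally was not, so it should be read as a rough indication of sampling noise only.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return centre - half_width, centre + half_width


vandalism, non_vandalism = 116, 505  # counts quoted in the example study above
total = vandalism + non_vandalism
share = vandalism / total
low, high = wilson_interval(vandalism, total)
print(f"vandalism share: {share:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```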

Started first study

Go to WikiProject Vandalism studies/Study1 to check it out and help make it better. Remember 22:40, 11 January 2007 (UTC)

I moved it to Wikipedia:WikiProject Vandalism studies/Study1; it was previously in article space. Trebor 22:54, 11 January 2007 (UTC)

Need help

Anybody that wants to help get 100 data points for our first study Wikipedia:WikiProject Vandalism studies/Study1 please let me know because we need all the help we can get. Below is a copy of the current results we have so far, which I think are interesting. Remember 17:30, 18 January 2007 (UTC)

Current cumulative tally

Total edits 2004, 2005, 2006 = 100
Total vandalism edits 2004, 2005, 2006 = 5
Percentage of vandalism to total edits = (5/100)= 5%

November 2004

Total edits in November 2004 = 15
Total vandalism edits in 2004 = 2
Percentage of vandalism to total edits = (2/15) = 13.33%

November 2005

Total edits in November 2005 = 45
Total vandalism edits in 2005 = 2
Percentage of vandalism to total edits = (2/45) = 4.444%

November 2006

Total edits in November 2006 = 40
Total vandalism edits in 2006 = 1
Percentage of vandalism to total edits = (1/40) = 2.5%

Percentage of overall vandalism that was

Obvious vandalism = (4/5) = 80%
Inaccurate vandalism = (0/5) = 0%
POV vandalism = (0/5) = 0%
Deletion vandalism = (0/5) = 0%
Linkspam = (1/5) = 20%

Percentage of overall vandalism that was done by

Anonymous editors = (4/5) = 80%
Editors with accounts = (1/5) = 20%
Bots = (0/5) = 0%

Reverting

Average time before reverting = (7991+14+6816+18+2561)/5= 3480 minutes
Percentage of reverting done by
Anonymous editors = (0/5) = 0 %
Editors with accounts = (5/5) = 100%
Bots = (0/5) = 0 %

Still need help

I still need people to help with the first study. We are now up to 40 points, but I would like to get northwards of 100. Here are the current results based on the first 40 points. Remember 17:12, 28 January 2007 (UTC)

Current cumulative tally

Total edits 2004, 2005, 2006 = 150
Total vandalism edits 2004, 2005, 2006 = 8
Percentage of vandalism to total edits = (8/150)= 5.3%

November 2004

Total edits in November 2004 = 22
Total vandalism edits in 2004 = 2
Percentage of vandalism to total edits = (2/22) = 9.09%

November 2005

Total edits in November 2005 = 59
Total vandalism edits in 2005 = 5
Percentage of vandalism to total edits = (5/59) = 8.47%

November 2006

Total edits in November 2006 = 69
Total vandalism edits in 2006 = 1
Percentage of vandalism to total edits = (1/69) = 1.45%

Percentage of overall vandalism that was

Obvious vandalism = (7/8) = 87.5%
Inaccurate vandalism = (0/8) = 0%
POV vandalism = (0/8) = 0%
Deletion vandalism = (0/8) = 0%
Linkspam = (1/8) = 12.5%

Percentage of overall vandalism that was done by

Anonymous editors = (7/8) = 87.5%
Editors with accounts = (1/8) = 12.5%
Bots = (0/8) = 0%

Reverting

Average time before reverting = (7991+14+6816+18+2561+4+11+11)/8 = 2,178.25 minutes
Percentage of reverting done by
Anonymous editors = (0/8) = 0 %
Editors with accounts = (8/8) = 100%
Bots = (0/8) = 0 %
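
The tally above can be recomputed mechanically, which helps catch slips in hand arithmetic. The sketch below uses only the counts and revert times listed in this section; the variable names are illustrative. It also reports the median revert time next to the mean, since a few very long-lived cases dominate the average.

```python
from statistics import mean, median

# Minutes before each vandalism edit was reverted, as listed in the tally above.
revert_minutes = [7991, 14, 6816, 18, 2561, 4, 11, 11]

# (total edits, vandalism edits) per sampled month, from the tally above.
by_month = {
    "November 2004": (22, 2),
    "November 2005": (59, 5),
    "November 2006": (69, 1),
}


def percent(part, whole):
    """Percentage of `part` in `whole`, or 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


total_edits = sum(edits for edits, _ in by_month.values())
total_vandalism = sum(vandal for _, vandal in by_month.values())

for month, (edits, vandal) in by_month.items():
    print(f"{month}: {vandal}/{edits} = {percent(vandal, edits):.2f}%")
print(f"Overall: {total_vandalism}/{total_edits} = {percent(total_vandalism, total_edits):.2f}%")
print(f"Mean revert time: {mean(revert_minutes)} minutes")
print(f"Median revert time: {median(revert_minutes)} minutes")
```

With these eight data points the mean is 2,178.25 minutes while the median is 16 minutes, so which figure is reported changes the picture considerably.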

This widely watchlisted article has been running unprotected since 31 January. move=sysop protection was added on February 5. See the article's talk page for discussion about allowing IP edits. Would some volunteers here be willing to turn their analytical skills to this article? It is likely to be linked from the main page "In the news" box during the weekend of 10 February when Obama is expected to announce his plans for the 2008 presidential election. --HailFire 18:48, 8 February 2007 (UTC)

What sort of study were you thinking about? Remember 20:02, 8 February 2007 (UTC)
Something a lot like this. Such a study could inform the continuing talk page discussion mentioned here and help the article's editors to build greater consensus on IP edits. More broadly, the analysis may provide useful guidance on the relative merits of protection/unprotection for other high visibility political articles visited with frequent vandalism. --HailFire 21:08, 8 February 2007 (UTC)

Still thinking such a study might provide useful insights for managing vandalism on this and other closely watched articles with broad readership. The article is currently in unprotected status. --HailFire 11:02, 6 March 2007 (UTC)

I ran across this project, and this looks like a neat mini-project that I can take up. I'll create it at User:BuddingJournalist/ObamaAnalysis. BuddingJournalist 08:20, 9 March 2007 (UTC)
Feel free to set it up as a study under the wikiproject vandalism study section (e.g. Wikipedia:WikiProject Vandalism studies/Obama article study) so the whole group can help out. Remember 14:57, 9 March 2007 (UTC)
Indeed, this looks really interesting. JoeSmack Talk 15:11, 9 March 2007 (UTC)
Move completed! Feel free to contribute! BuddingJournalist 01:35, 10 March 2007 (UTC)

Lots of new data for the unprotected period between 12 and 17 March, for anyone who wants to give it another look. Would also be interesting to track time of day and geolocation data, perhaps putting this in a graphic. Some examples here. --HailFire 15:08, 19 March 2007 (UTC)

A study of my user page

I carried out a vandalism study on my own user page and found that 47% of the vandalism was made by registered users. Angela. 22:13, 29 March 2007 (UTC)

user box

I created a user box for this project. Let me know what you think. Remember 20:34, 30 March 2007 (UTC)

WVS: This user is interested in studying vandalism.

WikiProject Wikidemia

I found this project myself through a recommendation from the FA protection discussion. I hadn't realized it existed before then, probably due to lack of searching on my own part. I'd like to know how this project should relate to the more general Wikipedia:WikiProject Wikidemia, which I've followed for a few months, but haven't had much involvement in. Wikidemia is a more general research effort (though fairly inactive at present) and seems a logical parent project for this one. I think all efforts related to vandalism research should be redirected here in the case of any repetition and to keep all of those involved together. It should also make this project easier to find.

My other question is whether there are any other related research pages or projects like this I may have missed. Does anyone know of any? Richard001 03:10, 11 April 2007 (UTC)

Please see WP:RW. And by all means add this project to it.-- Piotr Konieczny aka Prokonsul Piotrus | talk  07:25, 24 April 2007 (UTC)

Wikiversity

Would this project perhaps fall more under the scope of Wikiversity than Wikipedia? --Remi0o 06:37, 16 April 2007 (UTC)

Not really, it's more an internal thing, and not particularly relevant to Wikiversity. Richard001 22:57, 16 April 2007 (UTC)

recent vdl study done by User:Colonel_Chaos

i just saw User:Colonel_Chaos/study over at the village pump. haven't had the time to read it, but i thought i'd let the project here know about it. it looks like he vandalized things himself, very WP:POINT, but here it is. JoeSmack Talk 23:50, 1 May 2007 (UTC)

Interesting. That raises a semi-ethical question about our research here - are we allowed to partake in vandalism to further our understanding of it? If we cannot do so, it does place some restrictions on our research. For example we have to rely on observation rather than true experimentation. If we were to revert all of the vandalism done would that be acceptable? It's problematic to have a group of Wikipedians going around vandalizing things, but it also helps with research, so it's an open question of whether or not it's acceptable. Richard001 00:05, 2 May 2007 (UTC)
I don't think that going against WP:POINT is a good idea; at least he knows he did. I don't think that this project wants to walk down that mine field to be honest. His sample size was pretty small just like ours, so his average of 10 hours to revert probably isn't the strongest result. It does raise the open question: How long does vandalism typically remain visible? Right now the project here has tended to lean more towards a per-edit-incidence rate, but this question is still important nonetheless and we should keep it in the back of our heads. JoeSmack Talk 18:33, 2 May 2007 (UTC)
I agree with Joe at this point. I think we would just garner animosity towards this project if we condoned this behavior. But I do think we should add his study to the list of studies that have been conducted on wikipedia. Remember 18:44, 2 May 2007 (UTC)
P.S. He used a registered user to vandalize, which means these results are the reversion time of vandalism by a registered user and not users in general. JoeSmack Talk 18:36, 2 May 2007 (UTC)

Some general comments on my study. First of all, I admit that the sample size was rather small, but would you really want me to expand it? As for the strength of my result, I'm not sure that 10 hours is really here or there, but I think that my study clearly demonstrates that it takes a very long time to revert vandalism. I was dealing with Featured Articles here for crying out loud, not stubs. I'd wager that a similar study with stubs would generate an average revert time of never. You may not like my methods, but my conclusions warrant consideration. Colonel Chaos 21:49, 2 May 2007 (UTC)

Keep in mind that measuring is easy, but knowing what it is you are measuring is the hard part. Maybe we're not measuring vandal reverting in FAs but vandalism in a small group of FAs that don't get reverted quickly. That'd turn the whole set of results on its head. That's why we want big sample sizes to help reduce that possibility. The limit of vandalizing yourself is that you run into ethical issues like 'uh, should i really be expanding the sample size'?
Anyways, we're not here to pick a fight but to study vandalism, and while we might mince methods your study is interesting and it does raise some interesting questions to consider for upcoming studies and their aims. How do you feel about where we're going with study 2 or the Obama study? JoeSmack Talk 22:26, 2 May 2007 (UTC)
Did you keep track of how many vandalism warnings, if any, were left on talk pages when your vandalism was reverted? Perhaps you can add that to the study. Also your picking the FA article to vandalize, and how, introduced bias into what was otherwise a very good concept for a study.--Chrisbak 04:16, 3 May 2007 (UTC)
I noticed that in each of the usernames you created, you added some text to your user and user talk page so that it would not be red and thus eliminate any quick suspicion that you're a "newbie". Just thought I'd point that out to those who didn't catch that. I'd imagine if you didn't do that, those times would be a lot less. Edit: Also, I think all of your edits were marked minor. Just trying to find all the variables here. :) Pizzachicken 05:40, 4 May 2007 (UTC)

Study of IP Vandalism

Hey guys, just found this page recently. A while back I did a survey at User:Cool3/Analysis, on the percentage of vandalism done by anonymous editors. My methodology may not have been perfect, but it does have the advantage of a very large sample size. Hope you don't mind that I listed it under previous studies. Cool3 15:01, 5 May 2007 (UTC)

Thanks for the link. I will add it to our list of individually done studies. Remember 21:00, 6 May 2007 (UTC)
whoops, you've already added it. Remember 21:01, 6 May 2007 (UTC)

Study of schools and universities

I'm interested in getting some statistics on the contributions of shared IP addresses, especially schools and universities. From my experience the contributions of these addresses are almost universally puerile vandalism and nonsense. I'm interested in seeing what the breakdown of the contributions actually is, as well as comparing the levels of vandalism coming from the two kinds of institution (one would hope there would be slightly less nonsense on average from universities, but I wouldn't be that surprised if it was the other way around either...)

This should help provide some empirical basis for discussion of shared IP addresses, and catalyze discussion of policies for interacting with these institutions. Frankly, I believe a shared IP address should be banned if it does not, on average, improve the quality of Wikipedia articles, regardless of whether there are good faith edits there as well. Students who vandalize behind the shield of a shared address cannot be held responsible for their actions as they are totally anonymous. This situation just encourages them to show off and damage articles. I believe a policy forcing users to create their own account would allow students to be held responsible for their edits and drastically reduce vandalism. Just today I have seen frequent vandalism from schools on several of the articles I watch and I feel powerless to stop them - I'm not even going to do anything about it, because frankly I can't do anything about it. The constant but slow flow of vandalism is not enough to warrant a block most of the time, which leaves me little option but to revert the edit and feel a little embarrassed for Wikipedia and a little sorry for those 10% or so of readers who come away from predation knowing only that 'james is gay'. I'm not sure I will have time to conduct a study myself for a while at least, but I'd at least like to propose it so we can discuss the subject and if any editor or editors would like to undertake it themselves or work out the details we can make a start.

For selection we could use Template:SharedIPEDU. Looking through a few entries I can't find any order to the list at all. It may be 'random' enough as it is to select from. Richard001 00:31, 25 May 2007 (UTC)

Vandalisms per article

Will the WVS be doing any vandalisms-per-article studies that track that number over time in the near future? Specifically, to see whether major events affect this number (e.g. natural disasters, terrorist attacks, movie releases, etc). --The Dark Side 02:38, 27 May 2007 (UTC)

Not that I know of. All the active projects are shown at the top of the page. I take it you mean vandalism per unit time in relation to an external event relevant to the article? Richard001 03:01, 27 May 2007 (UTC)

I recommend proofreading the write-up of the study

I had to make some changes, because there were some glaring errors. Without the actual raw data I can't check the rest of it. I don't know where the raw data is. I do find it amusing that the modal time taken for vandalism to be reverted is instantly. 217.43.138.193 22:58, 12 June 2007 (UTC)

What exactly do you mean by "Because articles were randomly sampled and not edits, a ratio estimate must be used to calculate the percentage of edits that are vandalism."? I don't see any ratios in the write-up, nor evidence of their use. 217.43.138.193 23:07, 12 June 2007 (UTC)
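
For readers unfamiliar with the term, here is a minimal sketch of what a ratio estimate looks like when articles, not edits, are the sampling unit. It is not taken from the study's write-up; the function names and the per-article counts are hypothetical. The ratio estimate divides total vandal edits by total edits across the sampled articles, which differs from averaging each article's own vandalism percentage.

```python
from typing import List, Tuple

# Each tuple is (vandal_edits, total_edits) for one sampled article.
ArticleCounts = Tuple[int, int]


def ratio_estimate(samples: List[ArticleCounts]) -> float:
    """Estimated share of edits that are vandalism: sum of vandal edits / sum of edits."""
    total_vandal = sum(v for v, _ in samples)
    total_edits = sum(n for _, n in samples)
    if total_edits == 0:
        raise ValueError("sample contains no edits")
    return total_vandal / total_edits


def mean_of_proportions(samples: List[ArticleCounts]) -> float:
    """Unweighted average of per-article vandalism shares (articles with no edits skipped)."""
    shares = [v / n for v, n in samples if n > 0]
    return sum(shares) / len(shares)


# Hypothetical counts, chosen only to show that the two figures can differ.
sample = [(1, 10), (0, 2), (12, 200), (1, 4)]
by_ratio = ratio_estimate(sample)           # 14 / 216
by_average = mean_of_proportions(sample)    # (0.10 + 0.00 + 0.06 + 0.25) / 4
```

The two numbers diverge because heavily edited articles dominate the ratio estimate, while the plain average gives a two-edit article the same weight as a two-hundred-edit one.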

Vandalism content list

For automated vandalism detection similar to Lupin's Anti-Vandal Tool it would be very important to have a list of the content added by bored vandals. To collect this information in an unbiased way (i.e. without the existing word lists) I thought that the people who conduct the ongoing vandalism studies might also copy the detected changes, either on a page here or into a file for download somewhere else. Cacycle 02:06, 20 June 2007 (UTC)

suggestions

I would like to suggest some other research questions:

How many people editing from IP addresses make constructive edits versus vandalism edits?

How many people editing under a username make constructive edits versus vandalism edits?

If IP users are responsible for the majority of vandalism edits, would it be best to allow only people registered with usernames to edit Wikipedia?

Z E U S 04:29, 6 July 2007 (UTC)

Just place your questions on the list itself, there's no formal procedure for doing so or anything. Just make sure they aren't already on there, of course. Richard001 05:00, 6 July 2007 (UTC)

Help needed?

Hi, I'd like to offer my services and experience to this study, if required. I've spent almost a year creating and working on the uw- series of warnings, but have begun to see these as 'first aid' so to speak, instead of looking for a cure to the problem. I, and others, created the uw- system with an idea of how these warnings were to be implemented, assuming good faith for new talkpages, minimum of two warnings etc, but more and more often I see at WP:AIV editors looking for blocks having jumped straight to a 4th level warning. I've already offered at WT:AIV to write an essay on how warnings should be issued, including some case study style examples, which was met with a lukewarm response. So if I can help out or sign up please just let me know. Regards Khukri 09:14, 19 August 2007 (UTC)

Right now it seems like the whole project is in the doldrums. No one has time to be director of the second study so if you are interested please take over. Remember 11:52, 19 August 2007 (UTC)
Any information on the effectiveness of warnings is welcome. Studying the response of vandals who are warned/not warned would be interesting. Recently a user suggested that warnings were a complete waste of time, and it would be more productive just to keep patrolling and reverting. It would also be interesting to see how warnings are used - how many instances of vandalism result in a warning, and which templates are people using? Which are the best to use in what circumstances?
I use uw-bv a lot myself, since it gives the flexibility to block quickly. I find going through the 1-2-3-4-block cycle a waste of time myself, and only use uw-1 when it could be a mistake or good faith edit, and only use 2 for intermediate cases. I only jump to uw-only if it's particularly bad, but I also get annoyed when people use uw-1 when someone plasters obscenities all over the page, and often adjust the warning to something a lot more stern. I warn most of the time.
The project has become fairly inactive, and most of the people who signed up haven't really done anything, but it can easily be kicked into action by anyone that wants to start something. Richard001 06:34, 20 August 2007 (UTC)

New essay section

I've added a new section to the essay Wikipedia:The motivation of a vandal. -- The Anome 11:35, 21 September 2007 (UTC)

Study Idea (School IPs)

I think a study should be done to determine if anonymous edits from School IPs have a net benefit to Wikipedia or not. It seems like a lot of vandalism comes from school IPs and that we might be able to stop it by requiring all school IPs to sign up for an account to edit. However, data would be needed before such a policy could be proposed. Life, Liberty, Property 12:17, 16 October 2007 (UTC)

See above. I've had a brief look at the edits of my own university, and it's quite difficult to classify them into good, bad and evil, but it would definitely be worth looking into. Richard001 05:10, 17 October 2007 (UTC)

A thought

I know the proper procedure is to send a warning to a vandal's talk page after any vandalism event, but has anyone done a study of vandals who do not get warnings sent to their talk pages? What I mean is: is sending countless warnings just feeding them attention, causing them to repeat the vandalism anyway? Just a thought... Ottawa4ever (talk) 03:56, 11 December 2007 (UTC)

Just on a whim, because my interest was piqued by the discussion on Protecting the Feature Article, I have been collecting data for the FAs in this month. I have only collected the time that an edit was made, if the edit was constructive or destructive (Most destructive edits are obviously so. Where it is not obvious, I lean towards AGF even when the edit is incorrect.), the user who made a destructive edit and the time that a destructive edit was corrected. I am less than 20% through the month (2:16 a.m. on 7 December) and have no expectation of collecting any day's article in real time.
What I think I will do is to start over at the middle of the month and try to collect as many of the types of information that I can. Besides capturing the users who made any changes, including bots, I will capture some of the editing history of those who make destructive edits and the consequences to the user for making such edits.
I have thought about this change in approach for a couple of days. It made much more sense to me after a bot reverted a slightly vandalized version to a very seriously vandalized version and, on a different article, two attempts to make an improvement that were both reverted.
I will let you know if I come up with anything interesting, assuming that at the end I am still functioning at a normal level. Of course one could ask legitimately into the normal level of one who intends to examine twenty-four hours of edits on sixteen articles.
JimCubb (talk) 19:39, 25 December 2007 (UTC)
What quantities are you studying? Would this be of help? Voice-of-All 21:16, 25 December 2007 (UTC)
I WILL get back to you on that. I need some time to evaluate it.
JimCubb (talk) 06:32, 28 December 2007 (UTC)

"Compare and contrast"

Have other wikis been contacted about vandalism (and other similar issues - including "confused newbies" and "fingers in a twist") and how such matters are handled: though there will be different profiles of activity for each. Might as well get some consistency/avoid reinventing the wheel where possible. Jackiespeel (talk) 19:23, 10 January 2008 (UTC)

Impact Statistics

I think more analysis should be done on the impact of anon vandalism. Are statistics readily available per article for "the number of editors who have a particular popular article on their watch list" and the "rate at which people are reading this article"?

Since all editors who have an article on their watch list will be reading the vandalism, a measure of the impact on editors can be quantified per vandalism instance as:

(# users with this article on their watch list) * (time to read an instance of vandalism) +
(time to revert the change + time to post a notice on vandal's discussion page).

The impact on Wikipedia can be further quantified per vandalism instance as the number of lost quality edits as:

(time lost among all editors) / (time quality editors require to make a single edit).

Since vandals choose popular articles to vandalise, the impact on editors should be very large, since a large number of editors will have a popular page on their watch list.

The impact on readers can be quantified by:

(average read rate on article) * (average length of time an instance of vandalism goes uncorrected).
(average time and not median time should be used in this calculation)

BradMajors (talk) 15:22, 30 January 2008 (UTC)
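
The three measures above can be sketched in a few lines of code. This is only an illustration of the arithmetic as stated: the field names, the units (seconds and hours), and every number in the example are assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass
class VandalismInstance:
    watchers: int             # users with the article on their watch list
    read_seconds: float       # time for one watcher to read the vandalism
    revert_seconds: float     # time to revert the change
    warn_seconds: float       # time to post a notice on the vandal's talk page
    reads_per_hour: float     # average read rate of the article
    hours_uncorrected: float  # how long the vandalism stayed in place


def editor_time_lost(v: VandalismInstance) -> float:
    """(# watchers) * (time to read) + (time to revert + time to warn), in seconds."""
    return v.watchers * v.read_seconds + v.revert_seconds + v.warn_seconds


def lost_quality_edits(v: VandalismInstance, seconds_per_quality_edit: float) -> float:
    """(time lost among all editors) / (time a quality editor needs for one edit)."""
    return editor_time_lost(v) / seconds_per_quality_edit


def reader_impact(v: VandalismInstance) -> float:
    """(read rate) * (time uncorrected): expected readers who saw the vandalism."""
    return v.reads_per_hour * v.hours_uncorrected


# Hypothetical instance on a popular article.
example = VandalismInstance(watchers=200, read_seconds=5, revert_seconds=30,
                            warn_seconds=60, reads_per_hour=120, hours_uncorrected=0.5)
lost_seconds = editor_time_lost(example)                 # 200*5 + 30 + 60
lost_edits = lost_quality_edits(example, 600)            # assumes 10 minutes per edit
readers_exposed = reader_impact(example)                 # 120 * 0.5
```

When averaging over many instances, the mean time uncorrected would be used for `hours_uncorrected`, in line with the note above about average rather than median time.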

I think you are on to something here. I think the random sampling studies done every November grossly underestimate the vandalism that the average visitor to Wikipedia is likely to see. Why? Because the average user (and average vandal) are both likely to gravitate towards certain articles, namely the top 100 most viewed articles. I propose that it would be more useful to gather a month's worth of statistics from a random sampling of the top 100 (or even top 500) articles, not a random sampling across the entire universe of articles, which includes topics so obscure that vandals themselves don't even know they exist. Mrand T-C 21:05, 31 January 2008 (UTC)
Yes, vandals choose articles where their work will be read by the largest number of people. I would first propose that data be gathered and analysed for one article (although not scientific) until the methodology is worked out. But is the data for the read rate of a particular article even available? If it is available I don't know where to find it. BradMajors (talk) 22:35, 31 January 2008 (UTC)
More or less a rant here... I see your point that vandals target popular pages, which is true, but I feel it necessary to remind people reading this that the quality of an encyclopedia is judged on the accuracy of its articles. Wikipedia's credibility depends on its accuracy in any given article. It is important to know that a scientific article that isn't viewed often can still be targeted for vandalism and not easily corrected, since few readers have the knowledge to notice the vandalism. There are people out there who attack pages on politicians, sports teams, cities, etc. It is naive to think that most vandalism occurs on the most viewed pages; perhaps only the easily correctable vandalism does. The serious damage is done where people aren't looking every day, and that's where Wikipedia's credibility gets hurt the most. Just a thought to keep in mind when discussing where vandals choose to strike and the size of a sample study: I think a universal sampling technique is still preferred over a 'top 100'. Ottawa4ever (talk) 22:44, 13 February 2008 (UTC)
Encyclopaedias are NOT judged based upon the accuracy of the articles. They are judged based upon how often users see accurate versus inaccurate articles (which is a big difference). If vandalism occurs and is left uncorrected in a little read article for say 12 hours, but only one reader sees the error, that is not as bad as if vandalism is left uncorrected for one minute in a popular article which 100 readers read. Until we can measure how frequently users see vandalism, I don't think it is possible to come up with meaningful statistics. BradMajors (talk) 07:39, 14 February 2008 (UTC)

I'm still compelled to re-emphasize my point. In academia Wikipedia is not a credible source; in fact it is common practice to fail a paper which cites Wikipedia. The accuracy of the articles is what Wikipedia is and will be judged upon. I agree that it's next to impossible to get accurate information on how often vandalism is seen, but it's important to understand that vandalism often goes unnoticed by those who are unfamiliar with the concepts of the topic, and this creates a more serious branch of vandalism that Wikipedia is vulnerable to. Often this is left alone in articles that don't receive enough 'hits'. To do any proper study on how many times an average user sees vandalism you still need to consider articles not in the top 100 or so (which will still be viewed, and in my opinion are laced with misinformation designed to mislead a reader) just as much as frequently visited ones. Maybe this is a bit out of place in this discussion, but I think people should be aware of this when talking about just targeting frequently viewed pages. Still, your point is valid that more people will see the vandalism in a larger article, and will likely recognize it, but we need to be aware that some vandalism goes unnoticed for some time. If you want to see an example of this, a month ago I fixed the Jacques Plante article (hockey) to restore his career statistics, which had been deleted 2 years before the fix. Few even noticed, yet in two years that page would have received numerous hits. We need to be aware of this issue too: are people even aware that they are reading vandalism, being misled and taking it as fact? That is serious too. I just think it's important to keep this in the back of the mind when deciding how to build up statistics; it is not entirely directed at this idea, as this is to see how frequently the user sees vandalism. Ottawa4ever (talk) 15:27, 14 February 2008 (UTC)

If a tree falls in a forest and there is no one to hear it does it make a noise? BradMajors (talk) 18:36, 14 February 2008 (UTC)
Yet its absence can be seen... All articles in Wikipedia are just as important as one which registers in a 'top 100' list. Otherwise why have an encyclopedia in the first place, why not a top 100 list? Ottawa4ever (talk) 19:07, 14 February 2008 (UTC)
They are not as important. We have them for comprehensiveness, but there is a scale of importance from those that are vital topics and often viewed to those that are less important and seldom viewed. As Brad says, it is how often vandalism is seen, not how long it remains that is important. Studying the most viewed articles would be a good start, though we also need to include less viewed articles as well. It is just that some articles are more important than others. Studying vandalism in less viewed articles would also be a good research topic, and I think having at least one person watching every article is a goal we should strive for (though I've made absolutely no progress in convincing people of this).
Imagine we based our judgment of Wikipedia as a whole on the average article rating. Most articles would be start or stub class, but what if the majority of the important and most viewed articles were FA class? We would be wrong to conclude that Wikipedia was no good just because most of its articles were undeveloped and unreferenced. Richard001 (talk) 21:16, 14 February 2008 (UTC)
It may be useful to split the topics of statistics gathering and statistics analysis. We can discuss all the different useful statistics which can be gathered, and then with these various statistics we can come to conclusions. Both of the above types of statistics should be gathered; the difference is what conclusions we would draw from them. We currently don't know if a particular instance of vandalism is seen on average by one or one thousand people. BradMajors (talk) 23:30, 14 February 2008 (UTC)
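
The two positions in this thread correspond to two different summary statistics over the same data, which a short sketch may make concrete. Everything here is hypothetical: the function names, the idea of recording a "fraction of time vandalized" per article, and the sample figures are assumptions for illustration only.

```python
from typing import List, Tuple

# Each tuple is (daily_views, fraction_of_time_the_article_was_vandalized).
ArticleStats = Tuple[float, float]


def per_article_share(articles: List[ArticleStats]) -> float:
    """Every article counts equally: average fraction of time spent vandalized."""
    return sum(frac for _, frac in articles) / len(articles)


def per_view_share(articles: List[ArticleStats]) -> float:
    """Every page view counts equally: share of views that land on a vandalized version."""
    total_views = sum(views for views, _ in articles)
    vandalized_views = sum(views * frac for views, frac in articles)
    return vandalized_views / total_views


# Hypothetical mix: one popular, quickly reverted article and three obscure ones,
# one of which carries long-lived vandalism.
articles = [(10000, 0.01), (10, 0.0), (5, 0.20), (1, 0.0)]
article_view = per_article_share(articles)   # (0.01 + 0 + 0.20 + 0) / 4
reader_view = per_view_share(articles)       # (100 + 0 + 1 + 0) / 10016
```

With these made-up figures the per-article number is dominated by the obscure article, while the per-view number is dominated by the popular one, so gathering both statistics keeps either conclusion available.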

Classification

I have made an attempt at classifying the items on the article page. BradMajors (talk) 21:24, 5 February 2008 (UTC)

Article Read Rate

There does not currently seem to be any way to obtain an article's read rate. There is an easy way to get the data by adding already existing third party links and services. Would there be support for trying to get permission from Wikipedia to temporarily use a third party service to get some statistics? Or is there any other way? BradMajors (talk) 23:03, 18 February 2008 (UTC)

It's definitely highly relevant in any case. Without this variable we would have to use some other estimate, like editing frequency, but some articles will draw more edits than others. Richard001 (talk) 04:22, 19 February 2008 (UTC)
These statistics can be obtained from a third party tool here: read rates BradMajors (talk) 11:16, 29 February 2008 (UTC)
A ballpark estimate for the Obama article is that each instance of IP vandalism was seen by 70 readers. The raw data this application is using is available here BradMajors (talk) 11:30, 29 February 2008 (UTC)

Vandalism count

I once saw a webpage that kept count of the number of vandalism acts on Wikipedia. Does anyone know of this webpage? ~QuasiAbstract (talk/contrib) 12:18, 2 April 2008 (UTC)

Essay of potential interest

Salutations. I've written an essay, Wikipedia:Vandalism does not matter, that may be of interest to vandalism students. Your feedback and comments would be most welcome on the essay talkpage. Sincerely, Skomorokh 00:14, 29 July 2008 (UTC)

Hi!

Hi guys, I'm new. Can someone please give me some direction here? I know what this Wikiproject is for, but I don't know where to start helping out. Thanks. Leujohn (talk) 09:26, 26 November 2008 (UTC)

This wikiproject seems to be slowly dying. The best thing you could do if you want to get it revived again is to take charge of the whole project and try to gin up interest in putting together another study that is more scientifically accurate than the past studies. Remember (talk) 14:00, 26 November 2008 (UTC)

Query on this project's necessity

I'm not sure this project is really necessary. Wikipedia has several years of experience of vandalism of its articles; throughout that time, we've developed a very thorough understanding of why vandals disrupt and how we are able to combat vandalism. That understanding has been developed with a complete absence of a formal task force to do so, and I do not think we're in a different situation just now.

Is continuing this project beneficial? Would its members' time not be better spent elsewhere (eg. reverting vandal edits)?

Respectfully, AGK 15:14, 13 December 2008 (UTC)

FYI, based on a conversation on Jimmy Wales's talk page:

Your feedback is appreciated. rootology (C)(T) 19:27, 19 December 2008 (UTC)

Barnstar Discussion

There is a discussion going on here regarding a proposed change to the name of the "RickK Anti-Vandalism Barnstar." As vandalism fighters (at least, I assume you fight it in addition to studying it), I thought some of you might be interested in commenting. Nutiketaiel (talk) 12:12, 28 April 2009 (UTC)

Weirdo vandals

Why is it that so many vandals revert themselves immediately afterwards? -- OlEnglish (Talk) 09:49, 20 May 2009 (UTC)

I think they experiment. They just want to see if it is possible to edit. Mange01 (talk) 16:18, 20 May 2009 (UTC)

Do such people count as vandals rather than "sandboxers on the loose"? As with "dreadful pun"-creators and similar, they can probably be ignored.

To what extent is there a correlation between "rate of general activity to an article", "level of typos/conflated editings and other accidental features", and "level of vandalism to that article"?

Flagged protection and patrolled revisions

As students of vandalism, I hope you have some opinions to offer at Wikipedia:Village pump (proposals)/Archive 55#Hypothesis testing for Flagged protection and patrolled revisions Josh Parris 04:43, 3 December 2009 (UTC)

Anti-vandalism bot census

Hi. You can read more about this here: Wikipedia_talk:Bots#Anti-vandalism_bot_census. All the help is welcome. emijrp (talk) 22:28, 29 November 2010 (UTC)

Merger proposal

I propose that we move it to the Counter-Vandalism Unit, since it's pretty much in the scope of vandalism, as for the CVU. This project is declared semi-active so it should be done what it means. ~~Ebe123~~ (+) talk Contribs 10:26, 27 October 2011 (UTC)

Good idea to merge this in to the CVU. I'll detail more of my reasoning as we move forward with the CVU project revival planning/discussion. AndrewN talk 00:49, 30 October 2011 (UTC)
Support I think this would be a good move for both projects. Just make Vandalism Studies a task force of CVU. ~ Matthewrbowker Say hi! 03:03, 31 October 2011 (UTC)
Support Vandalism studies would benefit greatly as part of the CVU. Frankly, although I'd known about vandalism studies before 5 minutes ago, I hadn't known that there was a project devoted to it. Additional publicity and activity would come, and allow for the two projects to work in conjunction. Marechal Ney (talk) 03:34, 7 November 2011 (UTC)
Support This project and the CVU would greatly benefit each other, and the merge would only strengthen both projects. Tarheel95 (talk) 14:18, 16 November 2011 (UTC)
Support Can be useful. Katarighe (talk) 01:40, 2 December 2011 (UTC)
Support There are advantages to both sides. Oddbodz (talk) 20:34, 6 December 2011 (UTC)
Strong Support This would strengthen both projects. Ramaksoud2000 (talk) 19:11, 23 December 2011 (UTC)
Support. I created this project and I think integrating it into CVU makes sense. Remember (talk) 02:03, 17 December 2011 (UTC)

Help desk question

Someone asked this question and didn't get an answer. Vchimpanzee · talk · contributions · 19:06, 11 January 2012 (UTC)

Merging with CVU

So, how exactly do folks want to go about making this merger? The simplest way would be to move all the Vandalism Studies pages to Counter-Vandalism Unit/Vandalism Studies and then update the CVU homepage with information from your homepage. If there's no opposition, I'm fine making those changes. Achowat (talk) 19:37, 11 April 2012 (UTC)

Merger mostly done; all pages with prefix "Wikipedia:WikiProject Vandalism Studies" have been moved to "Wikipedia:Counter-Vandalism Unit/Vandalism Studies". And VandStudies is now a Division of CVU. I have not added anything on the CVU Main Page, since I'm lobbying for a major reform of the CVU Main Page (see WP:Counter-Vandalism Unit/Sandbox and the discussion of that new structuring). Ideally, if the Main Page proposal goes through, we should look towards creating a unified identity across all 4 major working Divisions. Achowat (talk) 18:25, 16 April 2012 (UTC)