Wikipedia:Bots/Noticeboard/Archive 3

HBC Archive Indexerbot bot flag

HBC Archive Indexerbot (HBCAI) automatically generates indexes of archived discussions. For an example, see Wikipedia talk:What Wikipedia is not/Archive index. The bot's user account is presently not flagged as a bot account. In the approval decision, Mets501 remarked that the bot did not need a flag. However, I'm thinking that should be reexamined. The bot is now generating quite a few indexes (over 100), and that is likely to grow. I first mentioned this to the bot's operator, Krellis, and s/he suggested I raise the question here. So here I am.  :) Thanks for listening! —DragonHawk (talk|hist) 21:29, 31 December 2007 (UTC)

For what it's worth, I don't mind either way, as the current operator/maintainer of the bot - given the number of indexes it's generating, and the fact that it is increasing (slowly but surely) all the time, I'd say it probably does make sense to have a flag, to avoid cluttering recent changes. —Krellis (Talk) 23:47, 31 December 2007 (UTC)
I'm guessing the only reason it was not flagged was because it had a low number of edits and so it wasn't really deemed necessary. Saying that, at the moment we seem to flag anything that is a bot, no matter how low the edit count, so yes it should be flagged. -- maelgwn - talk 01:27, 1 January 2008 (UTC)
Flagged. WjBscribe 17:49, 10 January 2008 (UTC)
Thanks, WJBscribe! —DragonHawk (talk|hist) 19:31, 10 January 2008 (UTC)
Thanks! —Krellis (Talk) 21:05, 10 January 2008 (UTC)

BetacommandBot using wrong tags

What can be done about BetacommandBot using the wrong tag when tagging image files that do not have the required FUR? He is continuing to use the following tag on pages that have no rationale at all. It is confusing to other users.

{{di-disputed fair use rationale|concern=invalid rationale per [[WP:NFCC#10c]] The name of each article in which fair use is claimed for the item, and a separate fair-use rationale for each use of the item, as explained at [[Wikipedia:Non-free use rationale guideline]]. The rationale is presented in clear, plain language, and is relevant to each use.|date=January 2 2008}}

And using the following edit summary: tagging as invalid rationale per WP:NONFREE

See: http://en.wikipedia.org/w/index.php?title=Image:Parque.JPG&diff=prev&oldid=181652274

This is just not the best way to deal with the problem of missing FUR's

Is an edit rate of over 700/minute considered acceptable for a bot? Most of his requests do not indicate any requested edit rate and one I did find when someone asked about it was:

"as all my previous tasks and for the edit limit it will be 10-15, per minute same as with all my tasks. I thought this was known all ready as this is only my 8th bot request that you guys have handled." Betacommand (talk • contribs • Bot) 03:24, 5 May 2007 (UTC)

Though I did find some that were marked Max delay =5 Maxlag=5

And of course, it is nearly impossible to tell anything from his bot user page as it is only a link of a bunch of request for approval pages that do not clearly match up with the current edits he has been doing. He seems to be of the belief that if another bot has been approved to do something, he can also do it without requesting permission or even bothering to identify such on his user page.

But then again, I guess BetacommandBot is considered a super bot that does not have to follow the rules that apply to more lowly bots. Dbiel (Talk) 03:02, 10 January 2008 (UTC)

700 edits per minute? That is a super bot! BJTalk 03:05, 10 January 2008 (UTC)
Dbiel, I follow maxlag, which is not max delay. There is a major difference. I know our bot policy better than most. I also know what the servers can take, how to operate with them and prevent my bot from causing problems. I use a setting of maxlag=5, which is better on the servers than any hard-coded edits-per-minute rate. When the servers have a high load, BCBot does not edit. So unless you know what you're talking about, please stop trolling and shut up. βcommand 04:20, 10 January 2008 (UTC)
Sorry about the typo and I have no doubt that you fully understand the policy better than most, but my point is that you push the policy to extremes rather than setting the best example of following it. Granted, MaxLag is better, but that is not what you requested in the majority of your requests and that is not what you have stated on your user page. You get an approval for one task and then apply that approval to other tasks and fail even to note that fact on your user page or request that the change apply to your other tasks as well. You call me a troll, fine, but you also thanked me for pointing out a bug in your bot. Then as you have previously stated, you follow the WP:IAR yet at the same time enjoy enforcing the letter of the rules on images as related to FUR's while tagging them with misleading tags. Instead of tagging Images with no FUR with {{nrd}} as stated in your request, you use the misleading tag indicated above. Just another case of saying one thing and doing something else. Dbiel (Talk) 15:35, 10 January 2008 (UTC)
When I filed the previous request, maxlag did not exist. Instead of filing a new request just to enable a better system of doing the same thing, I used IAR, which has its uses and limitations. I do not state any kind of edit rate on the bot user page. I was approved for tagging images with no valid rationale and that is all the bot does. βcommand 16:39, 10 January 2008 (UTC)
Interesting, you say that you do not state any kind of edit rate on the bot user page, yet you claim you state what the bot is doing on your user page as required by the guidelines; yet the only reference to the bot functions is a link labeled "tasks" which is only a list of bot approval requests that do state specific edit rates. So if one is to say that the link meets the requirements of identifying what the bot is doing on the bot user page (which I do not believe that it does), then it must be said that the edit rates are also identified on the bot user page. You can't have it both ways. Dbiel (Talk) 19:56, 10 January 2008 (UTC)
Those BRFA's give EPM rates and maxlag=5 as edit rates. Bot policy does not require you to give the rate that you plan on editing. βcommand 22:25, 10 January 2008 (UTC)
My bot also runs at a different edit rate than I requested in the BRfA and I don't list it on my bot's talk page. Are you honestly suggesting that every change made to approved bots needs a new BRfA? BJTalk 23:11, 10 January 2008 (UTC)
No, I am not suggesting that every change needs a new BRfA. But I am suggesting that the basic policies as found at the Wikipedia:Bot policy be followed. And yes, changes to the bot should be reflected on the bot's user page. Why link to a bot approval page as the only record of what the bot is doing if it is no longer accurate? If you want an example of what I would expect to see, take a look at User:Polbot.

<blockquote>

The bot account's user page should identify the bot as such using the {{bot}} tag. The following information should be provided on, or linked from, both the bot account's userpage and the approval request:

  • Details of the bot's task, or tasks
  • Whether the bot is manually assisted, or runs automatically
  • When it operates (continuously, intermittently, or at specified intervals), and at what rate
  • The language and/or program that it is running

</blockquote>

If not, why not simply change the policy to say that it only applies to new bots, and that previously approved bots can make changes at will without any notice or record of what changes have been made? Dbiel (Talk) 23:48, 10 January 2008 (UTC)
I don't see any rate on that page. BJTalk 00:27, 11 January 2008 (UTC)
I am sorry that I did not make it clear enough see User:Polbot/older tasks The single task on the main page is missing the edit rate but it is also hung up in BAG. It does include the other key elements. Also please note that the other tasks were on the main page until a short time ago.Dbiel (Talk) 03:11, 11 January 2008 (UTC)
Dbiel, like I said, stop trolling. "When it operates" is not its edit rate. I cover that on the user page: "BetacommandBot will be an ongoing bot, run whenever I can run it, or feel like running it." Like I said before, you don't understand the bot policy, so don't attempt to force it on others unless you know it. βcommand 02:06, 11 January 2008 (UTC)
Apparently I continue to fail to communicate in a way that you can understand what I am trying to say. The edit rate was not the issue. The last point I was making was simply in reply to your statement that "it is not on my user page" while also saying that the link to the approval requests is all that is needed for a user to understand what your bot is doing and that it meets the requirement that the information is linked to the user page, in which case the edit rate IS on your user page due to the link. But again, this is not the main point. The main point is that no one can look at your user page and figure out what the bot is doing. But it seems that no one else cares, so I will drop the issue.

On a separate point, I still think it is wrong for you to put a 7 day delete tag on images that are used properly under fair use rules, but lack the technically required FUR, which is then followed up by an administrator who blindly deletes the images based solely on your tag. For an example, see Image:Parque.JPG, where an admin made deletions on 70+ pages in less than one minute using TW, or Image:NforDisco.jpg, where the deletion record reads 17:48, January 10, 2008 East718 (Talk | contribs) deleted "Image:NforDisco.jpg" (CSD I7: Bad justification given for fair use and the uploader was notified more than 48 hours ago) where the image was uploaded over a year ago and the last edit to the page using it was on Nov 11, 2007. A 7 day notice posted on the article page just is not enough. Since nobody seems to care, I guess I might as well give up. The result is going to be that a lot of pages that are using fair use images correctly will find their images deleted. Which is a shame, as it is not that hard to generate the required FUR, but it is time-consuming. I am slowly reverting all of your Polbot reverts that you marked as vandalism, as so far I have not found one reverted on any valid grounds other than that it was made by an unauthorized bot. Dbiel (Talk) 03:11, 11 January 2008 (UTC)
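
For readers unfamiliar with the maxlag setting debated in this thread: the MediaWiki API accepts a maxlag parameter, and when server lag exceeds that value it refuses the request with an error whose code is "maxlag", so a client is expected to wait and retry rather than edit at a fixed rate. The following is only a rough sketch of that behaviour, not BetacommandBot's code. The send callable, the function name, and the retry counts are hypothetical stand-ins so the example runs without a network connection.

```python
import time


def call_with_maxlag(send, params, maxlag=5, max_retries=3,
                     wait_seconds=5, sleep=time.sleep):
    """Send an API request with maxlag set, backing off while servers lag.

    `send` is a hypothetical callable that takes the request parameters and
    returns the decoded API response as a dict.
    """
    request = dict(params, maxlag=maxlag)
    for attempt in range(max_retries + 1):
        response = send(request)
        error = response.get("error", {})
        if error.get("code") != "maxlag":
            # Either success or an unrelated error: hand it to the caller.
            return response
        if attempt < max_retries:
            # Servers are lagged: do not edit, wait and try again.
            sleep(wait_seconds)
    raise RuntimeError("servers still lagged after %d retries" % max_retries)
```

The point of the design is that the editing rate adapts to server load: under light load nothing slows the client down, and under heavy load it stops editing altogether.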

Semi-automated tagging of Shared IP Addresses

Betacommand has requested that I post for community discussion about a bot that I just proposed, called IPTaggerBot. If you are interested in commenting on the subject, please review the bot approval request at Wikipedia:Bots/Requests for approval/IPTaggerBot as well as the bot's userpage at User:IPTaggerBot. Thank you. Ioeth (talk contribs friendly) 17:02, 24 January 2008 (UTC)

Bot tag

Perusing the rollback logs, I see User:VoABot II has rollback but not a Bot membership and User:VoABot III has nothing. Shouldn't they both have the bot flag to keep their edits from cluttering recent changes? MBisanz talk 04:11, 27 January 2008 (UTC)

Anti-vandalism bots are not normally flagged. βcommand 05:14, 27 January 2008 (UTC)

Please block PipepBot as out-of-control bot

PipepBot (talk · contribs) is broken and is removing lots of valid interlanguage links, e.g. [1] [2][3][4][5][6] [7][8][9][10] (there are many more examples). It is also moving existing interlanguage links around (out of alphabetic order) for no good reason, e.g. [11]. This is causing disruption. The bot owner has been notified of these concerns [2], but I am suggesting a temporary block to prevent the bot causing further unnecessary disruption. - Neparis (talk) 19:09, 27 January 2008 (UTC)

I think that you should ask at WP:AN/I for a temporary block, not here. NicDumZ ~ 19:14, 27 January 2008 (UTC)
Actually, I copied your message over there. NicDumZ ~ 19:21, 27 January 2008 (UTC)
Thanks, hope it gets blocked soon. - Neparis (talk) 19:24, 27 January 2008 (UTC)
(cross-posted from WP:ANI [12]) The bot is still operating across other wikis, e.g. fr.wiki, de.wiki, it.wiki (probably more wikis too). It is removing valid interlanguage links there too. I presume it cannot be blocked by admins on en-wiki. Is there a central cross-wiki noticeboard for reporting a bot that is misbehaving across multiple wikis? (rather than making multiple reports to different wikis) - Neparis (talk) 19:40, 27 January 2008 (UTC)
The bot is very active and very out of control on multiple wikis. Is anybody around to give advice on how to handle this? - Neparis (talk) 20:02, 27 January 2008 (UTC)
Ok, this bot is not out of control. The user is fixing interwiki conflicts. Please unblock this bot. Nothing wrong with these edits:
Looks like Neparis owes someone an apology - multichill (talk) 23:11, 27 January 2008 (UTC)
well, as far as I know, fr:Ville is the translation of City, even if it is also the meaning of Town. Interwiki.py usually don't remove "controversial" interwikis like these, unless the -force option is activated. It should not. NicDumZ ~ 23:16, 27 January 2008 (UTC)
City/town is a mess. Probably unfixable in the current interwiki system. But you're wrong about the -force option. I happen to run an interwiki bot myself and i never use the -force option. I do however fix interwiki conflicts every once in a while. This means i pick a page and run the bot without the -autonomous option (and without -force option). Bot asks me a lot of questions and in the end adds and removes a lot of links. Looks like Pipet did the same. multichill (talk) 23:23, 27 January 2008 (UTC)
I also run a bot by myself... I know how the script works. And I don't think he was online, because he hasn't answered on his talkpage... NicDumZ ~ 23:34, 27 January 2008 (UTC)
(this hopping back and forth between WP:ANI and WP:BON is confusing :-); I just replied on ANI — can we stay over there for followups please?):
Well, the ones that really caught my eye were the interlanguage link removals for dioxin.[23] I just reviewed them again and at least some of them still look like they might be considered at least somewhat controversial link removals. I could be wrong about it, but some wikis (e.g. Danish) seem to me to have an article on dioxin, but not yet an article on polychlorinated dioxins, which is a specific type of dioxin. In such a case, having interlanguage links to dioxin, as the general term, seems quite useful to me. User:Blech from de-wiki has told the bot owner that most of the interwiki links that the bot removed were correct and that he has reverted the bot.[2] I have not checked any of the other examples in detail, but I had a quick look at one of them — the aerosol link removals.[24] Particulates are a cause of aerosols, and, though I may well be wrong about it, some wikis (e.g. French) seem to have an article on the latter but not the former, so, in such a case, having the interlanguage links, e.g. to fr:Aérosol, seems quite useful to me. I am acting in good faith here, and if I have made a mistake I will certainly say sorry to the bot owner. Please let me know your thoughts — I can take a wikitrout or two. - Neparis (talk) 00:26, 28 January 2008 (UTC)
Hallo Neparis, I am the operator of PipepBot. Doing my edits I follow the policy in Help:Interlanguage links#Bots and links to and from a section, which says "The activity of the bots also requires that interlanguage links are only put from an article to an article covering the same subject, not more and not less." It is not always easy to find the right interlinks, but until now PipepBot has done thousands of edits in en:wikipedia and many thousands in other wikipedias, most of them in manual mode, and I never had problems. Of course it is possible that some edits are not optimal, but I am open for discussion. --Pipep (talk) 20:44, 28 January 2008 (UTC)
NicDumZ, with bot I meant interwiki.py. You don't seem to understand that. How can a bot fix an interwiki conflict without human intervention? It can't, a user has to be present. The -force option only removes links to non-existent pages, -autonomous just skips them, and in normal mode the bot asks you if you're sure you want to remove a link. So please stop about the -force option.
Neparis, you can't have interwikis to multiple articles in the same language, that's an interwiki conflict. You have to keep the sets separated. So en:polychlorinated dioxins -> <some language>:dioxin -> en:dioxin is an interwiki conflict. You have to create two sets. One set with articles with a link to dioxin and one set with articles with a link to polychlorinated dioxins. One link between these two sets and you get a conflict. multichill (talk) 21:11, 28 January 2008 (UTC)
Multichill, please, stop this. I know how pywikipedia works. I submit patches over there, and I'm also a regular user of interwiki.py, whatever you may think.
My point was : a bot owner should check his talkpage, whatever happens. He was obviously not, and his bot made controversial changes. I moved the request to AN/I to request a protective temporary block. That's it. Period.
NicDumZ ~ 22:28, 28 January 2008 (UTC)
I was obviously present while solving interwiki conflicts. It is impossible to solve interwiki conflicts without being present, even if I had enabled the option -force. At the time the bot was blocked, the bot was offline and me too. --Pipep (talk) 19:17, 29 January 2008 (UTC)
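
The interwiki conflict multichill describes above (two articles in the same language ending up in one linked set) can be illustrated with a small sketch. This is not how interwiki.py is implemented; it is a simplified model that assumes links can be treated as undirected and that pages are (language, title) pairs. The function name and data layout are hypothetical.

```python
from collections import defaultdict, deque


def find_interwiki_conflicts(links, start):
    """Return languages that have more than one page in start's linked set.

    `links` maps a (lang, title) page to the pages it links to. Links are
    followed in both directions, which is a simplifying assumption.
    """
    graph = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            graph[page].add(target)
            graph[target].add(page)

    # Breadth-first walk to collect every page reachable from `start`.
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for neighbour in graph[page]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    by_lang = defaultdict(set)
    for lang, title in seen:
        by_lang[lang].add(title)
    # A language with two or more titles in one set is a conflict.
    return {lang: sorted(titles)
            for lang, titles in by_lang.items() if len(titles) > 1}


links = {
    ("en", "Polychlorinated dioxins"): [("da", "Dioxin")],
    ("da", "Dioxin"): [("en", "Dioxin")],
}
conflicts = find_interwiki_conflicts(links, ("en", "Polychlorinated dioxins"))
```

In this example the single Danish link joins the two English articles into one set, so "en" is reported with both titles. Deciding which link to remove still needs a human, as noted in the thread.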

Cleaning up the bot status page.

Would anyone mind if I cleaned up Wikipedia:Bots/Status? Three things I'd like to do, for easier readability:

  • Archive the bots on that lists which are currently discontinued
  • Add the RFBA link, since several of them don't have it
  • Expand the "purpose" section, since some of them are empty or have ambiguous descriptions like "Various tasks"

  Zenwhat (talk) 06:13, 29 January 2008 (UTC)

As an example of why this needs to be done: User:Polbot was approved to do minor tasks, like wikifying data for U.S. politicians [25] and removing piped linking in disambig pages [26]. In practice, the bot had been automatically generating "fair use rationales" for tons and tons of images, something it hadn't gotten approval for. When this was discovered, the bot was temporarily blocked. [27][28]

Now, let's say hypothetically this user decided to act in bad-faith. They aren't, I don't think -- they seem like a good user, but it is still possible.

A newbie happens to see this bot making changes to fair use rationales, wondering, "How strange! I wonder if this bot is approved!" They go to the bot status page and it says it's active and approved for "various -- see bot's user page." But then, if you look at the bot's userpage, the bot is no longer active at the moment and it hasn't been approved for all the stuff it's been doing. Records like this should be kept in good order.

One might also check to see if the list matches up with the list of users flagged as bots [29] and with Category:Wikipedia bots. Wikipedia:Registered bots should also have a link to Wikipedia:Bots/Status to avoid confusion like this. [30] Almost a year ago, Betacommand said he'd merge the two pages. [31]

  Zenwhat (talk) 06:31, 29 January 2008 (UTC)

I totally support record cleanup :) -- SatyrTN (talk / contribs) 05:29, 31 January 2008 (UTC)

Free bandwidth.

I have a pretty decent connection on broadband and I have two computers (a little wannabe\mini-workstation). If any bot-owners here need somebody to run a bot for them, I'd be willing to do it either my main PC (Windows Vista) or my secondary PC (Ubuntu Linux, but I can wipe it clean and install any version of Linux you like, so long as you can walk me through setting up the KDE/GNOME GUI). Preferably, I'd want it to be on my secondary PC, the Linux box, since I rarely use it.   Zenwhat (talk) 23:26, 30 January 2008 (UTC)

what are the system stats? βcommand 23:36, 30 January 2008 (UTC)

Main PC (32-bit Windows Vista)

  • CPU: AMD Athlon 64, 3400+, 2.20 GHz
  • RAM: 3 GB
  • HDs: Dual HDs (Master 150 GB, Slave 75 GB), each currently with about 4 GBs of free space (but I could easily free up a lot more if you need it)

Secondary PC (Ubuntu Linux)

  • RAM: 775 MB
  • CPU: Intel Celeron, 1.70 GHz
  • HDs: Single HD, 30 GB, 22 GB of free space.

Speedtest from Speedtest.net:

  • DL: 14076 kb/s
  • UL: 1537 kb/s

Also, I'm mostly pretty good about keeping my computers clean of spyware and viruses, not the typical end-user "omfg teh bad man haxed me, help me AOL tech supp!!"   Zenwhat (talk) 01:52, 31 January 2008 (UTC)

I'm sorry but that is not enough RAM for my bots. βcommand 01:56, 31 January 2008 (UTC)
Beta needs more pickles. BJTalk 02:45, 31 January 2008 (UTC)
(EC)Betacommand, would your bot benefit from a clustered environment? I'm still waiting on a few parts to come in, to complete it, and, I need to get off my butt, and finish setting it up, but, I should have my OpenSSI cluster up and running soon. I would suppose, that the biggest bottleneck will be my uplink (I've only got 1mbps up x 10mbps down or so), but, if you've got a task that'll need more CPU than BW, I might could help with that soon. It'd have to be a threaded app, however, to take advantage of it (depending on how the hardware works, it should be between 12 and 48 machines, all P3-600's, w/256Mb ram. Due to power costs however, I don't plan to run it 24x7) SQLQuery me! 02:54, 31 January 2008 (UTC)

3 GB is not enough RAM? How much do you generally use?   Zenwhat (talk) 02:54, 31 January 2008 (UTC)

On a Vista box 3 GB RAM usually means about 1536 MB of usable RAM. Toss in the user doing anything else and that cuts down even more, and then with only a 2.2GHz processor.... I'll take my 8GB RAM Linux toolserver any day. βcommand 03:43, 31 January 2008 (UTC)
If I had to guess, I'd assume the vista box would be less than usable for BCBot (is py available on win32/64? I haven't used windows in a long long time) SQLQuery me! 02:56, 31 January 2008 (UTC)
Yes, Python is available for everything, even toasters! BJTalk 02:59, 31 January 2008 (UTC)
Heh, learn something new every day :) My experience with *nix tools in a windows environment has always been "It works... Technically" :P But, my last real windows encounter was with a Windows 2000 machine :) SQLQuery me! 03:02, 31 January 2008 (UTC)
Python is supported well, unlike most other unix tools. BJTalk 03:08, 31 January 2008 (UTC)

BAG confirmation running

Just a quick note that there's a WP:BAG confirmation (from the trial membership) of myself at Wikipedia talk:Bots/Approvals group#Confirmation under the old system (Snowolf). As has been required in the past, I'm posting this notice on WP:AN, WP:BOWN, WP:BRFA & WP:VP. Snowolf How can I help? 15:54, 1 February 2008 (UTC) Changed link based on change to that page Martinp23

Similarly there is a confirmation running for Cobi (talk · contribs) at the same location. Martinp23 18:28, 1 February 2008 (UTC)
As well as Dreamafter (talk · contribs). ~ Dreamy § 21:23, 1 February 2008 (UTC)

Bot to track bots?

I came across another unapproved bot, who has been uploading obscure European athletes to Wikipedia. [32] This is the problem with having bad records of bots and folks don't seem to be keeping very good track of it, which is a very bad thing, considering how much damage bots can do and how difficult it is to remove.

So, here's an idea for a bot:

  • Check a random user's contribs.
    • If they are uploading stuff at the rate that bots do (1 contribution a minute or more, for several hours at a time) check their name on the bot-status page.
    • If it isn't there, send them a warning.
  • Repeat the same process all over again

Anybody willing and able to do this would be appreciated and it seems to be of critical importance.   Zenwhat (talk) 00:39, 2 February 2008 (UTC)

Do you mean more than 1 edit per minute or more than 1 new article. I can envision many users in the sciences who upload an article a minute for long periods of time (Blofeld is another example) and there are probably many more users with AWB or other scripts adding tags and cats and project boxes to articles at a similar rate. How would we screen them out? white list maybe? MBisanz talk 01:54, 2 February 2008 (UTC)
If an editor is a bot, how would sending them an automated talk page message do anything? High-speed editing isn't always a bad thing, and it doesn't imply that the user is a bot. Some semi-automated scripts can make a couple of edits a minute. GracenotesT § 01:57, 2 February 2008 (UTC)

Yeah, MBisanz, a whitelist. Or you could tighten up the algorithm even further, to make it multiple edits a minute.

Keep in mind, I'm talking about 1 edit a minute for several hours straight. We're human beings, so that shouldn't be possible, unless Blofeld is being fed by intravenous fluid and using a bedpan or empty bottle to pee. Gracenotes: In the case above, the user seemed to be using his regular account as a bot and that's what a fair amount of users do. Even if an editor is using a secondary account as a bot, he should see the talkpage message after the bot is blocked.

Also, the diff above was incorrect!

I accidentally posted the message on the wrong talkpage, rofl. Not even the right user's talkpage, but Talk:Bitburg.

I changed the diff above to this one. [33] I also removed my comments from Talk:Bitburg.

It's User:Markussep. Look at his contribs.   Zenwhat (talk) 05:40, 2 February 2008 (UTC)

Did WP:BOT change recently? I seem to remember at some point that any user could run a bot on their own account, so long as it wasn't disruptive (flooding, etc) and that they took personal responsibility for its edits and couldn't have them excluded from RC via the BOT flag. MBisanz talk 05:49, 2 February 2008 (UTC)
Also on the issue at hand, I'd say it should track and output the names to some sort of noticeboard/editabuse page, but not warn the user. Given the possibility of false positives, a human editor should review/investigate. MBisanz talk 05:55, 2 February 2008 (UTC)

MBisanz, WP:BOT says all bots must be approved. Also, about your suggestion: Your idea isn't exclusive to mine. The algorithm could do both. Obviously, if somebody is editing at a rate of 1 edit a minute for weeks on end, they're using a bot. The specific issues of when and how to warn the user or post it to a noticeboard would probably best be worked out when the bot is tested.   Zenwhat (talk) 08:49, 2 February 2008 (UTC)

WP:IAR has been used, with community approval, for behavior that's against the word of WP:BOT. The only reason the rate limitation is in place is to make sure someone doesn't cause too much damage – and if they do, it's their responsibility to fix it. So long as a task is desired and implemented correctly, it should be fine for someone to run it as an automated task at a reasonable rate (although semi-automated is better, imho). A bot to track who reverted more than three times on one page in an interval of 24 hours was recently on BRFA, but did not pass; there were concerns admins reviewing a situation and jumping in the middle of it might make an incorrect assumption about the reverter and make an inappropriate block. A noticeboard for semi-automated tasks might be useful for preventing abuse, since someone wishing to run such a task can simply say "I plan on doing so-and-so at so-and-so a time", and if no one objects, he/she could run it.
To make these edits (warning: long page), I did not use a bot; in fact, I can show you screenshots of the semi-automated software I used (the screenshots were created when I was outright accused of using a bot, several weeks after the incident). The only interaction I received from the incident was a minor barnstar, initially given to someone else who was using the same script. If this "bot"-tracking bot existed, would I instead be warned? I dunno... GracenotesT § 17:44, 2 February 2008 (UTC)
When I revert vandalism, I'm a lot faster. I dislike this whole idea. If you want, you can take the dumps and analyse them, as it has been done from time to time (I remember dragon flight's report on adminbot of some time ago). I don't see any use in it, unauthorized bots are usually blocked only if they are disruptive or controversial. Usually, when somebody is running a bot on his main account, a polite suggestion may be used. Snowolf How can I help? 18:32, 2 February 2008 (UTC)
I think a non-bot analyzer script(?) might be more effective than a bot. A smart vandal will learn what timings this bot operates on (say an edit a minute for more than 6 hours) and just change their program to edit only for 5 hours straight and then take an hour off. On the other hand, an analyzer could be run on various filters that would pick up a larger variety of forbidden bots. MBisanz talk 18:52, 2 February 2008 (UTC)
This is the sort of example of an unapproved bot that has community backing and may or may not be picked up by an automated checker User_talk:Misza13/Archives/2007/02#admin_actions_bot (and the weird WP:BOT reference I had stuck in my head). I'd say a human user would need to review all warnings, since there is so much variety in scripts, unapproved bots with community consensus, fast editors, smart vandals, etc. MBisanz talk 18:59, 2 February 2008 (UTC)
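
The heuristic proposed at the top of this thread, combined with the whitelist and human-review suggestions from the replies, might be sketched roughly as follows. This is an offline analyzer over a list of edit timestamps, not an approved bot. The function names, the one-minute gap, the three-hour threshold, and the data layout are all assumptions chosen for illustration, and the output is meant as a list for a human to review, not for automatic warnings.

```python
from datetime import datetime, timedelta


def longest_sustained_run(timestamps, max_gap=timedelta(minutes=1)):
    """Length of the longest stretch with no gap between edits over max_gap."""
    if not timestamps:
        return timedelta(0)
    ordered = sorted(timestamps)
    best = timedelta(0)
    run_start = ordered[0]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            run_start = current  # the run is broken, start a new one
        best = max(best, current - run_start)
    return best


def users_for_review(contribs, registered_bots, whitelist=(),
                     min_duration=timedelta(hours=3)):
    """Names with bot-like sustained editing that are not known bots.

    `contribs` maps a user name to a list of datetime edit timestamps.
    """
    flagged = []
    for user, timestamps in contribs.items():
        if user in registered_bots or user in whitelist:
            continue
        if longest_sustained_run(timestamps) >= min_duration:
            flagged.append(user)
    return sorted(flagged)


start = datetime(2008, 2, 2, 0, 0)
contribs = {
    "SteadyEditor": [start + timedelta(minutes=i) for i in range(241)],
    "KnownBot": [start + timedelta(minutes=i) for i in range(241)],
    "CasualEditor": [start, start + timedelta(hours=2)],
}
review_list = users_for_review(contribs, registered_bots={"KnownBot"})
```

As MBisanz points out above, a fixed threshold like this is easy to evade and will also catch fast human or semi-automated editors, which is why the sketch only produces names for review.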

Who's been doing this?

Resolved

See this. [34] Did anyone approve this? And if they didn't, who is doing this?

Now, again, it's clear why it's so critical to have very neat records of bots that are updated and regularly checked for inaccuracies, and that such inaccuracies are investigated.   Zenwhat (talk) 11:06, 4 February 2008 (UTC)

That has nothing to do with bot rights; anyone could run a script under their main account to make 2000 edits. The bot flag is irrelevant. — Carl (CBM · talk) 14:46, 4 February 2008 (UTC)
FYI, this is being discussed on Wikipedia:Administrators' noticeboard#Adding useless revisions to pages to make them undeletable.--Dycedarg ж 20:39, 4 February 2008 (UTC)
Tim has already acted by blocking BCBot. Snowolf How can I help? 21:08, 4 February 2008 (UTC)

BAG or the bureaucrats should probably issue a formal warning to Betacommand - it seems the ArbCom case hasn't taught him not to mess with unapproved bots. 1200 edits to even a userspace page is a big enough deal that it should be approved beforehand. There is probably a topic on this at WT:BAG, so this can be a cross-posting. Go read, I suppose. --uǝʌǝsʎʇɹnoɟʇs(st47) 20:59, 6 February 2008 (UTC)

Resolved

I've blocked it indefinitely for the time being, pending discussion and resolution.

It should not be making edits such as this [35] - Nothing more than removing whitespace. [36] is bad enough; however, if it is going to do that, it shouldn't be doing just that as an edit alone (it makes no difference to the template/categorisation).

Reedy Boy 18:00, 14 February 2008 (UTC)

Fully agree. βcommand 18:05, 14 February 2008 (UTC)
I'm sure you've checked with the owner, right? My bot has often done that when I'm in the middle of testing. Usually to no more than 10 articles, but it has happened :) -- SatyrTN (talk / contribs) 18:31, 14 February 2008 (UTC)
Per User_talk:Alanbly#User:BoxCrawler, it appears to be known and have been an issue for at least 3 days. MBisanz talk 18:42, 14 February 2008 (UTC)
Geez - when my bot's making stupid mistakes like that, I don't let it run three hours, much less three days! Thanks for checking! :) -- SatyrTN (talk / contribs) 18:47, 14 February 2008 (UTC)
Yeah, the owners page was a good confirmation of my action. Reedy Boy 19:18, 14 February 2008 (UTC)
OK, I'm fine with toning the bot down. What is enough for an edit? How incorrect does the format have to be? The bot works the way I designed it; I just didn't think it was this much of an issue. I don't know of any policy against bots cleaning up whitespace and reformatting a template, so I went for perfect formatting. I'm perfectly fine with changing the bot, I just need some guidance. Adam McCormick (talk) 00:57, 15 February 2008 (UTC)
From WP:AWB's instructions: "Avoid making insignificant or inconsequential edits such as only adding or removing some white space, moving a stub tag, converting some HTML to Unicode, removing underscores from links (unless they are bad links), or something equally trivial." Basically, in my hand editing, there needs to be at least a spelling correction or major reformat of text to make it worth it to save. Things like a reflist substitution or spacing or heading character spacing aren't significant enough to warrant an edit. MBisanz talk 01:09, 15 February 2008 (UTC)
Ok, I'm not running AWB so it's not where I was looking. My bot is completely automated once I start it. These are the kind of edits I would make myself if I ran across them, so I didn't see a problem. I still need to know how much of an edit. Does fixing internal spacing count? Capitalization of inputs? Changing to avoid redirect? Removing duplicate parameters? I guess I'm just asking if there is a line in the sand I can draw so I can fix this. Adam McCormick (talk) 01:16, 15 February 2008 (UTC)
Duplicate parameters, yes. Changing to avoid redirects I would say is ok. Where it is just "capitalising" parameters, or removing/adding a line of whitespace, that is where the problem lies. Reedy Boy 18:31, 15 February 2008 (UTC)
I think a bot shouldn't edit to avoid redirects at all, per Wikipedia:Redirect#Do not change links to redirects that are not broken. -- Jitse Niesen (talk) 19:53, 15 February 2008 (UTC)
And respond to inquiries about the bot. I asked about this on 12 February and got no reply. Gimmetrow 20:38, 15 February 2008 (UTC)
I apologize that I missed your comment. I didn't notice your signature and lumped it with the comment below it. Adam McCormick (talk) 23:38, 15 February 2008 (UTC)

(unindent) Ok, I have made edits to the bot so that it disregards spaces and caps. Is it OK if it changes these things any time it does make an edit (for other reasons)? Is there anything else, or can my bot start running? Adam McCormick (talk) 23:52, 15 February 2008 (UTC)

Yes, if the bot identifies a genuinely useful edit, it's OK to combine other edits. Gimmetrow 23:57, 15 February 2008 (UTC)
Alright, I've changed the bot to correct all this, can it be unblocked now? Adam McCormick (talk) 00:22, 17 February 2008 (UTC)
You still need to have a WP:BRFA, as I understand this, this is an unapproved fully automated bot. It must be approved before it is unblocked --Chris 00:36, 17 February 2008 (UTC)
Ok I've seen the original brfa now, but you will still need approval if the edits aren't combined with adding the infobox or whatever it does --Chris 00:41, 17 February 2008 (UTC)
The bot was approved to edit the {{WPSchools}} template. That's all I'm doing. The other edits support this activity either by cleaning up after former bot errors or by making the template easier to read (the bot only edits templates with certain factors). They are not drastically different from its initial request. Adam McCormick (talk) 01:02, 17 February 2008 (UTC)
Please unblock my bot or give me reason why not. I have complied with this discussion and would like to complete the run (with changes in place of course). Adam McCormick (talk) 00:03, 19 February 2008 (UTC)
It looks like they want you to do another BRFA. It's important to do so because it details EXACTLY what tasks the bot will perform. (Just my 2 cents) Compwhiz II(Talk)(Contribs) 02:17, 19 February 2008 (UTC)
I could do that, but it would consist mostly of reposting the original. I haven't changed what this bot does (or have reverted all changes), so it shouldn't be this big a deal to get the bot unblocked. Adam McCormick (talk) 02:21, 19 February 2008 (UTC)
Please, unblock my bot. Adam McCormick (talk) 03:09, 22 February 2008 (UTC)
Unblocked, seeing as no one has brought this to BAG and the issues were minor. -- SatyrTN (talk / contribs) 05:44, 22 February 2008 (UTC)
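As an illustration of the line drawn in this thread (whitespace-only and capitalisation-only changes are not worth an edit on their own, but may ride along with a substantive change), here is a minimal sketch in Python. It is not BoxCrawler's actual code; the function names and the crude normalisation are assumptions made for the example.

```python
import re


def normalize(wikitext: str) -> str:
    """Crude normalisation (assumption): drop all whitespace, ignore case."""
    return re.sub(r"\s+", "", wikitext).lower()


def is_trivial_edit(old: str, new: str) -> bool:
    """True if the texts differ only in whitespace and/or capitalisation."""
    return old != new and normalize(old) == normalize(new)


def should_save(old: str, new: str) -> bool:
    """Save only when something beyond spacing and caps has changed.

    Cosmetic fixes may still be present in `new`; they are simply not
    enough to justify an edit when nothing else changed.
    """
    return old != new and not is_trivial_edit(old, new)
```

For example, `should_save("{{WPSchools| class = Stub }}", "{{WPSchools|class=stub}}")` is false, while adding a parameter makes it true. Lower-casing the whole page is deliberately blunt; a real bot would probably compare only the template parameters it edits.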

Betacommandbot

Regarding the Feb 13-14 spree of around 20,000 articles tagged for speedy deletion, I have attempted to initiate conversation with Betacommand over possible glitches in his bot. It seems that it often fails to notify either the image uploader or the article talk page. All those images will be up for deletion tomorrow, and I'm certain that some "helpful" admins will burn through the lot in no time. I think that it's out of order that images can be deleted in this manner. I attempted to discuss the issue with Betacommand here, but with this edit he moved the conversation to an obscure location on his talk page, where his responses were "that page is full of lies and bullshit and is a complete farce and it will not affect how I operate BCBot" and "as for the other images uploaded by English peasant, that was caused by a user re-name while the bot was running." I don't feel that this adequately addresses my concerns.

Also, the state in which he leaves inactive users' talk pages is disgraceful [37]: 532,500k and growing, and the vast majority of the messages concern images that are only lacking a backlink to the article.

The vindictive attitude towards a vocal critic here and here is also way out of line. English peasant 11:51, 18 February 2008 (UTC)

Deletion bots and transparency

Please read Wikipedia:Bot_requests#Image_deletion_bot. BetaCommand is claiming that deletion bots are being run de facto covertly. If this is truly the case, I encourage anyone doing this to come clean. It's entirely unacceptable, not to mention entirely unnecessary. The unease of the community with admin bots is in large part due to the abuse of bot operators who run unauthorized bots because they believe the rest of the community shouldn't have jack to say about their fantastic bot. Pascal.Tesson (talk) 17:18, 2 May 2008 (UTC)

Misza and cyde both admit clearly that they do this. βcommand 2 18:05, 2 May 2008 (UTC)
Well we shouldn't let them. If they believe it's the right thing to do, they should be able to make their case clearly to the community. The community is too dumb to realize that admin bots have some merit? Come on, that argument is beyond condescending. The RfA for ProtectionBot would have been successful if it hadn't been withdrawn and would have faced only marginal opposition if it hadn't been closed-source. Wikipedia:Requests for adminship/RedirectCleanupBot faced a total of 15 opposes. Pascal.Tesson (talk) 19:23, 2 May 2008 (UTC)
I've notified Misza and Cyde of this thread. I'll be interested to see the responses. Franamax (talk) 19:32, 2 May 2008 (UTC)
And just in case people haven't read the other thread, I'm all in favor of image deleting bots and am ready to help in convincing the community to accept these. I'm just uncomfortable with (and actually pretty pissed at) bots doing this through a non-bot admin account. Pascal.Tesson (talk) 19:55, 2 May 2008 (UTC)
Yes deletion bots have been run, and I assume they continue to be run. Anyone looking at Misza's deletion log would have a big clue. This has also been discussed at some length at AN / ANI in the past, though good luck finding the right log for that discussion. Dragons flight (talk) 19:59, 2 May 2008 (UTC)
Naahh, Misza just had a browser with 100 tabs open. Isn't that the standard argument brought up when edits happen really fast? More seriously, there seems to be a gap in definition of what exactly "bot edits" are. Franamax (talk) 20:16, 2 May 2008 (UTC)
Not to mention his rigid compulsion to start his deletions at precise times each day. Dragons flight (talk) 20:20, 2 May 2008 (UTC)
As far as what constitutes bot edits goes: Any process which is fully automated. The policy claims that some semi-automated scripts might qualify as bots, but I've never put one through RFA and rarely see others do so. Generally, it's a bot if the person is willing to admit it's one. Otherwise, it's assumed that it's 100 tabs open in a browser and/or a semi-automated script. In any case, as I said on the other page, an image deletion bot would never make it through RFA in a million years. Period. If you want this stuff done, it will have to be under the table, or you'll have to try to come up with an admin-bot approval process that circumvents RFA, which is just as unlikely as getting one approved. Hence, IAR for the greater good, or volunteer to be the one who slogs through thousands of crap images manually.--Dycedarg ж 20:50, 2 May 2008 (UTC)
Gah... parallel conversations are taking place on two different pages... (my bad!) The thing is that while ImageDeletionBot will probably be shot down at RfA, I am optimistic that BotDeletingImagesInTwoVerySpecificEasyUncontroversialCases would be fine. You just have to do the marketing right. The sucker volunteering to do the Commons dupes clean up by hand is me. I'm pretty fed up with it and seeing as there are huge backlogs in both cats, there aren't many suckers willing to take my place. Pascal.Tesson (talk) 21:14, 2 May 2008 (UTC)
(to Pascal) This is a meta-discussion though, with wider implications.
(to Dycedarg) I won't argue with what you say, but I don't particularly buy into "won't pass so do it in secret and never tell". That's indicative of a problem, we should solve that problem, n'est-ce pas? Look at User:BHGbot#Proposed, there is a good example of stating openly what will be done. Why can't an admin openly state intentions to run an automated series of edits on their own account?
And to the definition of bot edits, I would prefer to adopt "anything where you are blindly changing stuff quickly". For example, why was my own comment on my own talk page changed arbitrarily with no edit summary? That smells to me of a just-do-it-because-I-can mentality, that doesn't fit well into the wiki model, and rapid edits in large quantities leave the rest of us gasping for breath trying to figure out what just happened. The potential for damage is pretty large here. Franamax (talk) 21:35, 2 May 2008 (UTC)
Activities covered in the Signpost are not particularly secret (see the last paragraph). Dragons flight (talk) 21:42, 2 May 2008 (UTC)
Ouch, that was the month I joined up. I'll have to read those threads, I hadn't seen any indication of their existence 'til now. Maybe I wasn't looking hard enough. Franamax (talk) 21:57, 2 May 2008 (UTC)
OK, I've been through those threads, I note that the actual owners of the adminbots contributed a total of once (oops maybe twice) and the discussions seemed to peter out and die. What was your point? Franamax (talk) 00:50, 3 May 2008 (UTC)
I do generally agree that these things should be better documented than they are. Dragons flight (talk) 01:39, 4 May 2008 (UTC)

BotDeletingImagesInTwoVerySpecificEasyUncontroversialCases already exists. Misza13 runs it, as does east718 (and others, I'm sure), with error rates so low they're negative numbers (not really, but yeah). dihydrogen monoxide (H2O) 01:34, 4 May 2008 (UTC)

Tasks

If my bot was approved to subst Welcome templates, do I need to submit another BRFA to subst templates in the Category:User block templates? MBisanz talk 05:49, 3 May 2008 (UTC)

IMHO, this particular BRFA was regarding substing every talk page message, go for it. MaxSem(Han shot first!) 06:18, 3 May 2008 (UTC)

Ongoing BAG membership request

I've put myself up for BAG membership (God that sounds lame) on WT:BAG -- here. Any comments / questions / votes (zomg votes!) are welcome. Cheers. --MZMcBride (talk) 05:08, 4 May 2008 (UTC)

Bot RFC

An RFC on a bot has been opened at Wikipedia:Requests for comment/VoABot II, if you are interested in the proceedings, please stop by and have a look and/or make a comment. Thanks, — xaosflux Talk 03:50, 5 May 2008 (UTC)


Addbot

User:Addbot - Seems to have been running without a flag, and performing non-approved tasks. Rich Farmbrough, 12:11 5 May 2008 (GMT).

Addshore has pointed me to some approvals. Rich Farmbrough, 12:14 5 May 2008 (GMT).
Also flag has now been added. ·Add§hore· Talk/Cont 15:22, 5 May 2008 (UTC)

Adminbot operator needed

Is there any user who is an experienced bot operator and a wikipedia admin, who is prepared to endure a bot RfA to manage a bot like this? From the discussion at AN, it seems that the idea has support from the community, especially if the bot were copied directly from the one at nl.wiki, which has been operating flawlessly for over a year. I'm sure the bot-operating community would support to the hilt any brave soul willing to endure the Skynet comments at RfA for a week. Volunteers, anyone? Happymelon 15:38, 4 May 2008 (UTC)

What are the technical requirements? If it's a "run once a day" bot, I would be interested; if it's a "run 24/7" bot, I'd have to think it over. MBisanz talk 15:41, 4 May 2008 (UTC)
It seems to be a batch process, currently run once or twice a week at nl.wiki and he.wiki by RonaldB-nl, based on a database of TOR nodes identified through statistical traffic analysis. The BRFA linked above was for RonaldB-nl to perform the same task (using the same script, presumably) for en.wiki, but was denied because he was not himself an admin here. So the task would probably consist of liaising with RonaldB-nl, getting hold of the blocking script, configuring it to work on en.wiki, then regularly processing the data file and doing a blocking run about once every 3 days or so. You can see more details here. Happymelon 15:50, 4 May 2008 (UTC)
Ok, I've read it over, seems simple enough, click a button loading a list every couple of days, bot's code takes care of the rest, check the bot's contrib list for accidental false positives. Yes, I know an RFA will be painful, but where do I sign up. MBisanz talk 15:55, 4 May 2008 (UTC)
I guess file a BRFA and wait for BAG to mark it approved for trial. Then armour up and get ready to face the music! Let me know when the RfA goes live so I can lend some support. Happymelon 17:34, 4 May 2008 (UTC)
Sounds to me as if the GlobalBlocking extension that is currently being developed may save a whole lot of log entries in the near future if this bot would ever be used on meta:Global_blocking. Siebrand (talk) 13:07, 14 May 2008 (UTC)
I think global blocking is going to be controversial enough without the additional complications of potential GB users knowing that there will be a bot operating on a distant site largely beyond their control, applying blocks to IPs that a distant user, who might not even speak their language, has determined are open proxies. At least we at en.wiki have the advantage that RonaldB-nl is somewhat active here, and can respond to queries and comments. If we can gain the necessary interwiki consensus, then yes, it would be an infinitely better idea to apply these blocks at meta and propagate them through GlobalBlocking, but for the meantime (and given that we'd have to wait until the extension was deployed before even beginning that discussion), local implementations are the way forward. Happymelon 10:47, 15 May 2008 (UTC)
Sure, just painting the big picture :) Siebrand (talk) 13:46, 15 May 2008 (UTC)

New BAG members

Kingturtle volunteered some crat time to close the pending BAG nominations. Krimpet, Maxim, MBisanz, and Mr.Z-man are now BAG members. MZMcBride's nomination did not reach consensus. — Carl (CBM · talk) 19:56, 15 May 2008 (UTC)

Request for BAG membership

I've posted a request for WP:BAG membership here, comments are appreciated. Mr.Z-man 05:55, 6 May 2008 (UTC)

I've also nominated Krimpet here. Any input would be greatly appreciated. SQLQuery me! 06:34, 6 May 2008 (UTC)
I have accepted a nomination to be considered for membership in the Bot Approvals Group. Please express comments and views here. MBisanz talk 08:35, 6 May 2008 (UTC)

I have also nominated myself, here. dihydrogen monoxide (H2O) 02:09, 9 May 2008 (UTC)

I also reopened mine --Chris 11:43, 17 May 2008 (UTC)

The IP block exemption mechanism has now been enabled for non-admin accounts.

Broadly, an account can be tagged to be unaffected by any IP block (including autoblocks), meaning only a direct block on the actual account name, will block it.

This might be useful for "bots that meet some suitable standard of approval and acceptance" (BAG approval, bot flag, don't know, not in BAG)... it will mean such a bot can't be blocked as a result of a toolserver block, or fallout via autoblock from another bot being blocked.

Users at BAG, and crats, might want to consider whether IP block exemption should be given as standard to any bots, such as those "officially approved" or the like, or not, or whether this would help ensure bots run more reliably (i.e. can't be inadvertently blocked due to some other incident).

Just a heads up to start a discussion :)


FT2 (Talk | email) 20:21, 10 May 2008 (UTC)

My thought is that bots should only get it if there is a demonstrated need (the owner is an admin passing through IP-exempt, but the bot is hitting a range block). MBisanz talk 20:27, 10 May 2008 (UTC)
I would suggest it be given to all toolserver bots as a precaution - these should only really be blocked individually. dihydrogen monoxide (H2O) 01:20, 11 May 2008 (UTC)
If we decide that all bots should be made ipblockexempt, then the permission should just be bundled with the 'bot' usergroup, rather than manually given to all bots by a crat. Happymelon 17:40, 11 May 2008 (UTC)
I could see the argument for giving it to toolserver-based bots, but, honestly, the last time I even saw a TS bot blocked, was a few months ago, when BHG blocked BetacommandBot (and, also forgot to disable the autoblocker). Situation was resolved within a couple moments. It does not seem worth the trouble to assign really. SQLQuery me! 18:12, 11 May 2008 (UTC)
There are good reasons for not giving to bots as well (mainly non-toolserver bots) - if a bot operator is blocked, the bot should not be able to operate through the autoblock. Mr.Z-man 17:43, 11 May 2008 (UTC)
If the bot actually does operate from the toolserver, though, a block would only catch the bot if that account were used manually, from the same IP as the bot owner. And in that rare case, I'm not sure it's that big a concern. Ral315 (talk) 16:51, 15 May 2008 (UTC)
Since an autoblock would effectively shut down all of the toolserver bots that edit en.wiki, I think it would be a very good idea to grant this to bots that reside solely on the toolserver. Nakon 18:50, 16 May 2008 (UTC)

Problems with an archiving bot

I was recently browsing Talk:List of commonly misused English language phrases and I thought to myself, "there used to be more discussions here, but there aren't any links to archived discussions." So I looked into the page's edit history, and I found several cases where User:MiszaBot I had removed content from the discussion page to archive pages, but hadn't created links to the archive pages [38], [39], [40], [41], [42], [43]. This struck me as very bad. It is important that people who are browsing talk pages be able to see if there is any previous discussion they are missing. Often there have been discussions with important conclusions years in the past but involve things that new contributors to a page often bring up. Not giving diligent editors a clear way to find such discussions is intolerable. I have brought this up with the bot's operator (User talk:Misza13#Talk:List_of_commonly_misused_English_language_phrases) but he does not seem to think there is any problem, and also appears to be laying the responsibility for fixing it at my feet. Neither response is acceptable to me. Furthermore, I am concerned that there are talk pages all over Wikipedia which have had content removed by MiszaBot without links to the archive pages, and I think there needs to be an effort to find those pages and fix the problem by creating links to the archive pages. Nohat (talk) 18:20, 18 May 2008 (UTC)

The bot is not doing this on its own; it was directed to do it by this edit [44]. We ordinarily expect users to handle archive links on their own, since lots of different systems are used. I don't think there are any archiving bots that automatically add links. I do agree that someone who frequents the talk page there should add links to help new participants. — Carl (CBM · talk) 18:28, 18 May 2008 (UTC)
As a temporary fix I've changed the link to point to the Archives. Hopefully that's a bit less confusing. The better solution would be to create a page that would list everything and link there, but this works for now. Adam McCormick (talk) 18:42, 18 May 2008 (UTC)
I think there needs to be a lot more oversight in ensuring that pages archiving bots are working on have maintained links to their archive pages. What happened to Talk:List of commonly misused English language phrases is not acceptable. As can be seen in the history of the page, User:Rm w a vu did not attempt to get any consensus for the use of an archiving bot, nor did s/he add any archive links. I don't think Talk:List of commonly misused English language phrases was long enough to need archiving, and it certainly wasn't high-traffic enough to warrant monthly archives. If bot operators run bots that are capable of mangling pages if they are not used correctly/responsibly, and they make no attempt to do due diligence to ensure the bot is being used correctly/responsibly, then they at least have a responsibility to clean up the messes their bots have made. How do we make sure it doesn't happen again? Nohat (talk) 18:47, 18 May 2008 (UTC)
Then change it. If you don't like the archiving, then put the content back and remove the archiving template. It's not the bot owner's responsibility to police the use of the archiving template. Adam McCormick (talk) 18:52, 18 May 2008 (UTC)
I plan to, but the content is scattered across 19 tiny archive pages, and fixing it is nontrivial. I'm bringing this up here because I am concerned that there are pages all over Wikipedia that have been irresponsibly archived and it's going to take more than just me to deal with the problem. Furthermore, I disagree regarding responsibility: whoever makes edits to a page is responsible for those edits. Being a bot does not exempt one from being responsible for one's edits. Nohat (talk) 19:00, 18 May 2008 (UTC)
(outdent) No, the responsibility lies with the person who set up the archiving. The bot operates, properly, in accordance with the configuration settings given on the page. It's not the bot's responsibility to enforce a specific archiving method. It is up to the page users to decide how they want to archive. As for having links to the archive pages on the main talk page, simply add the {{archive box}} template and it will automatically keep track of any new archive pages (assuming the naming has been done correctly). -- JLaTondre (talk) 19:16, 18 May 2008 (UTC)
The problem is that the bot-created archives are too granular, and need to be merged into larger pages. It seems that it is very easy, indeed too easy, to add archiving to a low-traffic talk page when it's very hard to fix problems that are discovered months later. If the person who requested the archiving did it maliciously or incompetently, and is unwilling or unable to fix the problem, or has left the project, then when a problem is discovered months later, like no links to archive pages, or too much granularity in the archives, it's a big burden to be placed at the feet of the person who discovers the problem when the bot owner refuses to accept any responsibility for the problem. I don't think being a bot exempts its owner from being responsible for the edits it makes, regardless of who requested those edits. Nohat (talk) 19:21, 18 May 2008 (UTC)
Misza has fixed the problem and collapsed the archives. In the future, such archives can be listed automatically using {{MonthlyArchive}}. Adam McCormick (talk) 19:58, 18 May 2008 (UTC)
ClueBot III has the ability to generate links to content that it archives, and place them in an archive box. If you find it necessary for the bot to maintain the archive lists, try using ClueBot III. -- Cobi(t|c|b) 04:09, 21 May 2008 (UTC)

Bot Assisted Assessment

(It was suggested that I post my request here.) Bot Assisted Assessment is where "a bot looks at all unassessed pages and adds the highest assessment parameter from other project templates on the page to the {{WP India}} template. E.g. if a page that has the WP India template has 'Start' and 'B' classes from other templates, '|class=B' will be added to the WP India one."[45] I was very surprised to learn that this was even possible. I've been working on moving WikiProject Ecuador along, and something like Bot Assisted Assessment would be great for WikiProject Ecuador as well. In view of this, is it possible for you guys to create a list of things you have done with the bots as they relate to regional WikiProjects, so that non-bot people (e.g. me) are aware of what can be done and can pick and choose among bot options to improve articles within the scope of a regional WikiProject? Thanks. GregManninLB (talk) 00:17, 25 May 2008 (UTC)

if you can think it, we can probably write it. βcommand 2 01:12, 25 May 2008 (UTC)
To an extent, BC. :) But yeah Greg, what you're talking about is a fairly simple bot task. Feel free to ask here about any others. dihydrogen monoxide (H2O) 01:42, 25 May 2008 (UTC)
User:Chris G Bot 2 already does article assessment. It would be a fairly simple modification to the code to create a bot with that functionality --Chris 02:28, 25 May 2008 (UTC)
Every bot which can do article assessment has slightly different heuristics for doing so: MelonBot looks for stub templates on the article, other project assessments, and also looks on the FA, GA and FL lists. Other bots do things slightly differently, but the majority of assessment bots will do the majority of the checks that are possible to best analyse an article. It's mainly a case of who will get to your request first :D. Happymelon 09:59, 25 May 2008 (UTC)
My bot was also written to do this task, assessing by looking at the highest rating the article has received from other projects and adding this to the template it is analysing. RichardΩ612 Ɣ ɸ 14:48, May 25, 2008 (UTC)
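The "highest assessment from other project templates" rule described in this thread can be sketched in a few lines of Python. This is only an illustration, not the code of any bot mentioned here: the class ordering, the regular expression, and the function name are assumptions, and real banners have more parameter variants than this handles.

```python
import re

# Assumed ordering of assessment classes, lowest to highest.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
RANK = {name.lower(): i for i, name in enumerate(CLASS_ORDER)}

# Matches "|class=B", "| class = Start", etc. in talk-page banners.
CLASS_PARAM = re.compile(r"\|\s*class\s*=\s*([A-Za-z]+)")


def highest_class(talk_wikitext):
    """Return the highest recognised class on the page, or None."""
    best = None
    for match in CLASS_PARAM.finditer(talk_wikitext):
        rank = RANK.get(match.group(1).lower())
        if rank is not None and (best is None or rank > best):
            best = rank
    return None if best is None else CLASS_ORDER[best]
```

Given a talk page with `|class=Start` in one banner and `| class = B` in another, `highest_class` returns `"B"`, which is the value the quoted description says would be added to the unassessed project template.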

Single sign-on breaks old bot code

Unifying my bot account has completely broken my bot's login code. I can see how to fix it, but the fix is rather involved, since I would have to completely emulate the cascade of image-load and cookie operations associated with a new-style SSO login. Would it be possible for the SSO mechanism to recognize the traditional login cookies in the (for example) en.wikipedia.org domain, even when an account is unified for single sign-on? Otherwise, I'm going to either have to write a load of SSO bot-login code, or have to port it to a framework which already has this implemented. Either of these would be painful, and I'd prefer not to have to do either. -- The Anome (talk) 20:34, 29 May 2008 (UTC)

I know pywikipedia works with SUL login, you might want to look at that code. βcommand 20:39, 29 May 2008 (UTC)
I've now made the same request at Wikipedia:Village pump (technical), which is probably a better place for it. -- The Anome (talk) 21:19, 29 May 2008 (UTC)

Minor bot edits

I asked a question at the village pump here, but decided this is probably the better place to ask. When I have ShepBot set to make a minor edit I am told it should show mb next to the edit. Yet I only see an m; see [46] for an example. Is there some special b that signifies a minor bot edit I need to turn on? Thanks for your help! §hep¡Talk to me! 22:07, 31 May 2008 (UTC)

If I remember rightly, the "b" only shows up in Special:RecentChanges and the watchlist - try clicking "show bot edits" in your watchlist and see how many "b"s appear - but note that no bot edits have the "b" next to them in the history. IIRC, it's because the bot/not bot status is stored in the recentchanges table, which is purged on a 30-day rolling cycle, so it's only used for watchlists, RC and other things that can't be viewed months after the fact (the IP addresses used for CheckUser are also stored there, and a whole host of other data to make those feeds more comprehensive). All edits by a bot account are unavoidably recorded as bot edits for as long as the bot remains flagged. Happymelon 22:49, 31 May 2008 (UTC)
Okay. An upset user recently contacted me and I thought I may have done something wrong. Thanks for the help! §hep¡Talk to me! 22:51, 31 May 2008 (UTC)
Actually, the "unavoidably" part isn't true anymore; bots may include the parameter "bot=0" when saving an edit to avoid it being flagged as a bot edit. —Ilmari Karonen (talk) 01:02, 1 June 2008 (UTC)
  • Just a note, edits with +m and +b in the User_talk: namespace will use a bot's "nominornewtalk" user access and NOT trigger the 'new messages' flag for the user who owns the page. — xaosflux Talk 01:20, 1 June 2008 (UTC)

BAG request: Bjweeks (BJ)

My request to join the BAG is here. BJTalk 06:58, 6 June 2008 (UTC)

Deflag bots that haven't edited since 2006

The topic of better enforcement came up on IRC, and one of the issues, I think, is the lack of knowing what bots are doing what task. Having bots that haven't edited since 2004 (!) doesn't help the issue. As a starting point I propose that all bots that haven't made an edit since 2006 should be deflagged. BJTalk 09:35, 6 June 2008 (UTC)

They did this same thing a few months ago. The first step is to contact the owner to see if there are any plans. giggy (:O) 09:37, 6 June 2008 (UTC)
I just checked the first few and the owner already had a message dated March. Seems they were never deflagged. BJTalk 09:40, 6 June 2008 (UTC)
I would say, notify them via e-mail where possible, and, via talkpage. If we don't hear back from them within 2-4 weeks, they should probably be deflagged without prejudice (i.e. they may ask for the bit back at any time in order to re-start their approved tasks). SQLQuery me! 04:31, 17 June 2008 (UTC)
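Picking out the candidates for this proposal is a simple date filter. The sketch below assumes the last-edit timestamps have already been gathered into a dictionary by some other means; the bot names in the usage note are made up, and the function name is hypothetical.

```python
from datetime import datetime


def inactive_bots(last_edits, cutoff):
    """Return sorted names of bots with no edit on or after `cutoff`.

    `last_edits` maps bot name -> datetime of last edit, or None if the
    account has never edited (treated as inactive here, an assumption).
    """
    return sorted(
        name
        for name, last_edit in last_edits.items()
        if last_edit is None or last_edit < cutoff
    )
```

With a cutoff of `datetime(2007, 1, 1)`, a hypothetical `ExampleBotA` last seen in 2004 would be listed for notification, while one that edited in 2008 would not.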

Some of you may know that I have spent my last months working on the bot status page. Well now I think my work is done. I have managed to:

  • Get rid of bad links
  • Fix links
  • Organize into categories
  • Create templates
  • Take the page from 75,846 bytes to 59,943 bytes (so far)
  • etc.

Now I need a hand from you, the "bot owners". This page has always needed updating, and now it is a lot easier (see the header at the top of the page explaining how to use the template). This should cut the page size down even more, making it easier to use and easier to load. It will also be a lot tidier and up to date.

All I am asking you to do is to go there and update your bot's entry. If you have any questions about the template please ask me on my talk page or here. If you need to add a link to a request approval from before the BRFA system, then you just need to add it into the "Other Links" in the template. Again, if you have any questions about that please ask me. I hope I have made myself clear. ·Add§hore· Talk/Cont 13:50, 8 June 2008 (UTC)

Done, thanks for your help also :) — E TCB 22:45, 8 June 2008 (UTC)
Another point I would like to put in bold is, "don't just update them but put them in the template as well" :> ·Add§hore· Talk/Cont 06:46, 9 June 2008 (UTC)
Just like to ask. As you can see on the page, there is a link that shows all the new BRFAs using the special prefix page. I think this takes out the need for other BRFA links on that page, e.g. linking directly to the BRFA, unless they were not approved in the conventional way. Anyway, tell me what you guys think :> ·Add§hore· Talk/Cont 06:45, 20 June 2008 (UTC)

LemmeyBOT replacement

This bot was really useful in fixing broken references, but now it's inactive because its owner has been indefblocked. Its sources are available at User talk:Lemmey. Could someone who knows Python pick up this task? MaxSem(Han shot first!) 05:48, 17 June 2008 (UTC)

Since newsletter delivery is one of the main BOT requests, I have made a new category and added the known relevant BOTs. Feel free to add more. This will help people to contact the BOTs directly when another known bot is unavailable -- TinuCherian (Wanna Talk?) - 15:59, 11 June 2008 (UTC)

Good idea. I've added mine. giggy (:O) 09:12, 12 June 2008 (UTC)
Why don't we categorize the commonly requested bot types like the above, Category:Wikiproject tagging bots etc., add them to a template, and put it on the bot requests page? It would save time for people who make bot requests -- TinuCherian (Wanna Talk?) - 11:49, 12 June 2008 (UTC)
I think this would be a good idea :> ·Add§hore· Talk/Cont 12:17, 12 June 2008 (UTC)
Honestly, with the exception of interwiki bots, if we have the same type of bots doing the same task we're doin' it wrong. BJTalk 12:31, 12 June 2008 (UTC)
The idea is to categorize the bots that are normally requested by people. You don't need sign bots, anti-vandal bots, or interwiki bots to be listed there. The idea is to help people contact bot operators or bots directly if it is urgent -- TinuCherian (Wanna Talk?) - 12:44, 12 June 2008 (UTC)

I am proposing use of {{Botcats}} on BOT requests page. It will produce


You may add more category links as needed. Thoughts/suggestions? -- TinuCherian (Wanna Talk?) - 05:58, 17 June 2008 (UTC)

I am being WP:BOLD and have added the template to the bot requests page. Feel free to remove, add, or update categories -- TinuCherian (Wanna Talk?) - 05:00, 26 June 2008 (UTC)

Request for BAG membership

Per the bot policy, I am making this post to inform the community of my request for BAG membership. Please feel free to ask any questions/comment there. RichardΩ612 Ɣ ɸ 16:09, 5 July 2008 (UTC)

pywikipedia

Owners of pywikipedia robots, please run the script table2wiki.py on this page: Chandra X-ray Observatory. Amir (talk) 13:34, 8 July 2008 (UTC)

That should be converted to an infobox not a wikitable, I'll do it later today. BJTalk 13:43, 8 July 2008 (UTC)
checkY Done BJTalk 13:59, 9 July 2008 (UTC)

status_line of "302 Moved Temporarily"

Greetings, all. I'm having a bit of a bot issue, and I was hoping someone could help. When I use LWP to follow a link to an older New York Times article, such as [47], I have no problems. But when I try to follow a newer article, such as [48], it fails to read the html content. Instead it returns a status_line of "302 Moved Temporarily". When I put the same URL in my browser, it retrieves the content with no problems. Does anyone know how I can retrieve the html in LWP? Thanks, – Quadell (talk) (random) 00:12, 9 July 2008 (UTC)

(Question moved from Wikipedia talk:Bot requests)
The NYTimes uses cookies to force people to log in after certain criteria are met. BJTalk 12:55, 9 July 2008 (UTC)
Aha! This O'Reilly tutorial says "A default LWP::UserAgent object acts like a browser with its cookies support turned off. There are various ways of turning it on..." I'll try that. Thanks! – Quadell (talk) (random) 13:14, 9 July 2008 (UTC)
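For readers who hit the same redirect, here is a minimal sketch of the cookie idea. It uses Python's standard library rather than LWP, so it only illustrates the technique and is not the fix Quadell applied; the function names `build_cookie_opener` and `fetch_html` are made up for this example. The assumption is that the site sets a cookie on the first response and only serves the page once that cookie is sent back on the redirected request.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_opener():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = CookieJar()
    # HTTPCookieProcessor records Set-Cookie headers and attaches the stored
    # cookies to later requests, including the requests made when following
    # a redirect.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_html(url, opener, timeout=30):
    """Fetch a URL through the cookie-aware opener and return (status, text)."""
    with opener.open(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

A caller would create the opener once and reuse it for every request, so that cookies set by one response are available to the next. Whether this is enough for a given site depends on that site's login rules.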

TinucherianBot

Moved from Wikipedia talk:Bots/Requests for approval

The TinucherianBot (Special:Contributions/TinucherianBot) is behaving rather destructively, categorising many, many articles incorrectly. See User_talk:TinucherianBot for some recent complaints. This bot should be switched off immediately, but I don't know how to do this. I think the number of false positives this bot is producing is not acceptable, but I don't know how to go about getting it shut down. Most bots have those big red emergency shutdown buttons; this bot appears not to have one, so I am posting here in the hope that someone can sort this bot out. Jdrewitt (talk) 08:03, 4 July 2008 (UTC)

Blocked. MaxSem(Han shot first!) 08:40, 4 July 2008 (UTC)
Oh come on, don't block the bot for a bot request given to me... Please see this for the discussion... I have been asked by the WikiProject Food and drink (WP:FOOD) members to tag articles for the project. I gave them the entire category tree and got it 'cleaned' by them (see this). I have also tried my level best to remove any unwanted categories. What should I do further? -- TinuCherian (Wanna Talk?) - 08:55, 4 July 2008 (UTC)
Glass production was tagged because it was wrongly categorized under Category:Wine packaging and storage and Potassium bisulfite was categorized under Category:Food additives . This is no fault of the bot... I request you to kindly unblock the bot -- TinuCherian (Wanna Talk?) - 09:04, 4 July 2008 (UTC)
I wouldn't say Glass production was wrongly categorised as Category:Wine packaging and storage. I think the presumption that anything that is in the wine packaging category belongs in the Food and Drink category is what is wrong. There are many, many issues with the bot which have been brought up by users on your talk page and despite your efforts these false positives are still getting through. The bot should not categorise every single page that might be remotely related to food and drink. Jdrewitt (talk) 09:33, 4 July 2008 (UTC)
Additionally, I am not sure how bots are usually policed, but to allow a bot to carry on its merry way expecting users to flag the false positives doesn't seem right to me. What about the articles which slip through the net. Jdrewitt (talk) 09:35, 4 July 2008 (UTC)
The bot was ONLY servicing a bot request, and this was done in good faith... Tagging articles with project banners and assessing them is an important part of the workload of most, if not all, WikiProjects. However, it is tedious to regularly keep track of newer articles that come under the scope of the project and add the project banner manually. Hence TinucherianBot was employed to run over the relevant categories of Category:Foods, Category:Beverages, etc. The bot was instructed to tag these articles upon consensus from WikiProject Food and drink. Maximum caution and careful attention were applied to avoid wrongly tagging any categories, but mistakes may happen... These are mostly due to somebody miscategorizing the page. An error rate of, say, 1-2% is to be expected when you are tagging thousands of pages. A WikiProject is a collection of pages devoted to the management of a specific topic or family of topics within Wikipedia; and, simultaneously, a group of editors that use said pages to collaborate on encyclopedic work. The idea of WikiProjects is to identify the articles that fall under their scope and help to improve them, not to disrupt them. Adding an additional project banner to a talk page does nothing more than invite collaborative efforts from more interested and experienced people. -- TinuCherian (Wanna Talk?) - 09:55, 4 July 2008 (UTC)
Tinucherian: See my response to this at your talk page. Let's try to keep this discussion in one place.
--David Göthberg (talk) 10:27, 4 July 2008 (UTC)
The reason I commented here is that the requester of the block made a public announcement here, and it is my duty to reply here also -- TinuCherian (Wanna Talk?) - 11:14, 4 July 2008 (UTC)

A block is ok to stop the bot from making further edits, but at this point I think it is safe to assume that any problems have been fixed, and that the bot should be unblocked, and allowed to continue the task. There is no need for an indef block --T-rex 15:29, 4 July 2008 (UTC)

I am not so sure if TinucherianBot should be allowed to operate in the future. The WP:FOOD request was obviously flawed and a responsible bot owner would have noticed this immediately. (See also Wikipedia_talk:WikiProject_Food_and_drink#Poorly_thought_out_tagging). Cacycle (talk) 17:08, 4 July 2008 (UTC)
We seem to run into trouble often with WikiProject tagging bots. I believe we already have an informal rule that going through categories recursively is bad. Perhaps we need to codify some rules to avoid problems like this. As for resuming the task, I would agree that it should not resume. If it does, it should use a list of pages, not categories. The comments by the operator I've seen do not give me any confidence. If lists really are hand-checked, error should be far less than 2%. Many of the pages were not incorrectly categorized - the bot was running on categories where not all the pages were directly food related such as Category:Chicken and Category:Wine regions of Canada. Mr.Z-man 20:08, 4 July 2008 (UTC)
This conversation is happening all over... The dangerous thing wasn't just the poor choice of categories, it was also the indiscriminate tagging of all articles in sub-cats. Category:Restaurants => Category:Dining clubs => Category:Traditional gentlemen's clubs => Athletic Club of Columbus. These are not miscategorized. If the bot operator does not know how subcategories work then they should not operate a bot. JohnnyMrNinja 00:14, 5 July 2008 (UTC)
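The drift described above can be reproduced on a toy category graph. The sketch below is an illustration only: the category names are taken from the example in the comment, but the `SUBCATS` and `PAGES` tables are hand-written stand-ins rather than live wiki data, and the function names are hypothetical. It contrasts recursive traversal with working through an explicit list of categories, and shows how a depth limit narrows the result.

```python
from collections import deque

# Hand-written stand-in for the category tree (not fetched from the wiki).
SUBCATS = {
    "Category:Restaurants": ["Category:Dining clubs"],
    "Category:Dining clubs": ["Category:Traditional gentlemen's clubs"],
    "Category:Traditional gentlemen's clubs": [],
}
# Direct page members per category; only the page named in the discussion.
PAGES = {
    "Category:Traditional gentlemen's clubs": ["Athletic Club of Columbus"],
}


def pages_recursive(root, max_depth=None):
    """Collect pages from root and every subcategory reachable from it."""
    found, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if max_depth is not None and depth > max_depth:
            continue  # too far from the root: skip this category entirely
        found.update(PAGES.get(category, []))
        for sub in SUBCATS.get(category, []):
            if sub not in seen:  # guards against loops in the category graph
                seen.add(sub)
                queue.append((sub, depth + 1))
    return found


def pages_from_list(categories):
    """Collect only the direct members of the listed categories."""
    found = set()
    for category in categories:
        found.update(PAGES.get(category, []))
    return found
```

With this toy data, `pages_recursive("Category:Restaurants")` reaches the athletic club two levels down, while `pages_from_list` over the first two categories does not. A depth limit reduces the drift but does not replace a human check of the categories.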
I personally think that something should be in place to stop this sort of thing from happening again. I think that before each bot run that tags talk pages with WikiProject banners, a specific task approval should be submitted to BRFA with a full list of articles and categories. Currently bot operators may request approval to tag on request and then just go ahead with tagging on request, with no one really looking over what is being tagged and no apparent community consensus. I think allowing people to go through the lists and comment will avoid disasters like this. Any comments? Printer222 (talk) 02:19, 5 July 2008 (UTC)
Please check the new rule that I have just added to Wikipedia:Bot_policy#Restrictions_on_specific_tasks. Running a tagging bot on a list of categories (and even worse, recursively on sub-categories) makes no sense whatsoever. A bot based on this scheme should have never been approved and its rights should be revoked immediately. User:Tinucherian lacks a basic understanding of the categorization system and was not willing to learn from previous fiascos as can be seen on User_talk:TinucherianBot. Cacycle (talk) 04:33, 5 July 2008 (UTC)

Better double check that woody. CWii(Talk|Contribs) 04:36, 5 July 2008 (UTC)

It should also be noted you really can't just "Add rules" whenever you feel like. These things need to be discussed before going on a policy page. §hep¡Talk to me!
Sorry, I will add it to the discussion page. Cacycle (talk) 04:40, 5 July 2008 (UTC)
Please comment under Wikipedia_talk:Bot_policy#Category_traversing_for_tagging. Cacycle (talk) 05:01, 5 July 2008 (UTC)
Please don't touch my comments. CWii(Talk|Contribs) 05:02, 5 July 2008 (UTC)

Explanation by the Bot Operator

Comments and Appeal to All fellow Wikipedians by the bot Operator
Before everyone slaps me for being a 'reckless and brainless' bot operator, allow me a chance to explain what happened. I request everyone to patiently read this completely before jumping all around to shoot me. TinucherianBot is an AWB and Kingbotk Plugin based bot, which can tag for WikiProjects using one of two options: making a list from a category, or from categories recursively. While applying for the BRFA itself, I made a comment as follows: "To be safe it will not run for categories recursively. The requester has to provide the end node category(s) for the article list collection."... Although there are many other AWB based bots, I don't think anyone else would have made such a thoughtful comment while applying for the BRFA itself. The reason is pretty simple: You might not expect Category:World War II to be a subcategory of Category:Thailand, but it actually is, and a bot will find it! It is usually safer to give a complete list of categories that should be worked through individually, rather than one category to be analysed recursively. I am not sure whether every bot operator takes each bot task as seriously as I do (see User:TinucherianBot#Current_Tasks). I even maintain a page to record everything for each bot task I do; see the example of WP:THAI tagging at User:TinucherianBot/Autotagg/Thai, on request of Badagnani.

Tagging articles with project banners and assessing them is an important part of the workload of most, if not all, WikiProjects. However, it is tedious to regularly keep track of newer articles that come under the scope of the project and add the project banner manually. Hence bots are employed to tag articles based on categories (you may see this is one of the most common bot requests). The idea of WikiProject tagging is to identify the articles and bring them to Wikipedia editors with a common interest and expertise. Additional tagging by a project does nothing more than attract the attention and contributions of more subject experts, which is primarily what is needed for the growth of Wikipedia. If you are more concerned about cluttering the talk page, we have the option of {{WikiProjectBannerShell}}, which takes up very little space. It is also sad to see some WikiProject members trying to 'own' articles by preventing tagging by another project, which is against one of our fundamental rules, WP:OWN. WikiProjects like WP:FOOD have very strong assessment task force members who could remove unnecessary banner tags during manual assessment.

Now let me 'briefly' explain the WP:FOOD tagging issue. I am "_NOT_" disclaiming responsibility for what my bot does or did. You have no idea how much pain I took to carry out the request to add tags for WP:FOOD articles. It started when Badagnani, a very senior and respected Wikipedian, made a bot request for WP:FOOD. See Wikipedia:Bot_requests#Category:Desserts_by_country. He had also said "A few of the above subcats have subcats of their own, and if the talk pages of the articles in those could be tagged as well, it would be great..." I could have used the Category (Recursive) option of AutoWikiBrowser (AWB) on Category:Desserts_by_country and just sat back. But I painstakingly collected all the relevant subcategories in it and ran only on them. You can see this activity at the above link. I did the same for his Wikipedia:Bot_requests#Category:Desserts_and_Category:Salads and Wikipedia:Bot_requests#Food_tagging_request requests. When in doubt, I always go back to the requester to clarify, as in Wikipedia:Bot_requests#Food_tagging_request. You can see me asking him "Should I include resturant subcategories like this Category:Hamburger restaurants ?? -- TinuCherian (Wanna Talk?) - 04:43, 2 July 2008 (UTC)" and "Sorry to ask again... I am a bit confused here... Does restaurant chains falls under WP:FOOD and need to be tagged with {{WPFOOD}} ? -- TinuCherian (Wanna Talk?) - 05:18, 2 July 2008 (UTC)". I had then made a separate subpage to archive this request: User:TinucherianBot/Autotagg/WPFOOD.

It was then that I was approached by Jeremy (Jerem43), a prominent member and senior assessment task force member of WP:FOOD, with this request. After I had successfully and satisfactorily completed that request as well, he came up with this new request, which was a bit aggressive.

""I have decided to get everything in one fell swoop, so this request is rather large.
Could you hit these next:
  • Category:Beverages
  • Category:WikiProject Herbs and Spices
  • Category:Foods
Again, tag everything in these main categories as well as all sub-categories except Category:Fictional foods. Thank you again, --Jeremy ( Blah blah...) 16:12, 2 July 2008 (UTC)""

I _DID_NOT_ blindly go ahead with using the Category(Recursive) option on Category:Foods on AWB . I told him this :

Important note  : This is a huge effort that involves around 1500 categories and approx 24,000 unique articles. There might be wrong or misplaced cats in the subcat tree. I have collected all the subcats in Category:Foods and made this list . I want the members to carefully verify the entire list and remove ALL unwanted cats and give me a final go ahead. Then TinucherianBot will start tagging the articles in the approved categories...It is a pleasure working for this project -- TinuCherian (Wanna Talk?) - 06:08, 3 July 2008 (UTC)


As you can see, I then made this list myself: User:TinucherianBot/Autotagg/WPFOOD/Category:Foods, and revised it several times, eliminating the possibly wrong categories (with my limited knowledge of the subject matter), which is evident from the page history. I handed this list over to all the WP:FOOD members at their talk page here: Wikipedia_talk:WikiProject_Food_and_drink#Bot_Tagging

Important note  : Jeremy had asked to tagg articles in Category:Foods and its subcategories... This is a huge effort that involves around 1500 categories and approx 24,000 unique articles. There might be wrong or misplaced cats in the subcat tree. I have collected all the subcats in Category:Foods and made this list . I want the members to carefully verify the entire list and remove ALL unwanted cats and give me a final go ahead. Then TinucherianBot will start tagging the articles in the approved categories...It is a pleasure working for this project -- TinuCherian (Wanna Talk?) - 06:06, 3 July 2008 (UTC)
I cleaned up the list and reduced it to 1256 categories... Please have a look -- TinuCherian (Wanna Talk?) - 06:54, 3 July 2008 (UTC)
I created a quick list at Wikipedia talk:WikiProject Food and drink/Exclude, about 122 categories in all. Any one else please take a look! --Jeremy ( Blah blah...) 07:27, 3 July 2008 (UTC)
Thank you... I did further cleanup and made the final list. The bot is preparing the article list now -- TinuCherian (Wanna Talk?) - 09:36, 3 July 2008 (UTC)

As you can see, everything was done by me with maximum caution and attention, and in good faith, in the best interests of Wikipedia. I even took the extra step of leaving a summary on each and every talk page explaining what we were doing, so that we could get feedback telling us how to alter our approach to ensure this is done right.

Kindly be aware and understand that I didn't run the Bot blindly and recursively on the Category:Food , but created a list of categories from main list, removed the possible wrong categories from them ( with my limited knowledge on the subject matter ) ,gave the list to the project members and got it further cleaned . It was then I created the article list by manually supplying only the 'approved' categories....and finally running the bot over the talk page of the articles ...
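The workflow described above (collect candidate categories, remove the ones reviewers excluded, then build the article list from the direct members of what remains) can be sketched as follows. This is a sketch under stated assumptions, not the AWB or Kingbotk plugin logic: the function names are hypothetical, the category names come from Jeremy's request, and the `direct_members` table holds made-up placeholder titles.

```python
def approve_categories(candidates, excluded):
    """Return candidate categories, in order, minus the excluded ones."""
    excluded_set = set(excluded)
    approved, seen = [], set()
    for category in candidates:
        if category in excluded_set or category in seen:
            continue
        seen.add(category)
        approved.append(category)
    return approved


def build_article_list(approved, direct_members):
    """Merge direct members of the approved categories, without duplicates."""
    articles, seen = [], set()
    for category in approved:
        for title in direct_members.get(category, []):
            if title not in seen:  # a page may sit in several categories
                seen.add(title)
                articles.append(title)
    return articles


candidates = ["Category:Foods", "Category:Beverages", "Category:Fictional foods"]
excluded = ["Category:Fictional foods"]
# Placeholder page titles, invented for this example.
direct_members = {
    "Category:Foods": ["Example dish", "Example drink"],
    "Category:Beverages": ["Example drink"],
    "Category:Fictional foods": ["Example fictional dish"],
}

approved = approve_categories(candidates, excluded)
articles = build_article_list(approved, direct_members)
```

Note what the sketch does not do: it never checks whether each individual article in an approved category is in scope, which is the gap several commenters in this thread point to.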

Many of the errors were due to miscategorization, such as Potassium_bisulfite being included in Category:Food additives, which anyone would think of as a category that falls under WP:FOOD. Having said this, we are not rejecting the fact that we should have paid much more attention and caution while selecting the categories. Whether you all ask us or not, as responsible Wikipedians we will go around and clean up the 'mess' we have made. What should be understood by fellow Wikipedians is that we are also responsible Wikipedians with a good history of contributions. We are not vandals or disruptive editors.

I am also a Wikipedian with credibility and integrity, working hard for Wikipedia for the betterment of all, like all of you... "To Err is human and to forgive is divine". I do apologize for all the inconvenience and request you to help me continue with my bot. I work for, and am a coordinator of, some of the bigger WikiProjects like WP:Christianity, WP:INDIA etc., where the services of my bot are essential. Hence blocking my bot indefinitely is also unfair. I promise to handle and 'tame' my bot more diligently and carefully in future. I appeal to all the admins and fellow Wikipedians to understand my good intentions and unblock the bot TinucherianBot -- TinuCherian (Wanna Talk?) - 08:16, 5 July 2008 (UTC)


  • Comment - I concur with this summary and believe an unblock is certainly warranted, for one of our best and most faithful Wikipedians, keeping in mind the following: when I first asked for a bot request, the brakes were put on my very wide-ranging request, and after listening to what I was told by several experienced bot operators, I moderated my request, carefully checking and asking for just a few cats at a time. In the future, could we agree to do this when we get such overly large requests from editors who don't know about things like the WWII / WPTHAI example given above, and the fact that these things *will* happen unless we do proceed with utmost care? Badagnani (talk) 08:41, 5 July 2008 (UTC)
Unblocked, with a hope that next time much more attention will be paid to category selection, and that Tinucherian realises now that a bot should be stopped first, fixed next. MaxSem(Han shot first!) 08:49, 5 July 2008 (UTC)
The bot should not have been unblocked. It is still under heavy discussion. I checked, and now it was tagging counties and films! Pretty much every populated area in the world is producing food, I don't think we should food tag all the counties in the world? And most movies contain some scene with foods.
So, I have blocked the bot again.
--David Göthberg (talk) 10:06, 5 July 2008 (UTC)
Whoa! Why was the bot blocked again now by User:Davidgothberg :( ? The bot has not even been running since yesterday when it was blocked (I had stopped it more than 24 hours ago). See Special:Contributions/TinucherianBot -- TinuCherian (Wanna Talk?) - 10:09, 5 July 2008 (UTC)
Oops, I just noticed that myself. I had mixed up the dates. Right, you are not running the bot now. But just because it doesn't run doesn't mean it shouldn't be blocked. The bot is still under heavy discussion. And as I have stated earlier: You should contact all the affected WikiProjects and give them a week to check the list. And as far as I understand neither you nor any one else have checked the list of article names that your category list results in. It seems you are still planning to blindly tag all articles in any food related categories. Thus, I will leave your bot blocked until these concerns have been handled.
--David Göthberg (talk) 10:20, 5 July 2008 (UTC)
Sir, This is totally unfair. We discussed all the issues here and everywhere about previous issues of running this bot and decided on future course of action on running any bot on categories. I was not even thinking of running the bot for any project for some time now ( I am demoralized enough for now) , let alone WP:FOOD. Then how could you blindly accuse me of "still planning to blindly tag all articles in any food related categories ". -- TinuCherian (Wanna Talk?) - 10:28, 5 July 2008 (UTC)
Well, as I have told you before: I can not know what you think, I am not a mind reader. And if you are not going to run the bot, then why are you so upset that it is blocked?
From the comments I have seen on this page and your talk page and Wikipedia talk:WikiProject Food and drink it seemed you were going to run the bot again, after only removing some more categories from the bot's category list. That is far from enough and not even close to the fixes we were asking you to do. One of the things I am asking for is that you let the discussions and checks take their time. That is, that you wait a week. No matter what fixes you do, I can not accept that the bot runs sooner than that. You have to give people a chance to look into the matter. You can not cheat time. So I will at a minimum leave your bot blocked for 5 more days, since that means about 7 days since you first announced the list. But you also have to do the other fixes.
Your new statement that you are not going to run the bot for the time being doesn't change my decision. Since you still show no sign of understanding how these kinds of bot runs should be handled, I have to leave your bot blocked. Since from what I have seen, the next time you do a run for some other project you are likely to repeat your mistakes.
So I suggest you re-read what people have written about your bot runs and then sit back for some days thinking about it. You might learn something. Then decide on how you want to run your bot in the future, then describe that in detail and let other people look at your description. Then we might unblock your bot. But as I said, I see no reason to unblock the bot within 5 days, no matter what you do, since these things must be allowed to take their time.
--David Göthberg (talk) 11:01, 5 July 2008 (UTC)

On the contrary, the mistakenly tagged articles have been actively sought out and fixed, by several editors including User:Tinucherian. Badagnani (talk) 11:03, 5 July 2008 (UTC)

Davidgothberg, I have had enough! Thank you! With all respect to you, I should say either you fail to read the words I say or you can't read dates and times on Wikipedia. Where on this page, my talk page, or Wikipedia talk:WikiProject Food and drink does it seem that I am going to run the bot again, after only removing some more categories from the bot's category list? -- TinuCherian (Wanna Talk?) - 11:15, 5 July 2008 (UTC)
  • Comment/query - I am also leaning in favor of unblocking the bot immediately. I think all parties have learned that they were a bit overzealous and that to tag more than a few hundred articles at once is inviting disaster. Tinucherian, what's the next task that would be assigned to the bot? – ClockworkSoul 13:01, 5 July 2008 (UTC)
Nothing much as of now, probably delivery of newsletters of this month of WP:Christianity and WP:INDIA which is in progress -- TinuCherian (Wanna Talk?) - 13:06, 5 July 2008 (UTC)
I seriously doubt there is any chance of a repeat after this much negative attention for one (huge) task. JohnnyMrNinja 14:09, 5 July 2008 (UTC)
While I agree that reblocking was unnecessary, I'm still concerned by the summary above. From a quick read of the article, Potassium bisulfite is not miscategorized into Category:Food additives. It appears that it is simply a category where the bot should not have been run over all the pages. Mr.Z-man 15:58, 5 July 2008 (UTC)
Maybe it was a misjudgment on my part for that particular category; I leave it to the requester and the members of WP:FOOD to analyse whether the category falls under their project. Anyone would normally think that Category:Food additives is under the WP:FOOD project. A bot operator can always go the extra mile to help and remove any wrong cats from the list he was given, but primarily he has to take on faith the analysis of the requester or the project members. A bot operator is knowledgeable about how to run bots, but not necessarily a subject expert on the scope of a project. All he should (and could) do is ask them to carefully analyze the cats before the bot run, which I did, as is evident from the WP:FOOD talk page. Having said this, I am not claiming that the bot operator bears no responsibility for his bot's actions, but we should believe him for things done in good faith. -- TinuCherian (Wanna Talk?) - 05:17, 6 July 2008 (UTC)
Much of the category probably does fall under the project, however, some articles in it do not, which is where the errors come from. Wikipedia's category system is not set up so that you can reliably say "every page in this category is relevant to this project" for every category in a list. Mr.Z-man 15:24, 6 July 2008 (UTC)
If Potassium bisulfite is a food additive, then it probably should be tagged for WP:FOOD --T-rex 19:17, 6 July 2008 (UTC)

Re-Block by Davidgothberg

The re-block by User:Davidgothberg was inappropriate. It is uncalled for and contrary to our blocking policy. Blocks are to be preventive and not punitive. This one definitely seems punitive by an involved administrator. Tinucherian appears to be listening to the feedback given him and should be granted AGF. The suggestion that the bot has to be blocked for a specific time ("That is, that you wait a week. No matter what fixes you do, I can not accept that the bot runs sooner than that.") is unreasonable. I would also note that Davidgothberg did not block for a week; he blocked indefinitely. -- JLaTondre (talk) 14:40, 5 July 2008 (UTC)

I agree. Does anyone really think that TinuCherian is going to run the bot again without fixing it first? Yes, a few mistakes were made, but nothing that isn't easy to fix. I fully trust that the next time this bot runs, that these issues will have been addressed --T-rex 15:12, 5 July 2008 (UTC)
Yes, I think that some people need to go and read WP:FAITH again. A mistake was made, and I'm sure the user has learned from the mistake. You shouldn't assume that the user intends to run the bot before fixing mistakes. I personally see the bot being blocked further as a form of punishment; there's no need for the bot to be blocked any longer. People learn from their mistakes and now we need to get over it. Printer222 (talk) 15:19, 5 July 2008 (UTC)
No, the block is not punitive, it is preventive. I first ran into Tinucherian and TinucherianBot a week or so ago when he was mass-tagging articles for Wikipedia:WikiProject Computing. Pretty much the same thing happened then. We unblocked him then since he claimed he would not mass-tag articles like that again, and would be more careful the next time he used his bot. But as you can see, his claim was empty.
Here are the things I think TinuCherian needs to do if he intends to continue tagging talk pages again:
  • Create a description of what he intends to do and a list of articles that he intends to tag. So others can help out and check the list. (Not done, he only listed categories not articles.)
  • Manually look through the article titles in that list and remove any false positives. (Not done.)
  • Contact related WikiProjects and tell them what he plans to do. (He only contacted 1-2 WikiProjects.)
  • Wait 7 days after he announced what he intends to do. (Not done, the 7 days have not yet passed.)
  • Only do the bot run if he gets consensus for it. (Not done.)
  • When he restarts the bot, run it slowly to minimize the damage if something goes wrong. (The last two times he kept the bot running in spite of massive protests on his talk pages.)
  • Change the message that the bot leaves on tagged talk pages. That is, not use links in the heading title and not claim things that are incorrect in the message.
TinuCherian has not stated if he is going to do any of these things or not. So I can not see that he is "listening to the feedback". This is what I have seen him claim so far:
"I promise to handle and 'tame' my bot more diligently and carefully in future."
That's pretty much the same thing he said the last time, and it doesn't tell if he is going to do any of the things we ask him to do. Thus I can not unblock his bot.
--David Göthberg (talk) 15:56, 5 July 2008 (UTC)
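The "run it slowly" item in the checklist above amounts to throttling edits and giving the run a way to stop early. A minimal sketch of that idea follows, assuming a hypothetical `tag_page` callable supplied by the operator; the function name, the delay value, and the `should_stop` hook are illustrative choices, not settings from any real bot framework.

```python
import time


def run_throttled(titles, tag_page, delay_seconds=10.0, max_edits=None,
                  sleep=time.sleep, should_stop=lambda: False):
    """Tag pages one at a time, pausing between edits; return titles done.

    tag_page    -- hypothetical callable that performs one edit
    max_edits   -- optional cap, useful for a small trial run
    should_stop -- hypothetical hook, e.g. a check for new complaints
    sleep       -- injectable so the pacing can be tested without waiting
    """
    done = []
    for title in titles:
        if should_stop() or (max_edits is not None and len(done) >= max_edits):
            break  # stop first, investigate afterwards
        if done:
            sleep(delay_seconds)  # pause between consecutive edits
        tag_page(title)
        done.append(title)
    return done
```

Capping a first run with `max_edits` and reviewing the returned titles before continuing is one way to keep the damage small if the list turns out to contain false positives.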
I agree with David, the approach that TinucherianBot currently uses is fundamentally flawed. It simply cannot work on Wikipedia. Moreover, Tinucherian has not realized this yet and obviously lacks an understanding of how the categorization system on Wikipedia works. The current category-based bot should stay blocked indefinitely to prevent otherwise inevitable future chaos. Cacycle (talk) 16:50, 5 July 2008 (UTC)
Cacycle: As has already been explained to you at Wikipedia talk:Bot policy, category based tagging is the normal mode and works well in most cases. TinucherianBot problem was not category-based tagging, but recursive subcategory-based tagging. -- JLaTondre (talk) 18:20, 5 July 2008 (UTC)
David: You are trying to dictate personal preferences via a block. That is not the purpose of a block. If you want to see changes in bot tagging policy, then you should suggest those changes and get agreement. There is an ongoing discussion at Wikipedia talk:Bot policy. You should join it instead of misapplying the block button. -- JLaTondre (talk) 18:20, 5 July 2008 (UTC)
JLaTondre, As I had said earlier, I didn't run the Bot blindly and recursively on the Category:Food (upon request by the WP:FOOD member) , but created a list of categories from main list, removed the possible wrong categories from them ( with my limited knowledge on the subject matter ) ,gave the list to the project members and got it further cleaned . It was then I created the article list by manually supplying only the 'approved' categories....and finally running the bot over the talk page of the articles ...The only problem ( _NOT_ a small one ) , attention given by me or the requester to eliminate the wrong categories was not enough. I have admitted that.
David, may I point you to WP:POINT. Upon my explanation, MaxSem, another admin, unblocked the bot... Just after that you blocked the bot again, twice. Sadly, you didn't even care to see whether the bot was still running. I am sorry to say this, but from the reasons you have given, it is evident that the purpose of blocking again and again is not the stated reason but your own personal thinking. With all respect to you, I appeal to you to kindly not abuse the powers and trust we place in admins... Even when others say to keep the bot unblocked, I must say you are unwilling to listen to others or accept any consensus. With huge regret, I give up. You may do whatever you feel is right. Neither I nor others can convince you -- TinuCherian (Wanna Talk?) - 04:57, 6 July 2008 (UTC)
David, I understand your concerns. But asking for the individual articles to be listed on some page and manually checked is not the norm. If this is the case, the human might as well go ahead and tag those pages. A bot is not needed. Project tagging was always done on selected categories. There are going to be a few articles that may be false positives that will need to be fixed. If you follow all the requests at WP:BOTREQ, they are based on category tagging. Please check User:WatchlistBot, which used a similar method. I hope you could reconsider your request to manually check individual articles instead of categories. The bot problems here were due to tagging using recursive sub-categories, but this example should not be used to further complicate the process. Please let me know if I need to explain further. Thanks, Ganeshk (talk) 12:10, 6 July 2008 (UTC)
Concur with Ganeshk: we shouldn't require humans to do a bot's work. Careful examination of all categories yielded by recursive search has proven to be more than enough. And since the bot hasn't edited after my unblock, David's block was definitely not needed. MaxSem(Han shot first!) 13:57, 6 July 2008 (UTC)
Boy, what a mess. Let's clean up. Bots doing recursive runs that are approved by a group are okay. Bots doing categories are okay. That's the consensus. You've abused your powers. That's the fact. CWii(Talk|Contribs) 17:41, 6 July 2008 (UTC)
Fellows, could we please stick to the process. Per Wikipedia:Bot policy, a bot can be blocked on the following occasions (emphasis mine):
Administrators may block bot accounts that operate without approval, operate in a manner not specified on their approval request, or operate counter to the terms of their approval (for example, by editing too quickly). A block may also be issued if a bot process operates without being logged in to an account, or is logged in to an account other than its own.
It seems that none of these conditions are met here, since the bot's task is in fact approved. So, if you disagree with the approval, you may appeal against it (such as here). I trust the bot operator not to run this specific task (the bot also has others) while it is under discussion here. Somebody should perhaps formally re-open the approval request. But whether or not the task is approved / reapproved, or which restrictions will be put onto it, is a matter that the WP:BAG should decide, and not a single user on his sole discretion, even if he's got the admin buttons. --B. Wolterding (talk) 18:50, 6 July 2008 (UTC)
It's very simple: The bot caused a lot of problems, so it was blocked. I and other users asked questions to the bot owner and even suggested ways to fix the problem. But instead of answering the questions and telling how he plans to fix the problems Tinucherian spends his time repeatedly demanding to have his bot unblocked and complaining, saying things like "why do you hate me like this" and "we are also responsible wikipedians with a good history of contributions". And this has happened before with Tinucherian.
The first partial explanation by Tinucherian in what way he plans to run his bot in the future is his comment above dated (04:57, 6 July 2008). But Tinucherian, you still have not responded to all the questions/suggestions. I listed seven points in my previous comment, I am still waiting for an answer for most of them.
For instance:
  • You state in your message above: "gave the list to the project members and got it further cleaned". As I have stated repeatedly, just posting the list a couple of hours before you do the run is far from enough time. And that was a list of more than 1000 categories! You have still not responded whether you are going to give people more time to check the lists in the future, and how much time you will give them.
  • And will you contact related WikiProjects before you do such runs? (Of course, you can and probably should ask the person requesting the bot run to do such contacting, but you should check that it has been done. But will you see to that?)
As long as the bot owner hasn't responded to the questions and concerns raised and doesn't answer the questions we ask him, the bot should be blocked. Until then we can only guess about how he is going to run the bot, and this shouldn't be a guessing game. (Well, if I have to guess, then I have to go on previous experience and the vague statements I have seen from Tinucherian, and then my guess is that he will continue to make the same mistakes. Thus the block is preventive.)
Ganeshk: No, no one has asked him to check the articles, we just ask that he (or more correctly the ones requesting the bot run) checks the article titles. That doesn't involve loading any articles, that just involves reading the article titles listed in each category to see if they sound like they belong in the tagging run or not.
B. Wolterding: As you yourself so well stated at Wikipedia talk:Bot policy#Category traversing for tagging: "Regarding the Wikiproject tagging: Articles are assigned to a Wikiproject because that project is supposed to work on them. If so many articles are added that the project can't even go through the list of titles due to lack of capacity, then something is terribly flawed I think."
Oh, by the way, the Wikipedia:Bot policy says that: "bots doing non-urgent tasks may edit approximately once every ten seconds". That's 6 edits per minute. But TinucherianBot was doing 8 edits per minute. So I still very much would like to hear from Tinucherian if he will set his bot to run slower the next time?
--David Göthberg (talk) 02:37, 7 July 2008 (UTC)
If someone had raised the timing issue with him previously I'm sure he would have already addressed it. 6v8 is really splitting hairs now isn't it? --AdultSwim (talk) 02:44, 7 July 2008 (UTC)
AdultSwim: Oh, it has been raised before, see for instance the sixth point in my message above dated (15:56, 5 July 2008). But as you can see, Tinucherian has not bothered to comment on that.
--David Göthberg (talk) 03:23, 7 July 2008 (UTC)
And the bot has not edited since 08:15, 4 July 2008. Please see word 10 of my statement above dated 02:44, 7 July 2008. (Here's a hint: it's 'previously'.) --AdultSwim (talk) 03:33, 7 July 2008 (UTC)
Davidgothberg: I still think that you need to assume good faith. Assume that he has learnt from this mistake and that he will endeavour not to let it happen again. From what I can see you are assuming bad faith. This may not be your intention but this is how it comes across. Printer222 (talk) 03:41, 7 July 2008 (UTC)

I support unblocking, which feels punitive in this instance -- an exercise in public shaming. I also don't support David Göthberg's proposed changes to the bot's authorization. Here are a few reasons why:

  • Why should any single arbitrary length of time be required for reviewing categories? Should every single project have to wait a week, even if the list of categories is extremely short, just because you're upset over problems with WP:FOOD's decision to tag a thousand categories' articles at once? If it takes me ten minutes to review the list of categories, why should I have to wait another six days, twenty-three hours, and fifty minutes to get the job started? Why not just wait until some member of the project actually reviews them and says the list looks okay?
  • Why should any project have to get "permission" from other projects to tag articles? What possible legitimate reasons do you think WikiProject Pharmacology (for example) could have for objecting to WikiProject Anatomy (for example) tagging articles? Why do you think they would even care?

I've assessed some ten thousand articles, BTW. So far, I have found more articles inappropriately tagged due to individual human editors than due to bots. I would be very sorry to see someone declare that WPMED wasn't allowed to use a bot to tag whatever subcategories of Category:Diseases seemed reasonable to the members of WikiProject Medicine. WhatamIdoing (talk) 05:28, 7 July 2008 (UTC)

Printer222: No, we shouldn't have to assume anything. As the Wikipedia:Bot policy says: "Bot operators should take care in the design of communications, and ensure that they will be able to meet any inquiries resulting from the bot's operation cordially, promptly, and appropriately. This is a condition of operation of bots in general."
As long as Tinucherian refuses to respond to the questions and concerns raised by me and other users, we have to choose the more safe option of not letting his bot run.
It seems that most of you who are complaining about this block don't understand that blocking bots is not like blocking users. We block bots first, then investigate, since bots can do way more damage than users in a short time. Actually, that is one of the reasons why we have separate accounts for bots. And since Tinucherian refuses to respond it means the investigation is not yet finished.
A bot owner should not be upset that his bot gets blocked and start complaining; he should instead get curious about why it was blocked and try to resolve the problem.
--David Göthberg (talk) 05:34, 7 July 2008 (UTC)
I still very much would like to hear from Davidgothberg if he will assume good faith next time? --AdultSwim (talk) 06:59, 7 July 2008 (UTC)

Reply by Tinucherian:

David, it is true that I said on your talk page, "Why do you hate me?" I was always cordial in our conversations (which is evident from your talk page or here), but this came out of my deep demoralization at your persistent attitude. I really have great respect for you as a senior Wikipedian. If I have hurt you in any manner, I apologize. But you seem NOT to listen at all to what I'm trying to say, and keep coming up with newer and newer reasons for blocking the bot. The latest example is the timing issue, which is correct only technically. Here is a snip from the very first page of the contribs page:

13:39, 4 July 2008 (hist) (diff) Talk:Romanian wine‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:38, 4 July 2008 (hist) (diff) Talk:Portuguese wine‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.)
13:38, 4 July 2008 (hist) (diff) N Talk:Zielona Góra Wine Fest‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.)
13:38, 4 July 2008 (hist) (diff) Talk:Romeo Bragato‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:38, 4 July 2008 (hist) (diff) Talk:New Zealand wine‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.)
13:38, 4 July 2008 (hist) (diff) Talk:Two Paddocks‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:38, 4 July 2008 (hist) (diff) Talk:Moldovan wine‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:37, 4 July 2008 (hist) (diff) Talk:Strade dei vini e dei sapori‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:37, 4 July 2008 (hist) (diff) Talk:Hungarian wine‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)
13:36, 4 July 2008 (hist) (diff) Talk:Saperavi‎ (WP:FOOD Tagging ! ( False Positive ?? ) : (Plugin++) Added {{WikiProject Food and drink}}.) (top)

There are just 2 edits in the 37th minute and 6 edits in the 38th minute. Technically, yes, it may have risen to some 8 edits per minute because of varying internet speed. So, as you can see, on average the bot operates at permissible speeds only. If this also concerns you, I can adjust the speed further. I request you to kindly read Wikipedia:Gaming_the_system also, which refers to following an overly strict interpretation of the letter of policy to violate the spirit of the policy.
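For reference, the per-minute counts in a contributions snippet like the one above can be tallied mechanically. The following is only a minimal sketch in Python, assuming the ten timestamps are copied by hand into a list; the names `timestamps` and `edits_per_minute` are illustrative and not part of any bot framework.

```python
from collections import Counter

# Timestamps (HH:MM) copied by hand from the ten log entries quoted above.
timestamps = [
    "13:39", "13:38", "13:38", "13:38", "13:38",
    "13:38", "13:38", "13:37", "13:37", "13:36",
]

def edits_per_minute(stamps):
    """Return a mapping of minute -> number of edits logged in that minute."""
    return Counter(stamps)

counts = edits_per_minute(timestamps)
for minute in sorted(counts):
    print(minute, counts[minute])
```

Note that the log only has one-minute resolution, so a tally like this cannot show the actual gap between two consecutive edits; it only gives the count per clock minute, and a short snippet says nothing about the rate elsewhere in the run.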

Your comment "But as you can see, Tinucherian has not bothered to comment on that." is very rude. Please understand that we are all geographically situated in different timezones. Your post regarding the timing issue came in the dead of night in my country; you should at least 'allow' me some time to sleep. Being a non-native English speaker, I have tried to explain clearly and exhaustively to the best of my ability, and still being accused of 'being vague' in my answers is sad.

I will summarize your concerns and questions :

  • Q: Will Tinucherian control and tame his bot in future and not try anything stupid?
    • A: Yes. :)
  • Q : Contact all concerned parties or wikiprojects before running the bot ?
    • A:Yes. I have and will try my best. I will also give enough time for people to think over on 'my possible actions' ...
  • Q:Do I need to give 7 days from a bot request to bot run ?
    • A:It is crazy , but you are free to discuss this at Wikipedia talk:Bot policy and if this is made a policy , we will all comply to it ...
  • Q:....Manually check ALL the categories before running the bot?
    • A:Yes, to my level best. Well this may be not be 100 % fool proof... Many of times I also have to take in faith of the bot task requester and the Wikiproject members. As I had said earlier, A bot operator may not be a subject expert on the scope of a project. He is simply here to help other wikipedians to do tasks that are tedious to be done manually. He will have to rely on the requester's judgement...
  • Q:Manually look through the article titles in that list and remove any false positives. ?
    • A:Wow! This is very tedious task ...I would rather hope the requester do it....
  • Q:Only do the bot run if he gets consensus for it. ??
    • A:The bot operator has to believe the requester that there is a consensus for it. We as Wikipedians should assume good faith...
  • Q:When he restarts the bot, run it slowly to minimize the damage if something goes wrong. ?
    • A:I will run the bot only in the permissible levels as per BOT policy.
  • Q:Change the message that the bot leaves on tagged talk pages. That is, not use links in the heading title and not claim things that are incorrect in the message. ?
    • A: Can you elaborate a bit? I have left a message on the talk pages like this in good faith. I even took the extra step of leaving a summary on each and every talk page as to what we are doing, so that we can get feedback that will let us know how we need to alter the way we go about this to ensure it is done right. Is there a talk page guideline on using links in the headings? If it concerns you this much, I will take care of this also. But I request you not to state your personal preferences as policy.
  • Q: Will Tinucherian guarantee that he will never make mistakes?
    • A: No. Can any man guarantee this? We all take positive feedback very seriously and use it to avoid possible mistakes in future. This doesn't mean that I or even you will NOT make any mistakes at all in future. Man, I would have to be God for this :) All I (and anybody else) can promise is that we will all try to do our best to avoid any mistakes in future. One thing you should understand is that we DON'T make mistakes just for the fun of it. 99% of mistakes are unintentional. When someone points out a mistake, we take it positively and try to correct it if possible. And what if I make another mistake in future? Just go ahead and shoot me! (Just kidding :) ) What else can I say, buddy?

I am sure you will always be keeping a check on my bot and me to see if I do "everything right only" in future too. Hope that will make me very careful in my actions :)

I am not upset by the idea of the bot being blocked, but by your persistent attitude of keeping the bot blocked by bringing up one reason after another.

I have tried my level best to address and answer all your concerns and questions, I believe.. If you still have concerns , kindly let me know.-- TinuCherian (Wanna Talk?) - 05:37, 7 July 2008 (UTC)

Awesome. Can we unblock now? Changes that affect every bot that runs this task can be discussed under a new heading; issues with his bot can be taken to his talk page. BJTalk 05:42, 7 July 2008 (UTC)
David Göthberg had valid concerns, Tinucherian appears to have addressed those concerns. It seems appropriate now to unblock the bot. I do hope, though, that Food & Drink do not take the unblocking of the bot as a go ahead to continue inappropriate tagging of Beer articles. SilkTork *YES! 11:07, 7 July 2008 (UTC)
BJ: Well, this discussion has been moved around several times between several talk pages. And Tinucherian has put a big red sign on his talk page stating that the discussion should continue here.
Tinucherian: Ah, finally you answer the questions and suggestions. (Instead of just complaining about the block.) These answers now have to be discussed. It is unfortunate that the only way to get you to communicate properly is to block your bot for several days.
Note that most of the points I bring up above were brought up several times by several editors over the past few days. I just wrote them down as a convenient list in one place. So you have had several days to respond to them, but you didn't.
So, it seems your answers to several of the points are "no":
  • 7 days announcement: Sure, for smaller runs I don't think you need to announce that long. But this was about you feeding your bot a list of about 1100 categories which meant tagging somewhere around 10-100 thousand pages, then I think 7 days is a bare minimum. 7 days is an often used minimum announce time for many things here at Wikipedia since many editors only log in at specific weekdays. Thus 7 days gives most editors a chance to see the announcement. But I take it your answer "It is crazy" means "NO!" ?
  • Checking the lists: You "would rather hope the requester do it"? Hope?!?! If you take on a job you should check that the requester has done the necessary checks. That is, ask him what checks he has done. For these kinds of tasks you should not just "assume good faith". "Assume good faith" really just means we assume that editors don't do bad things on purpose, it doesn't mean we assume that editors don't do mistakes or never forget to check things. If you take on a job you are responsible.
  • Checking if there is consensus: Again, you can not just assume that the requester has achieved consensus for the request. You should verify it, at least if there is any doubt or it is a big request.
  • Running slowly: It seems you are giving contradictory answers to this. First you stated "If this also concerns you, I can adjust the speed further.", then you stated "I will run the bot only in the permissible levels as per BOT policy". Yes, your bot runs do concern me and a lot of other editors, so I am asking you to run slow. As in really slow the next time you do a big run, so if things go wrong the damage done will not be so big when you stop the bot (or we block it) next time. Of course, another option is to run the bot in short bursts; whichever is technically easier for you is okay. But it looks better in the logs if you run slow. And about the log entries you pasted: (Your bot was blocked at (08:15, 4 July 2008 UTC) by MaxSem. The entries you pasted seem to be in your local time.) How convenient for you that you copied the snip where your bot did run slower. If one checks the log pages before and after that, then one sees a much higher editing rate.
  • Making the message that the bot leaves on the talk pages better: Actually, I don't bother much about this but several editors complained about the link in the heading of that message, and you failed to comment on that until today. Many editors here at Wikipedia dislike links in the headings. No, I don't think there is a policy about that, but it is unwise to have the link there since many editors dislike it. In that message you also state that "The bot was instructed to tag these articles upon consensus from WikiProject Food and drink". That's not really true, only a very small number of editors were involved prior to the run, and preparation time was very short, and you created and started using that message after you had received lots of complaints. I don't call that a consensus.
So, I am eagerly awaiting your response!
-David Göthberg (talk) 13:51, 7 July 2008 (UTC)
To clarify, then; Are you asking this bot operator to assume that every editor making a bot request of his bot is lying when they represent that they have consensus for a request? I would think that a request including the language "We've discussed the change [[here]], and would like to move forward..." would be simple enough. Maybe that's a change to be proposed at the Requests page. I am also unclear as to the standards of The Bot Policy this bot is failing, despite having read the lengthy replies here. The operator has clearly signaled his intention to exercise more care in the future, and I would advocate an unblock on that basis. UltraExactZZ Claims ~ Evidence 14:22, 7 July 2008 (UTC)
UltraExactZZ: There is a huge difference between assuming bad faith, and just assuming there is a risk that the requester has forgotten or not known enough to do the proper checks before making the request. For instance, it is fairly common that editors think their request is simple and non-controversial and thus make a bot request without discussing it with others first. And some bot requests are made by editors who are new to Wikipedia, editors who simply lack the knowledge to do the proper checks. The bot owner, on the other hand, is supposed to be an experienced editor who should be able to help the requester get the job done in the right way. The bot owner is not a bot himself; he should not just blindly perform requests.
And here is one of the parts of the Wikipedia:Bot policy that Tinucherian has been failing: "Bot operators should take care in the design of communications, and ensure that they will be able to meet any inquiries resulting from the bot's operation cordially, promptly, and appropriately. This is a condition of operation of bots in general." Up until some hours ago Tinucherian was not responding to most of the questions and concerns raised by me and other users. And as you can see above I have asked some follow up questions on his late answers. So this investigation is not finished, and thus I can't unblock the bot just yet.
--David Göthberg (talk) 15:04, 7 July 2008 (UTC)
(ec) To go backwards: I viewed that provision of the Bot Policy as referring specifically to concerns about the edits made by the bot, and I note that - per Tinucherian's talk page - he seems to have responded within just a couple of hours about why the bot tagged what it tagged. The important bit, in my mind, is that he appears to have been engaged in this discussion, and he did not dismiss concerns, which is what that provision is, I believe, intended to prevent. He responded to criticism and has discussed the bot's operation and his actions prior to the problem run. If he did not answer a particular question promptly, he was still answering questions, and - given the volume of discussion on this issue - I can see missing a question in the shuffle. It was also the weekend.
As for good and bad faith, I concur that verifying that there is consensus (or at least no objection) is reasonable. However, I don't think he had any reason to believe that this request was anything other than non-controversial. I would advocate for a requirement to the Bot Requests process that documents prior discussion, and I would even go so far as to require 48 hours of vetting on the category/article lists. But those requirements are not yet in place, and - given that Tinucherian has agreed to take more care with future runs of this type - I don't think an unblock is unreasonable. UltraExactZZ Claims ~ Evidence 15:37, 7 July 2008 (UTC)

UltraExactZZ, as you can see from the very beginning of this discussion and from my talk page, I have been trying to respond to all questions and concerns. We have even set up a centralized section for the bot tagging errors at Wikipedia_talk:WikiProject_Food_and_drink#Bot_tagging_errors. The WP:FOOD members and I are actively working on this to clean up all issues. David is dragging in one rule or another to support his judgement and trying to dictate terms based on his own thinking. This is in spite of unblock requests by almost everyone, like this, 2, 3, 4, 5, 6, 7, 8, 9, 10. I have exhaustively tried to explain each and every question. I have almost given up by now. Wikipedia is not my full-time business either. I am doing all this in between my very busy professional career, but helping and working with Wikipedia with my best efforts. There is no way either I or anyone else can convince David. -- TinuCherian (Wanna Talk?) - 15:22, 7 July 2008 (UTC)

Per my comment above, I am highly satisfied with your responses here and elsewhere, and again note that I strongly favor an unblock. My concern, that you'll take care to avoid these issues in future runs, is already addressed - see also Wikipedia_talk:WikiProject_Food_and_drink#Bot_tagging_errors, which will give guidance for avoiding this issue in the future. UltraExactZZ Claims ~ Evidence 15:37, 7 July 2008 (UTC)
Tinucherian: You constantly surprise me. I was expecting (assuming and hoping) you would answer my follow up questions above and then we would be done with this whole thing. So, instead of complaining I suggest you answer my follow up questions above (dated 13:51, 7 July 2008 UTC).
But to make it really easy for you, here are the two most important ones:
1: In my responses above, did I understand your answers correctly?
2: Do you, or do you not agree to run your bot really slowly next time you do WikiProject tagging? As in say 2 edits per minute.
-David Göthberg (talk) 16:53, 7 July 2008 (UTC)
I would submit that, if the list of articles/categories is properly vetted before running the bot, and if the Wikiproject in question has discussed the matter and consensus exists that the tagging as proposed is a good thing™, then I don't think an edit rate lower than that approved by the BAG would have an impact. Most of the issues here appear to be the errors in the list of articles to be tagged, not necessarily the rate at which they were tagged. Garbage in, Garbage out, so to speak. UltraExactZZ Claims ~ Evidence 17:15, 7 July 2008 (UTC)
This is beyond a joke now. You seem to be power hungry and enjoy the fact that you have control over this situation. The user has said that he will tame and control his bot in the future and will ensure the lists are checked adequately. It's up to him how he does this, and if he doesn't, further action will be taken. It's not up to you to dictate the situation and decide exactly what he has to do. Consensus is that the user will ensure that this sort of thing does not happen again, so the bot should be unblocked. He's agreed to many of your requests, but that's not good enough for you. You just need to drag it out further. Stop being power hungry and do what is right. Printer222 (talk) 00:05, 8 July 2008 (UTC)
David, earlier you mention 100's of thousands of articles in some runs, then you mention 2 edits per minute. Do you realize that it would take the bot a little longer than 34 days running at 2 edits per minute, 24 hours a day to do 100,000 articles? That is a long time for a single run. The BAG generally go with maxlag values, which can mean the bot can go fast, so long as they aren't causing excessive load on the servers. 2 edits per minute is ridiculously slow. -- Cobi(t|c|b) 00:44, 8 July 2008 (UTC)
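The arithmetic behind that figure can be checked with a short calculation. This is just an illustrative sketch in Python; the function name `run_duration_days` is made up for the example, and the 6 edits per minute comparison is taken from the "once every ten seconds" guideline quoted earlier in this thread.

```python
def run_duration_days(pages, edits_per_minute):
    """Days needed to edit `pages` pages at a constant rate, running 24 hours a day."""
    minutes = pages / edits_per_minute
    return minutes / (60 * 24)

# 100,000 pages at 2 edits per minute, as in the comment above.
print(round(run_duration_days(100_000, 2), 1))  # 34.7

# The same run at one edit every ten seconds (6 edits per minute).
print(round(run_duration_days(100_000, 6), 1))  # 11.6
```

The calculation assumes the bot runs continuously with no pauses for server lag, so real runs would take somewhat longer.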
Link to ANI Thread: Wikipedia:Administrators'_noticeboard/Incidents#TinucherianBot_and_Issues_with_WP:FOOD_Tagging
MaxSem, thank you for unblocking the bot. I concur with the valid and genuine concern and suggestion that the bot operator and the task requester should carefully and diligently review all the categories before the bot run, as well as possible and with their best efforts. Simply running the bot recursively over a category and all its sub-categories is fundamentally very, very dangerous. The bot task requester or the bot operator, if he wants the bot to run over a category and its subcategories at all, should first collect the list of the sub-categories either manually or using tools like AWB, then eliminate any possibly wrong categories and prepare a final 'cleaned' list before the bot run. The bot run should be only on these 'selected' categories. Project tagging based on categories is how we have always done it, and it is one of the best methods to identify articles in the scope of a particular project. Having said that, wiki categories are very useful but NOT perfect. Hence 'false positive' tagging may occur at any time. Having assessed hundreds of articles myself for a lot of the WikiProjects I work for, I have also seen cases of irrelevant manual tagging as much as bot tagging!
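The collect, clean, then run workflow described above could be sketched roughly as follows. This is only an illustration in Python, not the way AWB or any particular bot does it: the category tree is a small hypothetical dictionary rather than data fetched from the wiki, and the names `subcats`, `collect_categories` and `approved_list` are invented for the example.

```python
from collections import deque

# Hypothetical category tree: parent category -> direct subcategories.
subcats = {
    "Food": ["Wine", "Beverages"],
    "Wine": ["Wine festivals"],
    "Beverages": ["Beer"],
    "Wine festivals": [],
    "Beer": [],
}

def collect_categories(root, tree):
    """Breadth-first walk that lists the root and every subcategory exactly once."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        cat = queue.popleft()
        order.append(cat)
        for child in tree.get(cat, []):
            if child not in seen:  # guards against category loops
                seen.add(child)
                queue.append(child)
    return order

def approved_list(candidates, rejected):
    """Keep only the categories that reviewers did not reject."""
    return [c for c in candidates if c not in rejected]

# Step 1: collect every candidate category for human review.
candidates = collect_categories("Food", subcats)

# Step 2: reviewers strike out categories that are out of scope (hypothetical choices).
rejected = {"Beer", "Wine festivals"}

# Step 3: the bot is given only the reviewed list, with no further recursion.
final = approved_list(candidates, rejected)
print(final)
```

The point of the design is that recursion happens only while building the review list; the tagging run itself works from the flat, approved list. In this sketch, rejecting a category does not automatically reject its subcategories, so each one still has to be reviewed on its own.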

There are issues and concerns from some WikiProject members about whether they 'should allow' another project to tag articles within their project's scope, such as whether WP:FOOD and Drink can tag WP:BEER articles. Personally I feel such an ugly situation of ownership is against our fundamental principle of a free encyclopedia that anyone can edit. If that is the case, the day will soon come when a WikiProject will not even allow anyone other than the project members to edit the article itself. Well, this is something we need to have a very serious discussion about, but it is beyond the scope of ANI. Being also a member of WikiProject Council, I will start a discussion on this topic there to arrive at a community consensus.


Now David, answering your questions: on a 7-day announcement for any bot run, or 2 edits/min, does this apply only to TinucherianBot or to any other bot as well? I do appreciate your suggestions and concerns if they are logical, but I request you NOT to arbitrarily dictate your own terms and judgement to me or anyone else. Stop admining with a cane on other Wikipedians. Your role is to moderate Wikipedia, not to rule it. Your current behaviour is highly unbecoming of any experienced admin. I would request that a BAG member comment on your 'new rule'. The question is not whether I or any other bot operator should follow this. The question is whether we have either a consensus or a policy in place on this. I welcome you to discuss this at Wikipedia talk:Bot policy, and if there is an agreement, every bot operator will be happy to comply with the new rule. -- TinuCherian (Wanna Talk?) - 07:17, 8 July 2008 (UTC)

To follow up , discuss and resolve some of the important issues on multiple project tagging ,an important discussion on " Should WikiProjects get prior approval of other WikiProjects (Descendant or Related or any ) to tag articles that overlaps their scope ? " is open here . We welcome you to participate and give your valuable opinions. -- TinuCherian (Wanna Talk?) - 14:15, 8 July 2008 (UTC)
Printer222: I and other editors were asking Tinucherian questions and suggesting ways for Tinucherian to handle the problems his bot caused. But instead of answering the questions he spent his time complaining about the block. And he made a big show of it putting his complaints on all kinds of pages, involving lots of people in it. Only after his bot had been blocked for several days did he answer the questions. The only thing I "dictated" was that he answers the questions.
Cobi: When I wrote the suggestion of running slowly I was taking into account that Tinucherian has done several badly prepared bot runs. Perhaps I wasn't clear enough about what I meant? I meant that it would be nice if he runs slowly for the first days to give him and other people a chance to find out if his bot run goes wrong or not. So he can stop and fix the errors without causing as much damage as his previous bot runs did. And the bot run we were discussing here, according to himself, involved 1500 categories and about 24,000 pages. That would take 8.3 days at 2 edits per minute. But I think it would be okay to increase the rate after the first two days or so if the bot run seems not to cause any problems. Note that this was just a suggestion. I was asking him if he could agree to do it like that or not. But it seems to me from his latest answers that he still prefers to only give people some hours to check his lists, and then to run at maximum speed.
Tinucherian: I am sorry that you don't seem to understand the concerns that I and other editors have with your rushed bot runs. You don't seem to understand that Wikipedia has no deadline. But I see that another admin has unblocked your bot, so things are out of my hands now. I can only hope that you will not make as much of a mess with your next bot run. But I am sorry to say that from your answers it seems you haven't learnt a thing.
--David Göthberg (talk) 05:12, 11 July 2008 (UTC)
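For reference, the 8.3-day figure quoted above follows directly from the page count and the edit rate. A minimal sketch of that arithmetic (the function name is illustrative and not taken from any bot's code):

```python
def run_duration_days(pages, edits_per_minute):
    """Days needed to edit `pages` pages at a constant edit rate."""
    minutes = pages / edits_per_minute
    return minutes / (60 * 24)

# About 24,000 pages at 2 edits per minute, as stated in the discussion.
days = run_duration_days(24000, 2)
print(round(days, 1))
```

This assumes one edit per page and a bot that runs around the clock without pauses.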
David, with all respect to you, I request you to stop accusing me or anyone else of bad faith. It is evident from my talk page, WT:BRFA, WT:FOOD, or here that I have been trying to explain in the best way I can from the very beginning, which is understood by everyone except you. You seem to be still holding on to your WP:OWN of your cryptographic articles, insisting that Category:Cryptographic protocols should not be tagged by WP:COMP. If you see the discussion here, you will understand that the consensus of the Wikipedia community is not the way you think. I wasn't making a show of anything. I was initially replying at my talk page; Jdrewitt started a thread at WT:BRFA and I moved my discussion there to keep everything in one place. It was Happy-melon who moved it to WP:BON. I see that you became an admin very recently, and I request you to kindly handle situations more maturely and with gentler care, rather than 'trying to teach others' or dictating to others on your terms. It was only when you were still reluctant to hear what everybody says that I reported at WP:ANI. I request you to kindly stop any more personal attacks and concentrate more on how to make Wikipedia a better place. Now that you have backlisted me :) , it will make me take every action with more caution and care in future ;)-- TinuCherian (Chat?) - 05:50, 11 July 2008 (UTC)
Well, David, let us stop these personal attacks and the counter-arguments. They waste valuable time for both of us. If you still have any concerns, you may raise them at my talk page. If they are genuine and constructive, I have no objections. If you still want lots of wikidrama and to dictate your personal judgments, I may choose to ignore them. Sir, we have better things to do. I am archiving this discussion. -- TinuCherian (Chat?) - 16:51, 11 July 2008 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made on the appropriate discussion page, such as the current discussion page. No further edits should be made to this discussion.
Tinucherian: Oh no, you don't write that and then close the discussion. I reverted your closing of this discussion.
And here is my response:
You twist the truth so well. You should seriously consider a career in politics. For instance you state: "which is understood by everyone except you". Well, as you very well know, many editors and several admins disagree with you and agree with me. For instance, the admin Cacycle stated above: "bot should stay blocked indefinitely to prevent otherwise inevitable future chaos." And Jdrewitt, whom you mention, has also protested about your bot in several sections above. And over at Wikipedia talk:WikiProject Food and drink a lot of people have complained about your bot.
I've tried to avoid commenting about this, but since you keep mentioning "bad faith" and "personal attacks":
Actually, you are the one that has constantly been assuming bad faith. You have from the very beginning been accusing me of holding a grudge against you, hating you, and making personal attacks against you. It is sad that you don't see that all I wanted was to discuss things with you, ask some questions and give you some advice and ideas on how to avoid the mess you are causing. But instead of answering questions and engaging in discussion you went on full attack.
And regarding me "backlisting" you: Now you try to hold it against me that I have saved some links to these pages so I can follow these discussions? That's just silly.
And finally, as I stated above. These things are out of my hands now. That is, your bot is none of my concern anymore.
--David Göthberg (talk) 07:25, 12 July 2008 (UTC)

Notice: I'm one of the many people who read this page in order to get help with bots, keep abreast of changes to policy, learn about who's running for BAG, etc. We're here for the bots, not the drama. We don't care who done who wrong; this just isn't the place for it. That's why God made talk pages and RFCs. I've been trying to ignore this drama, but it currently takes up four times as much space as all other discussion on this page put together (of stuff added in the last 2 months). May I please close this discussion, and humbly suggest proper channels of dispute resolution? – Quadell (talk) 13:15, 12 July 2008 (UTC)

Or, if it must continue, do the whole "subpage the huge discussion" that is all the rage at WP:AN/I these days. –xenocidic (talk) 14:29, 12 July 2008 (UTC)

Need help with session cookies

Hi everyone, I am writing my first bot for the Hindi Wikipedia, and have run into a roadblock related to session data. Details of the problem are posted on the talk page of the bot there: hi:सदस्य वार्ता:Bolbalabot. Please have a look; any help or suggestions will be appreciated. (I will be monitoring this page for responses as well.) Thanks! -- Longhairandabeard (talk) 23:33, 17 July 2008 (UTC)

WP 1.0 Assessment Categories creation with bot

Wikipedia:Bots/Requests for approval/TinucherianBot 4 and Wikipedia:Administrators'_noticeboard#Wikipedia:Bots.2FRequests_for_approval.2FTinucherianBot_4 . FYI.. I guess this will make our work easier in WP 1.0 Assessment Categories creation with the bot. -- Tinu Cherian - 05:40, 19 July 2008 (UTC)

This bot has made many mistakes, and the owner hasn't taken many steps to fix them. Would I be justified in simply reverting every single edit it's made? –xeno (talk) 22:21, 23 July 2008 (UTC)

Well, the owner is unresponsive, so I blocked the bot until the mistakes are fixed. As for reverting everything, you might as well do that, it'll take some time to manually sort the good edits from the bad. Maxim(talk) 22:45, 23 July 2008 (UTC)
Right, my thought was, it'll take me longer to sort it than it would take the bot to re-do the good edits (considering he can fix the issue), so... Doing....  Done. –xeno (talk) 22:47, 23 July 2008 (UTC)

BAG membership nomination

Per the bot policy, I am making this post to inform the community of a request for BAG membership. Please feel free to ask any questions/comment there. SQLQuery me! 03:15, 28 July 2008 (UTC)

Bot owner's experience/expertise required

Salutations. I've written an essay about counter-vandalism at Wikipedia:Vandalism does not matter, primarily based on my experiences as a recent changes patroller. However, I've never run or written a counter-vandalism tool, so I am worried I may have misrepresented them, or could represent them better. Any corrections or insights you might have regarding the bot-vandalism relationship are very welcome on the essay talkpage. Thanks for your attention, Skomorokh 14:00, 28 July 2008 (UTC)

Good essay, fair representation. By the WP definition of vandalism ('insults, nonsense or crude humour, or by page blanking'), I would agree that bots have reduced the issue to a minor one. However, my personal definition of vandalism includes the addition of any data that is obviously not true. Changing the infobox on G Bush to say he was born in Toronto, Canada is just as much a detriment to the project as putting "BUSH WAS BORN IN TRONOTO CANANDA FUCH!!!!" at the top of the page. My issue is not that your essay is wrong but that the definition of vandalism is too limiting. My definitions include infobox manipulation (changing large amounts of data in an infobox, usually geographic data and numbers, while adding no source or summary), numbers inflation (large number changes, usually on discography articles of album sales with no source or change in source), and source creep (inserting a usually dubious statement between a sentence and its source). My experience is that current bots (by design) often do not catch these types of edits. --AdultSwim (talk) 18:31, 28 July 2008 (UTC)
Thanks for the input AdultSwim. I consciously did not define vandalism other than giving some examples, so as to make the points the essay was making short, simple and easy to grasp (instead of giving an exhaustive taxonomy of the myriad ways vandalism manifests itself). I figure the ambiguity will direct curious readers to follow the link to WP:VAND in the essay's opening. I also included a footnote regarding the evolution of vandalism to indicate there was more to it than blatant insertion of text and blanking articles. Regards, Skomorokh 18:54, 28 July 2008 (UTC)

Http 403 error

I'm having bot trouble (or really http trouble), and I was hoping someone here could help. When I load a PubMedCentral article in my browser (e.g. this), it displays just fine. But when I pull the html content using a bot, I get a 403 error: "NCBI/ipmc1 - The requested page has restricted access [Error 403]". Oddly, when I use this toolserver tool to extract header information, it shows a response of "HTTP/1.1 200 OK". I'm using LWP for perl, as follows:

use strict;
use warnings;
use LWP::UserAgent;

# Fetch the article page with a custom User-Agent and an in-memory cookie jar.
my $ua = LWP::UserAgent->new;
$ua->agent("Bots/Polbot/test");
$ua->cookie_jar({});
my $res = $ua->get('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2440634&rendertype=abstract');
die "error: " . $res->status_line unless $res->is_success;

Any ideas? – Quadell (talk) 16:41, 29 July 2008 (UTC)

They appear to block based on the user-agent string. I just used the User-agent switcher for Firefox, changed my user agent, visited the site, and I get:
You are trying to access a restricted page. If you believe that you have permission to view the page, please send an email to PMC and include the following information.

      URL: http://pmc.lb.ncbi.nlm.nih.gov/articlerender.fcgi?tool=pmcentrez;artid=2440634;rendertype=abstract
      Client: 125.238.96.5
      User Agent: Bot
      Server: ipmc2
      Time: Tue Jul 29 17:56:05 2008 EDT 

Rev. 12/21/2007

~ AmeIiorate U T C @ 21:59, 29 July 2008 (UTC)

Well, now that is surprisingly silly. But it works! Thanks much, – Quadell (talk) 22:07, 29 July 2008 (UTC)
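For bot authors not using Perl, the same fix (sending a different User-Agent string) can be sketched with Python's standard library. This is a minimal sketch, not Polbot's code: the helper name and the "ExampleClient/0.1" string are placeholders, and which strings the server actually accepts is not known beyond what the test above shows for "Bot".

```python
import urllib.request

def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

url = ("http://www.pubmedcentral.nih.gov/articlerender.fcgi"
       "?tool=pmcentrez&artid=2440634&rendertype=abstract")

# "ExampleClient/0.1" is a hypothetical value chosen for illustration.
req = build_request(url, "ExampleClient/0.1")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# To actually fetch the page (network access required):
# html = urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to check which headers will go out before touching the network.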

AdultSwim / Lemmey / LemmeyBot

AdultSwim, recently revealed and blocked as a sockpuppet of Lemmey / Mitrebox has asked for a review of his block: User talk:AdultSwim#Blocked. He requested a note be posted to this noticeboard because the reference bot that he was running has been previously discussed here. Please see the AN thread: Wikipedia:AN#AdultSwim is asking for a review of his block. –xeno (talk) 02:30, 1 August 2008 (UTC)

The bot request was denied and the bot account was blocked and deflagged. BJTalk 03:03, 1 August 2008 (UTC)
Which bot request? This one was approved, no? Wikipedia:Bots/Requests for approval/LemmeyBOT –xeno (talk) 12:21, 1 August 2008 (UTC)
Hadn't seen that one, Wikipedia:Bots/Requests_for_approval/LemmeyBOT_2c. BJTalk 13:36, 1 August 2008 (UTC)
k, thanks. seems to be more a procedural denial, due to being blocked for the socking though, yes? –xeno (talk) 13:38, 1 August 2008 (UTC)
Yeah, other than that it looked like a sound bot. BJTalk 13:47, 1 August 2008 (UTC)

Encoding question

Greetings, bot-ops. I'm hoping for another little bit of help. When I read in a wikilink, it might contain % and hexchars in order to encode characters not allowed in URLs. For example, http://en.wikipedia.org/wiki/%3F is the link to the question-mark article, since 3F is the ASCII code (in hex) for "?". I decode these with the perl regexp

s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg

So far so good. But for non-standard-ASCII characters, I get unexpected behavior. For instance, http://es.wikipedia.org/wiki/Expedici%C3%B3n_Malaspina points to es:Expedición Malaspina. "%C3%B3" is supposed to encode "ó", but when I use my handy regexp I get "Ã³" instead of "ó". I know Wikipedia uses UTF-8, and perl uses UTF-8 natively, but I'm not really good with character encoding issues, and this is causing me no end of grief. Does anyone know how I can get my bot to interpret "%C3%B3" as "ó"? – Quadell (talk) 23:27, 31 July 2008 (UTC)

You should be able to use the function decode("utf8", $url) from the "Encode" module to convert your "Ã³" into "ó". --Carnildo (talk) 00:25, 1 August 2008 (UTC)
I don't know exactly how Perl handles Unicode, but that sounds like unnecessary decoding to me -- it could lead to errors, or it could give you something you have to re-encode before you put it on Wikipedia. It sounds like Quadell already has the bytes 0xc3 and 0xb3, which would be "Ã³" in Latin-1 and "ó" in UTF-8. My guess, Quadell, is that you already have exactly the right bytes, but you're trying to print them on a Latin-1 terminal.[1] (The alternative -- that Quadell ended up with the four UTF-8 bytes that spell Ã³, like the ones we're typing here -- seems pretty implausible to me, and it would mean that I deeply don't know how Perl handles Unicode.) rspeer / ɹəədsɹ 02:32, 1 August 2008 (UTC)
  1. ^ In which case, tell your terminal to stop, collaborate, and listen, because it's not the '90s anymore.
That's exactly what the decode() function does: it takes the consecutive bytes 0xC3 and 0xB3, and converts them into a single Perl Unicode character. Perl does not store Unicode text as UTF-8 byte sequences, but as sequences of Unicode code points. --Carnildo (talk) 05:58, 1 August 2008 (UTC)

Yeah, it wasn't a monitor issue; "Ã³" was actually being written to Wikipedia (e.g.). And I couldn't simply decode("utf8", $article_text), since that would mess up the characters in other parts of the text (e.g.). I ended up having to use this instead:

$article_text =~ s/%([0-9A-Fa-f]{2})%([0-9A-Fa-f]{2})/decode("utf8", chr(hex($1)) . chr(hex($2)))/eg;
$article_text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

And this worked. Thanks for the tips! – Quadell (talk) 13:56, 1 August 2008 (UTC)

That is going to break on characters that require more than two bytes in UTF-8. Off the top of my head, I'd suggest (untested!):
$article_text =~ s/%([0-9A-Fa-f]{2}(?:%[0-9A-Fa-f]{2})*)/decode "utf8", join "", map chr(hex $_), split "%", $1/eg;
but it might be simpler to just encode the article content to a UTF-8 byte string, do the URL-decoding and then convert it back into a Unicode character string:
utf8::encode($article_text);
$article_text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/eg;
utf8::decode($article_text) or warn "Broken URL-escaped UTF-8 in article text";
(utf8::encode—Ilmari Karonen (talk) 12:34, 11 August 2008 (UTC)
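The same idea (URL-decode to bytes first, then decode the whole byte sequence as UTF-8) can be sketched in Python with the standard library. This is an illustration of the technique discussed above, not a translation of any bot's code; the variable names are made up for the example.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "Expedici%C3%B3n_Malaspina"

# Step 1: percent-escapes become raw bytes (0xC3 0xB3 for the "ó").
raw = unquote_to_bytes(escaped)

# Step 2: interpreting those bytes as UTF-8 gives the intended title.
title = raw.decode("utf-8")

# Interpreting the same bytes one-per-character (Latin-1) reproduces
# the mojibake described in the thread.
mojibake = raw.decode("latin-1")

print(title)
print(mojibake)

# unquote() does both steps at once and handles sequences of any length,
# for example the three-byte euro sign.
euro = unquote("%E2%82%AC")
```

Decoding byte-by-byte with chr() corresponds to the Latin-1 result; decoding the full byte run as UTF-8 is what avoids the two-byte limitation noted above.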

quick question re: templates DATE and DATE2

per discussions here and here, I'm willing to make the effort to change the DATE template to the proper ISO 8601 compliant format and redirect DATE2, but I can't seem to figure out which (if any) bots go about adding missing date parameters to the {{fact}} or other templates. I don't want to make any changes until I'm sure it's not going to give a bot a headache. can anyone clue me in? --Ludwigs2 03:42, 7 August 2008 (UTC)

User:SmackBot is the only one that I know of. BJTalk 05:30, 7 August 2008 (UTC)
(edit conflict) I know User:SmackBot does it and according to its BRFA User:Addbot does this also. ~ AmeIiorate U T C @ 05:32, 7 August 2008 (UTC)
ok, thanks - that's a good start.  :-) --Ludwigs2 23:17, 11 August 2008 (UTC)

ImportError: No module named sax

Hello, I stopped my bot for an hour and when I wanted to restart, I get the same error every time:

C:\Bot>login.py -all
Traceback (most recent call last):
  File "C:\Bot\login.py", line 49, in <module>
    import wikipedia, config
  File "C:\Bot\wikipedia.py", line 123, in <module>
    import xml.sax, xml.sax.handler
ImportError: No module named sax

Do you know the problem? I reinstalled Python and Pywikipedia, but it changes nothing. Best regards, --WikiDreamer (talk) 16:07, 10 August 2008 (UTC)

Help! (pywiki or just python generally!)

Resolved

It does rather seem that this noticeboard is turning into a help forum, but I'm not sure where else to go to ask this python/pywiki question. I've uploaded the code from my latest script: a helper script for TfD closures. However I'm getting the weirdest error. I've coded a direct call to an internal function as you can see; the expected result is for it to go to the 'd' option on my quasi-#switch statement in "handleTfd()", run "closeTfd()", then print 'foo', run "deleteTemplates()" which involves printing 'humbug', then return and print 'bar' on its way out. What actually happens is that "closeTfd()" runs as expected, then I get "foo bar" and a return: as best I can determine, the function "deleteTemplates()" is skipped entirely, and I don't know why. Any bright ideas? I admit my knowledge of python is entirely 'on-the-job' training, so perhaps there's some nasty specification rule that's biting me in my ignorance... Any help would be much appreciated. Happymelon 21:19, 10 August 2008 (UTC)

Maybe that's because Python conventions suggest the use of 'spam' and 'eggs' instead of 'foo' and 'bar'? :] Seriously though, are you saying that the code prints 'foo' and 'bar' but not even a 'humbug' in between? That's hard to believe. Strange logic errors could happen though if one had spaces and tabs intermixed in code, but I don't see the evidence of it in your example - you could double-check it on your side though (just guessing now). Миша13 22:01, 10 August 2008 (UTC)
The deleteTemplates() function contains a yield, which makes it a generator function. That's why the function does not run. I guess you (Happy-melon) don't know about generator functions, in which case you don't want to use the yield statement. -- Jitse Niesen (talk) 22:23, 10 August 2008 (UTC)
I'm afraid that came across more conceited than I had intended. Sorry. Nevertheless, the yield is probably the cause of your problems. You can read a bit about it in the Python tutorial. I don't know much more than what's written there. -- Jitse Niesen (talk) 11:59, 11 August 2008 (UTC)
Thank you, Jitse!! That was indeed precisely my problem. My familiarity with generators is indeed limited to wishing getReferences() wasn't one! I've never found a use for them - maybe I'm missing something, or just lucky to be working on a system with plenty of memory. Removing the yield statement did indeed fix the problem. Thanks again! Happymelon 13:40, 12 August 2008 (UTC)
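The behaviour described in this thread can be reproduced in a few lines. The sketch below is not Happy-melon's script; the function names are hypothetical stand-ins for deleteTemplates(), and a list is used instead of print so the order of events is easy to inspect.

```python
calls = []

def delete_templates_gen():
    # The presence of `yield` makes this a generator function:
    # calling it only creates a generator object.
    calls.append("humbug")
    yield "deleted"

def delete_templates_plain():
    # Without `yield`, the body runs as soon as the function is called.
    calls.append("humbug")
    return "deleted"

gen = delete_templates_gen()          # body has NOT run yet
calls_after_gen_call = list(calls)

results = list(gen)                   # iterating finally runs the body
calls_after_iteration = list(calls)

plain_result = delete_templates_plain()

print(calls_after_gen_call, calls_after_iteration, plain_result)
```

So a bare call to a generator function looks as if the function were skipped; either iterate over the result or remove the yield, as was done here.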

interwiki warnfiles

The interwiki.py bot generates warnfiles. Is there a place to publish my warnfiles, and download other people's warnfiles, so I could run them from my bot? TaBaZzz (talk) 00:29, 11 August 2008 (UTC)

Try pastebin. Soxπed93(blag) 00:39, 15 August 2008 (UTC)

Query.php to die...

As said here and here, query.php will die in a month. Be sure that none of your bots use query.php! (That means you, Cobi) Soxπed93(blag) 02:26, 1 August 2008 (UTC)

For those of you who are using Cobi's classes, I've written up the api.php versions of wikipediaquery::getpage() and wikipediaquery::getpageid(). Just add these functions to the wikipediaapi class:
    // Return the wikitext of the latest revision of $page.
    function getpage ($page) {
        $x = $this->http->get($this->apiurl.'?action=query&prop=revisions&titles='.urlencode($page).'&rvprop=content&format=php');
        $x = unserialize($x);
        // Only one title is requested, so return from the first (only) entry.
        foreach ($x['query']['pages'] as $p) {
            return $p['revisions'][0]['*'];
        }
    }
    // Return the numeric page ID of $page.
    function getpageid ($page) {
        $x = $this->http->get($this->apiurl.'?action=query&prop=revisions&titles='.urlencode($page).'&rvprop=content&format=php');
        $x = unserialize($x);
        foreach ($x['query']['pages'] as $p) {
            return $p['pageid'];
        }
    }

of course you will have to change all the $wpq->getpage($page); into $wpapi->getpage($page); but a simple find and replace should fix that --Chris 12:53, 11 August 2008 (UTC)

Cobi has updated his classes to use api.php, so update your local copies and you'll be fine --Chris 06:25, 16 August 2008 (UTC)
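For bots written in Python, the same api.php query can be sketched as follows. This is a minimal sketch under stated assumptions: the endpoint URL, the helper names, and the sample response are illustrative (the sample is invented data, not a real API reply), and format=json is used in place of format=php. The response shape mirrors the one the PHP functions above rely on (query, pages, revisions, '*').

```python
import json
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # assumed endpoint

def build_query_url(title, api_url=API_URL):
    """Build an api.php URL asking for the latest revision content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)

def extract_page(response_text):
    """Return (pageid, wikitext) for the first page in the response."""
    data = json.loads(response_text)
    for page in data["query"]["pages"].values():
        return page["pageid"], page["revisions"][0]["*"]
    return None

# Invented sample response, for illustration only.
sample = ('{"query": {"pages": {"12345": {"pageid": 12345, '
          '"title": "Example", "revisions": [{"*": "Sample wikitext"}]}}}}')

print(build_query_url("Main Page"))
print(extract_page(sample))
```

Keeping URL construction and response parsing separate from the HTTP call makes both easy to test without network access.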

Polbot problems

As noted on Quadell's user page, Polbot is creating a series of errors (turning accents and foreign scripts into alphabet soup, etc., and even apparently adding a commercial link). See [49] and [50]. Can we get it switched off? Quadell is not responding. Thanks. --Kleinzach 10:13, 11 August 2008 (UTC)

The bot hasn't edited since that diff. Block unneeded. BJTalk 10:20, 11 August 2008 (UTC)
"I'll be looking into the errors you both mention. Until then, I have stopped running the bot. – Quadell (talk) 10:20, 11 August 2008 (UTC)" Heh. BJTalk 10:24, 11 August 2008 (UTC)

This is Quadell, Polbot's operator. I'm glad this is here, because I'd like to get a bunch of eyes on this. Polbot's #8 function, for improving the format of references and external links, was approved at Wikipedia:Bots/Requests for approval/Polbot 8. Some of the details may not have been completely clear in that request, so it would be good to have community input while we hash out what this bot should and should not do. I obviously don't want it to perform tasks where there's no community consensus. I'd be grateful for any comments or opinions anyone may have. So far, the following issues have been brought to my attention.

  1. User:Rockfang asked about whether the bot should be replacing <references /> with {{reflist}}. His question, and my answer, are here. From what I can tell, there are no problems with converting <div class="references-small"><references /></div> to {{reflist}}, as in this edit, since the formatting will be the same. But converting an undivved <references /> changes the formatting to use a smaller font, as in this edit. In the vast majority of cases (I believe), this improves the article in the eyes of everyone involved. (Most people who added an undivved <references /> were not making an explicit formatting choice, but simply going with the default.) If an editor thinks the larger font for references was preferable, he can of course revert the change or switch back. Is this acceptable? Or should the bot never convert an undivved <references /> to {{reflist}}?
  2. User:Algabal told me here that he believes that Polbot should not turn bare (numbered) external links in the bibliography section into references. An example of this sort of a change is here. I disagree, as I think Polbot's changes bring the article into closer conformity with our guidelines. Wikipedia:Manual of Style (links) says that "you should add a descriptive title when an external link is offered in the References, Further reading, or External links section", and "When placed in the References and External links sections, these links should be expanded with link text, and preferably a full citation". And Wikipedia:Embedded citations states "However, because of the difficulties in associating them with their appropriate full references, the use of embedded links for inline citations is not particularly recommended as a method of best practice." Algabal is pretty insistent that Polbot stop converting bare links into references if those links are in a Bibliography section. Does anyone else think Polbot should exclude this section from her activity?
  3. User:Kleinzach brought two issues to my attention. The first is that when an external link expires and the domain is taken by a different company, Polbot's title-generating function reads whatever title is current. Here is the example. This isn't a case of "adding a commercial link" -- it's a case of revealing that a link which used to be informative is now used for a different purpose. I could add "Web Hosting" to the bot's blacklist of titles -- but then the bot couldn't add descriptive titles for links that are legitimately for web hosting, such as at the Go Daddy article.
  4. The other issue Kleinzach brings up is a character-encoding problem, such as this. This is obviously a bug, and I need to fix it.

I'm open to changing any of the bot's functions, so long as there is consensus to do so. If anyone has any input on these issues, I'd be much obliged. – Quadell (talk) 14:49, 11 August 2008 (UTC)

Again, the links your bot is converting are not citations and they are not references. They are links to viewing the book on a site for e-books, like the Internet Archive. The descriptive title for the external link is the title of the book and is offered directly before the link. If you want to program your bot to turn the bare link into a link that says "Book Viewable on Google Books" or something, fine but turning it into a reference exacerbates the problem (which is barely a problem to begin with) and simply makes no sense.
It's one thing to make the links fall in line with Wikipedia guidelines, it's another thing to repeatedly make completely illogical changes to them so they somehow fit the rules but don't even make any sense to the reader. And one more thing: the policies you are citing refer to the "References, Further reading, or External links section", but the bot is making changes to the Bibliography section, too, which is where most of the problems are. Algabal (talk) 21:01, 11 August 2008 (UTC)
I agree broadly with Algabal and I wonder about the bot setup. Did someone ask for a bot run, and if so, who and why? What is it supposed to achieve? On the face of it, the bot is attempting - and failing - complicated jobs that need the judgement of a human editor. What tasks was the bot approved for? --Kleinzach 03:30, 12 August 2008 (UTC)
Please see Wikipedia:Bots/Requests for approval/Polbot 8. What jobs do you think it's failing at? – Quadell (talk) 13:02, 12 August 2008 (UTC)
It doesn't matter whether your bot is fulfilling its original mission if it's also messing up many more articles in the process. It sounds like you're almost deliberately trying to ignore the fact that your bot is making incorrect changes to bare links, and acting as if the fact your bot is making these mistakes is somehow excusable given that bare links are not generally considered a good thing under Wikipedia. Are you not concerned with the fact that your bot is making endless errors that we have to go back and manually correct? Algabal (talk) 19:18, 12 August 2008 (UTC)
Adding descriptive titles to bare links is one thing; converting in-line links to references is another. The first is acceptable in my opinion, the second is not. While some in-line links may be references, most are not. Converting them all to references is inappropriate and a bot cannot distinguish between an in-line link that is supposed to be a citation and one that is supposed to be an external link. Please see Wikipedia:Bots/Requests for approval/SMS Bot 2 for an identical April bot request that was denied after lack of consensus. The objections raised in that one are applicable here. -- JLaTondre (talk) 14:02, 12 August 2008 (UTC)
This is a very relevant request. Thanks for the link; I'm looking through it. – Quadell (talk) 17:54, 12 August 2008 (UTC)
Interesting. In April, a bot that converted numbered external links to references was not approved. In August, a bot that converts numbered external links to references was approved. This could be because consensus changed, because BAG membership changed, or simply because different people were paying attention each time. I personally don't believe there are any cases where a bare (numbered) external link is the best practice, but there have been dissenting voices here. Perhaps I need to open an RFC. (I haven't been running this task, by the way.) – Quadell (talk) 18:04, 12 August 2008 (UTC)
Bot requests don't get wide participation by the community. It isn't unknown for something to go through the bot approval process and then have the wider community complain once it's running. The fact that in your BRFA this was one of several subtasks, whereas in the other it was the only task, could have something to do with it, as it could have been overlooked.
While numbered external links may not be best practice, always converting them to references is worse practice. References are for citations that back up the statement made. Converting them to references makes it look like the statement has been cited when in fact it may not be. A couple of examples in addition to the previous ones:
Valter Perić: There is a claim made in this article regarding a picture of a beer label. Back in July, it was tagged with {{fact}} under the rationale "That beer sticker is just an image. I've not found this beer yet." Your bot converted the external link to the image into a reference. So now we have what looks like a referenced statement with a citation request. However, it's not really a reference as the external is just the image itself and does not validate the statement.
Sidney Gulick: Your bot converted an external link to a book into a reference. This is also not a reference. It's an external link that is already covered by the first link in the external links section and should be removed.
Task 8 does have a number of helpful fixes. Subtasks 1-5 & 7-8 should continue even if subtask 6 is dropped. -- JLaTondre (talk) 19:34, 12 August 2008 (UTC)
I agree here: just because there is an inline link does not mean it's a reference. Part 6 should be stopped and never started again. Over time, others and I have noticed that some spam and cruft link additions are made using bare inline links; those need to be addressed on an individual basis and should never be turned into references via an automated task. I would like to see everything except that function continue. βcommand 21:26, 12 August 2008 (UTC)
  • I am against changing all occurrences of <references/> to {{reflist}}. If a "div" was previously used to make it small, then it could be made into {{reflist}}. As I learned the hard way when I first started editing Wikipedia, there is no consensus to only use {{reflist}}. WP:FOOT and WP:FOOT's talk page are both good reads. Warning: it showed my stupidity shortly after first signing up. ;) But the point of sharing those links is to show how there is no consensus to use one style over the other. I suggest the bot be changed to only do the change to {{reflist}} if the "div" was used to make it small.--Rockfang (talk) 08:50, 13 August 2008 (UTC)

It appears to me that there is no part of Polbot's 8th task that someone won't complain about. However, the strongest and most consistent objections seem to be regarding the conversion of bare (numbered) external links to references. I disagree, but I want to be sure to follow consensus here, so I'll be removing that function from Polbot's 8th task. (I will not be removing other functions unless it is demonstrated to me that the community in general, and not just one person, thinks it's a good idea.) – Quadell (talk) 12:58, 13 August 2008 (UTC)

Like Rockfang I have a problem with changing <references/> to {{reflist}}. Having a smaller font looks rather silly if there are only a few short notes. As Template:Reflist says: "Note that there is no consensus that small font size should always be used for all references." And WP:FOOT says "it is common when there is a long list of references (as a rule of thumb, at least ten) to replace the basic <references /> tag with {{Reflist}}". If the community does think that a smaller font size should (almost) always be used, it seems better to change the global style sheet (Mediawiki:Monobook.css). -- Jitse Niesen (talk) 13:58, 13 August 2008 (UTC)
Like two previous contributors I have a problem with changing <references/> to {{reflist}}. When I edit, I choose one or the other, according to how crowded the article seems. Bots should not override human judgment calls. Opus33 (talk) 15:50, 13 August 2008 (UTC)
That's surprising. Okay, I won't convert undivved reference tags to the reflist template. – Quadell (talk) 17:03, 13 August 2008 (UTC)

FYI, I am restarting the bot. I have fixed the encoding problems, I have removed the function to replace undivved references tags to use reflist templates, and I have removed the function to change bare (numbered) external links to references. – Quadell (talk) 14:47, 14 August 2008 (UTC)
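The rule that came out of this discussion (convert only a <references /> tag that is already wrapped in a references-small div, and leave a bare tag alone) can be sketched as a single regular expression. This is a hypothetical illustration, not Polbot's actual code; the pattern and function name are assumptions, and real wikitext may contain variants this pattern does not cover.

```python
import re

# Matches only the "divved" form; a bare <references /> is not matched.
DIVVED_REFS = re.compile(
    r'<div\s+class="references-small">\s*<references\s*/>\s*</div>',
    re.IGNORECASE,
)

def convert_divved_references(wikitext):
    """Replace a references-small div wrapper with {{reflist}}."""
    return DIVVED_REFS.sub("{{reflist}}", wikitext)

divved = '==Notes==\n<div class="references-small"><references /></div>\n'
bare = "==Notes==\n<references />\n"

print(convert_divved_references(divved))
print(convert_divved_references(bare))
```

Because the bare tag never matches, the editor's choice of the default font size is left untouched.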

Honestly, I think the BAG should keep the BRfA open while the bot is running for the first week. At least then I would have the time to properly review the operations and read through the source. Anyway, I really don't like how it appends "at www.example.com" to all the links; it messes up all sorts of regexes for people. — Dispenser 16:20, 14 August 2008 (UTC)

Could Quadell have a look at this one? Why is Polbot delinking one year but not the others? --Kleinzach 23:51, 14 August 2008 (UTC)
I believe MOS says somewhere that only full dates should be linked; that would explain the delinking. §hep¡Talk to me! 23:55, 14 August 2008 (UTC)
Thanks. I've found this in Wikipedia:CONTEXT#Dates. --Kleinzach 00:22, 15 August 2008 (UTC)
Here's another problem. In this case the quotation marks around Depuis Le Jour have unravelled to &quot . . . &quot. --Kleinzach 00:04, 15 August 2008 (UTC)
And again the bot has not unraveled anything. The text comes from the title of the youtube page which is viewable in the HTML of that page. In this case the problem is in the youtube template, go address your issues there. --ENAIC (talk) 08:38, 15 August 2008 (UTC)
ENAIC: I understand this is your second day on Wikipedia. I wonder if you would like to read UNCIVIL? Thank you. If bona fide error reports can't be accepted how can they be corrected? If you look at the diff you'll see the apostrophes were originally rendered correctly. (Quadell has already agreed that there were (are) coding problems. I assume he would like to check whether this is another of them.) --Kleinzach 08:59, 15 August 2008 (UTC)
Even if Enaic was a bit uncivil his point still stands. From the YouTube page's source: <title>YouTube - Grace Moore: &quot;Depuis le jour&quot; (Louise)</title> Polbot updated the title to match. Is it really such a big deal or are you just trying to find excuses to criticise the bot? ~ AmeIiorate U T C @ 11:32, 15 August 2008 (UTC)
That is something that Polbot should be able to handle. If you look at that page in a browser, it will correctly render the title element with quotes and not as &quot;. Polbot should be able to do the same. In addition, I'm not sure why it would replace an editor provided title. If an editor has given a title, it seems like that should be used instead of a bot generated one. However, neither of these issues require the bot to stop operating and can be better handled via a question to Quadell (which I have done). -- JLaTondre (talk) 13:02, 15 August 2008 (UTC)
  • Have you also fixed the ref punctuation problem? 08:06, 15 August 2008 (UTC)

Thanks everyone for the feedback. JLaTondre brought some of these issues to my bot's talk page. I will be fixing the &quot; problem in titles, and I'll make sure to preserve user-given titles in, e.g., [http://www.youtube.com/watch?v=L_YeGgDsd7s YouTube - Grace Moore: "Depuis Le Jour" (Louise)] – Quadell (talk) 20:21, 15 August 2008 (UTC)
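
For readers following the &quot; discussion above, here is a minimal sketch of the two fixes being promised: decoding HTML entities in a fetched title, and keeping a title an editor already supplied. It is not Polbot's actual code. The names `extract_title` and `choose_title` are hypothetical, and the regular expression is a rough stand-in for a real HTML parser.

```python
import html
import re

# Rough pattern for the first <title> element; a real bot would
# probably use a proper HTML parser instead (assumption).
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page_source):
    """Return the decoded, whitespace-normalised page title, or None."""
    match = TITLE_RE.search(page_source)
    if match is None:
        return None
    # html.unescape turns &quot; into ", &amp; into &, and so on,
    # which is what a browser does before displaying the title.
    text = html.unescape(match.group(1))
    return " ".join(text.split()) or None


def choose_title(editor_title, fetched_title):
    """Prefer a title an editor already wrote over a bot-generated one."""
    if editor_title and editor_title.strip():
        return editor_title.strip()
    return fetched_title


source = "<title>YouTube - Grace Moore: &quot;Depuis le jour&quot; (Louise)</title>"
print(extract_title(source))
```

With the YouTube title quoted earlier in this thread, the decoded result contains plain double quotes rather than &quot;.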

Write API enabled

Brion has enabled the write facilities of api.php, which include page editing and other fun things. [51] [52] — Carl (CBM · talk) 00:25, 26 August 2008 (UTC)

Query.php has been disabled

The deprecated query.php interface has been disabled.[53] Bots that use(d) it will no longer work. Possible replacement libraries include pywikipedia for Python or my Mediawiki::API for Perl. — Carl (CBM · talk) 00:30, 26 August 2008 (UTC)
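
For bot authors moving off query.php, a minimal sketch of what a read request against api.php can look like. It only builds the request URL and performs no network access. The endpoint and the example page title are assumptions for illustration; the parameters shown (action=query, prop=revisions, rvprop=content) are standard api.php query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_content_query(page_title):
    """Build an api.php URL that asks for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": page_title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical example title, chosen only for illustration.
url = build_content_query("Octahedral cluster")
print(url)
```

A library such as pywikipedia or Mediawiki::API wraps requests of this kind, so most operators will not need to build them by hand.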

DumZiBoT problems

Like Polbot (above), DumZiBoT is converting bare external links in references into named external links. The problem is that the results never seem to be useful, see this and this. Instead of the name of the (website) publication, we get a badly formed/capitalized version of the title of the work and other, more or less random, details. Can this kind of bot operation ever produce a worthwhile result? It seems unlikely. --Kleinzach 01:24, 14 August 2008 (UTC)

Sorry, I fail to see the constructive part of this comment. Decoder pen maybe ? NicDumZ ~ 02:21, 14 August 2008 (UTC)
Indeed, I'm not qualified to tell you how to improve DumZiBoT - only to tell you it's producing a poor result for the articles that I've seen so far. --Kleinzach 02:59, 14 August 2008 (UTC)
The result that DumZiBoT is producing is by far better than having a bare URL. ~ AmeIiorate U T C @ 03:46, 14 August 2008 (UTC)
But is that the only choice and if so why? Why can't the bot find the name of the (web) publication? --Kleinzach 04:06, 14 August 2008 (UTC) P.S. The bot - which is still operating - is now going through a series of Italian operas marking the external sites LIBRETTI A STAMPA (which means printed libretto) see here. --Kleinzach 04:12, 14 August 2008 (UTC)
Can you find the name of any web publication ? :) I can't. In fact, I can't even define what's a "publication" when considering random webpages - and yes, it is still operating, I really see no reason to stop it ! NicDumZ ~ 04:21, 14 August 2008 (UTC)
You don't consider an all out war on bot editing, because a handful of edits may not provide the absolute best version possible (i.e. problems), reason enough. --ENAIC (talk) 04:35, 14 August 2008 (UTC)

<title> elements are random? /confused BJTalk 11:38, 14 August 2008 (UTC)

DumZiBot is making unqualified improvements to hundreds of articles. Yes, a human could make an even better title for most links than the one the bot autogenerates. I don't think anyone will disagree with you there, and please feel free to improve any titles. But the bot is definitely making improvements, and complaints like "It's not improving articles good enough!" don't reflect well on you. – Quadell (talk) 12:02, 14 August 2008 (UTC)
Objection, Quadell. We are all trying to improve the encyclopedia. That's what I'm doing here. That's what you should be doing here - especially in view of the many criticisms of your bot operation made above. Ad hominem attacks don't reflect well on the person who makes them - which is why I avoid them. The attribution of complaints in deliberately bad English ("It's not improving articles good enough!") - complaints that were never made - is bad form. --Kleinzach 13:10, 14 August 2008 (UTC)
You know what, Kleinzach? I am very patient and try to answer every claim addressed on my talk page, trying to listen to each idea formulated. Now, it depends on you; you have three possibilities:
  • You have valuable, reasonable suggestions for me to improve my bot; share them and I'll be happy to try to implement them
  • You don't have any suggestions:
    • and you think my bot is one of these "stupid mass-damage bots" and should be stopped immediately; head for AN/I, or whatever other place to ask for a shutdown.
    • you're only being noisy for the heck of it, only repeating what every reader of this noticeboard already knows, i.e. "bots sometimes get it wrong, but overall help improve our encyclopedia". If so, thanks for the interesting input, and let's kill this thread.
But please, please, stop this. This thread is going nowhere.
NicDumZ ~ 13:28, 14 August 2008 (UTC)
This was highly uncalled for. While I don't have a problem with the bot operation, I can see that some people would feel bare links are actually better than having the occasional nonsensical title. Trying to categorize people with that opinion as being bot haters or merely disruptive is in poor taste. If you and Quadell cannot defend the bot's function without belittling other editors, then I think you both need to rethink your approach. -- JLaTondre (talk) 14:19, 14 August 2008 (UTC)
Where have I attacked another editor? – Quadell (talk) 14:48, 14 August 2008 (UTC)
I changed "attacking" to "belittling" given that I forget attack has certain connotations on Wikipedia that I wasn't intending. As for where, it's your previous post in this section. -- JLaTondre (talk) 15:13, 14 August 2008 (UTC)
Even nonsensical titles are useful, as they provide more information to relocate a dead link. Dead links continue to be a problem at FAC, which is partly dealt with by Ealdgyth now, but more reviewers are needed (PR & GAN). Now it's being suggested that DumZiBot's edits should be done by humans. I've created an interactive tool from the DumZiBot script and it gets about 20 hits a day. At least an order of magnitude more bare links are added daily. And some editors seem to depend on this bot doing their work for them, although they really shouldn't. You could try starting up a WikiProject to address reference issues, but I haven't seen anyone actually sustain the effort. — Dispenser 15:54, 14 August 2008 (UTC)

Hello, I've looked over various changes made by this bot and find they are frequently ungrammatical, badly punctuated, and in general reflect badly on our encyclopedia.

My feeling is: you ought to set up a bot that simply finds bare links in the references. Then you visit the article, read it in full, and write editorially acceptable captions for the bare links. You would have fun and learn a lot by reading the articles, and you would be uncontroversially helping the encyclopedia instead of (I think) hurting it. Sincerely, Opus33 (talk) 16:17, 14 August 2008 (UTC)

Are you honestly suggesting the bot operator make hundreds of thousands (if not millions) of edits after fully reading the article? The links are fixed properly when a human editor does a copyedit or cleanup of the articles but the backlog for that is years (decades at the current rate if you take in to account untagged articles). BJTalk 17:05, 14 August 2008 (UTC)
Hello Bjweeks. Yes, indeed, that is what I'm honestly suggesting. I'm aware that the backlog is very large. However, human editorial labor will generally produce improvements, whereas trying to write prose by algorithm is likely to do more harm than good. In general, I think that in our effort to produce a top-quality encyclopedia, we should proceed patiently and with high standards. Yours very truly, Opus33 (talk) 17:24, 14 August 2008 (UTC)
Opus33, these auto-generated titles are better than a number and a web address. At the rate that links are added (~70 per minute), humans can never keep up. That means we have two choices: either leave crappy links waiting for someday (that may never come) or create basic bot-generated titles. I prefer basic versus no information. I'm not the only one that thinks that. With every decision there are some who disagree, but this improves the encyclopedia. βcommand 17:31, 14 August 2008 (UTC)
I tend to agree with Bj, the DumZiBoT titles are better than no title at all. Of course doing it manually will be better, but there's currently a shortage of people willing to do it. If the bot-generated titles use poor grammar, it's because the titles of the websites have poor grammar. Mr.Z-man 19:15, 14 August 2008 (UTC)

Let's look at it from another angle - that of the reader. The reader will either have confidence or lack it, based on the visible text. So a nonsensical link title will be worse than a bare one. If the nonsensical titles only appeared in 5 or 10 percent of cases we might be able to correct them by hand, but from what I have seen up to now, the bot only occasionally produces a viable result. (I'm working on music articles which are technical and often have references to medium to small sites, so results may be better for news-related, political articles referencing well-known websites etc.)

I still don't understand how DumZiBoT works. Why for example doesn't it produce a (basic) URL when it can't find a viable alternative? In the case of this diff I gave earlier the URL is www.operone.de. This is much more informative for the reader than the strange mixture of Italian and German in L'Armida immaginaria von Cimarosa. --Kleinzach 23:21, 14 August 2008 (UTC)

Uh, the title element of the page. BJTalk 23:25, 14 August 2008 (UTC)
How many other websites have links that look like [54]? That just looks really unprofessional to me, especially in a numbered list in a References section, you get things like:
  1. ^ [55]
  2. ^ [56]
  3. ^ [57]
I think anything even remotely descriptive would be better than that. Mr.Z-man 01:08, 15 August 2008 (UTC)
We are talking of language processing... don't expect a robot to be as smart as a human, Kleinzach. You can't teach a robot to grab the wikipage context, the external HTML page, and to simply answer "This title is/isn't a viable title". It's an algorithm: you have to tell the robot what to do step by step. Look at it from another angle - that of the bot owner. Would you be able to describe a pseudo-algorithm to (your choice) analyze a title and state whether it's valid or not, and this in every language available on the web, "create" a title from any web document, or even simply detect the language of the page?
Have a look at Natural language processing, please: hundreds of researchers are currently working on that kind of task, and the very basic language analysis Google supports in its indexing phase is really something new and, I have to say, quite impressive. Antivandal bots are now "analyzing" somehow the value of page changes; but keep in mind that this is doable because we know that we deal with English wikitext (i.e. you can set beforehand which English expressions are "good", and which are "bad"), and that implementing a general way to analyze even 70% of the pages linked in our encyclopedia is really something that I can't do.
I'm repeating it again: if anyone comes to me, with a specific idea to improve my bot, I am willing to spend my time looking at this suggestion, and trying to figure out if this can be done. Also, if there are specific questions from non programmers on "how does your bot work", I am willing to explain it in depth. But don't expect me to improve magically my bot because you come to me and say "it's not this good, improve it" without any suggestion... That, I can't do.
NicDumZ ~ 01:24, 15 August 2008 (UTC)
OK. Thank you, that's helpful. However it begs the question: should we use bot scripts to write content to articles given the unpredictable and unmonitorable results? The more I understand about this situation, the more I think we shouldn't be using bots in this way. --Kleinzach 01:51, 15 August 2008 (UTC)
But the results are always 110% predictable. Whether or not it's a useful/effective improvement is debatable, but the bot's results are predictable and monitorable. βcommand 03:47, 15 August 2008 (UTC)

Sometimes the page title is more useful than the url to the reader. Sometimes the url is more useful. Obvious solution: show both the page title and the url (truncated if necessary). As I suggested here, we can use {{cite web}} (or something similar) to format the showing of both. We should work on the assumption bots are stupid and readers aren't. --Rob (talk) 23:28, 15 August 2008 (UTC)

Update: I didn't realize I was repeating a previously rejected idea. I think DumZiBoT should be terminated as long as it continues to completely hide urls. It's impossible for a bot to know if a page title is meaningful, so it seems absurd to replace what may be an informative url, with what may be an uninformative page title. Our primary objective is not making articles pretty, but hopefully, a big objective, is to inform readers, and inform them of where our information is coming from. --Rob (talk) 15:56, 16 August 2008 (UTC)
Rob: That's disappointing. Your idea was excellent. I was hoping it could be implemented and would prove workable. In the circumstances, I agree that DumZiBoT should be terminated. --Kleinzach 22:57, 16 August 2008 (UTC)
Hidden URL? The URL is never hidden, you just have to hover on the link with your mouse to get the address... But I don't get it: most of the time when there's a title, the URL is "hidden", as you're saying...?! NicDumZ ~ 02:20, 17 August 2008 (UTC)
Actually, for references, everybody should be including more information than just the page title of the link. We should be including the name of the publisher. Unfortunately, bots are stupid, and don't know that. The url is the best guess at the publisher. An ordinary user should be able to look at all the references at the bottom of an article, and quickly see if they're all coming from the same place (say myspace.com) without hovering over each one to see if a myspace.com link is hiding behind a title like "NY Times Article" or "Official site of the Olympics" (made up examples). Also, if you don't display the url (or truncated version) and you show the page title, with nothing else, it's suggesting that Wikipedia is the source of the textual description, instead of the external page. Finally, your bot should help human editors complete the task, by starting in the use of a proper template, such as {{cite web}} which encourages people to provide all the relevant source information. I find generally now, links your bot makes, just sit there, quite uselessly. The point of references is to say where information *really* comes from, it's not to look pretty. --Rob (talk) 03:50, 17 August 2008 (UTC) Added: I should have said "made less likely and less easy to see" instead of "hidden". --Rob (talk) 03:58, 17 August 2008 (UTC)
Good. It is agreed then that Rob will edit Wikipedia, including more information than just the page title of the link. He should include the name of the publisher. He will edit in such a way to ensure that Wikipedia is not interpreted as the source instead of the external page (possibly by using the external link logic already in the Wikimedia engine). He will help human editors complete the task by starting in the use of a proper template, such as {{cite web}} which encourages people to provide all the relevant source information. His edits will make links that don't just sit there quite uselessly. A list of references to start is located here --ENAIC (talk) 05:10, 17 August 2008 (UTC)
It strikes me as a bit odd that no one reads the link in the FAQ: the citation templates would violate the arbitration ruling. I agree with ENAIC's example above of applying the principles to humans too. I have created a tool for him which identifies links for him to repair; this list is found at tools:~dispenser/view/BELs. — Dispenser 07:47, 17 August 2008 (UTC)
Well, then DumZiBoT is already violating the rule against changing citation formats without the consensus of editors of each article it changes. It already has repeatedly changed multiple articles in a manner objected to by the editors of those articles, and the bot owner's response is {{sofixit}}. To use {{cite web}} by bot, would obviously require some sort of very broad community consensus before anything was done. I just don't see why one citation standard can be implemented without such a consensus, but a better one requires such a consensus. --Rob (talk) 08:24, 17 August 2008 (UTC)
DumZiBoT is not changing the citation format of any reference. It only edits inline external links, and leaves them as inline external links.
"It already has repeatedly changed multiple articles in a manner objected to by the editors of those articles": pardon? Some users have certainly pointed out to me not-so-good titles inserted by DumZiBoT, or buggy edits. I first try to see if any algorithm can be improved for these bogus edits, then if not, see if the particular title can be blacklisted, and if not, yes, I apologize and ask the editor to put a better title. But you're giving the impression that DumZiBoT is disrupting articles on purpose, that it has some enemy users who don't want it to edit their articles at all, and that I am happy with this; this is definitely not true. NicDumZ ~ 09:45, 17 August 2008 (UTC)

Arbitrary section break (DumZiBot)

I see at least three citation formats being discussed here. Assume the three examples are each inside a <ref> tag (and therefore used as references):

  1. original: http://domain.net/absurdlylongpath
  2. DumZiBoT: Example Page Title
  3. cite web: "Example Page Title". domain.net.

Going by what the reader sees, it seems going from #1 to #2 is as much a "prohibited" change of citation format as going from #1 to #3. But #3 is simply more informative to the reader, particularly if there are large numbers of citations together. --Rob (talk) 17:11, 17 August 2008 (UTC)
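
To make the comparison concrete, here is a minimal sketch that produces the wikitext for each of the three formats listed above. It is not DumZiBoT's code. The function names are hypothetical, and using the hostname as the publisher is only the "best guess" discussed in this thread. Titles containing "|" or "]" would need escaping, which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def bare_link(url):
    """Format 1: the unformatted URL, as the editor originally left it."""
    return url


def titled_link(url, title):
    """Format 2: an inline external link that shows only the page title."""
    return "[" + url + " " + title + "]"


def cite_web(url, title):
    """Format 3: a {{cite web}} call showing the title and the hostname."""
    host = urlsplit(url).hostname or ""
    return ("{{cite web |url=" + url + " |title=" + title
            + " |publisher=" + host + "}}")


url = "http://domain.net/absurdlylongpath"
title = "Example Page Title"
for wikitext in (bare_link(url), titled_link(url, title), cite_web(url, title)):
    print("<ref>" + wikitext + "</ref>")
```

The only extra input format 3 needs over format 2 is the hostname, which is already available from the URL.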

Seeing as a bare link isn't a citation format, no. BJTalk 17:22, 17 August 2008 (UTC)
So, if somebody adds <ref>http://www.example.net/path</ref> after a claim, to back it up, you're saying that doesn't count as a citation? --Rob (talk) 17:45, 17 August 2008 (UTC)
It isn't a valid citation format, no. BJTalk 18:25, 17 August 2008 (UTC)
Then which of those, if any, do you consider to be "valid citation formats"? Just curious. —Ilmari Karonen (talk) 19:40, 17 August 2008 (UTC)
2 and 3? BJTalk 19:42, 17 August 2008 (UTC)
Why? Neither of them conform with any recommended citation styles any better than the first one. —Ilmari Karonen (talk) 19:58, 17 August 2008 (UTC)
If 2 and 3 are both citation styles, and it can't change 1 to 3, surely it shouldn't be allowed to change 1 to 2 either? Or, more sensibly, it should be allowed to change from 1 to 3. Mr.Z-man 20:04, 17 August 2008 (UTC)
(ec) I agree with this. Adding formatting to an unformatted link clearly isn't changing styles. BJTalk 20:10, 17 August 2008 (UTC)

I don't see this bot task as problematic - the HTML <title> will have a title that the web page author thought was sensible. And if the bot is only replacing bare links, then any human can always improve the title and the bot won't edit it again. — Carl (CBM · talk) 20:11, 17 August 2008 (UTC)

...except when they don't. The problem is that, in most browsers, the <title> string will only be visible in the window title, which is something most people just mentally filter out. In particular, I've seen many instances where people create new pages on a site by cutting and pasting the HTML code from an existing page (often originally written by someone else) and changing the content — but frequently leaving the <head> section completely untouched. Then there are the people who create web pages in programs like Microsoft Word, which (unless you know how to find the obscure metadata settings for changing it and care enough to actually do it) tend to give the pages silly autogenerated titles based on e.g. the filename of the document or the first few words in it (which sometimes does produce a meaningful title, but often won't). Oh, and did you know PDF files also have titles? Most people, even most people who create PDF files, don't, but apparently DumZiBot is still happy to grab those titles and use them.
Mind you, I still think this bot is, on the whole, useful. But just naively assuming that all web pages have meaningful and correct <title>s isn't really sensible. —Ilmari Karonen (talk) 20:30, 17 August 2008 (UTC)
PDF authors are setting the title metadata without realizing it? BJTalk 20:50, 17 August 2008 (UTC)
Many programs will set it to some default string, often (and probably most usefully) the file name, sometimes something else. —Ilmari Karonen (talk) 20:52, 17 August 2008 (UTC)
For some nice examples, Google for intitle:"Microsoft Word" filetype:pdf. —Ilmari Karonen (talk) 21:00, 17 August 2008 (UTC)
And how many of those are referenced in Wikipedia? --ENAIC (talk) 21:02, 17 August 2008 (UTC)
Actually, I wouldn't be surprised if quite a few were, given that many seem to be official reports and other potential sources of facts. I do know I've previously found such titles making their way to Wikipedia via DOI bot's edits. (I'll try to find the diff.) —Ilmari Karonen (talk) 21:07, 17 August 2008 (UTC)
So of 60236 external linked refs between !!! and Cree Summer I found 14986 had 'Bot generated title'. Of those 215 had ".pdf" in them. Would you care to take a stab at what percentage of those are 'bad'? --ENAIC (talk) 21:42, 17 August 2008 (UTC)
Nice list. If that's a representative sample, though, it actually looks a lot worse than I thought. I count maybe one or two dozen (not counting the few apparent false positives that are not bot-generated) that might be at least somewhat valid; all the rest are pretty much garbage, if not actively misleading. There are really only two or three titles in that list that I'd really consider good titles — and even then, some additional information like a hostname would be useful. —Ilmari Karonen (talk) 22:23, 17 August 2008 (UTC)
At a second glance, it's not quite as bad: there may be almost a dozen relatively good titles, and up to three dozen or more marginally valid ones, depending on how low you set the bar. Even so, the overwhelming majority are still useless or misleading. —Ilmari Karonen (talk) 23:20, 17 August 2008 (UTC)
(ec) ...or, even better, try Google for intitle:"created with" filetype:pdf. —Ilmari Karonen (talk) 21:07, 17 August 2008 (UTC)
If "created with" isn't already on the bot's blacklist, I'm sure it can be added. ~ AmeIiorate U T C @ 21:18, 17 August 2008 (UTC)
Indeed it should. Another good entry would be "My Documents". Anyway, the point of the examples was simply to show that such uninformative titles are out there. The other common types I mentioned above are not so easily Googled for (or blacklisted). —Ilmari Karonen (talk) 21:42, 17 August 2008 (UTC)
...but not that hard to find: it took me only a few minutes to locate [58], whose <title> says, translated, "pictures from the winter excursion". (Hint: Finland does not look like that in winter.) That one's from a site I used to maintain but no longer do; obviously the current maintainer copied and pasted the HTML from an older gallery page for a different event. —Ilmari Karonen (talk) 21:53, 17 August 2008 (UTC)
That page doesn't appear to be used as a reference in Wikipedia [59]. Since DumZiBot doesn't translate titles, are we not chasing flies here? --ENAIC (talk) 22:11, 17 August 2008 (UTC)
Of course it isn't. But what makes you assume the sites we do cite are any better? I picked that example since I know the current maintainer of that (sub)site and know that he's not all that good with HTML. (It's not his main job.) I was fairly sure I'd find a couple of examples there, and sure enough, that one came up on the first page. I could probably find you a couple more from other sites if you like. It's kind of hard to find them unless you know where to look, though, since figuring out whether a page's title accurately describes its content can't be automated. —Ilmari Karonen (talk) 23:04, 17 August 2008 (UTC)
Anyway, if you want an incorrect title actually added by DumZiBot, how about "The Settlement of Papela" from your own list? (Yes, the document does mention the settlement in one section, though mostly referring to it by another name. I'm not sure where the title might have come from in that particular case.) —Ilmari Karonen (talk) 23:08, 17 August 2008 (UTC)
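
The blacklist idea raised in this sub-thread, combined with Kleinzach's earlier question about falling back to the basic URL, can be sketched as follows. This is a hedged illustration, not the bot's real blacklist. The entries come from the examples mentioned above ("created with", "My Documents", "Microsoft Word"), and `is_blacklisted` and `label_for` are hypothetical names.

```python
from urllib.parse import urlsplit

# Illustrative entries taken from this discussion; the real list is unknown.
BLACKLIST = ("created with", "my documents", "microsoft word")


def is_blacklisted(title):
    """True if the title contains any known uninformative phrase."""
    lowered = title.casefold()
    return any(phrase in lowered for phrase in BLACKLIST)


def label_for(url, title):
    """Use the page title when it looks usable, otherwise the hostname."""
    if title and title.strip() and not is_blacklisted(title):
        return title.strip()
    # Fall back to the hostname, or to the whole URL if none can be parsed.
    return urlsplit(url).hostname or url


print(label_for("http://www.operone.de/page.html", "Microsoft Word - report.doc"))
print(label_for("http://www.operone.de/page.html", "L'Armida immaginaria von Cimarosa"))
```

As noted above, a phrase list only catches titles that are bad in a recognisable way. A title copied from another page, or a wrong one like "The Settlement of Papela", passes straight through.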

I wonder, if we avoid {{cite web}} or any cite tag, would people support the simple idea that DumZiBot should still show the url (truncated as needed/appropriate) in addition to the page title? I'm not too worried about obviously stupid titles, like "New Document", as readers can see those aren't meaningful. The problem is a page title that makes something appear as a legitimate source, when it's really some unreliable page hosted on myspace.com or wherever. So far, I haven't heard anybody say why we shouldn't show at least part of the url, and why we should expect people to hover over each link to see where it's going. --Rob (talk) 22:38, 17 August 2008 (UTC)

While I agree with the general drift here (and much appreciate Rob's well-considered contribution) I think we should remember that 'obviously stupid titles' do discredit WP with the general reader - who is neither a bot techie nor a content editor. --Kleinzach 23:36, 17 August 2008 (UTC)
What's your point Kleinzach ? NicDumZ ~ 00:45, 18 August 2008 (UTC)
ELIZA? --Kleinzach 07:47, 19 August 2008 (UTC)
Klein, "Who else in your family hates you?" --ENAIC (talk) 13:56, 19 August 2008 (UTC)

ENAIC has now been blocked indefinitely. It was a sockpuppet, see Lemmey. --Kleinzach 00:21, 21 August 2008 (UTC)

That sounds good to me, and indeed I already proposed it to NicDumZ on his talk page. Looking at ENAIC's list (of which I put up a tidied-up version in my user space), just having the hostname visible would help in a lot of cases by telling the reader something about who is providing the information. —Ilmari Karonen (talk) 23:14, 17 August 2008 (UTC)
If it includes the title and a shortened URL, it might as well just use {{cite web}}, if its being used in the context of a reference. Mr.Z-man 23:39, 17 August 2008 (UTC)

Statistics

NicDumZ, can you compile some statistics on how many of your bot's edits are reverted? I know that of the 4 articles from my watchlist that your bot recently reviewed I had to revert two of them. The other two were instances where your bot left a helpful note on the talk page pointing out a problem that required a human editor to resolve. The talk-page comments were very useful but the mainspace ones were bordering on destructive. Perhaps your bot would be better served making lists of citations that need to be fixed and depositing them on the articles' talk pages? Plasticup T/C 15:56, 19 August 2008 (UTC)

I tried, but chances are that I got my SQL wrong:
select count(*), sum(if(page_latest=rc_this_oldid,1,0)) from recentchanges join page on (rc_cur_id=page_id) where rc_user=6085301 limit 1000;
Returns that 19,955 of the 39,524 edits still stored in recent changes are the latest edit of a page. Accordingly, it means that at least 50% of the bot edits remain as such.
select count(*) from recentchanges left join revision on (rc_last_oldid=rev_id) where rev_user=6085301 and rc_comment like '%by [[Special:Contributions/DumZiBoT|DumZiBoT]] ([[User talk:DumZiBoT%';
Returns the number of edits immediately following one of DumZiBoT's edits that have an edit summary containing by [[Special:Contributions/DumZiBoT|DumZiBoT]] ([[User talk:DumZiBoT (the default revert summary): 28 / 39,524. Making the pattern "%Undid%DumZiBoT%" makes the number even smaller: 26
But again, something must be wrong, please correct me :)
NicDumZ ~ 17:45, 19 August 2008 (UTC)
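
As a quick check on the arithmetic, the figures quoted above can be turned into rates. This minimal sketch uses only the three counts reported by the queries; the variable names are assumptions. It says nothing about whether the queries themselves are right.

```python
# Counts reported by the two queries above.
total_edits = 39524        # bot edits still in recentchanges
still_latest = 19955       # of those, edits that are a page's latest revision
summary_reverts = 28       # follow-up edits with the default revert summary
undid_reverts = 26         # follow-up edits matching "%Undid%DumZiBoT%"

latest_share = still_latest / total_edits
revert_share = summary_reverts / total_edits
undid_share = undid_reverts / total_edits

print(f"still the latest edit: {latest_share:.1%}")
print(f"default-summary reverts: {revert_share:.3%}")
print(f"'Undid' reverts: {undid_share:.3%}")
```

That is roughly 50.5% of edits still on top, and well under 0.1% followed by an explicit revert. The second figure only counts reverts that kept the default summary, so it is a lower bound on how often editors changed the bot's output.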
I don't know about everyone else, but rather than spend 30 seconds reverting the bot I spent 60 seconds correcting its garbled message into a useful citation. As for 50% of the bot edits being the most recent, I don't think that speaks to the edits' usefulness so much as the obscurity of the pages; my edit to Octahedral cluster is the most recent despite being over a year old. I was thinking of something along the lines of manually examining a sample of 200 edits from last week and seeing how many of the edits were altered. Plasticup T/C 18:26, 20 August 2008 (UTC)
I think that means the bot's working. The bot improved the article, and you improved it further. More meaningful information was added at both steps, which is why reversion wasn't appropriate. Why not appreciate each other's work instead of dogging on each other? – Quadell (talk) 12:55, 21 August 2008 (UTC)
The problem here is one of scale. Of course it is better to fix the reference rather than revert. That's a basic editorial principle - if you see something that needs fixing, you fix it. But there's no way an editor - or a team of editors can keep up with and monitor the ground covered by these bots. There are too many articles and the bots are too fast. It's not possible. The bots should be tested on small datasets, the bugs removed and the design refined and then - when they've been perfected - used on large jobs. --Kleinzach 13:28, 21 August 2008 (UTC)
I wouldn't have called the bot's edits "improvements", Quadell. They changed a url, which your average reader recognizes, into a pile of undecipherable text, which no reader recognizes. I didn't revert the edits (which would have been an improvement over the bot edit) but rather improved them even more than a revert would have. I didn't think this needed to be explicit, as this section isn't about criticizing his well-meaning bot so much as it is about building some useful statistics. Plasticup T/C 13:45, 21 August 2008 (UTC)

Progress towards a resolution of the DumZiBoT problems?

It's been a while since anyone posted to this discussion. Has there been progress behind the scenes? Has NicDumZ responded to the technical suggestions made by Rob and talk? It would be good to close this with a sense of something having been accomplished, rather than leaving everything hanging in the air. Regards. --Kleinzach 04:57, 29 August 2008 (UTC)

I've been implementing some of the suggested improvements to the Reflinks tool. It now features an individual ref editor for each reference it touches, with unused metadata displayed in a separate box. A list (updated daily) provides articles which need their titles added to the references, scanned from GAN, PR, FAC, and some WikiProjects on certain days.
One would hope that if a human being is editing, he would verify that the tool is working correctly? At least he can't hit save directly. But what I've seen in the logs is somewhat depressing [60], [61]. Since DumZiBoT identifies its edits, would it not be generally better than these "human bots"? — Dispenser 00:19, 8 September 2008 (UTC)
No, I don't understand your second point. Your links are apparently to edits by a non-native speaker. Can you try again to clarify what you mean? Thanks. --Kleinzach 03:14, 10 September 2008 (UTC)
Read again, maybe? It makes sense to me: Dispenser provides a tool to correct references, but from his tool's log he sees that human-made reference correction does not imply better title quality. He hence suggests that having a single flagged bot, identified as working on fixing references and needing passive surveillance from real editors, is probably better than several "human bots" blindly "fixing" references without checking the added titles. The difference is that we are not supposed to know that non-flagged users perform automated edits without checking them. The consequence? We assume that the "human bot" edits are valid when they are not, and no one checks them, leading to bad titles; whereas some editors do check the titles added by DumZiBoT, correcting and reporting the bad ones. 137.132.250.12 (talk) 03:59, 10 September 2008 (UTC)
Thank you. So his conclusion was on the basis of the work of one single 'human' editor? I thought I'd missed something . . . . --Kleinzach 05:10, 10 September 2008 (UTC)
Only a few people actually use the tool each day (about 20-80 hits/day). I wanted to avoid naming names, but if you want me to I can. — Dispenser 06:32, 10 September 2008 (UTC)