Wikipedia:Bots/Noticeboard/Archive 1
This is an archive of past discussions on Wikipedia:Bots. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
Bots with lack of information on user page
Yes, the first post on the bot board! I have come across quite a few user pages of bots that give little to no description of what the bot does. I'm guessing not all bots have one because the process of getting a bot approved hasn't always existed. I am suggesting creating a talk message template to put on bot owners' talk pages, asking them to add more information about their bot to its user page. Relevant information should include the bot owner, all jobs that are carried out, and the language(s) the bot uses.--Andeh 18:41, 22 October 2006 (UTC)
- Yeah - good idea (second post :) - you just beat me to it :( ) - it's helpful to have the information - I'm going to work on doing it for my bots soon :) Martinp23 18:42, 22 October 2006 (UTC)
- It might also be a good idea to require this when owners request approval for the bot or for a new task for it (or maybe this is already in policy?) Tizio, Caio, Sempronio 18:47, 22 October 2006 (UTC)
- Yeah, it's a policy already, but sometimes it's not followed. See the userpage section under Wikipedia:Bots#Policy. —Mets501 (talk) 18:49, 22 October 2006 (UTC)
- OK, I've created the template {{createbotpage}} to leave on the talk page of owners who did not set up adequate user pages for their bot. —Mets501 (talk) 19:07, 22 October 2006 (UTC)
Inactive Bots
I was starting to work on reconciling All flagged bots with WP:RBOTs, and it seems that we may have numerous bots that are no longer in operation. Any comments on deflagging these accounts? — xaosflux Talk 04:58, 22 October 2006 (UTC)
- Since WP:BOT doesn't say anything about deflagging inactive bots, maybe you want to change that first, adding a passage saying "Bots will be deflagged and blocked indefinitely after being inactive for XXX; if the operator wishes to continue running the bot, he/she has to request approval again at WP:BRFA", etc., before you start deflagging.--WinHunter (talk) 05:13, 22 October 2006 (UTC)
- Thanks for the reply; hopefully there will be more. I'm not saying that these should all get deflagged, just attempting to determine what the consensus is on this. — xaosflux Talk 05:45, 22 October 2006 (UTC)
- Another alternative is to indefinitely block bots that have been inactive for XXX (say 3 months), and then deflag them only once they have been inactive for over a year. This would require less work from the bureaucrats, and less work would have to be done should the bot owner decide to run the bot again (between 3 and 12 months); after that, the possibility is rather slim. --WinHunter (talk) 17:16, 22 October 2006 (UTC)
- Why would we want to block a bot account? I can understand deflagging after a long period (i.e. a year), but a block is never necessary. Martin 17:43, 22 October 2006 (UTC)
- Seconded. Don't block (can cause collateral damage) and a year is good enough. Or do you want me to make ping edits with ligulembot? --Ligulem 21:46, 24 October 2006 (UTC)
- That gave me the idea for a wonderful bot: PingBot, a bot with no other function than showing that it is active. I know it will hardly be approved, but if it does, it will never be deflagged. Tizio, Caio, Sempronio 21:58, 24 October 2006 (UTC)
- I'd support deflagging idle bots, in case they are compromised by vandals/malicious users. This should go to the bot noticeboard. :) --Andeh 18:44, 22 October 2006 (UTC)
- Created a list of flagged bots at User:Andypandy.UK/Flaggedbots.
About 1/2 of all bots are currently active, see User:Voice_of_All/Bots. Voice-of-All 06:31, 23 October 2006 (UTC)
- Unflag all bots which have not edited for 6 months?--Andeh 14:58, 24 October 2006 (UTC)
- I propose blocking (e.g. with the block reason "bot deflagged, please request permission again at WP:BRFA if you wish to continue to run the bot"), because it would prevent the bot owner from accidentally reactivating the bot (without noticing the deflag) and flooding recent changes. --WinHunter (talk) 17:20, 24 October 2006 (UTC)
- I don't really have an opinion on deflagging inactive bots, but I do think that if bots are deflagged they should also be blocked (per WinHunter above) —Mets501 (talk) 20:51, 24 October 2006 (UTC)
- Well, flagged bots don't show on recent changes by default, so if they are compromised in any way they may cause vandalism that may not be detected for a while.--Andeh 00:56, 25 October 2006 (UTC)
- It's much easier to compromise an active account than a parked one. So should we deflag active bots? -- Drini 23:51, 26 February 2007 (UTC)
- Why would someone compromise a bot account when they could do far more damage by compromising a sysop account? Or, for that matter, a bureaucrat/checkuser/oversight account? There are inactive sysop accounts at the very least. If I was going to hack into an account I think the bot accounts would be rather low on my list.--Dycedarg ж 01:49, 27 February 2007 (UTC)
- We could have a bot to do this... Rich Farmbrough, 23:40 27 December 2006 (GMT).
- If you were to deflag, reflagging should probably be a quick and easy process for the bot owner, rather than the few months required for an initial flag. --TheJosh 02:13, 27 February 2007 (UTC)
- Dropping in here: If it is decided to do so, I don't have a problem doing the deflags, if someone will put together a list of the ones that need to be deflagged; it should be fairly short order to fix up links with automatic summaries and go through clicking the links. I would oppose the idea proposed above of blocking these bots; if the bot operator is active, it could end up autoblocking them, and really, there is no need to block the accounts as long as the operator has the password. Essjay (Talk) 02:17, 27 February 2007 (UTC)
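If the six-month rule suggested above were adopted, building the list of bots to deflag is a simple filter over each flagged bot's last-edit time. The sketch below assumes those timestamps have already been collected (for example from each bot's contributions); the function name, the sample bot names and the 182-day default are illustrative assumptions, not policy.

```python
from datetime import datetime, timedelta, timezone


def find_inactive_bots(last_edits, now, max_idle_days=182):
    """Return the names of bots whose last edit is older than max_idle_days.

    last_edits maps a bot account name (hypothetical here) to the
    timezone-aware datetime of its most recent edit, or None if it has
    never edited.
    """
    cutoff = now - timedelta(days=max_idle_days)
    inactive = []
    for bot, last in sorted(last_edits.items()):
        # Accounts with no recorded edits count as inactive.
        if last is None or last < cutoff:
            inactive.append(bot)
    return inactive


now = datetime(2007, 2, 27, tzinfo=timezone.utc)
edits = {
    "ExampleBotA": datetime(2007, 2, 20, tzinfo=timezone.utc),
    "ExampleBotB": datetime(2006, 6, 1, tzinfo=timezone.utc),
    "ExampleBotC": None,
}
print(find_inactive_bots(edits, now))  # ['ExampleBotB', 'ExampleBotC']
```

Making the idle period a parameter lets the same list be produced for the one-year threshold others preferred.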
Userbox
I created a userbox for a bot owner. Use it if you like. -- Ganeshk (talk) 21:24, 22 October 2006 (UTC)
It produces:
This user runs a bot, Ganeshbot (contribs). It performs tasks that are extremely tedious to do manually.
- Seems good to me. What would you think of adding a link to Special:Contributions/BotName after the bot name? Tizio, Caio, Sempronio 23:01, 22 October 2006 (UTC)
- I've added it :-) —Mets501 (talk) 23:09, 22 October 2006 (UTC)
- I have linked it a little differently to keep the userbox small. -- Ganeshk (talk) 23:24, 22 October 2006 (UTC)
- Awesome! It's on my userpage now :-) —Mets501 (talk) 00:50, 23 October 2006 (UTC)
- Top stuff. I've added this to my userpage too :-) - PocklingtonDan 20:02, 7 December 2006 (UTC)
Regex help
Maybe someone watching this page can help me. I'm converting old-style {{PDFlink}} usage to the new style. The old style could be written in a variety of ways, including:
[http://LINKGOESHERE] {{pdflink}}
[http://LINKGOESHERE]({{pdflink}})
{{pdflink}}[http://LINKGOESHERE]
({{pdflink}}) [http://LINKGOESHERE]
The new style is standardized, and looks like this:
{{pdflink|[http://LINKGOESHERE]}}
I was trying to create a regex (or two) to convert it, and I came up with these:
Find: (\(|)\{\{(PDFlink|Pdf|Pdflink)\}\}(\)|)( |)(\[http(.*?)\])
Replace: {{$2|$5}}
Find: (\[http(.*?)\])( |)(\(|)\{\{(PDFlink|Pdf|Pdflink)\}\}(\)|)
Replace: {{$5|$1}}
to match the two general cases (link before or after template). The only problem is that if there are two links in one paragraph, it will match the entire thing as one link (for example, see this sandbox edit). Can anyone help? —Mets501 (talk) 15:18, 29 October 2006 (UTC)
- The .* part matches the longest string such that what follows is matched as well. In the assumption that ] is not part of a url (I think this is the case, but you may want to check), you can use [^]]* in place of .*
- This would match the longest possible string that does not contain ], so I think that would work in your case. Tizio, Caio, Sempronio 23:13, 29 October 2006 (UTC)
- Thanks! I'll test it out. —Mets501 (talk) 00:14, 30 October 2006 (UTC)
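The effect of Tizio's suggestion can be checked with Python's re module (Mets501's tool may differ; in Python the $5 and $1 back-references are written \5 and \1). Note that (.*?) is already lazy: it does stop at the first ] it can, but when the first link is not followed by the template, the engine keeps extending past that ] instead of failing at that starting position, so one match spans both links. Excluding ] from the URL part prevents that. The sample text and the helper name are hypothetical.

```python
import re

# Second pattern from above (link before template), with (.*?) as posted.
LAZY = re.compile(r"(\[http(.*?)\])( |)(\(|)\{\{(PDFlink|Pdf|Pdflink)\}\}(\)|)")
# Same pattern with the URL part restricted to characters other than "]".
FIXED = re.compile(r"(\[http([^\]]*)\])( |)(\(|)\{\{(PDFlink|Pdf|Pdflink)\}\}(\)|)")


def convert(pattern, text):
    """Rewrite old-style matches as {{template|[link]}}."""
    return pattern.sub(r"{{\5|\1}}", text)


text = "See [http://a.example/x.pdf] and [http://b.example/y.pdf] {{PDFlink}}."
print(convert(LAZY, text))
# See {{PDFlink|[http://a.example/x.pdf] and [http://b.example/y.pdf]}}.
print(convert(FIXED, text))
# See [http://a.example/x.pdf] and {{PDFlink|[http://b.example/y.pdf]}}.
```

The first pattern (template before link) benefits from the same change applied to its (.*?) group.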
Code to check for a new message
Is there a Python code somewhere that checks for a new message on the talk page and stops the bot? A list of these little code snippets would be really useful. Thanks, Ganeshk (talk) 23:50, 29 October 2006 (UTC)
- I do not use Python, but this might help: you can load http://en.wikipedia.org/w/query.php?what=userinfo&uiisblocked&uihasmsg and check whether the result includes "<messages />", which means that new messages are present (the result also tells whether the user is blocked). Generating some Python code shouldn't be difficult, I think. Tizio, Caio, Sempronio 13:13, 30 October 2006 (UTC)
- Yes, the Python framework has a check for new messages; look up the pywikipedia bot framework. Betacommand (talk • contribs • Bot) 14:21, 30 October 2006 (UTC)
- I really appreciate the suggestions. Thanks, Ganeshk (talk) 08:03, 3 November 2006 (UTC)
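For a bot not built on pywikipedia, Tizio's suggestion can be sketched in a few lines of Python. This assumes the 2006-era query.php endpoint quoted above (it may no longer exist) and that the request carries the bot's login cookies; otherwise the answer describes an anonymous user. The function names are hypothetical.

```python
from urllib.request import urlopen

# Endpoint quoted in the discussion above; treat it as an assumption.
QUERY_URL = ("http://en.wikipedia.org/w/query.php"
             "?what=userinfo&uiisblocked&uihasmsg")


def has_new_messages(response_text):
    """Per the discussion, "<messages />" appears when new messages exist."""
    return "<messages />" in response_text


def bot_should_stop(open_url=urlopen):
    """Fetch the user info and report whether the bot should halt.

    Pass the .open method of a cookie-aware opener so the query is made
    as the logged-in bot account.
    """
    with open_url(QUERY_URL) as resp:
        text = resp.read().decode("utf-8")
    return has_new_messages(text)
```

A bot would typically call bot_should_stop() before each edit and exit when it returns True.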
Help with project template
I am setting up the India project template as a mini talkpage template. I am trying to get rid of extra spaces on my template, User:Ganeshk/sandbox2. Please check the talk page. I want to get rid of the space between the boxes in the userboxes on the right. Could anyone please check the template and help me fix the problem? -- Ganeshk (talk) 08:02, 3 November 2006 (UTC)
Help with getting redirects to a page
Does anyone know of an efficient way to get only the redirects to a certain page in python (in the pywikipedia package)? I'm using
referredPageTitle = wikipedia.input(u'Links to which page should be processed?')
referredPage = wikipedia.Page(wikipedia.getSite(), referredPageTitle)
gen = pagegenerators.ReferringPageGenerator(referredPage)
gen = pagegenerators.RedirectOnlyPageGenerator(gen)
now, but it's really slow, and there has to be a better way. I created RedirectOnlyPageGenerator as
class RedirectOnlyPageGenerator:
    """
    Wraps around another generator. Yields only those pages that are redirects.
    """
    def __init__(self, generator):
        self.generator = generator

    def __iter__(self):
        for page in self.generator:
            if page.isRedirectPage():
                yield page
—Mets501 (talk) 19:14, 12 November 2006 (UTC)
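The slowness probably comes from calling isRedirectPage() on every referring page, one request each. One alternative, sketched here under the assumption that the MediaWiki web API (api.php, which postdates much of this discussion) is available, is to ask the server for redirects only via list=backlinks with blfilterredir=redirects. The helper names and the limit value are illustrative, and continuation for long result lists is not handled.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def redirects_query_url(title, limit=500):
    """Build a query for pages that redirect to `title`.

    blfilterredir=redirects makes the server do the filtering that
    RedirectOnlyPageGenerator does on the client.
    """
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blfilterredir": "redirects",
        "bllimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def redirect_titles(response_json):
    """Extract the redirect titles from a JSON response."""
    data = json.loads(response_json)
    return [page["title"] for page in data["query"]["backlinks"]]


print(redirects_query_url("Main Page"))
```

The URL can be fetched with any HTTP client and the body passed to redirect_titles().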
Watchlisting
Not exactly for writing a bot here (more a program :P), but I thought some people here might know. How do you put pages on the watchlist? I've got wpWatchThis= in the POST string, but does "true/false" or "on/off" go after the equals sign? I'll try some testing if no one knows. Martinp23 20:30, 18 November 2006 (UTC)
- Just append "&action=watch" or "&action=unwatch" to the url. That should work. —Mets501 (talk) 03:53, 19 November 2006 (UTC)
- wpWatchThis is for adding a page to your watchlist while saving new content to it. If this is what you want to do (and not just adding a page to your watchlist, which can be done as Mets501 said), I think that any value of wpWatchThis (including 0, false, and off) will do that; the edit form uses wpWatchThis=1. You unwatch by not passing wpWatchThis at all. Tizio 17:00, 19 November 2006 (UTC)
- Hmm - I've tried both of these (though Tizio's suggestion is more what I'm looking for), and unfortunately neither added the page to the watchlist. Do you have any more ideas? Thanks Martinp23 19:13, 22 November 2006 (UTC)
- Both what Mets501 and I said should add pages to the user's watchlist. We need some more details, such as: what are you using for HTTP transfer? Is the page being saved (in case you are using my suggestion)? Which HTTP headers/content do you get? Tizio 19:45, 22 November 2006 (UTC)
- Sorry for the time taken for me to reply. I'm just using a DLL I've constructed for myself (in .NET 2) using an HTTP request. I'm going to do further digging into this - it may just be something really simple I've overlooked. Martinp23 22:45, 30 November 2006 (UTC)
- I'm sure this is just a crazy, stupid, obvious question, but are you doing the edits from the same account as the account checking the watchlist? This isn't a separate bot account adding to its own watchlist when you want it to show up on the main account's watchlist? -- RM 12:55, 22 March 2007 (UTC)
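Both suggestions can be written out as request data. This is a sketch, not a tested client: the action=watch/unwatch URL is as Mets501 described it in 2006 (newer MediaWiki versions may require a token), the field names come from the edit form discussed on this page, and the "Save page" button value is an assumption. As RM points out, the requests must be sent with the cookies of the account whose watchlist you are checking.

```python
from urllib.parse import urlencode

INDEX = "http://en.wikipedia.org/w/index.php"


def watch_url(title, watch=True):
    """URL that adds (or removes) `title` on the logged-in user's watchlist."""
    action = "watch" if watch else "unwatch"
    return INDEX + "?" + urlencode({"title": title, "action": action})


def edit_form_fields(text, summary, token, starttime, edittime, watch):
    """Fields for saving an edit; names taken from the wiki's edit form."""
    fields = {
        "wpTextbox1": text,
        "wpSummary": summary,
        "wpEditToken": token,
        "wpStarttime": starttime,
        "wpEdittime": edittime,
        "wpSave": "Save page",  # assumed button value
    }
    # Per Tizio: the form sends wpWatchThis=1 to watch; to leave the page
    # unwatched, omit the field instead of sending "false" or "off".
    if watch:
        fields["wpWatchThis"] = "1"
    return fields


print(watch_url("Example page"))
# http://en.wikipedia.org/w/index.php?title=Example+page&action=watch
```

Omitting the field, rather than sending a false-looking value, matches Tizio's observation that any value of wpWatchThis watches the page.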
Arrows in edit summaries
I'm wondering about the possibility of better-standardised edit summaries for bots and scripts; please leave any comments at Wikipedia talk:WikiProject User scripts#Arrows in edit summaries. --ais523 13:16, 4 December 2006 (UTC)
Help please - adding new comment using Perl script
Hi, I am trying to develop a helpful little bot called PockBot but I've not got massive amounts of experience with Perl or Wikipedia. I am trying to make the bot post a new comment to a given page (after doing a bunch of stuff irrelevant to this problem). However, I have run up against all kinds of problems with edit tokens etc.
I understand that you need to use a GET request to get the form-field page, screen-scrape the edit token off it, and then use this in submitting a second GET request to post the actual comment. However, I've found that if the bot makes the same HTTP GET request that I type in manually, I am presented with an edit token but the bot isn't, so when the bot tries to write its data to the page, the page is unchanged afterwards, i.e. the edit has been ignored.
Does anybody have any simple Perl code chunk for performing these actions? I presume that everyone must come up against this hurdle. I have tried looking at Pearle bot but couldn't get code chunks from that to work either.
I'm not a complete novice perl coder, but i'm not a coding expert either. Any help appreciated!
PocklingtonDan 17:48, 5 December 2006 (UTC)
Back when I used Perl, before I switched to pywikipedia, I found a very neat module: HTML::Form. It can parse HTML text and extract forms:
use HTML::Form;
$form = HTML::Form->parse($html, $base_uri);
$form->value(query => "Perl");
You can then modify the form any way you want (read the documentation to see how) and finally, its "click" method provides an HTTP request that a user agent can send as-is:
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$response = $ua->request($form->click);
Hope it helps. Миша13 18:20, 5 December 2006 (UTC)
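For comparison, the job HTML::Form does here (pulling wpEditToken and the other fields out of the edit page) can be sketched with Python's standard html.parser. This is an illustrative analogue, not a port of HTML::Form, and the class and function names are hypothetical. As the replies further down show, the token is only meaningful if the page was fetched with the bot's login cookies.

```python
from html.parser import HTMLParser


class EditFormParser(HTMLParser):
    """Collect name -> value for each <input> and <textarea> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._textarea = None  # name of the textarea currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            self.fields[name] = attrs.get("value") or ""
        elif tag == "textarea" and name:
            self._textarea = name
            self.fields[name] = ""

    def handle_data(self, data):
        # Text between <textarea> and </textarea> is the field's value.
        if self._textarea:
            self.fields[self._textarea] += data

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None


def form_fields(html):
    parser = EditFormParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling form_fields(page_html)["wpEditToken"] returns the token exactly as sent, including any trailing backslash.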
- Thank you, I will take a look at HTML::Form, although I tend to cringe reading manpage-equivalents on modules; most of it goes right over my head. - PocklingtonDan 18:58, 5 December 2006 (UTC)
- OK, this doesn't work any better than my earlier code. Here's my code snippet:
use LWP::UserAgent;
use HTML::Form;
my $ua = LWP::UserAgent->new;
my $response = $ua->get("http://en.wikipedia.org/w/index.php?title=Category_talk:Roman_frontiers&action=edit&section=new");
my $form = HTML::Form->parse($response);
my $text = $form->find_input('wpTextbox1')->value;
my $summary = $form->find_input('wpSummary')->value;
my $save = $form->find_input('wpSave')->value;
my $edittoken = $form->find_input('wpEditToken')->value;
my $starttime = $form->find_input('wpStarttime')->value;
my $edittime = $form->find_input('wpEdittime')->value;
print "Content-type: text/html\n\n";
print "Text field: $text<br><br>";
print "Summary: $summary<br><br>";
print "Save: $save<br><br>";
print "Edit token: $edittoken<br><br>";
print "Start Time: $starttime<br><br>";
print "Edit Time: $edittime<br><br>";
exit;
- It gets the values fine from the form except the edit token - the edit token is returned as "\" rather than the long number it should be. (You can see the output for the above yourself here.) I'm certain someone must have come across this problem before? - PocklingtonDan 19:17, 5 December 2006 (UTC)
- Try asking User:Shadow1. Voice-of-All 19:29, 5 December 2006 (UTC)
- Thanks, I have put a note on his talk page, I'll see if he can help. This problem makes me want to bang my head against the wall! - PocklingtonDan 20:51, 5 December 2006 (UTC)
- It appears that your bot is not logged in (or, it is not sending the cookies). You can check by yourself that you got "\" as the edit token if you logout and open an edit form. This is actually quite strange, because the normal behavior of MW was not to send that edit token at all for non-logged-in users; also for logged-in users, the edit token didn't include the final backslash that now I see. It is also strange that you don't get the edit being made as an anon. Tizio 23:17, 5 December 2006 (UTC)
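Tizio's diagnosis (the login cookies are not being sent back) is the usual cause of a "\"-only token. In Python the equivalent fix is to route every request through one cookie-aware opener, so the cookies set at login accompany the later edit-form fetch and save. A minimal sketch with the standard library; the login request itself is not shown, and the helper names are hypothetical. Form encoding sends the trailing backslash as %5C, so the token arrives intact.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Opener that stores cookies from each response and resends them."""
    jar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar


def fetch(opener, url, fields=None):
    """GET `url`, or POST `fields` form-encoded when given; return text."""
    body = urllib.parse.urlencode(fields).encode("ascii") if fields else None
    with opener.open(url, body) as resp:
        return resp.read().decode("utf-8")
```

Log in, fetch the edit form and submit the edit all with the same opener; using a fresh opener for any step drops the session.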
Well, for starters, try my Perl bot framework, Perlwikipedia. That should give you all the code you need to edit pages, without all the mucking about with edit tokens. Shadow1 (talk) 00:55, 6 December 2006 (UTC)
- Oh, by the way, if you don't want to use that, use the WWW::Mechanize module. It automatically pulls all the necessary variables when you fetch the edit form and submit it. That's the module I use for Shadowbot and Perlwikipedia. Shadow1 (talk) 01:48, 6 December 2006 (UTC)
- Thanks guys, I think perlwikipedia is going to do the job perfectly - it looks like it was my cookie code rather than my form-editing code that was the problem - perlwikipedia's cookie code works great and I'm now getting an edit token returned (albeit with a trailing slash). I think I should be able to have PockBot make logged-in edits now, thanks for all the help! - PocklingtonDan 08:44, 6 December 2006 (UTC)
- No problem, glad it worked. Shadow1 (talk) 13:09, 6 December 2006 (UTC)
Oh, by the way: As far as I can tell, the trailing slash is part of the edit token. Shadow1 (talk) 18:49, 6 December 2006 (UTC)
- Thanks, I left the slash in when returning the token and it works great. The bot is now in trial. Thanks for your help :-) - PocklingtonDan 19:33, 6 December 2006 (UTC)
- Glad it worked. Good luck with your bot! Shadow1 (talk) 19:42, 7 December 2006 (UTC)
Incidentally, I have found the explanation of this mysterious backslash at the end of the edit token: r18112. Tizio 16:03, 10 December 2006 (UTC)
- Wow, I bet that took some digging! Nice find :-) - PocklingtonDan 16:30, 10 December 2006 (UTC)
{{AWB bot}}
I've created this template as a more specialized version of {{bot}} for AWB users. MaxSem 16:16, 30 December 2006 (UTC)