User talk:GreenC bot/Archive 1


Bad edit


Hi GreenC bot. Your recent edit to 2010 Turkish Grand Prix kinda borked the article. Regards. DH85868993 (talk) 09:43, 27 May 2016 (UTC)[reply]

Yeah, subtle bug. Found and fixed, thanks. -- GreenC 16:10, 27 May 2016 (UTC)[reply]
You're welcome. DH85868993 (talk) 07:24, 28 May 2016 (UTC)[reply]

Another thing, which I also noticed on a Formula One article: The bot uses a different date format from what we usually do in the WikiProject, so I had to change it back manually... Maybe there is a way to work that out as well? Zwerg Nase (talk) 12:29, 31 May 2016 (UTC)[reply]

Like all bots it will use the date format set for that article, defaulting to MDY if none is set. See {{use dmy dates}} or {{use mdy dates}}. -- GreenC 13:58, 31 May 2016 (UTC)[reply]

Suspect edit reverted


Hi, I'm not sure what happened here, but I've reverted it. If I'm at fault, please revert. Thanks. —Bruce1eetalk 06:00, 2 June 2016 (UTC)[reply]

Thank you. This shouldn't have happened and I can't figure out where or how it occurred; the logs look fine, as if it skipped processing the article, but I'll add extra verification that the data isn't blank. -- GreenC 13:38, 2 June 2016 (UTC)[reply]
I've never seen this before and the bot definitely passed the right data, so it looks like an AWB bug. AWB had page-blanking bugs in the past that were hard to track down, so it could be another. AWB has an option to skip pages based on a regex pattern, so I'll include a skip if the page doesn't contain a non-whitespace character (i.e. skip on a blank page). -- GreenC 14:00, 2 June 2016 (UTC)[reply]
Thanks for the feedback – hope you'll come right with this. —Bruce1eetalk 14:29, 2 June 2016 (UTC)[reply]
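
(A minimal sketch, in Python, of the skip condition described above; the function name is illustrative - the real check is an AWB skip-if regex, not this code.)

 import re

 def should_skip(page_text):
     # Skip the save when the processed wikitext contains no
     # non-whitespace character, i.e. the page came back blank.
     return re.search(r"\S", page_text) is None

 print(should_skip("   \n\t "))          # True  -> skip, page is blank
 print(should_skip("{{cite web|...}}"))  # False -> safe to save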

GIGO


This caused a problem by removing part of a nowiki tag. It probably is just a case of garbage in, garbage out, but I don't know why the bot made that edit. Writing just in case there is a bug. Bgwhite (talk) 04:56, 7 June 2016 (UTC)[reply]

Found another one. Bgwhite (talk) 07:12, 7 June 2016 (UTC)[reply]

Yeah, those are bugs, good catch. You are right, it's garbage data that confuses the bot. -- GreenC 14:48, 7 June 2016 (UTC)[reply]

Fixed. -- GreenC 15:59, 7 June 2016 (UTC)[reply]

Nice work


I love this bot. Kendall-K1 (talk) 14:07, 9 June 2016 (UTC)[reply]

Thanks. Link rot includes Wayback itself. Watching the watchers. -- GreenC 14:34, 9 June 2016 (UTC)[reply]
I like how it also cleans up after Cyberbot. Kendall-K1 (talk) 00:04, 10 June 2016 (UTC)[reply]

Bug? Comment


[This edit] didn't seem to help. Wayback just has archives of 404 pages, so the bot edit didn't fix anything. Not complaining, just letting you know as I guess the bot is new. Wonderful to see all these tedious changes being made by the bot, great job. Thanks. Fettlemap (talk) 17:37, 10 June 2016 (UTC)[reply]

There's a hidden redirect somewhere the bot (and I) can't see, pointing to a 404 page. The page headers return 200 (OK), so the bot thinks it's a normal working page. Basically it's a soft 404. The bot makes a best effort to detect soft 404s using various techniques, but sometimes there is nothing it can do if the page is misconfigured and returns a 200. You did the right thing with the delete and the cbignore template. Glad you like the bot; this run is almost complete and I'll post final stats soon. -- GreenC 17:59, 10 June 2016 (UTC)[reply]
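
(For illustration only, a rough Python sketch of one kind of soft-404 heuristic - the marker strings and size threshold are invented here and are not the bot's actual rules.)

 import requests

 def looks_like_soft_404(url):
     # Heuristic sketch: the server answers 200 OK but the body reads
     # like an error page. Thresholds and markers are illustrative.
     r = requests.get(url, timeout=30)
     if r.status_code != 200:
         return False  # a real 404/410 is not a *soft* 404
     body = r.text.lower()
     markers = ("page not found", "404 error", "no longer available")
     return len(body) < 512 or any(m in body for m in markers)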

Findarticles


@Green Cardamom: I don't know how your bot works yet (haven't really had time to look in the docs), but do you think you can fix the following?

Could your bot mass-tag all links to findarticles.com (including links to wayback.archive.org/web/$1/*findarticles.com), since their robots.txt was updated and all archives were removed from Wayback? Archive.is, however, seems to still have copies, but that site is blacklisted. See Wikipedia:Bot requests/Archive 69#findarticles.com for more info. (tJosve05a (c) 14:40, 9 July 2016 (UTC)[reply]

@Josve05a: .. archive.is is no longer blacklisted (there was a new RfC), but a large fraction of the pages are soft 404s. What they did is pull down stuff from Wayback without regard to header status codes, effectively turning everything into a status 200 even if it was a 404. Archive.is should be added manually unless you have an idea how a bot could determine a soft 404 on an archive.is page (which might be possible if focused only on findarticles.com). Cyberbot II (User:cyberpower678) is currently in bot approval to work on dead links that are not yet tagged .. it will seek out dead links and try to replace them with a working copy at archive.org and, if not, leave a dead tag. -- GreenC 16:34, 9 July 2016 (UTC)[reply]
Ok, but currently there are archiveurls with links to Wayback for findarticles.com, and those archives are dead (not working). Example: http://web.archive.org/web/20100701225947/http://findarticles.com/p/articles/mi_qn4161/is_20010902/ai_n14532259/. Could your bot de-archive those? (tJosve05a (c) 16:42, 9 July 2016 (UTC)[reply]
Yep. There is an open Bot Approval for that. Fix #4. In dry runs and trials I have seen many findarticles.com links get removed (archiveurl) and tagged dead. -- GreenC 17:28, 9 July 2016 (UTC)[reply]

Talk pages


I don't know what else this bot does, but is there any purpose to it fixing minor typos on Talk pages like this: [1] ? Derek Andrews (talk) 14:40, 12 August 2016 (UTC)[reply]

User_talk:Cyberpower678#necessaryily -- GreenC 14:40, 12 August 2016 (UTC)[reply]
Yes, please don't run this bot on talk pages, as here. You should not be refactoring comments, even by correcting typos.— TAnthonyTalk 15:34, 12 August 2016 (UTC)[reply]
Ah, I see now that this was a request, perhaps next time you need to do something like this, your edit summary can be more explanatory. Thanks!— TAnthonyTalk 15:38, 12 August 2016 (UTC)[reply]

Another talk page edit is here. It is alarming that the bot is changing discussion text written by others. See WP:TALK#Others.27_comments where it says:

1.6 Editing comments
1.6.1 Others' comments
It is not necessary to bring talk pages to publishing standards, so there is no need to correct typing/spelling errors, grammar, etc. It may irritate the users whose comments you are correcting. The basic rule—with some specific exceptions outlined below—is that you should not edit or delete the comments of other editors without their permission.
Never edit or move someone's comment to change its meaning, even on your own talk page.

As the bot was making many of these kinds of edits, I have blocked it.

EncMstr (talk) 16:40, 12 August 2016 (UTC)[reply]

See the discussion linked above, linked in the edit summary, and linked here: User_talk:Cyberpower678#necessaryily .. the edit was made with the permission of the person who made the error. -- GreenC bot (talk) 16:44, 12 August 2016 (UTC)[reply]
EncMstr, could you please unblock this account? -- GreenC bot (talk) 16:45, 12 August 2016 (UTC)[reply]
Now that I have looked in more detail, I see it was fixing the edits of another bot. Unblocked. —EncMstr (talk) 16:49, 12 August 2016 (UTC)[reply]
Suggestion: Besides the link already in the edit summary, add "By request of <bot XXX's owner>, fixing typo by <bot XXX>" —EncMstr (talk) 16:53, 12 August 2016 (UTC)[reply]
Done. -- GreenC bot (talk) 16:58, 12 August 2016 (UTC)[reply]
Looks good! Thanks, —EncMstr (talk) 17:00, 12 August 2016 (UTC)[reply]

Incomplete accessdates


See, for example, what the bot did to 2008 in aviation in May -- it's creating month/year accessdates instead of d/m/y -- this creates a CS1 error, including red text in the ref list. Don't know how many more of these there are, but if you could clean them up, it would save me a lot of manual work. At a minimum, it should stop doing this (i.e., use complete accessdates) going forward. Floatjon (talk) 22:32, 25 August 2016 (UTC)[reply]

Floatjon I'm aware of it. The bot is fixed going forward. The CS1 error is new, added after the bot ran in May and June. My records show this impacted 174 links out of the 375,000 links targeted. I just sent an AWB command through and it corrected 118, so I assume the rest were already fixed manually by yourself and others. -- GreenC 00:01, 26 August 2016 (UTC)[reply]

Disaster. Why? Stop.

I don't think you understand what the bot does. If an archive is not working then it is removed. In the second case ("You kept on doing it") the bot did the correct thing because the link is not working. For the first case, I will research what caused it, but it's likely due to robots.txt at census.gov and/or a problem at Wayback. -- GreenC 17:33, 30 August 2016 (UTC)[reply]

What I discovered is a cluster of about 20 articles, in a row, that had false-positive deletion of links. The rest of the edits look OK. The proximate cause is bad data from the Wayback API during that period of processing (perhaps 10 minutes). That's OK, the bot is prepared for that; there are many backups precisely to avoid this situation. Unfortunately the backups also failed. I don't have a reason why, because it's unclear what the network situation was at the time of running - was Wayback down but returning a 200 status? This is the first time I've seen this, hopefully the last, but I will think about some additional backup testing that could be done. I've reverted the edits by the bot for those ~20 articles. -- GreenC 21:22, 30 August 2016 (UTC)[reply]

A new procedure now in place should greatly reduce the possibility of this happening again. -- GreenC bot (talk) 02:24, 31 August 2016 (UTC)[reply]

Thanks for sorting it out. I commented more at Administrators' noticeboard/Incidents#Bad bot edits on this and my robot grief generally.

Now GreenC bot has done something wacky here. It mangled a date by replacing only part of the date. Then the next editor fixed the date by deleting the leftover bits of the old date (it seems to effect the original intent). I confirm that the old archive.org link reports the familiar "robots.txt" problem, so it is acceptable to delete it. But how did that link get replaced by a webcitation.org link? Does webcitation.org ignore "robots.txt"? (If yes, would using webcitation.org be okay by Wikipedia policy?) And how would you find the cited content there? And finally, that webcitation.org link seems wrong - it says it is a cache of http://www.geeky.net/images/webbadge.gif . -A876 (talk) 03:53, 31 August 2016 (UTC)[reply]

The mangled date problem is an old bug, fixed in the bot, and fixed in the pages some by Floatjon and some by me. webcitation.org is not a crawler, i.e. not a bot, so it doesn't respect robots.txt; it's a human-driven archive. Webcitation pages are found via the Mementoweb.org indexing service, which returns a dozen or so different archive services. The bot tries Wayback first; if it can't find it there, it uses the Mementoweb.org index to find other archives if available .. mostly webcitation and loc.gov, but not many of either .. there is no way a bot can verify a page contains the right content, only the header status; it trusts the archive has what it says it has. I've seen webcitation return small images like this before; maybe it will be possible to check for it before accepting the link (initial checks show not possible but will keep looking). -- GreenC 13:24, 31 August 2016 (UTC)[reply]
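
(Roughly how a lookup against the Mementoweb.org Time Travel API could look in Python; the helper name is hypothetical and the response layout shown is from memory, so treat this as a sketch.)

 import requests

 def find_other_archives(url, timestamp="20160101"):
     # Ask the Memento Time Travel API for archived copies of a URL.
     # The JSON layout used below is from memory; sketch only.
     api = f"http://timetravel.mementoweb.org/api/json/{timestamp}/{url}"
     data = requests.get(api, timeout=30).json()
     closest = data.get("mementos", {}).get("closest", {})
     return closest.get("uri", [])  # may include webcitation.org, loc.gov, etc.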

  • The bot now preserves the hyphen in "archive-date" if it exists, otherwise it uses "archivedate" if that exists. There's no requirement to use one or the other, both are acceptable, but a bot shouldn't arbitrarily change the style; rather it should preserve what exists. If it adds a new field, it will have the hyphen. -- GreenC 18:02, 31 August 2016 (UTC)[reply]

yyyy-mm-dd


For change https://en.wikipedia.org/w/index.php?title=Lexus&type=revision&diff=736982496&oldid=736038388 , the bot changed a yyyy-mm-dd style archive-date into a mdy style archive-date. References are allowed to have yyyy-mm-dd dates even if the article has a {{mdy}} tag on it.  Stepho  talk  03:02, 31 August 2016 (UTC)[reply]

Understood, the bot should preserve the existing format in citation args. It does in most places; for this particular change (modification of the snapshot date) I need to update the code. -- GreenC 03:08, 31 August 2016 (UTC)[reply]
Ok, cool.  Stepho  talk  03:41, 31 August 2016 (UTC)[reply]
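
(A small Python sketch of the format-preserving behaviour discussed above - a hypothetical helper, not the bot's code: it formats a replacement date in whatever style the existing field already uses.)

 import re
 from datetime import date

 def match_existing_format(old_value, new_date):
     # Keep whatever style the existing field used rather than imposing
     # the article-wide default. Sketch only; not the bot's actual logic.
     if re.fullmatch(r"\d{4}-\d{2}-\d{2}", old_value):
         return new_date.isoformat()                                   # 2016-08-31
     if re.match(r"\d{1,2} [A-Z][a-z]+ \d{4}", old_value):
         return f"{new_date.day} {new_date.strftime('%B %Y')}"         # 31 August 2016
     return f"{new_date.strftime('%B')} {new_date.day}, {new_date.year}"  # August 31, 2016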

Duplicate arguments?


Hi, take a look at Princess Leia and Grand Admiral Thrawn, the Green C bot added (apparently) redundant archive dates to a citation, which were removed as duplicates by Magioladitis, which were then added again by GreenC, only to be removed as duplicates by Sporkbot. Can you figure this out? Thx — TAnthonyTalk 04:51, 31 August 2016 (UTC)[reply]

Yeah, I will look into this today, and I have a log of potentially affected articles (it won't be too many). Maybe I'll send the list to SporkBot to target. Good thing an existing bot already has a fix available. -- GreenC 13:28, 31 August 2016 (UTC)[reply]
Found 3 cases, the two above plus one other already fixed by Sporkbot. -- GreenC 13:56, 31 August 2016 (UTC)[reply]

This is caused by "}}" inside the template. The regex sees it as the end of the citation, and since the archivedate comes after it, it thinks the field is missing. What I'll do is log the cases without pushing updates to the article and check manually. There were 3 instances in 50,000 articles, and all three were of this type, with a template inside a template. This is a new feature (checking for missing archivedate) and so far it hasn't found one actual case, so it probably could be retired entirely as more trouble than it's worth. -- GreenC 17:50, 31 August 2016 (UTC)[reply]
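
(For illustration, a brace-counting sketch in Python that finds the true end of a citation template even when another template is nested inside it - a hypothetical helper, not the bot's actual parser.)

 def template_span(text, start):
     # Return the index just past the "}}" that closes the template
     # opening at `start`, counting nested "{{"/"}}" pairs so a template
     # inside a citation does not end the match early.
     depth, i = 0, start
     while i < len(text) - 1:
         if text[i:i+2] == "{{":
             depth += 1
             i += 2
         elif text[i:i+2] == "}}":
             depth -= 1
             i += 2
             if depth == 0:
                 return i
         else:
             i += 1
     return -1  # unbalanced

 cite = '{{cite web |url=http://example.com |archivedate={{date|2016-08-31}} }}'
 print(template_span(cite, 0) == len(cite))  # True: the inner "}}" is skipped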

Suspect edits reverted


I reverted these two edits of yours ([2],[3]). I take it they were bot errors. —Bruce1eetalk 05:05, 31 August 2016 (UTC)[reply]

Couple more have popped up which I've reverted ([4], [5]). —Bruce1eetalk 06:26, 31 August 2016 (UTC)[reply]
Thanks for the report. Good and bad news. Bad news: this problem has bedeviled me for a long time, and I have concluded it's caused by AWB, which I have no control over. Good news is I found a workaround within AWB itself by using a regex to search the page for any non-whitespace character (post page processing), and on failure skip the article. This seems to be working. -- GreenC 13:33, 31 August 2016 (UTC)[reply]
Thanks. I hope that stops the bot blanking articles. —Bruce1eetalk 13:53, 31 August 2016 (UTC)[reply]
I believe it's fixed. No blanking errors in over 4000 edits which is some kind of ignoble record. -- GreenC 18:15, 31 August 2016 (UTC)[reply]
Good ... let's hope it lasts. —Bruce1eetalk 05:22, 1 September 2016 (UTC)[reply]

Notifications


Would you please consider marking your bot edits as minor? The edits are generating literally hundreds of email notifications for very minor edits. Also, I notice above it has already been drawn to your attention that the bot is using incorrect/inconsistent date formats, just one example. SagaciousPhil - Chat 11:44, 31 August 2016 (UTC)[reply]

Date problem noted in the "yyyy-mm-dd" thread above. I'm using AWB with an external script; there is no mechanism to set minor on a per-edit basis - either all edits are minor or not. I could try setting it to minor since most edits are, but if anyone complains I will probably set it back, to err on the safe side. -- GreenC 13:38, 31 August 2016 (UTC)[reply]
Perhaps a better solution would be to do separate runs for minor edits and non-minor ones then? Do you realise that many editors may be on restricted data plans (as I am) and the huge volume of email notifications being generated by your bot is causing problems? SagaciousPhil - Chat 13:48, 31 August 2016 (UTC)[reply]
There's no way of programmatically determining what a minor edit would be; it would be too complex. There's a mix of edit types and amounts in articles, and where to draw the line is subjective. It would also make the process of running the bot difficult. The bot doesn't generate email. Email settings are voluntary in the user options menu (most people have it turned off). This is not the only bot running now or in the future. Finally, this is a one-time bot cleaning up 14 years of accumulated errors; once it's done, future runs, if any, will involve far fewer edits. -- GreenC 14:15, 31 August 2016 (UTC)[reply]
I know the bot doesn't generate email - I clearly stated "the edits are generating ..." in my first comment; I also know about the options in Preferences, thanks. This may be a "1 time cleaning" but it already looks as if there are going to be dozens of corrections to be made re: the date errors etc. - or are editors expected to go back and correct your bot's errors? I see there is already a thread on AN/I about problems with this bot. SagaciousPhil - Chat 14:26, 31 August 2016 (UTC)[reply]

inside <pre> tags


With this edit the bot 'fixed' what appears to be a broken archive link inside <pre>...</pre> tags. It probably should not be operating inside these tags?

Trappist the monk (talk) 13:48, 9 September 2016 (UTC)[reply]

This is fixed now I believe. -- GreenC 22:38, 9 September 2016 (UTC)[reply]
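
(One possible approach, sketched in Python: record the <pre>...</pre> spans first and skip any link that falls inside them. Illustrative only - not the bot's actual code.)

 import re

 PRE_RE = re.compile(r"<pre\b.*?</pre>", re.DOTALL | re.IGNORECASE)

 def protected_spans(wikitext):
     # Character ranges inside <pre>...</pre> that should be left untouched.
     return [m.span() for m in PRE_RE.finditer(wikitext)]

 def in_protected(pos, spans):
     return any(start <= pos < end for start, end in spans)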

Great job


I just wanted to say that the bot is doing a great job! It is beneficial for the project, so I support it fully. --BabbaQ (talk) 20:57, 9 September 2016 (UTC)[reply]

Article hijacked


No idea what happened here, but it appears to call for urgent investigation: Noyster (talk), 12:34, 12 September 2016 (UTC)[reply]

Previously reported here. -- GreenC 12:37, 12 September 2016 (UTC)[reply]

Bot flag?


Is there a reason why this bot is not making use of the bot flag for edits like this? Headbomb {talk / contribs / physics / books} 15:30, 12 September 2016 (UTC)[reply]

The bot uses the Pywikibot framework for posting, which correctly flags bot edits. The b symbols are visible in the watchlist, not the article history. -- GreenC 16:49, 12 September 2016 (UTC)[reply]
My bad, I should have remembered that. Move along! Headbomb {talk / contribs / physics / books} 16:58, 12 September 2016 (UTC)[reply]
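
(For reference, a minimal Pywikibot save with the bot flag set; the article title and summary are placeholders, and the parameter names are as remembered from the 2016-era framework.)

 import pywikibot

 site = pywikibot.Site("en", "wikipedia")
 page = pywikibot.Page(site, "Example article")   # hypothetical target
 page.text = page.text                            # (the real fix goes here)
 # botflag marks the edit as a bot edit; minor marks it as a minor edit.
 page.save(summary="WaybackMedic 2", botflag=True, minor=True)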

Incorrect bot edit?


I am not sure why this edit was made by the bot. As far as I can see, the Archive.org link is working. — SMUconlaw (talk) 11:21, 13 September 2016 (UTC)[reply]

This is due to the website operator of law.smu.edu.sg changing robots.txt after the article was processed (there is a 24-36 hr delay between processing and uploading). Compare August 12 (deny everything) and August 13 (deny some things). I notice you are affiliated with the website, and that the robots.txt is regularly fluctuating between states of denying everything and denying only some things. It looks like it could be some kind of error. Unfortunately our systems are not set up to deal with regular fluctuations in robots policy, as monitoring 1+ million links for daily robots changes is beyond our resources. -- GreenC 12:56, 13 September 2016 (UTC)[reply]
Oh, I see. I'm afraid I have nothing to do with how the website operates; I just teach at that university. — SMUconlaw (talk) 17:33, 13 September 2016 (UTC)[reply]
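
(The fluctuating robots.txt can be spot-checked with Python's standard-library robotparser; the page path below is hypothetical.)

 from urllib import robotparser

 rp = robotparser.RobotFileParser()
 rp.set_url("http://law.smu.edu.sg/robots.txt")
 rp.read()
 # ia_archiver is the crawler the Wayback Machine has historically honoured;
 # when it is disallowed, Wayback retroactively hides its snapshots of the site.
 print(rp.can_fetch("ia_archiver", "http://law.smu.edu.sg/faculty"))  # path is hypothetical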

Appears to be making random, unchecked changes...


...the bot appears to be making changes to dates of versions of references, etc., not checking the content. Grateful if you can press pause until it is un-bugged. Hchc2009 (talk) 20:06, 14 September 2016 (UTC)[reply]

If you look closer, it is correcting archive dates which do not match what the actual archive date is (the correct date is part of the url).— TAnthonyTalk 20:49, 14 September 2016 (UTC)[reply]
I doubt that the user Green C has been checking whether the date or the archive date is correct... Especially given that they haven't been leaving a relevant edit summary. If you look at the pace of the edits, there's no way they've been checking their work Anthony. Hchc2009 (talk) 21:00, 14 September 2016 (UTC)[reply]
GreenC bot is a bot, not a person. If you're going to halt the bot by posting on this page (see notice at top) you will need to post a diff demonstrating an unambiguous problem. Otherwise post on my Green Cardamom talk page. -- GreenC 21:09, 14 September 2016 (UTC)[reply]
From the name, I'm assuming it has some connection to you, however...? If you're changing a citation, Green Cardamon, you need to personally check that the changed citation matches up with the content. You also need to leave an informative edit summary. "WaybackMedic 2" doesn't tell another editor what changes you've made, or why. Hchc2009 (talk) 21:13, 14 September 2016 (UTC)[reply]
The bot has approval to run as noted at the bottom of the page WaybackMedic 2. I don't need to personally check every edit -- it's a bot. Please familiarize yourself with bots on Wikipedia. The edit summary is a limitation of the tool I'm working with (AWB external script). A feature has been requested for it. In the meantime I made a detailed page explaining what the bot does. -- GreenC 21:17, 14 September 2016 (UTC)[reply]
I'm sorry that you haven't got time to add edit summaries to your changes, but edit summaries are important to other editors. There's some helpful advice at Help:Edit summary. You might want to consider editing normally until you've got your feature sorted... 21:20, 14 September 2016 (UTC)
"Haven't got time" is not what I said. This is request with the AWB development team. The request is over 2 years old. If you know some way to speed up that process that would be great. -- GreenC 21:23, 14 September 2016 (UTC)[reply]
Hchc2009, the edit summary contains a link to User:Green Cardamom/WaybackMedic 2 which explains the bot's function in detail, I don't see any problem here.— TAnthonyTalk 21:26, 14 September 2016 (UTC)[reply]

That's not a very helpful edit summary, however, as it still forces other editors to click on the difference to find out what Green Cardamon has actually done... I'm not entirely convinced that saying "I know there's been a problem for two years but I'm carrying on doing it anyway" is ideal. Hchc2009 (talk) 21:30, 14 September 2016 (UTC)[reply]

GC is not the only bot using AWB, so I'm not sure if you are expecting all such tasks to wait for years until someone programs a fix? Regardless, the bot was approved to perform this task as-is and I didn't have a problem figuring out what it was doing by simply looking at the edit. Sorry you had to do an extra click, but I don't see how GC could have adequately described the scope of the edit in a single line of edit summary.— TAnthonyTalk 23:35, 14 September 2016 (UTC)[reply]
PS: sorry GC if this discussion is stopping you from reactivating the bot! I'm done.— TAnthonyTalk 23:36, 14 September 2016 (UTC)[reply]
A little late, but thanks for your help here TAnthony. In fact I disabled the disable feature after the above editor kept posting, so it wasn't disrupting anything. -- GreenC 16:15, 10 October 2016 (UTC)[reply]