User talk:Cyberpower678/Archive 34

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Chicago—a failed rescue of a dead link—U.S. Census Bureau search results require requesting a "bookmark" permanent URL

When tables generated by a search of the U.S. Census Bureau site are cited with a URL copied from the page where the search results are viewed, the URL is transient and not permanently valid. There is a button on the table display page (or one page back) to request a "bookmark" URL, which is a permanent link. If I remember correctly, this holds throughout the U.S. Census website for all searches that return tables.

The instance in the article Chicago is a cite at the end of the last sentence of the first paragraph in the body. The permanent "bookmark" URL is [1], and the table title is 'Estimates of Resident Population Change and Rankings: July 1, 2014 to July 1, 2015 - United States -- Metropolitan Statistical Area; and for Puerto Rico 2015 Population Estimates'.

Hope this helps in the never ending struggle to extract information despite the best efforts of ... — Neonorange (talk) 04:47, 30 May 2016 (UTC)

Unfortunately, there is nothing I can really do about that source. I would suggest you tag it with {{cbignore}}.—^cyberpower_Chat:Online 14:01, 30 May 2016 (UTC)

I had just retrieved and cited with a persistent link, but perhaps the bot could:

match the initial part of the URL string 'factfinder.census.gov' when inspecting a cite

then check for the presence of 'bkmk'

If 'bkmk' is not present, the link is not persistent

then the bot adds the {{cbignore}} template and adds, on the article talk page, "The bot can't fix this, but you can." plus the instructions for manually fixing the cite to the message the bot leaves anyway.
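The check proposed in the steps above can be sketched roughly as follows. This is only an illustration of the suggestion, not something Cyberbot does; the host name and the 'bkmk' marker are taken from the message above, and the function name `is_transient_census_url` is hypothetical.

```python
from urllib.parse import urlsplit

CENSUS_HOST = "factfinder.census.gov"  # host named in the suggestion above
BOOKMARK_MARKER = "bkmk"               # marker said to appear in permanent URLs


def is_transient_census_url(url: str) -> bool:
    """Return True for a FactFinder URL that lacks the 'bkmk' marker."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != CENSUS_HOST:
        return False  # not a FactFinder link, so this check does not apply
    return BOOKMARK_MARKER not in url
```

A cite for which this returns True would be the case where the suggestion asks for {{cbignore}} plus a talk page note with manual instructions.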

I'd offer help, but it would have to be an assembly language module, a route I'm sure you'd not like to choose B^{) — Neonorange (talk) 05:20, 31 May 2016 (UTC)

Sorry, but given the bot's scope, making a URL specific modification will just create a mess in the long run, in terms of code maintenance, so I'm going to have to decline your suggestion. Also the bot doesn't add cbignore just to tell itself to ignore the cite. That's rather pointless.—^cyberpower_{Chat:Limited Access} 02:46, 1 June 2016 (UTC)

Moving a section

Just curious, but why would the bot move a section down? Also, did you know that the Arctic hares are turning brown? Always a good sign. CambridgeBayWeather, Uqaqtuq (talk), Sunasuttuq 09:02, 31 May 2016 (UTC)

The consensus is that newer requests go on the bottom now, so it sorts the requests by age. Also, why are Arctic hares turning brown a good sign?—^cyberpower_{Chat:Limited Access} 02:48, 1 June 2016 (UTC)

bot request post

I think perhaps you coded the bot that displays current unblock requests. Please see this post I made: https://en.wikipedia.org/wiki/Wikipedia:Bot_requests#Bot_to_log_all_templated_talkpage_unblock_requests Would you be willing to advise whether this is straightforward/realistic as far as coding it/making it happen? 68.48.241.158 (talk) 11:55, 3 June 2016 (UTC)

I have no idea why the spacing is weird above. 68.48.241.158 (talk) 11:57, 3 June 2016 (UTC)

Cyberbot II repetitive edits at Yakeen (1969 film)

Not sure why, but User:Cyberbot II seemed to get stuck in a loop adding spaces at Yakeen (1969 film), and leaving notices on the talk page. I only spotted it because it showed up on the list of most edited talk pages this week! the wub "?!" 12:07, 3 June 2016 (UTC)

Bad links still being added

As a test, I removed cbignore from a few links that had the SQL batch updates.

The link in reference for article Bharti Airtel:

http://web.archive.org/web/20120315161508/http://www.ametw.com/free_news_AfricanOperators.html

Sequence of events:

Link added by CB (diff)
Link removed by WM (diff), add cbignore
Link included in SQL batch cb20151231-20160304.00001-10000.sql
Removed cbignore (diff) to see how CB would respond
Link re-added by CB (diff)

API results:

wget --header="Wayback-Api-Version: 2" --post-data="url=http://www.ametw.com/free_news_AfricanOperators%2Ehtml&closest=either&timestamp=20120315161508&tag=4&statuscodes=404&statuscodes=200&statuscodes=203&statuscodes=206" -q -O- "http://archive.org/wayback/available"

No snapshot available.
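For anyone who wants to repeat the query without wget, here is a minimal Python sketch that builds the same POST request. The endpoint, header, and form fields are copied from the wget command above and are not independently verified; the function name `build_availability_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://archive.org/wayback/available"  # endpoint from the wget command


def build_availability_request(url: str, timestamp: str) -> Request:
    """Build (but do not send) the POST request shown in the wget command."""
    fields = [
        ("url", url),
        ("closest", "either"),
        ("timestamp", timestamp),
        ("tag", "4"),
    ]
    # The wget command repeats the statuscodes field once per accepted code.
    fields += [("statuscodes", code) for code in ("404", "200", "203", "206")]
    body = urlencode(fields).encode("ascii")  # percent-encodes the target URL
    return Request(API_URL, data=body, headers={"Wayback-Api-Version": "2"})
```

Passing the returned object to `urllib.request.urlopen` would send it; because `data` is set, the request goes out as a POST.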

All evidence points to a problem in CB code, unless there is still a problem with the link still existing in the cache database. -- GreenC 16:22, 5 June 2016 (UTC)

It probably means it still exists on an article somewhere.—^cyberpower_Chat:Online 16:45, 5 June 2016 (UTC)

I don't understand because the link should have been removed from the cache database (set to "0" and "NULL"). And the API is returning no link available. -- GreenC 17:35, 5 June 2016 (UTC)

I mentioned a while ago that if Cyberbot sees an archive present on a source, it saves it. I'm saying there is probably an article out there that still has an archive set to that specific link, and Cyberbot is saving it.—^cyberpower_Chat:Online 17:38, 5 June 2016 (UTC)

I see. Sure enough here it is, in Airtel Africa (cb diff). Ok so do you think if WM catches up checking articles edited by CB (up to the present), only then it would be safe to remove cbignore? -- GreenC 17:57, 5 June 2016 (UTC)

Yes. It should be safe then. Cyberbot can traverse the entire wiki in a matter of hours now. I've been finding ways to repurpose the now unused reviewed column in the DB. I've been thinking of making it settable via an interface, so that the reviewed parameter essentially keeps the entry locked. But that's going to take a bit before it's ready.—^cyberpower_Chat:Online 18:06, 5 June 2016 (UTC)

The DB would need to be purged again before removing cbignore.—^cyberpower_Chat:Online 18:30, 5 June 2016 (UTC)

Yes, all queries, even the ones already done, since the cache has been reset by article data. WM will finish by tomorrow morning for the cb20151231-20160304 set, then will begin 20160305-present. One eventual problem: there is no way WM can ever fully catch up, so if possible CB might need to be shut down for a few hours while WM does the last few days. -- GreenC 18:48, 5 June 2016 (UTC)

Cyberbot completely ignores a source with cbignore. Instead of actually getting data from the parser, it returns an array with "ignore" = true set. During the DB data retrieval, it skips over data with that flag set.—^cyberpower_Chat:Online 19:20, 5 June 2016 (UTC)
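The behaviour described in the comment above can be pictured with a short Python sketch. Cyberbot itself is written in PHP and its real data structures are not shown in this thread, so the function names and the dictionary layout here are assumptions made purely for illustration.

```python
def parse_source(wikitext: str) -> dict:
    """Return parsed data for one source, or only an ignore flag."""
    if "{{cbignore" in wikitext.lower():
        return {"ignore": True}  # no further parsing for ignored sources
    return {"ignore": False, "text": wikitext}


def sources_needing_db_lookup(sources: list) -> list:
    """Parse every source and skip the ones flagged to be ignored."""
    parsed = [parse_source(s) for s in sources]
    return [p for p in parsed if not p["ignore"]]
```

In this sketch a source tagged with {{cbignore}} never reaches the database step at all, which matches the description that flagged data is skipped during DB retrieval.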

I understand: there's a field in the DB set to ignore on encountering a cbignore. Meaning the queries won't need to be loaded twice, since the DB wasn't modified based on links from other articles without the cbignore. -- GreenC 21:04, 5 June 2016 (UTC)

?—^cyberpower_Chat:Online 21:08, 5 June 2016 (UTC)

I was restating what I thought you said. It's quite possible I didn't understand what you said - but then it's possible you didn't understand what I said, either :) -- GreenC 22:41, 5 June 2016 (UTC)

I said that there is currently an unused field called 'reviewed' and I want to repurpose it to lock the entry from changes if set. So if different archive data is found on Wiki, it won't change the DB entry, and instead overrides the page data with the DB data. Setting it will be doable on an interface that still needs to be designed.—^cyberpower_Chat:Online 00:03, 6 June 2016 (UTC)
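A rough Python sketch of the planned 'reviewed' lock, as described in the comment above. The field names `reviewed` and `archive_url` and the function `reconcile` are assumptions for illustration; the feature was still being designed at the time of this discussion.

```python
def reconcile(db_entry: dict, page_archive_url: str) -> str:
    """Return the archive URL to use on the page, updating the DB if allowed."""
    if db_entry.get("reviewed"):
        # Locked entry: the DB data overrides whatever the page contains.
        return db_entry["archive_url"]
    # Unlocked entry: archive data found on the wiki is saved to the DB.
    db_entry["archive_url"] = page_archive_url
    return page_archive_url
```

With the lock unset, this reproduces the situation discussed earlier in the thread, where an archive URL still present in some article ends up back in the DB.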

As a side note, please don't forget to add yourself to the Skype room if you would like to join the meeting.—^cyberpower_Chat:Online 00:16, 6 June 2016 (UTC)

Cyberbot at RFPP

Regarding this edit, it says 4 remaining, but I only see 1 remaining (a request for unprotection). Are there some that wouldn't be picked up on (like the extended confirmed template)? --kelapstick^(bainuu) 04:48, 9 June 2016 (UTC)

It cannot yet detect extended confirmed protection.—^cyberpower_Chat:Online 13:14, 9 June 2016 (UTC)

Well, that seems to answer for one of them (I will be sure to tag for archiving when I see that). --kelapstick^(bainuu) 19:37, 9 June 2016 (UTC)

Improving Cyberbot?

Hi Cyberpower678, I wonder if it might be possible to adopt links to //timetravel.mementoweb.org/ vice //archive.org to get a more general (and robust) result? The major web archives seem to have moved to the Memento Protocol. {{memento}} might be of interest. Cheers, LeadSongDog come howl! 22:19, 9 June 2016 (UTC)

A barnstar for you!

	The Civility Barnstar
	Hello Hmemberguy (talk) 17:39, 12 June 2016 (UTC)

Crumpling of templates

Dear colleague,
I have two questions. When will you repair all the templates crumpled by your bot, and don't you think that half-done bots should not be launched? Stas (talk) 09:36, 13 June 2016 (UTC)

User:Cyberbot I/Current AfDs appears to be not updating for a couple of weeks?

User:Cyberbot I/Current AfDs appears to be not updating for a couple of weeks? --SmokeyJoe (talk) 00:52, 17 June 2016 (UTC)

Not a biggie, but...

This edit summary seems incorrect/misleading. --Dweller (talk) Become old fashioned! 13:26, 15 June 2016 (UTC)

That's ancient. The edit summary has changed since.—^cyberpower_Chat:Online 14:52, 17 June 2016 (UTC)

Bot adding empty archive URL

The bot seems to be adding an empty archive URL with the date of January 1, 1970 when it can't find an archived copy. See here. – nyuszika7h (talk) 21:43, 15 June 2016 (UTC)

That date is the epoch for Unix time, or "the zero-second", so it does make some sense. LeadSongDog come howl! 21:57, 15 June 2016 (UTC)

Yeah, I know that. Anyway, I've disabled the task for now. nyuszika7h (talk) 21:58, 15 June 2016 (UTC)

Hmm, how interesting. I was certain I had fixed that a while back. I'll have to dig deeper to find the root cause.—^cyberpower_Chat:Online 14:53, 17 June 2016 (UTC)
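To illustrate the epoch point made in this thread: a missing or zero timestamp, if formatted as a date, comes out as January 1, 1970. The sketch below shows that and one possible guard. Whether this is the actual root cause in Cyberbot is not established here, and the function name `archive_date` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def archive_date(unix_seconds: Optional[int]) -> Optional[str]:
    """Format a snapshot time as YYYY-MM-DD, or return None when there is none."""
    if not unix_seconds:
        return None  # treat 0/None as "no snapshot" instead of formatting it
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).date().isoformat()


# Formatting a zero timestamp without the guard yields the Unix epoch date.
epoch_date = datetime.fromtimestamp(0, tz=timezone.utc).date().isoformat()
```

A caller that receives None can then leave the archive parameters out of the cite entirely rather than writing an empty URL with an epoch date.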

Citation bot for links that are not yet dead

Hey CP, I think I brought this up before when the InternetArchiveBot project was first starting, but I wanted to reiterate that it would be really great if there was some flag (e.g., |archiveme=) I could add to my citations such that I could have the citation bot archive/save it for me. Having this kind of task automated would save me hours a year, multiplied times many other Wikipedians... Consider it? czar 20:20, 12 June 2016 (UTC)

(talk page stalker) Hours? How many references do you create each year? –Compassionate727 ^(T·C) 22:26, 12 June 2016 (UTC)

A freaking lot. For a user who is fast, unautomated, and remembers the CS1 parameters for the citation, it takes about a minute a citation. For a user, such as myself, who throws the citation to WebCite via the omnibar and has text expansion to fill out the CS1 details, it takes about 20 seconds a citation, though you can shave a few seconds off by multitasking in another window. But even better is not thinking about archiving your citations at all (or worrying about the Internet Archive incompatibilities, the WebCite errors, and having the whole process automated), especially now that I'm moving to Zotero/Citoid expansion. Let's say I do ten full citations on a bad week—here's the obligatory math. But that doesn't count all the bare URLs I leave for someone else to clean up when I need to dump and move on. czar 23:47, 12 June 2016 (UTC)

Sorry. I've been sick with a parasite for the last 8 days, and I still am, though starting to feel better. I'm not sure what you are referring to when you want me to archive/save the link. If you mean ensure it is available in the Wayback Machine for future use, IA already runs a quiet bot that archives all newly added URLs to the site. If you mean you would like Cyberbot to automatically attach archive URLs to your citations, sorry, that will not be a feature that will be implemented, as that is a site wide config change, and when the latest BRFA gets approved Cyberbot will automatically be rescuing/tagging all untagged dead links in the near foreseeable future.—^cyberpower_Chat:Online 14:46, 17 June 2016 (UTC)

Related question: why is the bot "rescuing" links like that, which are valid links to archived versions? They do not need to be rescued... The Banner talk 12:31, 15 June 2016 (UTC)

archive.org is an archive; it belongs in the archiveurl field per the template instructions. The url field is for the original link (non-archive.org). The bot did the right thing moving the archive.org link to the archiveurl field. There are reasons for this; it is not arbitrary. -- GreenC 14:15, 15 June 2016 (UTC)

Let me guess, template was changed and nobody communicated that with the actual users of the template? The Banner talk 17:49, 15 June 2016 (UTC)

This is one of the many silly things I have encountered on ENWP. Too often you see completely useless edits, just because it is technically correct but then it lacks all common sense. The Banner talk 17:58, 15 June 2016 (UTC)

But just leave it this way, I am unwilling to start fighting purists. The Banner talk 18:01, 15 June 2016 (UTC)

There is actually an RfC over this and it had a near unanimous outcome favoring Cyberbot's actions, and no this is not as a result of a recent template change.—^cyberpower_Chat:Online 14:46, 17 June 2016 (UTC)

@The Banner: The default behavior of the template, unless |deadurl=no is specified, is to make the archived URL the primary link. All it does is add a note "Archived from the original on [date]" (and a link to the original version, though I think Cyberbot has started using |deadurl=unfit, so that link may not always be present). nyuszika7h (talk) 15:00, 17 June 2016 (UTC)
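The move discussed in this thread, from an archive.org link in |url= to separate |url= and |archiveurl= values, can be sketched in Python as below. This is an illustration only, not Cyberbot's code; it assumes the common Wayback form `web.archive.org/web/<14-digit timestamp>/<original URL>`, and the function name `split_wayback_url` is hypothetical.

```python
import re
from typing import Optional

# Assumes a full 14-digit timestamp with no modifier suffix.
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(?P<ts>\d{14})/(?P<original>.+)$"
)


def split_wayback_url(url: str) -> Optional[dict]:
    """Split a Wayback link into cite parameters, or return None if it is not one."""
    m = WAYBACK_RE.match(url)
    if m is None:
        return None
    ts = m.group("ts")
    return {
        "url": m.group("original"),   # original link goes back into |url=
        "archiveurl": url,            # archive link moves to |archiveurl=
        "archivedate": f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}",
    }
```

Which |deadurl= value should accompany the moved link is a separate question that comes up further down this page.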

Question about Cyberbot II

Will Cyberbot II ignore links that have been tagged with {{dead link|fix-attempted=yes}}, which is used to indicate that a user has attempted to find an archive and failed, or do I also need to include {{cbignore}}? –Compassionate727 ^(T·C) 19:22, 11 June 2016 (UTC)

Hi, quick question regarding the Joint Ground Based Air Defence Command page on Wikipedia. I noticed that you had updated the page and was wondering if you knew how to change the page name. I served in the HQ until recently, and one thing that has been really bugging me about this page is that the name is wrong. It is currently listed as Joint Ground Based Air Defence Command; however, it is actually just called Joint Ground Based Air Defence or Joint Ground Based Command HQ and is certainly not a 'Command', which is a term in the British Military reserved for much larger organisations and a higher-ranking officer's command. Your help would be much appreciated. — Preceding unsigned comment added by 95.144.105.238 (talk) 19:57, 14 June 2016 (UTC)

(talk page stalker) Logged in users can use the page move tool, but I imagine this might be controversial. See WP:RM#Requesting a single page move for instructions on how to start a discussion. —Compassionate727 ^(T·C) 17:29, 15 June 2016 (UTC)

Sorry. I've been sick with a parasite for the last 8 days, and I still am, though starting to feel better. The parameter you ask about was added specifically for Cyberbot, but Cyberbot does not yet have that function. I will be implementing it in the near future, once I'm not sick.—^cyberpower_Chat:Online 14:04, 17 June 2016 (UTC)

So for now, I do need to still use {{cbignore}} to prevent Cyberbot from making changes? —Compassionate727 ^(T·C) 15:43, 17 June 2016 (UTC)

Right now, the bot is disabled, so no. I'll deploy the feature with my next update.—^cyberpower_Chat:Online 15:54, 17 June 2016 (UTC)

Typo in Cyberbot source check notification

I don't think this is what you intended – User:Cyberpower678/FaQs#InternetArchiveBot*this simple FaQ – that should be a pipe character there. nyuszika7h (talk) 08:22, 14 June 2016 (UTC)

Also, minor thing, consider substing {{plural}} – this should do the trick: {{<includeonly>subst:</includeonly>plural|1|one external link|1 external links}}. nyuszika7h (talk) 08:25, 14 June 2016 (UTC)

Fixed the typo as seen here, but unfortunately, using the subst: command will result in the substitution happening on the config page, and your suggestion won't work at all, sorry.—^cyberpower_Chat:Online 14:51, 17 June 2016 (UTC)

@Cyberpower678: I didn't realize the template had different semantics from the magic word, perhaps one external link (subtle difference) will work. nyuszika7h (talk) 14:56, 17 June 2016 (UTC)

Actually that wasn't the point I was driving at. As you can see, the text gets pulled from a config page, and is copied out replacing the magic word placeholders with meaningful data. I also removed your nowiki, and it didn't work either. Sorry.—^cyberpower_Chat:Online 15:01, 17 June 2016 (UTC)

The nowiki was just for demonstration (it's the output you see that matters, ignore the code and nowiki tags). Anyway, I see the problem. Perhaps you could wrap the whole thing (after the explanation paragraph) in <pre><nowiki> (and then remove the bullet points as they won't show up as actual list items anyway), but it's not a big deal. nyuszika7h (talk) 15:11, 17 June 2016 (UTC)

Sorry, but that will most likely cause parsing problems when parsing the config. Bullets aside, I'm hesitant to wrap everything in nowiki, as it makes the config page unreadable, and also, I prefer the text settings to parse some of the static output, so users configuring can see the rendered output.—^cyberpower_{Chat:Limited Access} 16:01, 17 June 2016 (UTC)

Crumpling of templates (2nd attempt)

Dear colleague,
I have two questions. When will you restore all the templates crumpled by your bot, and don't you think that half-done bots should not be launched? Stas (talk) 09:36, 13 June 2016 (UTC)
(restored an unanswered message from archive. Stas (talk) 16:10, 17 June 2016 (UTC))

As far as I am aware, there are currently no plans to uncrumple them, as the rendered output will not change. Also, yes, I agree that half-done bots should not be launched; Cyberbot, however, isn't one of them.—^cyberpower_{Chat:Limited Access} 16:12, 17 June 2016 (UTC)

|dead-url=unfit

From the documentation for |dead-url=:

When the original URL has been usurped for the purposes of spam, advertising, or is otherwise unsuitable, setting |dead-url=unfit or |dead-url=usurped will not link to the original URL in the rendered citation; |url= is still required.

At Atmosphere of Pluto, Cyberbot II made these edits: here and here.

In all cases, the |url= values that Cyberbot II declared to be unfit, are not in fact, unfit and are working correctly.

The unfit and usurped keywords were added to the cs1|2 templates to disable the URL link when it links to something clearly inappropriate. It appears from a quick look at the history of the article listed in this insource search that Cyberbot II is unconditionally declaring the url of every cs1|2 template that it touches to be a link to inappropriate material. This is inappropriate behavior. Please fix it.

I will modify Module:Citation/CS1 to add articles with |dead-url=unfit and |dead-url=usurped to a maintenance category so these templates are marked and can be inspected and repaired.

—Trappist the monk (talk) 10:46, 20 June 2016 (UTC)
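As a reading aid for this thread, the linking behaviour of |dead-url= can be summarised in a small Python sketch. It is a simplification based only on the documentation quoted above and the description of the default behaviour given earlier on this page, not the Module:Citation/CS1 code; the function name `rendered_links` and the returned keys are hypothetical.

```python
from typing import Optional


def rendered_links(url: str, archive_url: str, dead_url: str = "yes") -> dict:
    """Return which link the title uses and which link, if any, is secondary."""
    keyword = dead_url.lower()
    if keyword == "no":
        # Original is still live: it stays primary, the archive is secondary.
        return {"title_link": url, "secondary_link": archive_url}
    if keyword in ("unfit", "usurped"):
        # Archive is primary and the original URL is not linked at all.
        return {"title_link": archive_url, "secondary_link": None}
    # Default (dead): archive is primary, "the original" links to |url=.
    return {"title_link": archive_url, "secondary_link": url}
```

The dispute in this thread is about the middle branch: it hides the original link, which preserves the rendered output, but the keyword also asserts something about the original URL that the bot has not checked.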

Please inspect those edits more carefully. These edits were correct. Cyberbot moved the URL to the archiveurl parameter and placed the original URL in its place. That's why there is an unfit declaration there. Cyberbot is following cite template usage.—^cyberpower_{Chat:Limited Access} 11:48, 20 June 2016 (UTC)

I think the point is that |dead-url=unfit should not be used unless it's known to be replaced by a spam/inappropriate website. However, since the bot's configuration has VERIFY_DEAD = 0, it can't know the URL is actually alive, so the best it can do is |dead-url=yes. nyuszika7h (talk) 11:53, 20 June 2016 (UTC)

I think the real point is being missed here. Cyberbot functions to correct template formatting usage, by moving the archive url to the archiveurl parameter and leaving the original URL in the URL. An RfC with near unanimous outcome has determined this to be the correct way to use the template. To prevent accidental linking of the original URL when Cyberbot moves them, it was requested to use unfit in those cases to maintain a rendered direct link to the archive without inadvertently linking to the original, which wasn't happening in the first place. In other words, Cyberbot is correcting the usage of the template while not altering the link readers click through to when looking at the source. This behavior has been supported by a good majority of the community.—^cyberpower_{Chat:Limited Access} 11:58, 20 June 2016 (UTC)

Nobody has a problem with moving to |archiveurl= here. I checked the archive and realized the request was for cases where the |url= parameter contains an archive.org link. Since the bot is not configured to verify whether the links are actually dead as it's not possible to 100% reliably determine that, this is not really a big deal. The bot leaves a message on the talk page and humans can correct it anyway. nyuszika7h (talk) 13:14, 20 June 2016 (UTC)

Indeed, but to preserve current rendered output and prevent the original URL from being made clickable in the cite, Cyberbot automatically uses unfit to prevent the original URL from rendering in the cite. This is because the original URL wasn't being rendered either before Cyberbot touched the cite template. @Trappist the monk: Please understand Cyberbot isn't flagging every CS1/CS2 template it touches with unfit. Just the ones where it moves the archive url to the correct parameter. The final rendered result is still pretty much the same.—^cyberpower_Chat:Online 13:19, 20 June 2016 (UTC)

The bot is correctly moving an archive url to |archive-url= and leaving behind the original url in |url=. That is not in dispute. That the bot then declares any such changes to be 'unfit' is a determination that it should not be making without confirming that the original cite is in fact unfit. To do so goes against the template documentation that I quoted in my initial post. The |url= values that the bot declared to be unfit at Atmosphere of Pluto are not dead, and are not unfit according to the definition of unfit in the template documentation.

While it may be desirable to preserve current rendered output, that motivation should not be used as an excuse to misuse template parameters. Editors reading the raw template should not be misled into thinking that working or dead |url= values are unfit.

Semantics were deemed to be important during the keyword selection process. That process discussion began here and concluded here. You can see when you read those discussions that the door is open to additional semantically appropriate keywords that accomplish the same thing as |dead-url=unfit and which will not mislead editors. Until such a keyword is proposed and adopted, Cyberbot II should discontinue adding |dead-url=unfit to cs1|2 templates that it touches unless it knows that the url in |url= is unfit.

—Trappist the monk (talk) 14:33, 20 June 2016 (UTC)

A new keyword could be something like deadurl=possibly-unfit / maybe-unfit / bot-unfit or .. However these are kind of fiddly and unclear. Another option is something more general such as deadurl=unknown which would behave the same way as unfit since the link status is unknown. It would basically flag it for cleanup ie. manual determination of status needed. -- GreenC 15:27, 20 June 2016 (UTC)

Please also see this unanimous RfC that supports the bot's actions. There is consensus for this. And unless there is a discussion somewhere else that agrees I shouldn't do this, I'm not going to unilaterally reprogram the bot because the word unfit was chosen to hide the original URL from the rendered output. If you had hidden as a keyword that does the same thing as unfit, I will be more than happy to switch to that. When the bot moves an archive link to the correct spot, the fact that the archive was being used over the original implies the link was assumed dead, and originally, it set the deadurl parameter to yes, before I started getting complaints, and the advice to use unfit instead.

I try to conform the bot to meet community expectations as best as I can, obviously I will not always be successful.—^cyberpower_Chat:Online 15:33, 20 June 2016 (UTC)

Not all that surprising to see a unanimous RfC when the two options are 1) clearly wrong and 2) mostly right, so, yeah, of course editors achieved consensus. They chose the lesser of two evils; a flawed choice, but the only one that they could make.

Must the keyword be hidden or will you accept some other keyword with a less nebulous meaning – Editor Green Cardamom's unknown, for example?

It is not clear to me that the use of an archival url in |url= necessarily implies that the original url is dead. It may be. But, as the fixed cs1 templates at Atmosphere of Pluto illustrate, relying on that 'implication' is problematic. Except that the original url returns an http error code, without a comparison of the original to the archive, the original cannot be known to be dead, live, or unfit.

I have made no statement suggesting that you do not try to conform the bot to meet community expectations. Please do not rise in defense of an attack that I have not made.

—Trappist the monk (talk) 18:53, 20 June 2016 (UTC)

My apologies, my statement about conforming to consensus was more general and not meant to imply you were attacking me. You may choose any keyword you like as long as it functions the same as unfit. From my point of view, if an archive is being used to source something, then the original source is too unstable to be considered alive reliably. The bot will assume dead in such cases, in an effort to conserve resources and not have to do needless checks.—^cyberpower_Chat:Online 19:55, 20 June 2016 (UTC)

User:Cyberbot II/Dead-Links Log

This page has more than 5,000 revisions so it isn't possible for normal admins to delete it - it has to be done by a steward. I've filed a request for one to do so here. Hut 8.5 21:27, 20 June 2016 (UTC)

Kidnapping of Jaycee Dugard citation archiving needed

Hi, Cyberpower678. I just completed getting Kidnapping of Jaycee Dugard to WP:Good Article status and I would like to have all the citations archived to prevent future WP:linkrot. Right now all the citations are live and they are all solid. Ping me back. Cheers! {{u|Checkingfax}} {Talk} 05:59, 17 June 2016 (UTC)

IA automatically archives any new source that appears on Wikipedia. No need to worry here. :p—^cyberpower_Chat:Online 12:00, 20 June 2016 (UTC)

Hi, Cyberpower678. Ping me back on this. The sources have mostly been up since 2009. Can you please have IA do a run-through? Cheers! {{u|Checkingfax}} {Talk} 07:26, 21 June 2016 (UTC)

Like I said IA takes care of this automatically. No need to do anything.—^cyberpower_Chat:Offline 10:50, 21 June 2016 (UTC)

RFPP task is down

Small break from Cyberbot II doing bad things for you here, for some reason the RFPP clerking task on Cyberbot I is down, can't restart it because it requires me to be within the cyberbot group in order to restart the instance. No logging because that went away (still need that back btw, very helpful for diagnosing issues) so just give it a poke, get it back into action. tutterMouse (talk) 15:09, 22 June 2016 (UTC)

Rebooted.—^cyberpower_Chat:Online 15:12, 22 June 2016 (UTC)

Thank you. --NeilN ^{talk to me} 15:23, 22 June 2016 (UTC)

I fetched this from the logs.

PHP Fatal error: Uncaught exception 'BadTitle' with message 'Invalid title: Kim Min-hee (actress, born 1982)|' in /home/cyberpower678/Peachy/Includes/Page.php:2053

—^cyberpower_Chat:Online 15:53, 22 June 2016 (UTC)
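The title in the logged error ends with a `|`, which is one of the characters that a default MediaWiki configuration does not allow in page titles. A minimal Python sketch of a pre-check that would flag such a title before it is handed to the framework follows. This is not the Peachy check that raised the exception, and the function name `invalid_title_chars` is hypothetical.

```python
# Characters not permitted in page titles under MediaWiki's default settings.
ILLEGAL_TITLE_CHARS = set("#<>[]|{}")


def invalid_title_chars(title: str) -> list:
    """Return the illegal characters found in a title, sorted; empty if none."""
    return sorted(set(title) & ILLEGAL_TITLE_CHARS)
```

A caller could log and skip any request whose title yields a non-empty list, instead of letting one malformed entry stop the whole task.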

Y'see, if we had public logs I'd have known that and fixed it... maybe. tutterMouse (talk) 16:40, 22 June 2016 (UTC)

Sorry, but setting up a web server simply to publish logs, is low on the priority list. :P—^cyberpower_{Chat:Limited Access} 16:44, 22 June 2016 (UTC)

I blame the labs of course, but it's not like I have options to fix it if I or anyone else needs to come to you for really simple fixes, so it's more just wanting ways to manage the bot for small things and limit your involvement so you can focus on more important stuff unless something goes incredibly wrong. The reset link is good but sadly useless to most anyone, and no logging means minor issues like this can't be easily diagnosed, so it's sad that it's that low on the list, but it's pretty low maintenance in general I guess. tutterMouse (talk) 16:48, 22 June 2016 (UTC)

You can't blame labs in this case. I moved Cyberbot off of tool labs, which has pretty much everything handed to you, and moved to a separate project on labs where I have to set everything up on my own, but in exchange have total control over my resources.—^cyberpower_{Chat:Limited Access} 16:51, 22 June 2016 (UTC)

Okay, doesn't really make anything better if a problem happens though, does it? Nobody else can diagnose an issue but you, nobody can kick things back into action but you and I don't want nor even like bugging you when something messes up and you need to fix it, that's time out of your busy day and that's exactly why I requested such features in the first place. If you want that sort of control then okay but I still feel allowing others to be able to know what's going on should anything go wrong is still beneficial. tutterMouse (talk) 17:15, 22 June 2016 (UTC)

It's not that I want total control over everything, I just haven't gotten around to setting them up yet. When I moved off of toollabs, it was so I can make use of more resources I need, and toollabs was too restrictive in that regard.—^cyberpower_{Chat:Limited Access} 17:18, 22 June 2016 (UTC)

I know, I'm really not accusing you of wanting control of the whole thing here, but I also know you're busy too and may not get to them. Just that they're considerations too, and I get it if you need to push them down the to-do list in favour of having more control over what can be done. You'll get to it, I'm sure. tutterMouse (talk) 17:40, 22 June 2016 (UTC)

Citation bot for links that are not yet dead cont'd

User talk:Cyberpower678/Archive 34#Citation bot for links that are not yet dead

Ah, the last thread was archived before I saw your replies. Hope you're feeling better.

IA already runs a quiet bot that archives all newly added URLs to the site

Do you know where I can read more about this? Did they announce it somewhere or explain how it works?

If you mean you would like Cyberbot to automatically attach archive URLs to your citations, sorry, that will not be a feature that will be implemented, as that is a site wide config change

To be clear, I meant to ask whether the bot could run when an editor adds a template or some flag specifically asks the bot to crawl the page/citation. If it's still outside your scope, what would you recommend as a next step towards getting such a bot? Should I copy the bot code, configure and propose it myself? As I mentioned in the previous thread, it would save a heck of a lot of time on citation expansions. (And by sitewide config change do you mean that it would require something difficult or time-consuming on the programming end or on the permissions end?) czar 06:29, 22 June 2016 (UTC)

So if I understand your query correctly, all you are interested in is making sure that the new citations are added into the payback machine correct? IA told me they have a bot that patrols recent changes, and automatically initiates the archiving process of new URLs encountered. All that bot does is essentially ensure that the sources are in the wayback machine, respecting robots.txt of course. So I don't think more needs to be done here. As for where the documentation is, I have no clue, this is something IA told me personally.

As for Cyberbot automatically attaching archive urls, it can do that with any source, even non-dead ones, but there's no way to call it on demand. Cyberbot has a configuration page on wiki, where a variable can be adjusted to expand it's operation to non-dead sources, but that will result in every source being modified, and I'm sure the community would skin me if I arbitrarily set the variable to that respective value, especially considering how fast Cyberbot operates.—^cyberpower_{Chat:Limited Access} 02:23, 23 June 2016 (UTC)

I wonder about the IA bot's activities because I've gone to archive many recent links and haven't seen previous archives in the Wayback Machine. It would be nice to know (from them) how it works so we can better arrange our own archiving. Re: the second point, surely there has to be some middle ground? There are dead links, there are links-not-yet-dead, and then there are citations made by users who want the archiveurl added to the citation without manually doing it themselves (or waiting for the link to go dead for Cyberbot to try to save it). Perhaps I'm confusing it with another bot, but wouldn't it be possible to propose something as simple as a dummy template (e.g., {{archive this citation}}) to place in the same part of the citation where {{dead link}} would normally go? The template could contain nothing but indicate to Cyberbot that an editor has requested the citation's archiveurl to be cached and added to the citation. Wouldn't that be very similar to how the bot is looking for {{dead link}} tags in the same position of the template? Would that be even remotely controversial at the bot requests board? (An alternative to {{archive this citation}} could be putting something in the parameter itself: |deadurl=bot.) Thanks for hearing me out. czar 02:54, 23 June 2016 (UTC)

Given the current bot code, it would take a bit of rewriting, but I imagine it could be done. I'll have to think how to nicely code it. BTW, you really don't want to copy IABot. Its code is very large and complex.—^cyberpower_{Chat:Limited Access} 03:01, 23 June 2016 (UTC)

In case you're interested. That code is the entire framework for IABot.
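For the {{archive this citation}} idea raised above, detection of the request could look roughly like the sketch below. This is not IABot's code (which is PHP and far larger); the template name is czar's hypothetical one, `requested_citations` is an illustrative helper, and the pattern assumes the marker directly follows a flat cite template with no nested templates inside it.

```python
import re

# A cite template immediately followed by the (hypothetical) request marker.
# [^{}]* deliberately refuses nested templates to keep the sketch simple.
REQUEST = re.compile(
    r"(\{\{\s*cite[^{}]*\}\})\s*\{\{\s*archive this citation\s*\}\}",
    re.IGNORECASE,
)

def requested_citations(wikitext):
    """Return the cite templates an editor has flagged for archiving."""
    return [m.group(1) for m in REQUEST.finditer(wikitext)]
```

Only flagged citations are returned, so unflagged live links would be left alone, which is the middle ground being asked for.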

VGTM

About this edit: the link is working fine.--Vin09 (talk) 09:11, 23 June 2016 (UTC)

Please read the provided FaQ.—^cyberpower_Chat:Online 15:37, 23 June 2016 (UTC)

Hello C. The bot did not leave a message on Talk:VGTM Urban Development Authority, so Vin09 won't know what FAQ you are talking about. I will add the link to this message for V's convenience: User:Cyberpower678/FaQs#InternetArchiveBot. Cheers to you both MarnetteD|Talk 15:57, 23 June 2016 (UTC)

IABot error

IABot repaired a dead link in this diff, but the URL was mangled. You can see the repair in this diff. I am reporting this here as requested in the FAQ. generic_hipster 17:32, 23 June 2016 (UTC)

The underlying bug that resulted in mangled archives being saved should be fixed. This is a result of lingering bad data hanging in the DB. By fixing the link on wiki, it should fix itself on the DB.—^cyberpower_{Chat:Limited Access} 17:59, 23 June 2016 (UTC)

That's good news re: it should fix itself after being fixed in the wiki database. Since this is a simple search/replace, I found it's trivial to fix these using AWB (replace) and an offline tool to regex the Wikipedia database dump (search) for articles containing the error. The first run was on a dump from September 1, 2015 and it found and fixed 142. The latest dump will be July 1, 2016 -- will see how many are left. -- GreenC 14:32, 24 June 2016 (UTC)

The reviewed column is now in use too. When set to 1 for a given entry, the archive can not be changed in the DB onwiki. I've fixed a couple of entries manually, and locked them in with the reviewed bit already to prevent some bad cites from overwriting the archive information in the DB.—^cyberpower_Chat:Online 14:54, 24 June 2016 (UTC)

Ok.. great. Do we need to lock in the changes made by WM? They were modification of snapshot date, and deletion of archive. I can easily make new sql queries like before, but don't know the review field name (assuming that is the right course of action). -- GreenC 15:43, 24 June 2016 (UTC)

If you would like, that would be of great help. Just add `reviewed` = 1 to the list of values being set. You don't even need a switch case, and those pre-existing timestamp bugs can be fixed too if possible?—^cyberpower_Chat:Online 15:48, 24 June 2016 (UTC)
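A rough sketch of what such a batch might look like, run here against an in-memory SQLite database so it can be tried without touching the bot's DB. The table name `externallinks_global` and the `url`, `archive_url` and `reviewed` columns are the ones mentioned in this archive; the rest of the schema and the helper names `lock_reviewed` and `demo_db` are assumptions for illustration, and the production database's dialect and layout may differ.

```python
import sqlite3

def lock_reviewed(conn, fixes):
    """Write corrected archive URLs and set reviewed = 1 in the same UPDATE.

    `fixes` maps an original URL to its corrected archive URL.
    Returns the number of rows changed.
    """
    changed = 0
    with conn:  # commit on success, roll back on error
        for url, archive_url in fixes.items():
            cur = conn.execute(
                "UPDATE externallinks_global "
                "SET archive_url = ?, reviewed = 1 "
                "WHERE url = ?",
                (archive_url, url),
            )
            changed += cur.rowcount
    return changed

def demo_db():
    """Build a tiny stand-in table (assumed schema, not the bot's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE externallinks_global ("
        "url_id INTEGER PRIMARY KEY, url TEXT, archive_url TEXT, "
        "reviewed INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO externallinks_global (url, archive_url) VALUES (?, ?)",
        [("http://example.com/a", "bad"), ("http://example.com/b", "ok")],
    )
    return conn
```

Setting the flag in the same statement as the fix means there is no window in which a corrected row sits unlocked.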

You mean the bug where the API returns an invalid date. Starting with batch 4, I modified the source to detect those cases and scrape the page for the valid date which exists in the HTML, so that problem basically disappeared after the third batch (AFAIK). There were only 7 or 8 links you identified in the first few batches. I could manually build a query for those. -- GreenC 16:27, 24 June 2016 (UTC)

If it's not too much trouble. I'm refining archive detection, to support more than just the wayback machine.—^cyberpower_{Chat:Limited Access} 16:30, 24 June 2016 (UTC)

I don't think I identified all of batch 3 problems. It might be worth scraping for the urls that have a bad timestamp.—^cyberpower_{Chat:Limited Access} 16:31, 24 June 2016 (UTC)

a heads-up

Your bot unnecessarily reformatted the reference it rewrote here. Bots like yours should be as conservative as possible in avoiding unnecessary changes, as a failure to do so erodes the utility of our revision control system. Geo Swan (talk) 08:35, 24 June 2016 (UTC)

It corrected the usage of the cite templates. It is part of the bot's function.—^cyberpower_Chat:Offline 09:04, 24 June 2016 (UTC)

Well, it might be nice if it preserved the existing formatting (personally I don't like the multiline format, but that's irrelevant) and the date format (ISO is acceptable for archive dates), but I don't think that's a serious issue, it can be corrected by human editors, as there is a notification on the talk page. nyuszika7h (talk) 10:45, 24 June 2016 (UTC)

I've been getting complaints about Cyberbot crumpling templates. So now, if a template is multiline, Cyberbot keeps it multiline, otherwise it makes it all inline. As for date formatting, Cyberbot follows date formatting tags on articles. By default it follows {{usemdy}} unless {{usedmy}} is on the article. If there are other date tags, I can certainly add support for those.—^cyberpower_Chat:Online 15:04, 24 June 2016 (UTC)
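The two behaviours described here (keep multiline templates multiline, and pick mdy unless a dmy tag is present) can be sketched as follows. This is an illustration only, not Cyberbot's PHP; the function names, the exact spacing of the rendered template, and the tag pattern are assumptions.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Matches {{usedmy}} as well as {{Use dmy dates|date=...}} (assumed spellings).
DMY_TAG = re.compile(
    r"\{\{\s*use[ _]?dmy(?:[ _]dates)?\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)

def format_archive_date(article_wikitext, d):
    """Use dmy if the article carries a dmy tag, otherwise default to mdy."""
    month = MONTHS[d.month - 1]
    if DMY_TAG.search(article_wikitext):
        return f"{d.day} {month} {d.year}"
    return f"{month} {d.day}, {d.year}"

def render_template(name, params, original):
    """Re-emit a template, keeping it multiline only if it was multiline."""
    if "\n" in original.strip():
        body = "".join(f"\n| {k} = {v}" for k, v in params.items())
        return "{{" + name + body + "\n}}"
    body = "".join(f" |{k}={v}" for k, v in params.items())
    return "{{" + name + body + "}}"
```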

I don't think there are other date tags, but ISO dates in access and archive dates should probably be left alone, at least if consistently used in the article. nyuszika7h (talk) 15:07, 24 June 2016 (UTC)

There's no way for Cyberbot to reliably know that if a date format tag is absent. That is what those tags are for.—^cyberpower_Chat:Online 15:09, 24 June 2016 (UTC)

(edit conflict)With that being said, it would be trivial to create a date format tag and apply it to articles. It would also be trivial to implement it into Cyberbot. It has a really elegant date format handler.—^cyberpower_Chat:Online 15:15, 24 June 2016 (UTC)

What Nyuszika7H said – {{usemdy}} and {{usedmy}} tags don't necessarily apply to ISO dates in 'accessdates' and 'archivedates' (in fact, they often don't). This isn't a huge thing, but it is worth pointing out... --IJBall (contribs • talk) 15:12, 24 June 2016 (UTC)

Then what's the point of those tags then? Now I'm confused.—^cyberpower_Chat:Online 15:15, 24 June 2016 (UTC)

This is all covered under MOS:DATEUNIFY – basically, article date formats and reference pub. date formats are usually the same (though, again, not always), but ref. accessdates are often in ISO date format even if the article's date and ref. pub. dates are in 'mdy' or 'dmy' date formats. --IJBall (contribs • talk) 15:18, 24 June 2016 (UTC)

Also, it says this at {{Use mdy dates}}: "In general, the date format used for publication dates within references should match that used within the article body. However, it is common practice for archive and access dates to use the alternative ymd format. This usage is valid and is specifically mentioned at MOSDATE. In those cases, the archive and access date formats should not be altered when fixing dates." nyuszika7h (talk) 15:21, 24 June 2016 (UTC)

(edit conflict)Grunt. This does seem to conflict with how the {{wayback}} template works which operates on mdy by default unless the df parameter is set making it use dmy.—^cyberpower_Chat:Online 15:23, 24 June 2016 (UTC)

So what should I do in this case?—^cyberpower_Chat:Online 15:23, 24 June 2016 (UTC)

That template is a different case, I would say just use it according to the tag on the article as it does not support ISO dates, but other ISO access and archive dates should be just left alone (for someone else to clean up if they are used inconsistently). nyuszika7h (talk) 15:36, 24 June 2016 (UTC)

I wouldn't even know how to reliably extract the time format and keep it consistent. PHP can easily format timestamps into any format, but it can only convert formatted timestamps back into epoch time, AFAIK.—^cyberpower_Chat:Online 15:45, 24 June 2016 (UTC)

If the issue is converting formats, another option is the new |df= parameter in CS1. Trappist, a regular on the CS1 help page, also knows about functions that convert dates. czar 19:58, 24 June 2016 (UTC)
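On the question of extracting an existing date format and keeping it consistent, one possible approach is to classify the citation's existing access date against a few known shapes and emit the archive date in the same shape. The sketch below assumes only three styles (ISO, dmy, mdy) with English month names, and all names in it are illustrative; anything it cannot classify falls back to a default rather than being guessed.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
_M = "|".join(MONTHS)

PATTERNS = {
    "iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "dmy": re.compile(rf"^\d{{1,2}} (?:{_M}) \d{{4}}$"),
    "mdy": re.compile(rf"^(?:{_M}) \d{{1,2}}, \d{{4}}$"),
}

def detect_style(text):
    """Return 'iso', 'dmy', 'mdy', or None if the string fits none of them."""
    for style, pattern in PATTERNS.items():
        if pattern.match(text.strip()):
            return style
    return None

def format_date(d, style):
    month = MONTHS[d.month - 1]
    if style == "iso":
        return d.isoformat()
    if style == "dmy":
        return f"{d.day} {month} {d.year}"
    return f"{month} {d.day}, {d.year}"

def archive_date_like(existing_accessdate, snapshot, fallback="mdy"):
    """Format `snapshot` in the same style as the citation's access date."""
    style = detect_style(existing_accessdate) or fallback
    return format_date(snapshot, style)
```

With this, a citation whose access date is `2016-06-01` would keep an ISO archive date, as MOS:DATEUNIFY allows, while the article tag would only decide the fallback.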

Level 1 warning on User_talk:Ansh666

Per discussion, I reverted the AfD close to relist, and the bot dropped a template on Ansh666's page before I even had a chance to relist the article. I've reverted the article page to before the warnings, and I am having an issue with the AfD relist, but the bot shouldn't have dropped the warning, unless there's a certain order to the process I missed. MSJapan (talk) 23:54, 24 June 2016 (UTC)

Quilting reference bot checks

Please check Talk:Quilting and tell me what should happen now, if anything.--DThomsen8 (talk) 00:13, 25 June 2016 (UTC)

Keith Holland (racing driver)

Hello, the bot just altered a source on the above and the new link (shown on the talk-page) goes to a generic page for all Channel 4 programmes. The programme concerned was in fact 12 years ago and is unlikely to still be found from the generic home-page. I have just checked the previous link here and it seems to be working. I'm not sure if it's OK to just go back to that link or if I've misunderstood something (not unusual). Thanks. Regards, Eagleash (talk) 19:27, 27 June 2016 (UTC)

Please read the provided FaQ.—^cyberpower_Chat:Online 19:45, 27 June 2016 (UTC)

Where do I find that? Eagleash (talk) 20:00, 27 June 2016 (UTC)

It's linked on the talk page message. I put it there for the common questions. But I'll just give you the short answer. You don't need to do anything. Cyberbot simply altered the formatting of the citation template. It still works as before. :-)—^cyberpower_Chat:Online 20:04, 27 June 2016 (UTC)

Ah... did look there but missed it. Good, yes all seems well. Cheers, Eagleash (talk) 20:09, 27 June 2016 (UTC)

WaybackMedic 2

Hi, WM2 will be making changes to existing archive.org links across the entire EN project. Examples of the three kinds of changes: the first is the same as WM1; the second is a case where the archive.org link doesn't work but the primary URL still works; the third is where the archive link works, but the URL syntax is normalized (https, /web/ and web.archive.org, remove :80). I know CB needs to know about the first case via SQL queries - will it also need to know for the second and third cases? -- GreenC 19:36, 28 June 2016 (UTC)
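The third kind of change (syntax normalization) might be sketched like this. It is not WaybackMedic's actual code: the pattern, the function name, and the reading of "remove :80" as dropping an explicit default port from the archive host or the embedded original URL are all assumptions. Snapshot timestamps carrying modifiers such as `id_` are not matched and are returned untouched.

```python
import re

WAYBACK = re.compile(
    r"^https?://(?:web\.|www\.)?archive\.org(?::80)?/(?:web/)?(\d{4,14})/(.+)$",
    re.IGNORECASE,
)

def normalize_wayback(url):
    """Rewrite a Wayback link to https://web.archive.org/web/<stamp>/<target>."""
    m = WAYBACK.match(url.strip())
    if not m:
        return url  # not a recognizable Wayback link; leave it alone
    stamp, target = m.groups()
    # Drop an explicit default port from the embedded http URL, if present.
    target = re.sub(r"^(http://[^/:]+):80(?=/|$)", r"\1", target,
                    flags=re.IGNORECASE)
    return f"https://web.archive.org/web/{stamp}/{target}"
```

For example, `http://archive.org/20160101000000/http://example.com:80/page` would become `https://web.archive.org/web/20160101000000/http://example.com/page`.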

It does need an SQL update to fix it. If you provide batches again and set the reviewed column for those entries to 1, it will keep that data locked, and bring the DB one step closer to being perfect.—^cyberpower_Chat:Online 19:41, 28 June 2016 (UTC)

Ok looks like case 1 and 2 are already being done, and I'll add case 3. Looking back at previous runs, I didn't do sql updates for fix #5 (URL with trailing . or , or etc). There were 104 links with that problem, though some were deleted as they were dead even after removing the trailing character. Example. How would a query be built for this case? I guess change the archive_url field, but not sure if the main url field changes. Or maybe not worry at all, since the url will presumably never show up again in other articles, since it was a typo unique to that article. -- GreenC 14:34, 29 June 2016 (UTC)

(talk page stalker) I'm not familiar with the bot's SQL schema, but something like url LIKE '%,' should catch URLs ending with a comma. nyuszika7h (talk) 16:15, 29 June 2016 (UTC)
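To make the `LIKE` suggestion concrete, here is a small sketch that lists URLs ending in a stray punctuation character. It runs against SQLite for convenience; the table name `externallinks_global` is the one given later in this thread, while the helper name and the default set of characters are assumptions.

```python
import sqlite3

def urls_with_trailing_punct(conn, chars=".,;"):
    """Return URLs whose last character is one of `chars`."""
    clauses = " OR ".join("url LIKE ?" for _ in chars)
    params = ["%" + c for c in chars]  # e.g. '%,' matches any URL ending in a comma
    rows = conn.execute(
        f"SELECT url FROM externallinks_global WHERE {clauses} ORDER BY url",
        params,
    )
    return [row[0] for row in rows]
```

The result is a candidate list only; a URL can legitimately end in a period, so the matches would still want a manual look.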

Any URL entry that is misformatted is best deleted from the DB altogether.—^cyberpower_Chat:Offline 10:41, 30 June 2016 (UTC)

Ok good idea. Would it be:

SET `url` = CASE `url`

WHEN 'http://www.nspcc...' THEN NULL

Or the command to delete a whole record?

DELETE from <dbname>

WHEN url = 'http://www.nspcc...'

-- GreenC 13:10, 30 June 2016 (UTC)

It would be DELETE FROM externallinks_global WHERE `url` IN ( 'url1', 'url2', .... );

However, the externallinks_enwiki table has a dependency on the global table. So those dependent entries would need to be purged too. To take care of those, we need the command

DELETE FROM externallinks_enwiki WHERE `url_id` IN (SELECT `url_id` FROM externallinks_global WHERE `url` IN ( 'url1', 'url2', ... ));

—^cyberpower_Chat:Online 14:00, 30 June 2016 (UTC)
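One detail worth spelling out: the dependent rows have to be deleted first, because the subquery looks the URLs up in the global table and finds nothing once those rows are gone. A minimal sketch of the two statements run in that order inside one transaction, using SQLite as a stand-in (the table and column names `externallinks_global`, `externallinks_enwiki`, `url` and `url_id` come from the commands above; the helper name is an assumption):

```python
import sqlite3

def purge_urls(conn, urls):
    """Delete malformed URLs from both tables, dependents first."""
    urls = list(urls)
    if not urls:
        return
    marks = ",".join("?" for _ in urls)
    with conn:  # one transaction: both deletes apply, or neither does
        conn.execute(
            "DELETE FROM externallinks_enwiki WHERE url_id IN ("
            f"SELECT url_id FROM externallinks_global WHERE url IN ({marks}))",
            urls,
        )
        conn.execute(
            f"DELETE FROM externallinks_global WHERE url IN ({marks})",
            urls,
        )
```

Passing the URLs as bound parameters rather than pasting them into the statement also avoids quoting problems with URLs that contain apostrophes.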

Cyberbot bug?

[2] deleted two other non URL based references when archiving one — Preceding unsigned comment added by Dresken (talk • contribs) 10:39, 30 June 2016 (UTC)

Stale

– This bug report is stale.

—^cyberpower_Chat:Online 14:01, 30 June 2016 (UTC)