Wikipedia:Link rot/URL change requests/Archives/2021/October


Gamasutra

The website http://www.gamasutra.com has gone offline permanently. Earlier this year, they rebranded themselves to Game Developer and changed their address to http://www.gamedeveloper.com. For a period of time, Gamasutra addresses were redirecting appropriately but that no longer seems to be the case. While some Gamasutra articles have been relocated to the new site, some have not, and those that were carried over are not formatted very well. I am requesting a bot run through Wikipedia and mark all http://www.gamasutra.com links as dead and append archive.org links as able. TarkusABtalk/contrib 21:52, 1 October 2021 (UTC)

Gamasutra articles are now redirecting appropriately it seems. Best option may be to just add archive links to the old Gamasutra versions if available, but not mark the links as dead. TarkusABtalk/contrib 12:01, 2 October 2021 (UTC)
@TarkusAB:, the bot is not designed to add archive links to live links, but we can be confident archives exist at Wayback, so if/when the old domain stops working it will be possible to add the archives then. Another option: the bot could determine what the redirected URL is and move the URL to it. This assumes you are confident the content at the new site is equivalent to the old site, for cite verification purposes. If the new site is significantly different, we should treat the old site as dead and add archives to lock in the old site content. What do you think is best? -- GreenC 16:52, 2 October 2021 (UTC)
Based on my spot check, the new versions seem to be generally satisfactory. However, I am not confident enough to move the URLs, and there are cases where the old site is more legible, as in:
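The "determine what the redirected URL is" step could look roughly like the following. This is a minimal sketch assuming Python and the requests library, not the bot's actual code, and the example Gamasutra URL is made up:

import requests

def resolve_redirect(url, timeout=15):
    """Follow redirects and return the final URL, or None if it can't be resolved."""
    try:
        resp = requests.get(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return None
    # Only trust the result if the destination actually loads and differs from the source.
    if resp.status_code == 200 and resp.url != url:
        return resp.url
    return None

# Illustrative only; a real run would iterate over the gamasutra.com links found on-wiki.
print(resolve_redirect("http://www.gamasutra.com/view/news/114017/example-article.php"))
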
Remo, Chris (May 22, 2009). "Interview: Riot Games On The Birth Of League Of Legends". Gamasutra. Archived from the original on March 6, 2016. Retrieved January 10, 2021.
Ideally, I'd like to format the sources like that, with the "live" redirect remaining in place but with an archive link appended to the old Gamasutra version if an archive exists. Can the bot do that? If not, I think we're OK with status quo. As you said, the archives will always exist. TarkusABtalk/contrib 11:13, 3 October 2021 (UTC)

FTP

Browsers are beginning to withdraw support for FTP (Firefox 90+, Chrome from nowish?, Edge 88+). We have some 8k links to insecure ftp sites on-wiki in article space. Is there anything we want to do about that? Figured I'd start here to ask the question, and we can take it somewhere else if we want to organize some sort of party for it. Maybe Internet Archive Bot can go and download the files and have IA serve them as a proxy for an ftp URL? IDK if that would be supported in a browser. It looked like some browsers would not be happy with cross-protocol data service. Izno (talk) 23:46, 4 October 2021 (UTC)

Good point, but I don't think we should remove links just because Chromium-based browsers are dropping support for FTP. Safari is still going strong and still supports FTP, along with Pale Moon, SeaMonkey, old Edge, etc. (More people use those browsers than we are led to believe.)
I definitely support your idea of adding archives as an FTP "proxy" for people on the newer browsers. It would be useful for the vast majority on a Chromium-based browser (Google Chrome, Edge, Ungoogled Chromium, Opera, Vivaldi, Brave, etc.), and it would also serve as an archive, killing two birds with one stone. But Wayback doesn't support FTP archiving by default, does it? The FTP archiving by the Wayback Machine is deceptive: it only archives FTP servers that have an HTTP interface, even if you type "ftp://" into the Wayback search box. I could be wrong though; Cyberpower or GreenC can advise further. Rlink2 (talk) 01:07, 5 October 2021 (UTC)
Three of the major browsers going this way covers about 60% of our viewership, and given Apple's opinions on security, I would be surprised if Safari, the last major browser (25%-ish of views), is left supporting FTP for too long. Actually, it apparently already doesn't support it directly but passes it to another application on the system, which is how Firefox does things. (I guess Chromium too, maybe?) Izno (talk) 01:43, 5 October 2021 (UTC)
Oh, I didn't know that. Then yes, I 100% agree; no use having a link that no one can click on. I do know Chromium already had very limited FTP support, so I don't think much will be lost by them removing the insecure FTP protocol. Thanks for this insightful idea. But we'll have to see what the archive sites say about this. Also, keep in mind that 99% of all FTP sites already have an HTTP interface. I have yet to encounter an FTP server that doesn't also have a server-side HTTP interface for web browser viewing. So maybe part of the solution is to switch the "ftp://" to just "http://", which should work for at least some of them. Rlink2 (talk)
Izno, thank you for raising this, and Rlink2, thank you for the technical insights. Based on spot checks, the Wayback Machine does save ftp:// links if they have an HTTP interface (example) but won't save them if there is no HTTP interface (example). I suspect most ftp:// links on enwiki are saved at Wayback, also based on spot checks. A number of systems don't support ftp://, such as the Wayback Availability API and IABot itself, so converting to http:// would be a triple win, and other tools will benefit (Citation bot, reFill, etc.). A problem scenario: converting to http:// creates a 404 because the link is currently dead, and it won't be in the Wayback Machine as http:// since it was saved as ftp://, so some bespoke steps would be required to find the links, since the Wayback Availability API does not work with ftp://. Thus, some links will be converted to http:// (the link is live and has an HTTP interface), some will get an archive URL with an ftp:// source URL (the link is dead but has an HTTP interface), and some will be left untouched (no HTTP interface). What do you think? -- GreenC 02:55, 5 October 2021 (UTC)
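To make the branching concrete, here is a rough sketch of that triage in Python using the requests library. It is illustrative only, not the bot's actual logic: whether the Wayback CDX endpoint accepts an ftp:// URL directly is an assumption, and a real run would need rate limiting and more careful error handling.

import requests

def has_http_interface(ftp_url, timeout=15):
    """Check whether the same host/path answers over HTTP, i.e. a live HTTP interface exists."""
    http_url = "http://" + ftp_url[len("ftp://"):]
    try:
        resp = requests.head(http_url, allow_redirects=True, timeout=timeout)
        return resp.status_code < 400, http_url
    except requests.RequestException:
        return False, http_url

def wayback_snapshot(url, timeout=15):
    """Look for a capture of the exact URL via the CDX API (assumption: it accepts ftp:// URLs)."""
    try:
        resp = requests.get(
            "http://web.archive.org/cdx/search/cdx",
            params={"url": url, "output": "json", "limit": "1",
                    "fl": "timestamp,original"},
            timeout=timeout,
        )
        rows = resp.json()
        if len(rows) > 1:  # row 0 is the field-name header
            ts, original = rows[1]
            return "https://web.archive.org/web/{}/{}".format(ts, original)
    except (requests.RequestException, ValueError):
        pass
    return None

def triage(ftp_url):
    live, http_url = has_http_interface(ftp_url)
    if live:
        return "convert", http_url        # live and has an HTTP interface
    archive = wayback_snapshot(ftp_url)
    if archive:
        return "add-archive", archive     # dead now, but archived as ftp://
    return "skip", ftp_url                # no HTTP interface and no archive: nothing to do
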
My bet is it's best to start by doing a sweep for all FTP-protocol links that appear to have an HTTP interface and replacing those. Then go back and look at the ones that failed: check whether there's an archived FTP or HTTP version, and if not, and the reference is still live, request for it to be archived; otherwise tag it as a dead link. There'll probably need to be some human steps in that, but I oddly feel that FTP sites tend not to suffer from link rot as much as a lot of modern sites, so I'm hoping the number of "sites with odd states" will be (fairly) low. Perryprog (talk) 02:59, 5 October 2021 (UTC)
There are enough scenarios that it should be written out so nothing is missed, but I'm pretty sure most of it can be automated, with the rest logged for manual review. One thing though: if there is no HTTP interface (and the FTP link is working), there is nothing to be done. It's not possible to archive non-interface links, so no archives could possibly exist; I've tried saving a working non-interface example at Wayback, archive.today, ghostarchive.org and webrecorder.io. If someone discovers a place where they might be archived then it's different. -- GreenC 04:34, 5 October 2021 (UTC)
I know that ghostarchive.org said something on their blog about supporting blob files; after all, it is a "general purpose" archive site. Will email them to see if support can be added. Other than that it sounds like a good idea, thanks. 8k links is relatively small, I think? Rlink2 (talk) 13:42, 5 October 2021 (UTC)

Results of bot:

  • A. Pages checked: 7,626
  • B. Pages edited: 2,294
  • C. Non-interface FTP link alive (skipped): 2,811
  • D. Interface FTP link dead, bot added an archive URL: 665 (example)
  • E. Interface FTP link alive, converted from ftp:// to http:// : 2,501 (example)
  • F. Interface or non-interface FTP link dead, no archive available (skipped): 6,876

I think that is everything. For the majority nothing could be done (C + F), but it was able to do something with D + E. Next step I guess would be to find a way to archive the C links before they die (list available on request). @Rlink2, Perryprog, and Izno: -- GreenC 03:20, 11 October 2021 (UTC)

time.com

I found many broken links on time.com starting with this prefix; many of these broken links have not yet been replaced with archived links. Jarble (talk) 02:17, 16 July 2021 (UTC)

I'm confident the entire time.com domain is full of unfixed dead links, hard and soft, on enwiki, in the IABot database, and across all MediaWikis. Sigh. This will be my next project after I finish world-gazetteer ^ -- GreenC 04:28, 16 July 2021 (UTC)
There are about 25,000 links to www.time.com that are dead. They seem to have gone through 4 phases. The pages were created prior to 2013 (ca. 2006-2012), and the articles were available in full. Sometime after, Time truncated the articles so the free version was the first few paragraphs and subscribers got the rest. Then sometime later (post 2016?), Time moved the content to a new subdomain content.time.com with a redirect. Then sometime later, Time removed the redirect. That's where we are now, dead links with no redirect. So my bot finds the earliest working Wayback page prior to 2013, because it was still readable in full, and because Wayback has really good coverage of Time going, well, way back in time. -- GreenC 21:26, 20 July 2021 (UTC)
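For illustration, the "earliest working Wayback page prior to 2013" lookup could be done against the Wayback CDX API roughly like this. A sketch only, not WaybackMedic's actual code, and the example time.com URL is made up:

import requests

def earliest_pre2013_snapshot(url, timeout=15):
    """Return the earliest 200-status Wayback capture of the URL taken before 2013, or None."""
    resp = requests.get(
        "http://web.archive.org/cdx/search/cdx",
        params={
            "url": url,
            "to": "20121231",            # nothing after 2012, when articles were still full-text
            "filter": "statuscode:200",  # skip redirects and errors
            "limit": "1",                # CDX rows come oldest-first, so one row = earliest capture
            "fl": "timestamp,original",
            "output": "json",
        },
        timeout=timeout,
    )
    rows = resp.json()
    if len(rows) > 1:                    # row 0 is the field-name header
        ts, original = rows[1]
        return "https://web.archive.org/web/{}/{}".format(ts, original)
    return None

# Hypothetical example URL:
print(earliest_pre2013_snapshot("http://www.time.com/time/magazine/article/0,9171,1000000,00.html"))
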

Results

  • Edited 22,098 pages
  • Added 25,593 new archive URLs
  • Added 212 {{dead link}}
  • Various other general fixes (convert archive.today URL format, etc.)
  • Updated 56,758 URLs in the IABot database, adding archives and setting status to dead. These are unique URLs with impressions across hundreds of wikis, which will likely result in hundreds of thousands of links rescued via IABot.

@GreenC: Great work, as always! Alexcalamaro (talk) 20:46, 12 October 2021 (UTC)

leighrayment.com

Would someone please run a bot through and nullify any reference that points to leighrayment.com, as that domain has been hijacked and is now a gambling site.

billinghurst sDrewth 13:07, 14 August 2021 (UTC)

See commentary about possible alternate site at Wikipedia:Administrators' noticeboard#FYI hijacked sites and citation bot. — billinghurst sDrewth

@Billinghurst: in case you didn't see my reply at AN about the domains being blacklisted which also blocks the bot. I only checked seapower but assume leighrayment is also. -- GreenC 17:40, 14 August 2021 (UTC)

@GreenC: Thanks. I have not blacklisted "leighrayment.com" at this point. With regard to "seapower-digital.com", I have whitelisted the domain locally so the links can be fixed (apologies, I sort of did it earlier and then didn't save it; too many tabs open). — billinghurst sDrewth 03:09, 15 August 2021 (UTC)
@GreenC: At User talk:Citation bot/Archive 27#Citation bot and picking up text from hijacked sites BrownHairedGirl suggests to use {{Rayment}}. — billinghurst sDrewth 04:59, 15 August 2021 (UTC)
@Billinghurst: {{Rayment}} in some cases, but in most cases the appropriate template will be another one in the same family, e.g. {{Rayment-hc}} or {{Rayment-bt}}. Most of these templates need parameters, so applying the template needs some care; it can probably be automated if careful URL parsing is applied. --BrownHairedGirl (talk) • (contribs) 05:07, 15 August 2021 (UTC)
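As a very rough illustration of what "careful URL parsing" to pick a template might look like: a dispatch table from URL patterns to template names. The path patterns below are hypothetical placeholders, not the real leighrayment.com structure, and real automation would also need to extract the template parameters, which is not attempted here.

import re

# Hypothetical patterns only; verify the actual leighrayment.com paths before using.
RULES = [
    (re.compile(r"leighrayment\.com/commons/", re.I), "Rayment-hc"),
    (re.compile(r"leighrayment\.com/baronetage/", re.I), "Rayment-bt"),
    (re.compile(r"leighrayment\.com/", re.I), "Rayment"),
]

def suggest_template(url):
    """Return a suggested Rayment-family template name, or None to flag the cite for manual review."""
    for pattern, template in RULES:
        if pattern.search(url):
            return template
    return None

print(suggest_template("http://www.leighrayment.com/commons/Bcommons1.htm"))  # made-up example URL
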
I'm going to pass on the conversion; it is not simple. Also, the templates use hard-coded archive snapshots (here) which may or may not work for every URL. IMO if anything were done they would be converted to CS1|2 so standardized tools can maintain them long-term (IABot, WaybackMedic, Citation bot, etc.). Custom templates like this are a big and growing problem for maintenance because each requires custom programming. -- GreenC 15:25, 15 August 2021 (UTC)
@GreenC: On the contrary, the templates saved a huge number of edits. When Rayment's website died, it took only a few edits to update the templates to use the archive. The headache would have been in making repetitive edits to thousands of pages. --BrownHairedGirl (talk) • (contribs) 20:12, 18 August 2021 (UTC)
It's not a headache when there are bots. Now you have a headache: finding the archive links that don't work, on a per-instance basis, because they don't have snapshots that work with that static date. How do you plan on fixing it? -- GreenC 00:38, 19 August 2021 (UTC)

@Billinghurst:, it is done. Edited about 2500 pages and 3 to 4 thousand cites. Any problems let me know. -- GreenC 20:40, 15 August 2021 (UTC)

Thanks. I have added this to COIBot's monitor list, and if it becomes problematic (which it isn't at the moment) then I will blacklist. — billinghurst sDrewth 12:23, 16 August 2021 (UTC)

Bump. -- GreenC 14:09, 21 October 2021 (UTC)

jennifer-rush.com

I put this on the talk page, but I think here might be more appropriate for finding someone who could fix it. On the Jennifer Rush article, the external link "Jennifer Rush's Official Website" (http://www.jennifer-rush.com/) leads to an empty page. Is this a scam by someone who grabbed the name when it wasn't being used and is now somehow benefitting from all the Wikipedia traffic? I don't want to remove the link in case it is somehow useful, but someone should look into it. Thanks. --2600:1700:1C60:45E0:5CF7:A2D3:774D:D411 (talk) 09:06, 22 October 2021 (UTC)

I have no idea what the correct page should be; maybe that is it. -- GreenC 00:31, 26 October 2021 (UTC)

80stvthemes.com is dead

The site has been down forever and probably isn't coming back. Articles using the URL are here: https://en.wikipedia.org/w/index.php?search=insource%3A%2F80stvthemes%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1 . Maybe IABot can fix these links? Rlink2 (talk) 03:52, 24 October 2021 (UTC)
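For anyone scripting this, the same insource search can be run programmatically through the MediaWiki search API. A small sketch under the assumption that a plain API query is acceptable here; the bot presumably has its own machinery for this:

import requests

def articles_with_insource(pattern, limit=500):
    """Yield article titles whose wikitext matches an insource:/.../ search on enwiki."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "insource:/{}/".format(pattern),
        "srnamespace": "0",
        "srlimit": "50",
        "format": "json",
    }
    session = requests.Session()
    found = 0
    while found < limit:
        data = session.get("https://en.wikipedia.org/w/api.php", params=params).json()
        for hit in data.get("query", {}).get("search", []):
            yield hit["title"]
            found += 1
        if "continue" in data:
            params.update(data["continue"])   # standard API continuation
        else:
            break

for title in articles_with_insource("80stvthemes"):
    print(title)
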

OK, will look into it once I've finished the above judi slots. -- GreenC 00:32, 26 October 2021 (UTC)