Help talk:Using archive.today


September 2013

One of the editors posted the following edit summary when removing the wording that commented on the copyright issues:

"Remove grossly inappropriate description, US law is not world law, so don't apply it here, not to mention this is seriously confusing what copyright really is, whether there is or isn't robots.txt is irrelevant and libraries have different laws."

First, no one has claimed US law is world law. All I am saying is that Wikipedia does not want (and is not legally able) to violate US copyright law, nor does it want to incur endless DMCA take-down requests, which will surely be the result if people start linking Wikipedia articles to unauthorized archived copies of copyrighted works. Pointing out that copyright laws are different in other countries is obviously irrelevant.

Second, I agree that the honoring of robots and the honoring of copyright laws are two different things. However, as the proposed wording explains, robot exclusion files are the only known means used by responsible web archives to avoid copyright infringement. If Archive.is has some other way of avoiding copyright infringement, that would be fine. But they don't. Archive.is contains a large amount of copyright-infringing material, which anyone can see for themselves. (See an example on the Wikipedia article on Archive.is, but you had better hurry, because there is a nomination for deletion of that article.) So, the fact that Archive.is refuses to honor robot exclusions for copyrighted material is closely related to the fact that they are violating copyright law.

Third, the editor says "libraries have different laws". I don't know what that is supposed to mean, but if anyone thinks it means that libraries or online archives are allowed to violate copyright law, they are mistaken.

Fourth, the editor says the proposed text is a "grossly inappropriate description", but the justification for this claim is based on the misunderstanding noted above. The proposed text is entirely appropriate. Wikipedia should not be a party to copyright infringement. Can we at least agree on this? Weakestletter (talk) 21:12, 23 September 2013 (UTC)[reply]

The text I removed is a misrepresentation, clearly written in a biased, non-neutral tone with a prejudice against Archive.is. This is a how-to guide; instead, users are presented with this text in boldface, the first thing on the page, telling them not to use the service and making legal claims. If you cannot see how this is inappropriate, I'm afraid I will not be able to explain it.
I replaced the text with the most straightforward, unbiased version, one that makes no assumptions or claims: "Note: Archive.is does not follow robots exclusion standard and may archive content that the website owners have excluded from automated crawlers.". Everything else merely adds inflammatory language instead of being a helpful guide. And you cannot be making legal claims on Wikimedia's behalf.
For your first point, archive.is is not in the US. They have no obligation to follow the DMCA or any US law. And they are not breaking US/CA copyright laws, which are the laws Wikipedia operates under. We link to thepiratebay.sx from The Pirate Bay; by your assertion, Wikipedia is breaking the law.
For your question about libraries, US library services (the way archivers are classified) are subject to different copyright rules, mainly that they do not infringe anything if they follow certain US rules (this is even in the link attached). Google is such a service. robots.txt is just one way of following those US rules. —  HELLKNOWZ  ▎TALK 22:10, 23 September 2013 (UTC)[reply]
Many US magazines and publishing houses (and I suppose they know much more about copyright than we do, and take more care about it) see no problem using archive.is and linking to it. I suggest removing the copyright alarm as lame. 88.15.83.61 (talk) 19:35, 24 September 2013 (UTC)[reply]

About the recent tag edits

The tag came from an editor near the end of Wikipedia:Deletion review/Log/2013 October 28, followed by my reversion of it. --Lexein (talk) 14:40, 29 October 2013 (UTC)[reply]

No, it isn't needed at all. This passed MFD, period. The dispute was resolved: the how-to was kept. Nobody brought up WP:Using Archive.is during the WP:Archive.is RFC discussion period. It wasn't considered relevant, and it isn't. Tag removed. Please don't re-add it. If you're dead set on deletion, start another MFD. People do it all the time, with rarely changed results. --Lexein (talk) 23:35, 29 October 2013 (UTC)[reply]
Taken to Wikipedia:Administrators'_noticeboard/Incidents#De-linking_of_Wikipedia:Using_Archive.is_a_challenged_How-To_to_its_RfC. --SmokeyJoe (talk) 00:17, 30 October 2013 (UTC)[reply]

How do I properly link to http://archi ve.is/jPlGB (added space) in a reference for an article? (It *was* http://kappapiart.org/join.html) Naraht (talk) 17:48, 5 May 2016 (UTC)[reply]

@Naraht: archive.li/jPlGB may work but it looks like the content you wanted to reference has moved to a new address: http://kappapiart.com/new-members. —LLarson (said & done) 18:52, 5 May 2016 (UTC)[reply]
@LLarson: Thanx! for *both* how to use the site (with a different country domain) *and* finding the new national page. Odd that it doesn't show up on the first page of Google. Naraht (talk) 18:59, 5 May 2016 (UTC)[reply]

RfC: Should we use short or long format URLs?

The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.

There is clear consensus that long form URLs are preferred. Long forms include timestamps and the original URL. Short forms can be used to mask the destination and circumvent blacklistings. Adding short form URLs should not result in warnings and/or sanctions against good faith editors.

Any URLs in a short form should be converted to long form. This can be done by any editor. There is also clear consensus that a bot should automatically convert short form URLs and tag articles that use blacklisted URLs.

Example long URL forms which include timestamps and the original URL:

  • archive.is: http://archive.is/YYYY.MM.DD-hhmmss/http://www.example.com
  • WebCite: http://www.webcitation.org/5kbAUIXb6?url=http://www.example.com/
(Protocol is outside the scope of this RfC.)
— JJMC89(T·C) 00:35, 5 August 2016 (UTC) 03:27, 5 August 2016 (UTC)[reply]
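As an illustration of the two forms, here is a minimal Python sketch that tells them apart mechanically, assuming the domain alternatives and the timestamp layout shown in the examples above (the patterns actually in use may differ):

    import re

    # Long forms embed a capture timestamp and the original URL.
    # The domain list and the 6-digit time field are assumptions.
    ARCHIVE_IS_LONG = re.compile(
        r"^https?://archive\.(?:is|today|li|ec|fo)/"
        r"\d{4}\.\d{2}\.\d{2}-\d{6}/https?://", re.I)
    WEBCITE_LONG = re.compile(
        r"^https?://(?:www\.)?webcitation\.org/\w+\?url=https?://", re.I)

    def is_long_form(url):
        """True if the URL is a long-form archive.is or WebCite link."""
        return bool(ARCHIVE_IS_LONG.match(url) or WEBCITE_LONG.match(url))

    # Hypothetical capture IDs, for illustration only:
    print(is_long_form("http://archive.is/2016.07.05-215000/http://www.example.com"))  # True
    print(is_long_form("http://archive.is/YJvm0"))  # False (short form)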

This RfC is to gauge community consensus about the preferred URL format for archive.is and WebCite when used in citations.

Both sites permit two URL formats, a shortened version and a longer version. Examples:

  • archive.is
  • WebCite

(pretend they go to the same link)

Which one is preferred, or are either equally appropriate?

Related information:

  • The document Using WebCite says "Either is appropriate for use within Wikipedia," while the document Using archive.is says "The [longer] is preferred for use within Wikipedia because it preserves the source URI."
  • During the archive.is RfC, some users brought up concerns that short URLs can hide spam links, noting that URL-shortening services such as bit.ly have been blacklisted on Wikipedia.
  • A user at the German Wikipedia said they mandate the long form because, if the archive service went out of business, they would still know the original URL.
  • Reverse-engineering a shortened link to find the original URI can be done using the WebCite API or, in archive.is's case, by web scraping (see the sketch after this list).
  • WebCite and archive.is default to the shortened version when creating a new link or in page display.
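As mentioned in the list above, there is no documented API for expanding a short archive.is link, so scraping is the fallback. A hedged Python sketch; the assumption that the archived page's HTML contains its own long-form URL is based on observation and may break:

    import re
    import requests

    LONG_FORM = re.compile(
        r"https?://archive\.\w+/\d{4}\.\d{2}\.\d{2}-\d{6}/[^\"'<\s]+")

    def lengthen_archive_is(short_url):
        """Best-effort expansion of a short archive.is link to long form.

        Fetches the page and scrapes the HTML for a timestamped long-form
        URL; returns None if nothing that looks like one is found.
        """
        html = requests.get(short_url, timeout=30).text
        match = LONG_FORM.search(html)
        return match.group(0) if match else None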

Please leave a !vote below, such as short or long or either. -- GreenC 21:50, 5 July 2016 (UTC)[reply]

Discussion

This shouldn't be an RfC — we always use long. Link-shorteners are not allowed. Carl Fredrik 💌 📧 23:48, 5 July 2016 (UTC)[reply]

Following the discussion below, let's go with long and forbid short. And I don't mean block short, just automate conversion to long links. Carl Fredrik 💌 📧 08:58, 6 July 2016 (UTC)[reply]

Is it link-shorteners that are forbidden, or is it redirection sites? What I've been able to find is Wikipedia:External links#Redirection sites. If there isn't an explicit existing guideline, then it makes sense to establish consensus for one, for reasons similar to those for redirection sites. PaleAqua (talk) 05:20, 6 July 2016 (UTC)[reply]
  • Can/should a bot be set up to automatically convert short links to the long form? Especially given that the default link returned by the archive services is a shortened version. PaleAqua (talk) 03:50, 6 July 2016 (UTC)[reply]
    Could a bot do it? Yes, in most cases. But without community consensus (RfC) for using the long form URL, it could have trouble in the bot approval process or during the bot run, should someone raise an objection. -- GreenC 04:22, 6 July 2016 (UTC)[reply]
    Figured that. I prefer long, but think that, rather than forbidding short links, they should be converted to long links, either manually or via bot. Archive links should be matchable to the dead URL, which is harder to do with shortened links. I'd also like to see the URLs to archive.is unified if possible, as during the ban aliases were apparently used to bypass the filters. (I supported unbanning, but still think it's important to be aware of the prevalence of the links, etc.) PaleAqua (talk) 05:20, 6 July 2016 (UTC)[reply]
    Would also support mixed-format for WebCite per comments by Boshomi below. PaleAqua (talk) 12:40, 28 July 2016 (UTC)[reply]
  • Prefer long, but don't forbid short - archive.is gives you the short version of the link by default, but the long version contains the source URL - so running a bot to convert them to the long version strikes me as the ideal solution - David Gerard (talk) 08:08, 6 July 2016 (UTC)[reply]
  • (edit conflict) Prefer long, but don't forbid short – per above. Though short URLs are nice, it's probably best to use long URLs given concerns such as the possibility of bypassing the spam blacklist. I would support having a bot convert them to long URLs rather than forbidding short URLs, and unifying all archive.is domain links to "https://archive.is/" (while you're at it, change HTTP to HTTPS as well) is also a good idea. Note that "preserving the source URI" is not a problem if editors use the {{cite web}} template correctly by placing the original URL in the |url= parameter and the archived version in the |archiveurl= parameter. Also, note that WebCite uses a query parameter for the original URL in the long form, so it may still be possible to bypass the blacklist. Perhaps the easiest would be to have a bot also check those, perform URL decoding and matching against the blacklist, and report any blacklisted link it finds (a sketch follows below). nyuszika7h (talk) 08:10, 6 July 2016 (UTC)[reply]
Note: Please see my comment below; I have now realized having a bot always clean up after new link additions is not a good idea. nyuszika7h (talk) 13:45, 10 July 2016 (UTC)[reply]
@Nyuszika7H: Good idea about matching against the blacklist. There are also bare links that don't use cite templates but could use a template like {{wayback}}. Do you know of other domains for archive.is and webcitation.org? — Preceding unsigned comment added by Green Cardamom (talkcontribs) 14:01, 6 July 2016 (UTC)[reply]
@Green Cardamom: As far as I know, WebCite doesn't have any alternative domains. Archive.is has archive.is, archive.today, archive.li, archive.ec and archive.fo. nyuszika7h (talk) 14:09, 6 July 2016 (UTC)[reply]
@Green Cardamom: Freezepage also has short form (example http://www.freezepage.com/1465141865DVZXYCBROO). 93.185.30.40 (talk) 15:33, 6 July 2016 (UTC)[reply]
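A sketch of the blacklist check suggested above: extract the original URL from a long-form WebCite link's url query parameter and match its host against blacklist patterns. The patterns here are stand-ins, not the real spam blacklist:

    import re
    from urllib.parse import urlsplit, parse_qs

    BLACKLIST = [r"(^|\.)bit\.ly$", r"(^|\.)spam-example\.com$"]  # stand-in patterns

    def webcite_target(url):
        """Return the original URL hidden in a long-form WebCite link, if any."""
        parts = urlsplit(url)
        if parts.hostname and parts.hostname.endswith("webcitation.org"):
            params = parse_qs(parts.query)  # parse_qs percent-decodes values
            if "url" in params:
                return params["url"][0]
        return None

    def is_blacklisted(url):
        target = webcite_target(url) or url
        host = urlsplit(target).hostname or ""
        return any(re.search(pattern, host) for pattern in BLACKLIST)

    print(is_blacklisted("http://www.webcitation.org/5kbAUIXb6?url=http%3A%2F%2Fbit.ly%2Fx"))  # True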
  • Long. Let's take one concern off the minds of the people who oppose this service: obfuscation by spammers. —Best regards, Codename Lisa (talk) 08:52, 6 July 2016 (UTC)[reply]
  • Long. If there is any possibility of spammer abuse with the short links, then the short form will be added to the blacklist anyway. ~Amatulić (talk) 13:42, 6 July 2016 (UTC)[reply]
  • Long for bare links, Short for use in templates like {{cite web}}. The templates already preserve the original URL and the capture date in other parameters, so a long URL would be an ugly duplicate, and we would need another RFC about how to deal with that duplication, whether by introducing special templates for the archive links, by extending {{cite web}} to calculate archive links by itself, or some other way. 93.185.30.40 (talk) 15:28, 6 July 2016 (UTC)[reply]
Wouldn't the spam problem remain with short use in templates? Theoretically a bot could do the verification, constantly checking a link over and over again to make sure it matches the url argument, but that's a lot of network resources and bot maintenance. Ideally a bot converts from short to long one time; that way editors can see at a glance whether the URL is legitimate, and other bots can check for blacklisted links. -- GreenC 16:18, 8 July 2016 (UTC)[reply]
  • Better to set up a bot to change short links to long as soon as they are inserted. It is not a job for a human to read such a warning and then concatenate strings. 93.185.30.244 (talk) 11:43, 10 July 2016 (UTC)[reply]
  • Nope, it is not better - the bot would not be able to save the converted links (as they would be blacklisted), and then someone has to come along later and a) remove the 'redirect', and b) find the original editor who inserted the evading link (and take action where needed). It is better to prevent and educate, which avoids the extra edits, the rather cosmetic bot edit, the mistakes the bot may make in the conversion, and the need for all of these to be double-checked by yet another human. --Dirk Beetstra T C 12:46, 10 July 2016 (UTC)[reply]
  • If the bot notices it's blacklisted, it could place a tag on the article, as is already done for existing links found directly on the blacklist. Though I suppose an edit filter which tells users how to obtain the link could work (for WebCite it's straightforward; for archive.is the user needs to click the "share" link first); that might be simpler than having a bot do the cleanup, now that I think about it. But the edit filter shouldn't block those edits completely, at least initially, because articles currently containing short URLs need to be cleaned up – ideally by a bot. nyuszika7h (talk) 13:01, 10 July 2016 (UTC)[reply]
    • I was mainly suggesting this for newly added links. What is there now should indeed be changed by a bot first, with some cleanup of the few that might be linking to otherwise blacklisted material. --Dirk Beetstra T C 13:19, 10 July 2016 (UTC)[reply]
  • If we go with long-only, I agree an edit filter for new link additions is a good idea, plus a one-time bot to clean up old cases, which is significantly more likely to get done than a daemon bot, which has added complexity and maintenance. -- GreenC 15:13, 10 July 2016 (UTC)[reply]
  • I don't like the idea of blocking short links. The entire reason why we allowed archive.is is to make it easier to edit; adding more hoops to jump through is a negative. I suggest a continuous bot that always runs and checks whether short links are being used — and automatically replaces them. What we need is for editing to be easy — forcing people to jump through hoops will only cause them to think it's too much of a hassle, abandoning their edits. Carl Fredrik 💌 📧 11:15, 13 July 2016 (UTC)[reply]
  • @CFCF: As I state below, the situation is not much different from the current situation with URL shorteners - people there do have to go back and repair their link (or may choose to abandon their edits). But I would at first just suggest a 'warning-only' edit filter (obviously not with a warning, but with a good faith remark), and have a bot lengthen all the short links that are there and those that are still being added anyway. That way it all gets tagged (and with a bit of luck many people will lengthen their links, avoiding the extra bot edit), and it is easy to check what gets added anyway; if that indeed shows significant real abuse of the short links (real spammers don't care about warnings, bot reverts, blocks on IP addresses, etc.), then we can always reconsider the situation (though I don't expect that it will become that bad - and there may be intermediate solutions as well). Do note that a template solution would remove the need for the user to add the link at all - the original url is in one parameter, the archive url is in another parameter - one could choose to have the template construct the archive link (which will work for most) from the original link and an archive-date parameter. That would make the work easier for the editor. --Dirk Beetstra T C 11:47, 13 July 2016 (UTC)[reply]
  • Agreed, documentation of the long form syntax is a simple one-time learning curve, as is use of templates for external links (e.g. {{wayback}}). These are the correct solutions already used by other archives like Wayback. Allowing short URLs against policy, then assuming a bot will fix intentional mistakes forever, is not a good idea for a bunch of reasons. Anyway, I certainly won't be writing a bot to do that. Maybe someone else will... we are talking hundreds of hours of labor and a personal commitment for unlimited years to come. These bots don't run themselves; they have constant page formatting problems due to the irregular nature of wikisource data. It's not a simple regex find-and-replace: you have to deal with deadurl, archivedate, {{dead}}, soft 404s, 503s, 300s at archive.is, etc. (see the response-triage sketch at the end of this thread). It's complex and difficult to write and maintain. -- GreenC 13:18, 13 July 2016 (UTC)[reply]
  • Long only but no conduct enforcement against individuals who use short. The only remedy against short URLs should be automatic change by a bot (strongly preferred), or being changed by an editor who finds them and objects to them, or both. Whether an individual editor includes short ones only one time or does it routinely, even after being informed that long ones are required, should not subject him/her to sanctions. This is a very minor rule applicable to only a couple of sources, and if we're going to have a rule, it needs to be drama-free. The only time conduct enforcement should be involved is if someone edit wars over it (and there absolutely shouldn't be a 3RR rule exception for changing or reverting short URLs) or the short URL is clearly being used for a nefarious or unencyclopedic purpose. Regards, TransporterMan (TALK) 18:19, 12 July 2016 (UTC)[reply]
  • Long, seconding TransporterMan. Long is better, and maybe have a template to mention it to people. Probably not even that. I don't care how many times someone "violates" this; it's a ridiculous thing to punish people for. It's that kind of rule creep that people complain about, because it discourages new editors. Simply fixing it is just standards maintenance. This is something a bot can handle with ease, so let's not waste people's time with it. Tamwin (talk) 06:48, 13 July 2016 (UTC)[reply]
    • @TransporterMan and Tamwin: Nobody suggested punishing people for it (unless of course they are using it to evade the blacklist). Note how Special:AbuseLog reads at the top: "Entries in this list may be constructive or made in good faith and are not necessarily an indication of wrongdoing on behalf of the user.". – nyuszika7h (talk) 08:21, 13 July 2016 (UTC)[reply]
    • @TransporterMan and Tamwin: I want to second this: except for deliberate disruption (e.g. intentional evasion of the blacklist), there will/should be no punishment. Even now, if you add a blacklisted link you are not punished, you are just blocked from saving the edit. No one will/should go after you when you are, in good faith, blocked by the spam blacklist. That will/should not change. The problem with allowing the short links is that a) people may, in good faith, add blacklisted links which may need removal, b) people will use the short links to evade the blacklist (they do that with regular URL shorteners on a regular basis), and c) every addition of a short link will need to be followed up with another edit by a bot (and maybe even checked for whether the bot was correct) - all of which can be avoided. The situation in the end will not be different from editors inadvertently using a URL-shortening service (or similar 'bad' links), something that happens regularly because some online services shorten certain links by default; such editors will not be able to save their edits and will have to go back and 'lengthen' their URLs. --Dirk Beetstra T C 08:59, 13 July 2016 (UTC)[reply]
  • Long form, convert short, in the way User:CFCF suggests. Use automated conversion to the long form. The short form obfuscates the archived page, while the long form provides the essential information about the archived data at a glance: the date and the original address. Users can insert either the long or the short form, but a bot or a script would automatically convert the short form into long (without any warning messages or similar cautionary measures). It should be mentioned in the appropriate WP pages that the long form is preferred. —Hexafluoride Ping me if you need help, or post on my talk 13:46, 14 July 2016 (UTC)[reply]
  • Long, forbid short, no punishment, no first-offense talk page template. Unlike something like signing (which SineBot gives a talk page warning for after just 3 in 24 hours, in addition to the "unsigned comment" shame template on the message itself), making this mistake and forcing the bot to fix it isn't really disruptive; it just adds an extra diff to the history. I suspect that this will be decided at bot approval, but they'll read this RfC anyway, and I think that any reasonable editor will get the point when they read the bot's edit summary from their watchlist, and won't need a talk page WP:BITE. Any talk-page warnings should have a high threshold, like 10 bad cites over at least 3 days. It's a little issue, but the little issues are big issues when it comes to new editor retention. Jhugh95 (talk) 17:56, 25 July 2016 (UTC)[reply]
  • Long form, convert short, to preserve the address of the original content.--Adam in MO Talk 20:00, 26 July 2016 (UTC)[reply]
  • Long format – Accept short format from editors but have a bot convert it to long format. — JFG talk 09:44, 28 July 2016 (UTC)[reply]
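To make concrete the failure modes GreenC mentions above (soft 404s, 503s, 300s), here is a rough Python sketch of the response triage a conversion bot would need. The soft-404 marker string is a guess, not documented archive.is behavior:

    import requests

    def triage_archive_link(url):
        """Coarse health check for an archive link (sketch only)."""
        try:
            response = requests.get(url, timeout=30, allow_redirects=False)
        except requests.RequestException:
            return "network-error"
        if 300 <= response.status_code < 400:
            return "redirect"        # e.g. a 300 pointing at another capture
        if response.status_code == 503:
            return "unavailable"     # retry later
        if response.status_code == 404:
            return "hard-404"
        # Soft 404: a 200 whose body is a search/landing page, not a capture.
        if response.status_code == 200 and "No results" in response.text:
            return "soft-404"
        return "ok" if response.status_code == 200 else "http-%d" % response.status_code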
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

archive.is in HTTPS

Moved from Talk:Archive.is

@93.185.30.56: I see that you're from Russia, where archive.is is blocked over HTTPS. However, the situation varies: China, for example, blocks HTTP but not HTTPS. archive.is is HTTPS by default now. There's no policy on how to link to archive.is, and seeing how most of the world can access it through HTTPS, linking to the encrypted form is better than exposing traffic over an insecure link. Users in countries where one form or the other is blocked should make the extra effort to modify the link (HTTP/HTTPS) when trying to access it. —Hexafluoride Ping me if you need help, or post on my talk 13:33, 14 July 2016 (UTC)[reply]

Also note that Wikipedia itself is only available in HTTPS, so the point is pretty much moot (as in irrelevant). nyuszika7h (talk) 13:49, 14 July 2016 (UTC)[reply]
Hexafluoride, also note that pinging IPs does not work, try leaving a talkback on their talk page instead if you want to get their attention. nyuszika7h (talk) 13:50, 14 July 2016 (UTC)[reply]
@Nyuszika7H: I've mistakenly put this in the archive.is talk, instead of Wikipedia:Using archive.is. I'm moving it there. I've also Tb'ed the IP user. —Hexafluoride Ping me if you need help, or post on my talk 14:00, 14 July 2016 (UTC)[reply]

There was an RfC that decided to use https for archive.org. I don't see why archive.is would get a different result if there were another RfC. As for the argument that certain countries block https, would that be solvable with Wikipedia:Protocol-relative URL? — Preceding unsigned comment added by Green Cardamom (talkcontribs) 15:13, 14 July 2016 (UTC)[reply]

@Green Cardamom: As I said, Wikipedia itself is only available over HTTPS, so it's pointless. nyuszika7h (talk) 15:17, 14 July 2016 (UTC)[reply]
You have misinterpreted the point. These countries do not block https as a whole; they block particular domains. https://archive.is is blocked in Russia, but https://archive.li is not. As for China, the blocking there works the other way (https is available and http is blocked), but that is not relevant here because Wikipedia is blocked in China, so it is pointless to optimize Wikipedia pages targeting China. https://archive.is is the only protocol-domain combination (among 10: {http:archive.is, http:archive.today, http:archive.li, http:archive.ec, http:archive.fo, https:archive.is, https:archive.today, https:archive.li, https:archive.ec, https:archive.fo}) with geo-problems. Clicking such links from Russia (where Wikipedia is not yet blocked) leads to a network error, not even to a message saying the page is blocked; thus the https://archive.is links look like dead links. If you insist on using https, consider linking to another domain (archive.li, .today, .fo or .ec - they all work well with https). 93.185.28.66 (talk) 15:29, 14 July 2016 (UTC)[reply]
If they block archive.is, I don't think it's worth playing whack-a-mole with them; they will eventually block other domains if those get widespread use anyway. nyuszika7h (talk) 15:32, 14 July 2016 (UTC)[reply]
The problem is different (you may read about it on the Russian Wikipedia and news sites). "They" do not "want" to block archive.is. They need to censor a few pages which are illegal in Russia. With http - because it is not encrypted - only those pages are blocked and replaced with a message explaining why. With https - because it is encrypted - it is not possible to block a particular page, only the whole domain. So, using another domain can be considered whack-a-mole, while using http is fair play with the big brother: he sees your unencrypted traffic and tries to be as unobtrusive as possible. 93.185.28.66 (talk) 15:38, 14 July 2016 (UTC)[reply]
* BTW, https://archive.today/any-path-here redirects to http://archive.is/any-path-here in Russia and to https://archive.is/any-path-here outside, and serves no content, only a redirect, so it has the smallest chance of being blocked. It could be a good compromise as the default form of linking to archive.is from Wikipedia. It should make everybody happy: editors who like to see https links in Wikipedia, visitors from Russia who would like to see the content instead of a network error, and the Russian government, which likes to set up fine-grained censorship. 93.185.28.66 (talk) 16:00, 14 July 2016 (UTC)[reply]
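If that compromise were adopted, rewriting mirror links to the archive.today redirector would be mechanical. A Python sketch, assuming the path component is identical across the mirror domains (which the redirect behavior described above implies):

    from urllib.parse import urlsplit, urlunsplit

    MIRRORS = {"archive.is", "archive.li", "archive.ec", "archive.fo"}

    def to_archive_today(url):
        """Point an archive.* mirror link at the https://archive.today/ redirector."""
        parts = urlsplit(url)
        if parts.hostname in MIRRORS:
            return urlunsplit(("https", "archive.today",
                               parts.path, parts.query, parts.fragment))
        return url

    print(to_archive_today("http://archive.is/2016.07.05-215000/http://www.example.com"))
    # -> https://archive.today/2016.07.05-215000/http://www.example.com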
The real question is, what is best for the Wikipedia project, not what is best for users in Russia. We've already had an RFC on this. The western world is gradually migrating to everything being https. What other governments choose to censor, whether they isolate their populations or not, isn't really a concern for the Wikipedia project. It would not be hard for the Russian government to advise everyone to try http instead of https if something doesn't work. I see no compelling reason to make http the default when secure connections are always the better option for the vast majority of users outside of Russia. ~Amatulić (talk) 22:31, 14 July 2016 (UTC)[reply]
I do not think you are in a position to represent the trends of the western world. For example, Reddit uses http when linking to archive.is. Encrypting the transmission of archived content that is available to anyone is ridiculous. 93.185.28.66 (talk) 23:50, 14 July 2016 (UTC)[reply]
What Reddit does is irrelevant to Wikipedia. And encrypting transmission of archived content is useful to avoid having one's interests tracked by unknown parties and governments. If it is "ridiculous" as you say, then convince your government to stop blocking https on archive.is. You haven't offered a compelling argument to force Wikipedia to go against its own consensus to use https; in fact, there's a better argument to force the Russian government to allow https. You're welcome to start another RFC if you believe you can change the community consensus. ~Amatulić (talk) 07:41, 18 July 2016 (UTC)[reply]

OK, it seems archive.is has now started redirecting HTTPS requests for archived copies (not the main page) to HTTP, with an annoying 10-second delay. nyuszika7h (talk) 16:24, 25 July 2016 (UTC)[reply]

Hi. I think we should keep adhering to MOS:WEBADDR then. Best regards, Codename Lisa (talk) 07:30, 26 July 2016 (UTC)[reply]
In dewiki we use only https; we transformed all http URLs. Boshomi (talk) 20:38, 27 July 2016 (UTC)[reply]
According to the site's owner, they may stop offering HTTPS altogether on archive.is in the future, but it will continue working on other domains such as archive.fo. [1] [2]nyuszika7h (talk) 22:10, 4 August 2016 (UTC)[reply]
If I'm reading those right, it sounds like they put up the redirects for some IPs as an attempt at anti-bot protection. I don't see the quote about stopping HTTPS altogether, though I might have missed it. PaleAqua (talk) 00:01, 5 August 2016 (UTC)[reply]

Regarding this section:

There is no legal precedent establishing that this kind of hosting of U.S. copyrighted material without permission is a violation of the U.S. Digital Millennium Copyright Act (DMCA), and no plausible legal theory that normal use of archive.is links in citations would implicate Wikipedia in violations of copyright laws or prompt DMCA take-down requests, but for other uses, editors may want to use Archive.is links with some caution with U.S.-copyrighted content.

Recently modified from the original:

Re-hosting U.S. copyrighted material without permission may be a violation of the U.S. Digital Millennium Copyright Act (DMCA) - for this reason, to avoid implicating Wikipedia in violations of copyright laws and incurring DMCA take-down requests, Archive.is should be used with some caution regarding U.S.-copyrighted content.

Neither of these is sourced, and they give somewhat conflicting POVs. Do we think it's a good idea to give legal opinions in this essay? -- GreenC 18:26, 17 November 2016 (UTC)[reply]

@Green Cardamom: Here Vice's journalists asked attorneys from the Electronic Frontier Foundation about the copyright implications of using archive.today on community websites such as Reddit. You may reword their answers for the article or, better, ask the same persons about the legal aspects of using archive.today on Wikipedia. 93.185.28.131 (talk) 15:02, 18 December 2016 (UTC)[reply]

"Removes pages on DMCA requests" - bad source

Accessed today, the example referenced as a source for this assertion shows the page archived normally rather than taken down because of a DMCA request. One can find other examples with a Google search, but the case is the same for them. I could not find a better source for this, but I imagine there could be something on the author's blog. Saturnalia0 (talk) 01:25, 24 January 2017 (UTC)[reply]

Which TLD to use?

Hi!
Recently archive.is announced: "Please do not use http://archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon."[3][4] (thanks for the info, @user:KurtR)
All their TLDs seem to be unstable. So what is a good solution for Wikipedia? (ping user:GreenC) -- seth (talk) 11:10, 5 January 2019 (UTC)[reply]

@Lustiger seth: Presumably the webmaster will ensure that the website stays up in at least one location. In the event that the links stop working (or earlier, as a preemptive measure), GreenC's bot or another bot would most likely be able to handle changing all of the links to archive.today or archive.fo. Jc86035 (talk) 13:54, 5 January 2019 (UTC)[reply]
@KurtR, Lustiger seth, and Jc86035: Yes, I am aware of this; as of yesterday I've been in contact with the archive.is owner. He is not 100% positive archive.is is going down yet. I am working on changing all the links in the IABot database immediately so IABot stops propagating .is, as whatever we do on-wiki won't matter if IABot is still posting .is links.
The new domain will be .today -- it is sort of like a CNAME or alias, and the others are sort of like server names; as the server names change or go down, the alias can be repointed elsewhere. Or think of it like a private bit.ly for Archive Today. We should also consider renaming the service "Archive Today" in all the documentation, basically deprecating the "Archive IS" naming scheme. -- GreenC 14:44, 5 January 2019 (UTC)[reply]
Nothing is going on. GreenC, do you have any new information? Archive.is was often better than archive.org because they didn't accept scripts. -- Dietrich (talk) 03:27, 4 April 2019 (UTC)[reply]
@GreenC: Today User:InternetArchiveBot archived a link at Birchgrove ferry wharf to [5], but it is not working. No links to https://archive.fo or the previously used https://archive.is seem to be working. Should sites still be archiving to these addresses? Fleet Lists (talk) 01:27, 10 June 2019 (UTC)[reply]

Change archive.is to archive.today

Given that it's now called archive.today, should the article title and content be changed? Yaakovaryeh (talk) 04:10, 9 January 2020 (UTC)[reply]

Unable to add new archives

Every time I try to archive a page, it gets stuck at the submitting page, with the screen endlessly spinning at "loading". What should I do? Kailash29792 (talk) 06:41, 2 September 2021 (UTC)[reply]

Contact archive.today via the blog; they usually answer. -- GreenC 06:53, 2 September 2021 (UTC)[reply]

For archive links that contain Chinese or other non-Latin characters, the long format URL with ".today" does not redirect properly for some reason when inserted on Wikipedia (if you click on it directly). For example: [1]; the only way to make it work is to copy the archive link and paste it directly into the browser's address bar. But when I change it to ".ph", it opens properly when clicked. For example: [2]

The short format URL has no issue. So my question: I am aware Wikipedia prefers the long format URL and also the ".today" domain name (as .ph is only the server name), but for these specific links, what format should I use so that editors have no issue when clicking on them?

PS: I tested the link in a Wordpress post, and clicking on the long format URL redirected correctly (link). So is this a Wikipedia issue? Please help! (See the encoding sketch after the reference list below.)

Thanks.--TerryAlex (talk) 19:43, 18 October 2021 (UTC)[reply]

Are you even able to create new archives using archive.today? Kailash29792 (talk) 03:58, 19 October 2021 (UTC)[reply]
After testing the link on Wordpress, I don't think it is an archive.today problem; something goes wrong when it is inserted on Wikipedia. It just cannot redirect the link correctly for some reason.--TerryAlex (talk) 04:19, 19 October 2021 (UTC)[reply]

References

  1. ^ "Noble Truth by Gigi Yim on Apple Music". Apple Music. Hong Kong. Archived from the original on 30 September 2021. Retrieved 30 September 2021.
  2. ^ "Noble Truth by Gigi Yim on Apple Music". Apple Music. Hong Kong. Archived from the original on 30 September 2021. Retrieved 30 September 2021.

Browser extensions

It's mentioned that browser extensions are available for Chrome, Edge and Firefox. Are there advantages to downloading the extensions rather than keeping the site on the bookmarks bar? Mcljlm (talk) 15:45, 31 December 2022 (UTC)[reply]

Requested move 25 July 2023

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.

The result of the move request was: not moved. Speedily closed per nom request. (non-admin closure) WPscatter t/c 16:03, 26 July 2023 (UTC)[reply]


Help:Using archive.today → Help:Using Archive.ph – Site name changed. While Archive.today presently redirects to Archive.ph, there's no guarantee that will continue indefinitely. The text in the page will need changing as well, of course, but that should be a quick search-and-replace operation.  — SMcCandlish ¢ 😼  22:11, 25 July 2023 (UTC)[reply]

  • Oppose: archive.today has several domains, and archive.today has redirected to many different domains over time. The project's name is still archive.today (displayed unchanged on each domain), and its article's title remains archive.today. jlwoodwa (talk) 00:56, 26 July 2023 (UTC)[reply]
  • Oppose. The owner of archive.today has requested we use .today throughout Wikipedia. It is a redirect server that sends requests to other domains (such as .ph) as they are available. In this way, if the content server .ph becomes unavailable, the redirect server can simply change where it sends traffic, say to .is instead. This is done because the site has had problems with domain hijacking and cancellations due to the nature of their service, which can be controversial. The .today domain is safe(r) from these problems since it doesn't host content, and possibly for other reasons I don't understand. -- GreenC 03:44, 26 July 2023 (UTC)[reply]
    Okay, I was not aware of that. This can just be speedily closed.  — SMcCandlish ¢ 😼  09:43, 26 July 2023 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Convert to archive.today

Special:Diff/1167250578/1169669968: "Links (mostly sources in articles) to archive.is are inaccessible for me (I get a security check that cannot be completed). Is it worth somehow editing all existing links to use archive.today?". (post by User:WhyNotHugo).

I actually started work on a bot to do this, as well as to expand short form links to long form, on enwiki and 200+ other wikis, with real-time monitoring. I got pretty far along, then had to let it go for other projects. It's a lot more complicated when scaling like this and keeping it up to date, not merely a one-time pass through. It's a viable project if I can find the time to complete it; it's probably about 75% done. -- GreenC 15:59, 10 August 2023 (UTC)[reply]

All mirrors seem to have been down for about a week as of today (archive.is/.today/.ph). :-( Muzilon (talk) 00:07, 12 August 2023 (UTC)[reply]
"Is it down" reports it has been down for about a week, though I recall checking yesterday in response to the above post, and it was working. It's not working right now. The blog still works: https://blog.archive.today/ -- GreenC 00:57, 12 August 2023 (UTC)[reply]
The problem depends on which DNS resolver you are using. If the DNS service uses Cloudflare, it won't work; with anything else, it works. See Reddit, and also Archive.today#Cloudflare_DNS_availability. -- GreenC 01:29, 12 August 2023 (UTC)[reply]
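A quick way to see the resolver dependence for yourself; a sketch assuming the third-party dnspython package (the specific answers will vary over time):

    import dns.resolver  # third-party: pip install dnspython

    def resolve_with(nameserver, name="archive.today"):
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [nameserver]
        return [rr.to_text() for rr in resolver.resolve(name, "A")]

    print("Cloudflare (1.1.1.1):", resolve_with("1.1.1.1"))  # reportedly unusable answers
    print("Google (8.8.8.8):", resolve_with("8.8.8.8"))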
I use Quad9 (9.9.9.10) and it does not work either.
Firefox uses Cloudflare by default for DoH.
I respectfully submit that a site which actively prevents access from popular DNS resolvers, and is therefore inaccessible to many users, is unsuitable for archival purposes. The effort would be better spent converting those links to archive.org, which does not sabotage access. Alex O. (talk) 04:37, 16 August 2023 (UTC)[reply]
Maybe. In my work, I only link to archive.today as a last resort when there are no other options, and that is often. Wayback won't or can't archive everything. Furthermore, back in the 2012-2015 era, archive.today archived most everything on Wikipedia at the time, and it did a much better job than Wayback on many sites with new web 2.0 features. And Wayback wasn't systematically archiving everything until later. So a lot of old dead links are really only available at archive.today. Nothing is simple with archiving. -- GreenC 05:09, 16 August 2023 (UTC)[reply]
Currently, I can access it through Tor, but not through a direct connection. :-/ Muzilon (talk) 07:03, 20 August 2023 (UTC)[reply]
Every time I’ve gone to use it recently I’ve been hit with a repeated captcha. It seems I’m not the only one.
@GreenC: are you aware of any method of contacting the owner of archive.today, to alert them of this problem? Apologies for the ping — I only ask because I can’t find any contact methods for the site online, and in 2021 you inserted text referring to a request from the owner of the site in Special:Diff/1060801802. Best, A smart kitten (talk) 08:41, 3 September 2023 (UTC)[reply]
There is a Contact page for the site's blog. Muzilon (talk) 10:06, 3 September 2023 (UTC)[reply]
Thanks @Muzilon! Not sure how I missed that. A smart kitten (talk) 10:22, 3 September 2023 (UTC)[reply]
They know. They are doing it on purpose.

The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users. This is especially problematic as we work to encrypt more DNS traffic since the request from Resolver to Authoritative DNS is typically unencrypted. We’re aware of real world examples where nationstate actors have monitored EDNS subnet information to track individuals, which was part of the motivation for the privacy and security policies of 1.1.1.1.

Somebody should update the main page with that information. I would, but I am not sure how to phrase it. Alex O. (talk) 16:37, 3 September 2023 (UTC)[reply]
Also see https://jarv.is/notes/cloudflare-dns-archive-is-blocked/ Alex O. (talk) 17:16, 3 September 2023 (UTC)[reply]
Crikey. I definitely don’t want to rush to conclusions, but I wouldn’t personally be comfortable with Wikipedia using (and, potentially in some cases/articles, relying on) an archive service that can just be turned off like that at the whim of the owner. In my opinion, archives by their very nature need to be reliable. Again, definitely not wanting to rush to conclusions here, but if this doesn’t get resolved, I could see a possible way forward where archive.today domains end up blacklisted as an unreliable archive service (if the community believes it to be appropriate and takes action to that effect; to be clear, I’m not intending this as a threat of any kind).
I just checked by manually changing my DNS servers, and - sure enough - the page works again. But I wasn’t even aware that I was on Cloudflare DNS (and my DNS servers weren’t showing 1.1.1.1 either) - and I hadn’t noticed this happening until recently (maybe something else changed, who knows - but I won’t speculate). But I think there’s definitely a conversation that needs to be had about whether an archive that intentionally blocks access to certain people based on their DNS resolver is an archive that can be trusted for use on Wikipedia.
Best, A smart kitten (talk) 18:21, 3 September 2023 (UTC)[reply]
I’ve just taken a very quick glance at the last RfC, and immediately the words "it can always be re-blacklisted if future issues occur" jumped out at me. Again, not wanting to cast any aspersions here, but this suggests that there may have been issues with this site in the past. However, let me qualify what I’ve said by saying that I’m very new (1) to WP, and (2) to the subject of using archive.is on WP. So I’ll leave this discussion for right now to allow editors more experienced in one or both fields to weigh in. Best, A smart kitten (talk) 18:27, 3 September 2023 (UTC)[reply]
None of the archive providers are reliable. None. WebCite went down for almost two years; none of the links worked. The community response was "let's wait and see what happens". We lose on average one archive provider per year. Search on "deprecated" and "deceased" at WP:WEBARCHIVES. Even old faithful archive.org is constantly in flux, with archive snapshots that stop working. There is only one solution to reliable backups: redundancy. How reliable (99.99% or 99.999%, etc.) is determined by how redundant it is. Also, archive.today carries pages that no one else will or can; blacklisting will create a massive amount of link rot, both now and ongoing in the future. As for this spat with Cloudflare, archive.today actually has a good reason to need the information CF refuses to send: the way archive.today is engineered to keep costs down, its servers need to load-balance based on location, so if they can't get that information, they block the request. Taking revenge by blacklisting them on Wikipedia will solve nothing except create a lot of link rot, making Wikipedia less reliable. -- GreenC 18:48, 3 September 2023 (UTC)[reply]

Prior to today, accessing archive.today would display a "Welcome to nginx!" page. Today, I'm often getting a connection timeout error. The iidrn.com website (Is It Down Right Now?) has no details about how long it's been down. Fabrickator (talk) 18:21, 14 October 2023 (UTC)[reply]

How to maintain hybrid archiving?

There's apparently an issue with archiving Ajax websites with Internet Archive.[6] To put it another way, it doesn't work. However, it works just fine with archive.today. My question is how do we maintain both in a single article without having IABot overwrite the archive.today link? If a safeguard already exists to prevent this, let me know. In other words, does IABot check to see if archive.today links are already present, and if so, does it ignore them? Viriditas (talk) 18:47, 2 November 2023 (UTC)[reply]

IABot should not be overwriting an archive.today link. Is there an example of that? -- GreenC 18:51, 2 November 2023 (UTC)[reply]
No, but, I'm testing it right now. Viriditas (talk) 18:52, 2 November 2023 (UTC)[reply]
Update: It left it alone just as you said it would. I wasn't sure it would, but now I know. Viriditas (talk) 19:03, 2 November 2023 (UTC)[reply]
FYI, you can also set up test scenarios in userspace, like in your sandbox, and run the bot on that page to see how the bot responds to specific input. -- GreenC 21:23, 2 November 2023 (UTC)[reply]
Thanks. I just got lucky that it involved a valid edit and a needed bot run. Viriditas (talk) 23:26, 2 November 2023 (UTC)[reply]
You are right: there are some archive providers IABot doesn't recognize, and it will delete the archive URL if there is a replacement available. In those cases, use {{cbignore}} to keep IABot from editing the citation. Those are rare cases. -- GreenC 23:38, 2 November 2023 (UTC)[reply]
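A sketch of applying that advice mechanically. The regex and the placement of {{cbignore}} before the closing </ref> are assumptions on my part; check the template documentation before running anything like this:

    import re

    ARCHIVED_REF = re.compile(
        r"(<ref[^>/]*>(?:(?!</ref>).)*?archive\.(?:today|is|ph|li|fo)/"
        r"(?:(?!</ref>).)*?)(</ref>)",
        re.S | re.I)

    def tag_cbignore(wikitext):
        """Add {{cbignore}} inside refs that cite archive.today captures,
        so IABot leaves them alone. Refs already tagged are skipped."""
        def repl(match):
            body, closing = match.group(1), match.group(2)
            if "{{cbignore}}" in body:
                return match.group(0)
            return body + "{{cbignore}}" + closing
        return ARCHIVED_REF.sub(repl, wikitext)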
Good to know. I learned at least one thing today! Viriditas (talk) 23:42, 2 November 2023 (UTC)[reply]

Why are we using this?

In addition to the aforementioned DNS resolver issue trapping people in a CAPTCHA loop with no explanation, I don't understand why we are putting this service forward as a choice when it (as far as I can tell) is operated by a single private individual we otherwise have no information about. I don't mean that in a nefarious way, but it just seems obvious that we shouldn't lean on this like it's real infrastructure in the way IA is, fraught infrastructure as it may seem—they at least have, you know, an office and a means of contact other than a tumblr blog. How much is getting broken or lost if this breaks further (since apparently one major DNS resolver isn't enough of a dealbreaker) or falls off the internet entirely? Remsense 13:22, 4 May 2024 (UTC)[reply]

I use it when there is no other option - they have the goods, and I hate dead links more than anything else. Wayback doesn't and can't have everything. There is only one big provider in this world, and that is Wayback. BTW, the Internet Archive is being sued by the recording industry for existential damages (think Napster). Meanwhile, we lose about one archive provider per year on average, mostly run by institutions like universities and governments; they are no more reliable. So, I would like to know what safe place exists, because I know of none. AFAIK, archive.today is not in danger of destruction, and in a way its secretiveness and one-person control is something of an advantage against being taken down. The owner/operator is well aware of these issues and has taken the extraordinary step of hosting multiple domains and servers all over the world so that no one entity or country can take it down (and they have tried already). The issue with Cloudflare and DNS is somewhat related to this. The other factor is they have been in existence for over 10 years, and statistically speaking, the longer something exists, the longer it will probably continue to exist. Newer sites like ghostarchive are more prone to failure simply by the fact of their newness (also a one-person archive). -- GreenC 17:22, 4 May 2024 (UTC)[reply]
I agree with GreenC's arguments. I tend to go for Archive.today for any webpage that has too much third-party javascript, especially when the most infamous trackers such as googletagmanager are linked. It doesn't make sense that someone who thinks s/he is reading an archived page on a trusted website is in fact also sending his/her data to third-party trackers. Boud (talk) 20:02, 29 May 2024 (UTC)[reply]
Ask a question, get an answer! Thank you for the explanations. Remsense 20:05, 29 May 2024 (UTC)[reply]

Typical archiving time outdated

We currently have several statements that the time taken to archive a webpage is typically 5-15 seconds, and in one case 15-30 seconds. This seems like ancient history from, e.g., 5-10 years ago, when third-party javascript pollution was less prevalent. My impression is that the typical time scale is now 30-300 seconds. This might vary with what sort of webpage is archived: plain HTML will obviously be much faster. We could put a bigger range, e.g. "5 to 300 seconds", although that looks a bit odd. So I propose "a few seconds to a few minutes" to replace all the current timing estimates. Boud (talk) 20:14, 29 May 2024 (UTC)[reply]