User talk:Emijrp/Wikipedia Archive

Ideas

The Library of Congress is going to save every public tweet. Why don't they save a copy of Wikipedia? emijrp (talk) 16:47, 10 September 2010 (UTC)

iBiblio

I have contacted iBiblio about hosting a copy of the latest dumps, acting as a mirror of download.wikimedia.org. No response yet. emijrp (talk) 13:12, 15 November 2010 (UTC)

Their response: "Unfortunately, we do not have the resources to provide a mirror of wikipedia. Best of luck!"

Who can we contact about hosting a mirror of the XML dumps?

We are working on meta:Mirroring Wikimedia project XML dumps. emijrp (talk) 23:36, 10 December 2010 (UTC)

How can I get all the revisions of a language for a given period?

I want all the revisions made from 18-10-2010 to 26-10-2010 for a particular language. How do I get them? —Preceding unsigned comment added by 207.46.55.31 (talk) 11:53, 29 November 2010 (UTC)

You need to extract that date range from a full dump: stub-meta-history (metadata only) or pages-meta-history (metadata + text). You can use xmlreader.py from meta:pywikipediabot, as sketched below. emijrp (talk) 15:58, 29 November 2010 (UTC)
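For illustration, here is a minimal sketch of that extraction in plain Python, using the standard library instead of xmlreader.py. The dump filename and the exact date bounds are assumptions; the dump must first be downloaded from download.wikimedia.org.

    # Minimal sketch: stream a pages-meta-history dump and keep revisions in a
    # date range. Filename is a placeholder; real dumps come compressed (.bz2/.7z).
    import bz2
    import xml.etree.ElementTree as ET

    START, END = "2010-10-18T00:00:00Z", "2010-10-26T23:59:59Z"

    def local(tag):
        # Dump tags carry an export-schema namespace ("{http://...}title"); drop it.
        return tag.rsplit("}", 1)[-1]

    title = None
    with bz2.open("pages-meta-history.xml.bz2", "rb") as dump:
        for _, elem in ET.iterparse(dump, events=("end",)):
            name = local(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "revision":
                fields = {local(child.tag): (child.text or "") for child in elem}
                ts = fields.get("timestamp", "")
                # MediaWiki timestamps are ISO 8601, so plain string comparison works.
                if START <= ts <= END:
                    print(title, ts, len(fields.get("text", "")))
                elem.clear()  # free memory; full-history dumps are enormous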
Also, you can request a meta:Toolserver account and run a SQL query against the replicated database, along the lines of the sketch below. emijrp (talk) 15:58, 29 November 2010 (UTC)
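A hedged sketch of the Toolserver route: the connection details below are placeholders; only the revision table and its 14-digit rev_timestamp strings are standard MediaWiki schema.

    # Sketch: the same date range as a SQL query against a database replica.
    # Host and database names are placeholders, not real Toolserver settings.
    import MySQLdb

    conn = MySQLdb.connect(host="sql-s1", db="enwiki_p")  # placeholder host/db
    cur = conn.cursor()
    cur.execute("""
        SELECT rev_id, rev_page, rev_timestamp
        FROM revision
        WHERE rev_timestamp BETWEEN '20101018000000' AND '20101026235959'
    """)
    for rev_id, rev_page, rev_timestamp in cur:
        print(rev_id, rev_page, rev_timestamp)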

Update with latest dump info?

If you are around, would you mind updating with more recent dump info? I'd do it but am reluctant to edit another person's user page area. Thanks... --- 85.72.150.131 (talk) 17:04, 19 August 2011 (UTC)

Please, go ahead. Thanks. emijrp (talk) 09:39, 20 August 2011 (UTC)

Offline reader

If you are interested, there is another offline reader (with image databases at archive.org): http://xowa.sourceforge.net/ https://sourceforge.net/projects/xowa/ — Preceding unsigned comment added by 188.100.234.211 (talk) 10:56, 13 January 2014 (UTC)

Tarball archive from 2005

@Emijrp and Nemo bis:

User:Emijrp/Wikipedia_Archive#Image_tarballs says: "Another one from 2005 only covers English Wikipedia images." The file description says: "all current images in use on Wikipedia and its related projects". Is it possible to find out whether these pictures come from all projects or only from the English Wikipedia? Samat (talk) 23:16, 31 October 2014 (UTC)

@Samat: The 23 MB text file in that item shows only "/en/x/xy" paths, so we can conclude that they are only English Wikipedia images. emijrp (talk) 16:51, 22 October 2015 (UTC)
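For anyone repeating that check, a quick tally of the leading path component of each entry in the file list settles it; the list filename below is a placeholder for the 23 MB text file in the item.

    # Sketch: count the project prefix of every path in the tarball's file list.
    # If only an "en" bucket shows up, the tarball is English Wikipedia only.
    from collections import Counter

    prefixes = Counter()
    with open("image-filelist.txt") as listing:  # placeholder filename
        for line in listing:
            path = line.strip().lstrip("/")
            if path:
                prefixes[path.split("/", 1)[0]] += 1

    print(prefixes.most_common())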

Offline Wikipedia as epub(s) for e-readers

I'm endeavoring to create an offline Wikipedia in the form of epub(s) (more than one file if at least one e-reader turns out unable to handle a single 2 GB epub) that inexpensive, high-autonomy e-ink readers could read. I intend to download a dump, make the necessary transforms using mediawiki-utilities, sort articles by PageRank using, for instance, https://spark.apache.org/graphx/ , then take the top n articles until they (and the media they link to) reach 2 GB. Does that look sound to you? ZPedro (talk) 21:16, 9 December 2016 (UTC)
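A toy sketch of the ranking-and-cutoff step in plain Python (the link graph, article sizes and budget are invented inputs; at dump scale GraphX would compute the PageRank):

    # Toy sketch: PageRank by power iteration, then a greedy cut at a size budget.
    # "links" and "size_bytes" are invented inputs; a real pipeline would derive
    # them from the dump (e.g. with mediawiki-utilities).

    def pagerank(links, damping=0.85, iters=50):
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iters):
            new = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outs in links.items():
                for target in outs:
                    if target in new:
                        new[target] += damping * rank[page] / len(outs)
            rank = new
        return rank

    links = {  # invented link graph: article -> articles it links to
        "Physics": ["Mathematics", "Energy"],
        "Mathematics": ["Physics"],
        "Energy": ["Physics"],
    }
    size_bytes = {"Physics": 90_000, "Mathematics": 60_000, "Energy": 40_000}
    BUDGET = 2 * 1024**3  # the 2 GB epub limit from the post

    ranks = pagerank(links)
    selected, total = [], 0
    for page in sorted(ranks, key=ranks.get, reverse=True):  # best-ranked first
        if total + size_bytes[page] > BUDGET:
            break
        selected.append(page)
        total += size_bytes[page]
    print(selected, total)

In practice the linked media would dominate the total, so the budget check would have to count image sizes alongside article text.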