User talk:MER-C/Wiki.java

From Wikipedia, the free encyclopedia

Changelog

Version Diff Comment
0.01 diff Initial.
0.02 diff Add default constructor, getCategoryMembers(String name).
0.03 diff Added namespace support.
0.04 diff Switched to the MediaWiki API. Added category intersection. License -> GPL 3.
0.05 diff Added logging, sketchy user support. Worked around the silly API limitation of 500/5000 elements returned per query.
0.06 diff Fields for the various MediaWiki logs. Added spamsearch, getDomain() (should have done this earlier).
0.07 diff Optimized for bandwidth, add userRights() caching. Debug.
0.08 diff Log support. Added a few utility methods.
0.09 diff Add listPages(), editing throttle, better cookies. Now uses GZIP compression. Various other fixes.
0.10 diff Add persistence, getImage(), whatLinksHere(), imageUsage(), getCurrentDatabaseLag(), getRenderedText(), getTalkPage(), getProtectionLevel(), pageExists(). We now check whether a page is protected before editing it. Various fixes, including ones below.
0.11 diff Add upload(), parseList(), hasNewMessages(), assertions, maxlag. Rewrite login(), intersection().
0.12 diff Add ip block list, transclusions. Exception overhaul. Various optimizations.
0.13 diff Add random page, thumbnails, ability to parse arbitrary wikitext.
0.14 diff Added arbitrary scriptpath support, search, statistics, some other stuff.
0.15 diff Short/long pages, bug fixes.
0.16 diff Added API edit, move, edit counter, various "stuff about this page" methods.
0.17 diff rm screen-scrape edit; add contribs(), Revision, section editing, purge()
0.18 diff More "stuff about this page" methods, condense, status check, better error handling
0.19 diff Rollback, page history, bug fixes
0.20 diff Image history, old images, undo, new pages, revdelete bug fixes
0.21 diff diff, attempt at upload API, various bug fixes
0.22 diff quick user agent fix

Special page equivalents

See Special:Specialpages for a list of special pages. The text on special pages may be edited by editing the appropriate system message.

Special page Equivalent code
Special:Allmessages listPages("MediaWiki:", Wiki.FULL_PROTECTION, Wiki.ALL_NAMESPACES)
Special:Allpages listPages()
Special:Contributions contribs() (excludes Special:Contributions/newbies)
Special:Ipblocklist getIPBlockList()
Special:Linksearch spamsearch()
Special:Listusers allUsers()
Special:Log getLogEntries()
Special:Longpages longPages()
Special:Movepage move()
Special:Mypage String title = "User:" + wiki.getCurrentUser().getUsername();
Special:Mytalk String title = "User talk:" + wiki.getCurrentUser().getUsername();
Special:Newimages getLogEntries(int amount, Wiki.UPLOAD_LOG) or newPages(int amount, Wiki.IMAGE_NAMESPACE)
Special:Newpages newPages()
Special:Prefixindex listPages()
Special:Protectedpages listPages()
Special:Random random()
Special:Search search()
Special:Shortpages shortPages()
Special:Statistics getSiteStatistics()
Special:Upload upload()
Special:Userlogin login()
Special:Userlogout logoutServerSide()
Special:Whatlinkshere whatLinksHere()

Two Errors

Hello, there are two small errors in your code:

First:

In the method "getPageText(String title)", the line

text.append(line);

should be

text.append(line + "\n");
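The reason: BufferedReader.readLine() returns each line without its line terminator, so appending the lines as-is glues them together. A minimal, self-contained sketch of the difference (the class name ReadLineDemo is only for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo
{
    // Reads every line; if keepNewlines is true, re-appends the '\n' that readLine() strips.
    public static String readAll(String source, boolean keepNewlines) throws IOException
    {
        BufferedReader in = new BufferedReader(new StringReader(source));
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null)
        {
            text.append(line);
            if (keepNewlines)
                text.append('\n');
        }
        in.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException
    {
        String wikitext = "== Heading ==\nFirst paragraph.\n";
        System.out.println(readAll(wikitext, false)); // == Heading ==First paragraph.
        System.out.println(readAll(wikitext, true));  // original text, line breaks intact
    }
}
```

Appending the line and the '\n' separately avoids building a temporary string per line. Note that "\r\n" line endings come back as "\n".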


Second:

The method "login" doesn't work on the German Wikipedia. The bot logs in correctly, but the method returns false, because the German login page doesn't contain the text "Login successful".

--88.72.43.131 11:05, 14 November 2007 (UTC) I hope you can understand me. I know my English isn't very good ;)

Fixed both, but it will be some time before they are live; the to-do list for 0.10 is quite long. (If you can't wait, the fix for the second one is to replace "Login successful" with "wgUserName = \"" + username + "\"".) MER-C 05:50, 16 November 2007 (UTC)
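MER-C's workaround checks for the JavaScript variable MediaWiki writes into the page for the logged-in user, instead of the English phrase "Login successful", so it does not depend on the interface language. A minimal sketch of that check, assuming the post-login HTML is already in a String; the class and method names, and the sample HTML, are made up for illustration:

```java
public class LoginCheck
{
    // Language-independent check: the page served after login declares
    // wgUserName = "<name>" for the logged-in user.
    public static boolean loginSucceeded(String responseHtml, String username)
    {
        return responseHtml.contains("wgUserName = \"" + username + "\"");
    }

    public static void main(String[] args)
    {
        String german = "<script>var wgUserName = \"ExampleBot\";</script><p>Anmeldung erfolgreich</p>";
        System.out.println(loginSucceeded(german, "ExampleBot")); // true
        System.out.println(german.contains("Login successful"));  // false
    }
}
```

The username has to match the form MediaWiki displays (for example, with the first letter capitalized), otherwise the check fails even after a successful login.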

getPageText() can use API

public String getPageText(String title) throws IOException
{
	// pitfall check: Special: and Media: pages have no stored wikitext
	if (namespace(title) < 0)
		throw new UnsupportedOperationException("Cannot retrieve Special: or Media: pages!");
	
	// ask the API for the content of the current revision
	String URL = query + "prop=revisions&rvprop=content&titles=" + URLEncoder.encode(title, "UTF-8");
	logurl(URL, "getPageText");
	checkLag("getPageText");
	URLConnection connection = new URL(URL).openConnection();
	setCookies(connection, cookies);
	connection.connect();
	BufferedReader in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
	
	// read the whole response, restoring the newlines readLine() strips;
	// a StringBuilder avoids the quadratic cost of += in a loop
	StringBuilder buffer = new StringBuilder();
	try
	{
		String line;
		while ((line = in.readLine()) != null)
			buffer.append(line).append('\n');
	}
	finally
	{
		in.close();
	}
	String result = buffer.toString();
	
	// pick out the text (or a placeholder) without a full XML parser
	String content;
	if (result.indexOf("missing=\"\"") != -1)
		content = "(not yet written)";
	else if (result.indexOf("invalid=\"\"") != -1)
		content = "(Bad title)";
	else if (result.indexOf("<rev />") != -1)
		content = "(empty)";
	else
		content = result.substring(result.indexOf("<rev>") + 5, result.indexOf("</rev>"));
	
	log(Level.INFO, "Successfully retrieved text of " + title, "getPageText");
	return decode(content);
}

Preceding unsigned comment added by 80.143.120.164 (talk • contribs)

Sorry about the wait; I only check this page when I release a new version. The current approach avoids parsing any XML. Sometimes using the API is harder and slower; rollback is another example. Won't fix. (I did, however, tweak the docs to describe what happens when exists(title)[0] == false.) MER-C 10:56, 22 August 2008 (UTC)

I'm having second thoughts about WONTFIXing this; the API's ability to resolve redirects could be handy here. MER-C 12:55, 22 August 2008 (UTC)
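The redirect resolution mentioned here is exposed through the query module's redirects parameter: with it set, a redirect title returns the target page's revisions, and the response lists the redirects that were followed. A minimal sketch of building such a request URL; the api.php base URL and the helper name are assumptions for illustration, not part of Wiki.java:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PageTextQuery
{
    // Builds a query for the current wikitext of a page, following
    // redirects (redirects=1) so a redirect title yields its target's text.
    public static String build(String apiUrl, String title)
    {
        try
        {
            return apiUrl + "?action=query&prop=revisions&rvprop=content&format=xml"
                + "&redirects=1&titles=" + URLEncoder.encode(title, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(build("https://en.wikipedia.org/w/api.php", "Main Page"));
    }
}
```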

Using rights, not groups, for "apihighlimits"

Use rights, not groups, to decide whether to use the high limit. 'BOT' and 'ADMIN' are groups (see query("meta=userinfo&uiprop=rights|groups")), but your code calls them rights (User.userRights()).

int limit = 500;	// default maximum number of results per query
String result = query("meta=userinfo&uiprop=rights");
if (result.indexOf("apihighlimits") != -1)
	limit = 5000;	// accounts with the apihighlimits right may request up to 5000
This adds a query for no real reason, because the result of User.userRights() is cached. (Just tweak the source if the default doesn't apply to you.) The method was named after Special:Userrights before I realized they were groups. Implementing the whole permissions model would mean lots of public static final long constants (ints aren't big enough) and take 500+ lines. Later. MER-C 14:01, 23 August 2008 (UTC)
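For comparison, a sketch of the rights-based check the proposal describes, parsing the XML returned by meta=userinfo&uiprop=rights. It assumes each right appears as an <r> element inside <rights>, as in the API's XML output; the class and method names are hypothetical. Parsing the rights once per login and keeping the result would avoid the extra query per call.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLimit
{
    // Assumed response shape: <rights><r>read</r><r>apihighlimits</r>...</rights>
    private static final Pattern RIGHT = Pattern.compile("<r>([^<]*)</r>");

    // Collects the user's rights from a meta=userinfo&uiprop=rights XML response.
    public static Set<String> parseRights(String userinfoXml)
    {
        Set<String> rights = new HashSet<String>();
        Matcher m = RIGHT.matcher(userinfoXml);
        while (m.find())
            rights.add(m.group(1));
        return rights;
    }

    // 5000 results per query with apihighlimits, 500 otherwise.
    public static int limitFor(Set<String> rights)
    {
        return rights.contains("apihighlimits") ? 5000 : 500;
    }
}
```

Matching whole <r> elements, rather than calling indexOf on the raw response, means the limit depends only on the exact right name.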

Upload bug?

Hi, I'm trying out your code (great, BTW) to upload files. There seems to be a problem with "special" characters in the destination filename and the description (see for example http://commons.wikimedia.org/wiki/File:Test%2Bkgoiyfyktgkggukgku.jpg):

  • Spaces in the dest filename will turn into "+"
  • Upload will say "Successfully uploaded" but fail when the dest filename contains a German Umlaut (äöüÄÜÖ)
  • Upload will say "Successfully uploaded" but fail when the dest filename contains a comma (,)
  • If upload succeeds, special characters in the wikitext will turn into gibberish

I tried to add "Content-Type:text/plain; charset=utf-8;" to the upload description and/or the wpDestFile (both with and without the content-type), but no luck. Do you know a quick fix? Cheers, --Magnus Manske (talk) 23:34, 7 August 2009 (UTC)

Update: I've managed to clean up the contents by re-encoding them as ISO-8859-1:
        try {
            contents = new String(contents.getBytes("UTF-8"), "iso-8859-1");
        } catch (UnsupportedEncodingException ex) {
            Logger.getLogger(BArchangleView.class.getName()).log(Level.SEVERE, null, ex);
        }

No luck with the destination filename yet, though. I suppose the entire request should really be sent as UTF-8 instead of relying on these ugly hacks... --Magnus Manske (talk) 13:18, 8 August 2009 (UTC)
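Why the hack works: DataOutputStream.writeBytes(String) writes only the low eight bits of each char, so a non-ASCII character goes out as a single Latin-1 byte (or is mangled, above U+00FF) instead of as UTF-8. Re-decoding the UTF-8 bytes as ISO-8859-1 makes each char hold exactly one UTF-8 byte, so writeBytes() happens to emit the right sequence. A minimal, self-contained sketch comparing that with writing the UTF-8 bytes directly through write(byte[]); the class and method names are only for illustration, and it has not been tried against the live upload form:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriteBytesDemo
{
    // What upload() does now: writeBytes() keeps only the low 8 bits of each char,
    // so u-umlaut (U+00FC) is sent as the single Latin-1 byte 0xFC.
    public static byte[] viaWriteBytes(String s) throws IOException
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeBytes(s);
        out.flush();
        return buffer.toByteArray();
    }

    // The ISO-8859-1 hack: one char per UTF-8 byte, so writeBytes() emits UTF-8.
    public static byte[] viaLatin1Hack(String s) throws IOException
    {
        String oneCharPerByte = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return viaWriteBytes(oneCharPerByte);
    }

    // The direct route: encode once and write the raw bytes.
    public static byte[] viaUtf8(String s) throws IOException
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.write(s.getBytes(StandardCharsets.UTF_8));
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        String name = "M\u00fcller, 1.jpg";
        System.out.println(Arrays.toString(viaWriteBytes(name)));               // 0xFC shows up as -4
        System.out.println(Arrays.equals(viaLatin1Hack(name), viaUtf8(name))); // true
    }
}
```

In upload(), the same idea would mean writing the text fields with out.write(...getBytes("UTF-8")) and dropping both ISO-8859-1 conversions.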

Update 2: Got it working now! Here's the code of the entire function:
    public synchronized void upload(File file, String filename, String contents) throws IOException, LoginException
    {
        // TODO: API upload? Still in the pipeline, unfortunately.
        // throttle
        long start = System.currentTimeMillis();
        statusCheck();

        // check for log in
        if (user == null)
        {
            CredentialNotFoundException ex = new CredentialNotFoundException("Permission denied: you need to be registered to upload files.");
            logger.logp(Level.SEVERE, "Wiki", "upload()", "[" + domain + "] Cannot upload - permission denied.", ex);
            throw ex;
        }

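        // UTF-8 voodoo: writeBytes() below sends only the low byte of each char,
        // so store the UTF-8 bytes one per char first
        try {
            contents = new String(contents.getBytes("UTF-8"), "iso-8859-1");
        } catch (UnsupportedEncodingException ex) {
            // cannot happen: both charsets are guaranteed to be available
            logger.log(Level.SEVERE, null, ex);
        }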


        // check if the page is protected, and if we can upload (incorporates lag check)
        String filename2 = filename.replaceAll(" ", "_");
//        String filename2 = URLEncoder.encode(filename.replaceAll(" ", "_"), "UTF-8");
        // check protection using the real title, before the byte-per-char hack below
        String fname = "File:" + filename2;
        try {
            filename2 = new String(filename2.getBytes("UTF-8"), "iso-8859-1");
        } catch (UnsupportedEncodingException ex) {
            logger.log(Level.SEVERE, null, ex);
        }

        if (!checkRights(getProtectionLevel(fname), false))
        {
            CredentialException ex = new CredentialException("Permission denied: image is protected.");
            logger.logp(Level.WARNING, "Wiki", "upload()", "[" + domain + "] Cannot upload - permission denied.", ex);
            throw ex;
        }

        // prepare MIME type
        // file extension after the last dot (handles "jpeg" and "tiff", not just three-letter ones)
        String extension = filename2.substring(filename2.lastIndexOf('.') + 1).toLowerCase();
        if (extension.equals("jpg"))
            extension = "jpeg";
        else if (extension.equals("svg"))
            extension += "+xml";

        // upload the image
        // this is how we do multipart post requests, by the way
        // see also: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2
        String url = base + "Special:Upload";
        logurl(url, "upload");
        URLConnection connection = new URL(url).openConnection();
        String boundary = "----------NEXT PART----------";
        connection.setRequestProperty("Accept-Charset", "iso-8859-1,*,utf-8");
        connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        setCookies(connection, cookies);
        connection.setDoOutput(true);
        connection.connect();

        // send data
        boundary = "--" + boundary + "\r\n";
        DataOutputStream out = new DataOutputStream(connection.getOutputStream());
//        DataOutputStream out = new DataOutputStream(System.out); // debug version
        out.writeBytes(boundary);
        out.writeBytes("Content-Disposition: form-data; name=\"wpIgnoreWarning\"\r\n\r\n");
        out.writeBytes("true\r\n");
        out.writeBytes(boundary);
        out.writeBytes("Content-Disposition: form-data; name=\"wpDestFile\"\r\n");
        out.writeBytes("Content-Type: text/plain; charset=utf-8\r\n\r\n");
        out.writeBytes(filename2);
        out.writeBytes("\r\n");
        out.writeBytes(boundary);
        out.writeBytes("Content-Disposition: form-data; name=\"wpUploadFile\"; filename=\"");
        out.writeBytes(filename);
        out.writeBytes("\"\r\n");
        out.writeBytes("Content-Type: image/");
        out.writeBytes(extension);
        out.writeBytes("\r\n\r\n");

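        // write image: readFully() loops until the buffer is full, whereas
        // available() plus a single read() may return only part of the file
        FileInputStream fi = new FileInputStream(file);
        byte[] b = new byte[(int) file.length()];
        new DataInputStream(fi).readFully(b);
        out.write(b);
        fi.close();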

        // write the rest
        out.writeBytes("\r\n");
        out.writeBytes(boundary);
        out.writeBytes("Content-Disposition: form-data; name=\"wpUploadDescription\"\r\n");
        out.writeBytes("Content-Type: text/plain\r\n\r\n");
        out.writeBytes(contents);
        out.writeBytes("\r\n");
        out.writeBytes(boundary);
        out.writeBytes("Content-Disposition: form-data; name=\"wpUpload\"\r\n\r\n");
        out.writeBytes("Upload file\r\n");
        out.writeBytes(boundary.substring(0, boundary.length() - 2) + "--\r\n");
        out.close();

        // done
        BufferedReader in;
        try
        {
            // HttpURLConnection buffers the request body and only sends the request once the
            // response is asked for, which is why the upload only "sticks" after getInputStream()

            String line;
//            in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
            in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            line = in.readLine();
//            while ((line = in.readLine()) != null) System.out.println(line);
            in.close();

        }
        catch (IOException e)
        {
            // retry once
            if (retry)
            {
                retry = false;
                log(Level.WARNING, "Exception: " + e.getMessage() + " Retrying...", "upload");
                upload(file, filename, contents);
            }
            else
            {
                logger.logp(Level.SEVERE, "Wiki", "upload()", "[" + domain + "] EXCEPTION:  ", e);
                throw e;
            }
        }
        if (retry)
            log(Level.INFO, "Successfully uploaded " + filename, "upload");
        retry = true;

        // throttle
        try
        {
            long z = throttle - System.currentTimeMillis() + start;
            if (z > 0)
                Thread.sleep(z);
        }
        catch (InterruptedException e)
        {
            // nobody cares
        }
    }

I still think the ISO hack is ugly, though... --Magnus Manske (talk) 16:07, 8 August 2009 (UTC)

Yeah. I need to rewrite it for the upload API anyway, which will be with us on the next scap (Wikimania, perhaps?). Hopefully things will be saner then. MER-C 06:51, 9 August 2009 (UTC)

Bug in move()?

// success
if (temp.contains("move from"))
	in.close();
// failure
checkErrors(temp, "move");

Should be:

// success
if (temp.contains("move from"))
	in.close();
else
	// failure
	checkErrors(temp, "move");

? --Nat3738 (talk) 03:16, 8 October 2009 (UTC)

Issue with the API returning blank lines before the actual response

This may occur in several places; I found the problem in login() and edit().

These are the changes I made to make it work:

In login():

         String line = in.readLine();
         boolean success = line.contains("result=\"Success\"");
         in.close();

becomes

         String line;
         boolean success = false;
         while ((line = in.readLine()) != null) {
             if (line.contains("result=\"Success\"")) {
                 success = true;
                 break;
             }
         }
         in.close();

In edit(), the call to checkErrors() throws an exception if the first returned line is blank, even though later lines contain the success message; you need to loop through all the returned lines to check for success.
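The same idea as the login() change, pulled out so edit() could reuse it: scan every line of the response before deciding success. A minimal sketch, assuming the success marker is result="Success" as in the login snippet above; the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class ResponseCheck
{
    // Scans every line of an API response, so leading blank lines
    // do not hide the result attribute.
    public static boolean isSuccess(Reader response) throws IOException
    {
        BufferedReader in = new BufferedReader(response);
        try
        {
            String line;
            while ((line = in.readLine()) != null)
                if (line.contains("result=\"Success\""))
                    return true;
            return false;
        }
        finally
        {
            in.close();
        }
    }
}
```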

Glen.mccormick (talk) 13:56, 12 January 2010 (UTC)

Works for me, at least on WMF sites. You're probably thinking of the XML pretty-print format. MER-C 05:35, 12 February 2010 (UTC)

Small corrections

Hello MER-C,
I took the liberty of making three modifications to your code:

  • I corrected a bug when getCategories() is called on a nonexistent page or on a page without categories
  • I corrected some Javadoc
  • I corrected a bug when getImagesOnPage() is called on a nonexistent page or on a page without images

I did not modify the changelog, though.
I hope you don't mind.
In any case, thanks a lot for your library, and happy new year.
Best regards, Liné1 (talk) 07:46, 2 January 2011 (UTC)

Thanks for the bug fixes. MER-C 09:33, 14 February 2011 (UTC)


Android compatibility

I'm using your code for some Android apps I'm writing at the moment. I had to change a few things, because Android's Java is missing some methods that standard Java has; e.g. String.isEmpty() had to be replaced with equals("").

So that I don't have to maintain the whole thing on my own, is there any chance I could maintain Android compatibility in your repository? My email is at Freakolowsky. Thanks. (Preceding undated comment added 15:13, 23 May 2011 (UTC).)

Stop supporting stale versions of Android, then. MER-C 03:38, 10 September 2012 (UTC)

checkRights() bug

Hi, I'm developing Commons:VicuñaUploader and I found a bug related to cookies. If someone logs in without capitalizing the first letter (e.g. "myaccount"), user.getUsername() returns "myaccount", but the cookies received from the server contain "Myaccount". As a result, a CredentialExpiredException is thrown when it shouldn't be. The same happens with spaces and underscores: the server returns a plus sign instead.

Fix below:

    protected boolean checkRights(int level, boolean move) throws IOException, CredentialException
    {
        // check if we are logged out
        String s = user.getUsername();
        s = s.substring(0,1).toUpperCase() + s.substring(1); //first to upper
        s = s.replace(" ", "+").replace("_", "+");           //spc to plus

        if (!cookies.containsValue(s))
        {
            logger.log(Level.SEVERE, "Cookies have expired");
            logout();
            throw new CredentialExpiredException("Cookies have expired.");
        }
//(...)

Cheers, Yarl 14:00, 8 September 2012 (UTC)
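The normalization in the fix above can be isolated and tested on its own. A minimal sketch reproducing the two rules described (capitalize the first letter; map spaces and underscores to "+"); the class and method names are hypothetical. Locale.ROOT is an added choice so capitalization does not depend on the JVM's default locale (e.g. the Turkish dotted i), and the sketch only covers the cases reported here, not every way the server might encode a username in a cookie.

```java
import java.util.Locale;

public class CookieUsername
{
    // Converts a username as typed at login into the form reported in the
    // cookie: first letter upper-cased, spaces and underscores turned into '+'.
    public static String normalize(String username)
    {
        if (username.isEmpty())
            return username;
        String s = username.substring(0, 1).toUpperCase(Locale.ROOT) + username.substring(1);
        return s.replace(' ', '+').replace('_', '+');
    }

    public static void main(String[] args)
    {
        System.out.println(normalize("myaccount"));  // Myaccount
        System.out.println(normalize("my account")); // My+account
        System.out.println(normalize("my_account")); // My+account
    }
}
```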

Noted. MER-C 03:35, 10 September 2012 (UTC)
OK, and is there an easy way to check upload progress? Yarl 12:59, 10 September 2012 (UTC)
The MediaWiki API blocks server-side, so you will need to edit upload() to update whatever progress bar you have. It is not possible to monitor single-chunk uploads. MER-C 08:01, 17 September 2012 (UTC)
Might be fixed in r89 (not tested). MER-C 08:27, 17 September 2012 (UTC)
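Since the server gives no feedback while it processes an upload, client-side progress can only track how many bytes have been written to the request. A minimal sketch of a chunked copy loop that reports progress after each chunk; the ProgressListener interface and the method names are hypothetical, not part of Wiki.java. With HttpURLConnection, the body is buffered in memory until the response is requested unless a streaming mode such as setFixedLengthStreamingMode() is enabled, so without one the reported progress reflects buffering rather than network transfer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ProgressCopy
{
    // Hypothetical callback, e.g. to update a progress bar.
    public interface ProgressListener
    {
        void progress(long bytesWritten, long totalBytes);
    }

    // Copies the file data to the request body in fixed-size chunks,
    // notifying the listener after each chunk. Returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out, long totalBytes,
        int chunkSize, ProgressListener listener) throws IOException
    {
        byte[] buffer = new byte[chunkSize];
        long written = 0;
        int n;
        while ((n = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, n);
            written += n;
            listener.progress(written, totalBytes);
        }
        out.flush();
        return written;
    }
}
```

In an upload method, copy() would take the place of writing the whole file in a single call.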