User talk:West.andrew.g/Archive 9


WMF now reporting mobile/zero pageviews

Just one month ago I took to many of these same talk pages to explain that WMF statistics were under-reporting per-article views by approximately 1/3, because mobile traffic was not being included in those totals. Further details were included in a Signpost article. I'd like to commend the WMF for quickly rectifying that situation, as files including mobile and wp-zero traffic are now available. The Wikipedia Zero project currently sees very little traffic, but I'll be including it in all my reports regardless (recall that mobile views were also minor just a few years ago).

You'll notice the WP:5000 and WP:TOPRED now break down the (increased) totals into "mobile" and "wp-zero" percentages (the complement being the "desktop" views we had previously). This will be the case from the OCT-14-2014 report onwards. In addition to the higher totals, another immediate benefit is that very low mobile participation on an article is often indicative of bot/misconfigured traffic. Though an intelligent, malicious spammer can evade this by altering user agent strings, I anticipate this will be of great utility moving forward.

I know the WMF has reached out to stats.grok.se about updating their user-facing tool. I greatly look forward to having this new data on board, and aside from the fact it's going to make year-end aggregation a bit messy, I'm excited to see what we can learn from deeper dives into the data. Thanks, West.andrew.g (talk) 23:44, 9 October 2014 (UTC)

May I make a suggestion about the way to organize the table?

I think this might be a better idea in the top 5000 list to reorganize the table:

Instead of the current:

Rank Article Views %-Mobi %-Zero
608 Chelsea F.C. 91,967 41.01% 0.00%
609 Nelson Mandela 91,966 35.88% 0.00%
610 Taj Mahal 91,593 36.57% 0.00%

Would it be better if it looked like:

Rank Article Class Views %-Mobi %-Zero
608 Chelsea F.C. 91,967 41.01% 0.00%
609 Nelson Mandela 91,966 35.88% 0.00%
610 Taj Mahal 91,593 36.57% 0.00%

I'm sure with this new look, the list would take less time to load because of the size of the table in both bytes and appearance. I wouldn't have the big space of the articles' classes, because if the data were on the right, it would make it hard to read because the article title is far away. I tested this on my user sandbox. With the current list, the example is 768 bytes long. With my suggested list, it's 566 bytes long. The size of the table (in bytes) should be more than 25% smaller. It makes the distance from the article title to the pageviews much closer. Is this a good change to make? A Great Catholic Person (talk) 03:20, 12 October 2014 (UTC)

 Not done -- I agree that sometimes the titles are quite far away, visually. As a result, I recently changed the maximum character width of the "article" field so that very long titles will be truncated while leaving the wikilink intact. However, I don't see the need to aggregate all the "class" columns into a single one. Byte savings aren't of primary concern here, and more crucially, your suggestion breaks the ability to sort by class. Users want to quickly see, for example, which "top 5000" list entries are of "stub-class" quality -- and my format supports that. Thanks, West.andrew.g (talk) 06:32, 13 October 2014 (UTC)

A pie for you!

Great job with the Popular Pages list. Any theories about "New Jersey Photographer"? Where is all the traffic coming from for that nonexistent page? Brianrisk (talk) 03:10, 13 October 2014 (UTC)
I have no specific information regarding that particular "not article". However, the talk archives of WP:5000, WP:Top25Report, and WP:TOPRED have tried to more broadly understand this phenomenon. To summarize, it's likely one of: (1) An automated program that is inflating the numbers with malicious intent. (2) An automated program that obtains Wikipedia content and re-serves it to end users, so that the article request comes at every user visit to that third-party page (e.g., a pay-per-click or informational website). (3) An automated program/script that requested this page (maybe just once) and is now caught in an infinite loop as it keeps retrying to obtain the content as part of its failure condition. Thanks, West.andrew.g (talk) 06:45, 13 October 2014 (UTC)

WP:5000 question

I've just looked at WP:5000 for the first time in a few months, so this was my first encounter with the %-Mobi and %-Zero columns. I am confused, however, by one thing: how do you calculate the number of desktop views? I'm using a laptop to view Wikipedia; how can you tell the difference between laptop views and desktop views? Nyttend (talk) 14:39, 21 October 2014 (UTC)

@Nyttend: -- (1) "Desktop" refers to standard desktop machines as well as laptops; (2) "Mobile" refers to cellular phones and tablets; (3) This is not something I calculate, but an aggregation provided by the WMF in the raw stats; (4) The WMF makes this distinction by looking at the User agent string of the page requests they receive. Clients (i.e., browsing users) identify themselves to servers with the type of browser they are using, and full-featured Firefox (like that on your laptop) identifies differently than the slimmed-down version that might be found on your cellphone. West.andrew.g (talk) 14:45, 21 October 2014 (UTC)
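
As a rough illustration of the user agent distinction described above, here is a minimal Python sketch. It is not the WMF's actual logic: the token list and the function name `classify_user_agent` are assumptions made for this example, and production classifiers rely on much larger rule sets. The two sample strings follow the general shape of desktop and Android Firefox user agents.

```python
import re

# Hypothetical token list chosen for illustration; not the WMF's real rules.
MOBILE_TOKENS = re.compile(
    r"Mobile|Android|iPhone|iPad|Opera Mini|IEMobile", re.IGNORECASE
)

def classify_user_agent(user_agent: str) -> str:
    """Return "mobile" if the string contains a known mobile token, else "desktop"."""
    return "mobile" if MOBILE_TOKENS.search(user_agent) else "desktop"

# Example strings in the style of desktop and Android Firefox.
laptop_firefox = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0"
phone_firefox = "Mozilla/5.0 (Android; Mobile; rv:33.0) Gecko/33.0 Firefox/33.0"

print(classify_user_agent(laptop_firefox))  # desktop
print(classify_user_agent(phone_firefox))   # mobile
```

Because the string is supplied by the client, anything built on it can be spoofed, which is the caveat raised earlier on this page about spammers altering user agent strings.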

Is STiki down?

Hi, when I try to open Stiki it displays an error message saying,

Unable to connect to the STiki backend: This is likely the result of one of four things;

  • (1) You are not connected to the Internet-- Obviously not this one
  • (2) Port 3306 is not open, due to a firewall or your network admin's settings-- Never got this error message before, I didn't install a new firewall or an antivirus software either, Network settings are the ones I used before. I didn't change them.
  • (3) The STiki server is down
  • (4) A required software upgrade has been issued breaking down this version-- I have the latest 2.1 version

Can you help me fix this issue? Your software is really great and has helped me a lot in reverting vandalism. I'm in a desperate need to fix this. Thank you--Chamith (talk) 16:55, 2 November 2014 (UTC)

@ChamithN: The best place to monitor these things is: WT:STiki. The server did experience some downtime, but everything is up and running again. Thanks, West.andrew.g (talk) 20:17, 3 November 2014 (UTC)
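
Item (2) of the error message quoted above can be checked independently of STiki by attempting a plain TCP connection to port 3306. The following is a minimal Python sketch, assuming only the standard library; the function name `can_connect` is made up for this example, and the real backend hostname is not given in this thread, so the usage comment uses a placeholder.

```python
import socket

def can_connect(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False

# Usage (placeholder hostname, not the real STiki backend address):
#   can_connect("stiki-backend.example.org")
```

A `False` result does not say which of the listed causes applies: a local firewall, a network policy, and server downtime all look the same from the client side.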

This week's update?

I know you've done a lot of updating recently, so I understand if you haven't done the weekly update, but I was wondering if it will happen this week. Serendipodous 08:14, 13 October 2014 (UTC)

The raw files are having a hiccup. I've dropped them a line at WikiTech: [1]. The report will run itself automatically if/when that is resolved. Thanks, West.andrew.g (talk) 13:35, 13 October 2014 (UTC)
And now back up again. The report should appear within a couple hours once the processing gets caught back up. Thanks, West.andrew.g (talk) 00:35, 14 October 2014 (UTC)
Slight delay in the pickup this week? Will it be out today? Serendipodous 11:15, 2 November 2014 (UTC)
Whenever there is a delay, please check [2] which will tell you the status of my server. If the server is down, I am aware of the issue, and the report will compile once the issue is corrected. There is often related discussion on WT:STiki, my service that relies on the same machine. Thanks, West.andrew.g (talk) 16:48, 7 November 2014 (UTC)

IP using Stiki

Why is this IP using Stiki on the iPhone 6 page? Have the guidelines in Wikipedia:STiki been relaxed? - in which case they may need revision.

108.58.97.50 (talk) . . (34,618 bytes) (-213) . . (Reverted 1 good faith edit by 71.95.53.113 using STiki) (undo)

The reversion was fair, if a tad abrupt, so there are no operational / etiquette quibbles. Regards. Chienlit (talk) 16:46, 19 November 2014 (UTC)

@Chienlit: -- See discussion at Wikipedia_talk:STiki/Archive_18#IP_editing.3F, where this is believed to be a WMF/Mediawiki bug in how it maintains user sessions. Despite the fact it is not STiki's fault, there is still the capability to detect/prevent this from happening. That is a forthcoming feature tracked as T#048 in the table that sits atop WT:STiki. Thanks, West.andrew.g (talk) 16:56, 19 November 2014 (UTC)
Thanks. Chienlit (talk) 16:59, 19 November 2014 (UTC)

Wikimedia genealogy project

Just wondering if you have any thoughts re: the idea of WMF hosting a genealogy project. If so, feel free to contribute to this discussion. And apologies if I have made this request before. ---Another Believer (Talk) 16:53, 10 December 2014 (UTC)

How should I cite your findings in a potential journal publication? OhanaUnitedTalk page 03:23, 17 December 2014 (UTC)

@OhanaUnited: That's a tough one. I have previously cited the Wikipedia Signpost as a journal and included a URL; there are volume and issue numbers associated with each edition. The research that led to the discovery was done with User:Doc James and could be cited as:
Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language. 
James M. Heilman and Andrew G. West. Medicine 2.0 '14: The 8th World Congress on Social Media, Mobile 
Apps, and Internet/Web 2.0 in Health, Medicine, and Biomedical Research. Maui, HI, USA. November 2014
That was just a conference presentation, though; the formal journal paper is currently under review. Thanks, West.andrew.g (talk) 03:59, 17 December 2014 (UTC)
Yes and will likely take a few months before it is published. Doc James (talk · contribs · email) 04:02, 17 December 2014 (UTC)
Thanks @Doc James: and @West.andrew.g:. Our submission deadline is September 2015 so we have plenty of time to wait for your paper to be published. OhanaUnitedTalk page 04:23, 17 December 2014 (UTC)

Help: Article creation for a STiki co-author

Talk page watchers... As you are probably aware, my skill lies in reverting vandalism and writing code that enables others to do so --- not in authoring articles or understanding the notability guidelines that govern them. It has come to my attention that my PhD advisor and WP:STiki co-author, Insup Lee [3] does not have a Wikipedia article. Having run across several other researchers of similar (or perhaps, lesser) merit who qualified for articles, I wonder if someone might be willing to look at his notability, and create an article if that is appropriate (or advise me accordingly, as I clearly have a COI). Page 46 of his CV [4] lists some of the third-party sources that have reported on his work. Thoughts? West.andrew.g (talk) 02:51, 11 January 2015 (UTC)

Signpost articles

User:The ed17 suggested that you would be the person to ask if it was possible to find or generate a list of articles in the Wikipedia:Wikipedia Signpost namespace that have received the most traffic? Thanks. Gamaliel (talk) 19:15, 15 January 2015 (UTC)

@Gamaliel: Unfortunately, I only store page view data for the article namespace. Going back and downloading/processing a year's worth of raw pageview files (on the order of several GB of data -- and hours of time -- for each day) to get out these few datapoints is not a worthwhile use of my resources. I would be willing to set up monitoring moving forward if you'd like to discuss that. However, if you are looking for a quick answer, I'd talk to some of the Wikipedia Research folks internal to the WMF. They *might* have these files sitting on a disk internal to a compute cluster that could expedite the process relative to my own infrastructure. Thanks, West.andrew.g (talk) 19:44, 15 January 2015 (UTC)
Oh, it's certainly not worth that degree of effort, I was just wondering if there was an easy way to do it. Thanks! Gamaliel (talk) 19:46, 15 January 2015 (UTC)
@Gamaliel: User:The ed17 and the Signpost have been good to me (and I'll soon have another special report regarding my work with DocJames), so I'd like to do a small favor. Since I only need to modify a couple lines of code to start tracking Signpost traffic moving forward, I'll go ahead and put that in place sometime this weekend. Loop back to me when you think this has amassed to something meaningful that you'd like results from. Thanks, West.andrew.g (talk) 19:52, 15 January 2015 (UTC)
That's fantastic, thanks! I am working on a feature for our tenth anniversary, so I am interested in historical data, but certainly data on current articles will have all kinds of uses to us in the future, assessing the popularity of regular sections and so forth. Gamaliel (talk) 19:54, 15 January 2015 (UTC)
Andrew, that's awesome! Thank you very much. Ed [talk] [majestic titan] 00:45, 17 January 2015 (UTC)

Are you planning to do an annual tot-up?

I should be getting on with that. Serendipodous 17:43, 3 January 2015 (UTC)

How do we handle the introduction of mobile views, which we only have from mid-September(?) onward? West.andrew.g (talk) 17:51, 3 January 2015 (UTC)
Maybe keep them for next year, but do a quick mobile check for some of the more problematic entries to see if they're spam? Serendipodous 17:58, 3 January 2015 (UTC)
@Serendipodous: Currently I'd prefer to hold until [5] reaches its next refresh. The "all-titles-in-ns0" table is critical in keeping the year end tabulation efficient, and the current dump would exclude new titles from the last 3 weeks of the year. It looks like that should come in the next few days. Else, someone may be able to generate this file on the fly for us (the file isn't that large), if they have "labs" access, but I'll let you track those folks down if you wish. Thanks, West.andrew.g (talk) 12:56, 4 January 2015 (UTC)
No one is breathing down my neck this time. So yeah I can wait a few days. Serendipodous 12:58, 4 January 2015 (UTC)
I really appreciate your work on the weekly lists. They have been extremely helpful in identifying key articles in the English-language encyclopedia to improve. (I have one good article creation to my credit, and it's routinely in the top 4000, namely IQ classification.) Is the intended project mentioned in this talk page section going to be a top 5000 list for the most recent year? That would indeed be interesting; I came over to this talk page to see if you had something like that in mind. Keep up the good work. See you on the wiki. -- WeijiBaikeBianji (talk, how I edit) 21:59, 9 January 2015 (UTC)
It was done for 2013: User:West.andrew.g/2013_popular_pages and tonight I'll start the 2014 list. West.andrew.g (talk) 21:25, 10 January 2015 (UTC)

I managed to circumvent the dump and just get NS0 titles myself using the API. Mid-day yesterday I started the big JOIN process (mapping the weekly tables onto the yearly version; containing only NS0 title keys). This is not lightweight stuff, with more than 11 million NS0 titles and the average weekly table having 40+ million titles (we store hits to red links, but computationally can't afford to include those in the yearly aggregation). Currently about 1 week's data is getting merged per hour. I estimate another day of computation, and from there, the output will be straightforward. Thanks, West.andrew.g (talk) 18:52, 12 January 2015 (UTC)

Er, those tables at the end of the year that contain mobile data are 3x the size of the earlier ones (as they break out desktop, mobile, and WP:Zero views). Incurred a bit of a slowdown when we got to those (moreover, the task doesn't always scale linearly). Only a couple of hours away now, regardless. Thanks, West.andrew.g (talk) 19:18, 15 January 2015 (UTC)
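
The merge described above (folding each weekly table into a yearly table that holds only NS0 title keys, so that red-link hits drop out) can be sketched as follows. This is an illustration only, not the actual code or schema: it uses an in-memory SQLite database, the table and column names are hypothetical, and the view counts are invented. The `UPDATE` with a correlated subquery has the same effect as an inner join on title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: a yearly table keyed by NS0 title, and a reusable weekly table.
cur.execute("CREATE TABLE yearly (title TEXT PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0)")
cur.execute("CREATE TABLE weekly (title TEXT PRIMARY KEY, views INTEGER NOT NULL)")

# Seed the yearly table with existing article (NS0) titles only.
cur.executemany("INSERT INTO yearly (title) VALUES (?)",
                [("Taj_Mahal",), ("Nelson_Mandela",)])

def merge_week(rows):
    """Add one week's counts to the yearly totals; titles missing from yearly are ignored."""
    cur.execute("DELETE FROM weekly")
    cur.executemany("INSERT INTO weekly (title, views) VALUES (?, ?)", rows)
    cur.execute(
        """
        UPDATE yearly
        SET views = views + (SELECT w.views FROM weekly w WHERE w.title = yearly.title)
        WHERE title IN (SELECT title FROM weekly)
        """
    )
    conn.commit()

# Invented sample data; the second title in week 1 stands in for a red link.
merge_week([("Taj_Mahal", 91593), ("New_Jersey_Photographer", 500000)])
merge_week([("Taj_Mahal", 90000), ("Nelson_Mandela", 91966)])

totals = cur.execute("SELECT title, views FROM yearly ORDER BY views DESC").fetchall()
print(totals)  # [('Taj_Mahal', 181593), ('Nelson_Mandela', 91966)]
```

At the scale mentioned above (11+ million NS0 titles against 40+ million weekly rows), the cost is dominated by the per-title lookup, which is why the titles are primary keys in this sketch.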

How are the updates (weekly and yearly) coming along? Serendipodous 19:11, 19 January 2015 (UTC)

I also noted the weekly hadn't run yet, just checking in before the beasts start asking me for the new Top25/Traffic Report.--Milowenthasspoken 17:47, 20 January 2015 (UTC)
 Done + ClockC -- The year-end report is completed and posted to User:West.andrew.g/2014 Popular pages (news I will be broadcasting in a new section, and across many pages). I stopped the nightly ingestion while that behemoth of a database operation was completed. I am now batching through the missed days (currently downloading the Jan-15 data; a report will trigger once Jan-17 is finished), so at ~2 hours computation per day, the new "top 5000" is imminent. West.andrew.g (talk) 14:30, 21 January 2015 (UTC)

The 10,000 most popular Wikipedia articles of 2014

See The 10,000 most popular Wikipedia articles of 2014. Thanks, West.andrew.g (talk) 14:49, 21 January 2015 (UTC)

Cookie for you!

Bananasoldier (talk) 00:15, 2 February 2015 (UTC)

STiki problem?

Hello Andrew, what's up with STiki today? It's displaying only very unsuspicious-looking edits from days ago. Percentage reverts is much lower than usual. Worth checking?: Noyster (talk), 15:48, 21 January 2015 (UTC)

 Done -- The edit ingestion got lagged, given that I had given processing priority to computing the year-end traffic report. Since that report completed earlier today and cycles have freed up, the STiki process seems to have caught back up, and I am seeing very recent edits atop the queues. I apologize for any inconvenience and/or drop in accuracy. Thanks, West.andrew.g (talk) 23:51, 21 January 2015 (UTC)

STiki is not loading for me since this morning... - Cwobeel (talk) 19:34, 5 February 2015 (UTC)

@Cwobeel: -- The server is up and looks fine from my perspective. Please post to WT:STiki where more interested people are watchers. More details would also be helpful; e.g., were you previously using it at home and now trying from a workplace/school/etc.? How exactly is it "not loading"? Thanks, West.andrew.g (talk) 19:47, 5 February 2015 (UTC)
My bad... sorry to bother you ... I was trying to use an old version. - Cwobeel (talk) 20:07, 5 February 2015 (UTC)

About your (non)participation in the January 2012 SOPA vote

Hi West.andrew.g. I am Piotr Konieczny (User:Piotrus), you may know me as an active content creator (see my userpage), but I am also a professional researcher of Wikipedia. Recently I published a paper (downloadable here) on reasons editors participated in Wikipedia's biggest vote to date (January 2012 WP:SOPA). I am now developing a supplementary paper, which analyzes why many editors did not take part in that vote. Which is where you come in :) You are a highly active Wikipedian (61st!), and you were active back during the January 2012 discussion/voting for the SOPA, yet you did not choose to participate in said vote. I'd appreciate it if you could tell me why that was so. For your convenience, I prepared a short survey at meta, which should not take more than a minute of your time. I would dearly appreciate you taking this minute; not only as a Wikipedia researcher but as a fellow content creator and concerned member of the community (I believe your answers may help us eventually improve our policies and thus, the project's governance). PS. If you choose to reply here (on your userpage), please WP:ECHO me. Thank you! --Piotr Konieczny aka Prokonsul Piotrus| reply here 23:21, 11 February 2015 (UTC)

AAHHHH!

What is with all the school kids today??? Where are their teachers and librarians? This has got to have been the wildest day ever on STiki. I'm trying to change my strategy and suggest they write about someone they are interested in and help contribute to the whole sum of human knowledge.....this is nuts!   Bfpage |leave a message  17:57, 12 March 2015 (UTC)

@Bfpage: If you want to draw attention to a STiki issue, I'd suggest you post to WT:STiki. The talk-page watchers there are more likely to be STiki users, and you may be able to draw a couple into providing patrolling support with the allure of high revert rates. Also, it's a bit unsettling to receive a Mediawiki notification message titled "AAHHHH!" from a username I know is not a vandal, but I was relieved to arrive here and see nothing had burned down. West.andrew.g (talk) 19:46, 12 March 2015 (UTC)

Pageview statistics for medical articles published in JMIR

I'm happy to announce that User:Doc James and I recently had our paper "Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language" published in the Journal of Medical Internet Research (JMIR). The journal is open-access, so I don't need to ramble too much here about the content. It suffices to say that we look at statistics surrounding the content, editor demographics, editing behaviors, and page view statistics of Wikipedia:WikiProject_Medicine and its international equivalents. My largest contribution was mining 2013 traffic statistics to analyze articles, topics (intra-language article clusters), and language editions. The journal buries this fact somewhat, but we also published a data appendix that points to our "raw-aggregate" data and extends the tables presented in the main write-up. Thanks for tolerating this bit of shameless self promotion, and be on the lookout for deeper insights in the future (mobile data, more statistical dorky-ness). West.andrew.g (talk) 16:16, 19 March 2015 (UTC)

Consistently-popular pages

Hello Andrew. I'm contacting you about your work on popular Wikipedia pages. I'm trying to come up with a list of Wikipedia articles that have been consistently popular on a multi-year scale. I know that you're generating the list of popular pages every week, and that you also have the list of popular pages for 2013 and 2014. I was wondering if you'd be able to use the same process to extract the ~1000 most popular pages for the longest time period where you have data. It would be incredibly helpful for a project I'm currently working on :) Let me know if you can help. Many thanks! Guillaume (WMF) (talk) 17:57, 18 March 2015 (UTC)

@Guillaume (WMF): Greetings. The most I can offer is to sum-and-sort the 2013+2014 lists (the full lists sitting in the database, not just the top pages), which I am happy to do, but I suspect doesn't meet your needs. The raw data still exists back to 2007 per [6]. I do some weird ingestion based on the use-case I designed my database schema for, so it can take more than an hour for me to read in a single day's data. Thus (365 hours per year * 8 years) of processing time is just not feasible for me or the single machine I dedicate to WP tasks. I encourage you to contact User:Ironholds or his equivalent WMF account. I think I recall him reporting on some cool pageview work recently on the [wiki-research-l] mailing list. Moreover, I think these files sit internal to WMF infrastructure (whereas I incur network costs) and he has a cluster to chew through them more efficiently. Assuming this is true, the aggregate query should be easy to write; it's just a matter of how long it takes to finish. Thanks, West.andrew.g (talk) 18:30, 18 March 2015 (UTC)
Let me just chime in and say that I appreciate both the weekly lists and the year-total lists as guides to what to fix next on Wikipedia, and a list of perennially popular pages would be helpful for that purpose too. Keep up the good work. -- WeijiBaikeBianji (talk, how I edit) 18:28, 18 March 2015 (UTC)
Thanks, Andrew. Thank you for your response. A sum-and-sort of 2013+2014 would be a great start if it's not too much trouble, since it would level the peaks a bit. And I'll see if our research wizards can come up with a better list :) Guillaume (WMF) (talk) 19:23, 18 March 2015 (UTC)
@Guillaume (WMF): Can you email me privately so I can just dump you a text file? [7] at the bottom has my email. West.andrew.g (talk) 19:25, 18 March 2015 (UTC)
Done! Did you receive it? Guillaume (WMF) (talk) 15:23, 20 March 2015 (UTC)
 Done -- Received and replied with the statistical CSV file. Also uploaded to PasteBin if anyone else is interested. Thanks, West.andrew.g (talk) 04:57, 21 March 2015 (UTC)
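
For readers curious what a "sum-and-sort" over two yearly lists amounts to, here is a minimal Python sketch. It is not the query that produced the CSV file mentioned above (that ran against the database); the function name `sum_and_sort`, the placeholder titles, and the counts are all invented for illustration.

```python
from collections import Counter

def sum_and_sort(*yearly_counts, top_n=None):
    """Sum per-title view counts across years; return (title, total) pairs, highest first."""
    combined = Counter()
    for counts in yearly_counts:
        combined.update(counts)  # Counter.update adds counts rather than replacing them
    return combined.most_common(top_n)

# Invented sample data with placeholder titles.
views_2013 = {"Article_A": 120, "Article_B": 300, "Article_C": 50}
views_2014 = {"Article_A": 400, "Article_B": 100}

ranking = sum_and_sort(views_2013, views_2014)
print(ranking)  # [('Article_A', 520), ('Article_B', 400), ('Article_C', 50)]
```

Summing over two years smooths single-event spikes somewhat, but a plain sum does nothing to filter automated traffic, so heavily inflated titles still rank highly.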

Wow. The trash sure floated to the top, didn't it? Nine slots down until an actual human view. Serendipodous 05:31, 21 March 2015 (UTC)

I won't lie, I've considered writing my own "traffic spambot" for purely scientific purposes (hello, WP:BEANS). I'm talking about single-threaded dead simple code that just hits a single obscure article in an infinite loop from a commodity machine with a commodity connection. How high will that article get? How do the numbers compare with the other "spammers"? The goal here would be understanding "spammer" infrastructure and characteristics; an attempt to put the numbers in perspective. Honestly, though, I really don't want to stomach the approvals/permissions/backlash this could bring. West.andrew.g (talk) 05:43, 21 March 2015 (UTC)

Popular pages report late?

Popular pages report seems a bit late this week. Is it coming out today? Serendipodous 11:20, 3 May 2015 (UTC)

Now published. This was related to server downtime as described at WT:STiki. Distinct from that, the WMF is a couple hours behind its usual pace in publishing the raw files. Thanks, West.andrew.g (talk) 15:01, 5 May 2015 (UTC)
Andrew, I was wondering if there were any archives of the popular pages list. I get the feeling the answer is "no" and that you will tell me to look in the page history but you have a lot of subpages and I thought I'd ask in case previous weeks were saved anywhere. Liz Read! Talk! 21:45, 7 May 2015 (UTC)
@Liz: Yep, only in the article history. It wouldn't be hard to have my program write to a new article every time, but I guess I'd like a compelling reason for doing so. Thanks, West.andrew.g (talk) 21:47, 7 May 2015 (UTC)
Yes, for the past month or so, I've been working on organizing the Signpost archives since few of the articles had been categorized. The Traffic Report and the Top 25 are archived so I just wondered if the Top5000 was, too. Can you tell me when you started pulling out this data? Liz Read! Talk! 21:53, 7 May 2015 (UTC)
The location has been static, so the creation date of WP:5000 should correspond to the first report; looks like October 2012. West.andrew.g (talk) 15:08, 11 May 2015 (UTC)

In future, if you are going to delay, could you let me know in advance? I need to know whether to reschedule my week. Serendipodous 07:33, 24 May 2015 (UTC)

@Serendipodous: I don't ever "delay" intentionally. Occasionally the server has unexpected downtime and that delays the report. WT:STiki usually gets first notification of such issues because it also breaks that service and folks are usually quick to post/investigate/query about the issue on the talk page. This week the issue seems to be on the WMF side. See Wikipedia:Village_pump_(technical)#Server_rejecting_large_edit_via_browser_and_API. I actually have a TXT copy of the report sitting on my local machine, I just can't get Wikipedia to accept it as an edit yet. Email me if you want a copy. West.andrew.g (talk) 15:18, 25 May 2015 (UTC)
The wikitext copy has been posted to [8] in case anyone would like to compile the WP:5000 report from it or make their own attempt at uploading it piece-wise. West.andrew.g (talk) 02:20, 27 May 2015 (UTC)
  • Andrew, I saw the new redlinks report is up, but not the popular pages report, so I assume there is still a posting issue. If you can post the wikitext like you did last week, I will use it to create the Top 25. Thanks.--Milowenthasspoken 12:18, 1 June 2015 (UTC)
I assumed that the Village Pump recommendation had fixed the problem, but I guess it didn't. If so, we might start thinking about not posting, because only when it starts to grind up the gears higher up are people going to pay attention. Serendipodous 12:27, 1 June 2015 (UTC)
If you want something to take forever to get done around here, send it to someone who gets paid. For this week, maybe I'll highlight the problem in the Top 25 report and Traffic Report.--Milowenthasspoken 13:59, 1 June 2015 (UTC)
The raw copy has been posted to [9] for this week while I continue troubleshooting efforts. West.andrew.g (talk) 18:44, 1 June 2015 (UTC)

 Done -- Looks like we have a fix. I've got the two missing reports re-generated and posted to WP:5000 and I'll assume things will go as scheduled on Saturday night / Sunday morning. Thanks, West.andrew.g (talk) 15:18, 3 June 2015 (UTC)

Andrew, is the popular page report not updating again? I can do TOP25 if you can post data again like you have done before. Thanks!--Milowenthasspoken 12:42, 16 June 2015 (UTC)
I am guessing you may not be around this week. I can also go back and create this Top25 after the fact I suppose.--Milowenthasspoken 12:31, 18 June 2015 (UTC)
 Fixed -- As I am reporting on widely (including below), the WP:5000 reports have been back processed and posted (just wanted to close this thread for archival purposes). I am not sure doing the Top25 reports in hindsight is the best use of one's time (they are just going to get buried in the archives), but more power to you if you decide to do so. Certainly cutting back on the social commentary might be advisable in favor of just getting the order and statistics on record so we can reference those for historical purposes. Thanks, West.andrew.g (talk) 14:56, 1 July 2015 (UTC)

STiki access

I left a message on Wikipedia_talk:STiki requesting access to use STiki about three days ago, and no one's responded yet. I noticed that that's longer than most of the other requests for access. I apologize if I shouldn't be posting here about this (I saw the notice at the top of the STiki talk page), but I was wondering if you could read and reply to my request. KSFT talk 18:45, 13 June 2015 (UTC)

@KSFT: Greetings, and apologies for my delay in responding. As you may have picked up on over at WT:STiki, your approval request came at a time when STiki was experiencing the worst outage in its history thanks to some WMF/Mediawiki changes that were implemented without warning. I would have quickly approved your request had I not been neck-deep in the broader damage. I think I recall our milestone page showing you did get STiki access via rollback or 1k+ edits, and you were able to make your first classifications (a formal welcome should arrive on your talk-page shortly). If you used STiki anytime in the past two weeks you probably weren't impressed, as edit ingestion problems caused users to experience absurdly low "hit rates" (the percentage of edits a user sees that they revert). Typically these numbers hover in the 20-33% range, so I encourage you to give us another try now that things are running more smoothly (and I am now back to a more attentive level of community involvement). Thank you for your patience and interest. West.andrew.g (talk) 15:05, 1 July 2015 (UTC)
Yeah, I was granted rollback permissions after posting that request, and posted about that on Wikipedia talk:STiki, but I guess I forgot to do the same here. I used STiki for the first time almost exactly two weeks ago, and have mostly switched to Huggle, but I definitely will try it again. By the way, how do you pronounce STiki? Is it like steekee? sticky? es-teekee? KSFTC 15:21, 1 July 2015 (UTC)
An archived post (probably of WT:STiki) had this out at greater length, but I -- like most -- say "sticky". West.andrew.g (talk) 16:40, 1 July 2015 (UTC)

Wikibreak? Back generation of WP:5000 reports

If you've gone on Wikibreak, can you please let us know when you'll be back? Serendipodous 14:05, 27 June 2015 (UTC)

Thanks dude. I'm glad you're OK. I was wondering if you'd had an accident. :-) Serendipodous 19:06, 30 June 2015 (UTC)
BTW; is this last week's data? Serendipodous 19:10, 30 June 2015 (UTC)
@Serendipodous: More explanation to come, but I am marking the back reports with dates (see the edit history). The final missed report should output shortly. Thanks, West.andrew.g (talk) 19:35, 30 June 2015 (UTC)
A more complete explanation of the downtime has been posted to WT:STiki. Thanks, West.andrew.g (talk) 17:38, 1 July 2015 (UTC)

Page view stats

Hi Andrew, I noticed that you populate the list of "most viewed" pages for En WP using the page view data dumps. I was wondering if there is a way to do this completely programmatically; I want to replicate that for Fa WP through a bot, if possible. Please advise here or on my talk page or feel free to email me. Thanks in advance, hujiTALK 17:51, 6 August 2015 (UTC)

@Huji: Hi there. My lists are generated completely programmatically, but it would take some work and a database server to port them to another language. Basically, my workflow is that a 'cron' script hits [10] nightly and gets the 24 hourly files. It parses only English NS0 portions to an SQL database on a machine that I own. On Sunday nights I essentially run a big sort query over the weekly column aggregate to produce the ranking, and format this as a pretty report on-wiki committed via the Mediawiki API. All meaningful code is in Java. If we're still speaking the same language here I am more than happy to provide the source code I use to run this, but I cannot volunteer my SQLDB/server. Thanks, West.andrew.g (talk) 19:07, 6 August 2015 (UTC)
Thanks Andrew. I would love to see the code. I may end up rewriting it as a bash script, and most likely run it on the Labs servers, but having a look at your code can be really influential. You can email me via the wiki. Thanks hujiTALK 18:52, 7 August 2015 (UTC)
 Done. West.andrew.g (talk) 19:08, 7 August 2015 (UTC)
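For readers following this thread, the workflow described above (parse the hourly files, keep one project's article rows, store them in SQL, then run one big sort over the aggregate) might be sketched roughly as follows. This is not the Java code that was shared by email. It assumes each hourly line has four space-separated fields (project, title, views, bytes), uses an in-memory SQLite table in place of a real database server, and approximates the NS0 filter with a partial, illustrative list of namespace prefixes. All function and table names are hypothetical.

```python
import sqlite3

# Partial, illustrative list only; a real NS0 filter would need every namespace.
NON_ARTICLE_PREFIXES = ("Talk:", "User:", "User_talk:", "Wikipedia:",
                        "File:", "Template:", "Category:", "Special:")


def open_store():
    """Create an in-memory table standing in for the dedicated SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hourly_views (title TEXT NOT NULL, views INTEGER NOT NULL)"
    )
    return conn


def ingest_hourly(conn, lines, project="en"):
    """Store the article rows of one hourly file; return how many were kept."""
    rows = []
    for line in lines:
        parts = line.split()
        # Assumed layout: project, page title, view count, bytes transferred.
        if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
            continue
        if parts[1].startswith(NON_ARTICLE_PREFIXES):
            continue
        rows.append((parts[1], int(parts[2])))
    conn.executemany("INSERT INTO hourly_views (title, views) VALUES (?, ?)", rows)
    return len(rows)


def weekly_ranking(conn, limit=5000):
    """Sum all stored hourly counts per title and return the top titles."""
    cur = conn.execute(
        "SELECT title, SUM(views) AS total FROM hourly_views "
        "GROUP BY title ORDER BY total DESC, title ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

For another language edition, the `project` argument would change (for example `"fa"`), and the download, scheduling, and on-wiki formatting steps would still need to be added around this core.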

TOP 5000 isn't loading properly in my computer anymore

It loads the first image, but then crashes. I've done a check of my computer and internet connection, and apparently there's nothing wrong on my end. Might you have some idea of what's going on? Serendipodous 08:08, 11 August 2015 (UTC)

Works fine here. Anyone else? Thanks, West.andrew.g (talk) 22:28, 11 August 2015 (UTC)
I saw this week's just fine, as I have for quite a few weeks in a row. The new workflow of deleting the old list just before posting the new list seems to help. -- WeijiBaikeBianji (Watch my talk, How I edit) 22:45, 11 August 2015 (UTC)
It must be my computer then. I have no idea what's wrong. According to my ISP, my connection is fine. Serendipodous 07:02, 13 August 2015 (UTC)
I'm probably not going to get this sorted without paid help, so for the time being, could you create a "Top 50 or so" (possibly before Sunday?) so I can do the Top 25 report? Serendipodous 13:58, 13 August 2015 (UTC)

Maybe the raw wiki mark-up version via API is of some help? Thanks, 20:28, 13 August 2015 (UTC)

Seems searchable, so I guess it will have to do for now. Thank you! :) Serendipodous 20:33, 13 August 2015 (UTC)

Detecting socks working on promotional articles

In case you or someone in the group you left at your university has some time, there is an interesting problem that could possibly be solved with an analysis of recent edits. The background is that a massive group of socks has just been revealed as being involved in a nasty racket to earn money. They would monitor new articles (or drafts) looking for weak articles on companies that will probably be deleted. I'm not sure of the details, but apparently they kept the wikitext, then waited for the article to be deleted. When that happened they would contact the company and offer to create an article and monitor it, for a fee.

Background reading:

Suggestion #11 ("Actively look for mutual patrolling") on the last page is where an analysis would help. Part of the method was that editor A would create the article after a fee had been set, and editor B would mark the page as patrolled. A and B are part of a team of socks who mutually support each other. By marking the page as patrolled, they hope one of the hard-nosed new-page patrollers won't notice, and the weak article might not be nominated for deletion. A nice project for an advanced student (or an advanced PhD!) would be to try to detect groups of mutually supporting editors who do a few inconsequential edits on other pages to disguise their intentions, but who focus on helping each other on new company articles. Interesting? Johnuniq (talk) 10:58, 1 September 2015 (UTC)
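As a starting point for the analysis suggested above, a first pass could simply count who patrols whose new pages and flag pairs where the patrolling goes in both directions. The sketch below assumes the input is already available as hypothetical (creator, patroller) pairs, one per patrolled new page; how to extract those from the logs is not covered. Reciprocal patrols alone would only be a triage signal, not evidence of sockpuppetry.

```python
from collections import Counter


def mutual_patrol_pairs(events, min_each_way=1):
    """Return (a, b, b_patrolled_a, a_patrolled_b) for reciprocal pairs.

    `events` is an iterable of (creator, patroller) tuples. A pair is
    reported once, with a < b, when each editor has patrolled at least
    `min_each_way` pages created by the other.
    """
    counts = Counter(
        (creator, patroller)
        for creator, patroller in events
        if creator != patroller  # ignore self-patrols
    )
    pairs = []
    for (a, b), b_patrolled_a in counts.items():
        a_patrolled_b = counts.get((b, a), 0)
        if a < b and b_patrolled_a >= min_each_way and a_patrolled_b >= min_each_way:
            pairs.append((a, b, b_patrolled_a, a_patrolled_b))
    return sorted(pairs)
```

Raising `min_each_way`, or restricting `events` to new company articles, would narrow the output toward the pattern described in the thread.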

STiki Down

Hi Andrew, is it just my computer or is stiki down at the moment? Regards, Telfordbuck (talk) 17:59, 3 September 2015 (UTC)

STiki is down, and I've contacted my colleague. Unfortunately, he reports that physical security has changed and he no longer has access; troubleshooting that now. Thanks, West.andrew.g (talk) 18:23, 3 September 2015 (UTC)
 Done -- Everything should be back online and queues are re-populating. Thanks, West.andrew.g (talk) 18:24, 4 September 2015 (UTC)

5000 update?

Everything cool? Serendipodous 13:43, 16 November 2015 (UTC)

My server and its operation are fine. The reports keep failing because the Wikimedia API keeps kicking back errors. My code is aggregating the stored statistics just fine, but once it has the list it starts hitting the API to ask about existence (for the red links report) and category memberships for quality assessments. At some point during this, the API is failing to respond in a timely manner, even after a couple of retries. This causes my entire script to fail. Normally I am able to ask millions of queries consecutively without any problem, so this high failure rate suggests something is up. I'm investigating, but I don't think it's on my end. Thanks, West.andrew.g (talk) 17:32, 16 November 2015 (UTC)
You'll observe the "5000" report did produce. I consider this largely to be luck at this point, as the other two continue to fail as a result of timed-out API queries. I've tried to get some perspective at: Wikipedia:Village_pump_(technical)#Large_amount_of_HTTP_504_responses_on_API. Thanks, West.andrew.g (talk) 18:33, 16 November 2015 (UTC)
 Done -- All three generated after 20+ attempts. Thanks, West.andrew.g (talk) 19:19, 16 November 2015 (UTC)
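The failure mode in this thread (a single timed-out query aborting the whole run, even after a couple of retries) is commonly softened with retries that wait longer after each failure. The sketch below is a generic illustration, not the script's actual retry logic. `request` is a hypothetical zero-argument callable that performs one API query and raises `TimeoutError` when the server does not answer in time.

```python
import time


def call_with_retries(request, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds or `attempts` tries are used up.

    Waits base_delay, then 2x, 4x, ... between tries (exponential backoff).
    Re-raises the last TimeoutError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except TimeoutError as err:
            last_error = err
            if attempt < attempts - 1:  # no point sleeping after the final try
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Wrapping each existence or category query this way keeps one slow response from discarding the work already done, though it cannot help if the API stays unresponsive for longer than the total backoff window.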

STiki

Is STiki available for Urdu wikipedia? I would like to use it on Urdu wiki. Muhammad Shuaib (talk) 08:35, 2 December 2015 (UTC)

STiki is currently only available for English Wikipedia. Thanks, West.andrew.g (talk) 04:39, 7 December 2015 (UTC)

My STiki usage

Hello there! I have to say - I'm surprised that anyone actually noticed me using it. But to clarify - there's nothing wrong with your permission checks. I simply downloaded the source code from the git repo, modified it because there were some bugs stopping the buttons from rendering properly (at least, for me), bypassed the permissions checks, got the DB/IRC details from the public release and ran it! I've done a fair bit of vandalism-fighting before - but I'd never really bothered to register an account here and it felt like a brand new user requesting special permissions would have been met with heavy skepticism. However, had I known that it would have caused you guys trouble worrying about bugs, I would have perhaps taken a different approach! Sorry about that. Spolglans (talk) 09:06, 7 December 2015 (UTC)

@Spolglans: Very interesting. I've known it was possible for someone to do this, but always imagined it would be a "bad guy", and in that case why pull some minor hacks around my anti-vandalism tool when they clearly have the capability to do something much worse directly to Wikipedia via the API? That said, I do have some monitoring scripts (outside of the GitHub source) that I use to monitor the ratios of stored procedure calls, etc. in case someone really wanted to attack.
This discussion may continue at WT:STiki, but I think I am inclined to let you continue your use without making a big deal of this. You clearly know how to code, and if you make any significant improvements, I could always pull them back into the repo. Yes, the buttons render weirdly on Mac specifically (but look perfect on Windows; so much for cross-platform). I rarely have the bandwidth for anything but the most egregious bug fixes any more, so any and all help is appreciated. Thanks, West.andrew.g (talk) 17:47, 7 December 2015 (UTC)

deadlinks

re: User:West.andrew.g/Dead links/Archive 1014 I run into it when fixing sixtiescity.com deadlinks. There are dozens of them. Do you happen to know any tool to automate the job? (.com->.net && .shtm->.htm) - üser:Altenmann >t 08:21, 28 December 2015 (UTC)

@Altenmann: I haven't produced those reports in ages. That said, the underlying issue is a relevant one. I would see Wikipedia:Link_rot and continue the discussion over there. Thanks, West.andrew.g (talk) 01:44, 30 December 2015 (UTC)
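For the specific rewrite asked about in this thread (.com to .net and .shtm to .htm on sixtiescity links), a small text-substitution helper might look like the sketch below. It is not an existing tool. It only transforms a string of wikitext, so fetching and saving pages would have to be done separately, and the path in the usage example is made up for illustration.

```python
import re

# Matches sixtiescity.com URLs; the lookahead avoids touching longer hostnames.
_SIXTIESCITY = re.compile(
    r"(https?://(?:www\.)?sixtiescity)\.com(?![\w.-])(/[^\s\]|}<]*)?",
    re.IGNORECASE,
)


def fix_sixtiescity_links(wikitext):
    """Rewrite sixtiescity.com links to .net, and .shtm paths to .htm."""
    def repl(match):
        path = match.group(2) or ""
        # \b keeps ".shtml" untouched; only a bare ".shtm" is rewritten.
        path = re.sub(r"\.shtm\b", ".htm", path)
        return f"{match.group(1)}.net{path}"

    return _SIXTIESCITY.sub(repl, wikitext)


# Hypothetical example link, not a real page on the site.
sample = "[http://www.sixtiescity.com/some/page.shtm Some page]"
fixed = fix_sixtiescity_links(sample)
```

Links to other domains are left alone, and running the function twice gives the same result as running it once, which makes it safer to apply across dozens of pages.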

Top 5,000 article

Are you still compiling the top 5,000 Wikipedia articles? Where is it now? Thank you. 2606:6000:610A:9000:8CE4:F907:4926:6E60 (talk) 17:36, 17 January 2016 (UTC)

WP:5000. Thanks, West.andrew.g (talk) 02:35, 18 January 2016 (UTC)
Thanks, but it's just redirected to your talk page? 2606:6000:610A:9000:D5B6:955A:5E57:6BDA (talk) 16:21, 7 February 2016 (UTC)
Correct, the actual content is in my user-talk space. What difference does it make? This is done because the edits are automated. Bot policy states that automated edits to one's user space are exempt from bot approvals and policy. This saves me some bureaucracy. Thanks, West.andrew.g (talk) 15:20, 8 February 2016 (UTC)
The list was not showing up in your talk page anywhere. Now it's available here https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_pages but it wasn't last week. Thank you for compiling it. 2606:6000:610A:9000:7507:B343:17C1:3C46 (talk) 00:56, 9 February 2016 (UTC)

It generates automatically each week in the early hours of Sunday. The script blanks the page and then uploads the new table. This amounts to two really big edits, which often work, as opposed to the really nasty diff calculation of doing it in one edit, which consistently causes the server to time out. Recently, however, the second step of the two-phase process has been too much for the server too, so we end up with a blank page. When I notice this, I manually re-run the script and it tends to commit after one or two tries. Thanks, West.andrew.g (talk) 02:24, 9 February 2016 (UTC)

Okay thanks. Today at 8:50am PST it's just redirecting to your User page with no list to be found anywhere. It never used to do this. It hasn't worked from the Wikipedia Statistics page for over two months now (except once a few weeks ago).
Under "page views" see User:West.andrew.g/Popular pages – current weekly 5,000 most popular articles based on raw data
https://en.wikipedia.org/wiki/Wikipedia:Statistics
Redirects here:
https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_pages
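The blank-then-upload process described in this thread, and the way it can leave an empty page behind, can be illustrated with the sketch below. It is not the actual report script. `save_page` and `fetch_page` are hypothetical callables standing in for whatever API client is used, and the check after each upload attempt is an added assumption meant to catch the "blank page" outcome automatically instead of by hand.

```python
def publish_report(save_page, fetch_page, title, new_text, max_upload_tries=3):
    """Blank `title`, then upload `new_text`; return True once it is live.

    save_page(title, text) writes a page and may raise TimeoutError.
    fetch_page(title) returns the page's current text.
    """
    save_page(title, "")  # phase 1: blank the page so phase 2 has a trivial diff
    for _ in range(max_upload_tries):
        try:
            save_page(title, new_text)  # phase 2: upload the full table
        except TimeoutError:
            pass  # fall through and check whether the edit landed anyway
        if fetch_page(title) == new_text:
            return True
    return False  # page is still blank; the caller should alert or re-run
```

A `False` return corresponds to the situation reported here, where readers find an empty page until the script is re-run.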

Shout out

Hey Andrew, I gave your 2013 Whitney Houston piece a shoutout in the blog today. Just thought you'd want to know! Best, Ed Erhart (WMF) (talk) 08:24, 10 February 2016 (UTC)

@Ed Erhart (WMF): Thanks! Very interesting to see the fine-granularity traffic graphs compiled by The Foundation. Certainly a lot more than I was able to do a couple of years ago and probably a much more efficient process. Keep up the good work. West.andrew.g (talk) 18:41, 10 February 2016 (UTC)

Source code for popular redlinks

Dr. West, do you have the source code for User:West.andrew.g/Popular redlinks? I am told that the Arabic Wikipedia is interested in a similar list. Harej (talk) 15:16, 2 April 2016 (UTC)

@Harej: I do, but I have also recently learned that the statistical files that I am using are about to be deprecated by the WMF in favor of a new type (free of bot and spider traffic) that also might be different in format. I should probably get that cleaned up before sharing with anyone, as things will break soon. I'll also note that there is a good bit of infrastructure helping me generate these lists. I store English WP statistics in a dedicated database on a server I own, and it can take more than an hour nightly to ingest a day's worth of data. Doing this for Arabic would be an order of magnitude (or more) less processing, but my code still assumes there is a database infrastructure. Point being, this isn't a "just run it" type of thing. Someone with computer science skills, however, wouldn't have much of a problem making things work. Thanks, West.andrew.g (talk) 14:21, 4 April 2016 (UTC)
Makes sense. I am fine with setting up the infrastructure provided I am told what the database schema is. What other metrics do you run for Wikipedia? Harej (talk) 14:24, 4 April 2016 (UTC)
@Harej: My user page has a fairly decent summary. Thanks, West.andrew.g (talk) 14:34, 4 April 2016 (UTC)

StiKi not working.

Hi, kindly check out the issue here. KagunduWanna Chat? 09:03, 8 April 2016 (UTC)

 Done -- This issue has been resolved. See the referenced talk page. Thanks, West.andrew.g (talk) 14:52, 25 April 2016 (UTC)

You've got mail!

Hello, West.andrew.g. Please check your email; you've got mail!
Message added 20:04, 22 April 2016 (UTC). It may take a few minutes from the time the email is sent for it to show up in your inbox. You can remove this notice at any time by removing the {{You've got mail}} or {{ygm}} template.

Ed Erhart (WMF) (talk) 20:04, 22 April 2016 (UTC)

 Done -- Replied. Thanks, Ed. West.andrew.g (talk) 14:52, 25 April 2016 (UTC)

Question

I have downloaded Stiki, and when I double-click on the file (or whatever it is) to open it, I am asked what program I want to use to open it. I use a Windows 8.1 computer, with Google Chrome. Peter Sam Fan 18:47, 5 May 2016 (UTC)

@Peter SamFan: Hi Peter. You need to have the Java Runtime Environment (JRE) installed in order to run STiki. If installed correctly, a double-click should launch the program. If you're still confused, posting at WT:STiki will get you a faster response. Thanks, West.andrew.g (talk) 00:18, 7 May 2016 (UTC)

Distressed

I have been an editor on Wikipedia for several years but have become semi-inactive for the past 6 months or so. When I logged into http://stats.grok.se/ today I found that it hasn't worked for the past 6 months. Could you direct me to the traffic tool that you are currently using, please?«Marylandstater» «reply» 14:54, 10 June 2016 (UTC)

@Marylandstater: I believe [11] is the new standard. Thanks, West.andrew.g (talk) 17:56, 10 June 2016 (UTC)

User:West.andrew.g/Popular redlinks for pl.wiki

Hello. Did you consider generating User:West.andrew.g/Popular redlinks or similar pages for other wikis? @Magalia: and I are both interested in such an initiative for pl.wiki. Is there a technical possibility to do this for Polish Wiki? PMG (talk) 13:38, 3 August 2016 (UTC)

@PMG: @Magalia: Unfortunately, I do only English Wikipedia at this time due to space considerations on my personal machines. I generate this from the dumps at [12], and unfortunately I am using one of the deprecated data formats to do it. I don't know if the newer datasets have the capability to calculate redlinks (it's a tremendously space-consuming long tail in addition to actual article traffic), and I don't know when/if my data will be turned off. That said, I am willing to share my source code (Java/SQL) if there were someone wanting to do this for other projects. Thanks, West.andrew.g (talk) 15:23, 5 August 2016 (UTC)

Update this week?

Is the update coming today? Serendipodous 14:30, 9 August 2016 (UTC)

  • haha, I just went to get started on the new Top 25 and had the same question.--Milowenthasspoken 14:31, 9 August 2016 (UTC)
We switched, Milo, remember? :-) Serendipodous 14:37, 9 August 2016 (UTC)

@Serendipodous: @Milowent: -- It appears the WP:5000 has ended its run in its current form. It looks like the WMF has made good on their claim to deprecate the data feed we use: [13], with no data after 12:00 UTC on AUG-05. It may take me non-trivial time to understand and integrate the new statistical dump format. I suggest you find an alternative data source in the meantime. West.andrew.g (talk) 14:46, 9 August 2016 (UTC)

  • Thanks, Andrew. Well, that sucks, but I think we can make TopViews work in the meantime.--Milowenthasspoken 15:08, 9 August 2016 (UTC)
(Just as an FYI for the archives) I have switched to an alternative data source and the WP:5000 is once again in operation. Thanks, West.andrew.g (talk) 13:47, 25 August 2016 (UTC)

User:West.andrew.g/Popular redlinks

Hello there. I was curious when (and if) User:West.andrew.g/Popular redlinks might make a return. Thanks again for all you do. Best wishes. Biosthmors (talk) pls notify me (i.e. {{U}}) while signing a reply, thx 00:40, 12 September 2016 (UTC)

@Biosthmors: There is probably some more detail in the talk archives of WT:5000 or WT:Top25Report, but summarily, the red links report will not and cannot resume. The raw statistics the WMF releases are now limited only to traffic on existing articles. This: (a) saves space on statistical dumps; given the tremendously long tail of non-existent pages that would get just one visit per period, and (b) provides some privacy protections; all types of random junk showed up due to folks (presumably) typing things in the URL bar after an article when they thought their cursor was elsewhere. When they'd hit enter, this is logged as an article request. I am sure the WMF stats folks could provide some more color, but the end result is that we can no longer produce the redlinks report. Thanks, West.andrew.g (talk) 04:05, 12 September 2016 (UTC)
OK thank you. Biosthmors (talk) pls notify me (i.e. {{U}}) while signing a reply, thx 11:57, 12 September 2016 (UTC)

Extended confirmed protection

Hello, West.andrew.g. This message is intended to notify administrators of important changes to the protection policy.

Extended confirmed protection (also known as "30/500 protection") is a new level of page protection that only allows edits from accounts at least 30 days old and with 500 edits. The automatically assigned "extended confirmed" user right was created for this purpose. The protection level was created following this community discussion with the primary intention of enforcing various arbitration remedies that prohibited editors under the "30 days/500 edits" threshold to edit certain topic areas.

In July and August 2016, a request for comment established consensus for community use of the new protection level. Administrators are authorized to apply extended confirmed protection to combat any form of disruption (e.g. vandalism, sock puppetry, edit warring, etc.) on any topic, subject to the following conditions:

  • Extended confirmed protection may only be used in cases where semi-protection has proven ineffective. It should not be used as a first resort.
  • A bot will post a notification at Wikipedia:Administrators' noticeboard of each use. MusikBot currently does this by updating a report, which is transcluded onto the noticeboard.

Please review the protection policy carefully before using this new level of protection on pages. Thank you.
This message was sent to the administrators' mass message list. To opt-out of future messages, please remove yourself from the list. 17:49, 23 September 2016 (UTC)

I have defined all the link targets from this redirect to Jim Rosenhaus, there is only one more page linked to it User:West.andrew.g/Dead links/Archive 183, else you can delete the redirect if appropriate. - Mlpearc (open channel) 03:46, 25 October 2016 (UTC)

@Mlpearc: That link is from a very old historical report that no human would ever be interested in moving forward. Do with it as you wish. West.andrew.g (talk) 06:12, 25 October 2016 (UTC)
LOL, thanx. - Mlpearc (open channel) 14:00, 25 October 2016 (UTC)

Two-Factor Authentication now available for admins

Hello,

Please note that TOTP based two-factor authentication is now available for all administrators. In light of the recent compromised accounts, you are encouraged to add this additional layer of security to your account. It may be enabled on your preferences page in the "User profile" tab under the "Basic information" section. For basic instructions on how to enable two-factor authentication, please see the developing help page for additional information. Important: Be sure to record the two-factor authentication key and the single use keys. If you lose your two factor authentication and do not have the keys, it's possible that your account will not be recoverable. Furthermore, you are encouraged to utilize a unique password and two-factor authentication for the email account associated with your Wikimedia account. This measure will assist in safeguarding your account from malicious password resets. Comments, questions, and concerns may be directed to the thread on the administrators' noticeboard. MediaWiki message delivery (talk) 20:34, 12 November 2016 (UTC)

A new user right for New Page Patrollers

Hi West.andrew.g.

A new user group, New Page Reviewer, has been created in a move to greatly improve the standard of new page patrolling. The user right can be granted by any admin at PERM. It is highly recommended that admins look beyond the simple numerical threshold and satisfy themselves that the candidates have the required skills of communication and an advanced knowledge of notability and deletion. Admins are automatically included in this user right.

It is anticipated that this user right will significantly reduce the work load of admins who patrol the performance of the patrollers. However, due to the complexity of the rollout, some rights may have been accorded that may later need to be withdrawn, so some help will still be needed to some extent when discovering wrongly applied deletion tags or inappropriate pages that escape the attention of less experienced reviewers, and above all, hasty and bitey tagging for maintenance. User warnings are available here but very often a friendly custom message works best.

If you have any questions about this user right, don't hesitate to join us at WT:NPR. (Sent to all admins). MediaWiki message delivery (talk) 13:48, 15 November 2016 (UTC)

ArbCom Elections 2016: Voting now open!

Hello, West.andrew.g. Voting in the 2016 Arbitration Committee elections is open from Monday, 00:00, 21 November through Sunday, 23:59, 4 December to all unblocked users who have registered an account before Wednesday, 00:00, 28 October 2016 and have made at least 150 mainspace edits before Sunday, 00:00, 1 November 2016.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2016 election, please review the candidates' statements and submit your choices on the voting page. MediaWiki message delivery (talk) 22:08, 21 November 2016 (UTC)

Another one crosses 250,000

Hey Andrew, we have another: WP:STIKI/M. Ugog Nizdast (talk) 16:02, 29 November 2016 (UTC)

 Done -- @Ugog Nizdast: I can't remember what I did in previous iterations, but something spontaneous and personal was probably my initial goal in not creating "super barnstars". West.andrew.g (talk) 19:38, 29 November 2016 (UTC)