User:GoldenRing/MoveStats

From Wikipedia, the free encyclopedia

This page is for feedback, ideas and discussion related to the MoveStats tool. Please treat it as you would any other talk page.

MoveStats

Ideas for future development

Ideas are welcome!

  • Thanks for this, GoldenRing. A couple of things you could consider:
    • Track moves by namespace (template, category, article, etc.)
    • Perhaps, instead of or in addition to the most recently registered users, a time frame of a number of days?
    • Maybe also a filter for only those who've done moves in multiple namespaces (e.g. select only those who have moved both templates and articles, and so on)
  • cheers. —SpacemanSpiff 14:24, 31 July 2015 (UTC)
    • @SpacemanSpiff: I'll look into those things. There are a couple of things that already look odd about the data (try increasing the user limit to 1000!), and I'll try to figure out what's going on there first. GoldenRing (talk) 15:10, 31 July 2015 (UTC)
      • I'll have to take a look later, as my internet connection has become so slow that even simple pages are taking ages to download. Maybe The Blade of the Northern Lights can provide some good feedback, as he's more familiar with this issue than I am. —SpacemanSpiff 16:52, 31 July 2015 (UTC)
        • @SpacemanSpiff: Regarding the idea of "a number of days time frame," how would you see that working exactly? Should it plot every user who has done any moves within the last n days? And should it show how many moves they've done in total, or how many they've done in the last n days? I'm not sure how many of these ideas will be practical - query performance is already annoyingly slow. GoldenRing (talk) 13:45, 3 August 2015 (UTC)
          • I'm not entirely sure, GoldenRing. I was thinking along the lines of a filter on (a) first edit in the past 30/60/90 days or so, (b) type of moves (all that apply: template, category, WP, article), and (c) number of moves (show only those greater than 20/50, etc.). I don't know anything about how the tools work, so if anything I say doesn't make sense technically, feel free to ignore it. cheers. —SpacemanSpiff 14:03, 3 August 2015 (UTC)
            • @SpacemanSpiff: I've implemented a couple of these ideas - you can now filter to only users with moves in certain namespaces, and also find only users who have moves in more than one namespace. As for a time limit, the most obvious thing to do is to limit it to users who registered accounts in the last n days. Unfortunately, the way that the registration timestamp is stored in the database makes querying on it rather slow (I'm still waiting for such a query to finish, so I don't know how long it takes). GoldenRing (talk) 16:51, 4 August 2015 (UTC)
              • @GoldenRing: I can see that the Timur sock is the top hit from recent times, and that's good, as it's highlighting what we want. One thing I might add: the namespaces are currently an "OR" choice; if there's an option to make them an "AND" choice, it'd be helpful. I've tried to change the number of users from 100 to 1000 (to include Eldizzino in the graph), but it's taking a loooooong time; I'll wait for it to finish or time out and post later.—SpacemanSpiff 17:09, 4 August 2015 (UTC)
                • @SpacemanSpiff: I've been making quite a few changes to how the query works this afternoon and there were various times when it just never returned. Try again, noting what I've written below about how the user limit is applied. GoldenRing (talk) 17:32, 4 August 2015 (UTC)
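The registration-window and minimum-move filters discussed above could be sketched roughly as follows. This is a hypothetical illustration, not the MoveStats code: it assumes each user arrives as a `(name, registration, move_count)` tuple, where `registration` is a MediaWiki-style 14-digit UTC timestamp (`YYYYMMDDHHMMSS`) that may be missing for some very old accounts. The function names and sample users are made up.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

# MediaWiki stores timestamps as 14 digits, e.g. "20150720213532"
MW_TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def days_since_registration(registration: Optional[str], now: datetime) -> Optional[int]:
    """Whole days between a MediaWiki timestamp and `now`, or None if unknown."""
    if not registration:  # assumption: very old accounts may have no registration time
        return None
    registered = datetime.strptime(registration, MW_TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return (now - registered).days


def recent_movers(
    users: Iterable[Tuple[str, Optional[str], int]],
    max_days: int,
    min_moves: int,
    now: datetime,
) -> List[Tuple[str, int, int]]:
    """Hypothetical filter: registered within `max_days` days with at least `min_moves` moves."""
    result = []
    for name, registration, moves in users:
        age = days_since_registration(registration, now)
        if age is not None and age <= max_days and moves >= min_moves:
            result.append((name, age, moves))
    return result


if __name__ == "__main__":
    now = datetime(2015, 8, 3, tzinfo=timezone.utc)
    sample = [  # made-up users for illustration only
        ("ExampleNewUser", "20150720000000", 30),
        ("ExampleOldUser", "20121001000000", 5),
        ("ExampleNoTimestamp", None, 40),
    ]
    print(recent_movers(sample, max_days=30, min_moves=20, now=now))
    # prints [('ExampleNewUser', 14, 30)]
```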

@GoldenRing: I just tried with (a) default 100000 users, Mainspace-Template-Category; (b) default 100000 users, only users with moves in multiple namespaces; and then repeated these two scenarios with 100001 users. For all four scenarios I got only 13 days' worth of users, so I'm guessing something might have gone a bit amiss here? cheers. —SpacemanSpiff 17:39, 4 August 2015 (UTC)

  • @SpacemanSpiff: That appears to be correct. As we go to press, the most-recently-registered user ID is 25930752. User 25830752, 100,000 user IDs earlier, registered at 21:35:32 on 22 July 2015. GoldenRing (talk) 17:45, 4 August 2015 (UTC)
    • @GoldenRing: That is a bit weird, because when the default was 100 I could see more users (this was just before my earlier post). If that's the case, what's the maximum user number we should cap at before we cause problems at the backend? I ask because it's unlikely that people will check this on a daily or even a weekly basis once the array of socks goes into hibernation. *Unrelated musing: if we get that many registered users in less than two weeks, where are they disappearing?* —SpacemanSpiff 17:53, 4 August 2015 (UTC)
      • @SpacemanSpiff: Yes, as noted below, it used to take the 100 most-recently-registered users who had done page moves; now it looks for users who have done page moves and who are in the 100 most-recently-registered users. This is marginally less useful, but somewhat quicker to load. So far I've tried a limit of 10,000,000, searching about 40% of registered users. It isn't quick, but it doesn't cause problems.
        "If we get that many registered users in less than two weeks, where are they disappearing?" Just so. As noted below, 1,000,000 users have registered in the last 100 days. GoldenRing (talk) 18:14, 4 August 2015 (UTC)
  • Some ideas:
    • A bar graph would provide clearer information here (especially if the actual number could be shown at the top of each bar). Failing that, lines between the data points would definitely be an improvement.
    • It'd be much better if the y-axis used plain numbers rather than scientific notation.
  • Now a general question – exactly what data is this trying to convey? I'm confused by what the x-axis ("Days since registration") is trying to get across... --IJBall (contribs • talk) 20:43, 31 July 2015 (UTC)
    The sockmaster whom this is designed to ensnare, Tobias Conradi, has an MO of getting autoconfirmed and then moving pages all over the place without any discussion or discernible reason. The original tool did something similar; IIRC, it highlighted relatively new accounts doing huge numbers of page moves in red. Once you see an account that's going ballistic with page moves, if you're familiar with Conradi it's immediately obvious whether or not it's him (see the move logs for User:TigreTiger, User:Bogdan Nagachop, and User:Eldizzino for demonstrative but in no way exhaustive examples). His idée fixe tends to change, so each time he goes somewhere new it takes a while for the people who frequent the topic to notice something is up, but the carelessness with which he moves articles around is unmistakable if you've dealt with him. The Blade of the Northern Lights (話して下さい) 03:34, 1 August 2015 (UTC)
Is there any threshold we should consider on the graph, or should we just look for young accounts with multiple moves? (All accounts seem rather old at the moment.) EvergreenFir (talk) Please {{re}} 03:55, 1 August 2015 (UTC)
@EvergreenFir: Others have far more experience looking into these things than I do, but my initial reaction is that any sort of threshold would only serve to give the editors we're looking for a guideline on how to stay under the radar. What we're looking for here are outliers. It may be possible to construct some sort of statistical test, I guess - list anyone who's more than X standard deviations from the mean of some metric that takes both axes into account, or some such. GoldenRing (talk) 08:01, 3 August 2015 (UTC)
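One way to make the standard-deviation idea concrete is sketched below. It is an assumption-laden illustration, not something MoveStats does: the metric is page moves per day of account age (which combines both axes of the chart), accounts younger than a day are counted as one day old, and the function name and default threshold are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def flag_outliers(users: Sequence[Tuple[str, int, int]], threshold: float = 3.0) -> List[str]:
    """Names whose moves-per-day rate is more than `threshold` population
    standard deviations above the mean rate.

    Each user is a (name, days_since_registration, move_count) tuple.
    """
    if not users:
        return []
    # Moves per day of account age; treat brand-new accounts as one day old
    rates = [(name, moves / max(days, 1)) for name, days, moves in users]
    values = [rate for _, rate in rates]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # every account has the same rate, so nothing stands out
        return []
    return [name for name, rate in rates if (rate - mu) / sigma > threshold]
```

With n accounts, a z-score computed against the population standard deviation can never exceed sqrt(n - 1), so a threshold of 3 can only flag anyone once there are more than ten accounts; on a chart of hundreds of movers that is not a practical constraint.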
@GoldenRing: Something's off with the "days since registration"... TimurKirov registered 20 July 2015 (15 days ago), but the chart says 949 days. I refreshed and see that's been fixed. EvergreenFir (talk) Please {{re}} 16:35, 3 August 2015 (UTC)
Yes, this was broken but has been resolved. GoldenRing (talk) 08:55, 4 August 2015 (UTC)
  • @IJBall: "Days since registration" means pretty much what it says on the tin - days since the editor registered the account. As The Blade of the Northern Lights has explained, this is aimed at finding editors with particular patterns of editing, so a bar chart (I assume you mean some sort of binning?) would not be very helpful. Also, the points are not related in any sort of series, so lines between them wouldn't mean much. The point is to look for users who have only been registered a few days but have done lots of page moves. Actually, I think there are an alarming number of editors who have done page moves very early in their careers - IMO, anyone who's figured out page moves within a month of starting editing either has an unhealthy obsession with editing or has been here before under a different name. I guess IP editors who register an account must account for some of these, but it's hard to believe they account for all of them. Note that User:TimurKirov, who has made 30 moves in 13 days, stands out somewhat on this chart and has been blocked as a sock of Tobias Conradi. On this basis, OrganicEarth is probably worth at least a cursory look (24 moves in 9 days).
  • I agree that plain decimal values on the y-axis would be more helpful; I'll look into it.
  • @The Blade of the Northern Lights: Please note that I have fixed a bug in how registration times are calculated. It doesn't appear to affect editors registered in the last year or so, but beyond that the "Days since registration" calculation was producing garbage. GoldenRing (talk) 07:56, 3 August 2015 (UTC)
  • All, please also note that I've changed the way the user limit is applied. It used to find the n most-recently-registered editors who had done page moves; it now finds editors who have done page moves among the n most-recently-registered editors. This is somewhat faster than the previous method, though still far from instantaneous. It also means that a much higher user limit is now necessary; a user limit of 1,000,000 returns only 845 editors who have done page moves. I was surprised to find that those 1,000,000 users registered within the past 100 days. GoldenRing (talk) 17:30, 4 August 2015 (UTC)
  • @SpacemanSpiff: Per your suggestion above, it is now possible to combine namespaces in either 'and' or 'or' fashion - 'and' means return only users who have done page moves in all of the selected namespaces, while 'or' means users who have performed moves in any of the selected namespaces. GoldenRing (talk) 16:41, 6 August 2015 (UTC)
  • All - I've also added colour coding to the graph, indicating the editor's edit count. GoldenRing (talk) 17:23, 6 August 2015 (UTC)
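The change to how the user limit is applied, as GoldenRing describes it in the 17:30, 4 August comment, can be illustrated with a small sketch. It is not the tool's actual query: it assumes that user IDs increase with registration order (as the user-ID arithmetic earlier in the thread implies) and that move counts are available as a dictionary keyed by user ID; the function names and IDs are hypothetical.

```python
from typing import Dict, List


def old_limit(move_counts: Dict[int, int], n: int) -> List[int]:
    """Old behaviour: the n most recently registered users who have done page moves.

    Assumes higher user ID = more recent registration.
    """
    movers = sorted((uid for uid, moves in move_counts.items() if moves > 0), reverse=True)
    return movers[:n]


def new_limit(move_counts: Dict[int, int], n: int, max_user_id: int) -> List[int]:
    """New behaviour: users with page moves among the n most recently registered users.

    The window is the ID range (max_user_id - n, max_user_id], however few movers it holds.
    """
    cutoff = max_user_id - n
    return sorted(
        (uid for uid, moves in move_counts.items() if moves > 0 and uid > cutoff),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up user IDs -> number of page moves; the newest account is ID 100
    move_counts = {100: 3, 95: 0, 90: 7, 50: 2, 10: 12}
    print(old_limit(move_counts, 3))        # [100, 90, 50]
    print(new_limit(move_counts, 3, 100))   # [100]
    print(new_limit(move_counts, 60, 100))  # [100, 90, 50]
```

The same limit returns far fewer movers under the new rule, which is why a much larger limit is needed. One plausible reason the new form is cheaper is that it only needs an ID range rather than a ranking of all movers; that is a guess, not something stated in the thread.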
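The 'and'/'or' namespace combination GoldenRing describes on 6 August amounts to a subset test versus an intersection test on sets of namespaces. A minimal sketch, assuming each user's moves have already been reduced to the set of namespace numbers they touched (in MediaWiki, 0 is the main/article namespace, 4 is Wikipedia, 10 is Template and 14 is Category); the function names and users are hypothetical.

```python
from typing import Dict, Iterable, List, Set

MAIN, PROJECT, TEMPLATE, CATEGORY = 0, 4, 10, 14  # MediaWiki namespace numbers


def filter_by_namespaces(
    moves_by_user: Dict[str, Set[int]], selected: Iterable[int], mode: str = "or"
) -> List[str]:
    """Users whose page moves match the selected namespaces.

    mode="and": moves in every selected namespace (subset test).
    mode="or":  moves in at least one selected namespace (intersection test).
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    wanted = set(selected)
    result = []
    for user, namespaces in moves_by_user.items():
        if mode == "and" and wanted <= namespaces:
            result.append(user)
        elif mode == "or" and wanted & namespaces:
            result.append(user)
    return sorted(result)


def multi_namespace_users(moves_by_user: Dict[str, Set[int]]) -> List[str]:
    """Users who have moved pages in more than one namespace."""
    return sorted(user for user, namespaces in moves_by_user.items() if len(namespaces) > 1)


if __name__ == "__main__":
    moves = {  # made-up users
        "ExampleA": {MAIN, TEMPLATE},
        "ExampleB": {MAIN},
        "ExampleC": {CATEGORY},
    }
    print(filter_by_namespaces(moves, [MAIN, TEMPLATE], "and"))  # ['ExampleA']
    print(filter_by_namespaces(moves, [MAIN, TEMPLATE], "or"))   # ['ExampleA', 'ExampleB']
    print(multi_namespace_users(moves))                          # ['ExampleA']
```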