User:Wugapodes/GAStats

From Wikipedia, the free encyclopedia

After learning to build a bot, reading about the new GA Cup, and wanting to keep my stats skills from getting rusty, I decided to look at the GA backlog and see whether there are any trends. The following are some data I gathered on the GA nomination backlog, with some analysis that tries to quantify why and how the backlog changes, particularly with regard to the GA Cup. The main takeaways are:

  1. the number of open reviews seems to decline over time, which could mean fewer reviewers, faster turnaround, or some combination
  2. the GA Cups are an effective means of reducing the backlog
  3. the GA Cup has diminishing returns: while more investigation is needed, the cups do not seem to encourage people to keep reviewing after the cup ends or after they are knocked out (or at least not at the same rate)

This isn't a finished work, as I plan to update it every so often when I get bored enough to do statistics for fun. Hopefully it can help inform discussions on ways to reduce the lengthy backlog.

The full data can be found on the data page.

Last two years at WP:GAN

Methodology

Each sample was the last revision saved on the 28th day of the month, so that there is a consistent one-month gap between samples. Nominations were counted by matching the regular expression "# \{\{GAN", meaning that only entries listed on the page were counted, rather than members of the category (which is how {{GAN counter}} arrives at its numbers).
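The counting step can be sketched in Python. The wikitext fragment and the {{GANentry}} spelling below are illustrative, not taken from a real revision:

```python
import re

# Each pending nomination appears on the page as a numbered list item whose
# entry starts with a {{GAN...}} template, so counting regex matches counts
# listed nominations rather than category members.
GAN_ENTRY = re.compile(r"# \{\{GAN")

def count_nominations(wikitext: str) -> int:
    """Count pending nominations in one saved revision of WP:GAN."""
    return len(GAN_ENTRY.findall(wikitext))

# Illustrative wikitext fragment (not a real revision):
sample = (
    "=== Agriculture, food and drink ===\n"
    "# {{GANentry|1=Apple pie|2=1}}\n"
    "# {{GANentry|1=Cheddar cheese|2=2}}\n"
)
print(count_nominations(sample))  # → 2
```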

Analysis

There is an obvious correlation between the declines in the number of nominations awaiting review and the GA Cups, which is good evidence of their efficacy in reducing the backlog. Despite this, there is not a significant difference between the number of open reviews while a cup is running and while one is not (t(22.563) = 0.311, p = 0.759). This may be because the GA Cup rounds tended to finish at the end of the month, when samples are taken, or because the number of reviews is the same but the turnaround is faster. There may also simply be too few data points. Upon further analysis with a much larger sample, it turns out that there is indeed a significant difference in the number of open reviews between GA Cup days and non-GA Cup days.
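The fractional degrees of freedom reported above indicate a Welch (unequal-variance) t-test. A minimal stdlib sketch, with made-up monthly open-review counts standing in for the real samples:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances
    se2 = va / na + vb / nb             # squared SE of the mean difference
    t = (mean(a) - mean(b)) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Made-up monthly open-review counts (not the real data):
cup_months = [410, 395, 388, 402, 379]
other_months = [405, 412, 398, 391, 400]
t, df = welch_t(cup_months, other_months)
print(f"t({df:.3f}) = {t:.3f}")
```

The fractional df comes from the Welch–Satterthwaite approximation; when the two samples have equal size and variance it reduces to the usual na + nb − 2.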

There is also a rather sharp increase in the backlog between the two cups. It is unclear whether this increase is due to a limited number of reviewers, an increase in nominations, or some combination of the two. While more investigation is needed, there are some hints in the data. Over the two-year period, there is a non-significant negative correlation (r = -0.319) between the number of open reviews and time (F(1,23) = 2.61, p = 0.1195, n.s.). This may mean that the number of reviewers is shrinking, but the data are limited. An investigation into the efficacy of the GA Cup would need more nuanced data than these broad strokes.

First GA Cup

Methodology

Daily data points were used, each chosen as the last revision saved on that day. The period of interest was the first GA Cup (1 October 2014 to 26 February 2015). The control was the period between the first and second GA Cups (27 February 2015 to 30 June 2015). Nominations were counted in the same way as above.

Analysis

There is a highly significant difference (t(257.94) = -15.026, p < 2.2e-16 ***) in the number of open reviews during the first GA Cup versus the period between cups, showing that the GA Cup is indeed what reduces the backlog. The effect is limited, though: there is a significant (F(1,146) = 54.835, p = 9.636e-12 ***) negative correlation between the number of open reviews and the time since the start of the cup. This shouldn't come as a surprise; the GA Cup knocks out participants as time goes on, leaving fewer competitors, but that also seems to mean fewer reviewers. While a good way to reduce the backlog in the short term, the GA Cups do not seem to encourage continued participation in GA reviewing, which likely contributes to the inevitable backlog increase after a cup.

Nomination and Clearing Rates

Analysis

These data provide a good look at how the GA Cup affects the backlog. Because they are more granular than previous runs, they lend themselves to analysis better than sheer numbers or monthly totals do. Unfortunately, the methodology makes correlation over time rather difficult, so the question raised in previous sections, whether the number of reviews is decreasing over time, remains unresolved.

A t-test was performed on the number of daily nominations during the cup (n = 149, M = 9.4)[note 1] and during the inter-cup period (n = 125, M = 10.6)[note 1]. There was no significant difference between the two (t(246.52) = -1.13, p = 0.26, n.s.). This result should not be over-extrapolated: it could be that the first GA Cup had an effect on nominations that persisted afterwards (since the control period follows the cup rather than preceding it), that there was a positive trend within the cup's nominations that is hidden by pooling the data, or that the difference is simply too small to detect. Regardless, this is a reassuring result, as it shows that any effect of the cup on nominations is likely very small.

A t-test was also performed on the daily number of closed nominations[note 2] during the cup (n = 149, M = 7.03) and during the inter-cup period (n = 125, M = 4.78). There is a highly significant difference between the number of closures during the cup and between the cups (t(270.9) = 5.92, p = 9.32e-9 ***). This is evidence that no one really needed, though: the GA Cup does its job.

I also investigated whether the GA Cup led to significantly more passes or fails, and did not find evidence that it did. The number of passed or failed nominations was divided by the total number of closed reviews for that day, producing a daily rate of acceptance and a daily rate of failure. These were then compared by t-test, omitting non-numerical values (arising from division by zero on days when no reviews were closed). Neither passage nor failure showed any significance: failure t(239.69) = 0.856, p = 0.392, n.s.; passage t(239.69) = -0.856, p = 0.392, n.s.[note 3][note 4]
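The rate computation, including the omission of undefined days, can be sketched as follows (the daily counts are invented). It also shows why the pass and fail tests mirror each other: on every kept day the two rates sum to 1.

```python
def daily_rates(passed, failed):
    """Per-day pass and fail rates among closed reviews.
    Days with no closures are dropped (the rate would be 0/0)."""
    pass_rates, fail_rates = [], []
    for p, f in zip(passed, failed):
        closed = p + f
        if closed == 0:
            continue  # no reviews closed that day: rate undefined
        pass_rates.append(p / closed)
        fail_rates.append(f / closed)
    return pass_rates, fail_rates

# Invented daily counts of passed and failed closures:
passed = [5, 0, 3, 0]
failed = [2, 0, 1, 4]
pr, fr = daily_rates(passed, failed)  # day 2 (0 closures) is dropped
```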

Notes

  1. ^ a b n is the number of days sampled, not the total number of nominations. Likewise, M is the mean number of nominations per day.
  2. ^ Closed nominations are those that were removed from the list as passed or failed.
  3. ^ These are the same because the two rates are complements of each other: (passes / closures) + (failures / closures) = 1
  4. ^ Interestingly, a Mann–Whitney U test gives p = 0.1131, which is not significant but much closer.