Wikipedia:Village pump (WMF)

The WMF section of the village pump is a community-managed page. Editors or Wikimedia Foundation staff may post and discuss information, proposals, feedback requests, or other matters of significance to both the community and the Foundation. It is intended to aid communication, understanding, and coordination between the community and the Foundation, though the Wikimedia Foundation currently does not consider this page to be a communication venue.

Threads may be automatically archived after 14 days of inactivity.

Behaviour on this page: This page is for engaging with and discussing the Wikimedia Foundation. Editors commenting here are required to act with appropriate decorum. While grievances, complaints, or criticism of the foundation are frequently posted here, you are expected to present them without being rude or hostile. Comments that are uncivil may be removed without warning. Personal attacks against other users, including employees of the Wikimedia Foundation, will be met with sanctions.

Sunday July 28 Strategic Wikimedia Affiliates Network meeting (Results of Movement Charter ratification)

SWANs gathering for a conversation

Hello everyone!

The Strategic Wikimedia Affiliates Network (SWAN) is a developing forum for all Wikimedia movement affiliates and communities to share ideas about current developments in the Wikimedia Movement. It expands on the model of the All-Affiliates Brand Meeting (following the re-branding proposal by the WMF) to help lay some of the groundwork for further Wikimedia 2030 strategy process work.

At this meeting we will focus on the results of the Movement Charter ratification. We will also discuss the aftermath of the Board of Trustees' decision to veto the Movement Charter, including their recent proposals. We will also cover updates about upcoming Wikimania 2024.

This month, we are meeting on Sunday, July 28, and you are all invited to RSVP here.

UTC meeting times are and

Nadzik (talk) 17:35, 20 July 2024 (UTC)[reply]

[edit]

One of our most important tools, Earwig's Copyvio Detector, depends on access to Google. According to the tool's creator and operator, The Earwig, the WMF has kindly been paying for this Google access. Unfortunately, we've been hampered by a strict limit on the number of searches allowed per day. The Earwig mentioned that there might be a way to work out a special arrangement with Google to increase the cap. Would someone at the WMF be able to pursue this?

In case it helps, this is a vital tool to a number of English Wikipedia processes, and it would surprise me to learn that the sister projects aren't using it as well. We use the tool routinely as part of our new page patrol, articles for creation, contributor copyright investigations, did you know, good article, and featured article processes. Historically, the WMF has taken a special interest in supporting volunteer work that focuses on our legal responsibilities, of which compliance with copyright law is an obvious example. Firefangledfeathers (talk / contribs) 01:34, 31 July 2024 (UTC)[reply]

My understanding is that Google has a hard daily limit of 10,000 API accesses per day for absolutely everyone across the board, without exception. User:Novem Linguae/Essays/Copyvio detectors#Earwig copyvio detector. My impression is that an exception wasn't possible because Google doesn't provide an exception to anyone. Earwig would know best though.
Was this post made because Earwig said "please ask WMF on my behalf to negotiate with Google", or is this more of a general question? –Novem Linguae (talk) 04:15, 31 July 2024 (UTC)[reply]
cc Chlod. –Novem Linguae (talk) 04:18, 31 July 2024 (UTC)[reply]
Earwig didn't ask me to do anything on his behalf. He mentioned that "the WMF pays for it, but Google's API terms limit our usage without some kind of special arrangement that I have been unable to get." This was at a discussion at his user talk. I wasn't sure who might be able to negotiate a special arrangement, and I'm not sure it's a possibility, but this was the best place I could think to ask. Firefangledfeathers (talk / contribs) 04:26, 31 July 2024 (UTC)[reply]
A bit odd that Earwig isn't doing the advocating himself, but on the linked user talk page, it does sound like he's asking for some help with this. Would The Earwig be willing to share his contacts at WMF that have helped with this in the past? Sounds like WMF pays for the tool, so there's some accounting/finance/grants contact that knows a little about it. And we also have partnerships people like NPerry (WMF) that I believe has worked with Google before. –Novem Linguae (talk) 04:43, 31 July 2024 (UTC)[reply]
From what I've heard, the WMF contact that Earwig had has since left the Foundation and wouldn't be able to help in this case. You are correct that WMF pays for the tool. I had mentioned this at the Hackathon with staff and it seems there's some resistance in getting the cost of extra tokens funded, although I'm unsure of exactly how the WMF's budgeting process works, so no clue on the impact it has in this situation (considering we don't have a Google liaison to begin with). Chlod (say hi!) 05:20, 31 July 2024 (UTC)[reply]
Even the name of a former WMF employee contact would be helpful. Let's get all this documented so we can start figuring out what WMF departments/teams have assisted in the past. –Novem Linguae (talk) 05:26, 31 July 2024 (UTC)[reply]

Hi all. Firefangledfeathers, thanks for starting the discussion. Novem, sorry if this thread came up a bit strangely. In truth, I struggle a little with motivation these days, so I really appreciate others' help getting the ball rolling. I am still here, though—it comes in waves. (It's also good to involve the community in the tool so the institutional knowledge isn't stuck with me.)

BTW, I am working on more effectively managing automated/excessive tool usage and will soon require OAuth to run searches (see this active thread on my talk). Right now the tool really doesn't have any usage guards or a way to limit individual users' activity, which isn't good when our resources are so limited. It's possible doing that will free up our resources a lot if a substantial fraction of our current usage is coming from malicious crawlers, despite Chlod's and my attempts at blocking them (the tool has been running for over a decade and it's never been this bad, though I have a theory what this is about). Even so, finding a way to increase our search quota will enable us to support some requested features that are currently nonstarters, even if the tool's entire current quota could be devoted to them, like checking all new pages.

My main point of contact with the WMF in the past was Kaldari. The last time we spoke about the tool was 2020; since then, the situation has been unclear. (MusikAnimal, do you remember if we've spoken about this?) Last year Runab WMF and DTankersley (WMF) reached out to me to discuss the tool in the context of WMF efforts "to find ways to reduce single points of failure for tools that require a third party API", but after an initial conversation I haven't heard back aside from being told that Deb was moved to another project, so I'm not sure what happened with those efforts.

Frequently we've discussed adding an alternate search backend aside from Google. While Google is really the gold standard for breadth of search coverage, as far as I'm aware—and this is really what the copyvio detector needs, not necessarily quality/intelligence; people have suggested services like DuckDuckGo, but they're really unsuitable because they just republish raw results from Bing with some additional flair that is basically useless for us—something like Bing itself might work as an (automatic) fallback if we exhaust our Google credits for the day. I believe Bing has roughly equivalent pricing/usage limits as Google, but it's been a while since I've looked into it. And we/the WMF would need to establish a relationship with Bing for that to work; I don't know if that's a better idea than attempting to negotiate our Google limits. There are also other options like Yandex (which the tool did use one dark time in the past before the Google relationship and after Yahoo ended their free service... it wasn't great, at least for English results, but it's something that could be looked into for some other language projects, perhaps). Finally, there was a discussion on my talk earlier this year with Samwalton9 (WMF) about adding The Wikipedia Library as another search backend, and I did correspond briefly with someone at EBSCO about this, but again, I haven't heard from either of them about this in several months. — The Earwig (talk) 06:51, 31 July 2024 (UTC)[reply]
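The "automatic fallback" idea in the comment above can be illustrated with a minimal sketch. It is not the copyvio detector's actual code: the `Backend` class, the quota numbers, and the `search` callables are hypothetical stand-ins, and a real deployment would also need to reset the counters each day and call the real search APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Backend:
    """A search backend with a fixed daily quota (hypothetical model)."""
    name: str
    daily_quota: int
    search: Callable[[str], List[str]]  # stand-in for a real API call
    used: int = 0

    def has_quota(self) -> bool:
        return self.used < self.daily_quota


def search_with_fallback(query: str, backends: List[Backend]) -> Tuple[str, List[str]]:
    """Try backends in priority order, skipping any that are out of quota."""
    for backend in backends:
        if backend.has_quota():
            backend.used += 1
            return backend.name, backend.search(query)
    raise RuntimeError("all search backends have exhausted their daily quota")
```

With a primary backend listed first and a secondary one after it, searches would go to the primary until its counter reaches the quota and then move to the secondary without user intervention.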

So the Google API proxy and the Google account it runs on are wholly part of Community Tech's budget. Kaldari was the contact in the past when they were my manager on CommTech. So the good news is Community Tech is still here, and we are actively maintaining this proxy (I just migrated it to a newer Debian a few days ago). The part that hasn't changed is our quota from Google, and sadly I doubt it will change. We are already paying hefty fines for the quota we have now, but I believe it is also correct that 10K is a strict limit from Google. I can see from the graphs in the API console that we almost always hit that limit within the first 12 hours of each day.
I am working on more effectively managing automated/excessive tool usage and will soon require OAuth to run searches … – that is most certainly the best immediate recourse for addressing this problem. From my years of shielding XTools from web crawlers, I can say with confidence that putting up a login wall by itself should make a big difference. I also think mitigating excessive and automated use is something that would probably be required before we could consider dishing out more money to Google. However again, I don't think such negotiations would get us anywhere anyway :(
As a general note, such "negotiations" are typically done these days via the Partnerships team. I went through them recently when we solidified our partnership with Turnitin. Speaking of which… do others find the "Use Turnitin" option of Copyvios at all useful? Because that's using the old Turnitin account (the new one can't be used outside CopyPatrol), and the last I checked there were still a few million credits left. MusikAnimal talk 20:52, 1 August 2024 (UTC)[reply]
Thanks for those details. So I can improve my notes at User:Novem Linguae/Essays/Copyvio detectors#Earwig copyvio detector, do you know why/how Google API Proxy ended up separate from the main tool? And does "paying hefty fines" mean that there is some sort of sliding scale of pricing and that getting near the cap gets more expensive? My notes currently state that Google API credits cost us $50/day. –Novem Linguae (talk) 00:52, 2 August 2024 (UTC)[reply]
The Google API Proxy (docs at wikitech:Nova Resource:Google-api-proxy) exists solely to anonymize requests to the Google APIs, should they be used in a fashion that sends personal data such as your IP or user agent. As far as I know, Copyvios has always accessed Google APIs through this proxy.
I'm not aware of any sort of sliding scale as far as pricing goes, and my use of the word "hefty" was relative to my team. However since I made my reply above, I have been informed that the budget is actually not solely from Community Tech, as it was in the past (but we do still maintain the proxy). I don't have any details about internal accounting, I'm afraid. My apologies for any confusion caused. MusikAnimal talk 19:48, 2 August 2024 (UTC)[reply]
Wouldn't the Toolforge tool itself serve as an anonymizing proxy as long as the Google API requests are being sent via a backend rather than via browser JavaScript? But that's a bit of a tangent :) –Novem Linguae (talk) 22:18, 2 August 2024 (UTC)[reply]
I imagine it also serves to decouple who pays Google from who operates the service using the Google API. isaacl (talk) 21:11, 5 August 2024 (UTC)[reply]
From my years of shielding XTools from web crawlers, I can say with confidence that putting up a login wall by itself should make a big difference
Please, please, please do not restrict Earwig to only editors with accounts. Anonymous users have enough doors slammed in our faces as it already is. 2603:8001:4542:28FB:750C:1A25:D002:877B (talk) 19:32, 5 August 2024 (UTC) (Actual talk)[reply]
I'm willing to entertain alternate methods for anonymous users to access the tool. I do need some way to attribute usage to a human, though. Any suggestions? I might present a challenge page where you have to answer some question (essentially a CAPTCHA, but one that works without JS). Or I can require anonymous users request a token (that would be saved as a cookie so does not need to be entered with each request). — The Earwig (talk) 00:00, 10 August 2024 (UTC)[reply]
@The Earwig will the new meta:IP Editing: Privacy Enhancement and Abuse Mitigation feature be of any value? I'm still figuring out the details, but my understanding is that it will use cookies to track anonymous users across changing IPs. It's not foolproof (and isn't meant to be), but it's at least a first pass at "all of these IP edits were done by one human". RoySmith (talk) 12:40, 10 August 2024 (UTC)[reply]
That's a good question, but I can't tell from the documentation whether temporary accounts are able to use OAuth. I would need to test it. (And we may need some other solution in the meantime, before that's deployed.) — The Earwig (talk) 14:25, 10 August 2024 (UTC)[reply]
I know nothing about what the "modern" methods are of telling bots and humans apart these days (besides knowing that CAPTCHAs are apparently broken: e.g. Table 3 in [1]), so I probably won't be able to provide many helpful suggestions. If we need to request a token from you, I for one would be okay with that if it's a one-time process, but would that be handled by an automatic system or by asking you for one manually—and if the latter, is there any indication how many anonymous editors (legitimately) use the tool so you won't be potentially flooded with hundreds of requests?

Another idea I had is making sure the requesting IP address' /64 range (if IPv6, or the address alone if IPv4) has made at least one edit to the Wikimedia site the Earwig request is for. I could easily be assuming wrong, but I would think there wouldn't be many cases where an anonymous user would want to check for copyright violations on a site they've never even edited before. This would admittedly require anonymous users to be on the same device they edit from to use the tool, but so would requiring a browser token. ...however, I don't know how technically feasible that is, and obviously someone malicious could just manually change their IP address (but would someone running hundreds of web crawlers go to that effort for every single one of them?). 2603:8001:4542:28FB:9566:5D77:1AC4:CB78 (talk) 18:47, 12 August 2024 (UTC) (Actual talk)[reply]
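For illustration only, the range comparison described in the comment above (a /64 for IPv6, the single address for IPv4) could be sketched with Python's standard `ipaddress` module. The function names are hypothetical, the addresses are from reserved documentation ranges, and the sketch assumes the caller can see the visitor's address at all, which is a separate question from how to compare ranges.

```python
import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def comparison_range(address: str) -> Network:
    """Return the /64 containing an IPv6 address, or the /32 for an IPv4 address."""
    ip = ipaddress.ip_address(address)
    prefix = 64 if ip.version == 6 else 32
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def same_range(a: str, b: str) -> bool:
    """True when both addresses fall in the same comparison range."""
    return comparison_range(a) == comparison_range(b)


# Two IPv6 addresses sharing their first 64 bits are treated as one visitor.
print(comparison_range("2001:db8:4542:28fb:750c:1a25:d002:877b"))
print(same_range("2001:db8:4542:28fb:750c:1a25:d002:877b",
                 "2001:db8:4542:28fb:9566:5d77:1ac4:cb78"))
```

Matching on the /64 rather than the full address reflects the fact that the lower 64 bits of an IPv6 address commonly change between sessions on the same connection.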
@The Earwig: OAuth is not available for temporary accounts. The T&S Product Team aims to make the experience with temporary accounts very similar to what IPs have now, which includes not having access to OAuth or user groups. @2603:*: For IPs, we don't have access to IP addresses on Toolforge, so we can't check on wikis if a specific Toolforge visitor's IP has made an edit on the wikis. Requesting a special token, as you've mentioned, is probably a better option to go for if we want to have anonymous editors keep their access. Since there aren't that many editors who do all of their edits on an IP, we shouldn't get too many requests for this. What should be decided is the minimum requirements for giving this access, and the criteria for revoking it (should we find out that an editor has been using a token maliciously). Chlod (say hi!) 05:44, 13 August 2024 (UTC)[reply]
Given IP addresses (and thus contributions) can't be measured by Toolforge, what prerequisites could even be measured? Or would the token-giving occur on a different site?
In terms of revoking, my knee-jerk thought is more than M requests in a minute or N requests in a day could cause the token to get revoked until the anonymous user interacts with a human to request it back. But A. I don't know if that's how things work with tokens and Toolforge, and B. I'd be curious to know what the already-established practice and limits for e.g. the Wikipedia API and whatnot are, to have some sort of a benchmark. 2603:8001:4542:28FB:C10F:60AC:578A:3F8 (talk) 22:48, 13 August 2024 (UTC) (Actual talk)[reply]
more than M requests in a minute ... could cause the token to get revoked You need to be careful with that. When I'm checking DYK queues, I check a whole set at a time, which means opening nine (or more) Earwig windows in a few seconds. RoySmith (talk) 23:25, 13 August 2024 (UTC)[reply]
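One common way to reconcile a per-user limit with bursts like the DYK example above is a token bucket, which allows a short burst up to a fixed capacity and then throttles to a steady refill rate. The sketch below is an assumption about how such a limit might be modelled, not a description of the tool or of Toolforge; the class name, capacity, and refill rate are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Burst-tolerant rate limiter (illustrative parameters, not the tool's)."""
    capacity: float            # largest burst allowed at once
    refill_per_second: float   # sustained request rate
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so a first burst succeeds
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        """Refill according to elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In use, `now` would typically come from `time.monotonic()`. With a capacity of ten, opening nine windows within a few seconds would succeed, while a crawler issuing requests continuously would be held to the refill rate; a denied request could be delayed or rejected rather than triggering revocation.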
@The Earwig Thanks for the reminder about the EBSCO/Library integration - I've just poked that email thread to see if we can make any progress there. Samwalton9 (WMF) (talk) 10:49, 9 August 2024 (UTC)[reply]

Wikimedia Foundation Bulletin July Issue 2


Previous editions of this bulletin are on Meta. Let askcac@wikimedia.org know if you have any feedback or suggestions for improvement!


MediaWiki message delivery 21:48, 1 August 2024 (UTC)[reply]

SWViewer


Howdy!

I am an occasional user of SWViewer, a vandalism patrolling tool that is (in my understanding) partly maintained by the Wikimedia Foundation. I generally like its interface and design, and I have found it to be a useful tool.

I do have one item that I would like to discuss though: I think it would be wise to add a checkbox for whether or not to mark an edit as minor. Currently, the interface allows users to tick a box for "Use undo" when rolling back an edit with a summary. This is good, but it only allows users to revert edits in a way that is marked as minor. I understand that certain wikis, such as the English Wikipedia, have a very narrow understanding of what constitutes a minor edit, and there are times when I want to undo an edit but it is not technically a minor edit. This makes me have to manually go to the English Wikipedia and click the "undo" button there rather than keeping all of this in the SWViewer interface.

Is there a way that the WMF could either:

  1. Not automatically mark edits made while "Use undo" is ticked as minor (i.e. submit them as non-minor edits); or
  2. Allow users to select whether or not their actions are considered to be minor edits (via a tick box)?

Thank you!

Red-tailed hawk (nest) 02:19, 9 August 2024 (UTC)[reply]

SWViewer is not AFAIK maintained by the foundation. It looks like their bug tracker is at m:Talk:SWViewer. * Pppery * it has begun... 04:08, 9 August 2024 (UTC)[reply]
Noted. — Red-tailed hawk (nest) 18:18, 9 August 2024 (UTC)[reply]

Mobile fundraising


Hi, I'm not WMF staff (although I am the Wikimedian of the Year) but I noticed something I thought should have wider community input. Please keep in mind that as far as I'm aware this is in very early stages and there is no guarantee that it will actually be implemented. Also, please be nice. I anticipate that some people will be surprised by what is being proposed here, so I felt like this was an important reminder. Anyways: mw:Wikimedia Apps/Team/iOS/Fundraising Experiment in the iOS App. There is a feedback section towards the end if you wish to give it. I have already commented there. Clovermoss🍀 (talk) 17:47, 10 August 2024 (UTC)[reply]

This is an example of what is being proposed. Essentially, there would be a donation button near the top right of an article and these would kind of be like Reddit trophies. I think this needs way more visibility than being buried in the depths of mediawiki which is why I'm posting here. Clovermoss🍀 (talk) 18:52, 10 August 2024 (UTC)[reply]
As alluring as this proposal may seem at first glance, I think it's ultimately better if things stay as they are right now. KINGofLETTUCE 👑 🥬 07:14, 11 August 2024 (UTC)[reply]
@Kingoflettuce: If you have feedback, I would leave it on the page itself. There is a link to the MediaWiki page in my first comment. Clovermoss🍀 (talk) 07:21, 11 August 2024 (UTC)[reply]

Wikimedia Foundation banner fundraising campaign in India postponed to start on the 27th of August


Dear all,

As mentioned previously, the WMF is running its annual banner fundraising campaign for non-logged-in users in India. Initially, we planned the campaign to start tomorrow and run until the 10th of September. We have had some issues with our local payment provider in India, and because of this we are postponing the campaign by a couple of weeks. Our new campaign dates are the 27th of August to the 24th of September.

You can find more information about the campaign, see example banners, and leave any questions or suggestions you might have on the community collaboration page.

Generally, before and during the campaign, you can contact us:

Thank you for your understanding and regards, JBrungs (WMF) (talk) 10:44, 12 August 2024 (UTC)[reply]