Wikipedia talk:Wikipedia Signpost/Single/2016-04-24

Comments

The following is an automatically-generated compilation of all talk pages for the Signpost issue dated 2016-04-24. For general Signpost discussion, see Wikipedia talk:Signpost.

Arbitration report: Amendments made to the Race and intelligence case (2,430 bytes · 💬)

Just a small point - I do not believe that the Gamaliel arbcom case scope covers Gamergate. The clerks are removing evidence of Gamaliel's Gamergate actions. Mr Ernie (talk) 18:25, 25 April 2016 (UTC)

I'm planning on going more into details about how the case started next issue. Especially with the sudden additions into the case. This is so far the most baffling Arbcom case I've seen since I started paying attention to them. And I've seen the GamerGate case unfold. GamerPro64 18:37, 25 April 2016 (UTC)
  • "Kharkiv09" in the bottom-most "In brief" item appears to be a typo. --SoledadKabocha (talk) 01:29, 27 April 2016 (UTC)

Filing party here: The case is about the Signpost posting a Wikipedia:Attack page, the alleged misconduct of its editor-in-chief over an "April Fools' joke" days after April Fools' Day and the failure of the community to execute WP:BLPRESTORE. Given this was my statement on the gamergate case, and the statements by arbitrators on the current case [1], [2] make it clear this is not about gamergate, the narrative that this is some "gamergate payback" is not supported by evidence. NE Ent 10:06, 27 April 2016 (UTC)

Take this to where else? This is the comments section. It's in response to what was written in the report. GamerPro64 01:37, 1 May 2016 (UTC)

Featured content: The double-sized edition (0 bytes · 💬)

Wikipedia talk:Wikipedia Signpost/2016-04-24/Featured content

I'm surprised to see my Homestuck joke being listed as a "negative comment". If anything, that is the number 1 reason why we should go through with this initiative :p (I'm flattered to be quoted in The Signpost, though!) Aside from that, good reporting. ~Mable (chat) 05:36, 25 April 2016 (UTC)

<blush> I thought you were being sarcastic. Tony (talk) 05:50, 25 April 2016 (UTC)
Well, it was a bit of silliness, but it was part of my response to someone saying they were worried evil aliens would find the disc and use it to find Earth. What the joke boiled down to was that I prefer to give evil aliens information about humans and get destroyed in the process than to never get any kind of legacy out there at all. So it was indeed a positive comment! ~Mable (chat) 06:57, 25 April 2016 (UTC)

Hiring and board matters

Also worth noting was this post by Lisa Gruwell, responding to a question whether the hiring for execs for HR, Tech, and Community would wait until after the new ED is selected, or whether those recruiting processes would happen independently of the ED hiring process: "We are moving forward with hiring the CTO, VP of HR, and VP of Community (in that order) during the interim."

Alice Wiegand also published a blog post on the criteria for chapter representatives on the board: http://lyzzy.de/blog/2016/04/criteria/ Andreas JN466 11:28, 25 April 2016 (UTC)

  • The board should select a shortlist of their candidates for all senior staff positions, post their bios and CVs and perhaps a short video interview, then the entire Wikipedia projects should vote. Time to hand some power back to the unpaid volunteers who provide the raison d'être for those fat salaries and their junkets. Kudpung กุดผึ้ง (talk) 00:39, 28 April 2016 (UTC)

Moon

I sincerely hope that aliens find Wikipedia's lunar time-capsule, read a couple of ANI pages, realize how disgusting we are, and then destroy humanity once and for all.--Catlemur (talk) 19:19, 25 April 2016 (UTC)

This is the best-grounded look at the whole Heilman affair since it began, aided of course by the digging you folks at the Signpost have done and by the addition of the actual email chain between Wales and Heilman.

What a tale of technical overreach, fiduciary irresponsibility, behind-the-scenes machinations, treachery and duplicity!

Magnificent wordsmithing by Andreas Kolbe. StaniStani 00:10, 25 April 2016 (UTC)

My compliments on another excellent piece of work, Andreas. You should really try to get these articles more widely distributed. -- Seth Finkelstein (talk) 01:28, 25 April 2016 (UTC)

  • Wow, usually when someone says that the other party "took things out of context", I assume that they meant that the discussion before and after the quoted section would lead to a different interpretation. I didn't think that Wales literally cherry-picked sentences out of a long discussion to make both sides of the discussion look radically different than what they were. I really don't understand why Wales and the WMF have been so ridiculous about this whole thing- they had an obvious problem, came up with an ambitious solution, it turns out that they couldn't really do it, and... they now feel the need to lie and cast aspersions and throw people under buses for it. Guys, if you want to be a big-shot "tech company in the field of education/charity", then you need to take tech company 101: not every neat idea you have works out, and the takeaway is to learn from it, not fire everyone who disagreed. --PresN 01:35, 25 April 2016 (UTC)
  • Nice work, Andreas. Carrite (talk) 01:50, 25 April 2016 (UTC)

Sorry but this is just a bunch of misconceptions. A query dialog engine is not a Google competitor, it is not even close. (Why do I waste time on reading this?;/) Jeblad (talk) 06:21, 25 April 2016 (UTC)

  • Excellent work, Andreas. It is clear that ambitions went far too high. Chiswick Chap (talk) 07:04, 25 April 2016 (UTC)
  • These are Wikimedia Movement resources and the WMF is simply a steward of the resources. It is disclosure in normal English of our strategy / goals that I am currently requesting rather than full scale consultation. Also typically those most involved in a conversation are also some of the most informed. I agree w/DocJames on this 100%... in my view we are not painted on the wall (we edit for hours, and logic dictates we should have a voice). While the general idea by Jimbo Wales is great, it's a matter of whether "the end justifies the means" (lack of transparency)... NO. --Ozzie10aaaa (talk) 11:44, 25 April 2016 (UTC)
  • Of course. WMF prepares 35 million dollars for a "knowledge engine" but can't spare a couple thousand for digitizing public domain materials in "the global south" or "developing communities" or whatever their term is now. Priorities, priorities... and the people who speak out get shafted. — Chris Woodrich (talk) 15:51, 25 April 2016 (UTC)
    • Chris, more to the point, one might weigh up some of the spending on physical meetups, trinkets, and carbon-intensive travel and accommodation, against clearly high-impact tasks such as digitizing. Just my 2 cents' worth. Tony (talk) 15:58, 25 April 2016 (UTC)
  • Since everyone has (rightfully) praised Andreas for this penetrating article, I'd like to ask a possibly stupid question: what would this proposed search engine offer a user that isn't already available? Besides the usual search function, there are hyperlinks between articles, similar articles are grouped into categories, & similar materials on different projects (viz. Commons or Wikisource besides Wikipedia, or even other-language Wikipedias) have links in the article. And when Wikidata matures sufficiently, that will provide a means to search for material between projects. And while improvements to the search function could be made, it will help a user to mine Wikipedia for all related information. So if I want to know what the Wikimedia projects have about Tom Cruise or Queen Elizabeth II, it's not that hard to find it all at present. Far easier than the library card catalog (or Reader's Guide to Periodical Literature) I had to rely on as a student decades ago. So what would a search engine offer that a user doesn't currently have -- or is likely to have in the not so distant future? -- llywrch (talk) 17:09, 25 April 2016 (UTC)
    • The idea that simply duplicating the ability to answer a simple question like "How old is Cruise?" on Wikipedia's home page will pull readers off of Google's home page to ours (seems we do want to compete with them for "home-page market share")... the idea seems silly. How many readers are so helpless that they can't search for Cruise himself, and easily find his age in the infobox. The only reason to leave Google's engine is for a specialized search that it can't handle. We recently had a discussion about Semantic Mediawiki, which tries to answer more sophisticated questions. From that you'll see that we have a long way to go to catch up to the Wolfram Alpha knowledge engine. It might be less expensive to just buy that. wbm1058 (talk) 23:43, 25 April 2016 (UTC)
      • +1 The future would be an IPA (e.g. Siri, Cortana), not Wolfram, and some are even free software. --Molarus (talk) 01:02, 26 April 2016 (UTC)
      • Platypus (backed by Wikidata) can answer it, FWIW. --Ori Livneh (talk) 03:57, 26 April 2016 (UTC)
      • Which just shows that spending resources developing a search engine is wasted effort. If the future is something similar to the proposed semantic web, Wikidata is a strong first step towards that -- & already supports a few proof of concept examples. Further, IIRC those examples were developed without Foundation backing. All that having the WMF create another search engine accomplishes is to add another line to someone's resume. (And by saying "someone" I'm not trying to say Lila Tretikov in a cute way; as more information comes out, the more obvious it is that there are other people who are likely to be the real person behind the Knowledge Engine. Tretikov might have been only a scapegoat.) -- llywrch (talk) 16:08, 26 April 2016 (UTC)
    • The {{Orphan}} template features a nifty "find link" tool that is very helpful for creating links to orphan articles. This is part of the work of crowdsourcing for relevance. Doesn't Google's algorithm give priority to pages with a lot of incoming links pointing to them? So whatever "knowledge engine" we build will be more powerful if it has a stronger web of interwiki links to build off of. Just a little thing like a bot that ran Edward Betts' tool against our entire database of orphans and pointed out the most linkable ones would be helpful. Maybe I'll ask for it in the next round of the community wishlist survey, but that seems like a waste of time when the bulk of resources are directed elsewhere. wbm1058 (talk) 00:53, 26 April 2016 (UTC)
  • Another excellent piece by Andreas Kolbe, the best writer in the field of Wikipedia-focused journalism. So we see again that Jimmy Wales has some honesty problems. How soon is he leaving the Board? Chris Troutman (talk) 17:47, 25 April 2016 (UTC)
  • Gotta pile on and agree that Andreas rocks. Nice to see Signpost hitting its stride again and putting April Fools' Fortnight behind us (but for another ArbCom melodrama). wbm1058 (talk) 01:25, 26 April 2016 (UTC)
  • Agree, excellent piece of work. I hope we can make progress getting some answers. SarahSV (talk) 02:13, 27 April 2016 (UTC)
  • Well, Jimbo allowed the Signpost bot delivery and some users' comments posted after it to be archived off his talk page with no reply, which is his usual way of "responding" to things he doesn't want to respond to, so I think you can interpret that as his "answer". Note that he has edited his talk page since then. --71.110.8.102 (talk) 17:51, 27 April 2016 (UTC)
  • Excellent article. I hope the WMF doesn't spend tens of millions on a better search engine. --Frmorrison (talk) 15:09, 6 May 2016 (UTC)

Some thoughts

Hi. Sorry if this is a daft question, but this piece is marked as an op-ed. What opinion is being expressed?

Does anyone disagree that our internal search needs improvement? I would think that Andreas and others would be supportive of efforts to have free, open, and independent search functionality. Below other mission-critical services such as providing SQL and XML data dumps, search is pretty important infrastructure, especially as the Wikimedia projects grow.

If we took an input string such as "How old is Tom Cruise?" and broke it up into pieces, I think we could, with some effort, program this and similar queries to return specific data points. We could look at the most relevant Wikidata item (d:Q37079) to extract the "date of birth" field's value ("3 July 1962") and then do a simple date calculation to show that Tom Cruise is currently 53 years old. Or, if we can get the search results to be better, we can pull out and highlight specific data points alongside the search results.

After we solve "How old is [famous person]?"-type queries, we can add support for alternate phrases such as "What age is [famous person]?" Once we solve that, we can move on to programmatically answering other "easy" queries. I don't think what's being described here requires artificial intelligence or IBM's Watson.

You want a concrete opinion? The search results at Special:Search/How old is Tom Cruise? are currently terrible. Tom Cruise bafflingly doesn't appear in the top 100 results. If Tom Cruise did appear in these results, we could look at the search input, see that it uses a known keyword ("age" or "old"), and then extract that information programmatically to serve our reader/researcher more quickly. Who opposes doing this?

Let's talk about how we can improve search and what that will require. Does an organization similar to the Wikimedia Foundation (or the Knight Foundation, for that matter) need to be involved? What value do these organizations provide? I think there's plenty of room for intelligent and thoughtful discussion about priorities and functionality and serving our readers. Can we start now? --MZMcBride (talk) 03:23, 26 April 2016 (UTC)
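
The approach described in the comment above (recognise a known phrasing, look up the person's date of birth, then do a simple date calculation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `BIRTH_DATES` dictionary is a hypothetical stand-in for a real Wikidata lookup of the "date of birth" field, and the function names and patterns are invented for this example. It is not how Special:Search or any existing tool works.

```python
import re
from datetime import date

# Hypothetical stand-in for a Wikidata lookup (e.g. the "date of birth"
# value on d:Q37079); a real tool would query Wikidata instead.
BIRTH_DATES = {"tom cruise": date(1962, 7, 3)}

# A few known phrasings; more can be added once the simple ones work.
AGE_PATTERNS = [
    re.compile(r"^how old is (?P<name>.+?)\??$", re.IGNORECASE),
    re.compile(r"^what age is (?P<name>.+?)\??$", re.IGNORECASE),
    re.compile(r"^what is (?P<name>.+?)'s age\??$", re.IGNORECASE),
]


def age_on(birth, today):
    """Return the age in whole years on the given day."""
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def answer_age_query(query, today):
    """Return an age for recognised queries, or None to stay quiet."""
    text = query.strip()
    for pattern in AGE_PATTERNS:
        match = pattern.match(text)
        if match:
            birth = BIRTH_DATES.get(match.group("name").strip().lower())
            if birth is None:
                return None  # unknown subject: do not guess
            return age_on(birth, today)
    return None  # unrecognised phrasing: fall back to normal search


print(answer_age_query("How old is Tom Cruise?", date(2016, 4, 26)))  # 53
```

Returning `None` for anything unrecognised keeps the logic conservative: ordinary search results would still be shown, and the extracted data point only appears when both the phrasing and the subject are known.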

Hi! I think that many people are aware of imperfections in our current search functionality. But I don't think that it is a good idea to try to build a search engine that uses natural language processing to get answers from a semantic wiki. That seems far too ambitious to me. And let's face it, they are not Google, they simply do not have the people and skill required. To reach such a goal you need to split it up into smaller, more manageable tasks, and I think it starts with improving or even rewriting the current search functionality.
Wikipedians don't really need a search engine that tells them how old Tom Cruise is, because we got a template for that (in this case {{birth date and age|1962|7|3}} which renders as: July 3, 1962 (age 53)). Internet users in general may need such a search engine, but creating it is difficult and making it popular is even more difficult, and I believe that big companies like Google and Apple (and even Microsoft) who have been doing research into (and experiments with) this kind of stuff for a long time now are far more likely to create something that actually works. The WMF is not a software company, and I don't think they can compete with the big guys in this field (Google, Siri), so I think they should focus on their niche.
Personally I wish they would be far less ambitious. I do want them to improve the search engine, maybe even to rewrite it from scratch if they believe that that is the best solution, but please keep offering roughly the same functionality as before, with some improvements and additions, instead of trying to create something superambitious that is gonna be a waste of time and money in the long run. There are many smaller improvements possible, for example the MediaWiki software does offer the ability to search for links only in a specific namespace, but this functionality is disabled on Wikimedia projects, due to efficiency issues.
Imagine if they would successfully create a search engine that gives correct answers to questions in plain English. Imagine if people (who are currently using Google for this type of task) would switch to using this new search engine, built on open standards with open data. Then Google will immediately embrace, extend and extinguish it. The Quixotic Potato (talk) 15:33, 26 April 2016 (UTC)
Hi The Quixotic Potato. The Wikimedia Foundation sure hires a lot of tech folks (cf. wmf:Staff and contractors) if it's not a software company. What kind of organization do you think the Wikimedia Foundation is? :-)
As noted at MediaWiki#Searches and queries, the search back-end was basically rewritten/replaced in 2014. And there have been substantial improvements to the site search functionality since then. But we need to do better; I think we're all agreed on that.
Regarding the threat that Wikimedia projects face from Google, I wrote about that here: mailarchive:wikimedia-l/2016-April/083722.html. --MZMcBride (talk) 00:51, 28 April 2016 (UTC)
MZMcBride, I agree, the op-ed designation seems odd; this strikes me as simply good reporting that, in some areas and transparently, draws conclusions that could be construed as opinions. In most publications, this is simply referred to as "news reporting." But the Wikipedia world can be highly sensitive around the issue of neutrality, and this particular topic is highly sensitive. My guess is that's why it was presented as an op-ed. That designation signals that others might be welcome to submit competing interpretations. In that sense, I like the choice; Jayen466 (Andreas) is a Signpost editor, so it's good to be extra cautious about any impression that his own views and the editorial position or policies of the publication are getting blurred.
On the substance of the piece: Yes, I think everyone can agree that there is room for substantial improvement in Wikipedia/Wikimedia search. I think that has been broadly agreed by many people over the recent months. But I don't see that as a central question in this piece. A very important, unanswered question remains: was the board justified in dismissing a recently-(s)elected Trustee? Or was Docjames actually the only Trustee trying to do the right thing, in the face of a board apparently deeply tied to going about things in a bad way (standing by its Executive Director despite massive staff opposition and attrition, and neglecting to clearly communicate its ambitions to important stakeholder groups like volunteers and staff)?
That question is an important one, and this piece advances the effort to unravel it. -Pete (talk) 15:38, 26 April 2016 (UTC)


I don't think we need question-answering software, rather an assistant that could do some tasks for editors and readers. We could start with writing into the searchbox something like: WD, show me the article about Tom Cruise, or WD, read out aloud the introduction of that article, or WD, tell me who wrote most of this article, or WD, show me all media files commons has about Tom Cruise. --Molarus (talk) 23:48, 26 April 2016 (UTC)
The question is moot. We are here to write an encyclopedia with no end goal and no deadline. Meanwhile, the Board foolishly panicked because the pageviews are down thanks to Google's knowledge graph. While the Board wants to lie to us in pursuit of the next hot thing, I'm happy as a clam to write articles that no one will ever read. We, the editors, are fundamentally different than the Board and incidents like this make plain the depravity of the folks in San Francisco. Chris Troutman (talk) 17:32, 7 May 2016 (UTC)
There is a category on Commons for media files related to Tom Cruise. And the searchbox already works, if you type in Tom Cruise you will go to his article. Reading aloud is not something a computer can do; try using a screenreader for a day (I did, because I was curious how blind people experience the internet). Trust me, it sucks. Asking who wrote most of an article is incredibly difficult to calculate, and the answer is useless. The Quixotic Potato (talk) 15:32, 27 April 2016 (UTC)
You think it's useless? I think it would be useful to know whether an article was mostly written by, say, a PR firm or publicist being paid by the subject of an article. I don't dispute that accomplishing this might be difficult. --71.110.8.102 (talk) 17:51, 27 April 2016 (UTC)
Lots of things can be useful. Only some of them are practical. What you propose is not the latter. 173.79.20.33 (talk) 22:48, 27 April 2016 (UTC)
We could start with writing into the searchbox something like: WD, show me the article about Tom Cruise. Ah, yes, eight words to get to the Tom Cruise article rather than two. How incredibly useful. 173.79.20.33 (talk) 22:48, 27 April 2016 (UTC)
Sure, it's possible to cherry-pick a silly example. Can you think of a reason that our search functionality should not support examples such as "show me all media files about Tom Cruise" or "What is Tom Cruise's age?"?
A friend showed me <https://askplatyp.us/?lang=en&q=How+old+is+Tom+Cruise%3F>. This gives the correct answer (53) using Wikidata and other Wikimedia sources as its back-end, as I understand it. Pretty neat! It fails for queries such as <https://askplatyp.us/?lang=en&q=How+old+is+Tom+Cruise%3F> and <https://askplatyp.us/?lang=en&q=What+age+is+Tom+Cruise%3F>, but queries such as <https://askplatyp.us/?lang=en&q=What+is+Tom+Cruise%27s+age%3F> work. This tool is fun to play around with. Like Wolfram Alpha, it's easy to confuse or break it, but we could incorporate this type of functionality into our internal search engine immediately. No making perfect the enemy of the good, especially if we can keep false positives low with more conservative logic (err on the side of being quiet). --MZMcBride (talk) 00:42, 28 April 2016 (UTC)
The point is that I have written "We could start". After we have installed an open source software that is able to do that, we could append our own code for more useful tasks. This is a low cost solution with quick results. The board was right that we have to adopt new technology, but it is not Google that we should try to follow. Part of the problem with the board deciding such things behind closed doors is that they lack the experience of Wikipedia editors and WMF software engineers. --Molarus (talk) 17:50, 29 April 2016 (UTC)

About Jimmy's behaviour and character

  • The big issue for me here is Jimmy's lying and defaming. It is clear now that James's Facebook comment exactly, accurately reflects Jimmy's statements to James about the Knowledge engine but in his gaslighting email Jimmy accuses James of misrepresenting his (Jimmy's) position. We have a serial liar strutting about posing as our spokesperson, squatting on a board seat, defaming a hard-working popularly-elected volunteer.
  • Another thing: A new board member discovers there's been a plan to develop an internet search engine that could cost tens of millions of dollars, and takes it to the other board members. And Jimmy's response is evasive and ambiguous. When James points up the contradiction between

    "we are not building a search engine"

    and the Knight grant documentation's,

    "Knowledge Engine by Wikipedia will be the Internet's first transparent search engine ... the Knowledge Engine by Wikipedia, a system for discovery of reliable and trustworthy public information on the Internet"

    Jimmy says,

    "I'm not really sure what is causing your confusion here. Perhaps it is just the term 'search engine' which in some contexts may mean 'a website that one goes to as a destination in order to find things on the web, such as Google, Bing, or Yahoo' and in other contexts can mean 'software for searching through a set of documents and resources'. But I'm not really sure what your concern is..."

    Here is where the gaslighting begins in my opinion: It's bloody obvious what James's concern is. And Jimmy is acting as if there's nothing remarkable or potentially concerning going on and that "we are not building a search engine" can be reconciled with the Knight grant documentation.
  • And another: It's clear now that the WMF was waiting for the right moment to let the community in on this scheme. Jimmy:

    "For me, it's more of a question of what kind of consultation should happen and when. A commitment to explore a concept through an external grant doesn't strike me as the right point necessarily to engage in a full-scale consultation."

    So, James's concern that this was being kept from the community was well-founded. Jimmy didn't trust the community with this information. James did. --Anthonyhcole (talk · contribs · email) 01:22, 27 April 2016 (UTC)
Hi Anthonyhcole and Pete. What do you make of the fact that the vote on the resolution removing James Heilman was 8–2 (really 8–1)? --MZMcBride (talk) 01:18, 28 April 2016 (UTC)
I don't know what's behind that. But the trustees on that board seem to be in the habit of voting along with the rest of the board, to get along, to present an image of decisiveness, or something (a practice that seems juvenile and deceptive to me). Jimmy claims that he went into the meeting that sacked James with the intention of voting with the majority. James's proposing the motion to accept the Knight grant despite his misgivings may be an instance of that. That might explain the numbers. Or perhaps James deserved it for some (still) as yet unexplained reason. My point above isn't that James is a useful board member (though having worked beside him on the WikiProject Med Foundation board for years I can attest to his integrity and honour there), it's that Jimmy isn't. --Anthonyhcole (talk · contribs · email) 03:14, 28 April 2016 (UTC)
Hi MZMcBride. All hail the master of all master sockmasters. Starved for attention, much? Do us a favor and stay away when the grown-ups are talking. You have nothing of value to add here. DracoE 12:25, 29 April 2016 (UTC)
Hi DracoEssentialis. I think Special:Contributions/DracoEssentialis speaks for itself. :-) --MZMcBride (talk) 19:25, 1 May 2016 (UTC)
MZMcBride, I don't think it's particularly significant. It's easy for any group to get swept up in the moment for any number of reasons -- legitimate or illegitimate. If I didn't know anything else, my starting assumption would be that they reached near-consensus through a deliberation that was careful and thorough commensurate with the gravity of what they were doing. But in fact, we now know a whole lot of other stuff, which tells a pretty clear story of a group that was layering one bad decision on top of another, rushing this one due to external factors, neglecting to seek independent advice or mediation, underestimating the gravity of removing a board member, etc. etc. So no, I don't think the vote number is especially useful information in the face of all the other stuff we know several months in. -Pete (talk) 05:07, 2 May 2016 (UTC)

Was the related email, from Jimbo to Doc James of 30 December 2015 ever shared publicly? HolidayInGibraltar (talk) 19:10, 27 April 2016 (UTC)

No. --Andreas JN466 20:28, 27 April 2016 (UTC)
Jimmy selectively quoted from the email exchange on his talk, in a way that made James look like the unreasonable party. James was more than within his rights to make the context public. --SB_Johnny | talk✌ 00:42, 28 April 2016 (UTC)

Kudos to Andreas for an incisive and revealing exposé. Is Wales any longer appropriate as a WMF board member and self-appointed WP figurehead? Given the long, damaging record of evasions, obfuscations, manipulations, lies, misdirections, misrepresentations, distortions, and self-serving personal attacks, the answer couldn’t be more obvious. Writegeist (talk) 19:11, 29 April 2016 (UTC)

Same. I had doubts for some time, but this makes it very clear. Peter Damian (talk) 06:07, 30 April 2016 (UTC)

Data sources

"In recent months, [Jimmy] has multiple times referred to the possibility that 'non-WMF resources might be included in a revamped discovery experience' or that 'some important scholarly/academic and open access resources could be crawled and indexed in some useful way relating to Wikipedia entries' while insisting that any suggestions 'that this is some kind of broad Google competitor remain completely and utterly false.'"

Please don't forget Fox News appearing in a sample search result. --NaBUru38 (talk) 16:11, 29 April 2016 (UTC)

  • Nice article Andreas  TUXLIE  10:37, 20 May 2016 (UTC)

Wavelength (talk) 23:41, 24 April 2016 (UTC)

User:Wavelength how best to add it? Doc James (talk · contribs · email) 11:06, 25 April 2016 (UTC)
User:Doc James, I posted a request at User talk:Alvin Seville, because that editor has been updating Wikipedia:Backlog.
Wavelength (talk) 16:22, 25 April 2016 (UTC)
I've manually added Category:Wikipedia backlog to User:EranBot/Copyright/rc and the subpage I am currently working on (Batch 30). — Diannaa (talk) 20:16, 25 April 2016 (UTC)
I think there is something distressing about the "Copied" template. It gives the feeling that Wikipedia text is not really portable and reusable after all - that the deletion of the originating page or the eventual collapse of the site throws all its material into doubt. I think our best practices should involve extracting whatever-the-hell-it-is-we-have-to-keep from the history of the originating page into a single text file and posting that for attribution somewhere standard and close by the page that needs it, so that the page is relatively free to continue wandering about the Web. Wnt (talk) 12:21, 25 April 2016 (UTC)
I appreciate we have a problem with deletionism, but how often have we seen articles deleted that have been partially copied elsewhere? Would a better solution be to exempt such source articles from the prod process and have a bot that notified AFDs so that this could be taken into account? As for the eventual collapse of the site, I think with the WMF starting an endowment fund the more likely scenario is that eventually the site, or at least archives of its early twenty-first-century versions, will fall out of copyright. ϢereSpielChequers 06:20, 1 May 2016 (UTC)
  • Couple of questions - why is the handling of the bot report built as a completely separate process from what happens at established copyright checking places, like WP:SCV? While adding new automated detection is invaluable, is the multiplication of processes to check similar reports the best way to deal with the endemic problem of copy / pasting? And for that matter, does EranBot use the same whitelist as CorenSearchBot? MLauba (Talk) 17:59, 26 April 2016 (UTC)
Does not use the same whitelist. We definitely should consider this though. Where is CorenSearchBot's whitelist?
I assume it is these ones [3] User:MLauba?
You can see the whitelist for eranbot is longer for what it is worth per here [4] Doc James (talk · contribs · email) 18:31, 26 April 2016 (UTC)
I think this is all the more reason to have both bots use the same list. Pinging Coren to get his perspective. MLauba (Talk) 14:52, 29 April 2016 (UTC)
  • Perhaps this has been mentioned elsewhere, but the use of Turnitin.com is a bit problematic. It has been banned in the past at at least one university as it in itself is a copyright violation machine: it makes copies of student essays and archives them. HappyValleyEditor (talk) 23:26, 26 April 2016 (UTC)
    • From what I understand students sign a form giving Turnitin the right to this use. Doc James (talk · contribs · email) 21:02, 27 April 2016 (UTC)
    • It was informally banned at a few places where I have taught, with the reasoning that requiring students to sign a copyright release form was considered to be a contract of adhesion--i.e. a forced contract. HappyValleyEditor (talk) 05:38, 28 April 2016 (UTC)
Which is different than saying they infringe on copyright. There is also fair use. Doc James (talk · contribs · email) 18:14, 28 April 2016 (UTC)
  • What we really need to do as a community is have a chat about Contributor Copyright Investigations (CCI). I first came into contact with it in conjunction with a copyright-related Arbcom case several years ago and I figured out right away that the CCI methodology does not scale, that backlogs would continue to grow, and that most cases would never be resolved. This has proven correct. Now the backlog is five years. Next year the backlog will be six years. The year after that the backlog will be seven years. At what point do we recognize that the system has failed and scrap it for another? Carrite (talk) 15:50, 27 April 2016 (UTC)
    • There's a simple solution - offer users the chance to clean up their own mess, putting them under a temporary topic ban on creating new content until it's done, and if they're unwilling, ban them and nuke their contributions. Except then we'll get people trying to posit that it is unreasonable to require cleaning up 5 years' worth of copyvios. MLauba (Talk) 14:52, 29 April 2016 (UTC)
      • Have you seen this work, requiring people to clean up their copyvios? Doc James (talk · contribs · email) 15:36, 29 April 2016 (UTC)
        • No, but simply ceasing to care about the massive mess some have created, without doing anything, is not an option either. MLauba (Talk) 01:07, 30 April 2016 (UTC)
          • Agree. Which is why we have pushed to build this tool. Efforts began after we found a few editors who had made tens of thousands of copyright violations before being noticed and dealt with. Doc James (talk · contribs · email) 07:24, 30 April 2016 (UTC)
          • It's not a question of not caring, it's a question of using our limited resource (editor time) in a more efficient way. I have already used the new tool to locate and block several repeat violators and discovered some folks who created sockpuppets who continue to insert copyvio. The difference is that they are getting discovered and stopped within a matter of a few days or weeks, instead of continuing to do damage long-term like Epeefleche or continuing to insert copyvio with sockpuppets like Mushroom9. I continue to work on the CCI case for Epeefleche every day. — Diannaa (talk) 13:10, 30 April 2016 (UTC) Just want to add, normally at WP:CCI the violating contributor is not expected to help with the clean-up. Historically it has been rare that this is even permitted. The main thing is to get them to stop adding new violations. — Diannaa (talk) 13:16, 30 April 2016 (UTC)
            • I'd quibble with the historically - as one of the minor co-drafters of the process, I was firmly convinced that cooperation on clean-up would be the best way both for them as a path forward, and for the project as a means to tackle the masses. Except that, as you know from more direct experience than I over the past couple of years, we rapidly confronted users who didn't care in the first place, users who vanished rather than clean up (including a sitting arb, at least for a time), and the group that I suspect Carrite is hinting at above first and foremost, the "too big to fix". I do think we're all in agreement that new tools to help with the cleanup are a godsend. Nonetheless, both the endless stream of new users generating copyvios out of ignorance and the "too big to fix" increasingly call for rethinking the way we deal with them. For the former, having a tool that catches everything that CSB doesn't check will certainly help. For the latter, if they're unwilling to help with cleaning up, the cost / benefit evaluation (particularly if their article creation is subject to the added burden of reviews by others) may need to be rethought. MLauba (Talk) 21:31, 30 April 2016 (UTC)
              • Just want to mention that the sitting arb was Rlevse, and he later helped clean up his CCI (under the user name PumpkinSky). — Diannaa (talk) 01:22, 1 May 2016 (UTC)
  • I echo what HappyValleyEditor said. Using tools with a questionable copyright license to catch copyright infringement sounds a lot like doublespeak. Personally I have done research on classical music (and maybe other topics as well, but that is the one which comes to the top of my mind). After writing it up in an essay submitted for marks, I also added what I had researched to the music article. Luckily that particular course does not use Turnitin, but if it did, the phrases I added to the Wikipedia music article would be flagged as identical to those in my assignment. And EranBot would unintentionally "out" the editor's public identity, because obviously the student doesn't identify themselves with their wiki username on their class assignment (and it is doubtful that Turnitin will generate the full report of that piece so that the student can prove it's their own work "double-dipping" on assignment and wiki). My question is, why is EranBot configured to search student essays when it's extremely unlikely that a third-party person would gain access to the digital copy of the student's assignment submitted to Turnitin and that the same third-party person would submit it to Wikipedia? OhanaUnitedTalk page 16:03, 27 April 2016 (UTC)
    • It is mainly used because it also includes webpages, textbooks, and journal articles. Doc James (talk · contribs · email) 21:02, 27 April 2016 (UTC)

Traffic report: Two for the price of one (0 bytes · 💬)

Wikipedia talk:Wikipedia Signpost/2016-04-24/Traffic report