Wikipedia talk:Persondata/Archive 9

From Wikipedia, the free encyclopedia

Persondata and AfC

I posted a message at AfC asking that Persondata be taken out of the process, and the only response thus far has been questioning the disconnectedness between Wikidata and Wikipedia. Does someone who is more knowledgeable about Wikidata than I am care to respond? Also, just a note that @APerson: has recently mentioned at bot requests (link) that the next version of the script has Persondata taken out of it (waiting for approval I think?). Thanks, --Msmarmalade (talk) 02:16, 3 June 2015 (UTC)

Msmarmalade, well, as soon as Theopolisme updates the code in his userspace, the version of the script in use by everyone will no longer add any sort of Persondata to articles.
Periglio responded to your post over there by noting that [t]here is no need for Wikipedia editors or reviewers to worry about wikidata. I agree with this viewpoint, but if anyone disagrees with me and would like the AFCH tool to keep adding Persondata in the meantime, I'll immediately revert the change I made. APerson (talk!) 03:58, 3 June 2015 (UTC)
So is a bot going to be removing all the persondatas? Quis separabit? 18:20, 13 June 2015 (UTC)
The debate lingers on! Wikipedia:Bot_requests#Remove_persondata +++ Periglio (talk) 18:52, 13 June 2015 (UTC)

Next steps

One theme that does come out is the disjoint between Wikidata and Wikipedia, perhaps a separate RfC might address the form of a feature request to manage this, which is probably the most widely expressed concern.

— Guy (The RfC closer)

Although there seems to be a lot of enthusiasm with the demise of Persondata, the second sentence of the closing statement seems to have been overlooked. Persondata has provided a focus for the discussion of the basic biographical information on Wikidata but at the moment this discussion has nowhere to go. WP:Biography is a likely candidate but I have struggled to get any response when I post there.

The disjoint between Wikidata and Wikipedia will require a lot of work and discussion. I personally have huge lists of articles that conflict with Wikidata that I do not know how to handle. The Persondata debate attracted quite a few people, it would be great if the spirit of Persondata lived on improving the biographical data. The big question for me is where to go from here? Periglio (talk) 05:39, 2 June 2015 (UTC)

First, get a bot going to delete it from all the articles. Lugnuts Dick Laurent is dead 06:36, 2 June 2015 (UTC)
Wikipedia:Bots/Requests for approval/Yobot 24. -- Magioladitis (talk) 07:31, 2 June 2015 (UTC)
Bazinga! Lugnuts Dick Laurent is dead 07:33, 2 June 2015 (UTC)
Persondata is being removed because Wikidata is a replacement, but unless there is a continuation of the discussion going forward, we probably should have kept Persondata and deleted Wikidata. (Exaggeration for dramatic effect.) There has been a lot of interest in the Persondata debate, and I would like to see this developing into an ongoing project to improve the biographical data on both Wikidata and Wikipedia. I am hoping for some suggestions on how to do this. Periglio (talk) 08:13, 2 June 2015 (UTC)
  • I'm not sure if it's a loss that has been mentioned, but removal of alternative names from article pages will result in worse location of those articles from search engines if the alternative name is not mentioned in the main text. Particularly this will affect people for whom there are many possible alternative transliterations (e.g. from Cyrillic, Arabic, Chinese, etc.) which don't really warrant an individual mention in main text. I'm conscious that the current solution doesn't address that point at all. SFB 09:17, 14 June 2015 (UTC)
The persondata template is not rendered on the html page and would not be picked up by search engines. Having said that, any content not in the article would be lost. The current discussion is at Wikipedia:Bot_requests#Remove_persondata if you want to take it further. Periglio (talk) 13:13, 14 June 2015 (UTC)
@Periglio: I've just had a go at searching some alternatives and I think Google picks up on the redirects for alternative names. For example, "Denis Kimetto" is only present in Dennis Kimetto in the persondata but google picks it up, yet "Mutaz Barsham" is not found for Mutaz Essa Barshim, despite it being in the persondata. The obvious difference I see is that one search term is a redirect and the other one isn't. That may be an obvious and useful alternative to listing the names on the page. SFB 21:38, 14 June 2015 (UTC)
I think that is due to Google doing an alternative spelling search (Did you mean: "dennis kimetto"). Certainly, redirects or disambiguations can be used to handle alternative names on Wikipedia. Wikidata, on the other hand, contains fields for aliases and different languages. One day in the future Wikipedia will be showing the Wikidata information, maybe. The only problem now is that any alias solely in Persondata is in danger of being lost. My experience has shown that the data entry over the years has been too random for a useful bot extraction. I am personally hoping the methodical deletion crowd win the day so that every article will undergo an individual visual check. Periglio (talk) 22:43, 14 June 2015 (UTC)
@Periglio: I think it would be reasonable to combine that with a bot run which confirms that all extractable data on an article's persondata is now present in Wikidata (i.e. no subsequent additions are deleted) and that the alternative names are empty and if so delete the persondata. Maybe even try and read for alternative names as simple semi-colon delimited names and delete if we have a 100% Wikidata match for those (I know practically all of my additions can be read this way). SFB 19:51, 17 June 2015 (UTC)
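The check SFB proposes for alternative names can be illustrated with a minimal Python sketch. It is not code from any bot discussed here: the function names are hypothetical, and it assumes the Persondata field and the Wikidata label and aliases have already been fetched as plain strings. It treats the field as semicolon-delimited and allows deletion only when every name already exists on the Wikidata side, which includes the case where the field is empty.

```python
from typing import Iterable, Set


def parse_alternative_names(field: str) -> Set[str]:
    """Split a semicolon-delimited ALTERNATIVE NAMES value into a set of names."""
    return {part.strip() for part in field.split(";") if part.strip()}


def safe_to_delete(alternative_names: str, wikidata_names: Iterable[str]) -> bool:
    """Return True only if every Persondata name is already present on Wikidata.

    An empty field yields an empty set, which is trivially covered, so it is
    also reported as safe. `wikidata_names` is assumed to hold the item's
    label and aliases (hypothetical input, fetched elsewhere).
    """
    known = {name.strip() for name in wikidata_names}
    return parse_alternative_names(alternative_names) <= known
```

Requiring an exact string match is deliberately strict: a name that differs by one character is reported as not yet covered and would be left for a human to review.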

Bot request KasparBot

Please participate in the discussion on the RfA for KasparBot. After approval the bot will remove Persondata information. Warm regards, -- T.seppelt (talk) 18:19, 4 November 2015 (UTC)

Challenges with more precise dates on Wikidata

Just been trying out the tool and wondering: in cases where Persondata only has a year and Wikidata already has a day, month and year, is it worth marking these as already checked, since Wikidata already has a more precise value? -- WOSlinker (talk) 17:51, 24 January 2016 (UTC)

@WOSlinker: thanks for testing. This is an interesting question for Izno. I would say yes... -- T.seppelt (talk) 18:07, 24 January 2016 (UTC)
@WOSlinker: If they're the same year, I'd say yes, mark them as checked. If the years differ, the first thing to do would be to check the article (the lead or infobox). If the article agrees with one or the other, keep the one it agrees with. If the article doesn't provide a particular date, you should either skip or add the new year (keeping the old date as well), IMO. This will be caught in the various database checks we run. --Izno (talk) 18:18, 24 January 2016 (UTC)
@Izno and WOSlinker: Great, I will run a script which marks all these challenges as already imported. -- T.seppelt (talk) 18:25, 24 January 2016 (UTC)
@Izno and WOSlinker:  Doing... the script is running. -- T.seppelt (talk) 19:45, 24 January 2016 (UTC)
If Persondata only has a year, there's a good chance that is true of the main body of the Wikipedia page. In that case, is there an argument for adding Wikidata's complete date to the WP article? If there isn't a readily available verification a {{citation needed}} is important, or we could end up mutually reinforcing incorrect information. And this is just a subset of the notion that any Wikidata record could contain information not in the Wikipedia article, something that would be hard to find automatically. David Brooks (talk) 19:15, 26 January 2016 (UTC)
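The triage rule Izno describes for year-only Persondata values can be written down as a short Python sketch. It is not the script T.seppelt ran: the `PartialDate` type and the three result labels are hypothetical names chosen for illustration, and the sketch only classifies a challenge without touching either site.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date whose month and day may be unknown (None)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def triage(persondata: PartialDate, wikidata: PartialDate) -> str:
    """Classify a date challenge according to the rule discussed above."""
    persondata_year_only = persondata.month is None and persondata.day is None
    wikidata_complete = wikidata.month is not None and wikidata.day is not None

    if persondata_year_only and wikidata_complete:
        if persondata.year == wikidata.year:
            # Wikidata already holds a more precise value for the same year.
            return "mark_checked"
        # Years disagree: a person has to consult the article lead or infobox.
        return "check_article"
    # Any other combination is outside this rule and stays in the queue.
    return "needs_review"
```

A same-year match only shows that the two values are compatible; it does not show that the more precise Wikidata date is sourced.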

The former comment is an interesting one and certainly one I should have thought of. @T.seppelt: did you keep a track of the ones removed via this script?

The latter issue is also an interesting one but is in-general the same as the "how do we even begin to import information from some infoboxes". As it is now, Wikidata has a number of database reports started to keep it consistent with the wikis. --Izno (talk) 17:38, 29 January 2016 (UTC)

@T.seppelt: This challenge for example I could find no support for the more specific date. This series of edits added the more specific date but nothing I quickly looked for proves the date is correct. You said you were doing the task; can you undo it? :o --Izno (talk) 01:09, 3 February 2016 (UTC)
@Izno: Yes, I can undo it. I put it on the list. -- T.seppelt (talk) 16:24, 4 February 2016 (UTC)
@Izno:  Done -- T.seppelt (talk) 19:13, 4 February 2016 (UTC)
This discussion also implies that we should not backport complete dates to Wikipedia unless there is a readily available external source, yes? David Brooks (talk) 00:27, 5 February 2016 (UTC)
I usually look for consistency within an article. If I find that the PD date differs from the rest of the article (or if the article is internally inconsistent regardless of the PD), then it's likely that something is mismatched and I go looking for an RS. This is most-true with dates but is also a consideration for locations since we should strive to add the most-specific location we can find. --Izno (talk) 00:30, 5 February 2016 (UTC)

Just a minor quibble...

The template still populates Category:Persondata templates without short description parameter, which has been deleted. Apparently these categories have already been removed from the template at one point.[1] GregorB (talk) 23:11, 28 February 2016 (UTC)

@GregorB: This is just related to the caching of the articles. Somebody needs to purge the cache of all members of this category. This will happen anyway when KasparBot removes the templates. -- T.seppelt (talk) 06:50, 29 February 2016 (UTC)
Thanks - the categories have been taken care of,[2] so indeed it's now a matter of purging the cache. GregorB (talk) 09:44, 29 February 2016 (UTC)

New Tool

As part of the bot approval request I launched a new tool for the migration of Persondata to Wikidata. You can find it under https://tools.wmflabs.org/kasparbot/persondata/. At the moment it contains a small data set for testing. Please help to find bugs and come up with ideas for improvement! Warm regards, -- T.seppelt (talk) 19:51, 22 November 2015 (UTC)

What's the latest? I've added Persondata removal to my current AWB find/replace, and I just saw that @GiantSnowman: has deleted it from a page I'm watching, presumably non-robotically. But I feel that's just meaningless chipping at the very edges if a robot is being readied to do the task in bulk. Or is there still some community unease at wholesale deletion of information? David Brooks (talk) 22:59, 19 January 2016 (UTC)
@DavidBrooks: - I do it manually whenever I am editing an article in general, I agree we definitely need a bot to mass remove this. GiantSnowman 12:21, 20 January 2016 (UTC)
@DavidBrooks and GiantSnowman: I'm at the moment requesting the bot flag for exactly this task at Wikipedia:Bots/Requests for approval/KasparBot 3. -- T.seppelt (talk) 14:46, 20 January 2016 (UTC)
@T.seppelt: Reading that bot request, I get the impression that there are still some problems with representing the information correctly in Wikidata. Is it really safe to remove Persondata yet; is it still preserved somewhere? Or am I misreading the objections? David Brooks (talk) 15:57, 20 January 2016 (UTC)
@DavidBrooks: The whole data set of November 23, 2015 is preserved at the tool. It is safe to remove the information. The problems which are mentioned at the request page just mean that some challenges have to be re-parsed. This is a technical process which I am executing at the moment. As you can see on the tool's page there are currently 159,185 descriptions, 337,516 aliases and 644,588 places or dates available. Therefore the most important issue at the moment is to get more users to work with the tool. -- T.seppelt (talk) 16:09, 20 January 2016 (UTC)
@T.seppelt: (to reactivate this thread) I see Persondatas (Persondate?) are still getting deleted. Will the bot eventually get around to all of them? I'm still deleting them with an AWB replacement whenever I happen to light on them, which is inexpensive, but is that just a waste? It does consolidate two updates into one. David Brooks (talk) 17:55, 29 February 2016 (UTC)

Undent: The bot will get to all of them. I don't see an issue with running AWB to remove them as well where another fix is being made. --Izno (talk) 18:04, 29 February 2016 (UTC)

The bot will delete all of them at some point. According to the RfA it is only allowed to do 6 edits / minute. Great if somebody else is removing too. --T.seppelt (talk) 20:22, 29 February 2016 (UTC)
I just realized the downside of manual removal: it deprives the bot of the chance to put a "see challenges for this article" note on the changelog. I recently resolved a couple of challenges because I happened to have a candidate article in my watchlist, and hadn't looked for a discrepancy on an earlier edit. David Brooks (talk) 18:37, 1 March 2016 (UTC)
Yes, that's true. This project really needs some advertisement. Thank you for your contributions. --T.seppelt (talk) 19:35, 1 March 2016 (UTC)

No challenges

I noticed this history entry when Persondata was removed from Thomas Doggett. It points me to "see challenges", but there are none. I thought even resolved challenges remain on the page, so were there in fact none? If so, can the message be changed to "no challenges for this article"? David Brooks (talk) 20:40, 4 March 2016 (UTC)

The removal script is not connected to the challenges database. This is not possible without larger adjustments. I would not like to do it because the script would probably crash more often if it depends on the database connection. -- T.seppelt (talk) 14:44, 5 March 2016 (UTC)

Alias with middle initial?

Another Alias consensus question. Wikidata and Wikipedia articles are usually first/last, but some Persondata challenges include the middle initial. For item "John Smith", is it OK, not OK, or don't-care, to include "John A Smith" or "John A. Smith" as aliases? If "OK", are you concerned about inconsistency in the use of the period? David Brooks (talk) 19:39, 13 March 2016 (UTC) ETA: I just saw the opposite, twice. Wikidata with middle initial, Persondata without. David Brooks (talk) 19:41, 13 March 2016 (UTC)

Include the initial. I would attempt to import the associated punctuation if possible. Also valuable of course is the inclusion of the middle name on the Wikidata item.

Sometimes articles at Wikipedia are moved from an initialed version, or vice versa; the title/aliases aren't updated then. I queried the WMDE dev team a while ago about seeing if they could possibly move the title when that happens. The answer was "maybe". I probably should have filed a task on phabricator for it. --Izno (talk) 21:06, 13 March 2016 (UTC)

Aliases with missing diacritics

What should be the rule about aliases that are the same name but with dropped diacritics and other English-friendly substitutions, such as this challenge? Bodtcher for Bødtcher works as a Wikipedia lookup, and is among several similar redirects, but it is a definitive mis-spelling. My feeling would be to not recognize it as an alias unless it is acknowledged by the subject or a published biography, but I'd appreciate another vote. Related, is an orthographically valid re-spelling (e.g. Boehm for Böhm) an alias, or just a convenient alternative rendering? David Brooks (talk) 20:39, 8 March 2016 (UTC)

We had a discussion about this somewhere on Wikidata. I can't find it at the moment. I think Help:Aliases allows versions without diacritics due to some problems with the search function (tracked in ticket T121863). It should be fine to add them. -- T.seppelt (talk) 20:52, 8 March 2016 (UTC)
I went from there to project chat, whose archives are huge, but luckily it was recent: d:Wikidata:Project_chat/Archive/2016/02#Accents (your question from Feb 6 isn't directly answered). The consensus of "accept" is clear, although I'm still personally uncomfortable about giving them legitimacy! David Brooks (talk) 21:18, 8 March 2016 (UTC)
Thank you. I also feel uncomfortable about it. Thousands of edits just because the search is not working properly. And after this bug is fixed, reverting all of them because content-wise the aliases don't really make sense? --T.seppelt (talk) 06:50, 9 March 2016 (UTC)
I am not working on challenges and I don't know whether "Aliases" and the Wikidata field labeled in English "Also known as" are equivalent to each other or to Persondata ALTERNATIVE NAMES.
P.S. checking the report linked above [3], specifically "newest revision with Persondata template", I think I see that this concerns the Persondata NAME field rather than ALTERNATIVE NAMES. For L.B. the Persondata template shows NAMES=Ludvig Bodtcher and ALTERNATIVE NAMES=[blank]. --P64 (talk) 19:53, 9 March 2016 (UTC)
[1] I do know that Wikidata French-language editors, at least, do add many such variant names and I have done so myself for hundreds of names, at least. See for instance François Place D:Q3085583. (Under my settings, the languages English, Spanish, Traditional Chinese, and French are displayed at the top of the page by default.)
[2] Ludvig Bødtcher D:Q1230738 is also known as Ludwig Bödtcher per French-language editors.
[3] Another who exemplifies related issues is Olive Beaupré Miller D:Q7087129, also known as Olive Kennon Beaupre Miller (one issue is how much of that to do [3a]). The only Wikipedia article is ours, which I moved to the name with diacritic a couple days ago. The Spanish and French labels predate that move and, I feel sure, simply replicate our contemporary EN.wiki page name. I didn't edit any but the English-language data (almost never do so [3b]), so the WD item may be understood to represent the effect of a diacritic-related page move in the home language. --P64 (talk) 19:47, 9 March 2016 (UTC)
I still have a problem with legitimizing one single instance of an "undiacriticized" alias, if it happens to have been added to Persondata by someone who didn't understand the diacritic's importance, or was a little too hasty. I just came across Johann Friedrich Dübner, where "Dubner" appeared in Persondata (and nowhere else). That entry was created in November 2010, as part of a huge batch, by User:RjwilmsiBot. Here are others: Stéphane Denève (diff) and Sébastien Denis (diff), both of which have outstanding challenges based on Persondata's NAME. There were many others in the same timespan. It seems to have been a systematic error; I guess it's possible Persondata was limited to ASCII but I doubt it. I'll put a message on the bot's talk page, although obviously the moment has passed. David Brooks (talk) 21:43, 9 March 2016 (UTC) p.s. I guess we should agree whether or not "Alias" is equivalent to a convenience redirect; I don't think the term implies that it does. David Brooks (talk) 21:55, 9 March 2016 (UTC)

I would suggest that that's a Wikidata-wide discussion and not specific to en (though it is most-evident to en.WP-ers in the context of this import). There are enough lines in d:Help:Aliases both within and without the English-specific section to indicate that an alias without diacritics is an acceptable use there.

That said, I suspect no one will stop you from rejecting such aliases if you think it's a problem. --Izno (talk) 12:42, 10 March 2016 (UTC)

Rjwilmsi explained that during the mass automated Persondata population in 2010, the NAME parameter was intentionally set to be like the DEFAULTSORT: no diacritics, surname first. Bgwhite firmly recommends not using it for Wikidata because it is "extremely unreliable". My feeling, as you know, is that if its only valid use is to assist searches, the best solution is to fix the search function. I would accept a NAME if it had subsequently been changed by an editor, to something other than the article name, and in a way that is a genuine alternative name for the subject, but I haven't come across one yet.

Izno: I will certainly reject them myself, and I'll review the few that I've already handled. But because the whole world is now being invited to Persondata challenges via article histories, I'd like to suggest something stronger. Either drop the Alias challenges from the database in cases where it is the same as the initial default (I realize that's not trivial) or put some guidance into the UI that the Alias challenge should be rejected in cases of simple diacritic drop, unless there is a strong motivation to accept.

Finally, is this the right place for the discussion? Or should it be on Kasparbot's talk? David Brooks (talk) 04:02, 11 March 2016 (UTC)

First, this is the right place for the discussion. It should be as public as possible and not at my talk page. I'd also reject such aliases, especially since it's clear now that they were only used for technical reasons on Wikipedia too. Putting some kind of guidance into the UI would be a first step. In the end it'd be better to automatically reject all these aliases which are just diacritic-free versions of an existing alias. I'll experiment with some PHP functions for doing this. What do you think? -- T.seppelt (talk) 06:54, 11 March 2016 (UTC)
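For readers wondering what detecting a "diacritic-free version of an existing alias" could involve, here is a minimal Python sketch. It is not the tool's PHP code, which is not shown in this discussion; the function names and the small substitution table are assumptions made for illustration. Combining marks are removed after Unicode NFD decomposition, and a few letters that do not decompose (such as ø) are mapped by hand.

```python
import unicodedata
from typing import Iterable

# Letters that have no canonical decomposition and therefore survive NFD
# unchanged. This short table is an illustrative assumption, not a complete list.
NON_DECOMPOSING = str.maketrans(
    {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}
)


def strip_diacritics(name: str) -> str:
    """Return `name` with accents and similar marks removed."""
    decomposed = unicodedata.normalize("NFD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.translate(NON_DECOMPOSING)


def is_diacritic_free_duplicate(candidate: str, known_names: Iterable[str]) -> bool:
    """True if `candidate` equals some known name with its diacritics dropped.

    An exact match with a known name is not counted, because nothing was dropped.
    """
    return any(
        candidate != known and candidate == strip_diacritics(known)
        for known in known_names
    )
```

Under these assumptions "Ludvig Bodtcher" is flagged against "Ludvig Bødtcher", while an orthographic re-spelling such as "Boehm" for "Böhm" is not, because stripping the umlaut yields "Bohm". Valid diacritic-free variants recognized by the subject would still need to be entered by hand.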

Sorry, I didn't realize the bot's talk page redirects to yours. Anyway, I'm in strong agreement with your proposal (both halves) but I guess there are others who are more neutral or slightly opposed; I suggest you tweak the UI right away with a strong recommendation ("If this Alias is simply the article name but with diacritics (accents) removed, we recommend you reject it") but wait a few more days before changing anything else. Here are some observations:

  1. It's still creating challenges that involve accented names; I saw 7 in the most recent 50 edits, which seems about typical. Would it be useful to stop the bot while you are sorting this out, or is it almost finished?
  2. Can you remove the outstanding challenges, or would it just be easier to reject them all en masse?
  3. How about backing out the Alias challenges that have been already decided as "add this"? If there aren't many yet, it might be possible to deal with them by hand. The problem with automating it is that some of them could be examples of:
  4. Some diacritic-free aliases may be valid. I don't know of any languages where that is generally true, but they could be recognized by their subject. For example, I don't know for sure, but I suspect the thoroughly Americanized André Previn is happy with "Andre", and that variant is already installed in his Spanish and Italian Wikidata entries. I think those should be entered separately by anyone interested enough. David Brooks (talk) 21:01, 11 March 2016 (UTC)

The bot and the import are two separate processes. Stopping the bot... only stops the bot from deleting the content from Wikipedia.

I would disagree with changing what users should expect regarding importing the statements, and if they should not be added, that should be achieved by removing the challenges from view. Hence "I'll experiment with some PHP functions for doing this." --Izno (talk) 22:52, 11 March 2016 (UTC)

I suggested stopping (I meant pausing) the bot to prevent adding more invitations to articles' histories. Happy with the other response but it depends on T.seppelt's schedule. David Brooks (talk) 23:08, 11 March 2016 (UTC)
@Izno and DavidBrooks: I found a way for excluding (marking them as EXCLUDED, similar to [4]) all challenges which represent diacritic-free versions of aliases. What is your opinion on this now? Should I exclude them or not? -- T.seppelt (talk) 13:20, 12 March 2016 (UTC)
@Izno and DavidBrooks: I just ran a short test with 1000 randomly selected alias challenges. 19.5 % of them could be excluded. This would be a huge step ahead. -- T.seppelt (talk) 13:45, 12 March 2016 (UTC)
@T.seppelt: Can you put that list (the 1,000 and the 195) somewhere online so we can double-check? David Brooks (talk) 23:40, 12 March 2016 (UTC)
@DavidBrooks: [5]. This time I only found 168. So the final amount is somewhere around 16-20 %. The file is in UTF-8. Diacritics may not be displayed correctly if the browser is set to ISO ... -- T.seppelt (talk) 00:47, 13 March 2016 (UTC)
@T.seppelt: Looks pretty good. I don't see why Topas, Thomas Weston, Teeyon Winfree or Eilis Flynn are on the list though. Probably harmless; they aren't even challenges. David Brooks (talk) 01:18, 13 March 2016 (UTC)
@DavidBrooks: this is a convenient side effect. Data is added to Wikidata all the time. Sometimes we have challenges for aliases/descriptions/claims which are then "coincidentally" accepted. The tool is checking every challenge before it is shown to the user. Look for example at the challenge for Eilis Flynn. I would just like to include checking those cases in excluding diacritics. -- T.seppelt (talk) 13:14, 13 March 2016 (UTC)
@T.seppelt: Looks good then; go ahead. By the way, it should be "Welcome to Kaspar's tool". David Brooks (talk) 19:30, 13 March 2016 (UTC)

 Done: the script is running and the preposition is fixed. Thank you David, -- T.seppelt (talk) 16:31, 14 March 2016 (UTC)