Talk:Algorithmic bias/GA1

From Wikipedia, the free encyclopedia

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


GA Review

Reviewer: Farang Rak Tham (talk · contribs) 11:54, 30 May 2018 (UTC)[reply]


Introduction and limitations

Before starting this review, I'd like to state that I have little knowledge of the subject, apart from a few news reports. I do think it is an essential subject, and will be more and more so in the near future. You will have to bear with me, because I am a newbie on this topic, but then again, for GA, you cannot be too technical, so this may turn out just fine.

Overview

I have assessed the article at B now.

1. Prose:
  • No copyright violations.
  • The article reads well. You have made great attempts to get things explained to "dummies". Nevertheless, there are some parts left that are unclear. See detailed review below.
2. MOS:
  • Remove citations in the lead which are already in the body of the article per Lead section policy, unless there are some very controversial statements in there.
  • Though not required by any criteria, you might consider using shortened footnotes using the {{sfn}} template, which looks cleaner than the {{rp}} system you have used now.
3. References layout:
  • There was only one dead link actually, and I've replaced it with a reference to the book where the article appears. --Owlsmcgee (talk) 23:59, 4 July 2018 (UTC)[reply]
  • "us 7113917" should be expanded, in case the URL dies, per WP:LINKROT.
4. Reliable sources:
  • shirky.com, Bitch Magazine and Culture Digitally read more like blogs or essays than news coverage, and you should cite them as primary sources, using inline attribution (According to Bitch Magazine ...) and not the voice of Wikipedia. If possible, try to corroborate information from sources with data from independent, secondary, reliable sources, which will also help to show evidence of notability and relevance for the opinions quoted. You have already done this correctly with the Shirky website.
  • The rest of the sources are reliable enough.
  • I respectfully disagree. Bitch is a magazine with a history of editorial oversight and fact-checking, and the article itself is a summary of the author's PhD thesis and dissertation. Though it is a feminist magazine, that's irrelevant to the fact it is cited to support. Additionally, Culture Digitally is a blog run by the National Science Foundation as an explainer for scientific concepts; I don't know if I would call the National Science Foundation unreliable. And as mentioned, Clay Shirky is cited as you described. -- Owlsmcgee (talk) 00:13, 5 July 2018 (UTC)[reply]
5. Original research: None found.
6. Broadness: I believe this topic has been covered in popular culture such as here. If you can find reliable sources on this, you should add it, as it indicates how the topic is relevant for the public.
  • I respect the suggestion but I don't think the popular culture section should be considered a requirement for GA status. --Owlsmcgee (talk) 00:13, 5 July 2018 (UTC)[reply]
7. Focus: Yes.
8. Neutral: Yes.
9. Stable: article is stable.
10-11. Pics: Relevant and tagged. Nicely done.

Detailed review per section

I will continue with a detailed review per section. Feel free to insert replies or inquiries.

Lead

  • ... (such as a website or app) ... Isn't application better for written language?
  • ... even within a single use-case ...and ... even between users of the same service ... seem to contradict.

Methods

  • ... uncertainty bias ... Isn't there somewhere this can wikilink to?
    • I could not find an article about this specific bias in algorithms. There simply aren't many articles about computational bias right now. We could wikilink it in red, but right now, links to uncertainty and other similar articles aren't appropriate to this very narrow idea. So, I'm passing on this suggestion for now. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... algorithms may be flawed in ways that reveal personal information ... Is this also part of the definition of algorithmic bias? It doesn't sound like bias to me; more like sloppiness. Am I missing something here?
    • You're correct, I was unclear. The issue is the stereotyping in marketing, shadow profiling, and other problems. Certainly data exposure is a problem, but that isn't algorithmic bias. I've fixed it to reflect the bias-associated problem rather than privacy ones. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]

Early critiques

This section is very difficult to read, and needs to be rewritten almost completely:

    • I've reorganized it a lot and tried to clarify all your points. Hopefully it's clearer now. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... human-derived operations ...: you mean procedures and working methods as in real life?
    • I mean the models that humans created to solve math problems, basically - literally the order of operations. I've tried to make it a little clearer by explaining programs as a series of steps, etc. Let me know if it reads better now. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... are therefore understood to "embody law". What does this mean?
    • I've tried to make it clear. They embody law in that they take a series of steps and repeat them across all inputs, becoming a "law" for how the program executes and never changing how it executes, regardless of what data is inserted into it. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... computer programs changing perceptions of machines from transferring power to transferring information How do computer programs change the perception of machines? Or do you mean people's perceptions of machines? Transferring power or information to people?
    • Good catch, that was pretty sloppy writing. Hopefully it's clearer now. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... if users interpret data in intuitive ways that cannot be formally communicated to, or from, a machine. For example?
    • I think this is clear without an example, and because Weizenbaum was theorizing at the time, it would be WP:OR to include a contemporary example if he didn't include one himself. I hope the examples given later in the article illustrate these concepts enough for the reader. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • Weizenbaum stated that all data fed to a machine must reflect "human decisionmaking processes" which have been translated into rules for the computer to follow. So, data must reflect processes, to translate into rules... So the rules are part of the data?
    • I've separated his critique into two sections: Programs can be biased, and data can be biased. I hope that clears up the confusion. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... and as a result, computer simulations can be built ... As a result of imagining that world incompletely?
  • ... the results of such decisions ... Whose decisions?
    • This got deleted in the rewrite; if the question lingers, let me know! -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... why the decision was made ... You mean, why the tourist made the decision?
  • In what way are the coin tosses "correct"?
    • They aren't, which is Weizenbaum's point - I've tried to clarify it a bit. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]

Impressive rewrite. Excellent.--Farang Rak Tham (Talk) 20:32, 8 June 2018 (UTC)[reply]

Contemporary critiques

  • ... natural results of the program's output ... How can the code of a program be the result of a program?
    • Good catch, this was a needlessly complex sentence. Rewritten, should be clearer. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • These biases can create new patterns of behavior ..., Biases may also impact how society shapes itself ... For example?
    • I love examples, but again, a lot of the critiques are kind of "theorizing" and I think it might be WP:OR to say "here's an example of this" if nobody has specifically pointed to it as an example. If I find a third-party making that link, I would record it here, but it's not appropriate to draw that connection on my own until a reliable source says I should. :) -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
      • Okay, but please specify ... other elements of society ... or ... how society shapes .... WP:NOTOR may be helpful in this process. If you can't specify it, you need to find more sources. If all your secondary sources are very technical and niche-specific, try using tertiary sources like study books or encyclopedias instead. Or use popular sources written by scholars in the field--the "lay versions" of scholarly articles can often be found in newspapers and popular magazines. Some of these are useful, especially when written by experts in the field. If you are unsure about the level of editorial oversight, just mention the source inline and you should be okay. Especially if those sources are written by scholars.--Farang Rak Tham (Talk) 20:32, 8 June 2018 (UTC)[reply]
        • I've gone ahead and summarized the research from the paper cited; hopefully this serves as an example that clarifies the meaning. -- Owlsmcgee (talk) 21:47, 14 June 2018 (UTC)[reply]
  • ... weighed more heavily ... You mean, people give more authority to decisions by algorithms? Or, perceive such decisions to be more authoritative?
    • Fixed. A little redundant to the next quote, but I think it's useful to reiterate. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • ... language frames ... I've wikilinked this now. Is my interpretation correct?
    • It wasn't, I was being too academic. I just meant the way the media or social media sites come up with language to describe stuff. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
  • Sociologist Scott Lash has critiqued algorithms ... Expand or wikilink important terms.
    • I tried explaining in text. There is no article for generative power, so I hope it makes sense in context. Let me know if it's still unclear. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
      • ... they are a virtual means of generating actual ends ... is still cryptic, but the closing sentence clarifies it.--Farang Rak Tham (Talk) 20:32, 8 June 2018 (UTC)[reply]

Pre-existing

  • Such ideas may reflect ... You mean the bias may reflect, right? You would not expect an institutional bias to reflect a personal bias. Ideas sounds like you are referring to the ideologies rather than the bias.
    • Yes, thanks. I've adjusted the language. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
      • The first sentence is much clearer. But I am still uncertain what this means: ... who also carry sets social, institutional, and cultural assumptions ...

  • In a critical view ... Whose critical view?

    • That was a needlessly confusing addition to the sentence, and I've cut it. -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]

The example helps to explain the main points.

  • By attempting to appropriately articulate this logic into an algorithmic process, the BNAP inscribed the logic of the British Nationality Act into its algorithm. This sentence seems to say the same thing twice. Am I right?
    • Not quite. The distinction is there: they were trying to reproduce the logic of immigration law into the algorithm, and in doing so, inscribed the BN act into the software. Is that clearer? -- Owlsmcgee (talk) 21:38, 14 June 2018 (UTC)[reply]

Detection

I have moved this section to the talk page per MOS:EMBED.--Farang Rak Tham (Talk) 21:08, 8 June 2018 (UTC)[reply]

Technical

  • Flaws in random number generation can also introduce bias into results. How?
  • ... what takes place beyond the camera's field of vision. Raises questions. What could be there then?

Emergent bias

  • drug breakthrough is unusual English, unless you are referring to Breakthrough therapy. Development of a new drug?
  • ... clear outlines of authorship or personal responsibility ... ... for that exclusion process?
  • ... a "lead member's" ... How is it determined who is the lead?
  • ... it removed distant locations from their partner's preferences ... Consider cutting this out to simplify.
  • ... high-rated schools ... You mean the schools the lead member preferred? High-rated could refer to many kinds of ratings.
    • Tackled all of these, I think it makes more sense now and is also shorter. -- Owlsmcgee (talk) 22:07, 14 June 2018 (UTC)[reply]

Correlations

  • ... compared to each other in practice. Can in practice be removed?
  • ... By "discrimination" against ... Confusing and redundant. I would just write By responding to or By selecting
  • ... correlations can be inferred for reasons beyond the algorithm's ability to understand them. "The algorithm draws conclusions from correlations, without being able to understand those correlations?"
  • ... hospitals typically give ... "hospitals without such a triage program"?
    • Adjusted this section too. --Owlsmcgee (talk) 22:12, 14 June 2018 (UTC)[reply]

Unanticipated uses

  • ... machines may demand ... machines may expect?
  • Also, certain metaphors ... For example, the British National Act Program ... How is this a metaphor?
  • How is this example of British citizenship an unanticipated audience?
  • Does an ATM have algorithms?
    • I see what you mean. I've separated the two topics within this section, as they were being read as extensions of each other. They are two types of unanticipated users, and I hope that's clearer. The ATM was an example illustrating the concept; it wasn't meant to be about ATMs. I've removed it because I see how that's confusing. --Owlsmcgee (talk) 22:23, 14 June 2018 (UTC)[reply]

Feedback loops

  • The simulation showed that public reports of crime could rise based on the sight of increased police activity, and could be interpreted by the software in modeling predictions of crime, and to encourage a further increase in police presence within the same neighborhoods. Do you mean: "The simulation discovered a correlation between increased reports of crime and increased reports of police activity"? If that is what you mean, why did the simulation encourage more police in black neighborhoods?
    • I don't mean that - I mean that reports of crime were often reported because people saw police cars. Weird, but true. I've tried to clarify. --Owlsmcgee (talk) 22:25, 14 June 2018 (UTC)[reply]
      • And successfully so. But you have not explained yet what ... the study ... refers to.--Farang Rak Tham (Talk) 12:56, 15 June 2018 (UTC)[reply]
    • Changed "the study" to "the simulation" so it's clear that the sentence refers to the same thing as the rest of the paragraph. --Owlsmcgee (talk) 00:18, 5 July 2018 (UTC)[reply]

Examples

I recommend integrating the examples into the sections about the different kinds of bias. It will help to improve understanding of those sections, and will make the narrative of the body smoother and less repetitive.

  • That seems like a stylistic choice rather than one that is required for a GA. It would require an extensive rewrite of how the article is organized, which would jeopardize all progress made so far in the article toward GA status. As it stands, I feel the examples are relevant enough. -- Owlsmcgee (talk) 00:20, 5 July 2018 (UTC)[reply]

Voting behavior

  • A randomized trial of Facebook users showing an increased effect of 340,000 votes among users ... This number has little meaning without context. You need to provide percentages or some other relative indication. Secondly, it isn't clear whether the friends of the users also saw the pro-voting messages. Thirdly, pro-voting is an unusual term. Maybe write which encouraged voting or something like that.
    • The percentage is listed in the first line of the paragraph (a 20% swing). I let the raw number stand as that's how many additional votes resulted - I don't know how that isn't clear, given that we mentioned the percentage just two sentences earlier. I did change "pro-voting" to "which encouraged voting," thank you for the suggestion. --Owlsmcgee (talk) 00:25, 5 July 2018 (UTC)[reply]

Gender discrimination

  • In fairness to Target, you might want to write they later adjusted their policies. I have read that in another book by Duhigg published in 2013.
    • Could you add this info? I can't find anything online that supports it. -- Owlsmcgee (talk) 00:52, 5 July 2018 (UTC)[reply]
  • This bias extends to the search ... Sentence too long, better split.
  • ... that a suspect or prisoner will repeat a crime. A suspect has not been proven to have committed a crime. Perpetrator?
    • You're right: at the point of sentencing they are still prisoners. I've adjusted the language accordingly. --Owlsmcgee (talk) 00:52, 5 July 2018 (UTC)[reply]

Sexual discrimination

  • Change app into application, as indicated above.
  • ... sex-offender lookup apps ... You mean, applications with blacklists of sex-offenders?
  • ... saw 57,000 books de-listed ... Delisted or put on a blacklist?
    • Clarified all above points. "Delisted" remains because they were delisted, not blacklisted. --Owlsmcgee (talk) 00:57, 5 July 2018 (UTC)[reply]

Lack of transparency

  • Commercial algorithms are proprietary, and may be treated as trade secrets.[9]:2[16]:7[35]:183 This protects companies ... Trade secrets do not protect companies, laws about trade secrets protect companies.
  • This protects companies, such as a search engine, in cases where a transparent algorithm for ranking results would reveal techniques for manipulating the service. Simplify.
  • It can also be used ... The law can also be abused...
  • The closed nature of the code ... The companies are closed, the code is hidden: "The closed nature of the companies..."
  • ... as a certain degree of obscurity is protected by the complexity of contemporary programs ... Move the part about complexity to the next paragraph.
    • All fixed, some sections moved to "complexity." --Owlsmcgee (talk) 01:29, 5 July 2018 (UTC)[reply]

Complexity

  • ... large teams of programmers ... You mean, "programmers within large teams ..."?
  • ... sprawling algorithmic processes ... Wikilink or define inline what this means, or simplify it.
    • All fixed in the combining of text from the transparency section. --Owlsmcgee (talk) 01:29, 5 July 2018 (UTC)[reply]

Lack of data about sensitive categories

  • When sources refer explicitly to US law, please say so explicitly. Same holds for EU law, for that matter.
    • This is a general section about legal issues, when laws are mentioned I have categorized them as EU. Otherwise the section relates to various legal concerns under various systems and does not refer to any specific country's case law. --Owlsmcgee (talk) 01:36, 5 July 2018 (UTC)[reply]
  • A significant barrier to understanding tackling bias in practice is that categories, such as demographics of individuals protected by anti-discrimination law, are often not explicitly held by those collecting and processing data. What does holding a category mean?
    • Seemed like an editing scar, I clarified the sentence. --Owlsmcgee (talk) 01:36, 5 July 2018 (UTC)[reply]
  • ... insurance rates based on historical data of car accidents which may overlap with residential clusters of ethnic minorities. Please explain further. How does this overlap? Which is false correlation?
    • The point is that they do not relate to each other, that's why it's a false correlation. I've specified that the case study presents a coincidental overlap. --Owlsmcgee (talk) 01:36, 5 July 2018 (UTC)[reply]

Rapid pace of change

  • ... can confuse attempts to understand them ... To understand what?
    • The algorithm. I've repeated it for clarity. --Owlsmcgee (talk) 01:39, 5 July 2018 (UTC)[reply]
  • ... segmenting the experience of an algorithm between users, or among the same users Simplify.

Rapid pace of dissemination

Please merge with the section Rapid pace of change, or expand this subsection. Normally, a GA should not have one-line paragraphs or sections.

Europe

  • ... non-binding recital ... Wikilink from the first mention.
  • Why do you keep using alleged throughout this section, when the law's content is already known, as it can be quoted from?
  • While these ... These regulations?
  • While these are commonly considered to be new ... Too long, split. And original and originating in the same sentence is awkward.
  • What do you mean by carve-outs in this context?
  • The GDPR does address ... Implies a contradiction or nuance. Or do you mean "The GDPR addresses..."
  • It has been argued that ... Too long, split.
    • I didn't create this section but I have cleaned it up following all of your suggestions. --Owlsmcgee (talk) 01:47, 5 July 2018 (UTC)[reply]

US

  • ... and uses. Uses of what?

May 2018

I will continue this detailed review once I get a response from you.--Farang Rak Tham (Talk) 13:13, 30 May 2018 (UTC)[reply]

Thank you Farang Rak Tham! I will try to respond this weekend. Do feel free to continue with your assessment if you have more to say - I tend to prefer investing enormous chunks of time into tackling these problems rather than tackling them piecemeal, but of course how you review is up to you. Thank you for a thoughtful and constructive set of recommendations! -Owlsmcgee (talk) 03:15, 2 June 2018 (UTC)[reply]

June 2018

The prose in some sections still needs a lot of copy-editing work. Try using more active voice, and less passive voice. This will help.--Farang Rak Tham (Talk) 16:08, 2 June 2018 (UTC)[reply]

Putting review on hold, as seven days have passed. You have indicated you still wish to pursue this, so I will give you another seven days. If you need even more time, you should specify a deadline yourself. The article is a very relevant subject worthy of the reader's attention, so it would certainly not be a waste of time to do some copy-editing on it.--Farang Rak Tham (Talk) 11:15, 6 June 2018 (UTC)[reply]
Hello Farang Rak Tham - I did intend to tackle it on the weekend; however, some things came up. I will take some time with the review this week, but could you extend the timeline to June 18 so I have time to address them all in a thorough manner? I know I am a bit of a lapsed editor but it's important for me to get this article to GA status as I have created it almost entirely myself from scratch, and am very invested in this outcome! I assure you I will not abandon your work and recommendations. Thanks! -Owlsmcgee (talk) 21:14, 7 June 2018 (UTC)[reply]
Okay. Hard work!--Farang Rak Tham (Talk) 21:29, 7 June 2018 (UTC)[reply]
There is still the copy editing of the text to go through and correct. I'll get to it soon, but wanted to show a good faith effort to get some work done on this article in response to your excellent review. Thank you! -- Owlsmcgee (talk) 01:22, 8 June 2018 (UTC)[reply]
This article is very intensive in terms of copy-editing. But I will continue with it because it is very interesting material.--Farang Rak Tham (Talk) 23:22, 8 June 2018 (UTC)[reply]
Phew, I finally finished the first check. Lots of subsections. I hope you will have the time to continue with it. Some sections have little comments, others have many. Let me know if you have any questions.--Farang Rak Tham (Talk) 14:50, 15 June 2018 (UTC)[reply]
Owlsmcgee, it has been over three weeks now, and beyond the deadline which you set yourself. I am giving you until Sunday to correct the prose in the article, failing which I will have to fail the article for GA.--Farang Rak Tham (Talk) 07:29, 21 June 2018 (UTC)[reply]
Farang Rak Tham Given the enormous list of corrections, there's simply no way I can get them all done by Sunday. Feel free to fail this article as a GA and I'll resubmit after tackling the list on my own time. -- Owlsmcgee (talk) 04:16, 22 June 2018 (UTC)[reply]
Owlsmcgee, okay, but if you still want to work on it this weekend, I am willing to wait a bit. We can then see how far we can get. Whatever we do now, we don't have to do later. Up to you.--Farang Rak Tham (Talk) 16:10, 22 June 2018 (UTC)[reply]

GA progress

Good Article review progress box
Criteria: 1a. prose () 1b. MoS () 2a. ref layout () 2b. cites WP:RS () 2c. no WP:OR () 2d. no WP:CV ()
3a. broadness () 3b. focus () 4. neutral () 5. stable () 6a. free or tagged images () 6b. pics relevant ()
Note: this represents where the article stands relative to the Good Article criteria. Criteria marked are unassessed