Talk:List of data breaches

From Wikipedia, the free encyclopedia

New column "publication"

I'm proposing to add a new column, "year of publicization", to the table. For instance, the Yahoo! data breach entry has 2014 set as its year (the year the stolen data dates from), but the breach was made public in 2016. --Fixuture (talk) 23:32, 22 September 2016 (UTC)

Personally, I wouldn't bother adding any more complexity to the table, which could make it harder for an editor to add to or modify. Plus, clicking on the source gives any earlier dates of when a hack occurred. It could also force a lot more updating work, since the date of the original hack often isn't known until much later, after it's been analyzed. --Light show (talk) 00:31, 23 September 2016 (UTC)
I agree with adding a new column. The table is complex already, but the two dates are very significant. --Wazz4444 (talk) 20:37, 9 June 2018 (UTC)
One more vote for the new column. Both these dates are usually present alongside the same information that populates the other columns and wouldn't represent substantial additional burden for the editor adding new items. --Jsoverson (talk) 17:22, 13 July 2018 (UTC)[reply]

I think the title should be "List of Known Data Breaches." There are always breaches going on that have not been discovered yet. -- Preceding unsigned comment added by 67.180.205.108 (talk) 23:11, 21 December 2018 (UTC)

Missing entries: leaks

https://www.animenewsnetwork.com/news/2017-02-22/report-2.5-million-funimation-accounts-compromised-in-data-breach/.112538 — Preceding unsigned comment added by 2601:840:8400:EC10:7854:7C9A:A8CF:A2D8 (talk) 01:37, 28 October 2020 (UTC)[reply]

As far as I understand it, all leaks are data breaches except the ones where the leaking was done by a whistleblower on the inside who already had access to the data, right? I ask because many such leaks seem to be missing from the list. (Most can be found at Category:News leaks.) --Fixuture (talk) 17:58, 23 September 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 5 external links on List of data breaches. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 19:11, 29 December 2017 (UTC)[reply]

Google+ incident a data breach?

@Zazpot: Regarding the recent Google+ reports and your recent reversion of my edit, I've found multiple sources which cover this:

Google, a unit of Alphabet Inc., exposed the private data of some users of its Google+ social network to outside developers, but the company said it found no evidence that developers misused data. The phrase “data breach” in the headline for Tuesday’s Page One article about the exposure could be interpreted as suggesting that data was misused.
Google said this incident represented an "exposure" rather than a "breach" of data. This means that personal data was exposed for any bad guy to take, but there's no evidence anyone did.
The company said private data in Google+ could have been viewed by third-party app developers, but there's no evidence any of these individuals even knew about the bug that caused the vulnerability, let alone exploited it.
Now with what’s happening right now with Google, “breach” is the wrong word, although it’s certainly getting tossed around. Users of Google+ had some profile data “exposed,” meaning it was potentially accessible by third parties although that may not have actually happened.

Given the distinction these articles discuss about a "data breach" and "data exposure" (including from The Wall Street Journal which first reported on the incident), it appears to me that this is out of scope for this article. FallingGravity 21:01, 10 October 2018 (UTC)[reply]

We probably should have a "data exposure" page too. This is not the first such case, eg [1]. Having that, and making sure the two pages link to each other would help greatly. --Masem (t) 01:44, 11 October 2018 (UTC)[reply]
I think what's happening here is that Google's PR representatives, obviously under instructions to limit the damage to Google's reputation, have contacted journalists to "educate" them by claiming that there is a distinction between a data exposure and a data breach and that Google Plus suffered the former rather than the latter (which, by implication, would absolve Google somewhat of its failure to notify users). Personally, I think that the distinction is bogus. Quoting the Ars Technica piece, which seems to me to be much more level-headed: [Google] destroys most Google+ logs after two weeks. According to the WSJ, an internal memo acknowledged there was no way to know [therefore, whether the exposed data was accessed by people who should not have had access]. People who have used Google+ during the time the bugs were active should assume any exposed data is publicly available. Zazpot (talk) 11:52, 11 October 2018 (UTC); edited 19:28, 11 October 2018 (UTC)
The distinction between data exposure and data breach makes sense, because a data breach is an instance of data exposure, but data exposure is not necessarily a data breach (even if it's best practice to assume otherwise, as discussed in the Ars Technica piece). Your claim that the "distinction is bollocks" contradicts reliable sources, so I'm removing the entry for now. Perhaps we could have an article on data exposure to explain this distinction and discuss the Google+ and the voter records incident which Masem mentioned. FallingGravity 17:06, 11 October 2018 (UTC)
@FallingGravity: the idea that "data breaches" and "data exposures" are disjoint sets, no matter how plausible it sounds, is an artificial one promoted by Google's PR and regurgitated by gullible journalists.
It is not WP:OR to assert that in national and international public policy and in legal guidance from official public organisations, "data breach" is an umbrella term for incidents that include data exposure. I.e. "data exposures" are a proper subset of "data breaches". See:
You just learned that your business experienced a data breach. Whether hackers took personal information from your corporate server, an insider stole customer information, or information was inadvertently exposed on your company’s website, you are probably wondering what to do next.[1]
A personal data breach can be broadly defined as a security incident that has affected the confidentiality, integrity or availability of personal data.[2]
Zazpot (talk) 19:20, 11 October 2018 (UTC)[reply]
@Zazpot: I say we should follow the reliable sources which discuss the Google+ incident, not your WP:SYNTH of an FTC handbook for businesses and ICO guidelines for breaches of personal data. Also, what's your source for saying these journalists are "gullible"? FallingGravity 22:50, 11 October 2018 (UTC)[reply]
FallingGravity: you ask, what's your source for saying these journalists are "gullible"? Cicero.
Also, an FTC handbook for businesses about data breaches and the ICO guidelines for breaches of data are not WP:SYNTH about data breaches. They are authoritative sources about data breaches in general, which necessarily includes the Google Plus data breach. Your suggestion that they are not applicable here is akin to saying that Smoking and Health was irrelevant to Lucky Strikes because it didn't name that brand specifically.
Also, numerous WP:RS have used, in relation to the Google Plus revelations, the exact wording "data breach" (as though that were somehow the most important thing, which it isn't, but I'm mentioning it in order to address your concerns), e.g.: The Guardian, NPR, and CBS. Also slightly less WP:RS (but still WP:RS on this sort of topic, IMO): Politico and TheNextWeb. Zazpot (talk) 01:18, 12 October 2018 (UTC)[reply]
It looks like CBS News, The Guardian, and TheNextWeb have since corrected their stories to say "data exposure" or "data leak". But now I'm guessing those sources aren't reliable anymore because they've been duped by Google's PR team, right? FallingGravity 18:17, 13 October 2018 (UTC)[reply]
Again, I think it's a "layperson" issue. To the average person, "data breach" and "data exposure" both mean that their data was not kept private, and the result for them is the same. Computer experts know better. There's no problem with making that difference well known and excluding data exposures from this page, but I will repeat: if that is done, then we absolutely need a "List of data exposures", and we must make sure both pages are clear about what is included and link to each other. --Masem (t) 19:10, 13 October 2018 (UTC)
@Masem: I appreciate your goodwill here, but what you are proposing does not make sense to me. As explained above, the set of "data exposures" is a subset of the set of "data breaches". Zazpot (talk) 22:47, 14 October 2018 (UTC)[reply]
@FallingGravity: please can you provide links to those "corrections"? If you are right about those sources, then:
  • It sounds as though they have fallen below their usual standards of journalism. I am disappointed in them. They should know better.
  • How do you feel about noting that WP:RS disagree about whether the Google breach was a "breach"? (What a ridiculous world this is that that sentence should be valid, but it is.)
Zazpot (talk) 22:45, 14 October 2018 (UTC); edited 06:41, 15 October 2018 (UTC)[reply]
@Zazpot: It's funny that you think reliable sources have "fallen below their usual standards of journalism" because they issue corrections, even though corrections are a hallmark of reliable sources. TheNextWeb article says "Removed references calling the issue a “breach,” to more accurately reflect that the Google+ security flaw was a “glitch” or “bug” which could have potentially resulted in a breach." The Wall Street Journal issued a similar correction. Even The Washington Post agrees that "The Google+ bug, it seems, was not a breach but a vulnerability." As for your second proposal, I think this article should list incidents that are definitely data breaches; any "debate" can go in the data breach or Google+ articles. FallingGravity 16:06, 15 October 2018 (UTC)[reply]
A willingness to issue corrections, when appropriate, is indeed a hallmark of a reliable source. This does not, however, imply that all corrections are appropriate. These particular "corrections" are inappropriate, and disappointing.
I agree that the article should list only definite data breaches; but as I have explained, data exposures are necessarily (because of the subset relationship) data breaches. Zazpot (talk) 23:06, 15 October 2018 (UTC)[reply]
Response to third opinion request:
Policy sidenote: More than two participants already present. Nevertheless, I think the subject is interesting and relatively simple to answer, so here.
  1. The term "breach" suggests one of two events, or both: Intrusion through security measures into an inner network or physical facility; or a localized failure of security policy. The latter need not involve the former: A loss of a flash drive with some sensitive information by a corporate employee on a business trip would often be termed a "breach", regardless of whether any information was exposed or even if the drive is actually in anyone's hands rather than just stuck under a mattress somewhere.
  2. As this article points out, the US Dept. of Justice takes a similar broad approach [2].
  3. Other than that, given that there's no standardized taxonomy of infosec failures from which we can draw, and as we're all aware of what the non-technical usage of "breach" encompasses (and in all likelihood the reader is too), I'd argue it simply doesn't matter. As long as we use the lead to define what this list is about (and the DOJ's definition is as good as any), there's no significant risk of misleading, misrepresentation or inaccuracy by simply keeping the current choice of name. Security vulnerabilities come in all shapes and sizes (so to speak), and as long as we're all aware of what we're discussing here and it reflects common as well as academic use, it's just not that important. François Robere (talk) 19:19, 17 October 2018 (UTC)
  4. Although, if you insist, a more accurate title would be "List of security vulnerabilities that resulted in large scale data exposure". But well.
Thanks for the comments. However, expanding the definition of this article might make it unwieldy, to the point where any vulnerability that could expose data, such as Row hammer and Spectre, is listed (because any device that isn't patched could be breached). I've started an RfC on the matter to determine if this particular incident should be included. FallingGravity 17:48, 20 October 2018 (UTC)

References

  1. ^ "Data Breach Response: A Guide for Business". Federal Trade Commission. Retrieved 2018-10-11.
  2. ^ "Personal data breaches". Information Commissioner's Office. Retrieved 2018-10-11.

RfC on the inclusion of the Google+ incident

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Should Google+'s reported data exposure be included or excluded from this list? RfC relisted by Cunard (talk) at 01:37, 13 January 2019 (UTC). RfC relisted by Cunard (talk) at 05:28, 2 December 2018 (UTC). FallingGravity 17:34, 20 October 2018 (UTC)[reply]

  • Include. Data exposures are a proper subset of data breaches, according to relevant guidance from government bodies, and trade guides, e.g.:
  • You just learned that your business experienced a data breach. Whether hackers took personal information from your corporate server, an insider stole customer information, or information was inadvertently exposed on your company’s website, you are probably wondering what to do next.[1]
  • A personal data breach can be broadly defined as a security incident that has affected the confidentiality, integrity or availability of personal data.[2]
  • The term 'breach' is used to include the loss of control, compromise, unauthorized disclosure, unauthorized acquisition, unauthorized access, or any similar term referring to situations where persons other than authorized users and for an other than authorized purpose have access or potential access to information, whether physical or electronic.[3]
  • A data breach is an incident wherein an unauthorised person(s) or company (companies) receives access to the personal data of data subjects. This may be the result of intentional or unintentional action.[4]
To suggest that it is somehow WP:OR or WP:SYNTH to recognise that these passages are applicable to the Google incident, is akin to suggesting that Smoking and Health was irrelevant to Lucky Strikes because it didn't name that brand specifically.
Zazpot (talk) 05:43, 21 October 2018 (UTC)[reply]
Both your examples (Google+ and Lucky Strikes) are original research unless secondary sources make such connections to these primary sources. I suggest you read WP:PSTS very carefully. FallingGravity 23:48, 24 October 2018 (UTC)[reply]
I have read it several times previously, I read it again recently, I am broadly supportive of it, and yet I still disagree with you. Wikipedia does not source everything: it does not source each English word or term that is used in each article, for example. But having established what tobacco cigarettes are, or what data breaches are, etc, from reliable sources, we as Wikipedians can then categorise entities or events in the world appropriately. If WP:RS disagree with each other, then we may note this, as I suggested above; but we should not pretend that reliably-sourced facts about what is what can be temporarily suspended because they look bad on a company or its products, even if normally reliable sources choose to do so. Zazpot (talk) 02:36, 27 October 2018 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Data breach omission: University of Delaware, 2013

Hey crew. I am new to this and I hope this is the right venue. I see an omission that has affected over 74,000 people employed, enrolled, or matriculated from the University of Delaware. Here is the source: http://www1.udel.edu/udaily/2014/jul/resources073013.html

I see that I cannot edit the list directly, so please help me understand how we can get this one on the list. Thanks! -- Preceding unsigned comment added by Kinobaby (talk · contribs) 16:55, 20 November 2018 (UTC)

Wordpress

Hi, can someone confirm Wordpress has been hacked recently? If so, it should be added to this page. https://www.zdnet.com/article/thousands-of-wordpress-sites-backdoored-with-malicious-code/ Kathelijne (talk) 13:44, 5 December 2018 (UTC)[reply]

Collection #1

This is being called a data breach, despite the fact that it appears to be a collection of 773M+ from previous breaches and other data leaks; i.e. technically nothing new. [3]. I believe we should include it, but I am putting the question on the table. --Masem (t) 04:29, 18 January 2019 (UTC)

These aggregate dumps are different though and not infrequent. Collection #1 is already shown to be a part of a much larger collection and there have been other dumps that have included parts of the included data. What is definitely worth adding to this list, though, are the dumps that were found in collection #1, were publicly disclosed, but aren't in this list (e.g. elance, cdprojektred, nexus mods). -- Jsoverson (talk) 16:31, 28 January 2019 (UTC)[reply]

New column "country"

I'm proposing to add a new column, "Country", to the table to give the country of origin of the company that suffered the breach. For instance, the Yahoo! data breach would be US, OVH would be FR, ... -- Preceding unsigned comment added by 194.3.119.2 (talk) 06:59, 17 July 2019 (UTC)

Add column to add more insight

Add some columns to give more insight into data breaches:

a) What percentage of users/employees/customers was affected.
b) What the average compensation per breached account was, where the breach was settled in court.

Sample below: https://www.linkedin.com/feed/update/urn:li:activity:6559118839525834752 -- Preceding unsigned comment added by Tapan.allabadi (talk · contribs) 17:22, 22 July 2019 (UTC)

MongoDB entries are erroneous

Joe Drumgoole (talk) 13:51, 23 November 2020 (UTC) The two MongoDB entries imply that MongoDB (the company) was responsible for these breaches. In both instances the owner of the database was an (unknown?) third party. I don't want to make the edit myself, as I am an employee of MongoDB. If we listed the vendors who sold the databases involved in breaches, every database vendor would be listed here. Can we amend the MongoDB entries to indicate the actual entity involved, or mark the entity as unknown?

Adding Philip Morris

As I am not registered, can someone add Philip Morris International? The data breach concerns 15 years of tobacco survey data belonging to major tobacco companies (value of 70 million USD). Reference and source can be seen in a complaint at the New York State court: https://iapps.courts.state.ny.us/nyscef/DocumentList?docketId=ixdcabdUnWjejcynC/fJsQ==&display=all&courtType=New%20York%20County%20Supreme%20Court&resultsPageNum=1 -- Preceding unsigned comment added by 2.53.134.87 (talk) 14:27, 30 November 2020 (UTC)

We can't use court documents as they are a primary source; it needs to be reported by third-party sources. --Masem (t) 14:44, 30 November 2020 (UTC)[reply]

So you can add it as it was reported by OCCRP https://www.occrp.org/en/daily/13413-complaint-phillip-morris-smuggled-smokes-distorted-data — Preceding unsigned comment added by 2.53.155.153 (talk) 22:07, 3 December 2020 (UTC)[reply]

It is still a claim and not proven, so we can't include it. --Masem (t) 22:09, 3 December 2020 (UTC)[reply]

Historical perspective / earlier breaches

Currently the earliest breach listed is 2004. Large breaches may be well covered, but I'd like to see more info supporting a historical perspective. E.g. at what point in the History of Technology should the potential for data breaches have changed people's fundamental strategic thinking about what can and cannot be "secret" or safe anymore? At what point were such volumes of critical or consumer data being amassed digitally such that breaches could be significantly damaging? At what point were storage densities high enough and portable enough to be a risk? At what point were networks interconnected enough with common protocols and operating systems to be at risk?

I don't mean that this article should answer those questions directly, but that a list of indicative early breaches (they don't have to be huge, just significant in some interesting way) should provide insight to such questions. It would also be nice to have some estimates (probably an extremely rough range) of what % of breaches are suspected to have gone completely undetected, to give further insight into the incompleteness of any such list. DKEdwards (talk) 19:06, 12 January 2021 (UTC)[reply]

For example, this site: https://searchsecurity.techtarget.com/feature/Data-breach-protection-requires-new-barriers says: "In 1984 the global credit information corporation known as TRW (now called Experian) was hacked and 90 million records were stolen." That sounds like a very significant example. DKEdwards (talk) 20:48, 12 January 2021 (UTC)[reply]

Comcast/xfinity NOT listed... why?

Comcast has been hacked numerous times (not all listed): in 2015, 2020, 2021, 2022.

In December 2020 alone, 1.51 BILLION records were hacked.

Is Wikipedia or the author of this article afraid of or somehow restrained by Comcast for some reason?

River City media is also not listed - January 2017 1.24 BILLION 2601:601:D27F:3630:509F:86D7:D2F4:4E62 (talk) 16:13, 31 January 2024 (UTC)[reply]

I need to take

@ 103.18.168.62 (talk) 15:59, 25 February 2024 (UTC)[reply]