Wikipedia talk:WikiProject COVID-19/Case Count Task Force

This page is within the scope of WikiProject COVID-19, a project to coordinate efforts to improve all COVID-19-related articles. If you would like to help, you are invited to join and to participate in project discussions.

Goal

This task force has a noble goal but case counts are being constantly updated on hundreds of COVID-19 articles. There is no way I can see to make every editor abide by the guidelines a task force can come up with.

I think you need to focus on a core group of articles and templates you seek to keep accurate because case counting is happening on hundreds of articles now and it would be difficult to impose order on an aspect that experienced, casual and brand new editors are contributing to. Liz Read! Talk! 21:09, 22 March 2020 (UTC)[reply]

Liz: I'm focusing on Template:2019–20 coronavirus pandemic data and occasionally on some country-specific articles. Some editors are doing the same, while others are focusing on a single country-specific article. I think we can be more efficient if some editors commit to updating at least some subset of countries. In my case, I can be faster sourcing changes for countries with Spanish (and most Romance languages) sources than, let's say, Slavic languages. --MarioGom (talk) 21:44, 22 March 2020 (UTC)[reply]

I don't think that individuals really need to take responsibility for one country. What happens is that people continuously edit to update. Once a day I check all entries to see if a change has been missed. This can take a while so there may be no point in time where everything is updated. If responsibility is needed it would be for little known places that do not appear in the news. Graeme Bartlett (talk) 05:05, 24 March 2020 (UTC)[reply]

Graeme Bartlett: Indeed, in practice I think a lot of us are just trying to catch the latest updates. I'm working on a script to automate alerts for updates. As you know, I'm posting a report daily at Template:2019–20 coronavirus pandemic data. It is currently based on Worldometer only, but I'm working on integrating other sources. --MarioGom (talk) 18:34, 29 March 2020 (UTC)[reply]

Translation

See here re: whether or not translation work should be done at a separate task force. ---Another Believer (Talk) 20:01, 23 March 2020 (UTC)[reply]

US testing data by date by state: covidtracking.com

A great updated archive of day-by-day testing (positive and negative) data for Colorado and other states in the US is https://covidtracking.com/ See also graphs based on the data at [1] and [2] ★NealMcB★ (talk) 23:00, 23 March 2020 (UTC)[reply]

Nealmcb: As another and more experienced editor MarioGom has mentioned in a more recent thread, covidtracking.com is not as up-to-date as other trackers such as 1Point3Acres.
This reply is just to let you know that your suggestion has not been forgotten.
Anyways, hope that addresses your concerns. Cheers. RayDeeUx (talk) 18:05, 31 March 2020 (UTC)[reply]

worldometers

I've said this elsewhere, but reporting by worldometers of Ohio's stats is definitely suspicious -- they say they're sourcing to Ohio Department of Health, but ODH is reporting only cumulative cases and deaths, and worldometers is doing arithmetic to arrive at a number for "active cases" for Ohio, which is then apparently being used to assume no recoveries in Ohio since worldometers is also reporting a "recovered" number for the country as a whole, so they must be aggregating stuff that can't be aggregated. At best they should be reporting "reported recoveries" and "cases reported as active" if they're getting that level of detail from some depts of health. --valereee (talk) 10:53, 24 March 2020 (UTC)[reply]

Valereee: I have not included Worldometer as a recommended source in this page for a reason, discussed extensively in other talk pages. They do not cite exact sources for every update and they have reported incorrect figures repeatedly without a cited source, citing a dubious source, or without a rollback when a source was proven incorrect. I do use it to check if we are missing updates though. For the US, it seems 1point3acres is gaining some more traction as a source. --MarioGom (talk) 11:01, 24 March 2020 (UTC)[reply]
MarioGom, but it's listed for the US as a source we're relying on? --valereee (talk) 12:03, 24 March 2020 (UTC)[reply]
Valereee: Yes. In practice, both Worldometer and 1point3acres are still used as references for the US in Template:2019–20 coronavirus pandemic data. Although I think we should stop using Worldometer there too. --MarioGom (talk) 13:55, 24 March 2020 (UTC)[reply]
MarioGom, I agree. It's one thing to use them as an easy way to find the other sources already easily grouped, but we shouldn't source to them directly; their arithmetic is suspect. --valereee (talk) 14:00, 24 March 2020 (UTC)[reply]
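
The arithmetic concern in this thread can be illustrated with a minimal sketch. It assumes, hypothetically, that an aggregator derives "active" as cumulative cases minus deaths minus recoveries, and that a missing recoveries figure is silently treated as zero. The function names and numbers are made up for illustration; this is not Worldometer's documented method and not Ohio's data.

```python
from typing import Optional


def naive_active(cumulative: int, deaths: int, recovered: Optional[int]) -> int:
    """Derive 'active' cases, silently treating unreported recoveries as zero.

    This is the (assumed) behaviour being criticised: a source that reports
    no recoveries ends up looking as if nobody has recovered.
    """
    return cumulative - deaths - (recovered or 0)


def cautious_active(cumulative: int, deaths: int, recovered: Optional[int]) -> Optional[int]:
    """Derive 'active' cases only when recoveries were actually reported."""
    if recovered is None:
        return None  # not derivable from cumulative cases and deaths alone
    return cumulative - deaths - recovered


# Hypothetical figures for a source that publishes no recoveries number.
print(naive_active(500, 10, None))     # a figure that implies zero recoveries
print(cautious_active(500, 10, None))  # None: the value is simply unknown
```

The second function is one possible way to avoid aggregating figures that cannot be aggregated: it returns no value rather than an implied one.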

APIs for COVID-19 Data

Hi, all. My name's Evan. I'm the Product Manager for APIs at the Wikimedia Foundation.

There is an amazing amount of data being assembled and collated here. I'd love to find a way for us to share it with the world as a machine-readable API. Is the best way to scrape it from the tables in the templates, or could we move it to tabular data and have everyone use it from there? --EProdromou (WMF) (talk) 20:20, 24 March 2020 (UTC)[reply]

I've started copying data from these templates to the Data namespace on Commons. --EProdromou (WMF) (talk) 12:03, 25 March 2020 (UTC)[reply]
Thanks! I'd like to see them all in one place, then we can generate multilingual assets, among other things. All the best: Rich Farmbrough (the apparently calm and reasonable) 10:29, 26 March 2020 (UTC).[reply]
Sounds great! Whatever solution we use, it should be easy to update and it should allow maintaining references properly. --MarioGom (talk) 10:52, 26 March 2020 (UTC)[reply]
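
As a rough illustration of the "tabular data" option discussed in this thread, here is a minimal sketch of reading such a page. It assumes the JSON shape used for tabular data in the Data namespace on Commons, as far as I understand it: a "schema" with named "fields" and a "data" list of rows. The sample document, its field names and its numbers are all made up, and `rows_as_dicts` is a hypothetical helper, not part of any Wikimedia library.

```python
import json
from typing import Any, Dict, List

# Made-up sample in the assumed tabular-data shape; not real case counts.
SAMPLE = """
{
  "license": "CC0-1.0",
  "description": {"en": "Made-up example, not real case data"},
  "schema": {"fields": [
    {"name": "date", "type": "string", "title": {"en": "Date"}},
    {"name": "cases", "type": "number", "title": {"en": "Cases"}},
    {"name": "deaths", "type": "number", "title": {"en": "Deaths"}}
  ]},
  "data": [["2020-03-24", 100, 2], ["2020-03-25", 130, 3]]
}
"""


def rows_as_dicts(raw: str) -> List[Dict[str, Any]]:
    """Turn the positional rows into dicts keyed by the schema's field names."""
    doc = json.loads(raw)
    names = [field["name"] for field in doc["schema"]["fields"]]
    return [dict(zip(names, row)) for row in doc["data"]]


for row in rows_as_dicts(SAMPLE):
    print(row["date"], row["cases"], row["deaths"])
```

A consumer working this way reads field names from the schema instead of scraping rendered table cells. How per-figure references would be carried in such a format is not covered by this sketch.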

Questions on Official Sources for Confirmed Cases

For those of us that have been updating pages with the confirmed cases (active, recovered, and deaths), should we include all official sources?

For example, I have mainly been updating 2020 coronavirus pandemic in Illinois and sticking to the Illinois Department of Public Health's official numbers that come out daily. However, their information is typically a day behind the local county health departments' official numbers. Should we include both the local and the state numbers?

Thoughts? — Mr Xaero ☎️ 09:54, 25 March 2020 (UTC)[reply]

There's no reason not to sum the county health figures, that I can see, as long as we don't mix the two series in the same list. All the best: Rich Farmbrough (the apparently calm and reasonable) 10:28, 26 March 2020 (UTC).[reply]
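
One way to read the suggestion above, sketched under assumptions: sum the county figures per day, but keep that sum as its own series next to the state-reported series rather than splicing the two together. The county names, dates and numbers are hypothetical, and `build_series` is an illustrative helper only.

```python
from typing import Dict, List, Tuple

Series = List[Tuple[str, int]]


def county_total(county_counts: Dict[str, int]) -> int:
    """Sum the cumulative counts reported by each county for one day."""
    return sum(county_counts.values())


def build_series(
    state_by_date: Dict[str, int],
    counties_by_date: Dict[str, Dict[str, int]],
) -> Dict[str, Series]:
    """Return two separate, date-sorted series; they are never merged."""
    state_series = sorted(state_by_date.items())
    county_series = sorted(
        (day, county_total(counts)) for day, counts in counties_by_date.items()
    )
    return {"state_reported": state_series, "county_sum": county_series}


# Hypothetical data: the state figure lags the county sum by about a day.
STATE = {"2020-03-24": 1500, "2020-03-25": 1600}
COUNTIES = {
    "2020-03-24": {"County A": 900, "County B": 700},
    "2020-03-25": {"County A": 1100, "County B": 800},
}

for name, series in build_series(STATE, COUNTIES).items():
    print(name, series)
```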

CovidTracking in the US

Consider including covidtracking.com as a US source -- it's a curated synth of other available sources, with source and fact-checking from the Atlantic and other outlets. – SJ + 17:59, 26 March 2020 (UTC)[reply]

SJ: Their approach to sources is amazing, with clear source citations, grades, screenshots, etc. However, I see it is currently lagging a lot, even behind the CDC ([3]). --MarioGom (talk) 19:08, 29 March 2020 (UTC)[reply]
Hi @MarioGom: maybe that was for a moment while CA and NY were going through process changes. It is usually ~1 day ahead of CDC, and reflects reports at the state level -- which is always lagging a bit behind county-level reports. But we should work with them to coordinate updates -- they are hand-reviewing sources in a very wiki-compatible way. – SJ + 05:36, 1 April 2020 (UTC)[reply]
SJ: I see. It might be worth discussing this at Template:2019–20 coronavirus pandemic data. --MarioGom (talk) 09:59, 1 April 2020 (UTC)[reply]

Sources for Venezuelan cases

Thought I'd update here that there are discussions at the Venezuela article about how the country has no reliable reporting on medicine, and how to deal with that, if anyone here is interested. Kingsif (talk) 15:22, 28 March 2020 (UTC)[reply]

Kingsif: We are tracking officially confirmed cases, deaths and recoveries. By definition, official sources are reliable for this. Why do you say Venezuela has no reliable sources in this case? Even the most biased state-run media outlets around the world are generally valid to cite government officials' claims. --MarioGom (talk) 18:03, 29 March 2020 (UTC)[reply]
@MarioGom: If you look at the Venezuela COVID-19 article's talk page, there's discussion there. And the Ministry of Health is known to lie and hide statistics. It's not just that you can't trust the state-run media, you can't trust the government. That's kind of the issue. Kingsif (talk) 18:08, 29 March 2020 (UTC)[reply]
@Kingsif: Ok. I'll reply there: Talk:2020 coronavirus pandemic in Venezuela § Ministerio de la Salud as source. --MarioGom (talk) 18:12, 29 March 2020 (UTC)[reply]

Google wants you?

Please see Wikipedia_talk:WikiProject_COVID-19#Google_using_Wikipedia_pages_to_power_sidebar_stats_panel_in_search. cc: Accedie. Yug (talk) 18:14, 31 March 2020 (UTC)[reply]

And in case it's interesting/helpful, Google is sharing some data back. See Wikipedia_talk:WikiProject_COVID-19#Data_on_most-trafficked_COVID_stats_in_Google_+_sneak_peek_at_stats_card_roadmap for more! MPinchuk (WMF) (talk) 17:02, 3 April 2020 (UTC)[reply]
ETA: the roadmap table rendered poorly via the mailing list, so posting here in a wikitable so it's easier to read:
Statistic Segment | Description | Status | Source URL(s) | Notes / Feedback
Global (Total) | Total Cases, Total Recoveries, Total Deaths | Live | 2019–20_coronavirus_pandemic_by_country_and_territory#covid19-container | -
Country (Total) | Total Cases, Total Recoveries, Total Deaths | Live | 2019–20_coronavirus_pandemic_by_country_and_territory#covid19-container | -
US > State/ Territory (Total) | Total Cases, Total Recoveries, Total Deaths | Live | 2020_coronavirus_pandemic_in_the_United_States | -
Global (Daily) | Daily Cases, Total Recoveries, Total Deaths | Live | 2019–20_coronavirus_pandemic_cases/WHO_situation_reports, 2019–20_coronavirus_pandemic_deaths/WHO_situation_reports | Missing daily recoveries numbers
Country (Daily) | Daily Cases, Total Recoveries, Total Deaths | In Progress (Apr-05) | 2019–20_coronavirus_pandemic_by_country_and_territory#covid19-container, Pages linked from the main page table | Coverage is not available in all countries. Google is hoping to get info on at least the top 50 countries.
US > State/ Territory (Daily) | Daily Cases, Total Recoveries, Total Deaths | In Progress (Apr-05) | Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases, Pages linked from the main page table | -
Country > Testing | Total Tests, Positive Tests, Tests/Million People, Positive/Thousand Tests | In Progress (Apr-05) | COVID-19_testing | Information is currently only a snapshot view. Google is still looking for tests over time.
Global > Age, Gender, Severity | Distribution of cases by age, gender and severity | Planned (Apr-12) | Missing data | Information is limited to a small number of countries. Google is hoping to get this information for all countries.
Country > Age, Gender, Severity | Distribution of cases by age, gender and severity | Planned (Apr-12) | 2019–20_coronavirus_pandemic_by_country_and_territory#covid19-container, Pages linked from the main page table | Information is limited to a small number of countries. Google is hoping to get this information for all countries.
US > State/ Territory > Age, Gender, Severity | Distribution of cases by age, gender and severity | Planned (Apr-12) | Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases, Pages linked from the main page table | -
To be clear, this is just to give you an idea of the case data Google is already displaying and wants to display based on what readers are searching for. MPinchuk (WMF) (talk) 17:40, 3 April 2020 (UTC)[reply]

@MPinchuk: Just a reminder that the WMF wikis and Google are fundamentally different projects. Wikipedia is fully transparent and community run, while Alphabet Inc is an authoritarian centralised secretive corporation that is trying to retain close to totalitarian control over information. As a member of the WMF board, you could at least tell us what efforts you have made to cooperate with community-based search engines that aim at user privacy right from the beginning, like Duckduckgo and Qwant, in this situation. Thanks in advance for your efforts. Boud (talk) 00:01, 5 April 2020 (UTC)[reply]

Boud, quick clarification: I am not a member of the Board of the Wikimedia Foundation; I work as a staff member for the Wikimedia Foundation on the Partnerships team :)
To answer your question: my team has a) directly reached out to all partners who we think might be interested in using this data, and b) publicly offered our assistance to any potential partner (see Collaboration section) who is interested in using this data. The decision to use data from Wikipedia/Wikimedia projects ultimately rests with them, not us, but we're definitely excited and eager to assist any major search engine or other partner that wants to get these stats out to a broader audience! MPinchuk (WMF) (talk) 15:37, 6 April 2020 (UTC)[reply]
Thanks for the clarification about WMF Board membership vs WMF Partnerships employment. :)
a) is rather vague. Have you or have you not directly reached out to Duckduckgo and Qwant? Did you do so before my message? Did you directly reach out to Bing - the search engine of an organisation focussed on authoritarian control and privacy violation?
Please remember that Wikipedians are volunteers. It's quite likely that most of us would prefer a priority for cooperation with non-authoritarian, privacy-respecting organisations rather than with authoritarian, privacy-violating organisations. Boud (talk) 16:33, 6 April 2020 (UTC)[reply]
Boud: we reached out to Duckduckgo last week to see how/if we could support them. We also just learned that Bing is using Wikipedia stats, among other sources of data, for their COVID tracker. We don't have any contacts at Qwant, but if you know anyone who works on that project, please let them know that they can email us at partnerships@wikimedia.org if they're interested in collaborating! MPinchuk (WMF) (talk) 17:00, 6 April 2020 (UTC)[reply]
MPinchuk Thanks for the details. :) Boud (talk) 19:21, 6 April 2020 (UTC)[reply]
MPinchuk (WMF). I have transferred your contact to Qwant. Pyb (talk) 09:20, 20 April 2020 (UTC)[reply]

CA data used as source for SF Chronicle?

I heard a rumor that the SF Chronicle was sourcing data to a Feature Layer in ESRI's ArcGIS system, which was built from the California state data from the WP template! Now switching to JHU because it is more regularly updated... But we should confirm this + work with whoever was making that feature layer to help normalize / speed up updates + help make them more visible. – SJ + 05:38, 1 April 2020 (UTC)[reply]

Complementary epidemiological data

Despite constant community criticism of the Polish Ministry of Health data, and whatever the true accuracy of the data may be, the Ministry appears to be the only state health agency that publishes daily values of:

  • suspected/hospitalised cases - people who are ill and their illness is suspected to be COVID-19 and so they are hospitalised, but not (yet) SARS-CoV-2 positive in terms of lab tests;
  • quarantined - this means legal quarantining, with police visits to check, and had a big jump when border closures became strict and all people entering Poland started having compulsory 14-day quarantines; these values mix home quarantines (with random police checks) and quarantines in buildings temporarily repurposed;
  • monitored - health agency employees are supposed to telephone regularly, and/or the monitored people install programs on smartphones;
  • lab-tested - this presumably is for tests, not the number of tested people.

These data start on 19 Feb, which by coincidence happens to be exactly two weeks before the first official SARS-CoV-2 detection on 4 March.

All four of these parameters are epidemiologically useful. (For example: what fractions of each of the first three groups are likely to be SARS-CoV-2 positive? what fractions will fall ill with COVID-19? what fractions have seasonal influenza instead? is it a wise strategy to quarantine people and not "waste" money/time testing them if they remain asymptomatic?)

If there are many (are there any?) other territories providing these daily statistics, then we might want to consider something more systematic than just a note for these territories. (Does Sweden provide daily statistics of the number of elderly people self-isolating, for example?) Boud (talk) 00:17, 5 April 2020 (UTC)[reply]

We provide additional epidemiological indicators in many country-specific articles. See, for example, 2020 coronavirus pandemic in Spain § Statistics (cases, hospitalizations, ICU, deaths, recoveries) or 2020 coronavirus pandemic in Norway lede (lab tests, confirmed, deaths). Feel free to list them on this project page too. The Ministry of Health or equivalent institution is usually the only source for this information. You will find secondary sources citing them, but reliable sources using different underlying data are rare, except for small daily incremental updates for cases, deaths and recoveries. The focus on confirmed cases, deaths and recoveries is just because that is what we are trying to track for every territory at {{2019–20 coronavirus pandemic data}} but that does not mean we cannot add further indicators to other articles. --MarioGom (talk) 12:36, 6 April 2020 (UTC)[reply]

Count by ages

Data (cases/deaths) per population are useful to assess the spread of impacts across countries. Since COVID mostly kills old people, total deaths divided by the population over 65 (for instance, because there are open data: https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS) would be useful. Eventually the target would be to get the death rate within the ages over 65. It is something different from, and simpler than, the case fatality rate. PourStephen2020 (talk) 10:08, 15 June 2020 (UTC)[reply]
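
The proposed indicator can be sketched as follows. The World Bank series SP.POP.65UP.TO.ZS gives the population aged 65 and above as a percentage of the total, so a total population figure is also needed. The function name and all numbers below are hypothetical, and the result divides all deaths by the 65+ population (as proposed above); it is not the death rate among people over 65.

```python
def deaths_per_100k_over_65(
    total_deaths: int, total_population: int, pct_aged_65_up: float
) -> float:
    """Total deaths per 100,000 people aged 65 and above.

    pct_aged_65_up is a percentage of the total population (e.g. 20.0),
    matching the unit of the SP.POP.65UP.TO.ZS indicator.
    """
    population_65_up = total_population * pct_aged_65_up / 100
    if population_65_up <= 0:
        raise ValueError("population aged 65+ must be positive")
    return total_deaths * 100_000 / population_65_up


# Hypothetical country: 10 million people, 20% aged 65+, 1,000 deaths.
print(deaths_per_100k_over_65(1_000, 10_000_000, 20.0))
```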

Unusually un-noisy national daily case counts

@MarioGom, Yug, Doc James, and Natanieluz: The following was for the moment a preprint submitted for peer review and is now peer-reviewed - arXiv:2007.11779 Zenodo3951152 - based on this Wikipedia WikiProject C19CCTF (COVID-19 Case Count Task Force) dataset snapshot. Quite a few of the national-level case counts show signs of too little noise in at least some periods. Generally, the more infections a country has, the more noisy (super-Poissonian) the daily counts are, with a few exceptions. This theme might become WP-notable if mainstream media take it up. (Disclaimer: COI) Boud (talk) 15:32, 24 July 2020 (UTC)[reply]

 Done peer-reviewed with open peer review. With the extra result that the worse the level of press freedom in a country, the more likely that the noise is low, i.e. the less likely it is that the data are reliable. Boud (talk) 19:29, 28 August 2021 (UTC)[reply]
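
For readers unfamiliar with "super-Poissonian" noise, here is a minimal sketch using the plain variance-to-mean ratio (index of dispersion). This is a swapped-in textbook statistic for illustration, not the method of the paper mentioned above. For Poisson-distributed counts the ratio is about 1, larger values indicate extra noise, and values well below 1 indicate unusually smooth counts. The daily counts below are made up.

```python
import statistics
from typing import Sequence


def index_of_dispersion(daily_counts: Sequence[int]) -> float:
    """Sample variance divided by the mean of a series of daily counts."""
    mean = statistics.mean(daily_counts)
    if mean == 0:
        raise ValueError("mean of counts is zero; ratio undefined")
    return statistics.variance(daily_counts) / mean


# Hypothetical series: one suspiciously flat, one strongly fluctuating.
flat = [100, 101, 100, 99, 100, 101, 99]
noisy = [60, 150, 90, 200, 40, 130, 30]

print(index_of_dispersion(flat))   # far below 1: less noise than Poisson
print(index_of_dispersion(noisy))  # far above 1: super-Poissonian
```

Real daily counts also trend over time, so a raw ratio like this would need detrending before any conclusion could be drawn from it.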

Templates by country

Is Template talk:COVID-19 testing by country subdivision still needed? The data is quite out of date and Template talk:COVID-19 testing by country is significantly more current. Awbfiend (talk) 02:23, 2 August 2020 (UTC)[reply]

'Suspected cases' vs 'Infections': an issue re Template:Infobox outbreak

The CDC, in response to COVID–19, added 'Infection Fatality Ratio (IFR)' on 7/20/2020 to their Planning Scenarios … "as a new parameter value for disease severity, replacing the Symptomatic Case Fatality Ratio and the Symptomatic Case Hospitalization Ratio. IFR takes into account both symptomatic and asymptomatic cases and may therefore be a more directly measurable parameter for disease severity for COVID-19." That document relied on Meyerowitz-Katz & Merone (2020): "An important unknown during the COVID-19 pandemic has been the infection-fatality rate (IFR). This differs from the case-fatality rate (CFR) as an estimate of the number of deaths as a proportion of the total number of cases, including those who are mild and asymptomatic. While the CFR is extremely valuable for experts, IFR is increasingly being called for by policy-makers and the lay public as an estimate of the overall mortality from COVID-19."

The WHO, on 10/5, indicated About 10% of the global population may have been infected [4][5], extrapolating from seroprevalence studies.
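
To make the CFR/IFR distinction concrete, here is a minimal sketch with made-up numbers. It assumes the usual reading of the two ratios: the same death count divided by confirmed cases for CFR, and by an estimate of all infections (including mild and asymptomatic ones, e.g. from seroprevalence studies) for IFR. The function name and figures are illustrative only.

```python
def fatality_ratio(deaths: int, denominator: int) -> float:
    """Deaths as a proportion of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return deaths / denominator


# Hypothetical outbreak: 50 deaths, 1,000 confirmed cases, and an
# estimated 10,000 total infections (assumed, e.g. from serology).
deaths = 50
confirmed_cases = 1_000
estimated_infections = 10_000

cfr = fatality_ratio(deaths, confirmed_cases)       # case fatality ratio
ifr = fatality_ratio(deaths, estimated_infections)  # infection fatality ratio

print(f"CFR = {cfr:.1%}, IFR = {ifr:.1%}")
```

Because estimated infections are at least as numerous as confirmed cases, the IFR computed this way cannot exceed the CFR for the same death count.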

The Template:Infobox outbreak field structure does not yet account for this focus on 'infections', allowing only 'Suspected cases', which currently contains the 'Infection' text quoted in the preceding paragraph. In view of 1) the CDC distinction (noted above) between IFR and CFR and 2) the WHO speaking (also above) in terms of ‘infections’ rather than ‘suspected cases’, I requested that an 'Infections' field be added to allow for clearer presentation of this new metric relevant to COVID–19 and any other future outbreak where a substantive fraction of infections are asymptomatic or mild.

In prior discussion, MartinezMD remarked that the field label should be 'Suspected infections' because the numbers are large-scale estimates. In my view, 'Suspected' is not warranted on that ground as all estimates (large-scale or small) are extrapolations.

ProcrastinatingReader suggested I advertise here for further input. Other participants to-date in this and related discussion include Bakkster Man, Sdkb, Eb.eric, and Ozzie10aaaa. Thoughts? Humanengr (talk) 08:04, 14 November 2020 (UTC)[reply]

To word the request how I see it, it was requested that Template:Infobox outbreak, in addition to the current "Cases" and "Suspected cases" labels, have added to it "Infections" and "Suspected infections". The concern I had was in the meaningful difference between these terms and wanted to ensure the change was supported by consensus and said to be accurate before it is implemented. I presume this WikiProject / WP Medicine will have the best insight on this question. Some more of my thoughts are in the template talk discussion linked. ProcrastinatingReader (talk) 17:57, 14 November 2020 (UTC)[reply]
I think that, in terms of how they're being used, "Infections" and "Suspected Infections" are synonymous. The reason an infection isn't a case is that it wasn't confirmed via a laboratory test, meaning by definition it's an estimate. Of course, this brings up the question of why we need to separate "[Suspected] Infections" from "Suspected Cases". COVID provides a good example of this, with some early cases prior to widespread lab testing being diagnosed through clinical exam and/or epidemiology (close contact with a lab confirmed case), and these cases almost by definition only being symptomatic infections. So the way I see it:
  • "Confirmed Cases" - Lab test confirmed infections.
  • "Suspected Cases" - Lab test confirmed infections, plus diagnoses without lab confirmation.
  • "Suspected Infections" - Estimate of all infections, symptomatic or not, often through serological surveys.
Some examples of how this would be used on pages:
  • COVID-19 pandemic - Early on, "Confirmed Cases" would be lab confirmed (mostly PCR) tests, with a higher number for "Suspected cases" which includes clinical diagnoses during test shortages. Later, moving to "Confirmed Cases" and "Suspected Infections" as we learned how many asymptomatic infections there were, and clinical diagnoses became a vanishingly small proportion of cases.
  • 2009 swine flu pandemic - Move "Suspected Cases" to "Suspected Infections" to match the terminology used by the source. Potential to estimate symptomatic illness in the "Suspected Cases" field.
There seems to be some additional distinction between "cases" and "infections" (reading from the CDC here) that may prove useful as well. Bakkster Man (talk) 15:24, 16 November 2020 (UTC)[reply]
After further review, I support Bakkster Man's proposal above for 'Confirmed cases', 'Suspected cases', 'Suspected infections'; and note that 'Confirmed cases' aligns with WHO's UO7.1 confirmed by laboratory testing irrespective of severity of clinical signs or symptoms and 'Suspected cases' aligns with UO7.1 plus WHO's UO7.2 diagnosed clinically or epidemiologically but laboratory testing is inconclusive or not available. This taxonomy is straightforward and satisfies the need. Humanengr (talk) 06:25, 23 November 2020 (UTC)[reply]

Question: using WikiData or a better solution than a WikiText article in a single language?

Hi wise Wikipedians,

I just tried to start a page for tracking COVID-19 vaccination, but I'm not sure if it is the best approach: shall we keep it as a Wikipedia article page that works only in a single language and is challenging to move and share with other WP languages and other articles of EN Wikipedia, or shall we use WikiData or other open source relational data and transclude them into EN Wikipedia?

xinbenlv Talk, Remember to "ping" me 21:14, 11 January 2021 (UTC)[reply]

(Un)reliability of open government data

I've started an essay about the reliability of open government data and "official" versus "reliable" data. It seems to me that we are currently providing disinformation for several countries, even though this is unintentional. The reasons and possible alternatives are not trivial issues, which is why I think an essay is appropriate to see if arguments for and against possible ways to handle this disinformation can emerge. Boud (talk) 23:24, 16 September 2021 (UTC)[reply]