Implementations of differentially private analyses

From Wikipedia, the free encyclopedia

Since the advent of differential privacy, a number of systems supporting differentially private data analyses have been implemented and deployed. This article tracks real-world deployments, production software packages, and research prototypes.

Real-world deployments

| Name | Organization | Year introduced | Notes | Still in use? |
|---|---|---|---|---|
| OnTheMap: interactive tool for exploration of US income and commute patterns[1][2] | US Census Bureau | 2008 | First deployment of differential privacy | Yes |
| RAPPOR: collection of security metrics in the Chrome browser[3][4] | Google | 2014 | First widespread use of local differential privacy | No |
| Device analytics: emoji suggestions and QuickType improvements; Spotlight deep link suggestions; Lookup Hints in Notes; health data type usage estimates; Safari energy drain statistics; Autoplay intent detection in Safari[5] | Apple | 2017 | | Yes |
| Application telemetry[6] | Microsoft | 2017 | Application usage statistics in Microsoft Windows | Yes |
| Flex: a SQL-based system developed for internal Uber analytics[7][8] | Uber | 2017 | | Unknown |
| 2020 Census[9] | US Census Bureau | 2018 | | Yes |
| Audience Engagement API[10] | LinkedIn | 2020 | | Yes |
| Labor Market Insights[11] | LinkedIn | 2020 | | Yes |
| COVID-19 Community Mobility Reports[12] | Google | 2020 | | Unknown |
| Advertiser Queries[13] | LinkedIn | 2020 | | |
| U.S. Broadband Coverage Data Set[14] | Microsoft | 2021 | | Unknown |
| College Scorecard website | IRS and Dept. of Education | 2021 | | Unknown |
| Ohm Connect[15] | Recurve | 2021 | | |

Production software packages

These software packages state that they are usable in production systems. They are split into two categories: those focused on answering statistical queries with differential privacy, and those focused on training machine learning models with differential privacy.
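The statistical packages below are built around a common primitive: adding noise calibrated to a query's sensitivity. As an illustration of that primitive (not the API of any particular library), a minimal ε-differentially private counting query using the Laplace mechanism might look like this:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 47, 51, 62, 29, 44]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=random.Random(0))
```

Note that this textbook sampler is exactly the kind of naive floating-point implementation whose pitfalls are discussed in the attacks section of this article; production libraries use hardened samplers.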

Statistical analyses

| Name | Developer | Year introduced | Notes | Still maintained? |
|---|---|---|---|---|
| Google's differential privacy libraries[16] | Google | 2019 | Building-block libraries in Go, C++, and Java; end-to-end framework in Go[17] | Yes |
| OpenDP[18] | Harvard, Microsoft | 2020 | Core library in Rust,[19] SDK in Python with a SQL interface | Yes |
| Tumult Analytics[20] | Tumult Labs[21] | 2022 | Python library, running on Apache Spark | Yes |
| PipelineDP[22] | Google, OpenMined[23] | 2022 | Python library, running on Apache Spark, Apache Beam, or locally | Yes |
| PSI (Ψ): a Private data Sharing Interface | Harvard University Privacy Tools Project[24] | 2016 | | No |
| TopDown Algorithm[25] | United States Census Bureau | 2020 | Production code used in the 2020 US Census | No |

Machine learning

| Name | Developer | Year introduced | Notes | Still maintained? |
|---|---|---|---|---|
| Diffprivlib[26] | IBM[27] | 2019 | Python library | Yes |
| TensorFlow Privacy[28][29] | Google | 2019 | Differentially private training in TensorFlow | Yes |
| Opacus[30] | Meta | 2020 | Differentially private training in PyTorch | Yes |
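TensorFlow Privacy and Opacus both implement variants of differentially private stochastic gradient descent (DP-SGD): each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise calibrated to the clipping norm is added. A framework-free sketch of one update step, for illustration only (the real libraries operate on framework tensors and also track the cumulative privacy budget):

```python
import math
import random

def l2_clip(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > clip_norm:
        return [g * clip_norm / norm for g in grad]
    return grad

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each example's gradient, sum the clipped
    gradients, add Gaussian noise proportional to the clipping norm,
    then apply the averaged noisy gradient."""
    clipped = [l2_clip(g, clip_norm) for g in per_example_grads]
    n = len(per_example_grads)
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, noise_multiplier * clip_norm)
        for i in range(len(params))
    ]
    return [p - lr * s / n for p, s in zip(params, noisy_sum)]
```

Clipping bounds each example's influence on the update, which is what makes the Gaussian noise calibration possible.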

Research projects and prototypes

| Name | Citation | Year published | Notes |
|---|---|---|---|
| PINQ | [31] | 2010 | An API implemented in C# |
| Airavat | [32] | 2010 | A MapReduce-based system implemented in Java, hardened with SELinux-like access control |
| Fuzz | [33] | 2011 | Time-constant implementation in Caml Light of a domain-specific language |
| GUPT | [34] | 2012 | Implementation of the sample-and-aggregate framework |
| EKTELO | [35] | 2018 | A framework and system for answering linear counting queries |
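The sample-and-aggregate framework implemented by GUPT can be sketched as follows. This is an illustrative simplification, not GUPT's actual code: the data is split into disjoint blocks, an arbitrary black-box analysis runs on each block, and the clamped per-block outputs are combined with a Laplace-noised average.

```python
import math
import random

def sample_and_aggregate(records, analysis, num_blocks, lower, upper, epsilon, rng):
    """Sample-and-aggregate: run a black-box analysis on disjoint blocks,
    then combine the per-block outputs with a differentially private mean.

    Each record influences exactly one block, so changing one record moves
    the clamped average by at most (upper - lower) / num_blocks; Laplace
    noise with that magnitude divided by epsilon yields epsilon-DP.
    """
    blocks = [records[i::num_blocks] for i in range(num_blocks)]
    outputs = [min(max(analysis(b), lower), upper) for b in blocks]
    avg = sum(outputs) / num_blocks
    scale = (upper - lower) / (num_blocks * epsilon)
    u = rng.random() - 0.5
    return avg - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

r = random.Random(42)
data = [r.uniform(0, 100) for _ in range(1000)]
mean = lambda xs: sum(xs) / len(xs)
est = sample_and_aggregate(data, mean, num_blocks=20, lower=0, upper=100,
                           epsilon=1.0, rng=random.Random(7))
```

The appeal of the framework is that `analysis` needs no privacy analysis of its own; only the aggregation step must be private.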

Attacks on implementations

In addition to standard defects of software artifacts that can be identified using testing or fuzzing, implementations of differentially private mechanisms may suffer from the following vulnerabilities:

  • Subtle algorithmic or analytical mistakes.[36][37]
  • Timing side-channel attacks.[33] In contrast with timing attacks against implementations of cryptographic algorithms that typically have low leakage rate and must be followed with non-trivial cryptanalysis, a timing channel may lead to a catastrophic compromise of a differentially private system, since a targeted attack can be used to exfiltrate the very bit that the system is designed to hide.
  • Leakage through floating-point arithmetic.[38] Differentially private algorithms are typically presented in the language of probability distributions, which most naturally leads to implementations using floating-point arithmetic. The abstraction of floating-point arithmetic is leaky, and without careful attention to detail, a naive implementation may fail to provide differential privacy. (This is particularly the case for ε-differential privacy, which does not allow any probability of failure, even in the worst case.) For example, the support of a textbook sampler of the Laplace distribution (required, for instance, for the Laplace mechanism) covers less than 80% of all double-precision floating-point numbers; moreover, the supports of distributions with different means are not identical. A single sample from a naive implementation of the Laplace mechanism allows distinguishing between two adjacent datasets with probability more than 35%.
  • Timing channel through floating-point arithmetic.[39] Unlike operations over integers, which are typically constant-time on modern CPUs, floating-point arithmetic exhibits significant input-dependent timing variability.[40] Handling of subnormals can be particularly slow, by as much as 100× compared to the typical case.[41]
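The catastrophic nature of the timing channel can be illustrated with a simulation. The function below is hypothetical, and an operation counter stands in for wall-clock time: even though the returned answer is noised, an input-dependent slow path reveals whether the targeted record is present.

```python
def vulnerable_mean(records, noise):
    """A contrived DP-style query whose *running time* depends on the data.

    The noisy output may be properly protected, but the cost counter
    (a stand-in for observable wall-clock time) differs depending on
    whether a record takes the slow path, leaking membership regardless
    of the noise added to the answer.
    """
    cost = 0
    total = 0.0
    for r in records:
        if r < 0:
            cost += 1000  # "slow path": e.g. extra normalization work
        else:
            cost += 1
        total += r
    return total / len(records) + noise, cost

_, cost_without = vulnerable_mean([10, 20, 30], noise=1.7)
_, cost_with = vulnerable_mean([10, 20, 30, -5], noise=-0.3)
```

An observer comparing the two "running times" distinguishes the adjacent datasets with certainty, which is exactly what differential privacy is meant to prevent.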
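A common mitigation for the floating-point leakage described above is to round the noisy output to a coarse, input-independent grid, the core idea of Mironov's snapping mechanism.[38] The sketch below is a simplification: the full construction also clamps the output range and samples uniformly over the representable doubles.

```python
import math
import random

def naive_laplace_mechanism(true_value, scale, rng):
    """Textbook Laplace mechanism: vulnerable, because the set of
    floating-point outputs reachable from true_value depends on
    true_value itself."""
    u = rng.random() - 0.5
    return true_value - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def snap(x, lam):
    """Round x to the nearest multiple of lam (lam a power of two).

    After snapping, every output lies on the same grid regardless of the
    true value, destroying the low-order-bit structure that a naive
    sampler leaks."""
    return math.floor(x / lam + 0.5) * lam

rng = random.Random(3)
raw = naive_laplace_mechanism(42.0, scale=2.0, rng=rng)
snapped = snap(raw, lam=2.0 ** -20)
```

The grid spacing trades a small, bounded loss of accuracy for outputs whose representable set no longer depends on the private input.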
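The subnormal values implicated in the floating-point timing channel occupy the gap between zero and the smallest positive normal double. A short demonstration of where that range lies:

```python
import sys

# Smallest positive *normal* double-precision value: 2**-1022.
min_normal = sys.float_info.min

# Dividing by a power of two lands in the subnormal (denormal) range,
# which many CPUs handle on a much slower microcoded path.
subnormal = min_normal / 2.0 ** 10

# Subnormals are exactly representable down to 2**-1074, so scaling by
# powers of two in this range incurs no rounding error.
restored = subnormal * 2.0 ** 10
```

A sampler that can emit subnormal noise values therefore makes the running time of downstream arithmetic depend on the sampled noise.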

References

  1. ^ "OnTheMap". onthemap.ces.census.gov. Retrieved 29 March 2023.
  2. ^ Machanavajjhala, Ashwin; Kifer, Daniel; Abowd, John; Gehrke, Johannes; Vilhuber, Lars (April 2008). "Privacy: Theory meets Practice on the Map". 2008 IEEE 24th International Conference on Data Engineering. pp. 277–286. doi:10.1109/ICDE.2008.4497436. ISBN 978-1-4244-1836-7. S2CID 5812674.
  3. ^ Erlingsson, Úlfar. "Learning statistics with privacy, aided by the flip of a coin".
  4. ^ Erlingsson, Úlfar; Pihur, Vasyl; Korolova, Aleksandra (November 2014). "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response". Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp. 1054–1067. arXiv:1407.6981. Bibcode:2014arXiv1407.6981E. doi:10.1145/2660267.2660348. ISBN 9781450329576. S2CID 6855746.
  5. ^ Differential Privacy Team (December 2017). "Learning with Privacy at Scale". Apple Machine Learning Journal. 1 (8). {{cite journal}}: |last1= has generic name (help)
  6. ^ Ding, Bolin; Kulkarni, Janardhan; Yekhanin, Sergey (December 2017). "Collecting Telemetry Data Privately". 31st Conference on Neural Information Processing Systems: 3574–3583. arXiv:1712.01524. Bibcode:2017arXiv171201524D.
  7. ^ Tezapsidis, Katie (Jul 13, 2017). "Uber Releases Open Source Project for Differential Privacy".
  8. ^ Johnson, Noah; Near, Joseph P.; Song, Dawn (January 2018). "Towards Practical Differential Privacy for SQL Queries". Proceedings of the VLDB Endowment. 11 (5): 526–539. arXiv:1706.09479. doi:10.1145/3187009.3177733.
  9. ^ Abowd, John M. (August 2018). "The U.S. Census Bureau Adopts Differential Privacy". Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. p. 2867. doi:10.1145/3219819.3226070. hdl:1813/60392. ISBN 9781450355520. S2CID 51711121.
  10. ^ Rogers, Ryan; Subramaniam, Subbu; Peng, Sean; Durfee, David; Lee, Seunghyun; Santosh Kumar Kancha; Sahay, Shraddha; Ahammad, Parvez (2020). "LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale". arXiv:2002.05839 [cs.CR].
  11. ^ Rogers, Ryan; Adrian Rivera Cardoso; Mancuhan, Koray; Kaura, Akash; Gahlawat, Nikhil; Jain, Neha; Ko, Paul; Ahammad, Parvez (2020). "A Members First Approach to Enabling LinkedIn's Labor Market Insights at Scale". arXiv:2010.13981 [cs.CR].
  12. ^ Aktay, Ahmet; Bavadekar, Shailesh; Cossoul, Gwen; Davis, John; Desfontaines, Damien; Fabrikant, Alex; Gabrilovich, Evgeniy; Gadepalli, Krishna; Gipson, Bryant; Guevara, Miguel; Kamath, Chaitanya; Kansal, Mansi; Lange, Ali; Mandayam, Chinmoy; Oplinger, Andrew; Pluntke, Christopher; Roessler, Thomas; Schlosberg, Arran; Shekel, Tomer; Vispute, Swapnil; Vu, Mia; Wellenius, Gregory; Williams, Brian; Royce J Wilson (2020). "Google COVID-19 Community Mobility Reports: Anonymization Process Description (Version 1.1)". arXiv:2004.04145 [cs.CR].
  13. ^ Rogers, Ryan; Subbu, Subramaniam; Peng, Sean; Durfee, David; Lee, Seunghyun; Kancha, Santosh Kumar; Sahay, Shraddha; Ahammad, Parvez (2020). "LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale". arXiv:2002.05839 [cs.CR].
  14. ^ Pereira, Mayana; Kim, Allen; Allen, Joshua; White, Kevin; Juan Lavista Ferres; Dodhia, Rahul (2021). "U.S. Broadband Coverage Data Set: A Differentially Private Data Release". arXiv:2103.14035 [cs.CR].
  15. ^ "EDP". EDP. Retrieved 29 March 2023.
  16. ^ "Google's differential privacy libraries". GitHub. 3 February 2023.
  17. ^ "Differential-privacy/Privacy-on-beam at main · google/Differential-privacy". GitHub.
  18. ^ "OpenDP". opendp.org. Retrieved 29 March 2023.
  19. ^ "OpenDP Library". GitHub.
  20. ^ "Tumult Analytics". www.tmlt.dev. Retrieved 29 March 2023.
  21. ^ "Tumult Labs | Privacy Protection Redefined". www.tmlt.io. Retrieved 29 March 2023.
  22. ^ "PipelineDP". pipelinedp.io. Retrieved 29 March 2023.
  23. ^ "OpenMined". www.openmined.org. Retrieved 29 March 2023.
  24. ^ Gaboardi, Marco; Honaker, James; King, Gary; Nissim, Kobbi; Ullman, Jonathan; Vadhan, Salil; Murtagh, Jack (June 2016). "PSI (Ψ): a Private data Sharing Interface".
  25. ^ "DAS 2020 Redistricting Production Code Release". GitHub. 22 June 2022.
  26. ^ "Diffprivlib v0.5". GitHub. 17 October 2022.
  27. ^ Holohan, Naoise; Braghin, Stefano; Pól Mac Aonghusa; Levacher, Killian (2019). "Diffprivlib: The IBM Differential Privacy Library". arXiv:1907.02444 [cs.CR].
  28. ^ Radebaugh, Carey; Erlingsson, Ulfar (March 6, 2019). "Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data".
  29. ^ "TensorFlow Privacy". GitHub. 2019-08-09.
  30. ^ "Opacus · Train PyTorch models with Differential Privacy". opacus.ai. Retrieved 29 March 2023.
  31. ^ McSherry, Frank (1 September 2010). "Privacy integrated queries" (PDF). Communications of the ACM. 53 (9): 89–97. doi:10.1145/1810891.1810916. S2CID 52898716.
  32. ^ Roy, Indrajit; Setty, Srinath T.V.; Kilzer, Ann; Shmatikov, Vitaly; Witchel, Emmett (April 2010). "Airavat: Security and Privacy for MapReduce" (PDF). Proceedings of the 7th Usenix Symposium on Networked Systems Design and Implementation (NSDI).
  33. ^ a b Haeberlen, Andreas; Pierce, Benjamin C.; Narayan, Arjun (2011). "Differential Privacy Under Fire". 20th USENIX Security Symposium.
  34. ^ Mohan, Prashanth; Thakurta, Abhradeep; Shi, Elaine; Song, Dawn; Culler, David E. "GUPT: Privacy Preserving Data Analysis Made Easy". Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. pp. 349–360. doi:10.1145/2213836.2213876. S2CID 2135755.
  35. ^ Zhang, Dan; McKenna, Ryan; Kotsogiannis, Ios; Hay, Michael; Machanavajjhala, Ashwin; Miklau, Gerome (June 2018). "EKTELO: A Framework for Defining Differentially-Private Computations". Proceedings of the 2018 International Conference on Management of Data. pp. 115–130. arXiv:1808.03555. doi:10.1145/3183713.3196921. ISBN 9781450347037. S2CID 5033862.
  36. ^ McSherry, Frank (25 February 2018). "Uber's differential privacy .. probably isn't". GitHub.
  37. ^ Lyu, Min; Su, Dong; Li, Ninghui (1 February 2017). "Understanding the sparse vector technique for differential privacy". Proceedings of the VLDB Endowment. 10 (6): 637–648. arXiv:1603.01699. doi:10.14778/3055330.3055331. S2CID 5449336.
  38. ^ Mironov, Ilya (October 2012). "On significance of the least significant bits for differential privacy". Proceedings of the 2012 ACM conference on Computer and communications security (PDF). ACM. pp. 650–661. doi:10.1145/2382196.2382264. ISBN 9781450316514. S2CID 3421585.
  39. ^ Andrysco, Marc; Kohlbrenner, David; Mowery, Keaton; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav (May 2015). "On Subnormal Floating Point and Abnormal Timing". 2015 IEEE Symposium on Security and Privacy. pp. 623–639. doi:10.1109/SP.2015.44. ISBN 978-1-4673-6949-7. S2CID 1903469.
  40. ^ Kohlbrenner, David; Shacham, Hovav (August 2017). "On the Effectiveness of Mitigations Against Floating-point Timing Channels". Proceedings of the 26th USENIX Conference on Security Symposium. USENIX Association: 69–81.
  41. ^ Dooley, Isaac; Kale, Laxmikant (September 2006). "Quantifying the interference caused by subnormal floating-point values" (PDF). Proceedings of the Workshop on Operating System Interference in High Performance Applications.