Formal Privacy Models and Title 13 (Census Cooperative Agreement CB16ADR0160001)

2023
Alabi, Daniel, and Salil Vadhan. “Differentially private hypothesis testing for linear regression.” Journal of Machine Learning Research 24, no. 361 (2023): 1-50. Publisher's VersionAbstract

Version History: Preliminary versions in NeurIPS '22, posted as arXiv:2206.14449 and presented at TPDP ‘21 (poster), IMS ‘22 (oral), and SEA ‘22 (oral). (Previously published as "Hypothesis testing for differentially private linear regression".

 

Abstract:

In this work, we design differentially private hypothesis tests for the following problems in the general linear model: testing a linear relationship and testing for the presence of mixtures. The majority of our hypothesis tests are based on differentially private versions of the $F$-statistic for the general linear model framework, which are uniformly most powerful unbiased in the non-private setting. We also present other tests for these problems, one of which is based on the differentially private nonparametric tests of Couch, Kazan, Shi, Bray, and Groce (CCS 2019), which is especially suited for the small dataset regime. We show that the differentially private $F$-statistic converges to the asymptotic distribution of its non-private counterpart. As a corollary, the statistical power of the differentially private $F$-statistic converges to the statistical power of the non-private $F$-statistic. Through a suite of Monte Carlo based experiments, we show that our tests achieve desired significance levels and have a high power that approaches the power of the non-private tests as we increase sample sizes or the privacy-loss parameter. We also show when our tests outperform existing methods in the literature.

ArXiv 2022.pdf NeurIPS 2022.pdf JMLR 2023.pdf
2020
Altman, Micah, Kobbi Nissim, Salil Vadhan, and Alexandra Wood. “Designing Access with Differential Privacy.” In Using Administrative Data for Research and Evidence-based Policy – A Handbook, 173-242. Cambridge, United States: Abdul Latif Jameel Poverty Action Lab (J-PAL). 2020. Publisher's VersionAbstract

Webinarhttps://www.youtube.com/watch?v=cOu-sTV8J2M

This chapter explains how administrative data containing personal information can be collected, analyzed, and published in a way that ensures the individuals in the data will be afforded the strong protections of differential privacy.

It is intended as a practical resource for government agencies and research organizations interested in exploring the possibility of implementing tools for differentially private data sharing and analysis. Using intuitive examples rather than the mathematical formalism used in other guides, this chapter introduces the differential privacy definition and the risks it was developed to address. The text employs modern privacy frameworks to explain how to determine whether the use of differential privacy is an appropriate solution in a given setting. It also discusses the design considerations one should take into account when implementing differential privacy. This discussion incorporates a review of real-world implementations, including tools designed for tiered access systems combining differential privacy with other disclosure controls presented in this Handbook, such as consent mechanisms, data use agreements, and secure environments.

Differential privacy technology has passed a preliminary transition from being the subject of academic work to initial implementations by large organizations and high-tech companies that have the expertise to develop and implement customized differentially private methods. With a growing collection of software packages for generating differentially private releases from summary statistics to machine learning models, differential privacy is now transitioning to being usable more widely and by smaller organizations.

J-PAL 2020.pdf
2018
Karwa, Vishesh, and Salil Vadhan. “Finite sample differentially private confidence intervals.” In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), 44:1-44:9. Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. ITCS, 2018. Publisher's VersionAbstract

Version History: Also presented at TPDP 2017. Preliminary version posted as arXiv:1711.03908 [cs.CR].

We study the problem of estimating finite sample confidence intervals of the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee a finite sample coverage, as opposed to an asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set showing that our the parameters are optimal up to polylogarithmic factors.

ITCS2018.pdf ArXiv2017.pdf