# Publications by Year: 2018

**Version History**: Preliminary versions in EUROCRYPT ‘13 and Cryptology ePrint report 2013/125.

Bellare, Boldyreva, and O’Neill (CRYPTO '07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially-chosen plaintext distributions that are *independent* of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme.

We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to *adaptively *choose plaintext distributions *after* seeing the public key, in an *interactive* manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by \(2^p\), where \(p\) can be any predetermined polynomial in the security parameter. For example, with \(p=0\) we capture plaintext distributions that are independent of the public key, and with \(p=0(s \log s)\) we capture, in particular, all plaintext distributions that are samplable by circuits of size \(s\).

Within our framework we present both constructions in the random-oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS '00) and Dodis (PhD Thesis, MIT, '00).

**Version History**: Special Issue on STOC ‘14. Preliminary versions in STOC ‘14 and arXiv:1311.3158 [cs.CR].

We show new information-theoretic lower bounds on the sample complexity of (ε, δ)- differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database \(D ∈ (\{0, 1\}^d)^n\) has the form “What fraction of the individual records in the database satisfy the property \(q\)?” We show that in order to answer an arbitrary set \(Q\) of \(\gg d/ \alpha^2\) counting queries on \(D\) to within error \(±α\) it is necessary that \(n ≥ \tilde{Ω}(\sqrt{d} \log |Q|/α^2ε)\). This bound is optimal up to polylogarithmic factors, as demonstrated by the private multiplicative weights algorithm (Hardt and Rothblum, FOCS’10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and (ε, δ)-differential privacy is asymptotically larger than what is required merely for accuracy, which is \(O(\log |Q|/α^2 )\). In addition, we show that our lower bound holds for the specific case of \(k\)-way marginal queries (where \(|Q| = 2^k \binom{d}{k}\) ) when \(\alpha\) is not too small compared to d (e.g., when \(\alpha\) is any fixed constant). Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO’95; Tardos, STOC’03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample-complexity lower bounds into stronger lower bounds.

**Version History: **Full version posted on CoRR, abs/1507.03113, July 2015. Additional version published in Proceedings of the 13th IACR Theory of Cryptography Conference (TCC '16-A).

In the study of differential privacy, composition theorems (starting with the original paper of Dwork, McSherry, Nissim, and Smith (TCC '06)) bound the degradation of privacy when composing several differentially private algorithms. Kairouz, Oh, and Viswanath (ICML '15) showed how to compute the optimal bound for composing \(k\) arbitrary (\(\epsilon\),\(\delta\))- differentially private algorithms. We characterize the optimal composition for the more general case of \(k\) arbitrary (\(\epsilon_1\) , \(\delta_1\) ), . . . , (\(\epsilon_k\) , \(\delta_k\) )-differentially private algorithms where the privacy parameters may differ for each algorithm in the composition. We show that computing the optimal composition in general is \(\#\)P-complete. Since computing optimal composition exactly is infeasible (unless FP\(=\)\(\#\)P), we give an approximation algorithm that computes the composition to arbitrary accuracy in polynomial time. The algorithm is a modification of Dyer’s dynamic programming approach to approximately counting solutions to knapsack problems (STOC '03).

**Version History: **Also presented at TPDP 2017. Preliminary version posted as arXiv:1711.03908 [cs.CR].

We study the problem of estimating finite sample confidence intervals of the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee a finite sample coverage, as opposed to an asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set showing that our the parameters are optimal up to polylogarithmic factors.

**Version History: **v1, 11 September 2018 https://arxiv.org/abs/1809.04103

Differential privacy is a promising framework for addressing the privacy concerns in sharing sensitive datasets for others to analyze. However differential privacy is a highly technical area and current deployments often require experts to write code, tune parameters, and optimize the trade-off between the privacy and accuracy of statistical releases. For differential privacy to achieve its potential for wide impact, it is important to design usable systems that enable differential privacy to be used by ordinary data owners and analysts. PSI is a tool that was designed for this purpose, allowing researchers to release useful differentially private statistical information about their datasets without being experts in computer science, statistics, or privacy. We conducted a thorough usability study of PSI to test whether it accomplishes its goal of usability by non-experts. The usability test illuminated which features of PSI are most user-friendly and prompted us to improve aspects of the tool that caused confusion. The test also highlighted some general principles and lessons for designing usable systems for differential privacy, which we discuss in depth.

**Version History:** Preliminary version posted as ECCC TR18-119.

We study entropy flattening: Given a circuit \(C_X\) implicitly describing an n-bit source \(X\) (namely, \(X\) is the output of \(C_X \)_{ } on a uniform random input), construct another circuit \(C_Y\) describing a source \(Y\) such that (1) source \(Y\) is nearly *flat* (uniform on its support), and (2) the Shannon entropy of \(Y\) is monotonically related to that of \(X\). The standard solution is to have \(C_Y\) evaluate \(C_X\) altogether \(\Theta(n^2)\) times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among *black-box* constructions: Any circuit \(C_Y\) for entropy flattening that repeatedly queries \(C_X\) as an oracle requires \(\Omega(n^2)\)queries.

Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [12, 22, 13, 6, 11, 10, 7, 24]. It is also used in reductions between problems complete for statistical zero-knowledge [19, 23, 4, 25]. The \(\Theta(n^2)\) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency.

**Version History: **Preliminary version workshopped at PLSC 2017.

Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.

This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets. This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.