# Publications

2018
Bun, Mark, Jonathan Ullman, and Salil Vadhan.Fingerprinting codes and the price of approximate differential privacy.SIAM Journal on Computing 47, no. 5 (2018): 1888-1938. Publisher's VersionAbstract
We show new information-theoretic lower bounds on the sample complexity of $$(\varepsilon, \delta)$$-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $D \in (\{0,1\}^d)^n$ has the form “What fraction of the individual records in the database satisfy the property $q$?” We show that in order to answer an arbitrary set $\mathcal{Q}$ of $\gg d/\alpha^2$ counting queries on $D$ to within error $\pm \alpha$ it is necessary that $n \geq \tilde{\Omega}({\sqrt{d} \log |\mathcal{Q}|}/{\alpha^2 \varepsilon}).$ This bound is optimal up to polylogarithmic factors, as demonstrated by the private multiplicative weights algorithm (Hardt and Rothblum, FOCS'10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and $(\varepsilon, \delta)$-differential privacy is asymptotically larger than what is required merely for accuracy, which is $O(\log |\mathcal{Q}| / \alpha^2)$. In addition, we show that our lower bound holds for the specific case of $k$-way marginal queries (where $|\mathcal{Q}| = 2^k \binom{d}{k}$) when $\alpha$ is not too small compared to $d$ (e.g., when $\alpha$ is any fixed constant). Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample-complexity lower bounds into stronger lower bounds.

Raghunathan, Ananth, Gil Segev, and Salil P. Vadhan.Deterministic public-key encryption for adaptively-chosen plaintext distributions.” Journal of Cryptology 31, no. 4 (2018): 1012-1063.Abstract

Bellare, Boldyreva, and O’Neill (CRYPTO ’07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially chosen-plaintext distributions that are independent of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme. We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to adaptively choose plaintext distributions after seeing the public key, in an interactive manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by 2 p 2p , where p can be any predetermined polynomial in the security parameter and plaintext length. For example, with p=0 p=0 we capture plaintext distributions that are independent of the public key, and with p=O(slogs) p=O(slogs) we capture, in particular, all plaintext distributions that are samplable by circuits of size s. Within our framework we present both constructions in the random oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS ’00) and Dodis (Ph.D. Thesis, MIT, ’00).

Chen, Yi-Hsiu, Mika Goos, Salil P. Vadhan, and Jiapeng Zhang. “A tight lower bound for entropy flattening.” In 33rd Computational Complexity Conference (CCC 2018), 102:23:21-23:28. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. Publisher's VersionAbstract

We study entropy flattening: Given a circuit CX implicitly describing an n-bit source X (namely, X is the output of C on a uniform random input), construct another circuit CY describing a source Y such that (1) source Y is nearly flat (uniform on its support), and (2) the Shannon entropy of Y is monotonically related to that of X. The standard solution is to have CY evaluate CX altogether $$(n2)$$ times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among black-box constructions: Any circuit CY for entropy flattening that repeatedly queries CX as an oracle requires (n2) queries. Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [12, 22, 13, 6, 11, 10, 7, 24]. It is also used in reductions between problems complete for statistical zero-knowledge [19, 23, 4, 25]. The [1](n2) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency.

Wood, Alexandra, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, James Honaker, Kobbi Nissim, David R. OBrien, Thomas Steinke, and Salil Vadhan. “Differential privacy: A primer for a non-technical audience.Vanderbilt Journal of Entertainment & Technology Law 21, no. 1 (2018): 209-275. Publisher's VersionAbstract

Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics.

These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.

This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets.

This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.

2017
Nissim, Kobbi, Aaron Bembenek, Alexandra Wood, Mark Bun, Marco Gaboardi, Urs Gasser, David O'Brien, Thomas Steinke, and Salil Vad. “Bridging the gap between computer science and legal approaches to privacy.” Harvard Journal of Law & Technology (2017). Publisher's VersionAbstract
The analysis and release of statistical data about individuals and groups of individuals carries inherent privacy risks, and these risks have been conceptualized in different ways within the fields of law and computer science. For instance, many information privacy laws adopt notions of privacy risk that are sector- or context-specific, such as in the case of laws that protect from disclosure certain types of information contained within health, educational, or financial records. In addition, many privacy laws refer to specific techniques, such as deidentification, that are designed to address a subset of possible attacks on privacy. In doing so, many legal standards for privacy protection rely on individual organizations to make case-by-case determinations regarding concepts such as the identifiability of the types of information they hold. These regulatory approaches are intended to be flexible, allowing organizations to (1) implement a variety of specific privacy measures that are appropriate given their varying institutional policies and needs, (2) adapt to evolving best practices, and (3) address a range of privacy-related harms. However, in the absence of clear thresholds and detailed guidance on making case-specific determinations, flexibility in the interpretation and application of such standards also creates uncertainty for practitioners and often results in ad hoc, heuristic processes. This uncertainty may pose a barrier to the adoption of new technologies that depend on unambiguous privacy requirements. It can also lead organizations to implement measures that fall short of protecting against the full range of data privacy risks.
Vadhan., Salil P.On learning vs. refutation.30th Conference on Learning Theory (COLT 17),, 2017, 65, 1835-1848. Publisher's VersionAbstract
Building on the work of Daniely et al. (STOC 2014, COLT 2016), we study the connection between computationally efficient PAC learning and refutation of constraint satisfaction problems. Specifically, we prove that for every concept class $$P$$ , PAC-learning $$P$$ is \em polynomially equivalent to “random-right-hand-side-refuting” (“RRHS-refuting”) a dual class $$P ^∗$$ , where RRHS-refutation of a class $$Q$$ refers to refuting systems of equations where the constraints are (worst-case) functions from the class $$Q$$ but the right-hand-sides of the equations are uniform and independent random bits. The reduction from refutation to PAC learning can be viewed as an abstraction of (part of) the work of Daniely, Linial, and Shalev-Schwartz (STOC 2014). The converse, however, is new, and is based on a combination of techniques from pseudorandomness (Yao ‘82) with boosting (Schapire ‘90). In addition, we show that PAC-learning the class of $$DNF$$ formulas is polynomially equivalent to PAC-learning its dual class $$DNF ^∗$$ , and thus PAC-learning $$DNF$$ is equivalent to RRHS-refutation of $$DNF$$ , suggesting an avenue to obtain stronger lower bounds for PAC-learning $$DNF$$ than the quasipolynomial lower bound that was obtained by Daniely and Shalev-Schwartz (COLT 2016) assuming the hardness of refuting $$k$$-SAT
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan.Derandomization beyond connectivity: Undirected Laplacian systems in nearly logarithmic space.58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 17), 2017. Publisher's VersionAbstract
We give a deterministic $$O ~ (logn)$$  -space algorithm for approximately solving linear systems given by Laplacians of undirected graphs, and consequently also approximating hitting times, commute times, and escape probabilities for undirected graphs. Previously, such systems were known to be solvable by randomized algorithms using $$O(logn)$$ space (Doron, Le Gall, and Ta-Shma, 2017) and hence by deterministic algorithms using$$O(log 3/2 n)$$  space (Saks and Zhou, FOCS 1995 and JCSS 1999).
Our algorithm combines ideas from time-efficient Laplacian solvers (Spielman and Teng, STOC 04; Peng and Spielman, STOC 14) with ideas used to show that Undirected S-T Connectivity is in deterministic logspace (Reingold, STOC 05 and JACM 08; Rozenman and Vadhan, RANDOM 05).
Chen, Yi-Hsiu, Kai-Min Chung, Ching-Yi Lai, Salil P. Vadhan, and Xiaodi Wu.Computational notions of quantum min-entropy.” In Poster presention at QIP 2017 and oral presentation at QCrypt 2017, 2017. Publisher's VersionAbstract
We initiate the study of computational entropy in the quantum setting. We investigate to what extent the classical notions of computational entropy generalize to the quantum setting, and whether quantum analogues of classical theorems hold. Our main results are as follows. (1) The classical Leakage Chain Rule for pseudoentropy can be extended to the case that the leakage information is quantum (while the source remains classical). Specifically, if the source has pseudoentropy at least k  , then it has pseudoentropy at least k−ℓ  conditioned on an ℓ  -qubit leakage. (2) As an application of the Leakage Chain Rule, we construct the first quantum leakage-resilient stream-cipher in the bounded-quantum-storage model, assuming the existence of a quantum-secure pseudorandom generator. (3) We show that the general form of the classical Dense Model Theorem (interpreted as the equivalence between two definitions of pseudo-relative-min-entropy) does not extend to quantum states. Along the way, we develop quantum analogues of some classical techniques (e.g. the Leakage Simulation Lemma, which is proven by a Non-uniform Min-Max Theorem or Boosting). On the other hand, we also identify some classical techniques (e.g. Gap Amplification) that do not work in the quantum setting. Moreover, we introduce a variety of notions that combine quantum information and quantum complexity, and this raises several directions for future work.
2016
Chen, Yiling, Stephen Chong, Ian A. Kash, Tal Moran, and Salil P. Vadhan. “Truthful mechanisms for agents that value privacy.ACM Transactions on Economics and Computation 4, no. 3 (2016). Publisher's VersionAbstract
Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from truthfulness; it is not incorporated in players’ utility functions (and doing so has been shown to lead to nontruthfulness in some cases). In this work, we propose a new, general way of modeling privacy in players’ utility functions. Specifically, we only assume that if an outcome o has the property that any report of player i would have led to o with approximately the same probability, then o has a small privacy cost to player i. We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number n of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of n). Preliminary version on arXiv (2011).
2003
Sahai, Amit, and Salil Vadhan. “A complete problem for statistical zero knowledge.Journal of the ACM 50, no. 2 (2003): 196-249.Abstract
We present the first complete problem for SZK, the class of (promise) problems possessing statistical zero-knowledge proofs (against an honest verifier). The problem, called STATISTICAL DIFFERENCE, is to decide whether two efficiently samplable distributions are either statistically close or far apart. This gives a new characterization of SZK that makes no reference to interaction or zero knowledge.

We propose the use of complete problems to unify and extend the study of statistical zero knowledge. To this end, we examine several consequences of our Completeness Theorem and its proof, such as:

• A way to make every (honest-verifier) statistical zero-knowledge proof very communication efficient, with the prover sending only one bit to the verifier (to achieve soundness error 1/2).
• Simpler proofs of many of the previously known results about statistical zero knowledge, such as the Fortnow and Aiello--Håstad upper bounds on the complexity of SZK and Okamoto's result that SZK is closed under complement.
• Strong closure properties of SZK which amount to constructing statistical zero-knowledge proofs for complex assertions built out of simpler assertions already shown to be in SZK.
• New results about the various measures of "knowledge complexity," including a collapse in the hierarchy corresponding to knowledge complexity in the "hint" sense.
• Algorithms for manipulating the statistical difference between efficiently samplable distributions, including transformations which "polarize" and "reverse" the statistical relationship between a pair of distributions.
2002
Bender, Michael A., Antonio Fernández, Dana Ron, Amit Sahai, and Salil Vadhan. “The power of a pebble: exploring and mapping directed graphs.Information and Computation 176, no. 1 (2002): 1-21.Abstract
Exploring and mapping an unknown environment is a fundamental problem that is studied in a variety of contexts. Many results have focused on finding efficient solutions to restricted versions of the problem. In this paper, we consider a model that makes very limited assumptions about the environment and solve the mapping problem in this general setting.

We model the environment by an unknown directed graph G, and consider the problem of a robot exploring and mapping G. The edges emanating from each vertex are numbered from 1' to `d', but we do not assume that the vertices of G are labeled. Since the robot has no way of distinguishing between vertices, it has no hope of succeeding unless it is given some means of distinguishing between vertices. For this reason we provide the robot with a "pebble" --- a device that it can place on a vertex and use to identify the vertex later.

In this paper we show:

1. If the robot knows an upper bound on the number of vertices then it can learn the graph efficiently with only one pebble.
2. If the robot does not know an upper bound on the number of vertices n, then Theta(loglog n) pebbles are both necessary and sufficient.
In both cases our algorithms are deterministic.
2001
Vadhan, Salil. “The complexity of counting in sparse, regular, and planar graphs.SIAM Journal on Computing 31, no. 2 (2001): 398-427.Abstract
We show that a number of graph-theoretic counting problems remain NP-hard, indeed #P-complete, in very restricted classes of graphs. In particular, we prove that the problems of counting matchings, vertex covers, independent sets, and extremal variants of these all remain hard when restricted to planar bipartite graphs of bounded degree or regular graphs of constant degree. We obtain corollaries about counting cliques in restricted classes of graphs and counting satisfying assignments to restricted classes of monotone 2-CNF formulae. To achieve these results, a new interpolation-based reduction technique which preserves properties such as constant degree is introduced.
1999
Sahai, Amit, and Salil Vadhan. “ Manipulating statistical difference.Randomization Methods in Algorithm Design (DIMACS Workshop, December 1997), volume 43 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science 43 (1999): 251-270.Abstract

We give several efficient transformations for manipulating the statistical difference (variation distance) between a pair of probability distributions. The effects achieved include increasing the statistical difference, decreasing the statistical difference, "polarizing" the statistical relationship, and "reversing" the statistical relationship. We also show that a boolean formula whose atoms are statements about statistical difference can be transformed into a single statement about statistical difference. All of these transformations can be performed in polynomial time, in the sense that, given circuits which sample from the input distributions, it only takes polynomial time to compute circuits which sample from the output distributions.

By our prior work (see FOCS 97), such transformations for manipulating statistical difference are closely connected to results about SZK, the class of languages possessing statistical zero-knowledge proofs. In particular, some of the transformations given in this paper are equivalent to the closure of SZK under complement and under certain types of Turing reductions. This connection is also discussed briefly in this paper.

Wallner, D., E. Harder, and R. Agee. “Key management for multicast: Issues and architectures.Internet RFC 2627, no. June 1999 (1999).Abstract

This report contains a discussion of the difficult problem of key management for multicast communication sessions.  It focuses on two main areas of concern with respect to key management, which are, initializing the multicast group with a common net key and rekeying the multicast group.  A rekey may be necessary upon the compromise of a user or for other reasons (e.g., periodic rekey).  In particular, this report identifies a technique which allows for secure compromise recovery, while also being robust against collusion of excluded users.  This is one important feature of multicast key management which has not been addressed in detail by most other multicast key management proposals [1,2,4].  The benefits of this proposed technique are that it minimizes the number of transmissions required to rekey the multicast group and it imposes minimal storage requirements on the multicast group.

1998
Goldreich, Oded, Amit Sahai, and Salil Vadhan. “Honest-verifier statistical zero-knowledge equals general statistical zero-knowledge.Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC ‘98) (1998): 399-408.Abstract
We show how to transform any interactive proof system which is statistical zero-knowledge with respect to the honest-verifier, into a proof system which is statistical zero-knowledge with respect to any verifier. This is done by limiting the behavior of potentially cheating verifiers, without using computational assumptions or even referring to the complexity of such verifier strategies. (Previous transformations have either relied on computational assumptions or were applicable only to constant-round public-coin proof systems.)

Our transformation also applies to public-coin (aka Arthur-Merlin) computational zero-knowledge proofs: We transform any Arthur-Merlin proof system which is computational zero-knowledge with respect to the honest-verifier, into an Arthur-Merlin proof system which is computational zero-knowledge with respect to any probabilistic polynomial-time verifier.

A crucial ingredient in our analysis is a new lemma regarding 2-universal hashing functions.