# Publications

*fingerprinting codes*(Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample-complexity lower bounds into stronger lower bounds.

Bellare, Boldyreva, and O’Neill (CRYPTO ’07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially chosen-plaintext distributions that are *independent* of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme. We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to *adaptively* choose plaintext distributions *after* seeing the public key, in an *interactive* manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by 2 *p* 2p , where *p* can be any predetermined polynomial in the security parameter and plaintext length. For example, with *p*=0 p=0 we capture plaintext distributions that are independent of the public key, and with *p*=*O*(*s*log*s*) p=O(slogs) we capture, in particular, all plaintext distributions that are samplable by circuits of size *s*. Within our framework we present both constructions in the random oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS ’00) and Dodis (Ph.D. Thesis, MIT, ’00).

We study entropy flattening: Given a circuit C_{X} implicitly describing an n-bit source X (namely, X is the output of C_{X } on a uniform random input), construct another circuit C_{Y} describing a source Y such that (1) source Y is nearly flat (uniform on its support), and (2) the Shannon entropy of Y is monotonically related to that of X. The standard solution is to have C_{Y} evaluate C_{X} altogether \((n2)\) times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among black-box constructions: Any circuit C_{Y} for entropy flattening that repeatedly queries CX as an oracle requires (n2) queries. Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [12, 22, 13, 6, 11, 10, 7, 24]. It is also used in reductions between problems complete for statistical zero-knowledge [19, 23, 4, 25]. The [1](n2) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency.

*Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics.*

*These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.*

*This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets.*

*This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.*

Our algorithm combines ideas from time-efficient Laplacian solvers (Spielman and Teng, STOC `04; Peng and Spielman, STOC `14) with ideas used to show that Undirected S-T Connectivity is in deterministic logspace (Reingold, STOC `05 and JACM `08; Rozenman and Vadhan, RANDOM `05).

*o*has the property that any report of player

*i*would have led to

*o*with approximately the same probability, then

*o*has a small privacy cost to player

*i*. We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number

*n*of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of

*n*). Preliminary version on arXiv (2011).

*no reference to interaction or zero knowledge.*

We propose the use of complete problems to unify and extend the study of statistical zero knowledge. To this end, we examine several consequences of our Completeness Theorem and its proof, such as:

- A way to make every (honest-verifier) statistical zero-knowledge proof very communication efficient, with the prover sending only one bit to the verifier (to achieve soundness error 1/2).
- Simpler proofs of many of the previously known results about statistical zero knowledge, such as the Fortnow and Aiello--Håstad upper bounds on the complexity of SZK and Okamoto's result that SZK is closed under complement.
- Strong closure properties of SZK which amount to constructing statistical zero-knowledge proofs for complex assertions built out of simpler assertions already shown to be in SZK.
- New results about the various measures of "knowledge complexity," including a collapse in the hierarchy corresponding to knowledge complexity in the "hint" sense.
- Algorithms for manipulating the statistical difference between efficiently samplable distributions, including transformations which "polarize" and "reverse" the statistical relationship between a pair of distributions.

We model the environment by an unknown directed graph G, and consider the problem of a robot exploring and mapping G. The edges emanating from each vertex are numbered from `1' to `d', but we do not assume that the vertices of G are labeled. Since the robot has no way of distinguishing between vertices, it has no hope of succeeding unless it is given some means of distinguishing between vertices. For this reason we provide the robot with a "pebble" --- a device that it can place on a vertex and use to identify the vertex later.

In this paper we show:

- If the robot knows an upper bound on the number of vertices then it can learn the graph efficiently with only
*one*pebble. - If the robot does not know an upper bound on the number of vertices n, then Theta(loglog n) pebbles are both necessary and sufficient.

We give several efficient transformations for manipulating the statistical difference (variation distance) between a pair of probability distributions. The effects achieved include increasing the statistical difference, decreasing the statistical difference, "polarizing" the statistical relationship, and "reversing" the statistical relationship. We also show that a boolean formula whose atoms are statements about statistical difference can be transformed into a single statement about statistical difference. All of these transformations can be performed in polynomial time, in the sense that, given circuits which sample from the input distributions, it only takes polynomial time to compute circuits which sample from the output distributions.

By our prior work (see FOCS 97), such transformations for manipulating statistical difference are closely connected to results about SZK, the class of languages possessing statistical zero-knowledge proofs. In particular, some of the transformations given in this paper are equivalent to the closure of SZK under complement and under certain types of Turing reductions. This connection is also discussed briefly in this paper.

This report contains a discussion of the difficult problem of key management for multicast communication sessions. It focuses on two main areas of concern with respect to key management, which are, initializing the multicast group with a common net key and rekeying the multicast group. A rekey may be necessary upon the compromise of a user or for other reasons (e.g., periodic rekey). In particular, this report identifies a technique which allows for secure compromise recovery, while also being robust against collusion of excluded users. This is one important feature of multicast key management which has not been addressed in detail by most other multicast key management proposals [1,2,4]. The benefits of this proposed technique are that it minimizes the number of transmissions required to rekey the multicast group and it imposes minimal storage requirements on the multicast group.

*without using computational assumptions or even referring to the complexity of such verifier strategies.*(Previous transformations have either relied on computational assumptions or were applicable only to constant-round public-coin proof systems.)

Our transformation also applies to public-coin (aka Arthur-Merlin) computational zero-knowledge proofs: We transform any Arthur-Merlin proof system which is computational zero-knowledge with respect to the honest-verifier, into an Arthur-Merlin proof system which is computational zero-knowledge with respect to any probabilistic polynomial-time verifier.

A crucial ingredient in our analysis is a new lemma regarding 2-universal hashing functions.