# Publications

2016
Chen, Yiling, Stephen Chong, Ian A. Kash, Tal Moran, and Salil P. Vadhan. “Truthful mechanisms for agents that value privacy.ACM Transactions on Economics and Computation 4, no. 3 (2016). Publisher's VersionAbstract
Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from truthfulness; it is not incorporated in players’ utility functions (and doing so has been shown to lead to nontruthfulness in some cases). In this work, we propose a new, general way of modeling privacy in players’ utility functions. Specifically, we only assume that if an outcome o has the property that any report of player i would have led to o with approximately the same probability, then o has a small privacy cost to player i. We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number n of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of n). Preliminary version on arXiv (2011).
2015
O'Brien, David, Jonathan Ullman, Micah Altman, Urs Gasser, Michael Bar-Sinai, Kobbi Nissim, Salil Vadhan, Michael Wojcik, and Alexandra Wood. “Integrating approaches to privacy across the research lifecycle: When is information purely public?Berkman Center Research Publication No. 2015-7, 2015, March. Publisher's VersionAbstract

Version History: Available at SSRN: http://ssrn.com/abstract=2586158.

On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data.

Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.

Bun, Mark, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. “Differentially private release and learning of threshold functions.” In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘15). IEEE, 2015. Publisher's VersionAbstract

Version HistoryFull version posted as arXiv:1504.07553.

We prove new upper and lower bounds on the sample complexity of $$(\varepsilon, \delta)$$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $$c_x$$ over a totally ordered domain $$X$$ evaluates to $$c_x(y)=1$$ if $$y \leq {x}$$, and evaluates to $$0$$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $$(\varepsilon, \delta)$$ differential privacy, showing that the task is impossible over an infinite domain $$X$$, and moreover requires sample complexity $$n \geq \Omega(\log^* |X|)$$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $$n ≤ 2^{(1+o(1)) \log^∗|X|}$$ samples. This improves the previous best upper bound of $$8^{(1+o(1)) \log^∗ |X|}$$(Beimel et al., RANDOM ’13).

Our sample complexity upper and lower bounds also apply to the tasks of learning distri- butions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $$(\varepsilon, \delta)$$ differential privacy and learning without privacy. For properly learning thresholds in $$\ell$$ dimensions, this lower bound extends to $$n ≥ Ω(\ell·\log^∗ |X|)$$.

To obtain our results, we give reductions in both directions from releasing and properly learning thresholds and the simpler interior point problem. Given a database $$D$$ of elements from $$X$$, the interior point problem asks for an element between the smallest and largest elements in $$D$$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.

Chen, Sitan, Thomas Steinke, and Salil P. Vadhan. “Pseudorandomness for read-once, constant-depth circuits.” CoRR, 2015, 1504.04675. Publisher's VersionAbstract

For Boolean functions computed by read-once, depth-D circuits with unbounded fan-in over the de Morgan basis, we present an explicit pseudorandom generator with seed length $$\tilde{O}(\log^{D+1} n)$$. The previous best seed length known for this model was $$\tilde{O}(\log^{D+4} n)$$, obtained by Trevisan and Xue (CCC ‘13) for all of AC0 (not just read-once). Our work makes use of Fourier analytic techniques for pseudorandomness introduced by Reingold, Steinke, and Vadhan (RANDOM ‘13) to show that the generator of Gopalan et al. (FOCS ‘12) fools read-once AC0. To this end, we prove a new Fourier growth bound for read-once circuits, namely that for every $$F : \{0,1\}^n\rightarrow \{0,1\}$$ computed by a read-once, depth-$$D$$ circuit,

$$\left|\hat{F}[s]\right| \leq O\left(\log^{D-1} n\right)^k,$$

where $$\hat{F}$$ denotes the Fourier transform of $$F$$ over $$\mathbb{Z}_2^n$$.

Dwork, Cynthia, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. “Robust traceability from trace amounts.” In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘15), 650-669. IEEE, 2015. Publisher's VersionAbstract

The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: Given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group.

In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in $$\ell_1$$norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population.

The new attack, which is not specific to genomics and which handles Gaussian as well as Bernouilli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014; Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.

2014
Gopalan, Parikshit, Salil Vadhan, and Yuan Zhou. “Locally testable codes and Cayley graphs.” In In Moni Naor, editor, Innovations in Theoretical Computer Science (ITCS ‘14), 81-92. ACM, 2014. Publisher's VersionAbstract

Version History: Full version posted as https://arxiv.org/pdf/1308.5158.pdf.

We give two new characterizations of ($$\mathbb{F}_2$$-linear) locally testable error-correcting codes in terms of Cayley graphs over $$\mathbb{F}^h_2$$ :

1. A locally testable code is equivalent to a Cayley graph over $$\mathbb{F}^h_2$$ whose set of generators is significantly larger than $$h$$ and has no short linear dependencies, but yields a shortest-path metric that embeds into $$\ell_1$$ with constant distortion. This extends and gives a converse to a result of Khot and Naor (2006), which showed that codes with large dual distance imply Cayley graphs that have no low-distortion embeddings into $$\ell_1$$.

2. A locally testable code is equivalent to a Cayley graph over $$\mathbb{F}^h_2$$ that has significantly more than $$h$$ eigenvalues near 1, which have no short linear dependencies among them and which “explain” all of the large eigenvalues. This extends and gives a converse to a recent construction of Barak et al. (2012), which showed that locally testable codes imply Cayley graphs that are small-set expanders but have many large eigenvalues.

Nissim, Kobbi, Salil Vadhan, and David Xiao. “Redrawing the boundaries on purchasing data from privacy-sensitive individuals.” In Moni Naor, editor, Innovations in Theoretical Computer Science (ITCS ‘14), 411-422. ACM, 2014. Publisher's VersionAbstract

Version History: Full version posted as arXiv:1401.4092 [cs.GT].

We prove new positive and negative results concerning the existence of truthful and individually rational mechanisms for purchasing private data from individuals with unbounded and sensitive privacy preferences. We strengthen the impos- sibility results of Ghosh and Roth (EC 2011) by extending it to a much wider class of privacy valuations. In particular, these include privacy valuations that are based on $$(\varepsilon, \delta)$$- differentially private mechanisms for non-zero $$\delta$$, ones where the privacy costs are measured in a per-database manner (rather than taking the worst case), and ones that do not depend on the payments made to players (which might not be observable to an adversary).

To bypass this impossibility result, we study a natural special setting where individuals have monotonic privacy valuations, which captures common contexts where certain values for private data are expected to lead to higher valuations for privacy (e.g. having a particular disease). We give new mechanisms that are individually rational for all players with monotonic privacy valuations, truthful for all players whose privacy valuations are not too large, and accu- rate if there are not too many players with too-large privacy valuations. We also prove matching lower bounds showing that in some respects our mechanism cannot be improved significantly.

Wood, Alexandra, David O'Brien, Micah Altman, Alan Karr, Urs Gasser, Michael Bar-Sinai, Kobbi Nissim, Jonathan Ullman, Salil Vadhan, and Michael Wojcik. “Integrating approaches to privacy across the research lifecycle: Long-term longitudinal studies.” Berkman Center Research Publication No. 2014-12, 2014, July. Publisher's VersionAbstract

Version History: Available at SSRN: http://ssrn.com/abstract=2469848.

On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data.

This workshop report, the first in a series, provides an overview of the long-term longitudinal study use case. Long-term longitudinal studies collect, at multiple points over a long period of time, highly-specific and often sensitive data describing the health, socioeconomic, or behavioral characteristics of human subjects. The value of such studies lies in part in their ability to link a set of behaviors and changes to each individual, but these factors tend to make the combination of observable characteristics associated with each subject unique and potentially identifiable.

Using the research information lifecycle as a framework, this report discusses the defining features of long-term longitudinal studies and the associated challenges for researchers tasked with collecting and analyzing such data while protecting the privacy of human subjects. It also describes the disclosure risks and common legal and technical approaches currently used to manage confidentiality in longitudinal data. Finally, it identifies urgent problems and areas for future research to advance the integration of various methods for preserving confidentiality in research data.

2013
Chung, Kai-Min, Michael Mitzenmacher, and Salil P. Vadhan. “Why simple hash functions work: Exploiting the entropy in a data stream.” Theory of Computing 9 (2013): 897-945. Publisher's VersionAbstract

Version History: Merge of conference papers from SODA ‘08 (with the same title) and RANDOM ‘08 (entitled “Tight Bounds for Hashing Block Sources”).

Hashing is fundamental to many algorithms and data structures widely used in practice. For the theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits (in the length of a data item) to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or $$O$$(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing.

In practice, however, it is commonly observed that simple hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source,” whereby each new data item has some “entropy” given the previous ones. As long as the Rényi entropy per data item is sufficiently large, it turns out that the performance when choosing a hash function from a 2-universal family is essentially the same as for a truly random hash function. We describe results for several sample applications, including linear probing, chained hashing, balanced allocations, and Bloom filters.

Towards developing our results, we prove tight bounds for hashing block sources, determining the entropy required per block for the distribution of hashed values to be close to uniformly distributed.

Haitner, Iftach, Omer Reingold, and Salil Vadhan. “Efficiency improvements in constructing pseudorandom generators from one-way functions.” SIAM Journal on Computing 42, no. 3 (2013): 1405-1430. Publisher's VersionAbstract

Version HistorySpecial Issue on STOC ‘10.

We give a new construction of pseudorandom generators from any one-way function. The construction achieves better parameters and is simpler than that given in the seminal work of Håstad, Impagliazzo, Levin, and Luby [SICOMP ’99]. The key to our construction is a new notion of next-block pseudoentropy, which is inspired by the notion of “in-accessible entropy” recently introduced in [Haitner, Reingold, Vadhan, and Wee, STOC ’09]. An additional advan- tage over previous constructions is that our pseudorandom generators are parallelizable and invoke the one-way function in a non-adaptive manner. Using [Applebaum, Ishai, and Kushilevitz, SICOMP ’06], this implies the existence of pseudorandom generators in NC$$^0$$ based on the existence of one-way functions in NC$$^1$$.

Reshef, Yakir, and Salil Vadhan. “On extractors and exposure-resilient functions for sublogarithmic entropy.” Random Structures & Algorithms 42, no. 3 (2013): 386-401. ArXiv VersionAbstract

Version HistoryPreliminary version posted as arXiv:1003.4029 (Dec. 2010).

We study resilient functions and exposure‐resilient functions in the low‐entropy regime. A resilient function (a.k.a. deterministic extractor for oblivious bit‐fixing sources) maps any distribution on n‐bit strings in which bits are uniformly random and the rest are fixed into an output distribution that is close to uniform. With exposure‐resilient functions, all the input bits are random, but we ask that the output be close to uniform conditioned on any subset of n – k input bits. In this paper, we focus on the case that k is sublogarithmic in n.

We simplify and improve an explicit construction of resilient functions for k sublogarithmic in n due to Kamp and Zuckerman (SICOMP 2006), achieving error exponentially small in krather than polynomially small in k. Our main result is that when k is sublogarithmic in n, the short output length of this construction ($$O(\log k)$$ output bits) is optimal for extractors computable by a large class of space‐bounded streaming algorithms.

Next, we show that a random function is a resilient function with high probability if and only if k is superlogarithmic in n, suggesting that our main result may apply more generally. In contrast, we show that a random function is a static (resp. adaptive) exposure‐resilient function with high probability even if k is as small as a constant (resp. log log n). No explicit exposure‐resilient functions achieving these parameters are known.

Mahmoody, Mohammad, Tal Moran, and Salil Vadhan. “Publicly verifiable proofs of sequential work.” In Innovations in Theoretical Computer Science (ITCS ‘13), 373-388. ACM, 2013. Publisher's VersionAbstract

Version HistoryPreliminary version posted as Cryptology ePrint Archive Report 2011/553, under title “Non-Interactive Time-Stamping and Proofs of Work in the Random Oracle Model”.

We construct a publicly verifiable protocol for proving computational work based on collision- resistant hash functions and a new plausible complexity assumption regarding the existence of “inherently sequential” hash functions. Our protocol is based on a novel construction of time-lock puzzles. Given a sampled “puzzle” $$\mathcal{P} \overset{}\gets \mathbf{D}_n$$, where $$n$$ is the security parameter and $$\mathbf{D}_n$$ is the distribution of the puzzles, a corresponding “solution” can be generated using $$N$$ evaluations of the sequential hash function, where $$N > n$$ is another parameter, while any feasible adversarial strategy for generating valid solutions must take at least as much time as $$\Omega(N)$$ sequential evaluations of the hash function after receiving $$\mathcal{P}$$. Thus, valid solutions constitute a “proof” that $$\Omega(N)$$ parallel time elapsed since $$\mathcal{P}$$ was received. Solutions can be publicly and efficiently verified in time $$\mathrm{poly}(n) \cdot \mathrm{polylog}(N)$$. Applications of these “time-lock puzzles” include noninteractive timestamping of documents (when the distribution over the possible documents corresponds to the puzzle distribution $$\mathbf{D}_n$$) and universally verifiable CPU benchmarks.

Our construction is secure in the standard model under complexity assumptions (collision- resistant hash functions and inherently sequential hash functions), and makes black-box use of the underlying primitives. Consequently, the corresponding construction in the random oracle model is secure unconditionally. Moreover, as it is a public-coin protocol, it can be made non- interactive in the random oracle model using the Fiat-Shamir Heuristic.

Our construction makes a novel use of “depth-robust” directed acyclic graphs—ones whose depth remains large even after removing a constant fraction of vertices—which were previously studied for the purpose of complexity lower bounds. The construction bypasses a recent negative result of Mahmoody, Moran, and Vadhan (CRYPTO ‘11) for time-lock puzzles in the random oracle model, which showed that it is impossible to have time-lock puzzles like ours in the random oracle model if the puzzle generator also computes a solution together with the puzzle.

Rothblum, Guy N., Salil Vadhan, and Avi Wigderson. “Interactive proofs of proximity: delegating computation in sublinear time.” In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC ‘13), 793-802. New York, NY: ACM, 2013. Publisher's VersionAbstract

We study interactive proofs with sublinear-time verifiers. These proof systems can be used to ensure approximate correctness for the results of computations delegated to an untrusted server. Following the literature on property testing, we seek proof systems where with high probability the verifier accepts every input in the language, and rejects every input that is far from the language. The verifier’s query complexity (and computation complexity), as well as the communication, should all be sublinear. We call such a proof system an Interactive Proof of Proximity (IPP).

• On the positive side, our main result is that all languages in $$\mathcal{NC}$$ have Interactive Proofs of Proximity with roughly $$\sqrt{n}$$ query and communication and complexities, and $$\mathrm{polylog} (n)$$ communication rounds.

This is achieved by identifying a natural language, membership in an affine subspace (for a structured class of subspaces), that is complete for constructing interactive proofs of proximity, and providing efficient protocols for it. In building an IPP for this complete language, we show a tradeoff between the query and communication complexity and the number of rounds. For example, we give a 2-round protocol with roughly $$n^{3/4}$$ queries and communication.

• On the negative side, we show that there exist natural languages in $$\mathcal{NC}^1$$, for which the sum of queries and communication in any constant-round interactive proof of proximity must be polynomially related to n. In particular, for any 2-round protocol, the sum of queries and communication must be at least $$\tilde{\Omega}(\sqrt{n})$$.

• Finally, we construct much better IPPs for specific functions, such as bipartiteness on random or well-mixing graphs, and the majority function. The query complexities of these protocols are provably better (by exponential or polynomial factors) than what is possible in the standard property testing model, i.e. without a prover.

Vadhan, Salil, and Colin Jia Zheng. “A uniform min-max theorem with applications in cryptography.” In Ran Canetti and Juan Garay, editors, Advances in Cryptology—CRYPTO ‘13, Lecture Notes on Computer Science, 8042:93-110. Springer Verlag, Lecture Notes in Computer Science, 2013. Publisher's VersionAbstract
Version History:
Full version published on ECCC2013 and IACR ePrint 2013.

We present a new, more constructive proof of von Neumann’s Min-Max Theorem for two-player zero-sum game — specifically, an algorithm that builds a near-optimal mixed strategy for the second player from several best-responses of the second player to mixed strategies of the first player. The algorithm extends previous work of Freund and Schapire (Games and Economic Behavior ’99) with the advantage that the algorithm runs in poly$$(n)$$ time even when a pure strategy for the first player is a distribution chosen from a set of distributions over $$\{0,1\}^n$$. This extension enables a number of additional applications in cryptography and complexity theory, often yielding uniform security versions of results that were previously only proved for nonuniform security (due to use of the non-constructive Min-Max Theorem).

We describe several applications, including a more modular and improved uniform version of Impagliazzo’s Hardcore Theorem (FOCS ’95), showing impossibility of constructing succinct non-interactive arguments (SNARGs) via black-box reductions under uniform hardness assumptions (using techniques from Gentry and Wichs (STOC ’11) for the nonuniform setting), and efficiently simulating high entropy distributions within any sufficiently nice convex set (extending a result of Trevisan, Tulsiani and Vadhan (CCC ’09)).

Reingold, Omer, Thomas Steinke, and Salil Vadhan. “Pseudorandomness for regular branching programs via Fourier analysis.” In Sofya Raskhodnikova and José Rolim, editors, Proceedings of the 17th International Workshop on Randomization and Computation (RANDOM ‘13), Lecture Notes in Computer Science, 8096:655-670. Springer-Verlag, 2013. Publisher's VersionAbstract

Version HistoryFull version posted as ECCC TR13-086 and arXiv:1306.3004 [cs.CC].

We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is $$O(\log^2n)$$, where $$n$$ is the length of the branching program. The previous best seed length known for this model was $$n^{1/2+o(1)}$$, which follows as a special case of a generator due to Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of $$s^{1/2+o(1)}$$ for arbitrary branching programs of size $$s$$). Our techniques also give seed length $$n^{1/2+o(1)}$$ for general oblivious, read-once branching programs of width $$2^{n^{o(1)}}$$) , which is incomparable to the results of Impagliazzo et al.

Our pseudorandom generator is similar to the one used by Gopalan et al. (FOCS 2012) for read-once CNFs, but the analysis is quite different; ours is based on Fourier analysis of branching programs. In particular, we show that an oblivious, read-once, regular branching program of width $$w$$ has Fourier mass at most $$(2w^2)^k$$ at level $$k$$, independent of the length of the program.

2012
Schoenebeck, Grant, and Salil Vadhan. “The computational complexity of Nash equilibria in concisely represented games.” ACM Transactions on Computation Theory 4, no. 2 (2012). Publisher's VersionAbstract

Version History: Preliminary versions as ECCC TOR05-52 (https://eccc.weizmann.ac.il/report/2005/052/; attached as ECCC2005.pdf) and in EC '06 (https://dl.acm.org/doi/10.1145/1134707.1134737; attached as EC2006.pdf).

Games may be represented in many different ways, and different representations of games affect the complexity of problems associated with games, such as finding a Nash equilibrium. The traditional method of representing a game is to explicitly list all the payoffs, but this incurs an exponential blowup as the number of agents grows. We study two models of concisely represented games: circuit games, where the payoffs are computed by a given boolean circuit, and graph games, where each agent’s payoff is a function of only the strategies played by its neighbors in a given graph. For these two models, we study the complexity of four questions: determining if a given strategy is a Nash equilibrium, finding a Nash equilibrium, determining if there exists a pure Nash equilibrium, and determining if there exists a Nash equilibrium in which the payoffs to a player meet some given guarantees. In many cases, we obtain tight results, showing that the problems are complete for various complexity classes.

Dodis, Yevgeniy, Thomas Ristenpart, and Salil Vadhan. “Randomness condensers for efficiently samplable, seed-dependent sources.” In Ronald Cramer, editor, Proceedings of the 9th IACR Theory of Cryptography Conference (TCC ‘12), Lecture Notes on Computer Science, 7194:618-635. Springer-Verlag, 2012. Publisher's VersionAbstract

We initiate a study of randomness condensers for sources that are efficiently samplable but may depend on the seed of the condenser. That is, we seek functions $$\mathsf{Cond} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$$such that if we choose a random seed $$S \gets \{0,1\}^d$$, and a source $$X = \mathcal{A}(S)$$ is generated by a randomized circuit $$\mathcal{A}$$ of size $$t$$ such that $$X$$ has min- entropy at least $$k$$ given $$S$$, then $$\mathsf{Cond}(X ; S)$$ should have min-entropy at least some $$k'$$ given $$S$$. The distinction from the standard notion of randomness condensers is that the source $$X$$ may be correlated with the seed $$S$$ (but is restricted to be efficiently samplable). Randomness extractors of this type (corresponding to the special case where $$k' = m$$) have been implicitly studied in the past (by Trevisan and Vadhan, FOCS ‘00).

We show that:

• Unlike extractors, we can have randomness condensers for samplable, seed-dependent sources whose computational complexity is smaller than the size $$t$$ of the adversarial sampling algorithm $$\mathcal{A}$$. Indeed, we show that sufficiently strong collision-resistant hash functions are seed-dependent condensers that produce outputs with min-entropy $$k' = m – \mathcal{O}(\log t)$$, i.e. logarithmic entropy deficiency.

• Randomness condensers suffice for key derivation in many cryptographic applications: when an adversary has negligible success probability (or negligible “squared advantage” [3]) for a uniformly random key, we can use instead a key generated by a condenser whose output has logarithmic entropy deficiency.

• Randomness condensers for seed-dependent samplable sources that are robust to side information generated by the sampling algorithm imply soundness of the Fiat-Shamir Heuristic when applied to any constant-round, public-coin interactive proof system.

Vadhan, Salil, and Colin Jia Zheng. “Characterizing pseudoentropy and simplifying pseudorandom generator constructions.” In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC ‘12), 817-836. ACM, 2012. Publisher's VersionAbstract

Version HistoryFull version posted as ECCC TR11-141.

We provide a characterization of pseudoentropy in terms of hardness of sampling: Let ($$X, B$$) be jointly distributed random variables such that $$B$$ takes values in a polynomial-sized set. We show that $$B$$ is computationally indistinguishable from a random variable of higher Shannon entropy given $$X$$ if and only if there is no probabilistic polynomial-time $$S$$ such that $$(X, S(X))$$ has small KL divergence from $$(X, B)$$. This can be viewed as an analogue of the Impagliazzo Hard- core Theorem (FOCS ‘95) for Shannon entropy (rather than min-entropy).

Using this characterization, we show that if $$f$$ is a one-way function, then $$(f(U_n), U_n)$$ has “next-bit pseudoentropy” at least $$n + \log n$$, establishing a conjecture of Haitner, Reingold, and Vadhan (STOC ‘10). Plugging this into the construction of Haitner et al., this yields a simpler construction of pseudorandom generators from one-way functions. In particular, the construction only performs hashing once, and only needs the hash functions that are randomness extractors (e.g. universal hash functions) rather than needing them to support “local list-decoding” (as in the Goldreich–Levin hardcore predicate, STOC ‘89).

With an additional idea, we also show how to improve the seed length of the pseudorandom generator to $$\tilde{O}(n^3)$$, compared to $$\tilde{O}(n^4)$$ in the construction of Haitner et al.

Thaler, Justin, Jonathan Ullman, and Salil Vadhan. “Faster algorithms for privately releasing marginals.” In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Proceedings of the 39th International Colloquium on Automata, Languages, and Programming (ICALP ‘12), Lecture Notes on Computer Science, 7391:810-821. Springer-Verlag, 2012. Publisher's VersionAbstract

Version History: Full version posted as arXiv:1205.1758v2.

We study the problem of releasing k-way marginals of a database $$D ∈ ({0,1}^d)^n$$, while preserving differential privacy. The answer to a $$k$$-way marginal query is the fraction of $$D$$’s records $$x \in \{0,1\}^d$$ with a given value in each of a given set of up to $$k$$ columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et. al., PODS ’07).

We give an algorithm that runs in time $$d^{O(\sqrt{K})}$$ and releases a private summary capable of answering any $$k$$-way marginal query with at most ±.01 error on every query as long as $$n \geq d^{O(\sqrt{K})}$$. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of k-way marginal queries, which is $$d^{\Theta(k)}$$ (for $$k ≪ d$$).

Dodis, Yevgeniy, Adriana López-Alt, Ilya Mironov, and Salil Vadhan. “Differential privacy with imperfect randomness.” In Ran Canetti and Rei Safavi-Naini, editors, Proceedings of the 32nd International Cryptology Conference (CRYPTO ‘12), Lecture Notes on Computer Science, 7417:497-516. Springer-Verlag, 2012. Publisher's VersionAbstract

In this work we revisit the question of basing cryptography on imperfect randomness. Bosley and Dodis (TCC’07) showed that if a source of randomness $$\mathcal{R}$$ is “good enough” to generate a secret key capable of encrypting $$k$$ bits, then one can deterministically extract nearly $$k$$ almost uniform bits from $$\mathcal{R}$$, suggesting that traditional privacy notions (namely, indistinguishability of encryption) requires an “extractable” source of randomness. Other, even stronger impossibility results are known for achieving privacy under specific “non-extractable” sources of randomness, such as the $$\gamma$$-Santha-Vazirani (SV) source, where each next bit has fresh entropy, but is allowed to have a small bias $$\gamma < 1$$ (possibly depending on prior bits).

We ask whether similar negative results also hold for a more recent notion of privacy called differential privacy (Dwork et al., TCC’06), concentrating, in particular, on achieving differential privacy with the Santha-Vazirani source. We show that the answer is no. Specifically, we give a differentially private mechanism for approximating arbitrary “low sensitivity” functions that works even with randomness coming from a $$\gamma$$-Santha-Vazirani source, for any $$\gamma < 1$$. This provides a somewhat surprising “separation” between traditional privacy and differential privacy with respect to imperfect randomness.

Interestingly, the design of our mechanism is quite different from the traditional “additive-noise” mechanisms (e.g., Laplace mechanism) successfully utilized to achieve differential privacy with perfect randomness. Indeed, we show that any (non-trivial) “SV-robust” mechanism for our problem requires a demanding property called consistent sampling, which is strictly stronger than differential privacy, and cannot be satisfied by any additive-noise mechanism.