Pseudorandomness and Combinatorial Constructions (US-Israel Binational Science Foundation; BSF 2002246)

Chung, Kai-Min, Michael Mitzenmacher, and Salil P. Vadhan. “Why simple hash functions work: Exploiting the entropy in a data stream.” Theory of Computing 9 (2013): 897-945. Publisher's VersionAbstract

Version History: Merge of conference papers from SODA ‘08 (with the same title) and RANDOM ‘08 (entitled “Tight Bounds for Hashing Block Sources”).

Hashing is fundamental to many algorithms and data structures widely used in practice. For the theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits (in the length of a data item) to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or \(O\)(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing.

In practice, however, it is commonly observed that simple hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source,” whereby each new data item has some “entropy” given the previous ones. As long as the Rényi entropy per data item is sufficiently large, it turns out that the performance when choosing a hash function from a 2-universal family is essentially the same as for a truly random hash function. We describe results for several sample applications, including linear probing, chained hashing, balanced allocations, and Bloom filters.

Towards developing our results, we prove tight bounds for hashing block sources, determining the entropy required per block for the distribution of hashed values to be close to uniformly distributed.


THEORYCOMP2013.pdf RANDOM2008.pdf SODA2008.pdf
Haitner, Iftach, Omer Reingold, and Salil Vadhan. “Efficiency improvements in constructing pseudorandom generators from one-way functions.” SIAM Journal on Computing 42, no. 3 (2013): 1405-1430. Publisher's VersionAbstract

Version HistorySpecial Issue on STOC ‘10.

We give a new construction of pseudorandom generators from any one-way function. The construction achieves better parameters and is simpler than that given in the seminal work of Håstad, Impagliazzo, Levin, and Luby [SICOMP ’99]. The key to our construction is a new notion of next-block pseudoentropy, which is inspired by the notion of “in-accessible entropy” recently introduced in [Haitner, Reingold, Vadhan, and Wee, STOC ’09]. An additional advan- tage over previous constructions is that our pseudorandom generators are parallelizable and invoke the one-way function in a non-adaptive manner. Using [Applebaum, Ishai, and Kushilevitz, SICOMP ’06], this implies the existence of pseudorandom generators in NC\(^0\) based on the existence of one-way functions in NC\(^1\).

SIAM2013.pdf STOC2010.pdf
Reshef, Yakir, and Salil Vadhan. “On extractors and exposure-resilient functions for sublogarithmic entropy.” Random Structures & Algorithms 42, no. 3 (2013): 386-401. ArXiv VersionAbstract

Version HistoryPreliminary version posted as arXiv:1003.4029 (Dec. 2010).

We study resilient functions and exposure‐resilient functions in the low‐entropy regime. A resilient function (a.k.a. deterministic extractor for oblivious bit‐fixing sources) maps any distribution on n‐bit strings in which bits are uniformly random and the rest are fixed into an output distribution that is close to uniform. With exposure‐resilient functions, all the input bits are random, but we ask that the output be close to uniform conditioned on any subset of n – k input bits. In this paper, we focus on the case that k is sublogarithmic in n.

We simplify and improve an explicit construction of resilient functions for k sublogarithmic in n due to Kamp and Zuckerman (SICOMP 2006), achieving error exponentially small in krather than polynomially small in k. Our main result is that when k is sublogarithmic in n, the short output length of this construction (\(O(\log k)\) output bits) is optimal for extractors computable by a large class of space‐bounded streaming algorithms.

Next, we show that a random function is a resilient function with high probability if and only if k is superlogarithmic in n, suggesting that our main result may apply more generally. In contrast, we show that a random function is a static (resp. adaptive) exposure‐resilient function with high probability even if k is as small as a constant (resp. log log n). No explicit exposure‐resilient functions achieving these parameters are known. 


ArXiv2010.pdf RANDOM2013.pdf
Kamp, Jesse, Anup Rao, Salil Vadhan, and David Zuckerman. “Deterministic extractors for small-space sources.” Journal of Computer and System Sciences 77, no. 1 (2011): 191-220. Publisher's VersionAbstract

Version History: Special issue to celebrate Richard Karp's Kyoto Prize. Extended abstract in STOC '06.

We give polynomial-time, deterministic randomness extractors for sources generated in small space, where we model space \(s\) sources on\(\{0,1\}^n\) as sources generated by width \(2^s\) branching programs. Specifically, there is a constant \(η > 0\) such that for any \(ζ > n^{−η}\), our algorithm extracts \(m = (δ − ζ)n\) bits that are exponentially close to uniform (in variation distance) from space \(s\) sources with min-entropy \(δn\), where \(s = Ω(ζ^ 3n)\). Previously, nothing was known for \(δ \ll 1/2,\), even for space \(0\). Our results are obtained by a reduction to the class of total-entropy independent sources. This model generalizes both the well-studied models of independent sources and symbol-fixing sources. These sources consist of a set of \(r\) independent smaller sources over \(\{0, 1\}^\ell\), where the total min-entropy over all the smaller sources is \(k\). We give deterministic extractors for such sources when \(k\) is as small as \(\mathrm{polylog}(r)\), for small enough \(\ell\).


JCSS2011.pdf STOC-05-2006.pdf
Chung, Kai-Min, Omer Reingold, and Salil Vadhan. “S-T connectivity on digraphs with a known stationary distribution.” In ACM Transactions on Algorithms. Vol. 7. 3rd ed. ACM, 2011. Publisher's VersionAbstract

Version history: Preliminary versions in CCC '07 and on ECCC (TR07-030).

We present a deterministic logspace algorithm for solving S-T Connectivity on directed graphs if: (i) we are given a stationary distribution of the random walk on the graph in which both of the input vertices \(s\) and \(t\) have nonnegligible probability mass and (ii) the random walk which starts at the source vertex \(s\) has polynomial mixing time. This result generalizes the recent deterministic logspace algorithm for S-T Connectivity on undirected graphs [Reingold, 2008]. It identifies knowledge of the stationary distribution as the gap between the S-T Connectivity problems we know how to solve in logspace (L) and those that capture all of randomized logspace (RL).


ECCC2007.pdf ACM2011.pdf
Haitner, Iftach, Thomas Holenstein, Omer Reingold, Salil Vadhan, and Hoeteck Wee. “Universal one-way hash functions via inaccessible entropy.” In Henri Gilbert, editor, Advances in Cryptology—EUROCRYPT ‘10, Lecture Notes on Computer Science, 6110:616-637. Springer-Verlag, 2010. Publisher's VersionAbstract

This paper revisits the construction of Universal One-Way Hash Functions (UOWHFs) from any one-way function due to Rompel (STOC 1990). We give a simpler construction of UOWHFs, which also obtains better efficiency and security. The construction exploits a strong connection to the recently introduced notion of inaccessible entropy (Haitner et al. STOC 2009). With this perspective, we observe that a small tweak of any one-way function \(f\) is already a weak form of a UOWHF: Consider \(F(x', i)\) that outputs the \(i\)-bit long prefix of \(f(x)\). If \(F\) were a UOWHF then given a random \(x\) and \(i\) it would be hard to come up with \(x' \neq x\) such that \(F(x, i) = F(x', i)\). While this may not be the case, we show (rather easily) that it is hard to sample \(x'\) with almost full entropy among all the possible such values of \(x'\). The rest of our construction simply amplifies and exploits this basic property.

With this and other recent works, we have that the constructions of three fundamental cryptographic primitives (Pseudorandom Generators, Statistically Hiding Commitments and UOWHFs) out of one-way functions are to a large extent unified. In particular, all three constructions rely on and manipulate computational notions of entropy in similar ways. Pseudorandom Generators rely on the well-established notion of pseudoentropy, whereas Statistically Hiding Commitments and UOWHFs rely on the newer notion of inaccessible entropy.

McGregor, Andrew, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. “The limits of two-party differential privacy.” In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘10), 81-90. IEEE, 2010. Publisher's VersionAbstract

Version History and Errata: Subsequent version published in ECCC 2011. Proposition 8 and Part (b) of Theorem 13 in the FOCS version are incorrect, and are removed from the ECCC version.


We study differential privacy in a distributed setting where two parties would like to perform analysis of their joint data while preserving privacy for both datasets. Our results imply almost tight lower bounds on the accuracy of such data analyses, both for specific natural functions (such as Hamming distance) and in general. Our bounds expose a sharp contrast between the two-party setting and the simpler client-server setting (where privacy guarantees are one-sided). In addition, those bounds demonstrate a dramatic gap between the accuracy that can be obtained by differentially private data analysis versus the accuracy obtainable when privacy is relaxed to a computational variant of differential privacy. The first proof technique we develop demonstrates a connection between differential privacy and deterministic extraction from Santha-Vazirani sources. A second connection we expose indicates that the ability to approximate a function by a low-error differentially private protocol is strongly related to the ability to approximate it by a low communication protocol. (The connection goes in both directions).

FOCS2010.pdf ECCC2011.pdf
Guruswami, Venkatesan, Christopher Umans, and Salil Vadhan. “Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes.” Journal of the ACM 56, no. 4 (2009): 1–34. Publisher's VersionAbstract

Version History: Preliminary versions of this article appeared as Technical Report TR06-134 in Electronic Colloquium on Computational Complexity, 2006, and in Proceedings of the 22nd Annual IEEE Conference on Computional Complexity (CCC '07), pp. 96–108. Preliminary version recipient of Best Paper Award at CCC '07.

We give an improved explicit construction of highly unbalanced bipartite expander graphs with expansion arbitrarily close to the degree (which is polylogarithmic in the number of vertices). Both the degree and the number of right-hand vertices are polynomially close to optimal, whereas the previous constructions of Ta-Shma et al. [2007] required at least one of these to be quasipolynomial in the optimal. Our expanders have a short and self-contained description and analysis, based on the ideas underlying the recent list-decodable error-correcting codes of Parvaresh and Vardy [2005].

Our expanders can be interpreted as near-optimal “randomness condensers,” that reduce the task of extracting randomness from sources of arbitrary min-entropy rate to extracting randomness from sources of min-entropy rate arbitrarily close to 1, which is a much easier task. Using this connection, we obtain a new, self-contained construction of randomness extractors that is optimal up to constant factors, while being much simpler than the previous construction of Lu et al. [2003] and improving upon it when the error parameter is small (e.g., 1/poly(n)).

JACM2009.pdf CCC2007.pdf ECCC2006.pdf ECCC - Rev. 2008.pdf
Haitner, Iftach, Minh Nguyen, Shien Jin Ong, Omer Reingold, and Salil Vadhan. “Statistically hiding commitments and statistical zero-knowledge arguments from any one-way function.” SIAM Journal on Computing 39, no. 3 (2009): 1153-1218. Publisher's VersionAbstract

Version HistorySpecial Issue on STOC ‘07. Merge of papers from FOCS ‘06 and STOC ‘07. Received SIAM Outstanding Paper Prize 2011.

We give a construction of statistically hiding commitment schemes (those in which the hiding property holds against even computationally unbounded adversaries) under the minimal complexity assumption that one-way functions exist. Consequently, one-way functions suffice to give statistical zero-knowledge arguments for any NP statement (whereby even a computationally unbounded adversarial verifier learns nothing other than the fact that the assertion being proven is true, and no polynomial-time adversarial prover can convince the verifier of a false statement). These results resolve an open question posed by Naor et al. [J. Cryptology, 11 (1998), pp. 87–108].

Haitner, Iftach, Omer Reingold, Salil Vadhan, and Hoeteck Wee. “Inaccessible entropy.” In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC ‘09), 611-620. ACM, 2009. Publisher's VersionAbstract

We put forth a new computational notion of entropy, which measures the (in)feasibility of sampling high entropy strings that are consistent with a given protocol. Specifically, we say that the \(i\)’th round of a protocol \((\mathsf{A,B})\) has accessible entropy at most \(k\), if no polynomial-time strategy \(\mathsf{A}^*\) can generate messages for \(\mathsf{A}\) such that the entropy of its message in the \(i\)’th round has entropy greater than \(k\) when conditioned both on prior messages of the protocol and on prior coin tosses of \(\mathsf{A}^*\). We say that the protocol has inaccessible entropy if the total accessible entropy (summed over the rounds) is noticeably smaller than the real entropy of \(\mathsf{A}\)’s messages, conditioned only on prior messages (but not the coin tosses of \(\mathsf{A}\)). As applications of this notion, we

  • Give a much simpler and more efficient construction of statistically hiding commitment schemes from arbitrary one- way functions.

  • Prove that constant-round statistically hiding commitments are necessary for constructing constant-round zero-knowledge proof systems for NP that remain secure under parallel composition (assuming the existence of one-way functions).

Mironov, Ilya, Omkant Pandey, Omer Reingold, and Salil Vadhan. “Computational differential privacy.” In S. Halevi, editor, Advances in Cryptology—CRYPTO ‘09, Lecture Notes in Computer Science, 5677:126-142. Springer-Verlag, 2009. Publisher's VersionAbstract

The definition of differential privacy has recently emerged as a leading standard of privacy guarantees for algorithms on statistical databases. We offer several relaxations of the definition which require privacy guarantees to hold only against efficient—i.e., computationally-bounded—adversaries. We establish various relationships among these notions, and in doing so, we observe their close connection with the theory of pseudodense sets by Reingold et al.[1]. We extend the dense model theorem of Reingold et al. to demonstrate equivalence between two definitions (indistinguishability-and simulatability-based) of computational differential privacy.

Our computational analogues of differential privacy seem to allow for more accurate constructions than the standard information-theoretic analogues. In particular, in the context of private approximation of the distance between two vectors, we present a differentially-private protocol for computing the approximation, and contrast it with a substantially more accurate protocol that is only computationally differentially private.

Lovett, Shachar, Omer Reingold, Luca Trevisan, and Salil Vadhan. “Pseudorandom bit generators that fool modular sums.” In Proceedings of the 13th International Workshop on Randomization and Computation (RANDOM ‘09), Lecture Notes in Computer Science, 5687:615-630. Springer-Verlag, 2009. Publisher's VersionAbstract

We consider the following problem: for given \(n, M,\) produce a sequence \(X_1, X_2, . . . , X_n\) of bits that fools every linear test modulo \(M\). We present two constructions of generators for such sequences. For every constant prime power \(M\), the first construction has seed length \(O_M(\log(n/\epsilon))\), which is optimal up to the hidden constant. (A similar construction was independently discovered by Meka and Zuckerman [MZ]). The second construction works for every \(M, n,\) and has seed length \(O(\log n + \log(M/\epsilon) \log( M \log(1/\epsilon)))\).

The problem we study is a generalization of the problem of constructing small bias distributions [NN], which are solutions to the \(M=2\) case. We note that even for the case \(M=3\) the best previously known con- structions were generators fooling general bounded-space computations, and required \(O(\log^2 n)\) seed length.

For our first construction, we show how to employ recently constructed generators for sequences of elements of \(\mathbb{Z}_M\) that fool small-degree polynomials modulo \(M\). The most interesting technical component of our second construction is a variant of the derandomized graph squaring operation of [RV]. Our generalization handles a product of two distinct graphs with distinct bounds on their expansion. This is then used to produce pseudorandom walks where each step is taken on a different regular directed graph (rather than pseudorandom walks on a single regular directed graph as in [RTV, RV]).

Ong, Shien Jin, and Salil Vadhan. “An equivalence between zero knowledge and commitments.” In R. Canetti, editor, Proceedings of the Third Theory of Cryptography Conference (TCC ‘08), 4948:482-500. Springer Verlag, Lecture Notes in Computer Science, 2008. Publisher's VersionAbstract

We show that a language in NP has a zero-knowledge protocol if and only if the language has an “instance-dependent” commitment scheme. An instance-dependent commitment schemes for a given language is a commitment scheme that can depend on an instance of the language, and where the hiding and binding properties are required to hold only on the yes and no instances of the language, respectively.

The novel direction is the only if direction. Thus, we confirm the widely held belief that commitments are not only sufficient for zero knowledge protocols, but necessary as well. Previous results of this type either held only for restricted types of protocols or languages, or used nonstandard relaxations of (instance-dependent) commitment schemes.

Gutfreund, Dan, and Salil Vadhan. “Limitations on hardness vs. randomness under uniform reductions.” In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM ‘08), Lecture Notes in Computer Science, 5171:469-482. Springer-Verlag, 2008. Publisher's VersionAbstract

We consider (uniform) reductions from computing a function \({f}\) to the task of distinguishing the output of some pseudorandom generator \({G}\) from uniform. Impagliazzo and Wigderson [10] and Trevisan and Vadhan [24] exhibited such reductions for every function \({f}\) in PSPACE. Moreover, their reductions are “black box,” showing how to use any distinguisher \({T}\), given as oracle, in order to compute \({f}\) (regardless of the complexity of \({T}\) ). The reductions are also adaptive, but with the restriction that queries of the same length do not occur in different levels of adaptivity. Impagliazzo and Wigderson [10] also exhibited such reductions for every function \({f}\) in EXP, but those reductions are not black-box, because they only work when the oracle \({T}\) is computable by small circuits.

Our main results are that:

– Nonadaptive black-box reductions as above can only exist for functions \({f}\) in BPPNP (and thus are unlikely to exist for all of PSPACE).

– Adaptive black-box reductions, with the same restriction on the adaptivity as above, can only exist for functions \({f}\) in PSPACE (and thus are unlikely to exist for all of EXP).

Beyond shedding light on proof techniques in the area of hardness vs. randomness, our results (together with [10,24]) can be viewed in a more general context as identifying techniques that overcome limitations of black-box reductions, which may be useful elsewhere in complexity theory (and the foundations of cryptography).

Reingold, Omer, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. “Dense subsets of pseudorandom sets.” In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘08), 76-85. IEEE, 2008. Publisher's VersionAbstract
A theorem of Green, Tao, and Ziegler can be stated (roughly) as follows: if R is a pseudorandom set, and D is a dense subset of R, then D may be modeled by a set M that is dense in the entire domain such that D and M are indistinguishable. (The precise statement refers to "measures" or distributions rather than sets.) The proof of this theorem is very general, and it applies to notions of pseudo-randomness and indistinguishability defined in terms of any family of distinguishers with some mild closure properties. The proof proceeds via iterative partitioning and an energy increment argument, in the spirit of the proof of the weak Szemeredi regularity lemma. The "reduction" involved in the proof has exponential complexity in the distinguishing probability. We present a new proof inspired by Nisan's proof of Impagliazzo's hardcore set theorem. The reduction in our proof has polynomial complexity in the distinguishing probability and provides a new characterization of the notion of "pseudoentropy" of a distribution. A proof similar to ours has also been independently discovered by Gowers [2]. We also follow the connection between the two theorems and obtain a new proof of Impagliazzo's hardcore set theorem via iterative partitioning and energy increment. While our reduction has exponential complexity in some parameters, it has the advantage that the hardcore set is efficiently recognizable.
Healy, Alex, Salil Vadhan, and Emanuele Viola. “Using nondeterminism to amplify hardness.” SIAM Journal on Computing: Special Issue on STOC '04 35, no. 4 (2006): 903-931. Publisher's VersionAbstract

We revisit the problem of hardness amplification in \(\mathcal{NP}\) as recently studied by O’Donnell [J. Comput. System Sci., 69 (2004), pp. 68–94]. We prove that if \(\mathcal{NP}\) has a balanced function \(f\) such that any circuit of size \(s(n)\) fails to compute \(f\) on a \(1/\mathrm{poly}(n)\) fraction of inputs, then \(\mathcal{NP}\) has a function \(f'\) such that any circuit of size \(s'(n) = s(\sqrt{n})^{\Omega(1)}\) fails to compute \(f\) on a \(1/2 − 1/s' (n)\) fraction of inputs. In particular,

  1. if \(s(n) = n^{\omega(1)}\), we amplify to hardness \(1/2 - 1/n^{\omega(1)}\);
  2. if \(s(n) = 2^{n^{\Omega(1)}}\), we amplify to hardness \(1/2 - 1/2^{n^{\Omega(1)}}\);
  3. if \(s(n) = 2^{\Omega(n)}\), we amplify to hardness \(1/2 - 1/2^{\Omega(\sqrt{n})}\).

Our results improve those of of O’Donnell, which amplify to\(1/2 - 1/ \sqrt{n}\). O’Donnell also proved that no construction of a certain general form could amplify beyond \(1/2 - 1/n\). We bypass this barrier by using both derandomization and nondeterminism in the construction of \(f'\). We also prove impossibility results demonstrating that both our use of nondeterminism and the hypothesis that \(f\) is balanced are necessary for “black-box” hardness amplification procedures (such as ours).

Gradwohl, Ronen, Salil Vadhan, and David Zuckerman. “Random selection with an adversarial majority.” In Advances in Cryptology—CRYPTO ‘06, C. Dwork, ed. 4117:409–426. Springer Verlag, Lecture Notes in Computer Science, 2006. Publisher's VersionAbstract

Version History: Full version published in ECCC TR 06-026, February 2006. Updated full version published June 2006.

We consider the problem of random selection, where \(p\) players follow a protocol to jointly select a random element of a universe of size \(n\). However, some of the players may be adversarial and collude to force the output to lie in a small subset of the universe. We describe essentially the first protocols that solve this problem in the presence of a dishonest majority in the full-information model (where the adversary is computationally unbounded and all communication is via non-simultaneous broadcast). Our protocols are nearly optimal in several parameters, including the round complexity (as a function of \(n\)), the randomness complexity, the communication complexity, and the tradeoffs between the fraction of honest players, the probability that the output lies in a small subset of the universe, and the density of this subset.

CRYPTO2006.pdf ECCC-02.2006.pdf FULL-06.2006.pdf
Trevisan, Luca, Salil Vadhan, and David Zuckerman. “Compression of samplable sources.” Computational Complexity: Special Issue on CCC'04 14, no. 3 (2005): 186-227. Publisher's VersionAbstract

We study the compression of polynomially samplable sources. In particular, we give efficient prefix-free compression and decompression algorithms for three classes of such sources (whose support is a subset of \(\{0, 1\}^n\)).

  1. We show how to compress sources \(X\) samplable by logspace machines to expected length \(H(X) + O(1)\). Our next results concern flat sources whose support is in \(\mathbf{P}\).
  2. If \(H(X) ≤ k = n−O(\log n)\), we show how to compress to expected length \(k + \mathrm{polylog}(n − k)\).
  3. If the support of \(X\) is the witness set for a self-reducible \(\mathbf{NP}\) relation, then we show how to compress to expected length \(H(X)+ 5\).