Pseudorandomness and Applications (ONR Young Investigator Award; ONR Grant N00014-04-1-0478)

Chung, Kai-Min, Michael Mitzenmacher, and Salil P. Vadhan. “Why simple hash functions work: Exploiting the entropy in a data stream.” Theory of Computing 9 (2013): 897-945. Publisher's VersionAbstract

Version History: Merge of conference papers from SODA ‘08 (with the same title) and RANDOM ‘08 (entitled “Tight Bounds for Hashing Block Sources”).

Hashing is fundamental to many algorithms and data structures widely used in practice. For the theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits (in the length of a data item) to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or \(O\)(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing.

In practice, however, it is commonly observed that simple hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source,” whereby each new data item has some “entropy” given the previous ones. As long as the Rényi entropy per data item is sufficiently large, it turns out that the performance when choosing a hash function from a 2-universal family is essentially the same as for a truly random hash function. We describe results for several sample applications, including linear probing, chained hashing, balanced allocations, and Bloom filters.

Towards developing our results, we prove tight bounds for hashing block sources, determining the entropy required per block for the distribution of hashed values to be close to uniformly distributed.


THEORYCOMP2013.pdf RANDOM2008.pdf SODA2008.pdf
Kamp, Jesse, Anup Rao, Salil Vadhan, and David Zuckerman. “Deterministic extractors for small-space sources.” Journal of Computer and System Sciences 77, no. 1 (2011): 191-220. Publisher's VersionAbstract

Version History: Special issue to celebrate Richard Karp's Kyoto Prize. Extended abstract in STOC '06.

We give polynomial-time, deterministic randomness extractors for sources generated in small space, where we model space \(s\) sources on\(\{0,1\}^n\) as sources generated by width \(2^s\) branching programs. Specifically, there is a constant \(η > 0\) such that for any \(ζ > n^{−η}\), our algorithm extracts \(m = (δ − ζ)n\) bits that are exponentially close to uniform (in variation distance) from space \(s\) sources with min-entropy \(δn\), where \(s = Ω(ζ^ 3n)\). Previously, nothing was known for \(δ \ll 1/2,\), even for space \(0\). Our results are obtained by a reduction to the class of total-entropy independent sources. This model generalizes both the well-studied models of independent sources and symbol-fixing sources. These sources consist of a set of \(r\) independent smaller sources over \(\{0, 1\}^\ell\), where the total min-entropy over all the smaller sources is \(k\). We give deterministic extractors for such sources when \(k\) is as small as \(\mathrm{polylog}(r)\), for small enough \(\ell\).


JCSS2011.pdf STOC-05-2006.pdf
Chung, Kai-Min, Omer Reingold, and Salil Vadhan. “S-T connectivity on digraphs with a known stationary distribution.” In ACM Transactions on Algorithms. Vol. 7. 3rd ed. ACM, 2011. Publisher's VersionAbstract

Version history: Preliminary versions in CCC '07 and on ECCC (TR07-030).

We present a deterministic logspace algorithm for solving S-T Connectivity on directed graphs if: (i) we are given a stationary distribution of the random walk on the graph in which both of the input vertices \(s\) and \(t\) have nonnegligible probability mass and (ii) the random walk which starts at the source vertex \(s\) has polynomial mixing time. This result generalizes the recent deterministic logspace algorithm for S-T Connectivity on undirected graphs [Reingold, 2008]. It identifies knowledge of the stationary distribution as the gap between the S-T Connectivity problems we know how to solve in logspace (L) and those that capture all of randomized logspace (RL).


ECCC2007.pdf ACM2011.pdf
Guruswami, Venkatesan, and Salil Vadhan. “A lower bound on list size for list decoding.” IEEE Transactions on Information Theory 56, no. 11 (2010): 5681-5688. Publisher's VersionAbstract

Version History: Preliminary version published in RANDOM '05 ( and attached as RANDOM2005.pdf.

q-ary error-correcting code \(C ⊆ \{1,2,...,q\}^n\) is said to be list decodable to radius \(\rho\) with list size L if every Hamming ball of radius ρ contains at most L codewords of C. We prove that in order for a q-ary code to be list-decodable up to radius \((1–1/q)(1–ε)n\), we must have \(L = Ω(1/ε^2)\). Specifically, we prove that there exists a constant \(c_q >0\) and a function \(f_q\) such that for small enough \(ε > 0\), if C is list-decodable to radius\( (1–1/q)(1–ε)n \)with list size \(c_q /ε^2\), then C has at most \(f q (ε) \)codewords, independent of n. This result is asymptotically tight (treating q as a constant), since such codes with an exponential (in n) number of codewords are known for list size \(L = O(1/ε^2)\).

A result similar to ours is implicit in Blinovsky [Bli] for the binary \((q=2)\) case. Our proof works for all alphabet sizes, and is technically and conceptually simpler.


Guruswami, Venkatesan, Christopher Umans, and Salil Vadhan. “Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes.” Journal of the ACM 56, no. 4 (2009): 1–34. Publisher's VersionAbstract

Version History: Preliminary versions of this article appeared as Technical Report TR06-134 in Electronic Colloquium on Computational Complexity, 2006, and in Proceedings of the 22nd Annual IEEE Conference on Computional Complexity (CCC '07), pp. 96–108. Preliminary version recipient of Best Paper Award at CCC '07.

We give an improved explicit construction of highly unbalanced bipartite expander graphs with expansion arbitrarily close to the degree (which is polylogarithmic in the number of vertices). Both the degree and the number of right-hand vertices are polynomially close to optimal, whereas the previous constructions of Ta-Shma et al. [2007] required at least one of these to be quasipolynomial in the optimal. Our expanders have a short and self-contained description and analysis, based on the ideas underlying the recent list-decodable error-correcting codes of Parvaresh and Vardy [2005].

Our expanders can be interpreted as near-optimal “randomness condensers,” that reduce the task of extracting randomness from sources of arbitrary min-entropy rate to extracting randomness from sources of min-entropy rate arbitrarily close to 1, which is a much easier task. Using this connection, we obtain a new, self-contained construction of randomness extractors that is optimal up to constant factors, while being much simpler than the previous construction of Lu et al. [2003] and improving upon it when the error parameter is small (e.g., 1/poly(n)).

JACM2009.pdf CCC2007.pdf ECCC2006.pdf ECCC - Rev. 2008.pdf
Sanghvi, Saurabh, and Salil Vadhan. “The round complexity of two-party random selection.” SIAM Journal on Computing: Special Issue on STOC '05 38, no. 2 (2008): 523-550. Publisher's VersionAbstract

Version History. Preliminary versions of this work appeared in the first author's undergraduate thesis and in the conference paper (STOC '05).

We study the round complexity of two-party protocols for generating a random \(n\)-bit string such that the output is guaranteed to have bounded “bias,” even if one of the two parties deviates from the protocol (possibly using unlimited computational resources). Specifically, we require that the output’s statistical difference from the uniform distribution on \(\{0, 1\}^n\) is bounded by a constant less than 1. We present a protocol for the above problem that has \(2\log^* n + O(1)\) rounds, improving a previous 2\(n\)-round protocol of Goldreich, Goldwasser, and Linial (FOCS ’91). Like the GGL Protocol, our protocol actually provides a stronger guarantee, ensuring that the output lands in any set \(T ⊆ \{0, 1\}^n\) of density \(μ\) with probability at most \(O( \sqrt{μ + δ})\), where \(δ\) may be an arbitrarily small constant. We then prove a nearly matching lower bound, showing that any protocol guaranteeing bounded statistical difference requires at least \(\log^* n−\log^* \log^* n−O(1)\) rounds. We also prove several results for the case when the output’s bias is measured by the maximum multiplicative factor by which a party can increase the probability of a set \(T ⊆ \{0, 1\}^n\)

Ong, Shien Jin, and Salil Vadhan. “Zero knowledge and soundness are symmetric.” In Advances in Cryptology–EUROCRYPT '07, 4515:187-209. Barcelona, Spain: Springer Verlag, Lecture Notes in Computer Science, M. Naor, ed. 2007. Publisher's VersionAbstract

Version History: Recipient of Best Paper Award. Preliminary version posted on ECCC as TR06-139, November 2006.

We give a complexity-theoretic characterization of the class of problems in NP having zero-knowledge argument systems. This characterization is symmetric in its treatment of the zero knowledge and the soundness conditions, and thus we deduce that the class of problems in NP \(\bigcap\) coNP having zero-knowledge arguments is closed under complement. Furthermore, we show that a problem in NP has a statistical zero-knowledge argument \(\)system if and only if its complement has a computational zero-knowledge proof system. What is novel about these results is that they are unconditional, i.e., do not rely on unproven complexity assumptions such as the existence of one-way functions.

Our characterization of zero-knowledge arguments also enables us to prove a variety of other unconditional results about the class of problems in NP having zero-knowledge arguments, such as equivalences between honest-verifier and malicious-verifier zero knowledge, private coins and public coins, inefficient provers and efficient provers, and non-black-box simulation and black-box simulation. Previously, such results were only known unconditionally for zero-knowledge proof systems, or under the assumption that one-way functions exist for zero-knowledge argument systems.

EUROCRYPT 2007.pdf ECCC 2006.pdf
Canetti, Ran, Ron Rivest, Madhu Sudan, Luca Trevisan, Salil Vadhan, and Hoeteck Wee. “Amplifying collision-resistance: A complexity-theoretic treatment.” In A. Menezes, editor, Advances in Cryptology (CRYPTO '07), 4622:264-283. Lecture Notes in Computer Science, Springer-Verlag, 2007. Publisher's VersionAbstract

We initiate a complexity-theoretic treatment of hardness amplification for collision-resistant hash functions, namely the transformation of weakly collision-resistant hash functions into strongly collision-resistant ones in the standard model of computation. We measure the level of collision resistance by the maximum probability, over the choice of the key, for which an efficient adversary can find a collision. The goal is to obtain constructions with short output, short keys, small loss in adversarial complexity tolerated, and a good trade-off between compression ratio and computational complexity. We provide an analysis of several simple constructions, and show that many of the parameters achieved by our constructions are almost optimal in some sense.

Ben-Sasson, Eli, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil Vadhan. “Robust PCPs of proximity, shorter PCPs, and applications to coding.” SIAM Journal on Computing: Special Issue on Randomness and Complexity 36, no. 4 (2006): 889-974. Publisher's VersionAbstract

Version History. Extended abstract in STOC '04.

We continue the study of the trade-off between the length of probabilistically checkable proofs (PCPs) and their query complexity, establishing the following main results (which refer to proofs of satisfiability of circuits of size \(n\)):

  1. We present PCPs of length \(\exp(o(\log \log n)^2) ·n\) that can be verified by making \(o(\log \log n)\) Boolean queries.
  2. For every \( ε > 0\), we present PCPs of length \(\exp(\log^ε n) · n \) that can be verified by making a constant number of Boolean queries.

In both cases, false assertions are rejected with constant probability (which may be set to be arbitrarily close to 1). The multiplicative overhead on the length of the proof, introduced by transforming a proof into a probabilistically checkable one, is just quasi polylogarithmic in the first case (of query complexity \(o(\log \log n)\)), and is \(2^{(\log n)^ε} \), for any \(ε > 0\), in the second case (of constant query complexity). Our techniques include the introduction of a new variant of PCPs that we call “robust PCPs of proximity.” These new PCPs facilitate proof composition, which is a central ingredient in the construction of PCP systems. (A related notion and its composition properties were discovered independently by Dinur and Reingold.) Our main technical contribution is a construction of a “length-efficient” robust PCP of proximity. While the new construction uses many of the standard techniques used in PCP constructions, it does differ from previous constructions in fundamental ways, and in particular does not use the “parallelization” step of Arora et al. [J. ACM, 45 (1998), pp. 501–555]. The alternative approach may be of independent interest. We also obtain analogous quantitative results for locally testable codes. In addition, we introduce a relaxed notion of locally decodable codes and present such codes mapping \(k\) information bits to codewords of length \(k^{1+ε}\) for any \(ε > 0\).

Healy, Alex, Salil Vadhan, and Emanuele Viola. “Using nondeterminism to amplify hardness.” SIAM Journal on Computing: Special Issue on STOC '04 35, no. 4 (2006): 903-931. Publisher's VersionAbstract

We revisit the problem of hardness amplification in \(\mathcal{NP}\) as recently studied by O’Donnell [J. Comput. System Sci., 69 (2004), pp. 68–94]. We prove that if \(\mathcal{NP}\) has a balanced function \(f\) such that any circuit of size \(s(n)\) fails to compute \(f\) on a \(1/\mathrm{poly}(n)\) fraction of inputs, then \(\mathcal{NP}\) has a function \(f'\) such that any circuit of size \(s'(n) = s(\sqrt{n})^{\Omega(1)}\) fails to compute \(f\) on a \(1/2 − 1/s' (n)\) fraction of inputs. In particular,

  1. if \(s(n) = n^{\omega(1)}\), we amplify to hardness \(1/2 - 1/n^{\omega(1)}\);
  2. if \(s(n) = 2^{n^{\Omega(1)}}\), we amplify to hardness \(1/2 - 1/2^{n^{\Omega(1)}}\);
  3. if \(s(n) = 2^{\Omega(n)}\), we amplify to hardness \(1/2 - 1/2^{\Omega(\sqrt{n})}\).

Our results improve those of of O’Donnell, which amplify to\(1/2 - 1/ \sqrt{n}\). O’Donnell also proved that no construction of a certain general form could amplify beyond \(1/2 - 1/n\). We bypass this barrier by using both derandomization and nondeterminism in the construction of \(f'\). We also prove impossibility results demonstrating that both our use of nondeterminism and the hypothesis that \(f\) is balanced are necessary for “black-box” hardness amplification procedures (such as ours).

Vadhan, Salil. “An unconditional study of computational zero knowledge.” SIAM Journal on Computing: Special Issue on Randomness and Complexity 36, no. 4 (2006): 1160-1214. Publisher's VersionAbstract

Version History: Extended abstract in FOCS '04.

We prove a number of general theorems about \(\mathbf{ZK}\), the class of problems possessing (computational) zero-knowledge proofs. Our results are unconditional, in contrast to most previous works on \(\mathbf{ZK}\), which rely on the assumption that one-way functions exist. We establish several new characterizations of \(\mathbf{ZK}\) and use these characterizations to prove results such as the following:

  1. Honest-verifier \(\mathbf{ZK}\) equals general \(\mathbf{ZK}\).
  2. Public-coin \(\mathbf{ZK}\) equals private-coin \(\mathbf{ZK}\).
  3. \(\mathbf{ZK}\) is closed under union.
  4. \(\mathbf{ZK}\) with imperfect completeness equals \(\mathbf{ZK}\) with perfect completeness.
  5. Any problem in \(\mathbf{ZK}\) \(\cap\) \(\mathbf{NP} \) can be proven in computational zero knowledge by a \(\mathbf{BPP^{NP}}\)prover.
  6. \(\mathbf{ZK}\) with black-box simulators equals \(\mathbf{ZK}\) with general, non–black-box simulators.

The above equalities refer to the resulting class of problems (and do not necessarily preserve other efficiency measures such as round complexity). Our approach is to combine the conditional techniques previously used in the study of \(\mathbf{ZK}\) with the unconditional techniques developed in the study of \(\mathbf{SZK}\), the class of problems possessing statistical zero-knowledge proofs. To enable this combination, we prove that every problem in \(\mathbf{ZK}\) can be decomposed into a problem in \(\mathbf{SZK}\) together with a set of instances from which a one-way function can be constructed.

Reingold, Omer, Luca Trevisan, and Salil Vadhan. “Pseudorandom walks in regular digraphs and the RL vs. L problem.” In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC ‘06), 457-466, 2006, 457-466. Publisher's VersionAbstract

Version History: Preliminary version as ECCC TR05-22, February 2005 (; attached as ECCC2005.pdf).

We revisit the general \(\mathbf{RL}\) vs. \(\mathbf{L}\) question, obtaining the following results.

  1. Generalizing Reingold’s techniques to directed graphs, we present a deterministic, log-space algorithm that given a regular directed graph G (or, more generally, a digraph with Eulerian connected components) and two vertices s and t, finds a path between s and t if one exists.
  2. If we restrict ourselves to directed graphs that are regular and consistently labelled, then we are able to produce pseudorandom walks for such graphs in logarithmic space (this result already found an independent application).
  3. We prove that if (2) could be generalized to all regular directed graphs (including ones that are not consistently labelled) then \(\mathbf{L=RL}\). We do so by exhibiting a new complete promise problem for \(\mathbf{RL}\), and showing that such a problem can be solved in deterministic logarithmic space given a log-space pseudorandom walk generator for regular directed graphs.
ACM2006.pdf ECCC2005.pdf
Nguyen, Minh, and Salil Vadhan. “Zero knowledge with efficient provers.” In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC ‘06), 287-295. ACM, 2006. Publisher's VersionAbstract
We prove that every problem in NP that has a zero-knowledge proof also has a zero-knowledge proof where the prover can be implemented in probabilistic polynomial time given an NP witness. Moreover, if the original proof system is statistical zero knowledge, so is the resulting efficient-prover proof system. An equivalence of zero knowledge and efficient-prover zero knowledge was previously known only under the assumption that one-way functions exist (whereas our result is unconditional), and no such equivalence was known for statistical zero knowledge. Our results allow us to translate the many general results and characterizations known for zero knowledge with inefficient provers to zero knowledge with efficient provers.
Gradwohl, Ronen, Salil Vadhan, and David Zuckerman. “Random selection with an adversarial majority.” In Advances in Cryptology—CRYPTO ‘06, C. Dwork, ed. 4117:409–426. Springer Verlag, Lecture Notes in Computer Science, 2006. Publisher's VersionAbstract

Version History: Full version published in ECCC TR 06-026, February 2006. Updated full version published June 2006.

We consider the problem of random selection, where \(p\) players follow a protocol to jointly select a random element of a universe of size \(n\). However, some of the players may be adversarial and collude to force the output to lie in a small subset of the universe. We describe essentially the first protocols that solve this problem in the presence of a dishonest majority in the full-information model (where the adversary is computationally unbounded and all communication is via non-simultaneous broadcast). Our protocols are nearly optimal in several parameters, including the round complexity (as a function of \(n\)), the randomness complexity, the communication complexity, and the tradeoffs between the fraction of honest players, the probability that the output lies in a small subset of the universe, and the density of this subset.

CRYPTO2006.pdf ECCC-02.2006.pdf FULL-06.2006.pdf
Nguyen, Minh-Huyen, Shien Jin Ong, and Salil Vadhan. “Statistical zero-knowledge arguments for NP from any one-way function.” In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘06), 3-13. IEEE, 2006. Publisher's VersionAbstract

Version History: Merged with STOC '07 paper of Haitner and Reingold. Also available as a journal version. Full version invited to SIAM J. Computing Special Issue on FOCS ‘06

We show that every language in NP has a statistical zero-knowledge argument system under the (minimal) complexity assumption that one-way functions exist. In such protocols, even a computationally unbounded verifier cannot learn anything other than the fact that the assertion being proven is true, whereas a polynomial-time prover cannot convince the verifier to accept a false assertion except with negligible probability. This resolves an open question posed by Naor et al. (1998). Departing from previous works on this problem, we do not construct standard statistically hiding commitments from any one-way function. Instead, we construct a relaxed variant of commitment schemes called "1-out-of-2-binding commitments," recently introduced by Nguyen et al. (2006)

Trevisan, Luca, Salil Vadhan, and David Zuckerman. “Compression of samplable sources.” Computational Complexity: Special Issue on CCC'04 14, no. 3 (2005): 186-227. Publisher's VersionAbstract

We study the compression of polynomially samplable sources. In particular, we give efficient prefix-free compression and decompression algorithms for three classes of such sources (whose support is a subset of \(\{0, 1\}^n\)).

  1. We show how to compress sources \(X\) samplable by logspace machines to expected length \(H(X) + O(1)\). Our next results concern flat sources whose support is in \(\mathbf{P}\).
  2. If \(H(X) ≤ k = n−O(\log n)\), we show how to compress to expected length \(k + \mathrm{polylog}(n − k)\).
  3. If the support of \(X\) is the witness set for a self-reducible \(\mathbf{NP}\) relation, then we show how to compress to expected length \(H(X)+ 5\).
Ben-Sasson, Eli, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil Vadhan. “Short PCPs verifiable in polylogarithmic time.” In Proceedings of the 20th Annual IEEE Conference on Computational Complexity (CCC '05), 120-134, 2005, 120-134. Publisher's VersionAbstract
We show that every language in NP has a probabilistically checkable proof of proximity (i.e., proofs asserting that an instance is "close" to a member of the language), where the verifier's running time is polylogarithmic in the input size and the length of the probabilistically checkable proof is only polylogarithmically larger that the length of the classical proof. (Such a verifier can only query polylogarithmically many bits of the input instance and the proof. Thus it needs oracle access to the input as well as the proof, and cannot guarantee that the input is in the language - only that it is close to some string in the language.) If the verifier is restricted further in its query complexity and only allowed q queries, then the proof size blows up by a factor of 2/sup (log n)c/q/ where the constant c depends only on the language (and is independent of q). Our results thus give efficient (in the sense of running time) versions of the shortest known PCPs, due to Ben-Sasson et al. (STOC '04) and Ben-Sasson and Sudan (STOC '05), respectively. The time complexity of the verifier and the size of the proof were the original emphases in the definition of holographic proofs, due to Babai et al. (STOC '91), and our work is the first to return to these emphases since their work. Of technical interest in our proof is a new complete problem for NEXP based on constraint satisfaction problems with very low complexity constraints, and techniques to arithmetize such constraints over fields of small characteristic.
Rozenman, Eyal, and Salil Vadhan. “Derandomized squaring of graphs.” In Proceedings of the 8th International Workshop on Randomization and Computation (RANDOM '05), 3624:436-447. Berkeley, CA: Springer Verlag, Lecture Notes in Computer Science, 2005. Publisher's VersionAbstract

We introduce a “derandomized” analogue of graph squaring. This operation increases the connectivity of the graph (as measured by the second eigenvalue) almost as well as squaring the graph does, yet only increases the degree of the graph by a constant factor, instead of squaring the degree.

One application of this product is an alternative proof of Reingold’s recent breakthrough result that S-T Connectivity in Undirected Graphs can be solved in deterministic logspace.