
2013
Chung, Kai-Min, Michael Mitzenmacher, and Salil P. Vadhan. “Why simple hash functions work: Exploiting the entropy in a data stream.” Theory of Computing 9 (2013): 897-945.

Version History: This article merges the conference papers from SODA ‘08 (with the same title) and RANDOM ‘08 (entitled “Tight Bounds for Hashing Block Sources”).


Hashing is fundamental to many algorithms and data structures widely used in practice. For the theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits (in the length of a data item) to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or \(O(1)\)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing.

In practice, however, it is commonly observed that simple hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source,” whereby each new data item has some “entropy” given the previous ones. As long as the Rényi entropy per data item is sufficiently large, it turns out that the performance when choosing a hash function from a 2-universal family is essentially the same as for a truly random hash function. We describe results for several sample applications, including linear probing, chained hashing, balanced allocations, and Bloom filters.
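
As a concrete illustration (a minimal sketch, not code from the paper), here is how one might draw a hash function from the classic Carter–Wegman 2-universal family \(h_{a,b}(x) = ((ax + b) \bmod p) \bmod m\); the Mersenne prime \(p = 2^{61}-1\) and the table size are arbitrary choices for the example, and data items are assumed to be integers below \(p\):

import random

def sample_hash(m, p=2**61 - 1):
    # Draw h uniformly from the Carter–Wegman family
    # h_{a,b}(x) = ((a*x + b) mod p) mod m, which is 2-universal:
    # for any fixed x != y, the probability (over a, b) of a collision
    # is at most 1/m.
    a = random.randrange(1, p)   # a uniform in {1, ..., p-1}
    b = random.randrange(0, p)   # b uniform in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % m

# Example: place a small stream of integer data items into 1024 buckets,
# as in the chained-hashing application mentioned above.
h = sample_hash(1024)
buckets = {}
for item in [12, 99, 2**40 + 7]:
    buckets.setdefault(h(item), []).append(item)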

Towards developing our results, we prove tight bounds for hashing block sources, determining the entropy required per block for the distribution of hashed values to be close to uniform.
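
For context, the classic single-block statement in this vein is the Leftover Hash Lemma (recalled here from the standard literature, not quoted from the paper): if \(H\) is drawn from a 2-universal family mapping into a range of size \(M\) and the block \(X\) has Rényi entropy \(\mathrm{H}_2(X) \geq \log M + 2\log(1/\varepsilon)\), then

\[ \Delta\big((H, H(X)),\, (H, U_{[M]})\big) \;\leq\; \tfrac{1}{2}\sqrt{2^{-\mathrm{H}_2(X)} \cdot M} \;\leq\; \varepsilon, \]

where \(\Delta\) denotes statistical distance and \(U_{[M]}\) the uniform distribution on the range.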


THEORYCOMP2013.pdf RANDOM2008.pdf SODA2008.pdf
2009
Guruswami, Venkatesan, Christopher Umans, and Salil Vadhan. “Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes.” Journal of the ACM 56, no. 4 (2009): 1–34.

Version History: Preliminary versions of this article appeared as Technical Report TR06-134 in Electronic Colloquium on Computational Complexity, 2006, and in Proceedings of the 22nd Annual IEEE Conference on Computational Complexity (CCC '07), pp. 96–108. The preliminary version received the Best Paper Award at CCC '07.

We give an improved explicit construction of highly unbalanced bipartite expander graphs with expansion arbitrarily close to the degree (which is polylogarithmic in the number of vertices). Both the degree and the number of right-hand vertices are polynomially close to optimal, whereas the previous constructions of Ta-Shma et al. [2007] required at least one of these to be quasipolynomial in the optimal. Our expanders have a short and self-contained description and analysis, based on the ideas underlying the recent list-decodable error-correcting codes of Parvaresh and Vardy [2005].
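
For reference, the standard notion involved here (paraphrased, not quoted from the paper): a bipartite graph given by a neighbor function \(\Gamma : [N] \times [D] \to [M]\) is a \((K, A)\)-expander if

\[ |\Gamma(S)| \;\geq\; A \cdot |S| \quad \text{for every } S \subseteq [N] \text{ with } |S| \leq K, \]

and “expansion arbitrarily close to the degree” means \(A = (1-\varepsilon)D\) for small \(\varepsilon\).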

Our expanders can be interpreted as near-optimal “randomness condensers” that reduce the task of extracting randomness from sources of arbitrary min-entropy rate to extracting randomness from sources of min-entropy rate arbitrarily close to 1, which is a much easier task. Using this connection, we obtain a new, self-contained construction of randomness extractors that is optimal up to constant factors, while being much simpler than the previous construction of Lu et al. [2003] and improving upon it when the error parameter is small (e.g., 1/poly(n)).
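
The standard definitions behind this statement (again paraphrased): a function \(\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m\) is a \((k, \varepsilon)\)-extractor if

\[ \Delta\big(\mathrm{Ext}(X, U_d),\, U_m\big) \;\leq\; \varepsilon \quad \text{whenever } \mathrm{H}_\infty(X) \geq k, \]

while a condenser only requires \(\mathrm{Cond}(X, U_d)\) to be \(\varepsilon\)-close to some distribution of min-entropy \(k'\), so that a min-entropy rate \(k'/m\) arbitrarily close to 1 suffices as the intermediate goal.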

JACM2009.pdf CCC2007.pdf ECCC2006.pdf ECCC - Rev. 2008.pdf
Haitner, Iftach, Omer Reingold, Salil Vadhan, and Hoeteck Wee. “Inaccessible entropy.” In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC ‘09), 611-620. ACM, 2009.

We put forth a new computational notion of entropy, which measures the (in)feasibility of sampling high-entropy strings that are consistent with a given protocol. Specifically, we say that the \(i\)’th round of a protocol \((\mathsf{A,B})\) has accessible entropy at most \(k\) if no polynomial-time strategy \(\mathsf{A}^*\) can generate messages for \(\mathsf{A}\) such that its message in the \(i\)’th round has entropy greater than \(k\) when conditioned both on prior messages of the protocol and on prior coin tosses of \(\mathsf{A}^*\). We say that the protocol has inaccessible entropy if the total accessible entropy (summed over the rounds) is noticeably smaller than the real entropy of \(\mathsf{A}\)’s messages, conditioned only on prior messages (but not the coin tosses of \(\mathsf{A}\)); a schematic rendering of these conditions appears after the list below. As applications of this notion, we

  • Give a much simpler and more efficient construction of statistically hiding commitment schemes from arbitrary one-way functions.

  • Prove that constant-round statistically hiding commitments are necessary for constructing constant-round zero-knowledge proof systems for NP that remain secure under parallel composition (assuming the existence of one-way functions).
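
Schematically, writing \(M_i\) for \(\mathsf{A}\)’s \(i\)’th message and \(R_{\mathsf{A}^*}\) for the coins of \(\mathsf{A}^*\) (notation mine, as a rough rendering of the definitions above):

\[ \text{accessible entropy of round } i: \;\; \max_{\text{PPT } \mathsf{A}^*} \mathrm{H}\big(M_i \,\big|\, M_1, \dots, M_{i-1}, R_{\mathsf{A}^*}\big), \]

and the protocol has inaccessible entropy when \(\sum_i \mathrm{H}(M_i \mid M_1, \dots, M_{i-1})\) exceeds the sum of the round-wise accessible entropies by a noticeable amount.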

STOC2009.pdf
2008
Chailloux, André, Dragos Florin Ciocan, Iordanis Kerenidis, and Salil Vadhan. “Interactive and noninteractive zero knowledge are equivalent in the help model.” In Proceedings of the Fifth Theory of Cryptography Conference (TCC '08), 4948:501-534. Springer-Verlag, Lecture Notes in Computer Science, 2008.

Version History: 

  • Preliminary versions of this work previously appeared on the Cryptology ePrint Archive and in the second author’s undergraduate thesis.
  • Chailloux, A., Kerenidis, I.: The role of help in classical and quantum zero-knowledge. Cryptology ePrint Archive, Report 2007/421 (2007), http://eprint.iacr.org/
  • Ciocan, D.F., Vadhan, S.: Interactive and noninteractive zero knowledge coincide in the help model. Cryptology ePrint Archive, Report 2007/389 (2007), http://eprint.iacr.org/
  • Ciocan, D.: Constructions and characterizations of non-interactive zero-knowledge. Undergraduate thesis, Harvard University (2007)

We show that interactive and noninteractive zero-knowledge are equivalent in the ‘help model’ of Ben-Or and Gutfreund (J. Cryptology, 2003). In this model, the shared reference string is generated by a probabilistic polynomial-time dealer who is given access to the statement to be proven. Our results do not rely on any unproven complexity assumptions and hold for statistical zero knowledge, for computational zero knowledge restricted to AM, and for quantum zero knowledge when the help is a pure quantum state.
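
Concretely, a noninteractive proof in the help model can be pictured as follows (a schematic rendering of the model as described above, with notation mine):

\[ \sigma \leftarrow D(x), \qquad \pi \leftarrow P(x, w, \sigma), \qquad V(x, \sigma, \pi) \in \{\mathrm{accept}, \mathrm{reject}\}, \]

where the dealer \(D\), unlike the dealer of an ordinary common reference string, is given the statement \(x\) when generating \(\sigma\).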

TCC2008.pdf
Ong, Shien Jin, and Salil Vadhan. “An equivalence between zero knowledge and commitments.” In R. Canetti, editor, Proceedings of the Fifth Theory of Cryptography Conference (TCC ‘08), 4948:482-500. Springer-Verlag, Lecture Notes in Computer Science, 2008.

We show that a language in NP has a zero-knowledge protocol if and only if the language has an “instance-dependent” commitment scheme. An instance-dependent commitment scheme for a given language is a commitment scheme that can depend on an instance of the language, and where the hiding and binding properties are required to hold only on the yes and no instances of the language, respectively.
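
Schematically (notation mine): for a language \(L\), such a scheme is a family \(\{\mathrm{Com}_x\}\) keyed by instances \(x\), with

\[ x \in L \;\Longrightarrow\; \mathrm{Com}_x \text{ is hiding}, \qquad x \notin L \;\Longrightarrow\; \mathrm{Com}_x \text{ is binding}. \]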

The novel direction is the “only if” direction. Thus, we confirm the widely held belief that commitments are not only sufficient for zero-knowledge protocols, but necessary as well. Previous results of this type either held only for restricted types of protocols or languages, or used nonstandard relaxations of (instance-dependent) commitment schemes.

TCC2008.pdf
Gutfreund, Dan, and Salil Vadhan. “Limitations on hardness vs. randomness under uniform reductions.” In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM ‘08), Lecture Notes in Computer Science, 5171:469-482. Springer-Verlag, 2008.

We consider (uniform) reductions from computing a function \({f}\) to the task of distinguishing the output of some pseudorandom generator \({G}\) from uniform. Impagliazzo and Wigderson [10] and Trevisan and Vadhan [24] exhibited such reductions for every function \({f}\) in PSPACE. Moreover, their reductions are “black box,” showing how to use any distinguisher \({T}\), given as oracle, in order to compute \({f}\) (regardless of the complexity of \({T}\)). The reductions are also adaptive, but with the restriction that queries of the same length do not occur in different levels of adaptivity. Impagliazzo and Wigderson [10] also exhibited such reductions for every function \({f}\) in EXP, but those reductions are not black-box, because they only work when the oracle \({T}\) is computable by small circuits.
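
In symbols, such a black-box reduction is an oracle algorithm \(R\) such that (a paraphrase of the notion just described, with notation mine)

\[ \Big|\Pr\big[T(G(U_\ell)) = 1\big] - \Pr\big[T(U_n) = 1\big]\Big| \;\geq\; \varepsilon \quad \Longrightarrow \quad R^T \text{ computes } f, \]

for every oracle \(T\), regardless of \(T\)’s complexity.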

Our main results are that:

– Nonadaptive black-box reductions as above can only exist for functions \({f}\) in \(\mathsf{BPP}^{\mathsf{NP}}\) (and thus are unlikely to exist for all of PSPACE).

– Adaptive black-box reductions, with the same restriction on the adaptivity as above, can only exist for functions \({f}\) in PSPACE (and thus are unlikely to exist for all of EXP).

Beyond shedding light on proof techniques in the area of hardness vs. randomness, our results (together with [10,24]) can be viewed in a more general context as identifying techniques that overcome limitations of black-box reductions, which may be useful elsewhere in complexity theory (and the foundations of cryptography).

RANDOM2008.pdf
Reingold, Omer, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. “Dense subsets of pseudorandom sets.” In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘08), 76-85. IEEE, 2008.
A theorem of Green, Tao, and Ziegler can be stated (roughly) as follows: if R is a pseudorandom set, and D is a dense subset of R, then D may be modeled by a set M that is dense in the entire domain such that D and M are indistinguishable. (The precise statement refers to “measures” or distributions rather than sets.) The proof of this theorem is very general, and it applies to notions of pseudorandomness and indistinguishability defined in terms of any family of distinguishers with some mild closure properties. The proof proceeds via iterative partitioning and an energy increment argument, in the spirit of the proof of the weak Szemerédi regularity lemma. The “reduction” involved in the proof has exponential complexity in the distinguishing probability.

We present a new proof inspired by Nisan’s proof of Impagliazzo’s hardcore set theorem. The reduction in our proof has polynomial complexity in the distinguishing probability and provides a new characterization of the notion of “pseudoentropy” of a distribution. A proof similar to ours has also been independently discovered by Gowers [2].

We also follow the connection between the two theorems and obtain a new proof of Impagliazzo’s hardcore set theorem via iterative partitioning and energy increment. While our reduction has exponential complexity in some parameters, it has the advantage that the hardcore set is efficiently recognizable.
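
In one common formulation (recalled from the literature, not quoted from the paper), call \(D\) \(\delta\)-dense in \(R\) if \(\Pr_D[x] \leq \Pr_R[x]/\delta\) for all \(x\). If \(R\) is indistinguishable from uniform by every distinguisher in a family \(\mathcal{F}\), then every \(D\) that is \(\delta\)-dense in \(R\) has a “model” \(M\), \(\delta\)-dense in the uniform distribution, with

\[ \Big|\Pr_{x \sim D}[f(x) = 1] - \Pr_{x \sim M}[f(x) = 1]\Big| \;\leq\; \varepsilon' \quad \text{for all } f \in \mathcal{F}', \]

where \(\mathcal{F}'\) is a mildly larger family built from \(\mathcal{F}\) and \(\varepsilon'\) degrades with the pseudorandomness parameter and \(\delta\).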
IEEE2008.pdf