# Publications

Forthcoming
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan. “Deterministic approximation of random walks in small space.” Theory of Computing Special Issue on RANDOM '19 (Forthcoming). Publisher's Version
Version History: v1, 15 Mar. 2019: https://arxiv.org/abs/1903.06361v1
v2 in ArXiv, 25 Nov. 2019: https://arxiv.org/abs/1903.06361v2

Prior Published Version (APPROX-RANDOM 2019), 20 Sep 2019:
In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019), Dimitris Achlioptas and László A. Végh (Eds.). Vol. 145. Cambridge, Massachusetts (MIT): Leibniz International Proceedings in Informatics (LIPIcs), 2019.

We give a deterministic, nearly logarithmic-space algorithm that given an undirected graph $$G$$, a positive integer $$r$$, and a set $$S$$ of vertices, approximates the conductance of $$S$$ in the $$r$$-step random walk on $$G$$ to within a factor of $$1+ϵ$$, where $$ϵ > 0$$ is an arbitrarily small constant. More generally, our algorithm computes an $$ϵ$$-spectral approximation to the normalized Laplacian of the $$r$$-step walk. Our algorithm combines the derandomized square graph operation (Rozenman and Vadhan, 2005), which we recently used for solving Laplacian systems in nearly logarithmic space (Murtagh, Reingold, Sidford, and Vadhan, 2017), with ideas from (Cheng, Cheng, Liu, Peng, and Teng, 2015), which gave an algorithm that is time-efficient (while ours is space-efficient) and randomized (while ours is deterministic) for the case of even $$r$$ (while ours works for all $$r$$). Along the way, we provide some new results that generalize technical machinery and yield improvements over previous work. First, we obtain a nearly linear-time randomized algorithm for computing a spectral approximation to the normalized Laplacian for odd $$r$$. Second, we define and analyze a generalization of the derandomized square for irregular graphs and for sparsifying the product of two distinct graphs. As part of this generalization, we also give a strongly explicit construction of expander graphs of every size.
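To make the quantity in this abstract concrete, the following is a brute-force sketch of the conductance of a set $$S$$ in the $$r$$-step random walk. It assumes the usual definition (the probability that an $$r$$-step walk, started from the stationary distribution restricted to $$S$$, ends outside $$S$$) and a dense adjacency matrix as input. It is not the paper's algorithm: it computes the value exactly and uses polynomial rather than nearly logarithmic space. The function names and the example graph are illustrative only.

```python
from fractions import Fraction


def walk_matrix(adj):
    """Row-stochastic transition matrix of the simple random walk on an undirected graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[Fraction(adj[u][v], deg[u]) for v in range(n)] for u in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices of Fractions."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, r):
    """Compute m**r by repeated multiplication (r >= 0)."""
    n = len(m)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = mat_mul(result, m)
    return result


def conductance(adj, subset, r):
    """Probability that an r-step walk, started from the stationary
    distribution restricted to `subset`, ends outside `subset`."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    p_r = mat_pow(walk_matrix(adj), r)
    volume = sum(deg[u] for u in subset)
    # The stationary distribution of an undirected graph is proportional to degree.
    escape = sum(deg[u] * p_r[u][v] for u in subset for v in range(n) if v not in subset)
    return escape / volume


# Hypothetical example: the 4-cycle 0-1-2-3-0 with S = {0, 1}.
cycle = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
print(conductance(cycle, {0, 1}, 1))
```

Exact rational arithmetic is used so that small examples can be checked by hand; an approximation algorithm such as the one in the paper would instead return a value within a factor of $$1+ϵ$$ of this quantity.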

2021
Vadhan, Salil, and Tianhao Wang. “Concurrent composition of differential privacy” (2021). ArXiv Version
We initiate a study of the composition properties of interactive differentially private mechanisms. An interactive differentially private mechanism is an algorithm that allows an analyst to adaptively ask queries about a sensitive dataset, with the property that an adversarial analyst's view of the interaction is approximately the same regardless of whether or not any individual's data is in the dataset. Previous studies of composition of differential privacy have focused on non-interactive algorithms, but interactive mechanisms are needed to capture many of the intended applications of differential privacy and a number of the important differentially private primitives.

We focus on concurrent composition, where an adversary can arbitrarily interleave its queries to several differentially private mechanisms, which may be feasible when differentially private query systems are deployed in practice. We prove that when the interactive mechanisms being composed are pure differentially private, their concurrent composition achieves privacy parameters (with respect to pure or approximate differential privacy) that match the (optimal) composition theorem for noninteractive differential privacy. We also prove a composition theorem for interactive mechanisms that satisfy approximate differential privacy. That bound is weaker than even the basic (suboptimal) composition theorem for noninteractive differential privacy, and we leave closing the gap as a direction for future research, along with understanding concurrent composition for other variants of differential privacy.
Doron, Dean, Raghu Meka, Omer Reingold, Avishay Tal, and Salil Vadhan. “Pseudorandom generators for read-once monotone branching programs.” Electronic Colloquium on Computational Complexity (ECCC) 2021, no. 18 (2021). Publisher's Version

Version History: Originally published as "Monotone branching programs: pseudorandomness and circuit complexity".

Motivated by the derandomization of space-bounded computation, there has been a long line of work on constructing pseudorandom generators (PRGs) against various forms of read-once branching programs (ROBPs), with a goal of improving the $$O(\log^2n)$$ seed length of Nisan’s classic construction to the optimal $$O(\log n)$$.

In this work, we construct an explicit PRG with seed length $$\tilde{O}(\log n)$$ for constant-width ROBPs that are monotone, meaning that the states at each time step can be ordered so that edges with the same labels never cross each other. Equivalently, for each fixed input, the transition functions are a monotone function of the state. This result is complementary to a line of work that gave PRGs with seed length $$O(\log n)$$ for (ordered) permutation ROBPs of constant width, since the monotonicity constraint can be seen as the “opposite” of the permutation constraint.

Our PRG also works for monotone ROBPs that can read the input bits in any order, which are strictly more powerful than read-once $$\mathsf{AC^0}$$. Our PRG achieves better parameters (in terms of the dependence on the depth of the circuit) than the best previous pseudorandom generator for read-once $$\mathsf{AC^0}$$, due to Doron, Hatami, and Hoza.

Our pseudorandom generator construction follows Ajtai and Wigderson’s approach of iterated pseudorandom restrictions. We give a randomness-efficient width-reduction process which proves that the branching program simplifies to an $$O(\log n)$$-junta after only $$O(\log \log n)$$ independent applications of the Forbes-Kelley pseudorandom restrictions.

Pyne, Edward, and Salil Vadhan. “Pseudodistributions that beat all pseudorandom generators.” Electronic Colloquium on Computational Complexity (ECCC) 2021, no. 19 (2021). Publisher's Version
A recent paper of Braverman, Cohen, and Garg (STOC 2018) introduced the concept of a pseudorandom pseudodistribution generator (PRPG), which amounts to a pseudorandom generator (PRG) whose outputs are accompanied with real coefficients that scale the acceptance probabilities of any potential distinguisher. They gave an explicit construction of PRPGs for ordered branching programs whose seed length has a better dependence on the error parameter $$\epsilon$$ than the classic PRG construction of Nisan (STOC 1990 and Combinatorica 1992).

In this work, we give an explicit construction of PRPGs that achieve parameters that are impossible to achieve by a PRG. In particular, we construct a PRPG for ordered permutation branching programs of unbounded width with a single accept state that has seed length $$\tilde{O}(\log^{3/2}n)$$ for error parameter $$\epsilon = 1/ \mathrm{poly}(n)$$, where $$n$$ is the input length. In contrast, recent work of Hoza et al. (ITCS 2021) shows that any PRG for this model requires seed length $$\Omega(\log^2n)$$ to achieve error $$\epsilon = 1/ \mathrm{poly}(n)$$.

As a corollary, we obtain explicit PRPGs with seed length $$\tilde{O}(\log^{3/2}n)$$ and error $$\epsilon = 1/ \mathrm{poly}(n)$$ for ordered permutation branching programs of width $$w = \mathrm{poly}(n)$$ with an arbitrary number of accept states. Previously, seed length $$o(\log^2n)$$ was only known when both the width and the reciprocal of the error are subpolynomial, i.e. $$w= n^{o(1)}$$ and $$\epsilon = 1/n^{o(1)}$$ (Braverman, Rao, Raz, Yehudayoff, FOCS 2010 and SICOMP 2014).

The starting point for our results are the recent space-efficient algorithms for estimating random-walk probabilities in directed graphs by Ahmadinejad, Kelner, Murtagh, Peebles, Sidford, and Vadhan (FOCS 2020), which are based on spectral graph theory and space-efficient Laplacian solvers. We interpret these algorithms as giving PRPGs with large seed length, which we then derandomize to obtain our results. We also note that this approach gives a simpler proof of the original result of Braverman, Cohen, and Garg, as independently discovered by Cohen, Doron, Renard, Sberlo, and Ta-Shma (personal communication, January 2021).
Hoza, William M., Edward Pyne, and Salil Vadhan. “Pseudorandom generators for unbounded-width permutation branching programs.” 12th Innovations in Theoretical Computer Science (ITCS '21). Leibniz International Proceedings in Informatics (LIPIcs), 2021. Publisher's Version

Version History: Preliminary version posted as ECCC TR20-138 (PDF version attached as ECCC 2020).

Talks: The ITCS talk for this paper, presented by Edward Pyne, is available on YouTube.

We prove that the Impagliazzo-Nisan-Wigderson [Impagliazzo et al., 1994] pseudorandom generator (PRG) fools ordered (read-once) permutation branching programs of unbounded width with a seed length of $$\tilde{O} (\log d + \log n ⋅ \log(1/\epsilon))$$, assuming the program has only one accepting vertex in the final layer. Here, $$n$$ is the length of the program, $$d$$ is the degree (equivalently, the alphabet size), and $$\epsilon$$ is the error of the PRG. In contrast, we show that a randomly chosen generator requires seed length $$\Omega (n \log d)$$ to fool such unbounded-width programs. Thus, this is an unusual case where an explicit construction is "better than random."

Except when the program’s width $$w$$ is very small, this is an improvement over prior work. For example, when $$w = \mathrm{poly} (n)$$ and $$d = 2$$, the best prior PRG for permutation branching programs was simply Nisan’s PRG [Nisan, 1992], which fools general ordered branching programs with seed length $$O (\log (wn/\epsilon) \log n)$$. We prove a seed length lower bound of $$\tilde{\Omega} (\log d + \log n ⋅ \log(1/\epsilon))$$ for fooling these unbounded-width programs, showing that our seed length is near-optimal. In fact, when $$\epsilon ≤ 1/\log n$$, our seed length is within a constant factor of optimal. Our analysis of the INW generator uses the connection between the PRG and the derandomized square of Rozenman and Vadhan [Rozenman and Vadhan, 2005] and the recent analysis of the latter in terms of unit-circle approximation by Ahmadinejad et al. [Ahmadinejad et al., 2020].

Chung, Kai-Min, Michael Mitzenmacher, and Salil Vadhan. “When simple hash functions suffice.” In Beyond the Worst-Case Analysis of Algorithms (Tim Roughgarden, ed.). Cambridge, UK: Cambridge University Press, 2021. Publisher's Version
2020
Hay, Michael, Marco Gaboardi, and Salil Vadhan. “A programming framework for OpenDP.” 6th Workshop on the Theory and Practice of Differential Privacy (TPDP 2020), 2020. Initial PDF Version

Version History: Original version released as a Working Paper for the May 2020 OpenDP Community Meeting (version attached as MAY 2020.pdf, and accessible online at https://projects.iq.harvard.edu/files/opendp/files/opendp_programming_fr...).

Talks: View a talk on this paper presented by Marco Gaboardi and Michael Hay at the 2020 OpenDP Community Meeting.

Subsequently presented as a poster at TPDP 2020 (attached as TPDP2020.pdf).

In this working paper, we propose a programming framework for the library of differentially private algorithms that will be at the core of the OpenDP open-source software project, and recommend programming languages in which to implement the framework.

OpenDP Team. “The OpenDP White Paper,” 2020. Publisher's Version

OpenDP is a community effort to build a trustworthy suite of open-source tools for enabling privacy-protective analysis of sensitive personal data, focused on a library of algorithms for generating differentially private statistical releases. The target use cases for OpenDP are to enable government, industry, and academic institutions to safely and confidently share sensitive data to support scientifically oriented research and exploration in the public interest. We aim for OpenDP to flexibly grow with the rapidly advancing science of differential privacy, and be a pathway to bring the newest algorithmic developments to a wide array of practitioners.

OpenDP is led by Faculty Directors Gary King and Salil Vadhan and an Executive Committee at Harvard University, funded in part by a grant from the Sloan Foundation. Its efforts so far have included implementing a differentially private curator application in collaboration with Microsoft, and developing a framework for a community-driven OpenDP Commons through the work of an Ad Hoc Design Committee including external experts. Going forward, the project plans to engage with a wide community of stakeholders, establish partnerships with a wide variety of groups from industry, academia, and government, and adopt increasing levels of community governance.

Haitner, Iftach, Omer Reingold, Salil Vadhan, and Hoeteck Wee. “Inaccessible entropy I: Inaccessible entropy generators and statistically hiding commitments from one-way functions.” arXiv: 2010.05586 [cs.CR] (2020). ArXiv Version

Version History: Full version of part of an STOC 2009 paper.

We put forth a new computational notion of entropy, measuring the (in)feasibility of sampling high-entropy strings that are consistent with a given generator. Specifically, the $$i$$'th output block of a generator $$\mathsf{G}$$ has accessible entropy at most $$k$$ if the following holds: when conditioning on its prior coin tosses, no polynomial-time strategy $$\mathsf{\widetilde{G}}$$ can generate valid output for $$\mathsf{G}$$'s $$i$$'th output block with entropy greater than $$k$$. A generator has inaccessible entropy if the total accessible entropy (summed over the blocks) is noticeably smaller than the real entropy of $$\mathsf{G}$$'s output.

As an application of the above notion, we improve upon the result of Haitner, Nguyen, Ong, Reingold, and Vadhan [SICOMP '09], presenting a much simpler and more efficient construction of statistically hiding commitment schemes from arbitrary one-way functions.

Alabi, Daniel, Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan. “Differentially private simple linear regression.” arXiv: 2007.05157 [cs.LG] (2020). Publisher's Version
Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases.
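For readers unfamiliar with the Theil-Sen estimator mentioned in this abstract, here is a minimal sketch of the classical, non-private version: the slope is the median of all pairwise slopes, and the intercept is the median residual. It offers no differential privacy guarantee and is not one of the paper's algorithms, which adapt such robust estimators to the private setting. The function name and the sample data are illustrative assumptions.

```python
from itertools import combinations
from statistics import median


def theil_sen(points):
    """Classical (non-private) Theil-Sen fit for a list of (x, y) pairs.

    Returns (slope, intercept), where the slope is the median of the
    slopes through every pair of points with distinct x values.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Hypothetical data: four points on y = 2x + 1 plus one gross outlier.
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]
print(theil_sen(data))
```

Because medians ignore a minority of extreme values, the single outlier does not move the fit away from the line through the other four points, which illustrates why robust estimators are a natural starting point on very small datasets.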
Doron, Dean, Jack Murtagh, Salil Vadhan, and David Zuckerman. “Spectral sparsification via bounded-independence sampling.” In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), 168:39:1-39:21. Leibniz International Proceedings in Informatics (LIPIcs), Schloss-Dagstuhl-Leibniz-Zentrum für Informatik, 2020. Publisher's Version

We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph $$G$$ on $$n$$ vertices described by a binary string of length $$N$$, an integer $$k \leq \log n$$ and an error parameter $$\varepsilon > 0$$, our algorithm runs in space $$\tilde{O}(k \log(N \cdot w_{max}/w_{min}))$$ where $$w_{max}$$ and $$w_{min}$$ are the maximum and minimum edge weights in $$G$$, and produces a weighted graph $$H$$ with $$\tilde{O}(n^{1+2/k} / \varepsilon^2)$$ expected edges that spectrally approximates $$G$$, in the sense of Spielman and Teng [ST04], up to an error of $$\varepsilon$$.

Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective resistance based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by $$k$$ above, and the resulting sparsity that can be achieved.

Ahmadinejad, AmirMahdi, Jonathan Kelner, Jack Murtagh, John Peebles, Aaron Sidford, and Salil Vadhan. “High-precision estimation of random walks in small space.” 61st Annual IEEE Symposium on the Foundations of Computer Science (FOCS 2020). IEEE, 2020. Publisher's Version
Version History:
arXiv version (2019): http://arxiv.org/abs/1912.04524

Talks: View a talk on this paper presented by John Peebles at FOCS 2020.

In this paper, we provide a deterministic $$\tilde{O}(\log N)$$-space algorithm for estimating the random walk probabilities on Eulerian directed graphs (and thus also undirected graphs) to within inverse polynomial additive error $$(ϵ = 1/\mathrm{poly}(N))$$ where $$N$$ is the length of the input. Previously, this problem was known to be solvable by a randomized algorithm using space $$O (\log N)$$ (Aleliunas et al., FOCS '79) and by a deterministic algorithm using space $$O (\log^{3/2} N)$$ (Saks and Zhou, FOCS '95 and JCSS '99), both of which held for arbitrary directed graphs but had not been improved even for undirected graphs. We also give improvements on the space complexity of both of these previous algorithms for non-Eulerian directed graphs when the error is negligible $$(ϵ=1/N^{ω(1)})$$, generalizing what Hoza and Zuckerman (FOCS '18) recently showed for the special case of distinguishing whether a random walk probability is 0 or greater than ϵ.

We achieve these results by giving new reductions between powering Eulerian random-walk matrices and inverting Eulerian Laplacian matrices, providing a new notion of spectral approximation for Eulerian graphs that is preserved under powering, and giving the first deterministic $$\tilde{O}(\log N)$$-space algorithm for inverting Eulerian Laplacian matrices. The latter algorithm builds on the work of Murtagh et al. (FOCS '17) that gave a deterministic $$\tilde{O}(\log N)$$-space algorithm for inverting undirected Laplacian matrices, and the work of Cohen et al. (FOCS '19) that gave a randomized $$\tilde{O} (N)$$-time algorithm for inverting Eulerian Laplacian matrices. A running theme throughout these contributions is an analysis of "cycle-lifted graphs," where we take a graph and "lift" it to a new graph whose adjacency matrix is the tensor product of the original adjacency matrix and a directed cycle (or variants of one).
Haitner, Iftach, Thomas Holenstein, Omer Reingold, Salil Vadhan, and Hoeteck Wee. “Inaccessible entropy II: IE functions and universal one-way hashing.” Theory of Computing 16, no. 8 (2020): 1-55. Publisher's Version

Version History: Published earlier in Henri Gilbert, ed., Advances in Cryptology—EUROCRYPT ‘10, Lecture Notes in Computer Science, as "Universal one-way hash functions via inaccessible entropy".

This paper revisits the construction of Universal One-Way Hash Functions (UOWHFs) from any one-way function due to Rompel (STOC 1990). We give a simpler construction of UOWHFs, which also obtains better efficiency and security. The construction exploits a strong connection to the recently introduced notion of inaccessible entropy (Haitner et al. STOC 2009). With this perspective, we observe that a small tweak of any one-way function $$f$$ is already a weak form of a UOWHF: Consider $$F(x, i)$$ that outputs the $$i$$-bit long prefix of $$f(x)$$. If $$F$$ were a UOWHF then given a random $$x$$ and $$i$$ it would be hard to come up with $$x' \neq x$$ such that $$F(x, i) = F(x', i)$$. While this may not be the case, we show (rather easily) that it is hard to sample $$x'$$ with almost full entropy among all the possible such values of $$x'$$. The rest of our construction simply amplifies and exploits this basic property.

With this and other recent works, we have that the constructions of three fundamental cryptographic primitives (Pseudorandom Generators, Statistically Hiding Commitments and UOWHFs) out of one-way functions are to a large extent unified. In particular, all three constructions rely on and manipulate computational notions of entropy in similar ways. Pseudorandom Generators rely on the well-established notion of pseudoentropy, whereas Statistically Hiding Commitments and UOWHFs rely on the newer notion of inaccessible entropy.

Ullman, Jon, and Salil Vadhan. “PCPs and the hardness of generating synthetic data.” Journal of Cryptology 33 (2020): 2078-2112. Publisher's Version

Version History: Full version posted as ECCC TR10-017.

Published earlier in Yuval Ishai, ed., Proceedings of the 8th IACR Theory of Cryptography Conference (TCC ‘11), Lecture Notes in Computer Science. Springer-Verlag, Publishers: Vol. 5978, pp. 572-587. https://link.springer.com/chapter/10.1007/978-3-642-19571-6_24

Invited to J. Cryptology selected papers from TCC 2011.

Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm $$\mathcal{A}$$ that takes a database $$D ∈ (\{0, 1\}^d)^n$$ and outputs a “synthetic database” $$\hat{D}$$ all of whose two-way marginals are approximately equal to those of $$D$$. (A two-way marginal is the fraction of database rows $$x ∈ \{0, 1\}^d$$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ‘07), who gave an algorithm running in time $$\mathrm{poly} (n, 2^d)$$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC ‘09) with encodings based on probabilistically checkable proofs.

We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in $$D$$ satisfying a predicate $$c$$ is estimated by applying $$c$$ to each row of $$\hat{D}$$ and aggregating the results in some way.

Chen, Yiling, Or Sheffet, and Salil Vadhan. “Privacy games.” ACM Transactions on Economics and Computation 8, no. 2 (2020): Article 9. Publisher's Version

Version History: Previously published as: Yiling Chen, Or Sheffet, and Salil Vadhan. Privacy games. In Proceedings of the 10th International Conference on Web and Internet Economics (WINE ‘14), volume 8877 of Lecture Notes in Computer Science, pages 371–385. Springer-Verlag, 14–17 December 2014. (WINE Publisher's Version linked here: https://link.springer.com/chapter/10.1007/978-3-319-13129-0_30); PDF attached as WINE2014.

The problem of analyzing the effect of privacy concerns on the behavior of selfish utility-maximizing agents has received much attention lately. Privacy concerns are often modeled by altering the utility functions of agents to consider also their privacy loss. Such privacy aware agents prefer to take a randomized strategy even in very simple games in which non-privacy aware agents play pure strategies. In some cases, the behavior of privacy aware agents follows the framework of Randomized Response, a well-known mechanism that preserves differential privacy.

Our work is aimed at better understanding the behavior of agents in settings where their privacy concerns are explicitly given. We consider a toy setting where agent A, in an attempt to discover the secret type of agent B, offers B a gift that one type of B agent likes and the other type dislikes. As opposed to previous works, B's incentive to keep her type a secret isn't the result of "hardwiring" B's utility function to consider privacy, but rather takes the form of a payment between B and A. We investigate three different types of payment functions and analyze B's behavior in each of the resulting games. As we show, under some payments, B's behavior is very different from the behavior of agents with hardwired privacy concerns and might even be deterministic. Under a different payment we show that B's Bayesian Nash equilibrium (BNE) strategy does fall into the framework of Randomized Response.
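As background for the Randomized Response mechanism referred to in this abstract, the sketch below shows the textbook binary version: a respondent reports her true bit with probability `p_truth` and the flipped bit otherwise. For $$1/2 ≤ p < 1$$ this satisfies $$ε$$-differential privacy with $$ε = \ln(p/(1-p))$$. The code illustrates the standard mechanism only, not the games or payment functions analyzed in the paper, and all names in it are hypothetical.

```python
import math
import random


def randomized_response(true_bit, p_truth, rng):
    """Report true_bit with probability p_truth, otherwise report the flipped bit."""
    return true_bit if rng.random() < p_truth else 1 - true_bit


def privacy_loss(p_truth):
    """Epsilon of binary randomized response, valid for 1/2 <= p_truth < 1."""
    return math.log(p_truth / (1 - p_truth))


def estimate_fraction(reports, p_truth):
    """Unbiased estimate of the fraction of true 1s from noisy reports.

    Uses E[report] = f * (2p - 1) + (1 - p), solved for f; requires p_truth != 1/2.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


# Illustrative run with an arbitrary seed and a made-up population.
rng = random.Random(0)
truth = [1] * 300 + [0] * 700
reports = [randomized_response(bit, 0.75, rng) for bit in truth]
print(privacy_loss(0.75), estimate_fraction(reports, 0.75))
```

With `p_truth = 0.75` the privacy loss is $$\ln 3$$, and the estimate concentrates around the true fraction as the number of respondents grows.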

2019
Vadhan, Salil. “Computational entropy.” In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali (Oded Goldreich, Ed.), 693-726. ACM, 2019. Publisher's Version

In this survey, we present several computational analogues of entropy and illustrate how they are useful for constructing cryptographic primitives. Specifically, we focus on constructing pseudorandom generators and statistically hiding commitments from arbitrary one-way functions, and demonstrate that:

1. The security properties of these (and other) cryptographic primitives can be understood in terms of various computational analogues of entropy, and in particular how these computational measures of entropy can be very different from real, information-theoretic entropy.
2. It can be shown that every one-way function directly exhibits some gaps between real entropy and the various computational entropies.
3. Thus we can construct the desired cryptographic primitives by amplifying and manipulating the entropy gaps in a one-way function, through forms of repetition and hashing.

The constructions we present (which are from the past decade) are much simpler and more efficient than the original ones, and are based entirely on natural manipulations of new notions of computational entropy. The two constructions are "dual" to each other, whereby the construction of pseudorandom generators relies on a form of computational entropy ("pseudoentropy") being larger than the real entropy, while the construction of statistically hiding commitments relies on a form of computational entropy ("accessible entropy") being smaller than the real entropy. Beyond that difference, the two constructions share a common structure, using a very similar sequence of manipulations of real and computational entropy. As a warmup, we also "deconstruct" the classic construction of pseudorandom generators from one-way permutations using the modern language of computational entropy.

This survey is written in honor of Shafi Goldwasser and Silvio Micali.

Balcer, Victor, and Salil Vadhan. “Differential privacy on finite computers.” Journal of Privacy and Confidentiality 9, no. 2 (2019). Publisher's Version

Version History: Also presented at TPDP 2017; preliminary version posted as arXiv:1709.05396 [cs.DS].

2018: Published in Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pp 43:1-43:21. http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=8353

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe $$X$$. The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundararajan, STOC 2009 and SICOMP 2012), take time at least linear in $$|X|$$, which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its “per-bin accuracy” (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of $$\log |X|$$, but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an $$(n + 1)$$-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

Agrawal, Rohit, Yi-Hsiu Chen, Thibaut Horel, and Salil Vadhan. “Unifying computational entropies via Kullback-Leibler divergence.” In Advances in Cryptology: CRYPTO 2019, A. Boldyreva and D. Micciancio, (Eds), 11693:831-858. Springer Verlag, Lecture Notes in Computer Science, 2019. Publisher's Version
Version History:
arXiv, first posted Feb 2019, most recently updated Aug 2019: https://arxiv.org/abs/1902.11202

We introduce hardness in relative entropy, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, hardness in relative entropy unifies the latter two notions of computational entropy and sheds light on the apparent “duality” between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC ‘12).
2018
Raghunathan, Ananth, Gil Segev, and Salil P. Vadhan. “Deterministic public-key encryption for adaptively-chosen plaintext distributions.” Journal of Cryptology 31, no. 4 (2018): 1012-1063. Publisher's Version

Version History: Preliminary versions in EUROCRYPT ‘13 and Cryptology ePrint report 2013/125.

Bellare, Boldyreva, and O’Neill (CRYPTO '07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially-chosen plaintext distributions that are independent of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme.

We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to adaptively choose plaintext distributions after seeing the public key, in an interactive manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by $$2^p$$, where $$p$$ can be any predetermined polynomial in the security parameter. For example, with $$p=0$$ we capture plaintext distributions that are independent of the public key, and with $$p=O(s \log s)$$ we capture, in particular, all plaintext distributions that are samplable by circuits of size $$s$$.

Within our framework we present both constructions in the random-oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS '00) and Dodis (PhD Thesis, MIT, '00).

Bun, Mark, Jonathan Ullman, and Salil Vadhan. “Fingerprinting codes and the price of approximate differential privacy.” SIAM Journal on Computing, Special Issue on STOC '14 47, no. 5 (2018): 1888-1938. Publisher's Version

Version History: Special Issue on STOC ‘14. Preliminary versions in STOC ‘14 and arXiv:1311.3158 [cs.CR].

We show new information-theoretic lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $$D ∈ (\{0, 1\}^d)^n$$ has the form “What fraction of the individual records in the database satisfy the property $$q$$?” We show that in order to answer an arbitrary set $$Q$$ of $$\gg d/ \alpha^2$$ counting queries on $$D$$ to within error $$±α$$ it is necessary that $$n ≥ \tilde{Ω}(\sqrt{d} \log |Q|/α^2ε)$$. This bound is optimal up to polylogarithmic factors, as demonstrated by the private multiplicative weights algorithm (Hardt and Rothblum, FOCS’10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and (ε, δ)-differential privacy is asymptotically larger than what is required merely for accuracy, which is $$O(\log |Q|/α^2 )$$. In addition, we show that our lower bound holds for the specific case of $$k$$-way marginal queries (where $$|Q| = 2^k \binom{d}{k}$$) when $$\alpha$$ is not too small compared to $$d$$ (e.g., when $$\alpha$$ is any fixed constant). Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO’95; Tardos, STOC’03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample-complexity lower bounds into stronger lower bounds.