Publications

2020
Doron, Dean, Jack Murtagh, Salil Vadhan, and David Zuckerman. “Spectral sparsification via bounded-independence sampling.” Electronic Colloquium on Computational Complexity (ECCC), TR20-026, 2020. Publisher's Version. Abstract:

Version History:

v1, 26 Feb 2020: https://arxiv.org/abs/2002.11237

We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph \(G\) on \(n\) vertices described by a binary string of length \(N\), an integer \(k \leq \log n\), and an error parameter \(\varepsilon > 0\), our algorithm runs in space \(\tilde{O}(k \log(N \cdot w_{max}/w_{min}))\), where \(w_{max}\) and \(w_{min}\) are the maximum and minimum edge weights in \(G\), and produces a weighted graph \(H\) with \(\tilde{O}(n^{1+2/k}/\varepsilon^2)\) expected edges that spectrally approximates \(G\), in the sense of Spielman and Teng [ST04], up to an error of \(\varepsilon\).

Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective-resistance-based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by \(k\) above, and the resulting sparsity that can be achieved.
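
For intuition, here is a minimal Python sketch of the randomized effective-resistance edge sampling of [SS08] that the paper derandomizes; the oversampling constant and the dense pseudoinverse computation are illustrative simplifications of ours, not the paper's space-efficient procedure.

```python
# Illustrative sketch of [SS08]-style effective-resistance sampling;
# not the paper's deterministic, small-space algorithm.
import numpy as np

def sparsify(n, edges, eps, rng=np.random.default_rng(0)):
    """edges: list of (u, v, weight) on vertices 0..n-1 of a connected graph."""
    L = np.zeros((n, n))  # graph Laplacian L = D - A
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)  # Moore-Penrose pseudoinverse of L
    q = 9.0 * n * np.log(n) / eps**2  # expected sample count (constant is arbitrary)
    H = {}
    for u, v, w in edges:
        # effective resistance R(u,v) = (e_u - e_v)^T L^+ (e_u - e_v)
        r = Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v]
        p = min(1.0, q * w * r / (n - 1))  # sum of w*R over all edges is n - 1
        if rng.random() < p:
            H[(u, v)] = w / p  # reweight so the sample is unbiased
    return H
```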

ECCC 2020.pdf
Chen, Yiling, Or Sheffet, and Salil Vadhan. “Privacy games.” ACM Transactions on Economics and Computation 8, no. 2 (2020): Article 9. Publisher's Version. Abstract:

Version History: 

Previously published as: Yiling Chen, Or Sheffet, and Salil Vadhan. Privacy games. In Proceedings of the 10th International Conference on Web and Internet Economics (WINE ‘14), volume 8877 of Lecture Notes in Computer Science, pages 371–385. Springer-Verlag, 14–17 December 2014. (WINE Publisher's Version linked here: https://link.springer.com/chapter/10.1007/978-3-319-13129-0_30); PDF attached as WINE2014.

The problem of analyzing the effect of privacy concerns on the behavior of selfish utility-maximizing agents has received much attention lately. Privacy concerns are often modeled by altering the utility functions of agents so that they also account for their privacy loss. Such privacy-aware agents prefer to play a randomized strategy even in very simple games in which non-privacy-aware agents play pure strategies. In some cases, the behavior of privacy-aware agents follows the framework of Randomized Response, a well-known mechanism that preserves differential privacy.


Our work is aimed at better understanding the behavior of agents in settings where their privacy concerns are explicitly given. We consider a toy setting where agent A, in an attempt to discover the secret type of agent B, offers B a gift that one type of B agent likes and the other type dislikes. As opposed to previous works, B's incentive to keep her type a secret isn't the result of "hardwiring" B's utility function to consider privacy, but rather takes the form of a payment between B and A. We investigate three different types of payment functions and analyze B's behavior in each of the resulting games. As we show, under some payments, B's behavior is very different from the behavior of agents with hardwired privacy concerns and might even be deterministic. Under a different payment we show that B's Bayes-Nash equilibrium (BNE) strategy does fall into the framework of Randomized Response.
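
For concreteness, a minimal Python sketch of the Randomized Response mechanism referenced above; the \(\epsilon\) here is the standard differential-privacy parameter, not a quantity taken from this paper.

```python
import math, random

def randomized_response(secret_bit: int, eps: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps).
    The likelihood ratio between the two inputs is at most e^eps,
    so the output satisfies eps-differential privacy."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return secret_bit if random.random() < p_truth else 1 - secret_bit
```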

ArXiv 2014.pdf WINE 2014.pdf TEAC 2020.pdf
2019
Vadhan, Salil. “Computational entropy.” In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali (Oded Goldreich, Ed.), 693-726. ACM, 2019. Publisher's Version. Abstract:

In this survey, we present several computational analogues of entropy and illustrate how they are useful for constructing cryptographic primitives. Specifically, we focus on constructing pseudorandom generators and statistically hiding commitments from arbitrary one-way functions, and demonstrate that:

  1. The security properties of these (and other) cryptographic primitives can be understood in terms of various computational analogues of entropy, and in particular how these computational measures of entropy can be very different from real, information-theoretic entropy.
  2. It can be shown that every one-way function directly exhibits some gaps between real entropy and the various computational entropies.
  3. Thus we can construct the desired cryptographic primitives by amplifying and manipulating the entropy gaps in a one-way function, through forms of repetition and hashing.

The constructions we present (which are from the past decade) are much simpler and more efficient than the original ones, and are based entirely on natural manipulations of new notions of computational entropy. The two constructions are "dual" to each other, whereby the construction of pseudorandom generators relies on a form of computational entropy ("pseudoentropy") being larger than the real entropy, while the construction of statistically hiding commitments relies on a form of computational entropy ("accessible entropy") being smaller than the real entropy. Beyond that difference, the two constructions share a common structure, using a very similar sequence of manipulations of real and computational entropy. As a warmup, we also "deconstruct" the classic construction of pseudorandom generators from one-way permutations using the modern language of computational entropy.

This survey is written in honor of Shafi Goldwasser and Silvio Micali.

 

ACM-2019-Computational Entropy.pdf
Ahmadinejad, AmirMahdi, Jonathan Kelner, Jack Murtagh, John Peebles, Aaron Sidford, and Salil Vadhan. “High-precision estimation of random walks in small space.” arXiv:1912.04525 [cs.CC], 2019. ArXiv Version. Abstract:
In this paper, we provide a deterministic \(\tilde{O}(\log N)\)-space algorithm for estimating the random walk probabilities on Eulerian directed graphs (and thus also undirected graphs) to within inverse polynomial additive error \((\epsilon = 1/\mathrm{poly}(N))\), where \(N\) is the length of the input. Previously, this problem was known to be solvable by a randomized algorithm using space \(O(\log N)\) (Aleliunas et al., FOCS '79) and by a deterministic algorithm using space \(O(\log^{3/2} N)\) (Saks and Zhou, FOCS '95 and JCSS '99), both of which held for arbitrary directed graphs but had not been improved even for undirected graphs. We also give improvements on the space complexity of both of these previous algorithms for non-Eulerian directed graphs when the error is negligible \((\epsilon = 1/N^{\omega(1)})\), generalizing what Hoza and Zuckerman (FOCS '18) recently showed for the special case of distinguishing whether a random walk probability is 0 or greater than \(\epsilon\).
We achieve these results by giving new reductions between powering Eulerian random-walk matrices and inverting Eulerian Laplacian matrices, providing a new notion of spectral approximation for Eulerian graphs that is preserved under powering, and giving the first deterministic \(\tilde{O}(\log N)\)-space algorithm for inverting Eulerian Laplacian matrices. The latter algorithm builds on the work of Murtagh et al. (FOCS '17) that gave a deterministic \(\tilde{O}(\log N)\)-space algorithm for inverting undirected Laplacian matrices, and the work of Cohen et al. (FOCS '19) that gave a randomized \(\tilde{O}(N)\)-time algorithm for inverting Eulerian Laplacian matrices. A running theme throughout these contributions is an analysis of "cycle-lifted graphs," where we take a graph and "lift" it to a new graph whose adjacency matrix is the tensor product of the original adjacency matrix and a directed cycle (or variants of one).
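
For reference, the quantity being estimated is just an entry of a matrix power; a brute-force Python sketch using dense linear algebra (rather than the paper's small-space techniques):

```python
import numpy as np

def walk_probabilities(A, r):
    """A: nonnegative adjacency matrix, every row with positive degree.
    Returns W^r, whose (u, v) entry is the probability that an
    r-step random walk started at u ends at v."""
    W = A / A.sum(axis=1, keepdims=True)  # one-step transition matrix
    return np.linalg.matrix_power(W, r)
```
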
ARXIV 2019.pdf
Balcer, Victor, and Salil Vadhan. “Differential privacy on finite computers.” Journal of Privacy and Confidentiality 9, no. 2 (2019). Publisher's Version. Abstract:

Version History: 

Also presented at TPDP 2017; preliminary version posted as arXiv:1709.05396 [cs.DS].

2018: Published in Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 43:1-43:21. http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=8353

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe \(X\). The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundararajan, STOC 2009 and SICOMP 2012), take time at least linear in \(|X|\), which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its “per-bin accuracy” (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of \(\log |X|\), but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an \((n + 1)\)-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.
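
As a point of reference, a floating-point Python sketch of the (unclamped) Discrete Geometric Mechanism for histograms; it is illustrative only and, unlike the paper's algorithms, is neither implemented on a finite computer nor strictly polynomial-time in the bit length of the input.

```python
import math, random

def two_sided_geometric(eps: float) -> int:
    """Sample integer noise with Pr[Z = z] proportional to exp(-eps*|z|),
    as the difference of two i.i.d. geometric variables."""
    alpha = math.exp(-eps)
    # inverse-CDF sampling of Geom(1 - alpha) on {0, 1, 2, ...}
    g1 = int(math.log(1.0 - random.random()) / math.log(alpha))
    g2 = int(math.log(1.0 - random.random()) / math.log(alpha))
    return g1 - g2

def noisy_histogram(counts, eps):
    """counts: dict mapping bin -> true count; adds independent noise per bin."""
    return {b: c + two_sided_geometric(eps) for b, c in counts.items()}
```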

JPC2019.pdf ITCS2018.pdf ArXiv2018.pdf
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan. “Deterministic approximation of random walks in small space.” In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019), Dimitris Achlioptas and László A. Végh (Eds.), vol. 145 of Leibniz International Proceedings in Informatics (LIPIcs). Cambridge, Massachusetts (MIT), 2019. Publisher's Version. Abstract:
Version History:
v1, 15 Mar 2019: https://arxiv.org/abs/1903.06361v1
v2, 25 Nov 2019: https://arxiv.org/abs/1903.06361v2
Publisher's Version (APPROX-RANDOM 2019), 20 Sep 2019:

We give a deterministic, nearly logarithmic-space algorithm that, given an undirected graph \(G\), a positive integer \(r\), and a set \(S\) of vertices, approximates the conductance of \(S\) in the \(r\)-step random walk on \(G\) to within a factor of \(1+\epsilon\), where \(\epsilon > 0\) is an arbitrarily small constant. More generally, our algorithm computes an \(\epsilon\)-spectral approximation to the normalized Laplacian of the \(r\)-step walk. Our algorithm combines the derandomized square graph operation (Rozenman and Vadhan, 2005), which we recently used for solving Laplacian systems in nearly logarithmic space (Murtagh, Reingold, Sidford, and Vadhan, 2017), with ideas from (Cheng, Cheng, Liu, Peng, and Teng, 2015), which gave an algorithm that is time-efficient (while ours is space-efficient) and randomized (while ours is deterministic) for the case of even \(r\) (while ours works for all \(r\)). Along the way, we provide some new results that generalize technical machinery and yield improvements over previous work. First, we obtain a nearly linear-time randomized algorithm for computing a spectral approximation to the normalized Laplacian for odd \(r\). Second, we define and analyze a generalization of the derandomized square for irregular graphs and for sparsifying the product of two distinct graphs. As part of this generalization, we also give a strongly explicit construction of expander graphs of every size.
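
For intuition, the conductance being approximated can be computed directly (in linear space) from the r-th power of the transition matrix; a brute-force Python sketch, assuming a connected undirected graph:

```python
import numpy as np

def conductance_r_step(A, S, r):
    """A: symmetric adjacency matrix; S: list of vertex indices.
    Returns Pr[walk leaves S after r steps | start in S from stationarity]."""
    deg = A.sum(axis=1)
    pi = deg / deg.sum()               # stationary distribution of the walk
    W = A / deg[:, None]               # one-step transition matrix
    Wr = np.linalg.matrix_power(W, r)  # r-step walk
    S = np.asarray(S)
    T = np.setdiff1d(np.arange(A.shape[0]), S)  # complement of S
    escape = (pi[S, None] * Wr[np.ix_(S, T)]).sum()
    return escape / pi[S].sum()
```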

APPROX-RANDOM 2019 ArXiv2019.pdf ArXiv 2019 v2.pdf
Agrawal, Rohit, Yi-Hsiu Chen, Thibaut Horel, and Salil Vadhan. “Unifying computational entropies via Kullback-Leibler divergence.” In Advances in Cryptology: CRYPTO 2019 (A. Boldyreva and D. Micciancio, Eds.), vol. 11693 of Lecture Notes in Computer Science, 831-858. Springer Verlag, 2019. Publisher's Version. Abstract:
Version History: 
arXiv, first posted Feb 2019, most recently updated Aug 2019: https://arxiv.org/abs/1902.11202
 
We introduce hardness in relative entropy, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, hardness in relative entropy unifies the latter two notions of computational entropy and sheds light on the apparent “duality” between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC ‘12).
CRYPTO 2019.pdf ArXiv2019.pdf
2018
Raghunathan, Ananth, Gil Segev, and Salil P. Vadhan. “Deterministic public-key encryption for adaptively-chosen plaintext distributions.” Journal of Cryptology 31, no. 4 (2018): 1012-1063. Publisher's Version. Abstract:

Version History: Preliminary versions in EUROCRYPT ‘13 and Cryptology ePrint report 2013/125.

Bellare, Boldyreva, and O’Neill (CRYPTO '07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially-chosen plaintext distributions that are independent of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme.

We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to adaptively choose plaintext distributions after seeing the public key, in an interactive manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by \(2^p\), where \(p\) can be any predetermined polynomial in the security parameter. For example, with \(p = 0\) we capture plaintext distributions that are independent of the public key, and with \(p = O(s \log s)\) we capture, in particular, all plaintext distributions that are samplable by circuits of size \(s\).

Within our framework we present both constructions in the random-oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS '00) and Dodis (PhD Thesis, MIT, '00).

JCRYPTOL2018.pdf EUROCRYPT2013.pdf IACR2013.pdf
Bun, Mark, Jonathan Ullman, and Salil Vadhan. “Fingerprinting codes and the price of approximate differential privacy.” SIAM Journal on Computing 47, no. 5 (2018): 1888-1938 (Special Issue on STOC '14). Publisher's Version. Abstract:

Version History: Special Issue on STOC ‘14. Preliminary versions in STOC ‘14 and arXiv:1311.3158 [cs.CR].

We show new information-theoretic lower bounds on the sample complexity of \((\epsilon, \delta)\)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database \(D \in (\{0, 1\}^d)^n\) has the form “What fraction of the individual records in the database satisfy the property \(q\)?” We show that in order to answer an arbitrary set \(Q\) of \(\gg d/\alpha^2\) counting queries on \(D\) to within error \(\pm\alpha\) it is necessary that \(n \geq \tilde{\Omega}(\sqrt{d} \log |Q| / \alpha^2 \epsilon)\). This bound is optimal up to polylogarithmic factors, as demonstrated by the private multiplicative weights algorithm (Hardt and Rothblum, FOCS '10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and \((\epsilon, \delta)\)-differential privacy is asymptotically larger than what is required merely for accuracy, which is \(O(\log |Q| / \alpha^2)\). In addition, we show that our lower bound holds for the specific case of \(k\)-way marginal queries (where \(|Q| = 2^k \binom{d}{k}\)) when \(\alpha\) is not too small compared to \(d\) (e.g., when \(\alpha\) is any fixed constant). Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO '95; Tardos, STOC '03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample-complexity lower bounds into stronger lower bounds.
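
To make the gap concrete, a back-of-the-envelope Python computation with example parameters of our own choosing (hidden constants and polylogarithmic factors ignored):

```python
import math

d, logQ, alpha, eps = 10_000, 20, 0.1, 1.0   # hypothetical example values
n_accuracy_only = logQ / alpha**2            # ~ O(log|Q| / alpha^2)
n_private = math.sqrt(d) * logQ / (alpha**2 * eps)  # ~ sqrt(d) log|Q| / (alpha^2 eps)
print(n_accuracy_only, n_private)            # 2000.0 vs 200000.0
```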

ArXiv2018.pdf STOC2014.pdf SIAM2018.pdf
Murtagh, Jack, and Salil Vadhan. “The complexity of computing the optimal composition of differential privacy.” Theory of Computing 14 (2018): 1-35. Publisher's Version. Abstract:

Version History: Full version posted on CoRR, abs/1507.03113, July 2015. Additional version published in Proceedings of the 13th IACR Theory of Cryptography Conference (TCC '16-A).

In the study of differential privacy, composition theorems (starting with the original paper of Dwork, McSherry, Nissim, and Smith (TCC '06)) bound the degradation of privacy when composing several differentially private algorithms. Kairouz, Oh, and Viswanath (ICML '15) showed how to compute the optimal bound for composing \(k\) arbitrary \((\epsilon, \delta)\)-differentially private algorithms. We characterize the optimal composition for the more general case of \(k\) arbitrary \((\epsilon_1, \delta_1), \ldots, (\epsilon_k, \delta_k)\)-differentially private algorithms, where the privacy parameters may differ for each algorithm in the composition. We show that computing the optimal composition in general is \(\#P\)-complete. Since computing optimal composition exactly is infeasible (unless \(FP = \#P\)), we give an approximation algorithm that computes the composition to arbitrary accuracy in polynomial time. The algorithm is a modification of Dyer's dynamic programming approach to approximately counting solutions to knapsack problems (STOC '03).
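
For comparison, a Python sketch of two classical, efficiently computable (but suboptimal) composition bounds; this is not the paper's #P-hard optimal composition or its approximation algorithm.

```python
import math

def basic_composition(params):
    """params: [(eps_1, delta_1), ..., (eps_k, delta_k)].
    Basic composition: the privacy parameters simply add up."""
    return sum(e for e, _ in params), sum(d for _, d in params)

def advanced_composition(eps, delta, k, delta_prime):
    """Advanced composition (Dwork, Rothblum, Vadhan, FOCS '10) for
    k-fold composition of one (eps, delta)-DP mechanism."""
    eps_total = (math.sqrt(2 * k * math.log(1.0 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1.0))
    return eps_total, k * delta + delta_prime
```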

ArXiv2016.pdf TCC2016-A.pdf TOC2018.pdf
Karwa, Vishesh, and Salil Vadhan. “Finite sample differentially private confidence intervals.” In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), 44:1-44:9. Dagstuhl, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Publisher's Version. Abstract:

Version History: Also presented at TPDP 2017. Preliminary version posted as arXiv:1711.03908 [cs.CR].

We study the problem of estimating finite sample confidence intervals of the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee a finite sample coverage, as opposed to an asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set, showing that the parameters of our algorithms are optimal up to polylogarithmic factors.
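
For flavor, an oversimplified Python sketch of a differentially private confidence interval in the known-variance case. Unlike the paper's algorithms, it assumes a bounded data domain [lo, hi] (a hypothetical simplification) and does not carry the paper's finite-sample coverage guarantee.

```python
import math, random

def dp_mean_ci(xs, sigma, eps, lo, hi):
    """Approximate 95% CI for a normal mean, privatized with Laplace noise."""
    n = len(xs)
    clamped = [min(max(x, lo), hi) for x in xs]  # bound each record's influence
    sens = (hi - lo) / n                         # sensitivity of the clamped mean
    scale = sens / eps                           # Laplace scale for eps-DP
    noise = random.choice([-1, 1]) * random.expovariate(1.0) * scale
    center = sum(clamped) / n + noise
    # widen the usual z-interval by a high-probability bound on the noise
    half = 1.96 * sigma / math.sqrt(n) + scale * math.log(1.0 / 0.05)
    return center - half, center + half
```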

ITCS2018.pdf ArXiv2017.pdf
Murtagh, Jack, Kathryn Taylor, George Kellaris, and Salil P. Vadhan. “Usable differential privacy: A case study with PSI.” arXiv:1809.04103 [cs.CR], 2018. ArXiv Version. Abstract:

Version History: v1, 11 September 2018 https://arxiv.org/abs/1809.04103

Differential privacy is a promising framework for addressing the privacy concerns in sharing sensitive datasets for others to analyze. However, differential privacy is a highly technical area, and current deployments often require experts to write code, tune parameters, and optimize the trade-off between the privacy and accuracy of statistical releases. For differential privacy to achieve its potential for wide impact, it is important to design usable systems that enable differential privacy to be used by ordinary data owners and analysts. PSI is a tool that was designed for this purpose, allowing researchers to release useful differentially private statistical information about their datasets without being experts in computer science, statistics, or privacy. We conducted a thorough usability study of PSI to test whether it accomplishes its goal of usability by non-experts. The usability test illuminated which features of PSI are most user-friendly and prompted us to improve aspects of the tool that caused confusion. The test also highlighted some general principles and lessons for designing usable systems for differential privacy, which we discuss in depth.

ArXiv2018.pdf
Chen, Yi-Hsiu, Mika Göös, Salil P. Vadhan, and Jiapeng Zhang. “A tight lower bound for entropy flattening.” In 33rd Computational Complexity Conference (CCC 2018), vol. 102 of Leibniz International Proceedings in Informatics (LIPIcs), 23:21-23:28. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Publisher's Version. Abstract:

Version History: Preliminary version posted as ECCC TR18-119.

We study entropy flattening: Given a circuit \(C_X\) implicitly describing an \(n\)-bit source \(X\) (namely, \(X\) is the output of \(C_X\) on a uniform random input), construct another circuit \(C_Y\) describing a source \(Y\) such that (1) source \(Y\) is nearly flat (uniform on its support), and (2) the Shannon entropy of \(Y\) is monotonically related to that of \(X\). The standard solution is to have \(C_Y\) evaluate \(C_X\) altogether \(\Theta(n^2)\) times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among black-box constructions: Any circuit \(C_Y\) for entropy flattening that repeatedly queries \(C_X\) as an oracle requires \(\Omega(n^2)\) queries.
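
For illustration, a minimal Python sketch of the standard repetition-based flattening construction described above; `C_X` here stands for any hypothetical sampler mapping a random seed to bytes.

```python
import os

def flatten(C_X, seed_len: int, n: int) -> bytes:
    """One sample of Y = (C_X(U_1), ..., C_X(U_t)) with t = Theta(n^2);
    by the asymptotic equipartition property, Y is nearly flat."""
    t = n * n  # the Theta(n^2) repetition count shown optimal in the paper
    return b"".join(C_X(os.urandom(seed_len)) for _ in range(t))
```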

Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [12, 22, 13, 6, 11, 10, 7, 24]. It is also used in reductions between problems complete for statistical zero-knowledge [19, 23, 4, 25]. The \(\Theta(n^2)\) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency.

CCC2018.pdf
Wood, Alexandra, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, James Honaker, Kobbi Nissim, David R. O'Brien, Thomas Steinke, and Salil Vadhan. “Differential privacy: A primer for a non-technical audience.” Vanderbilt Journal of Entertainment & Technology Law 21, no. 1 (2018): 209-275. Publisher's Version. Abstract:

Version History: Preliminary version workshopped at PLSC 2017.

Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.

This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets. This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.

JETLAW 2018.pdf
2017
Wood, Alexandra, Micah Altman, Suso Baleato, and Salil Vadhan. Comments on the City of Seattle Open Data Risk Assessment, 2017. Publisher's Version. Abstract:
The transparency goals of the open data movement serve important social, economic, and democratic functions in cities like Seattle. At the same time, some municipal datasets about the city and its citizens’ activities carry inherent risks to individual privacy when shared publicly. In 2016, the City of Seattle declared in its Open Data Policy that the city’s data would be “open by preference,” except when doing so may affect individual privacy. To ensure its Open Data program effectively protects individuals, Seattle committed to performing an annual risk assessment and tasked the Future of Privacy Forum (FPF) with creating and deploying an initial privacy risk assessment methodology for open data.

This Draft Report provides tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacy-protective open data programs. Although there is a growing body of research on open data privacy, open data managers and departmental data owners need to be able to employ a standardized methodology for assessing the privacy risks and benefits of particular datasets internally, without a bevy of expert statisticians, privacy lawyers, or philosophers. By following a flexible, risk-based assessment process, the City of Seattle – and other municipal open data programs – can maximize the utility and openness of civic data while minimizing privacy risks to individuals and community concerns about ethical challenges, fairness, and equity.

This Draft Report first describes inherent privacy risks in an open data landscape, with an emphasis on potential harms related to re-identification, data quality, and fairness. Accompanying this, the Draft Report includes a Model Open Data Benefit Risk Analysis (MODBRA). The model template evaluates the types of data contained in a proposed open dataset, the potential benefits – and concomitant risks – of releasing the dataset publicly, and strategies for effective de-identification and risk mitigation. This holistic assessment guides city officials to determine whether to release the dataset openly, in a limited access environment, or to withhold it from publication (absent countervailing public policy considerations). The Draft Report methodology builds on extensive work done in this field by experts at the National Institute of Standards and Technology, the University of Washington, the Berkman Klein Center for Internet & Society at Harvard University, and others, and adapts existing frameworks to the unique challenges faced by cities as local governments, technological system integrators, and consumer-facing service providers.
FPF 2017.pdf
Haitner, Iftach, and Salil Vadhan. “The Many Entropies in One-way Functions.” In Tutorials on the Foundations of Cryptography (Yehuda Lindell, Ed.), 159-217. Springer, 2017. Publisher's Version. Abstract:

Version History: 

Earlier versions: May 2017: ECCC TR 17-084

Dec. 2017: ECCC TR 17-084 (revised)

Computational analogues of information-theoretic notions have given rise to some of the most interesting phenomena in the theory of computation. For example, computational indistinguishability (Goldwasser and Micali [9]), the computational analogue of statistical distance, enabled bypassing Shannon’s impossibility results on perfectly secure encryption and provided the basis for the computational theory of pseudorandomness. Pseudoentropy (Håstad, Impagliazzo, Levin, and Luby [17]), a computational analogue of entropy, was the key to the fundamental result establishing the equivalence of pseudorandom generators and one-way functions, and has become a basic concept in complexity theory and cryptography.

This tutorial discusses two rather recent computational notions of entropy, both of which can be easily found in any one-way function, the most basic cryptographic primitive. The first notion is next-block pseudoentropy (Haitner, Reingold, and Vadhan [14]), a refinement of pseudoentropy that enables simpler and more efficient construction of pseudorandom generators. The second is inaccessible entropy (Haitner, Reingold, Vadhan, and Wee [11]), which relates to unforgeability and is used to construct simpler and more efficient universal one-way hash functions and statistically hiding commitments.

SPRINGER 2017.pdf ECCC 5-2017.pdf ECCC 12-2017.pdf
Vadhan, Salil. “The Complexity of Differential Privacy.” In Tutorials on the Foundations of Cryptography (Yehuda Lindell, Ed.), 347-450. Springer, 2017. Publisher's Version. Abstract:

Version History: 

August 2016: Manuscript v1 (see files attached)

March 2017: Manuscript v2 (see files attached); Errata

April 2017: Published Version (in Tutorials on the Foundations of Cryptography; see Publisher's Version link and also SPRINGER 2017.PDF, below) 

 

Differential privacy is a theoretical framework for ensuring the privacy of individual-level data when performing statistical analysis of privacy-sensitive datasets. This tutorial provides an introduction to and overview of differential privacy, with the goal of conveying its deep connections to a variety of other topics in computational complexity, cryptography, and theoretical computer science at large. This tutorial is written in celebration of Oded Goldreich’s 60th birthday, starting from notes taken during a minicourse given by the author and Kunal Talwar at the 26th McGill Invitational Workshop on Computational Complexity [1].

 

SPRINGER 2017.pdf ERRATA 2017.pdf MANUSCRIPT 2017.pdf MANUSCRIPT 2016.pdf
Steinke, Thomas, Salil Vadhan, and Andrew Wan. “Pseudorandomness and Fourier growth bounds for width 3 branching programs.” Theory of Computing 13, no. 12 (2017): 1-50 (Special Issue on APPROX-RANDOM 2014). Publisher's Version. Abstract:

Version History: a conference version of this paper appeared in the Proceedings of the 18th International Workshop on Randomization and Computation (RANDOM'14). Full version posted as ECCC TR14-076 and arXiv:1405.7028 [cs.CC].

We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length \(\tilde{O}(\log^3 n)\). The previously best known seed length for this model is \(n^{1/2+o(1)}\) due to Impagliazzo, Meka, and Zuckerman (FOCS ’12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM ’13) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any \(f : \{0, 1\}^n \to \{0, 1\}\) computed by such a branching program, and \(k \in [n]\),

 \(\displaystyle\sum_{s \subseteq [n] : |s| = k} \big| \hat{f}[s] \big| \leq n^2 \cdot (O(\log n))^k,\)

where \(\hat{f}[s] = \mathbb{E}_U [f[U] \cdot (-1)^{s \cdot U}]\) is the standard Fourier transform over \(\mathbb{Z}^n_2\). The base \(O(\log n)\) of the Fourier growth is tight up to a factor of \(\log \log n\).
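
For concreteness, a brute-force Python computation of the Fourier coefficients defined above (exponential time in n, for intuition only):

```python
from itertools import product

def fourier_coefficient(f, s, n):
    """hat{f}[s] = E_U[ f(U) * (-1)^{s . U} ], averaged over all n-bit U.
    f: function from n-bit tuples to {0, 1}; s: tuple of n bits."""
    total = 0.0
    for u in product([0, 1], repeat=n):
        dot = sum(si * ui for si, ui in zip(s, u)) % 2  # <s, u> over Z_2
        total += f(u) * (-1) ** dot
    return total / 2 ** n
```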

TOC 2017.pdf APPROX-RANDOM 2014.pdf ArXiv 2014.pdf
Nissim, Kobbi, Aaron Bembenek, Alexandra Wood, Mark Bun, Marco Gaboardi, Urs Gasser, David O'Brien, Thomas Steinke, and Salil Vadhan. “Bridging the gap between computer science and legal approaches to privacy.” Harvard Journal of Law & Technology 31, no. 2 (2017). Publisher's Version. Abstract:

Version History: Workshopped at PLSC (Privacy Law Scholars Conference) ‘16.

 

The analysis and release of statistical data about individuals and groups of individuals carries inherent privacy risks, and these risks have been conceptualized in different ways within the fields of law and computer science. For instance, many information privacy laws adopt notions of privacy risk that are sector- or context-specific, such as in the case of laws that protect from disclosure certain types of information contained within health, educational, or financial records. In addition, many privacy laws refer to specific techniques, such as deidentification, that are designed to address a subset of possible attacks on privacy. In doing so, many legal standards for privacy protection rely on individual organizations to make case-by-case determinations regarding concepts such as the identifiability of the types of information they hold. These regulatory approaches are intended to be flexible, allowing organizations to (1) implement a variety of specific privacy measures that are appropriate given their varying institutional policies and needs, (2) adapt to evolving best practices, and (3) address a range of privacy-related harms. However, in the absence of clear thresholds and detailed guidance on making case-specific determinations, flexibility in the interpretation and application of such standards also creates uncertainty for practitioners and often results in ad hoc, heuristic processes. This uncertainty may pose a barrier to the adoption of new technologies that depend on unambiguous privacy requirements. It can also lead organizations to implement measures that fall short of protecting against the full range of data privacy risks.

HARVARD JLT 18.pdf
Vadhan, Salil P. “On learning vs. refutation.” In 30th Conference on Learning Theory (COLT ‘17), vol. 65, 1835-1848, 2017. Publisher's Version. Abstract:
Building on the work of Daniely et al. (STOC 2014, COLT 2016), we study the connection between computationally efficient PAC learning and refutation of constraint satisfaction problems. Specifically, we prove that for every concept class \(\mathcal{P}\), PAC-learning \(\mathcal{P}\) is polynomially equivalent to “random-right-hand-side-refuting” (“RRHS-refuting”) a dual class \(\mathcal{P}^*\), where RRHS-refutation of a class \(Q\) refers to refuting systems of equations where the constraints are (worst-case) functions from the class \(Q\) but the right-hand-sides of the equations are uniform and independent random bits. The reduction from refutation to PAC learning can be viewed as an abstraction of (part of) the work of Daniely, Linial, and Shalev-Shwartz (STOC 2014). The converse, however, is new, and is based on a combination of techniques from pseudorandomness (Yao ‘82) with boosting (Schapire ‘90). In addition, we show that PAC-learning the class of \(DNF\) formulas is polynomially equivalent to PAC-learning its dual class \(DNF^*\), and thus PAC-learning \(DNF\) is equivalent to RRHS-refutation of \(DNF\), suggesting an avenue to obtain stronger lower bounds for PAC-learning \(DNF\) than the quasipolynomial lower bound that was obtained by Daniely and Shalev-Shwartz (COLT 2016) assuming the hardness of refuting \(k\)-SAT.
COLT 17.pdf
