Publications by Year: 2017

2017
Wood, Alexandra, Micah Altman, Suso Baleato, and Salil Vadhan. Comments on the City of Seattle Open Data Risk Assessment, 2017. Publisher's VersionAbstract
The transparency goals of the open data movement serve important social, economic, and democratic functions in cities like Seattle. At the same time, some municipal datasets about the city and its citizens’ activities carry inherent risks to individual privacy when shared publicly. In 2016, the City of Seattle declared in its Open Data Policy that the city’s data would be “open by preference,” except when doing so may affect individual privacy. To ensure its Open Data program effectively protects individuals, Seattle committed to performing an annual risk assessment and tasked the Future of Privacy Forum (FPF) with creating and deploying an initial privacy risk assessment methodology for open data.This Draft Report provides tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacyprotective open data programs. Although there is a growing body of research on open data privacy, open data managers and departmental data owners need to be able to employ a standardized methodology for assessing the privacy risks and benefits of particular datasets internally, without a bevy of expert statisticians, privacy lawyers, or philosophers. By following a flexible, risk-based assessment process, the City of Seattle – and other municipal open data programs – can maximize the utility and openness of civic data while minimizing privacy risks to individuals and community concerns about ethical challenges, fairness, and equity.This Draft Report first describes inherent privacy risks in an open data landscape, with an emphasis on potential harms related to re-identification, data quality, and fairness. Accompanying this, the Draft Report includes a Model Open Data Benefit Risk Analysis (MODBRA). The model template evaluates the types of data contained in a proposed open dataset, the potential benefits – and concomitant risks – of releasing the dataset publicly, and strategies for effective de-identification and risk mitigation. This holistic assessment guides city officials to determine whether to release the dataset openly, in a limited access environment, or to withhold it from publication (absent countervailing public policy considerations). The Draft Report methodology builds on extensive work done in this field by experts at the National Institute of Standards and Technology, the University of Washington, the Berkman Klein Center for Internet & Society at Harvard University, and others, and adapts existing frameworks to the unique challenges faced by cities as local governments, technological system integrators, and consumer facing service providers.
Haitner, Iftach, and Salil Vadhan. “The Many Entropies in One-way Functions.” In Tutorials on the Foundations of Cryptography, 159-217. Springer, Yehuda Lindell, ed. 2017. Publisher's VersionAbstract

Version History:

Earlier versions: May 2017: ECCC TR 17-084

Dec. 2017: ECCC TR 17-084 (revised)

Computational analogues of information-theoretic notions have given rise to some of the most interesting phenomena in the theory of computation. For example, computational indistinguishability, Goldwasser and Micali [9], which is the computational analogue of statistical distance, enabled the bypassing of Shannon’s impossibility results on perfectly secure encryption, and provided the basis for the computational theory of pseudorandomness. Pseudoentropy, Håstad, Impagliazzo, Levin, and Luby [17], a computational analogue of entropy, was the key to the fundamental result establishing the equivalence of pseudorandom generators and one-way functions, and has become a basic concept in complexity theory and cryptography.

This tutorial discusses two rather recent computational notions of entropy, both of which can be easily found in any one-way function, the most basic cryptographic primitive. The first notion is next-block pseudoentropy, Haitner, Reingold, and Vadhan [14], a refinement of pseudoentropy that enables simpler and more ecient construction of pseudorandom generators. The second is inaccessible entropy, Haitner, Reingold, Vadhan, andWee [11], which relates to unforgeability and is used to construct simpler and more efficient universal one-way hash functions and statistically hiding commitments.

Vadhan, Salil. “The Complexity of Differential Privacy.” In Tutorials on the Foundations of Cryptography, 347-450. Springer, Yehuda Lindell, ed. 2017. Publisher's VersionAbstract

Version History:

August 2016: Manuscript v1 (see files attached)

March 2017: Manuscript v2 (see files attached); Errata

April 2017: Published Version (in Tutorials on the Foundations of Cryptography; see Publisher's Version link and also SPRINGER 2017.PDF, below)

Differential privacy is a theoretical framework for ensuring the privacy of individual-level data when performing statistical analysis of privacy-sensitive datasets. This tutorial provides an introduction to and overview of differential privacy, with the goal of conveying its deep connections to a variety of other topics in computational complexity, cryptography, and theoretical computer science at large. This tutorial is written in celebration of Oded Goldreich’s 60th birthday, starting from notes taken during a minicourse given by the author and Kunal Talwar at the 26th McGill Invitational Workshop on Computational Complexity [1].

Steinke, Thomas, Salil Vadhan, and Andrew Wan. “Pseudorandomness and Fourier growth bounds for width 3 branching programs.” Theory of Computing – Special Issue on APPROX-RANDOM 2014 13, no. 12 (2017): 1-50. Publisher's VersionAbstract

Version History: a conference version of this paper appeared in the Proceedings of the 18th International Workshop on Randomization and Computation (RANDOM'14). Full version posted as ECCC TR14-076 and arXiv:1405.7028 [cs.CC].

We present an explicit pseudorandom generator for oblivious, read-once, width-3 branching programs, which can read their input bits in any order. The generator has seed length $$Õ(\log^3 n)$$.The previously best known seed length for this model is $$n^{1/2+o(1)}$$ due to Impagliazzo, Meka, and Zuckerman (FOCS ’12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM ’13) for permutation branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any $$f : \{0, 1\}^n → \{0, 1\}$$ computed by such a branching program, and $$k ∈ [n]$$,

$$\displaystyle\sum_{s⊆[n]:|s|=k} \big| \hat{f}[s] \big | ≤n^2 ·(O(\log n))^k$$,

where $$\hat{f}[s] = \mathbb{E}_U [f[U] \cdot (-1)^{s \cdot U}]$$ is the standard Fourier transform over $$\mathbb{Z}^n_2$$. The base $$O(\log n)$$ of the Fourier growth is tight up to a factor of $$\log \log n$$.

Nissim, Kobbi, Aaron Bembenek, Alexandra Wood, Mark Bun, Marco Gaboardi, Urs Gasser, David O'Brien, Thomas Steinke, and Salil Vadhan. “Bridging the gap between computer science and legal approaches to privacy.” Harvard Journal of Law & Technology 31, no. 2 (2017). Publisher's VersionAbstract

Version History: Workshopped at PLSC (Privacy Law Scholars Conference) ‘16.

The analysis and release of statistical data about individuals and groups of individuals carries inherent privacy risks, and these risks have been conceptualized in different ways within the fields of law and computer science. For instance, many information privacy laws adopt notions of privacy risk that are sector- or context-specific, such as in the case of laws that protect from disclosure certain types of information contained within health, educational, or financial records. In addition, many privacy laws refer to specific techniques, such as deidentification, that are designed to address a subset of possible attacks on privacy. In doing so, many legal standards for privacy protection rely on individual organizations to make case-by-case determinations regarding concepts such as the identifiability of the types of information they hold. These regulatory approaches are intended to be flexible, allowing organizations to (1) implement a variety of specific privacy measures that are appropriate given their varying institutional policies and needs, (2) adapt to evolving best practices, and (3) address a range of privacy-related harms. However, in the absence of clear thresholds and detailed guidance on making case-specific determinations, flexibility in the interpretation and application of such standards also creates uncertainty for practitioners and often results in ad hoc, heuristic processes. This uncertainty may pose a barrier to the adoption of new technologies that depend on unambiguous privacy requirements. It can also lead organizations to implement measures that fall short of protecting against the full range of data privacy risks.

Vadhan., Salil P.On learning vs. refutation.” 30th Conference on Learning Theory (COLT 17), 2017, 65, 1835-1848. Publisher's VersionAbstract
Building on the work of Daniely et al. (STOC 2014, COLT 2016), we study the connection between computationally efficient PAC learning and refutation of constraint satisfaction problems. Specifically, we prove that for every concept class $$\mathcal{P }$$ , PAC-learning $$\mathcal{P}$$ is polynomially equivalent to “random-right-hand-side-refuting” (“RRHS-refuting”) a dual class $$\mathcal{P}^∗$$, where RRHS-refutation of a class $$Q$$ refers to refuting systems of equations where the constraints are (worst-case) functions from the class $$Q$$ but the right-hand-sides of the equations are uniform and independent random bits. The reduction from refutation to PAC learning can be viewed as an abstraction of (part of) the work of Daniely, Linial, and Shalev-Schwartz (STOC 2014). The converse, however, is new, and is based on a combination of techniques from pseudorandomness (Yao ‘82) with boosting (Schapire ‘90). In addition, we show that PAC-learning the class of $$DNF$$ formulas is polynomially equivalent to PAC-learning its dual class $$DNF ^∗$$ , and thus PAC-learning $$DNF$$ is equivalent to RRHS-refutation of $$DNF$$ , suggesting an avenue to obtain stronger lower bounds for PAC-learning $$DNF$$ than the quasipolynomial lower bound that was obtained by Daniely and Shalev-Schwartz (COLT 2016) assuming the hardness of refuting $$k$$-SAT.
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan. “Derandomization beyond connectivity: Undirected Laplacian systems in nearly logarithmic space.58th Annual IEEE Symposium on Foundations of Computer Science (FOCS 17), 2017. Publisher's VersionAbstract
Version History
ArXiv, 15 August 2017 https://arxiv.org/abs/1708.04634

We give a deterministic $$\overline{O} (\log n)$$-space algorithm for approximately solving linear systems given by Laplacians of undirected graphs, and consequently also approximating hitting times, commute times, and escape probabilities for undirected graphs. Previously, such systems were known to be solvable by randomized algorithms using $$O(\log n)$$ space (Doron, Le Gall, and Ta-Shma, 2017) and hence by deterministic algorithms using $$O(\log^{3/2} n)$$  space (Saks and Zhou, FOCS 1995 and JCSS 1999).

Our algorithm combines ideas from time-efficient Laplacian solvers (Spielman and Teng, STOC '04; Peng and Spielman, STOC '14) with ideas used to show that Undirected S-T Connectivity is in deterministic logspace (Reingold, STOC '05 and JACM '08; Rozenman and Vadhan, RANDOM '05).
Chen, Yi-Hsiu, Kai-Min Chung, Ching-Yi Lai, Salil P. Vadhan, and Xiaodi Wu.Computational notions of quantum min-entropy.” In Poster presention at QIP 2017 and oral presentation at QCrypt 2017, 2017. Publisher's VersionAbstract

Version History

ArXiv v1, 24 April 2017 https://arxiv.org/abs/1704.07309v1
ArXiv v2, 25 April 2017 https://arxiv.org/abs/1704.07309v2
ArXiv v3, 9 September 2017 https://arxiv.org/abs/1704.07309v3
ArXiv v4, 5 October 2017 https://arxiv.org/abs/1704.07309v4

We initiate the study of computational entropy in the quantum setting. We investigate to what extent the classical notions of computational entropy generalize to the quantum setting, and whether quantum analogues of classical theorems hold. Our main results are as follows. (1) The classical Leakage Chain Rule for pseudoentropy can be extended to the case that the leakage information is quantum (while the source remains classical). Specifically, if the source has pseudoentropy at least $$k$$, then it has pseudoentropy at least $$k−ℓ$$ conditioned on an $$ℓ$$-qubit leakage. (2) As an application of the Leakage Chain Rule, we construct the first quantum leakage-resilient stream-cipher in the bounded-quantum-storage model, assuming the existence of a quantum-secure pseudorandom generator. (3) We show that the general form of the classical Dense Model Theorem (interpreted as the equivalence between two definitions of pseudo-relative-min-entropy) does not extend to quantum states. Along the way, we develop quantum analogues of some classical techniques (e.g. the Leakage Simulation Lemma, which is proven by a Non-uniform Min-Max Theorem or Boosting). On the other hand, we also identify some classical techniques (e.g. Gap Amplification) that do not work in the quantum setting. Moreover, we introduce a variety of notions that combine quantum information and quantum complexity, and this raises several directions for future work.