Publications

2019
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan. “Deterministic Approximation of Random Walks in Small Space.” arXiv, 2019, 1903.06361 [cs.CR]. ArXiv Version. Abstract:

We give a deterministic, nearly logarithmic-space algorithm that, given an undirected graph \(G\), a positive integer \(r\), and a set \(S\) of vertices, approximates the conductance of \(S\) in the \(r\)-step random walk on \(G\) to within a factor of \(1+ϵ\), where \(ϵ > 0\) is an arbitrarily small constant. More generally, our algorithm computes an \(ϵ\)-spectral approximation to the normalized Laplacian of the \(r\)-step walk. Our algorithm combines the derandomized square graph operation (Rozenman and Vadhan, 2005), which we recently used for solving Laplacian systems in nearly logarithmic space (Murtagh, Reingold, Sidford, and Vadhan, 2017), with ideas from (Cheng, Cheng, Liu, Peng, and Teng, 2015), which gave an algorithm that is time-efficient (while ours is space-efficient) and randomized (while ours is deterministic) for the case of even \(r\) (while ours works for all \(r\)). Along the way, we provide some new results that generalize technical machinery and yield improvements over previous work. First, we obtain a nearly linear-time randomized algorithm for computing a spectral approximation to the normalized Laplacian for odd \(r\). Second, we define and analyze a generalization of the derandomized square for irregular graphs and for sparsifying the product of two distinct graphs. As part of this generalization, we also give a strongly explicit construction of expander graphs of every size.

ArXiv 2019.pdf
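As a concrete illustration of the object being approximated (a plain-NumPy sketch of the definitions only, not the paper's logarithmic-space algorithm; the graph, vertex set, and walk length are hypothetical examples):

```python
import numpy as np

def rstep_normalized_laplacian(A, r):
    """I - (D^{-1/2} A D^{-1/2})^r for an undirected graph with adjacency matrix A."""
    d = A.sum(axis=1)
    M = A / np.sqrt(np.outer(d, d))  # normalized adjacency D^{-1/2} A D^{-1/2}
    return np.eye(len(A)) - np.linalg.matrix_power(M, r)

# Example: a 6-cycle (2-regular), walk length r = 3.
n, r = 6, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L_r = rstep_normalized_laplacian(A, r)

# For a regular graph, the conductance of S in the r-step walk -- the probability
# that an r-step walk started at a uniformly random vertex of S ends outside S --
# equals the Laplacian quadratic form at the indicator of S, divided by |S|.
chi_S = np.array([1, 1, 1, 0, 0, 0], dtype=float)
print(chi_S @ L_r @ chi_S / chi_S.sum())

# An eps-spectral approximation L' satisfies
# (1 - eps) x^T L_r x <= x^T L' x <= (1 + eps) x^T L_r x for all x,
# and therefore approximates every such conductance within a 1 + eps factor.
```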
Agrawal, Rohit, Yi-Hsiu Chen, Thibaut Horel, and Salil Vadhan. “Unifying computational entropies via Kullback-Leibler divergence.” arXiv, 2019, 1902.11202 [cs.CR]. ArXiv Version. Abstract:
We introduce KL-hardness, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, KL-hardness unifies the latter two notions of computational entropy and sheds light on the apparent "duality" between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC '12).
ArXiv 2019.pdf
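For reference (a standard definition, not a statement from the paper): the Kullback-Leibler divergence between distributions \(p\) and \(q\) over a discrete domain \(X\) is

\[ \mathrm{D}_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}, \]

the expected log-likelihood ratio under \(p\); this is the quantity in terms of which the paper's notion of KL-hardness is phrased.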
2018
Raghunathan, Ananth, Gil Segev, and Salil P. Vadhan. “Deterministic public-key encryption for adaptively-chosen plaintext distributions.” Journal of Cryptology 31, no. 4 (2018): 1012-1063. EUROCRYPT Version. Abstract:

Version History: Preliminary versions in EUROCRYPT ‘13 and Cryptology ePrint report 2013/125.

Bellare, Boldyreva, and O’Neill (CRYPTO ’07) initiated the study of deterministic public-key encryption as an alternative in scenarios where randomized encryption has inherent drawbacks. The resulting line of research has so far guaranteed security only for adversarially-chosen plaintext distributions that are independent of the public key used by the scheme. In most scenarios, however, it is typically not realistic to assume that adversaries do not take the public key into account when attacking a scheme.

We show that it is possible to guarantee meaningful security even for plaintext distributions that depend on the public key. We extend the previously proposed notions of security, allowing adversaries to adaptively choose plaintext distributions after seeing the public key, in an interactive manner. The only restrictions we make are that: (1) plaintext distributions are unpredictable (as is essential in deterministic public-key encryption), and (2) the number of plaintext distributions from which each adversary is allowed to adaptively choose is upper bounded by \(2^p\), where \(p\) can be any predetermined polynomial in the security parameter. For example, with \(p=0\) we capture plaintext distributions that are independent of the public key, and with \(p=O(s \log s)\) we capture, in particular, all plaintext distributions that are samplable by circuits of size \(s\).

Within our framework we present both constructions in the random-oracle model based on any public-key encryption scheme, and constructions in the standard model based on lossy trapdoor functions (thus, based on a variety of number-theoretic assumptions). Previously known constructions heavily relied on the independence between the plaintext distributions and the public key for the purposes of randomness extraction. In our setting, however, randomness extraction becomes significantly more challenging once the plaintext distributions and the public key are no longer independent. Our approach is inspired by research on randomness extraction from seed-dependent distributions. Underlying our approach is a new generalization of a method for such randomness extraction, originally introduced by Trevisan and Vadhan (FOCS ’00) and Dodis (PhD Thesis, MIT, ’00).

JCRYPTOL2018.pdf EUROCRYPT2013.pdf IACR2013.pdf
Bun, Mark, Jonathan Ullman, and Salil Vadhan. “Fingerprinting codes and the price of approximate differential privacy.” SIAM Journal on Computing 47, no. 5 (2018): 1888-1938. Publisher's Version. Abstract:

Version History: Special Issue on STOC ‘14. Preliminary versions in STOC ‘14 and arXiv:1311.3158 [cs.CR].

We show new lower bounds on the sample complexity of \((\epsilon, \delta)\)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database \(D \in (\{0,1\}^d)^n\) has the form "What fraction of the individual records in the database satisfy the property \(q\)?" We show that in order to answer an arbitrary set \(Q\) of \(\gg nd\) counting queries on \(D\) to within error \(\pm\alpha\) it is necessary that

\(n \geq \tilde{\Omega}\left( \sqrt{d}\, \log |Q| \,/\, \alpha^2 \epsilon \right).\)

This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS '10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and \((\epsilon, \delta)\)-differential privacy is asymptotically larger than what is required merely for accuracy, which is \(O(\log |Q| / \alpha^2)\). In addition, we show that our lower bound holds for the specific case of \(k\)-way marginal queries (where \(|Q| = 2^k \binom{d}{k}\)) when \(\alpha\) is not too small compared to \(d\) (e.g., when \(\alpha\) is any fixed constant).

Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO '95; Tardos, STOC '03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.

ArXiV2018.pdf STOC2014.pdf SIAM2018.pdf
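As a concrete reading of the bound above (our instantiation, using only the quantities defined in the abstract): for \(k\)-way marginals with \(k\) and \(\alpha\) constant, \(\log |Q| = \log\left(2^k \binom{d}{k}\right) = \Theta(\log d)\), so privacy forces

\[ n \;\geq\; \tilde{\Omega}\!\left(\frac{\sqrt{d}\, \log |Q|}{\alpha^2 \epsilon}\right) = \tilde{\Omega}\!\left(\frac{\sqrt{d}}{\epsilon}\right), \qquad \text{while accuracy alone needs only } O\!\left(\frac{\log |Q|}{\alpha^2}\right) = O(\log d), \]

a polynomial-versus-logarithmic gap in the dimension \(d\).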
Karwa, Vishesh, and Salil Vadhan. “Finite sample differentially private confidence intervals.” In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), 44:1-44:9. Dagstuhl, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Publisher's Version. Abstract:

Version History: Also presented at TPDP 2017. Preliminary version posted as arXiv:1711.03908 [cs.CR].

We study the problem of estimating finite sample confidence intervals of the mean of a normal population under the constraint of differential privacy. We consider both the known and unknown variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee a finite sample coverage, as opposed to an asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set showing that our parameters are optimal up to polylogarithmic factors.

ITCS2018.pdf ArXiv2017.pdf
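To make the setting concrete, here is a minimal sketch of the general recipe (privatize a statistic, then widen the interval to preserve finite-sample coverage). It is not the paper's algorithm: it assumes a known \(\sigma\) and an a priori range \([lo, hi]\) for clamping, exactly the kind of boundedness assumption the paper removes, and it ignores clamping bias.

```python
import numpy as np
from scipy.stats import norm

def dp_mean_ci(x, lo, hi, sigma, eps, alpha=0.05):
    """(1 - alpha) confidence interval for a normal mean, eps-differentially private.
    Clamping to [lo, hi] bounds the sensitivity of the sample mean by (hi - lo)/n."""
    n = len(x)
    xbar = np.clip(x, lo, hi).mean()
    scale = (hi - lo) / (n * eps)                  # Laplace scale = sensitivity / eps
    xbar_dp = xbar + np.random.laplace(0.0, scale)
    # Finite-sample coverage by a union bound: alpha/2 miscoverage from sampling
    # error, alpha/2 from the noise (P(|Laplace(b)| > b*log(2/a)) = a).
    width = norm.ppf(1 - alpha / 4) * sigma / np.sqrt(n) + scale * np.log(2 / alpha)
    return xbar_dp - width, xbar_dp + width
```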
Balcer, Victor, and Salil Vadhan. “Differential privacy on finite computers.” In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), 43:1-43:21. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018. Publisher's Version. Abstract:

Version History: Also presented at TPDP 2017. Invited to J. Privacy & Confidentiality Special Issue on TPDP 2017. Preliminary version posted as arXiv:1709.05396 [cs.DS].

We consider the problem of designing and analyzing differentially private algorithms that can be implemented on discrete models of computation in strict polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe \(\chi\). The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundararajan, STOC 2009 and SICOMP 2012), take time at least linear in \(|\chi|\), which can be exponential in the bit length of the input.

In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its “per-bin accuracy” (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of \(\log |\chi|\), but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an \((n + 1)\)-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.

ITCS2018.pdf ArXiv2018.pdf
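A conceptual sketch of the clamped Geometric Mechanism the abstract builds on (floating-point NumPy sampling for illustration only; the paper's point is precisely to achieve this with strict polynomial-time discrete arithmetic):

```python
import numpy as np

def clamped_geometric_count(count, eps, upper):
    """Release count + Z clamped to [0, upper], where Z is two-sided geometric
    with ratio e^{-eps}: the difference of two iid geometric variables has
    exactly this 'discrete Laplace' distribution."""
    p = 1.0 - np.exp(-eps)
    z = np.random.geometric(p) - np.random.geometric(p)
    return int(np.clip(count + z, 0, upper))

def dp_histogram(counts, eps, n):
    """Noisy histogram: adding/removing one record changes a single bin by 1,
    so per-bin noise with ratio e^{-eps} gives eps-differential privacy, and
    clamping to [0, n] is post-processing, costing nothing in privacy."""
    return [clamped_geometric_count(c, eps, n) for c in counts]
```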
Murtagh, Jack, Kathryn Taylor, George Kellaris, and Salil P. Vadhan. “Usable differential privacy: A case study with PSI.” arXiv, 2018, 1809.04103 [cs.CR]. ArXiv Version. Abstract:

Differential privacy is a promising framework for addressing the privacy concerns in sharing sensitive datasets for others to analyze. However, differential privacy is a highly technical area and current deployments often require experts to write code, tune parameters, and optimize the trade-off between the privacy and accuracy of statistical releases. For differential privacy to achieve its potential for wide impact, it is important to design usable systems that enable differential privacy to be used by ordinary data owners and analysts. PSI is a tool that was designed for this purpose, allowing researchers to release useful differentially private statistical information about their datasets without being experts in computer science, statistics, or privacy. We conducted a thorough usability study of PSI to test whether it accomplishes its goal of usability by non-experts. The usability test illuminated which features of PSI are most user-friendly and prompted us to improve aspects of the tool that caused confusion. The test also highlighted some general principles and lessons for designing usable systems for differential privacy, which we discuss in depth.

ArXiv2018.pdf
Chen, Yi-Hsiu, Mika Göös, Salil P. Vadhan, and Jiapeng Zhang. “A tight lower bound for entropy flattening.” In 33rd Computational Complexity Conference (CCC 2018), volume 102 of Leibniz International Proceedings in Informatics (LIPIcs), 23:1-23:28. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Publisher's Version. Abstract:

Version History: Preliminary version posted as ECCC TR18-119.

We study entropy flattening: Given a circuit \(C_X\) implicitly describing an \(n\)-bit source \(X\) (namely, \(X\) is the output of \(C_X\) on a uniform random input), construct another circuit \(C_Y\) describing a source \(Y\) such that (1) source \(Y\) is nearly flat (uniform on its support), and (2) the Shannon entropy of \(Y\) is monotonically related to that of \(X\). The standard solution is to have \(C_Y\) evaluate \(C_X\) altogether \(\Theta(n^2)\) times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among black-box constructions: Any circuit \(C_Y\) for entropy flattening that repeatedly queries \(C_X\) as an oracle requires \(\Omega(n^2)\) queries.

Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [12, 22, 13, 6, 11, 10, 7, 24]. It is also used in reductions between problems complete for statistical zero-knowledge [19, 23, 4, 25]. The \(\Theta(n^2)\) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency.

CCC 2018.pdf
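The "standard solution" described in the abstract, as a sketch (the circuit \(C_X\) is modeled as a hypothetical Python callable on uniform random bits):

```python
import numpy as np

def flatten(C_X, seed_len, n):
    """Entropy flattening by repetition: query C_X on Theta(n^2) independent
    uniform seeds and concatenate; by the asymptotic equipartition property
    the concatenated source is nearly flat, with Shannon entropy n^2 * H(X)."""
    t = n * n  # the Theta(n^2) query count that the paper proves is optimal
    rng = np.random.default_rng()
    samples = [C_X(rng.integers(0, 2, size=seed_len)) for _ in range(t)]
    return np.concatenate(samples)
```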
Wood, Alexandra, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, James Honaker, Kobbi Nissim, David R. O'Brien, Thomas Steinke, and Salil Vadhan. “Differential privacy: A primer for a non-technical audience.” Vanderbilt Journal of Entertainment & Technology Law 21, no. 1 (2018): 209-275. Publisher's Version. Abstract:

Version History: Preliminary version workshopped at PLSC 2017.

Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.

This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets.

This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.

JETlaw.pdf
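A worked statement of the composition concept mentioned above (the standard basic composition bound; tighter variants exist and are discussed in the differential privacy literature): if \(k\) analyses are run on the same data, the \(i\)-th satisfying \((\epsilon_i, \delta_i)\)-differential privacy, then the combination satisfies

\[ \left( \sum_{i=1}^{k} \epsilon_i,\; \sum_{i=1}^{k} \delta_i \right)\text{-differential privacy,} \]

which is why a total "privacy budget" can be split across a series of analyses.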
2017
Nissim, Kobbi, Aaron Bembenek, Alexandra Wood, Mark Bun, Marco Gaboardi, Urs Gasser, David O'Brien, Thomas Steinke, and Salil Vadhan. “Bridging the gap between computer science and legal approaches to privacy.” Harvard Journal of Law & Technology (2017). Publisher's Version. Abstract:

Version History: Workshopped at PLSC (Privacy Law Scholars Conference) ‘16.

The analysis and release of statistical data about individuals and groups of individuals carries inherent privacy risks, and these risks have been conceptualized in different ways within the fields of law and computer science. For instance, many information privacy laws adopt notions of privacy risk that are sector- or context-specific, such as in the case of laws that protect from disclosure certain types of information contained within health, educational, or financial records. In addition, many privacy laws refer to specific techniques, such as deidentification, that are designed to address a subset of possible attacks on privacy. In doing so, many legal standards for privacy protection rely on individual organizations to make case-by-case determinations regarding concepts such as the identifiability of the types of information they hold. These regulatory approaches are intended to be flexible, allowing organizations to (1) implement a variety of specific privacy measures that are appropriate given their varying institutional policies and needs, (2) adapt to evolving best practices, and (3) address a range of privacy-related harms. However, in the absence of clear thresholds and detailed guidance on making case-specific determinations, flexibility in the interpretation and application of such standards also creates uncertainty for practitioners and often results in ad hoc, heuristic processes. This uncertainty may pose a barrier to the adoption of new technologies that depend on unambiguous privacy requirements. It can also lead organizations to implement measures that fall short of protecting against the full range of data privacy risks.

Harvard JLT 18.pdf
Vadhan, Salil P. “On learning vs. refutation.” In Proceedings of the 30th Conference on Learning Theory (COLT ‘17), 2017, 65:1835-1848. Publisher's Version. Abstract:

Building on the work of Daniely et al. (STOC 2014, COLT 2016), we study the connection between computationally efficient PAC learning and refutation of constraint satisfaction problems. Specifically, we prove that for every concept class \(P\), PAC-learning \(P\) is polynomially equivalent to “random-right-hand-side-refuting” (“RRHS-refuting”) a dual class \(P^*\), where RRHS-refutation of a class \(Q\) refers to refuting systems of equations where the constraints are (worst-case) functions from the class \(Q\) but the right-hand sides of the equations are uniform and independent random bits. The reduction from refutation to PAC learning can be viewed as an abstraction of (part of) the work of Daniely, Linial, and Shalev-Schwartz (STOC 2014). The converse, however, is new, and is based on a combination of techniques from pseudorandomness (Yao ‘82) with boosting (Schapire ‘90). In addition, we show that PAC-learning the class of \(DNF\) formulas is polynomially equivalent to PAC-learning its dual class \(DNF^*\), and thus PAC-learning \(DNF\) is equivalent to RRHS-refutation of \(DNF\), suggesting an avenue to obtain stronger lower bounds for PAC-learning \(DNF\) than the quasipolynomial lower bound that was obtained by Daniely and Shalev-Schwartz (COLT 2016) assuming the hardness of refuting \(k\)-SAT.
on_learning_vs_refutation.pdf
Murtagh, Jack, Omer Reingold, Aaron Sidford, and Salil Vadhan. “Derandomization beyond connectivity: Undirected Laplacian systems in nearly logarithmic space.” In 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS ‘17), 2017. Publisher's Version. Abstract:

We give a deterministic \(\tilde{O}(\log n)\)-space algorithm for approximately solving linear systems given by Laplacians of undirected graphs, and consequently also approximating hitting times, commute times, and escape probabilities for undirected graphs. Previously, such systems were known to be solvable by randomized algorithms using \(O(\log n)\) space (Doron, Le Gall, and Ta-Shma, 2017) and hence by deterministic algorithms using \(O(\log^{3/2} n)\) space (Saks and Zhou, FOCS 1995 and JCSS 1999).

Our algorithm combines ideas from time-efficient Laplacian solvers (Spielman and Teng, STOC ‘04; Peng and Spielman, STOC ‘14) with ideas used to show that Undirected S-T Connectivity is in deterministic logspace (Reingold, STOC ‘05 and JACM ‘08; Rozenman and Vadhan, RANDOM ‘05).
1708.04634.pdf
Chen, Yi-Hsiu, Kai-Min Chung, Ching-Yi Lai, Salil P. Vadhan, and Xiaodi Wu. “Computational notions of quantum min-entropy.” Poster presentation at QIP 2017 and oral presentation at QCrypt 2017, 2017. Publisher's Version. Abstract:

We initiate the study of computational entropy in the quantum setting. We investigate to what extent the classical notions of computational entropy generalize to the quantum setting, and whether quantum analogues of classical theorems hold. Our main results are as follows. (1) The classical Leakage Chain Rule for pseudoentropy can be extended to the case that the leakage information is quantum (while the source remains classical). Specifically, if the source has pseudoentropy at least \(k\), then it has pseudoentropy at least \(k - \ell\) conditioned on an \(\ell\)-qubit leakage. (2) As an application of the Leakage Chain Rule, we construct the first quantum leakage-resilient stream cipher in the bounded-quantum-storage model, assuming the existence of a quantum-secure pseudorandom generator. (3) We show that the general form of the classical Dense Model Theorem (interpreted as the equivalence between two definitions of pseudo-relative-min-entropy) does not extend to quantum states. Along the way, we develop quantum analogues of some classical techniques (e.g., the Leakage Simulation Lemma, which is proven by a Non-uniform Min-Max Theorem or Boosting). On the other hand, we also identify some classical techniques (e.g., Gap Amplification) that do not work in the quantum setting. Moreover, we introduce a variety of notions that combine quantum information and quantum complexity, and this raises several directions for future work.
1704.07309.pdf
2016
Chen, Yiling, Stephen Chong, Ian A. Kash, Tal Moran, and Salil P. Vadhan. “Truthful mechanisms for agents that value privacy.” ACM Transactions on Economics and Computation 4, no. 3 (2016). ArXiv Version. Abstract:

Version History: Special issue on EC ‘13. Preliminary version at arXiv:1111.5472 [cs.GT] (Nov. 2011).

Recent work has constructed economic mechanisms that are both truthful and differentially private. In these mechanisms, privacy is treated separately from truthfulness; it is not incorporated in players’ utility functions (and doing so has been shown to lead to nontruthfulness in some cases). In this work, we propose a new, general way of modeling privacy in players’ utility functions. Specifically, we only assume that if an outcome \({o}\) has the property that any report of player \({i}\) would have led to \({o}\) with approximately the same probability, then \({o}\) has a small privacy cost to player \({i}\). We give three mechanisms that are truthful with respect to our modeling of privacy: for an election between two candidates, for a discrete version of the facility location problem, and for a general social choice problem with discrete utilities (via a VCG-like mechanism). As the number \({n}\) of players increases, the social welfare achieved by our mechanisms approaches optimal (as a fraction of \({n}\)).

ACM2016.pdf ArXiv2012.pdf
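For intuition about the privacy condition in the abstract (an outcome has small privacy cost if any single report would have led to it with approximately the same probability), here is the standard differentially private noisy-majority election for two candidates; this is a hypothetical illustration, not the paper's truthful mechanism.

```python
import numpy as np

def noisy_majority(votes, eps):
    """votes: array of 0/1 reports for the two candidates. Flipping one report
    moves the margin by at most 2, so Laplace(2/eps) noise on the margin makes
    the winner eps-differentially private: any individual report leads to each
    outcome with approximately the same probability."""
    margin = 2 * int(np.sum(votes)) - len(votes)
    return 1 if margin + np.random.laplace(0.0, 2.0 / eps) > 0 else 0
```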
Murtagh, Jack, and Salil Vadhan. “The complexity of computing the optimal composition of differential privacy.” In Eyal Kushilevitz and Tal Malkin, editors, Proceedings of the 13th IACR Theory of Cryptography Conference (TCC ‘16-A), 9562:157-175. Lecture Notes in Computer Science, Springer-Verlag, 2016. Publisher's Version. Abstract:

Version History: Full version posted on CoRR, abs/1507.03113, July 2015.

In the study of differential privacy, composition theorems (starting with the original paper of Dwork, McSherry, Nissim, and Smith (TCC ‘06)) bound the degradation of privacy when composing several differentially private algorithms. Kairouz, Oh, and Viswanath (ICML ‘15) showed how to compute the optimal bound for composing \(k\) arbitrary \((\epsilon, \delta)\)-differentially private algorithms. We characterize the optimal composition for the more general case of \(k\) arbitrary \((\epsilon_1, \delta_1), \ldots, (\epsilon_k, \delta_k)\)-differentially private algorithms where the privacy parameters may differ for each algorithm in the composition. We show that computing the optimal composition in general is \(\#P\)-complete. Since computing optimal composition exactly is infeasible (unless \(FP = \#P\)), we give an approximation algorithm that computes the composition to arbitrary accuracy in polynomial time. The algorithm is a modification of Dyer’s dynamic programming approach to approximately counting solutions to knapsack problems (STOC ‘03).

ArXiv2016.pdf THEORY_COMP_2018.pdf
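For context on what "optimal composition" improves (a sketch of the two standard bounds, not the paper's \(\#P\)-hard exact computation or its approximation algorithm; the advanced bound follows the Dwork-Rothblum-Vadhan form):

```python
import numpy as np

def basic_composition(eps, delta, k):
    """k-fold composition of one (eps, delta)-DP mechanism: parameters add up."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_slack):
    """(eps', k*delta + delta_slack)-DP for any slack delta_slack > 0, with
    eps' = sqrt(2k ln(1/delta_slack)) * eps + k * eps * (e^eps - 1)."""
    eps_prime = np.sqrt(2 * k * np.log(1 / delta_slack)) * eps \
                + k * eps * (np.exp(eps) - 1)
    return eps_prime, k * delta + delta_slack

print(basic_composition(0.1, 1e-6, 100))           # ~ (10.0, 1e-4)
print(advanced_composition(0.1, 1e-6, 100, 1e-6))  # eps' ~ 6.3, well below 10.0
```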
Nissim, Kobbi, Uri Stemmer, and Salil Vadhan. “Locating a small cluster privately.” In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS ‘16), 413-427. ACM, 2016. Publisher's Version. Abstract:

Version History: Full version posted as arXiv:1604.05590 [cs.DS].

We present a new algorithm for locating a small cluster of points with differential privacy [Dwork, McSherry, Nissim, and Smith, 2006]. Our algorithm has implications to private data exploration, clustering, and removal of outliers. Furthermore, we use it to significantly relax the requirements of the sample and aggregate technique [Nissim, Raskhodnikova, and Smith, 2007], which allows compiling of “off the shelf” (non-private) analyses into analyses that preserve differential privacy.

ArXiv2018.pdf
Gaboardi, Marco, Hyun Woo Lim, Ryan Rogers, and Salil Vadhan. “Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing.” In M. Balcan and K. Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning (ICML ‘16), 2111-2120, 2016. ArXiv Version. Abstract:

Version History: Preliminary version posted as arXiv:1602.03090.

Hypothesis testing is a useful statistical tool in determining whether a given model should be rejected based on a sample from the population. Sample data may contain sensitive information about individuals, such as medical information. Thus it is important to design statistical tests that guarantee the privacy of subjects in the data. In this work, we study hypothesis testing subject to differential privacy, specifically chi-squared tests for goodness of fit for multinomial data and independence between two categorical variables.

We propose new tests for goodness of fit and independence testing that, like the classical versions, can be used to determine whether a given model should be rejected or not, and that additionally can ensure differential privacy. We give both Monte Carlo-based hypothesis tests as well as hypothesis tests that more closely follow the classical chi-squared goodness of fit test and the Pearson chi-squared test for independence. Crucially, our tests account for the distribution of the noise that is injected to ensure privacy in determining significance.

We show that these tests can be used to achieve desired significance levels, in sharp contrast to direct applications of classical tests to differentially private contingency tables which can result in wildly varying significance levels. Moreover, we study the statistical power of these tests. We empirically show that to achieve the same level of power as the classical non-private tests our new tests need only a relatively modest increase in sample size.
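A sketch in the spirit of the Monte Carlo tests described above (assumptions: Laplace noise on the contingency-table counts and the standard chi-squared statistic; the paper's tests and their calibration are more refined). The essential point from the abstract is that the reference distribution must be simulated with the noise included.

```python
import numpy as np

def noisy_chisq_stat(counts, probs, eps, rng):
    """Chi-squared statistic from Laplace-noised cell counts (numpy arrays).
    The L1 sensitivity of a histogram is 2 under replacing one record, so the
    noisy counts are eps-DP and the statistic is post-processing."""
    n = counts.sum()
    noisy = counts + rng.laplace(0.0, 2.0 / eps, size=len(counts))
    expected = n * probs
    return np.sum((noisy - expected) ** 2 / expected)

def dp_goodness_of_fit(counts, probs, eps, mc=10000, seed=0):
    """Monte Carlo p-value: the null distribution is simulated WITH fresh noise,
    so the test maintains its significance level under privacy."""
    rng = np.random.default_rng(seed)
    n = int(counts.sum())
    stat = noisy_chisq_stat(counts, probs, eps, rng)
    null = np.array([noisy_chisq_stat(rng.multinomial(n, probs), probs, eps, rng)
                     for _ in range(mc)])
    return stat, float(np.mean(null >= stat))
```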

Gaboardi, Marco, James Honaker, Gary King, Jack Murtagh, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. “PSI (Ψ): a private data-sharing interface.” Poster presentation at the 2nd Workshop on the Theory and Practice of Differential Privacy (TPDP ‘16), 2016. ArXiv Version. Abstract:

Version History: Paper posted as arXiv:1609.04340 [cs.CR].

We provide an overview of the design of PSI (“a Private data Sharing Interface”), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.

TPDP_POSTER.pdf ArXiv2018.pdf
