Publications

2024
Barak, Boaz, Yael Kalai, Ran Raz, Salil Vadhan, and Nisheeth Vishnoi. “On the Works of Avi Widgerson.” In The Abel Prize 2018-2022 (eds. Helge Holden and Ragni Piene). 2023rd ed. Springer, Cham, 2024. Publisher's VersionAbstract ArXiv 2023.pdf
Doron, Dean, Jack Murtagh, Salil Vadhan, and David Zuckerman. “Small-space spectral sparsification via bounded-independence sampling.” ACM Transactions on Computation Theory (2024). Publisher's VersionAbstract
Version History:
NB: Some earlier versions published as "Spectral sparsification via bounded-independence sampling".

We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph \(G\) on \(n\) vertices described by a binary string of length \(N\), an integer \(k \leq \log n \) and an error parameter \(\varepsilon > 0\), our algorithm runs in space \(\tilde{O}(k \log(N ^. w_{max}/w_{min}))\) where \(w_{max}\) and \(w_{min}\) are the maximum and minimum edge weights in \(G\), and produces a weighted graph \(H\) with \(\tilde{O}(n^{1+2/k} / \varepsilon^2)\)expected edges that spectrally approximates \(G\), in the sense of Spielmen and Teng [ST04], up to an error of \(\varepsilon\).

Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective resistance based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by \(k\) above, and the resulting sparsity that can be achieved.

ECCC 2020.pdf ICALP 2020.pdf ACM 2023.pdf
2023
Casacuberta, Sílvia, Cynthia Dwork, and Salil Vadhan. “Complexity-theoretic implications of multicalibration” (2023). ArXiv VersionAbstract

We present connections between the recent literature on multigroup fairness for prediction algorithms and classical results in computational complexity. Multiaccurate predictors are correct in expectation on each member of an arbitrary collection of pre-specified sets. Multicalibrated predictors satisfy a stronger condition: they are calibrated on each set in the collection.

Multiaccuracy is equivalent to a regularity notion for functions defined by Trevisan, Tulsiani, and Vadhan (2009). They showed that, given a class F of (possibly simple) functions, an arbitrarily complex function g can be approximated by a low-complexity function h that makes a small number of oracle calls to members of F, where the notion of approximation requires that h cannot be distinguished from g by members of F. This complexity-theoretic Regularity Lemma is known to have implications in different areas, including in complexity theory, additive number theory, information theory, graph theory, and cryptography. Starting from the stronger notion of multicalibration, we obtain stronger and more general versions of a number of applications of the Regularity Lemma, including the Hardcore Lemma, the Dense Model Theorem, and the equivalence of conditional pseudo-min-entropy and unpredictability. For example, we show that every boolean function (regardless of its hardness) has a small collection of disjoint hardcore sets, where the sizes of those hardcore sets are related to how balanced the function is on corresponding pieces of an efficient partition of the domain.

ComplexityTheoretic ArXiv 2023.pdf
Olagoke, Lukman, Salil Vadhan, and Seth Neel. “Black-box training data identification in GANs via detector networks.” October (2023). ArXiv VersionAbstract
Since their inception Generative Adversarial Networks (GANs) have been popular generative models across images, audio, video, and tabular data. In this paper we study whether given access to a trained GAN, as well as fresh samples from the underlying distribution, if it is possible for an attacker to efficiently identify if a given point is a member of the GAN's training data. This is of interest for both reasons related to copyright, where a user may want to determine if their copyrighted data has been used to train a GAN, and in the study of data privacy, where the ability to detect training set membership is known as a membership inference attack. Unlike the majority of prior work this paper investigates the privacy implications of using GANs in black-box settings, where the attack only has access to samples from the generator, rather than access to the discriminator as well. We introduce a suite of membership inference attacks against GANs in the black-box setting and evaluate our attacks on image GANs trained on the CIFAR10 dataset and tabular GANs trained on genomic data. Our most successful attack, called The Detector, involve training a second network to score samples based on their likelihood of being generated by the GAN, as opposed to a fresh sample from the distribution. We prove under a simple model of the generator that the detector is an approximately optimal membership inference attack. Across a wide range of tabular and image datasets, attacks, and GAN architectures, we find that adversaries can orchestrate non-trivial privacy attacks when provided with access to samples from the generator. At the same time, the attack success achievable against GANs still appears to be lower compared to other generative and discriminative models; this leaves the intriguing open question of whether GANs are in fact more private, or if it is a matter of developing stronger attacks.
BlackBox - ArXiv 2023.pdf
Haney, Samuel, Michael Shoemate, Grace Tian, Salil P. Vadhan, Andrew Vyrros, Vicki Xu, and Wanrong Zhang. “Concurrent composition for interactive differential privacy with adaptive privacy-loss parameters.” Weizhi Meng, Christian Damsgaard Jensen, Cas Cremers, and Engin Kirda, eds., Proceedings of the 30th ACM SIGAC Conference on Computer and Communications Security (CCS '23). ACM, 2023. Publisher's VersionAbstract

In this paper, we study the concurrent composition of interactive mechanisms with adaptively chosen privacy-loss parameters. In this setting, the adversary can interleave queries to existing interac tive mechanisms, as well as create new ones. We prove that every valid privacy filter and odometer for noninteractive mechanisms extends to the concurrent composition of interactive mechanisms if privacy loss is measured using (ε, δ)-DP, f -DP, or Rényi DP of fixed order. Our results offer strong theoretical foundations for enabling full adaptivity in composing differentially private interactive mechanisms, showing that concurrency does not affect the privacy guarantees. We also provide an implementation for users to deploy
in practice.

 

 

History: Received a CCS '23 Distinguished Paper Award. Preliminary version presented as a poster at TPDP '23 and posted as: https://arxiv.org/abs/2309.05901.

CCS 2023.pdf
Lee, Chin Ho, Edward Pyne, and Salil Vadhan. “On the power of regular and permutation branching programs.Nicole Megow and Adam D. Smith, eds., RANDOM '23: Proceedings of the International Conference on Randomization and Computation. Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl Leibniz-Zentrum für Informatik, 2023. Posted As:Abstract

We give new upper and lower bounds on the power of several restricted classes of arbitrary-order read-once branching programs (ROBPs) and standard-order ROBPs (SOBPs) that have received significant attention in the literature on pseudorandomness for space-bounded computation.

Regular SOBPs of length n and width w(n+1)2  can exactly simulate general SOBPs of length n and width w, and moreover an n2−o(n)  blow-up in width is necessary for such a simulation.

Our result extends and simplifies prior average-case simulations (Reingold, Trevisan, and Vadhan (STOC 2006), Bogdanov, Hoza, Prakriya, and Pyne (CCC 2022)), in particular implying that weighted pseudorandom generators (Braverman, Cohen, and Garg (SICOMP 2020)) for regular SOBPs of width poly(n) or larger automatically extend to general SOBPs. Furthermore, our simulation also extends to general (even read-many) oblivious branching programs.

There exist natural functions computable by regular SOBPs of constant width that are average-case hard for permutation SOBPs of exponential width. Indeed, we show that Inner-Product mod 2 is average-case hard for arbitrary-order permutation ROBPs of exponential width.

There exist functions computable by constant-width arbitrary-order permutation ROBPs that are worst-case hard for exponential-width SOBPs.

Read-twice permutation branching programs of subexponential width can simulate polynomial-width arbitrary-order ROBPs.

ECCC 23.pdf
Savi, Merveille Koissi, Akash Yadav, Wanrong Zhang, Navin Vembar, Andrew Schroeder, Satchit Balsari, Caroline O. Buckee, Salil Vadhan, and Nishant Kishore. “A standardised differential privacy framework for epidemiological modelling with mobile phone data.” PLOS Digital Health (2023). Publisher's VersionAbstract

During the COVID-19 pandemic, the use of mobile phone data for monitoring human mobility patterns has become increasingly common, both to study the impact of travel restrictions on population movement and epidemiological modelling. Despite the importance of these data, the use of location information to guide public policy can raise issues of privacy and ethical use. Studies have shown that simple aggregation does not protect the privacy of an individual, and there are no universal standards for aggregation that guarantee anonymity. Newer methods, such as differential privacy, can provide statistically verifiable protection against identifiability but have been largely untested as inputs for compartment models used in infectious disease epidemiology. Our study examines the application of differential privacy as an anonymisation tool in epidemiological models, studying the impact of adding quantifiable statistical noise to mobile phone-based location data on the bias of ten common epidemiological metrics. We find that many epidemiological metrics are preserved and remain close to their non-private values when the true noise state is less than 20, in a count transition matrix, which corresponds to a privacy-less parameter ∈ = 0.05 per release. We show that differential privacy offers a robust approach to preserving individual privacy in mobility data while providing useful population-level insights for public health. Importantly, we have built a modular software pipeline to facilitate the replication and expansion of our framework.

MedArXiv version: https://www.medrxiv.org/content/10.1101/2023.03.16.23287382v1

MedArXiv 2023.pdf PLOS 2023.pdf
Ahmadinejad, AmirMahdi, John Peebles, Edward Pyne, and Aaron Sidford. “Singular value approximation and sparsifying random walks on directed graphs.” 64th IEEE Annual Symposium on Foundations of Computer Science (FOCS '23). IEEE, 2023. Publisher's VersionAbstract

In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs (Spielman Teng STOC 2004), standard approximation for directed graphs (Cohen et. al. STOC 2017), and unit-circle approximation for directed graphs (Ahmadinejad et. al. FOCS 2020). Further, SV approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of random-walk matrices and bounded matrices.


We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as ℓ-step random walks on such graphs, for any ℓ≤poly(n). Combined with the Eulerian scaling algorithms of (Cohen et. al. FOCS 2018), given an arbitrary (not necessarily Eulerian) directed graph and a set S of vertices, we can approximate the stationary probability mass of the (S,Sc) cut in an ℓ-step random walk to within a multiplicative error of 1/polylog(n) and an additive error of 1/poly(n) in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation.

Preliminary version posted as: "Singular value approximation and reducing directed to undirected graph sparsification": https://arxiv.org/abs/2301.13541

arxiv_2023.pdf FOCS 2023.pdf
Sarathy, Jayshree, Sophia Song, Audrey Haque, Tania Schlatter, and Salil Vadhan. “Don't look at the data! How differential privacy reconfigures practices of data science.CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, 1-19. Publisher's VersionAbstract
Version History: 
Also presented as a poster at the 8th Workshop on the Theory and Practice of Differential Privacy (TPDP '22) – July 2022.
 
ArXiv 2023.pdf CHI 2023.pdf
Vadhan, Salil, and Wanrong Zhang. “Concurrent composition theorems for differential privacy.” STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing. STOC, 2023. Publisher's VersionAbstract

We study the concurrent composition properties of interactive differentially private mechanisms, whereby an adversary can arbitrarily interleave its queries to the different mechanisms. We prove that all composition theorems for non-interactive differentially private mechanisms extend to the concurrent composition of interactive differentially private mechanisms, whenever differential privacy is measured using the hypothesis testing framework of f-DP, which captures standard (,δ)-DP as a special case. We prove the concurrent composition theorem by showing that every interactive f-DP mechanism can be simulated by interactive post-processing of a non-interactive f-DP mechanism.

In concurrent and independent work, Lyu (NeurIPS ‘22) proves a similar result to ours for (,δ)-DP, as well as a concurrent composition theorem for Rényi DP. We also provide a simple proof of Lyu’s concurrent composition theorem for Rényi DP. Lyu leaves the general case of f-DP as an open problem, which we solve in this paper.

Version History: Preliminary versions published as: https://arxiv.org/abs/2207.08335 and presented as a poster at TPDP '22.

ArXiv 2022.pdf STOC 2023.pdf
Awan, Jordan, and Salil Vadhan. “Canonical noise and private hypothesis tests.Annals of Statistics 51, no. 2 (2023): 547-572. Publisher's VersionAbstract
Version/ Presentation History: Presented at CCS '21 workshop on Privacy-Preserving Machine Learning (PPML '21) and NeurIPS '21 workshop on Privacy in Machine Learning (PriML '21), November 2021. Paper posted as https://arxiv.org/abs/2108.04303
ArXiv 2021.pdf AOS 2023.pdf
Alabi, Daniel, and Salil Vadhan. “Differentially private hypothesis testing for linear regression.” Journal of Machine Learning Research 24, no. 361 (2023): 1-50. Publisher's VersionAbstract

Version History: Preliminary versions in NeurIPS '22, posted as arXiv:2206.14449 and presented at TPDP ‘21 (poster), IMS ‘22 (oral), and SEA ‘22 (oral). (Previously published as "Hypothesis testing for differentially private linear regression".

 

Abstract:

In this work, we design differentially private hypothesis tests for the following problems in the general linear model: testing a linear relationship and testing for the presence of mixtures. The majority of our hypothesis tests are based on differentially private versions of the $F$-statistic for the general linear model framework, which are uniformly most powerful unbiased in the non-private setting. We also present other tests for these problems, one of which is based on the differentially private nonparametric tests of Couch, Kazan, Shi, Bray, and Groce (CCS 2019), which is especially suited for the small dataset regime. We show that the differentially private $F$-statistic converges to the asymptotic distribution of its non-private counterpart. As a corollary, the statistical power of the differentially private $F$-statistic converges to the statistical power of the non-private $F$-statistic. Through a suite of Monte Carlo based experiments, we show that our tests achieve desired significance levels and have a high power that approaches the power of the non-private tests as we increase sample sizes or the privacy-loss parameter. We also show when our tests outperform existing methods in the literature.

ArXiv 2022.pdf NeurIPS 2022.pdf JMLR 2023.pdf
2022
Alabi, Daniel, Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan. “Differentially private simple linear regression.” Proceedings on 23rd Privacy Enhancing Technologies Symposium (PoPETS ‘22), 2022, 2022 (2), 184–204. Publisher's VersionAbstract

Version History: Preliminary version posted as arXiv:2007.05157 [cs.LG] and presented as a poster at TPDP ‘20.

Abstract: Forthcoming.

PoPETS 2022.pdf
Casacuberta, Sílvia, Michael Shoemate, Salil Vadhan, and Connor Wagaman. “Widespread underestimation of sensitivity in differentially private libraries and how to fix it.” Proceedings of the 29th ACM SIGSAC Conference on Computer and Communications Security (CCS '22), 2022. Publisher's VersionAbstract
Version History: Preprint version posted as arXiv:2207.10635 [cs.CR].
ArXiv 2022.pdf CCS 2022.pdf
Lee, Chin Ho, Edward Pyne, and Salil Vadhan. “Fourier growth of regular branching programs.” Proceedings of the International Conference on Randomization and Computation (RANDOM '22). Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl Leibniz-Zentrum für Informatik, 2022. Publisher's VersionAbstract

Version History: Preliminary version posted as https://eccc.weizmann.ac.il/report/2022/034/.

Abstract: Forthcoming.

RANDOM 22.pdf
Pyne, Edward, and Salil Vadhan. “Deterministic approximation of random walks via queries in graphs of unbounded size.Proceedings of the SIAM Symposium on Simplicity in Algorithms (SOSA '22), 2022, 57-67. Publisher's VersionAbstract

Version History: Preliminary version posted as arXiv:2111.01997 [cs.CC].

Abstract: Forthcoming.

SOSA 22.pdf
Gong, Roubin, Erica L. Groshen, and Salil Vadhan, ed.Special Issue on Differential Privacy for the 2020 U.S. Census: Can we Make Data Both Private and Useful?Harvard Data Science Review , 2022, Special Issue, 2. Publisher's VersionAbstract
Note: Prof. Vadhan is also the author of the editor's foreword to this issue.
Golowich, Louis, and Salil Vadhan. “Pseudorandomness of expander random walks for symmetric functions and permutation branching programs.” Proceedings of the 37th Computational Complexity Conference (CCC '22) . Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl Leibniz-Zentrum für Informatik, 2022. Publisher's VersionAbstract

Version History: Full version (25 June 2022): https://eccc.weizmann.ac.il/report/2022/024/

Abstract: We study the pseudorandomness of random walks on expander graphs against tests computed by symmetric functions and permutation branching programs. These questions are motivated by applications of expander walks in the coding theory and derandomization literatures. A line of prior work has shown that random walks on expanders with second largest eigenvalue λ fool symmetric functions up to a O(λ) error in total variation distance, but only for the case where the vertices are labeled with symbols from a binary alphabet, and with a suboptimal dependence on the bias of the labeling. We generalize these results to labelings with an arbitrary alphabet, and for the case of binary labelings we achieve an optimal dependence on the labeling bias. We extend our analysis to unify it with and strengthen the expander-walk Chernoff bound. We then show that expander walks fool permutation branching programs up to a O(λ) error in ℓ2-distance, and we prove that much stronger bounds hold for programs with a certain structure. We also prove lower bounds to show that our results are tight. To prove our results for symmetric functions, we analyze the Fourier coefficients of the relevant distributions using linear-algebraic techniques. Our analysis for permutation branching programs is likewise linear-algebraic in nature, but also makes use of the recently introduced singular-value approximation notion for matrices (Ahmadinejad et al. 2021).

CCC 2022.pdf
2021
Pyne, Edward, and Salil Vadhan. “Limitations of the Impagliazzo–Nisan–Wigderson pseudorandom generator against permutation branching programs.Proceedings of the 27th International Computing and Combinatorics Conference (COCOON '21). Lecture Notes in Computer Science (Springer), 2021. Publisher's VersionAbstract

Version History: Preliminary version posted as ECCC TR21-108. Invited to Algorithmica Special Issue on COCOON '21.

Abstract: Forthcoming

Sarathy, Jayshree, and Salil Vadhan. “Analyzing the differentially private Theil-Sen estimator for simple linear regression.” Poster at the 7th Workshop on the Theory and Practice of Differential Privacy (TPDP '21) – July 2021, 2021. ArXiv Version ArXiv 2022.pdf

Pages