**Version History: **Original version released as a Working Paper for the May 2020 OpenDP Community Meeting (version attached as MAY 2020.pdf, and accessible online at https://projects.iq.harvard.edu/files/opendp/files/opendp_programming_framework_11may2020_1_01.pdf).

**Talks: **View a talk on this paper presented by Marco Gaboardi and Michael Hay at the 2020 OpenDP Community Meeting.

Subsequently presented as a poster at TPDP 2020 (attached as TPDP2020.pdf).

In this working paper, we propose a programming framework for the library of differentially private algorithms that will be at the core of the OpenDP open-source software project, and recommend programming languages in which to implement the framework.

%B 6th Workshop on the Theory and Practice of Differential Privacy (TPDP 2020) %G eng %U https://projects.iq.harvard.edu/files/opendp/files/opendp_programming_framework_11may2020_1_01.pdf %0 Manuscript %D 2020 %T The OpenDP White Paper %A Team OpenDP %X- View a talk on this paper presented at the 2020 OpenDP Community Meeting
- View a talk on this paper presented at TPDP 2020

OpenDP is a community effort to build a trustworthy suite of open-source tools for enabling privacy-protective analysis of sensitive personal data, focused on a library of algorithms for generating differentially private statistical releases. The target use cases for OpenDP are to enable government, industry, and academic institutions to safely and confidently share sensitive data to support scientifically oriented research and exploration in the public interest. We aim for OpenDP to flexibly grow with the rapidly advancing science of differential privacy, and be a pathway to bring the newest algorithmic developments to a wide array of practitioners.

OpenDP is led by Faculty Directors Gary King and Salil Vadhan and an Executive Committee at Harvard University, funded in part by a grant from the Sloan Foundation. Its efforts so far have included implementing a differentially private curator application in collaboration with Microsoft, and developing a framework for a community-driven OpenDP Commons through the work of an Ad Hoc Design Committee including external experts. Going forward, the project plans to engage with a wide community of stakeholders, establish partnerships with a wide variety of groups from industry, academia, and government, and adopt increasing levels of community governance.

%G eng %U https://projects.iq.harvard.edu/files/opendp/files/opendp_white_paper_11may2020.pdf %0 Journal Article %J arXiv: 2010.05586 [cs.CR] %D 2020 %T Inaccessible entropy I: Inaccessible entropy generators and statistically hiding commitments from one-way functions %A Iftach Haitner %A Omer Reingold %A Salil Vadhan %A Hoeteck Wee %X **Version History: **Full version of part of an STOC 2009 paper.

We put forth a new computational notion of entropy, measuring the (in)feasibility of sampling high-entropy strings that are consistent with a given generator. Specifically, the \(i\)'th output block of a generator \(\mathsf{G}\) has* accessible entropy* at most \(k\) if the following holds: when conditioning on its prior coin tosses, no polynomial-time strategy \(\mathsf{\widetilde{G}}\) can generate valid output for \(\mathsf{G}\)'s \(i \)'th output block with entropy greater than \(k\). A generator has *inaccessible entropy* if the total accessible entropy (summed over the blocks) is noticeably smaller than the real entropy of \(\mathsf{G}\)'s output.

As an application of the above notion, we improve upon the result of Haitner, Nguyen, Ong, Reingold, and Vadhan [Sicomp '09], presenting a much simpler and more efficient construction of statistically hiding commitment schemes from arbitrary one-way functions.

%B arXiv: 2010.05586 [cs.CR] %G eng %U https://arxiv.org/abs/2010.05586 %0 Journal Article %J arXiv: 2007.05157 [cs.LG] %D 2020 %T Differentially private simple linear regression %A Daniel Alabi %A Audra McMillan %A Jayshree Sarathy %A Adam Smith %A Salil Vadhan %X Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases. %B arXiv: 2007.05157 [cs.LG] %G eng %U https://arxiv.org/abs/2007.05157 %0 Conference Paper %B 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020) %D 2020 %T Spectral sparsification via bounded-independence sampling %A Dean Doron %A Jack Murtagh %A Salil Vadhan %A David Zuckerman %X- v1, 26 Feb 2020: https://arxiv.org/abs/2002.11237
- Full version published as ECCC TR20-026.
- View a YouTube recording of John Peebles' talk on this paper, recorded at FOCS 2020.

We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph \(G\) on \(n\) vertices described by a binary string of length \(N\), an integer \(k \leq \log n \) and an error parameter \(\varepsilon > 0\), our algorithm runs in space \(\tilde{O}(k \log(N ^. w_{max}/w_{min}))\) where \(w_{max}\) and \(w_{min}\) are the maximum and minimum edge weights in \(G\), and produces a weighted graph \(H\) with \(\tilde{O}(n^{1+2/k} / \varepsilon^2)\)expected edges that spectrally approximates \(G\), in the sense of Spielmen and Teng [ST04], up to an error of \(\varepsilon\).

Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective resistance based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by \(k\) above, and the resulting sparsity that can be achieved.

%B 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020) %I Leibniz International Proceedings in Informatics (LIPIcs), Schloss-Dagstuhl-Leibniz-Zentrum für Informatik %V 168 %P 39:1-39:21 %G eng %U https://drops.dagstuhl.de/opus/volltexte/2020/12446/ %0 Conference Proceedings %B 61st Annual IEEE Symposium on the Foundations of Computer Science (FOCS 2020) %D 2020 %T High-precision estimation of random walks in small space %A AmirMahdi Ahmadinejad %A Kelner, Jonathan %A Jack Murtagh %A John Peebles %A Aaron Sidford %A Salil Vadhan %X arXiv version (2019): http://arxiv.org/abs/1912.04524

Published version (2020): https://ieeexplore.ieee.org/document/9317933

In this paper, we provide a deterministic \(\tilde{O}(\log N)\)-space algorithm for estimating the random walk probabilities on Eulerian directed graphs (and thus also undirected graphs) to within inverse polynomial additive error \((ϵ = 1/\mathrm{poly}(N)) \) where \(N\) is the length of the input. Previously, this problem was known to be solvable by a randomized algorithm using space \(O (\log N)\) (Aleliunas et al., FOCS '79) and by a deterministic algorithm using space \(O (\log^{3/2} N)\) (Saks and Zhou, FOCS '95 and JCSS '99), both of which held for arbitrary directed graphs but had not been improved even for undirected graphs. We also give improvements on the space complexity of both of these previous algorithms for non-Eulerian directed graphs when the error is negligible \((ϵ=1/N^{ω(1)})\), generalizing what Hoza and Zuckerman (FOCS '18) recently showed for the special case of distinguishing whether a random walk probability is 0 or greater than ϵ.

We achieve these results by giving new reductions between powering Eulerian random-walk matrices and inverting Eulerian Laplacian matrices, providing a new notion of spectral approximation for Eulerian graphs that is preserved under powering, and giving the first deterministic \(\tilde{O}(\log N)\)-space algorithm for inverting Eulerian Laplacian matrices. The latter algorithm builds on the work of Murtagh et al. (FOCS '17) that gave a deterministic \(\tilde{O}(\log N)\)-space algorithm for inverting undirected Laplacian matrices, and the work of Cohen et al. (FOCS '19) that gave a randomized \(\tilde{O} (N)\)-time algorithm for inverting Eulerian Laplacian matrices. A running theme throughout these contributions is an analysis of "cycle-lifted graphs," where we take a graph and "lift" it to a new graph whose adjacency matrix is the tensor product of the original adjacency matrix and a directed cycle (or variants of one).

**Version History: **published earlier in Henri Gilbert, ed., Advances in Cryptology—EUROCRYPT ‘10, Lecture Notes on Computer Science, as "Universal one-way hash functions via inaccessible entropy":

https://link.springer.com/chapter/10.1007/978-3-642-13190-5_31

This paper revisits the construction of Universal One-Way Hash Functions (UOWHFs) from any one-way function due to Rompel (STOC 1990). We give a simpler construction of UOWHFs, which also obtains better efficiency and security. The construction exploits a strong connection to the recently introduced notion of inaccessible entropy (Haitner et al. STOC 2009). With this perspective, we observe that a small tweak of any one-way function \(f\) is already a weak form of a UOWHF: Consider \(F(x', i)\) that outputs the \(i\)-bit long prefix of \(f(x)\). If \(F\) were a UOWHF then given a random \(x\) and \(i\) it would be hard to come up with \(x' \neq x\) such that \(F(x, i) = F(x', i)\). While this may not be the case, we show (rather easily) that it is hard to sample \(x'\) with almost full entropy among all the possible such values of \(x'\). The rest of our construction simply amplifies and exploits this basic property.

With this and other recent works, we have that the constructions of three fundamental cryptographic primitives (Pseudorandom Generators, Statistically Hiding Commitments and UOWHFs) out of one-way functions are to a large extent unified. In particular, all three constructions rely on and manipulate computational notions of entropy in similar ways. Pseudorandom Generators rely on the well-established notion of pseudoentropy, whereas Statistically Hiding Commitments and UOWHFs rely on the newer notion of inaccessible entropy.

%B Theory of Computing %I Springer-Verlag %V 16 %P 1-55 %G eng %U http://theoryofcomputing.org/articles/v016a008/ %N 8 %0 Journal Article %J Journal of Cryptology %D 2020 %T PCPs and the hardness of generating synthetic data %A Jon Ullman %A Salil Vadhan %X **Version History**: Full version posted as ECCC TR10-017.

Published earlier in Yuval Ishai, ed., Proceedings of the 8th IACR Theory of Cryptography Conference (TCC ‘11), Lecture Notes on Computer Science. Springer-Verlag, Publishers: Vol. 5978, pp. 572-587. https://link.springer.com/chapter/10.1007/978-3-642-19571-6_24

Invited to *J. Cryptology* selected papers from TCC 2011.

Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm \(\mathcal{A}\) that takes a database \(D ∈ ({0, 1}^d)^n\) and outputs a “synthetic database” \(\hat{D}\) all of whose two-way marginals are approximately equal to those of \(D\). (A two-way marginal is the fraction of database rows \(x ∈ {0, 1}^d\) with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ‘07), who gave an algorithm running in time \(\mathrm{poly} (n, 2^d)\).

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC ‘09) with encodings based on probabilistically checkable proofs.

We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in \(D\) satisfying a predicate \(c\) are estimated by applying \(c\) to each row of \(\hat{D}\) and aggregating the results in some way.

%B Journal of Cryptology %I Springer-Verlag %V 33 %P 2078-2112 %G eng %U https://link.springer.com/article/10.1007/s00145-020-09363-y %0 Journal Article %J ACM Transactions on Economics and Computation %D 2020 %T Privacy games %A Yiling Chen %A Or Sheffet %A Salil Vadhan %X **Version History: **

Previously published as: Yiling Chen, Or Sheffet, and Salil Vadhan. Privacy games. In Proceedings of the 10th International Conference on Web and Internet Economics (WINE ‘14), volume 8877 of *Lecture Notes in Computer Science*, pages 371–385. Springer-Verlag, 14–17 December 2014. (WINE Publisher's Version linked here: https://link.springer.com/chapter/10.1007/978-3-319-13129-0_30); PDF attached as WINE2014.

The problem of analyzing the effect of privacy concerns on the behavior of selfish utility-maximizing agents has received much attention lately. Privacy concerns are often modeled by altering the utility functions of agents to consider also their privacy loss. Such privacy aware agents prefer to take a randomized strategy even in very simple games in which non-privacy aware agents play pure strategies. In some cases, the behavior of privacy aware agents follows the framework of Randomized Response, a well-known mechanism that preserves differential privacy.

Our work is aimed at better understanding the behavior of agents in settings where their privacy concerns are explicitly given. We consider a toy setting where agent *A*, in an attempt to discover the secret type of agent *B*, offers *B* a gift that one type of *B* agent likes and the other type dislikes. As opposed to previous works, *B*'s incentive to keep her type a secret isn't the result of "hardwiring" *B*'s utility function to consider privacy, but rather takes the form of a payment between *B* and *A*. We investigate three different types of payment functions and analyze *B*'s behavior in each of the resulting games. As we show, under some payments, *B*'s behavior is very different than the behavior of agents with hardwired privacy concerns and might even be deterministic. Under a different payment we show that *B*'s BNE strategy does fall into the framework of Randomized Response.