Publications by Year: 2014

Gopalan, Parikshit, Salil Vadhan, and Yuan Zhou. “Locally testable codes and Cayley graphs.” In In Moni Naor, editor, Innovations in Theoretical Computer Science (ITCS ‘14), 81-92. ACM, 2014. Publisher's VersionAbstract

Version History: Full version posted as

We give two new characterizations of (\(\mathbb{F}_2\)-linear) locally testable error-correcting codes in terms of Cayley graphs over \(\mathbb{F}^h_2\) :

  1. A locally testable code is equivalent to a Cayley graph over \(\mathbb{F}^h_2\) whose set of generators is significantly larger than \(h\) and has no short linear dependencies, but yields a shortest-path metric that embeds into \(\ell_1\) with constant distortion. This extends and gives a converse to a result of Khot and Naor (2006), which showed that codes with large dual distance imply Cayley graphs that have no low-distortion embeddings into \(\ell_1\).

  2. A locally testable code is equivalent to a Cayley graph over \(\mathbb{F}^h_2\) that has significantly more than \(h\) eigenvalues near 1, which have no short linear dependencies among them and which “explain” all of the large eigenvalues. This extends and gives a converse to a recent construction of Barak et al. (2012), which showed that locally testable codes imply Cayley graphs that are small-set expanders but have many large eigenvalues.

ITCS2014.pdf ArXiv2018.pdf
Nissim, Kobbi, Salil Vadhan, and David Xiao. “Redrawing the boundaries on purchasing data from privacy-sensitive individuals.” In Moni Naor, editor, Innovations in Theoretical Computer Science (ITCS ‘14), 411-422. ACM, 2014. Publisher's VersionAbstract

Version History: Full version posted as arXiv:1401.4092 [cs.GT].


We prove new positive and negative results concerning the existence of truthful and individually rational mechanisms for purchasing private data from individuals with unbounded and sensitive privacy preferences. We strengthen the impos- sibility results of Ghosh and Roth (EC 2011) by extending it to a much wider class of privacy valuations. In particular, these include privacy valuations that are based on \((\varepsilon, \delta)\)- differentially private mechanisms for non-zero \(\delta\), ones where the privacy costs are measured in a per-database manner (rather than taking the worst case), and ones that do not depend on the payments made to players (which might not be observable to an adversary).

To bypass this impossibility result, we study a natural special setting where individuals have monotonic privacy valuations, which captures common contexts where certain values for private data are expected to lead to higher valuations for privacy (e.g. having a particular disease). We give new mechanisms that are individually rational for all players with monotonic privacy valuations, truthful for all players whose privacy valuations are not too large, and accu- rate if there are not too many players with too-large privacy valuations. We also prove matching lower bounds showing that in some respects our mechanism cannot be improved significantly.

ITCS2014.pdf ArXiv2018.pdf
Wood, Alexandra, David O'Brien, Micah Altman, Alan Karr, Urs Gasser, Michael Bar-Sinai, Kobbi Nissim, Jonathan Ullman, Salil Vadhan, and Michael Wojcik. “Integrating approaches to privacy across the research lifecycle: Long-term longitudinal studies.” Berkman Center Research Publication No. 2014-12, 2014, July. Publisher's VersionAbstract

Version History: Available at SSRN:


On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data. 

This workshop report, the first in a series, provides an overview of the long-term longitudinal study use case. Long-term longitudinal studies collect, at multiple points over a long period of time, highly-specific and often sensitive data describing the health, socioeconomic, or behavioral characteristics of human subjects. The value of such studies lies in part in their ability to link a set of behaviors and changes to each individual, but these factors tend to make the combination of observable characteristics associated with each subject unique and potentially identifiable. 

Using the research information lifecycle as a framework, this report discusses the defining features of long-term longitudinal studies and the associated challenges for researchers tasked with collecting and analyzing such data while protecting the privacy of human subjects. It also describes the disclosure risks and common legal and technical approaches currently used to manage confidentiality in longitudinal data. Finally, it identifies urgent problems and areas for future research to advance the integration of various methods for preserving confidentiality in research data.