Characterizing the Distinguishability of Product Distributions through Multicalibration

Publication information:

Marcussen, Cassandra, Aaron Putterman, and Salil P. Vadhan. “Characterizing the Distinguishability of Product Distributions through Multicalibration”. In 40th Computational Complexity Conference (CCC ’25). Toronto, Canada: Srikanth Srinivasan, ed., Proceedings of the 40th Computational Complexity Conference (CCC ’25), vol. 330 of LIPIcs, pages 19:1-19:19. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2025. https://doi.org/10.4230/LIPIcs.CCC.2025.19.

Abstract

Version History:

Full version posted as arXiv:2412.03562 [cs.CR].
Earlier ArXiv versions include February 2024 (v1); February 2025 (v2); March 2025 (v3), and July 2025 (v4).

Abstract: Given a sequence of samples x₁,...,x_k promised to be drawn from one of two distributions X₀, X₁, a well-studied problem in statistics is to decide which distribution the samples are from. Information theoretically, the maximum advantage in distinguishing the two distributions given k samples is captured by the total variation distance between X₀^⊗k and X₁^⊗k. However, when we restrict our attention to efficient distinguishers (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish X₀^⊗k and X₁^⊗k is more involved and less understood.

In this work, we give a general way to reduce bounds on the computational indistinguishability of X0 and X1 to bounds on the information-theoretic indistinguishability of some specific, related variables e X0 and e X1. As a consequence, we prove a new, tight characterization of the number of samples k needed to efficiently distinguish X⊗k 0 and X⊗k 1 with constant advantage as k =Θ d−2 H e X0, e X1 , which is the inverse of the squared Hellinger distance dH between two distributions e X0 and e X1 that are computationally indistinguishable from X0 and X1. Likewise, our framework can be used to re-derive a result of Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions. At the heart of our work is the use of the Multicalibration Theorem (H´ebert-Johnson, Kim, Reingold, Rothblum 2018) in a way inspired by recent work of Casacuberta, Dwork, and Vadhan (STOC 2024). Multicalibration allows us to relate the computational indistinguishability of X0,X1 to the statistical indistinguishability of e X0, e X1 (for lower bounds on k) and construct explicit circuits to distinguish between e X0, e X1 and consequently X0,X1 (for upper bounds on k).