JournalsmslVol. 3, No. 3/4pp. 259–313

The all-or-nothing phenomenon in sparse linear regression

  • Galen Reeves

    Duke University, Durham, USA
  • Jiaming Xu

    Duke University, Durham, USA
  • Ilias Zadik

    Massachusetts Institute of Technology, Cambridge, USA
The all-or-nothing phenomenon in sparse linear regression cover
Download PDF

A subscription is required to access this article.


We study the problem of recovering a hidden binary kk-sparse pp-dimensional vector β\beta from nn noisy linear observations Y=Xβ+WY=X\beta+W, where XijX_{ij} are i.i.d. N(0,1)\mathcal{N}(0,1) and WiW_i are i.i.d. N(0,σ2)\mathcal{N}(0,\sigma^2). A closely related hypothesis testing problem is to distinguish the pair (X,Y)(X,Y) generated from this structured model from a corresponding null model where (X,Y)(X,Y) consist of purely independent Gaussian entries. In the low sparsity k=o(p)k=o(\sqrt{p}) and high signal-to-noise ratio k/σ2k/\sigma^2 \to \infty regime, we establish an “all-or-nothing” information-theoretic phase transition at a critical sample size n=2klog(p/k)/log(1+k/σ2)n^{\ast}=2 k\log (p/k) /\log (1+k/\sigma^2), resolving a conjecture of Gamarnik and Zadik (2017). Specifically, we show that if lim infpn/n>1\liminf_{p\to \infty} n/n^{\ast}>1, then the maximum likelihood estimator almost perfectly recovers the hidden vector with high probability and moreover the true hypothesis can be detected with a vanishing error probability. Conversely, if lim suppn/n<1\limsup_{p\to \infty} n/n^{\ast}<1, then it becomes information-theoretically impossible even to recover an arbitrarily small but fixed fraction of the hidden vector support, or to test hypotheses strictly better than random guess.

Our proof of the impossibility result builds upon two key techniques, which could be of independent interest. First, we use a conditional second moment method to upper bound the Kullback–Leibler (KL) divergence between the structured and the null model. Second, inspired by the celebrated area theorem, we establish a lower bound to the minimum mean squared estimation error of the hidden vector in terms of the KL divergence between the two models.

An extended abstract version of this work appeared at the Proceedings of the Conference on Learning Theory (COLT), 2019.

Cite this article

Galen Reeves, Jiaming Xu, Ilias Zadik, The all-or-nothing phenomenon in sparse linear regression. Math. Stat. Learn. 3 (2020), no. 3, pp. 259–313

DOI 10.4171/MSL/22