Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression

Kuchibhotla, Arun Kumar; Chakrabortty, Abhishek

doi:10.1093/imaiai/iaac012

Mathematics > Statistics Theory

arXiv:1804.02605 (math)

[Submitted on 8 Apr 2018 (v1), last revised 10 May 2022 (this version, v4)]

Abstract: Concentration inequalities form an essential toolkit in the study of high-dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much more general exponential-type (namely sub-Weibull) tail assumptions. These results extract a partly sub-Gaussian tail behavior in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm, the Generalized Bernstein-Orlicz norm, that typifies such tail behaviors.
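The sub-Weibull(α) condition is conventionally expressed through the Orlicz function ψ_α(x) = exp(x^α) − 1, with ‖X‖_{ψ_α} = inf{η > 0 : E ψ_α(|X|/η) ≤ 1}; for α < 1 this is only a quasi-norm. The following is a minimal sketch, assuming that standard definition, that computes the empirical analogue on a finite sample by bisection (the expectation is replaced by a sample mean). The function names are hypothetical, and the sketch covers only the ψ_α quasi-norm, not the paper's Generalized Bernstein-Orlicz norm.

```python
import math
from typing import Sequence


def psi(x: float, alpha: float) -> float:
    """Orlicz function psi_alpha(x) = exp(x**alpha) - 1 for x >= 0."""
    t = x ** alpha
    if t > 700.0:  # math.expm1 would overflow a double here; treat the value as +inf
        return math.inf
    return math.expm1(t)


def empirical_psi_norm(samples: Sequence[float], alpha: float, tol: float = 1e-10) -> float:
    """Smallest eta > 0 with mean(psi_alpha(|x_i| / eta)) <= 1, found by bisection.

    The sample mean replaces the expectation in ||X||_{psi_alpha}; for alpha < 1
    this is a quasi-norm rather than a norm.
    """
    abs_x = [abs(v) for v in samples]
    if max(abs_x) == 0.0:
        return 0.0

    def excess(eta: float) -> float:
        # Non-increasing in eta, so the feasible set {eta : excess(eta) <= 0} is [eta*, inf).
        return sum(psi(v / eta, alpha) for v in abs_x) / len(abs_x) - 1.0

    lo, hi = 0.0, max(abs_x)
    while excess(hi) > 0.0:  # enlarge the bracket until hi is feasible
        hi *= 2.0
    while hi - lo > tol * hi:  # invariant: lo is infeasible (or 0), hi is feasible
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi
```

For a single observation of magnitude 1 and α = 2, the constraint reads exp(1/η²) − 1 ≤ 1, so the function returns approximately 1/√(log 2) ≈ 1.2011. Bisection is used because the map η ↦ mean ψ_α(|x_i|/η) is monotone, which keeps the sketch free of any optimization library.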
We illustrate the usefulness of these inequalities through the analysis of four fundamental problems in HD statistics. In the first two problems, we study the rate of convergence of the sample covariance matrix in terms of the maximum elementwise norm and the maximum k-sub-matrix operator norm, which are key quantities of interest in bootstrap, HD covariance matrix estimation and HD inference. The third example concerns the restricted eigenvalue condition, required in HD linear regression, which we verify for all sub-Weibull random vectors through a unified analysis; in the process, we also prove a more general result related to restricted strong convexity. In the final example, we consider the Lasso estimator for linear regression and establish its rate of convergence under much weaker tail assumptions than usual (on the errors as well as the covariates), while also allowing for misspecified models and both fixed and random design. To our knowledge, these are the first such results for Lasso obtained in this generality. The common feature of all our results across the examples is that the convergence rates under most exponential-type tails match the usual ones under sub-Gaussian assumptions.
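The first of the four problems measures the error of the sample covariance matrix in the maximum elementwise norm, ‖Σ̂ − Σ‖_max = max_{j,k} |Σ̂_{jk} − Σ_{jk}|. Below is a minimal plain-Python sketch of how one might evaluate this quantity in a simulation. It assumes mean-zero observations, so that Σ̂ = (1/n) sum_i X_i X_iᵀ, and uses random-sign Weibull entries as an illustrative sub-Weibull distribution. All helper names are hypothetical, and the sketch only computes the error; it does not reproduce the paper's rates.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def symmetric_weibull_sample(n: int, p: int, shape: float, rng: random.Random) -> Matrix:
    """n x p matrix of i.i.d. entries s * W, with s a random sign and W ~ Weibull(scale=1, shape)."""
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return [[rng.choice((-1.0, 1.0)) * rng.weibullvariate(1.0, shape) for _ in range(p)]
            for _ in range(n)]


def population_cov(p: int, shape: float) -> Matrix:
    """Sigma = Gamma(1 + 2/shape) * I_p: independent symmetric coordinates, E W^2 = Gamma(1 + 2/shape)."""
    v = math.gamma(1.0 + 2.0 / shape)
    return [[v if j == k else 0.0 for k in range(p)] for j in range(p)]


def sample_second_moment(X: Sequence[Sequence[float]]) -> Matrix:
    """Sigma_hat = (1/n) * sum_i X_i X_i^T (no centering; assumes mean-zero rows)."""
    n, p = len(X), len(X[0])
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        for j in range(p):
            for k in range(p):
                S[j][k] += row[j] * row[k]
    return [[S[j][k] / n for k in range(p)] for j in range(p)]


def max_elementwise_dev(S_hat: Matrix, Sigma: Matrix) -> float:
    """||S_hat - Sigma||_max = max over (j, k) of |S_hat[j][k] - Sigma[j][k]|."""
    return max(abs(a - b) for ra, rb in zip(S_hat, Sigma) for a, b in zip(ra, rb))


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, shape = 2000, 20, 0.5  # shape < 1: tails heavier than sub-exponential
    Sigma = population_cov(p, shape)
    X = symmetric_weibull_sample(n, p, shape, rng)
    dev = max_elementwise_dev(sample_second_moment(X), Sigma)
    print(f"max |Sigma_hat - Sigma| = {dev:.3f} (diagonal of Sigma = {Sigma[0][0]:.1f})")
```

Entries with P(|X| > t) = exp(−t^k) are sub-Weibull of order k by construction. Independence together with the symmetric sign gives Σ = Γ(1 + 2/k) I, which is why population_cov is diagonal. Shape values below 1 give tails heavier than sub-exponential.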

Comments:	68 pages; Revised version; To appear in Information and Inference: A Journal of the IMA
Subjects:	Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
MSC classes:	60G50, 62J05, 60B20, 62J07, 62E17, 60F05, 60E15
Cite as:	arXiv:1804.02605 [math.ST]
	(or arXiv:1804.02605v4 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1804.02605
Journal reference:	Information and Inference: A Journal of the IMA (2022), Vol. 11, No. 4, 1389-1456
Related DOI:	https://doi.org/10.1093/imaiai/iaac012

Submission history

From: Abhishek Chakrabortty
[v1] Sun, 8 Apr 2018 00:27:45 UTC (73 KB)
[v2] Fri, 29 Jun 2018 01:40:10 UTC (73 KB)
[v3] Wed, 5 Aug 2020 20:56:42 UTC (82 KB)
[v4] Tue, 10 May 2022 02:27:31 UTC (89 KB)