Compressed Empirical Measures (in finite dimensions)

Grünewälder, Steffen

arXiv:2204.08847 (stat)

[Submitted on 19 Apr 2022 (v1), last revised 27 Aug 2024 (this version, v3)]


Abstract: We study approaches for compressing the empirical measure in the context of finite-dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high-probability lower bounds on the size of such a ball under various conditions and in various settings: we show how conditions on the density of the data and on the kernel function can be used to infer such lower bounds; we further develop an approach that uses a lower bound on the smallest eigenvalue of a covariance operator to provide lower bounds on the size of such a ball; we extend this approach to approximate covariance operators; and we show how it can be used in the context of kernel ridge regression. We also derive compression guarantees when standard algorithms such as the conditional gradient method are used, and we discuss variations of these algorithms aimed at improving their runtime. We conclude with a construction of an infinite-dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite-dimensional RKHSs.
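To make the compression idea concrete, here is a minimal sketch of a generic conditional gradient (Frank-Wolfe) method that approximates the empirical mean embedding by a convex combination of feature vectors of the data points. It assumes an explicit finite-dimensional feature map; the cubic polynomial map `feature_map` and the function name `compress_empirical_mean` are hypothetical choices for illustration, and this is not the author's algorithm or one of the paper's runtime-improved variants. Each iteration adds at most one data point, so after `iterations` steps the coreset has at most `iterations + 1` points.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def feature_map(x: float) -> Vector:
    # Hypothetical explicit feature map of a finite-dimensional RKHS
    # (cubic polynomial features); any explicit feature map would do.
    return [x, x * x, x ** 3]


def compress_empirical_mean(
    points: Sequence[float],
    phi: Callable[[float], Vector],
    iterations: int,
) -> Tuple[Dict[int, float], float]:
    """Frank-Wolfe with exact line search on f(w) = 0.5 * ||w - mu||^2
    over conv{phi(x_i)}, where mu is the empirical mean embedding.
    Returns coreset weights (index -> weight) and the final error ||w - mu||."""
    feats = [phi(x) for x in points]
    n, d = len(feats), len(feats[0])
    mu = [sum(f[k] for f in feats) / n for k in range(d)]
    weights: Dict[int, float] = {0: 1.0}  # start at the vertex phi(x_0)
    w = list(feats[0])
    for _ in range(iterations):
        grad = [wk - mk for wk, mk in zip(w, mu)]  # gradient of f at w
        # Linear minimisation oracle over the vertices phi(x_i).
        j = min(range(n), key=lambda i: dot(grad, feats[i]))
        direction = [wk - sk for wk, sk in zip(w, feats[j])]
        denom = dot(direction, direction)
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = max(0.0, min(1.0, dot(grad, direction) / denom))
        if gamma == 0.0:
            break  # Frank-Wolfe gap is non-positive: w is optimal
        # Convex update w <- (1 - gamma) * w + gamma * phi(x_j).
        w = [(1 - gamma) * wk + gamma * sk for wk, sk in zip(w, feats[j])]
        weights = {i: (1 - gamma) * a for i, a in weights.items()}
        weights[j] = weights.get(j, 0.0) + gamma
    residual = [wk - mk for wk, mk in zip(w, mu)]
    coreset = {i: a for i, a in weights.items() if a > 0.0}
    return coreset, dot(residual, residual) ** 0.5


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(1000)]
    coreset, err = compress_empirical_mean(data, feature_map, iterations=50)
    print(f"coreset size: {len(coreset)}, RKHS error: {err:.4f}")
```

Exact line search keeps the objective non-increasing from one iteration to the next. How quickly the error actually shrinks is governed by quantities such as the radius of the ball around the empirical measure discussed in the abstract.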

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as:	arXiv:2204.08847 [stat.ML]
	(or arXiv:2204.08847v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2204.08847

Submission history

From: Steffen Grünewälder
[v1] Tue, 19 Apr 2022 12:25:41 UTC (111 KB)
[v2] Mon, 29 May 2023 15:50:44 UTC (168 KB)
[v3] Tue, 27 Aug 2024 18:32:12 UTC (191 KB)