Data Summarization via Bilevel Optimization

Borsos, Zalán; Mutný, Mojmír; Tagliasacchi, Marco; Krause, Andreas

arXiv:2109.12534 (cs)

[Submitted on 26 Sep 2021]

Abstract: The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is to operate on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and $k$-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice-differentiable model, including neural networks. We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.
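
To make the phrase "cardinality-constrained bilevel optimization" concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a toy one-parameter ridge model whose inner problem has a closed-form solution, unit weights on the selected points, and a plain greedy forward search for the outer problem. The names `inner_solution`, `outer_loss`, and `greedy_coreset` are hypothetical and chosen only for illustration.

```python
import random


def inner_solution(xs, ys, w, lam=1e-3):
    """Inner problem: weighted ridge fit of y ~ theta * x, in closed form."""
    num = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    den = sum(wi * x * x for wi, x in zip(w, xs)) + lam
    return num / den


def outer_loss(xs, ys, theta):
    """Outer objective: mean squared error of theta on the FULL data set."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def greedy_coreset(xs, ys, m, lam=1e-3):
    """Pick m points (unit weights) by greedy forward selection.

    Assumption: greedy search stands in for the outer optimizer; the
    cardinality constraint is enforced by stopping after m picks.
    """
    n = len(xs)
    w = [0.0] * n
    selected = []
    for _ in range(m):
        best_i, best_loss = None, float("inf")
        for i in range(n):
            if w[i] > 0.0:
                continue  # already in the coreset
            w[i] = 1.0  # tentatively add point i
            loss = outer_loss(xs, ys, inner_solution(xs, ys, w, lam))
            w[i] = 0.0  # undo the tentative addition
            if loss < best_loss:
                best_i, best_loss = i, loss
        w[best_i] = 1.0
        selected.append(best_i)
    return selected, w


# Synthetic data (an assumption for the demo): y = 2x + small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

selected, w = greedy_coreset(xs, ys, m=5)
theta_core = inner_solution(xs, ys, w)
loss_core = outer_loss(xs, ys, theta_core)
```

Here the model trained on the five selected points is scored on all 100 points, which is the bilevel structure: the outer search chooses the subset, and the inner problem fits the model on that subset. The closed-form inner solution is a convenience of the toy model; for models without one, the inner problem would need an iterative solver.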

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2109.12534 [cs.LG]
	(or arXiv:2109.12534v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.12534

Submission history

From: Zalán Borsos
[v1] Sun, 26 Sep 2021 09:08:38 UTC (4,350 KB)
