Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions

Bennouna, Amine; Van Parys, Bart P. G.

Statistics > Machine Learning

arXiv:2109.06911 (stat)

[Submitted on 14 Sep 2021 (v1), last revised 11 Mar 2024 (this version, v3)]

Authors:Amine Bennouna, Bart P.G. Van Parys

Abstract: We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulations and subsequently tries to establish out-of-sample performance guarantees. We take the opposite approach here. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and then seek an optimal such formulation. Informally, any data-driven formulation can be seen to balance the proximity of the estimated cost to the actual cost against a guaranteed level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we explicitly construct a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime, and a subexponential regime) between which the nature of the optimal data-driven formulation undergoes a phase transition. The optimal data-driven formulation can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime, and a variance-penalized formulation in the subexponential regime. This final observation unveils a surprising connection, hidden until now, between three data-driven formulations that at first glance seem unrelated.
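
The three formulation families named in the abstract can be compared numerically on a toy example. The sketch below is not taken from the paper; it is a minimal illustration under stated assumptions. It assumes a finite set of possible costs with empirical probabilities, a hypothetical radius parameter `r`, a Kullback-Leibler ball around the empirical distribution evaluated through its standard one-dimensional dual, and a variance penalty of the form mean + sqrt(2 r Var). The direction of the divergence and the exact constants used in the paper may differ, and all function names are invented for illustration.

```python
import math


def mean_cost(costs, probs):
    """Plain empirical average of the cost."""
    return sum(p * c for c, p in zip(costs, probs))


def robust_cost(costs):
    """Classically robust estimate: worst case over all possible costs."""
    return max(costs)


def variance_penalized_cost(costs, probs, r):
    """Empirical mean plus a standard-deviation penalty (assumed form)."""
    mu = mean_cost(costs, probs)
    var = sum(p * (c - mu) ** 2 for c, p in zip(costs, probs))
    return mu + math.sqrt(2.0 * r * var)


def kl_dro_cost(costs, probs, r, iters=200):
    """Worst-case expected cost over {Q : KL(Q || P_hat) <= r} (assumed ball).

    Uses the dual  inf_{a > 0}  a*r + a*log E_P_hat[exp(cost / a)],
    minimized by ternary search over log(a) on a fixed bracket.
    """
    # Shift by the largest supported cost so the exponentials never overflow.
    m = max(c for c, p in zip(costs, probs) if p > 0)

    def dual(log_a):
        a = math.exp(log_a)
        s = sum(p * math.exp((c - m) / a) for c, p in zip(costs, probs) if p > 0)
        return a * r + m + a * math.log(s)

    lo, hi = math.log(1e-6), math.log(1e6)
    for _ in range(iters):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if dual(left) < dual(right):
            hi = right
        else:
            lo = left
    return dual(0.5 * (lo + hi))


if __name__ == "__main__":
    costs = [1.0, 2.0, 4.0]   # hypothetical cost values
    probs = [0.5, 0.3, 0.2]   # hypothetical empirical frequencies
    for r in (1e-4, 0.5, 10.0):
        print(r, kl_dro_cost(costs, probs, r),
              variance_penalized_cost(costs, probs, r), robust_cost(costs))
```

On this toy example the entropic estimate stays close to the variance-penalized one when `r` is small and approaches the robust worst case when `r` is large, which gives some intuition for why the three families can be viewed as related. This is only a heuristic picture under the assumptions above, not the paper's construction of the regimes.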

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
Cite as:	arXiv:2109.06911 [stat.ML]
	(or arXiv:2109.06911v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2109.06911

Submission history

From: Mohammed Amine Bennouna
[v1] Tue, 14 Sep 2021 18:20:15 UTC (134 KB)
[v2] Tue, 27 Sep 2022 17:40:19 UTC (138 KB)
[v3] Mon, 11 Mar 2024 21:28:38 UTC (150 KB)
