Concave Utility Reinforcement Learning with Zero-Constraint Violations

Agarwal, Mridul; Bai, Qinbo; Aggarwal, Vaneet

arXiv:2109.05439 (cs)

[Submitted on 12 Sep 2021 (v1), last revised 17 Nov 2023 (this version, v3)]


Abstract: We consider the problem of tabular infinite-horizon concave utility reinforcement learning (CURL) with convex constraints. For this, we propose a model-based learning algorithm that also achieves zero constraint violations. Assuming that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures, we solve a tighter optimization problem to ensure that the constraints are never violated despite imprecise model knowledge and model stochasticity. We use Bellman error-based analysis for tabular infinite-horizon setups, which allows analyzing stochastic policies. Combining the Bellman error-based analysis and the tighter optimization equation, for $T$ interactions with the environment, we obtain a high-probability regret guarantee for the objective which grows as $\tilde{O}(1/\sqrt{T})$, excluding other factors. The proposed method can be applied to optimistic algorithms to obtain high-probability regret bounds, and can also be used with posterior sampling algorithms to obtain looser Bayesian regret bounds but with a significant improvement in computational complexity.
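
The "tighter optimization problem" mentioned in the abstract can be pictured with a minimal sketch. All notation here is assumed for illustration and is not taken from the abstract: $\lambda(s,a)$ is a stationary occupation measure over state-action pairs, $f$ is the concave objective, $g$ is a convex constraint function with $g(\lambda) \le 0$ as the original constraint, $P$ stands for whichever transition model the learner is using, and $\epsilon > 0$ is a tightening margin.

```latex
% Sketch only: f, g, \lambda, P and \epsilon are assumed notation.
\begin{aligned}
\max_{\lambda \ge 0} \quad & f(\lambda) \\
\text{s.t.} \quad & g(\lambda) \le -\epsilon, \\
& \sum_{a} \lambda(s', a) = \sum_{s, a} P(s' \mid s, a)\, \lambda(s, a) \quad \forall s', \\
& \sum_{s, a} \lambda(s, a) = 1.
\end{aligned}
```

Setting $\epsilon = 0$ recovers the untightened problem. Under the interior-solution assumption, some occupation measure satisfies the constraint with strict slack, so the tightened problem stays feasible for a sufficiently small $\epsilon$, and the margin can absorb errors from an imprecise model.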

Comments:	Transactions on Machine Learning Research, Dec 2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2109.05439 [cs.LG]
	(or arXiv:2109.05439v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.05439
Journal reference:	Transactions on Machine Learning Research, Dec 2022

Submission history

From: Vaneet Aggarwal
[v1] Sun, 12 Sep 2021 06:13:33 UTC (7,420 KB)
[v2] Mon, 9 May 2022 18:36:14 UTC (7,456 KB)
[v3] Fri, 17 Nov 2023 02:20:40 UTC (6,694 KB)
