Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Lu, Miao; Min, Yifei; Wang, Zhaoran; Yang, Zhuoran

Computer Science > Machine Learning

arXiv:2205.13589 (cs)

[Submitted on 26 May 2022 (v1), last revised 1 Apr 2024 (this version, v3)]

Title:Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Authors:Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang

View PDF HTML (experimental)

Abstract:We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the \underline{P}roxy variable \underline{P}essimistic \underline{P}olicy \underline{O}ptimization (\texttt{P3O}) algorithm, which addresses the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of \texttt{P3O} is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that \texttt{P3O} achieves a $n^{-1/2}$-suboptimality, where $n$ is the number of trajectories in the dataset. To our best knowledge, \texttt{P3O} is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.

Comments:	Updates. 52 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2205.13589 [cs.LG]
	(or arXiv:2205.13589v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.13589

Submission history

From: Miao Lu [view email]
[v1] Thu, 26 May 2022 19:13:55 UTC (95 KB)
[v2] Mon, 17 Apr 2023 01:04:56 UTC (108 KB)
[v3] Mon, 1 Apr 2024 04:49:15 UTC (76 KB)

Computer Science > Machine Learning

Title:Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators