Does DQN Learn?

Gopalan, Aditya; Thoppe, Gugan

arXiv:2205.13617 (cs)

[Submitted on 26 May 2022 (v1), last revised 21 Sep 2024 (this version, v4)]

Abstract: For a reinforcement learning method to be useful, the policy it estimates in the limit must be superior to the initial guess, at least on average. In this work, we show that the widely used Deep Q-Network (DQN) fails to meet even this basic criterion, even when it gets to see all possible states and actions infinitely often (a condition that ensures tabular Q-learning's convergence to the optimal Q-value). Our work's key highlights are as follows. First, we numerically show that DQN generally has a non-trivial probability of producing a policy worse than the initial one. Second, we give a theoretical explanation for this behavior in the context of linear DQN, wherein we replace the neural network with a linear function approximation but retain DQN's other key ideas, such as experience replay, target network, and $\epsilon$-greedy exploration. Our main result is that the tail behaviors of linear DQN are governed by invariant sets of a deterministic differential inclusion, a set-valued generalization of a differential equation. Notably, we show that these invariant sets need not align with locally optimal policies, thus explaining DQN's pathological behaviors, such as convergence to sub-optimal policies and policy oscillation. We also provide a scenario where the limiting policy is always the worst. Our work addresses a longstanding gap in understanding the behaviors of Q-learning with function approximation and $\epsilon$-greedy exploration.
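
The linear DQN setting named in the abstract can be illustrated with a minimal sketch. It keeps the three ingredients the abstract lists (experience replay, a target network, and $\epsilon$-greedy exploration) but replaces the neural network with a linear map on fixed features. This is not the authors' code: the two-state toy MDP, the feature table `FEATURES`, the function names, and every hyperparameter are hypothetical choices made only for illustration.

```python
import random

# Hypothetical feature table: FEATURES[state][action] is a feature vector.
# Here each action simply gets its own indicator feature.
FEATURES = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: features of actions 0 and 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: features of actions 0 and 1
]


def toy_step(state, action, rng):
    """Hypothetical toy MDP: action 1 pays reward 1, next state is uniform."""
    reward = 1.0 if action == 1 else 0.0
    next_state = rng.randrange(2)
    return reward, next_state


def q_value(theta, phi):
    """Linear Q-estimate: inner product of weights and features."""
    return sum(t * f for t, f in zip(theta, phi))


def greedy_action(theta, features, state):
    """Action maximizing the linear Q-estimate (ties go to the lowest index)."""
    actions = range(len(features[state]))
    return max(actions, key=lambda a: q_value(theta, features[state][a]))


def linear_dqn(features, step, n_steps=2000, gamma=0.9, epsilon=0.1,
               lr=0.05, batch_size=8, target_period=50, buffer_size=500,
               seed=0):
    """Sketch of DQN with a linear approximator instead of a neural network."""
    rng = random.Random(seed)
    dim = len(features[0][0])
    theta = [0.0] * dim          # online weights
    target_theta = list(theta)   # "target network" weights, synced periodically
    replay = []                  # experience replay buffer
    state = 0
    for t in range(n_steps):
        # epsilon-greedy exploration with respect to the online weights
        if rng.random() < epsilon:
            action = rng.randrange(len(features[state]))
        else:
            action = greedy_action(theta, features, state)
        reward, next_state = step(state, action, rng)
        replay.append((state, action, reward, next_state))
        if len(replay) > buffer_size:
            replay.pop(0)  # drop the oldest transition
        # Sample a minibatch (with replacement) and average the TD updates.
        grad = [0.0] * dim
        for _ in range(batch_size):
            s, a, r, s2 = rng.choice(replay)
            target = r + gamma * max(q_value(target_theta, f)
                                     for f in features[s2])
            td_error = target - q_value(theta, features[s][a])
            for i in range(dim):
                grad[i] += td_error * features[s][a][i] / batch_size
        theta = [th + lr * g for th, g in zip(theta, grad)]
        # Periodically copy the online weights into the target weights.
        if (t + 1) % target_period == 0:
            target_theta = list(theta)
        state = next_state
    return theta


if __name__ == "__main__":
    theta = linear_dqn(FEATURES, toy_step)
    print("weights:", theta)
    print("greedy action in state 0:", greedy_action(theta, FEATURES, 0))
```

The sketch makes no claim about the quality of the resulting greedy policy. The abstract's point is precisely that, with function approximation, the limiting behavior of such a scheme need not correspond to a good policy, so the printed action should be read as an observation on one toy instance rather than as evidence either way.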

Comments:	20 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
MSC classes:	93E35, 68Q32
ACM classes:	I.2.0
Cite as:	arXiv:2205.13617 [cs.LG]
	(or arXiv:2205.13617v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.13617

Submission history

From: Gugan Thoppe
[v1] Thu, 26 May 2022 20:46:01 UTC (703 KB)
[v2] Sun, 4 Dec 2022 15:43:14 UTC (1,914 KB)
[v3] Fri, 10 Feb 2023 23:03:02 UTC (4,286 KB)
[v4] Sat, 21 Sep 2024 04:58:24 UTC (1,424 KB)
