On the Estimation Bias in Double Q-Learning

Ren, Zhizhou; Zhu, Guangxiang; Hu, Hao; Han, Beining; Chen, Jianglun; Zhang, Chongjie

arXiv:2109.14419 (cs)

[Submitted on 29 Sep 2021 (v1), last revised 14 Jan 2022 (this version, v3)]

Abstract: Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking the maximum over estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value predictions and improving learning performance. However, as shown by prior work, double Q-learning is not fully unbiased and suffers from underestimation bias. In this paper, we show that such underestimation bias may lead to multiple non-optimal fixed points under an approximate Bellman operator. To address the concern of converging to non-optimal stationary solutions, we propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning. This approach leverages approximate dynamic programming to bound the target value. We extensively evaluate our proposed method on the Atari benchmark tasks and demonstrate significant improvement over baseline algorithms.
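
The two biases named in the abstract can be illustrated with a small Monte Carlo sketch. This is a toy illustration under stated assumptions, not the method proposed in the paper: the action values, the Gaussian noise model, and the helper name `estimate_biases` are all hypothetical. The single estimator takes the maximum over one set of noisy value estimates, while the double estimator selects the greedy action with one set of estimates and evaluates it with an independent second set.

```python
import random


def estimate_biases(true_values, noise_std=1.0, trials=20000, seed=0):
    """Return (single_bias, double_bias) relative to max(true_values).

    Hypothetical toy setup: each estimate is the true action value plus
    independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    best = max(true_values)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent sets of noisy estimates of the same action values.
        q_a = [v + rng.gauss(0.0, noise_std) for v in true_values]
        q_b = [v + rng.gauss(0.0, noise_std) for v in true_values]
        # Single estimator: maximum over one set of noisy estimates.
        single_total += max(q_a)
        # Double estimator: pick the action with q_a, evaluate it with q_b.
        greedy = max(range(len(q_a)), key=q_a.__getitem__)
        double_total += q_b[greedy]
    return single_total / trials - best, double_total / trials - best


single_bias, double_bias = estimate_biases([0.0, -0.1, -0.2, -0.3])
print(f"single-estimator bias: {single_bias:+.3f}")
print(f"double-estimator bias: {double_bias:+.3f}")
```

In this setup the single estimator's bias comes out positive and the double estimator's comes out negative. The double estimator undershoots because the noisy selection step sometimes picks a suboptimal action, whose value is then evaluated without the upward push of the max.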

Comments:	Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2109.14419 [cs.LG]
	(or arXiv:2109.14419v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.14419

Submission history

From: Zhizhou Ren
[v1] Wed, 29 Sep 2021 13:41:24 UTC (1,042 KB)
[v2] Thu, 16 Dec 2021 03:51:56 UTC (1,110 KB)
[v3] Fri, 14 Jan 2022 05:42:42 UTC (1,110 KB)
