OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Hoshino, Hana; Ota, Kei; Kanezaki, Asako; Yokota, Rio

arXiv:2109.04307 (cs)

[Submitted on 9 Sep 2021 (v1), last revised 22 May 2022 (this version, v2)]

Abstract: Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) uses an off-policy data distribution instead of an on-policy one, significantly reducing the number of interactions with the environment, (2) learns a stationary reward function that transfers with high generalization capability under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Our experiments demonstrate that the method is considerably more sample efficient and generalizes to novel environments. It achieves policy performance better than or comparable to the baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior methods are prone to fail.

Comments:	ICRA2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2109.04307 [cs.LG]
	(or arXiv:2109.04307v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.04307

Submission history

From: Hana Hoshino
[v1] Thu, 9 Sep 2021 14:32:26 UTC (6,509 KB)
[v2] Sun, 22 May 2022 14:28:47 UTC (7,221 KB)
