Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning

Wei, Ning; Liang, Jiahua; Xie, Di; Pu, Shiliang

arXiv:2109.02332 (cs)

[Submitted on 6 Sep 2021]

Abstract: Designing optimal reward functions is desirable but extremely difficult in reinforcement learning (RL). In modern complex tasks, sophisticated reward functions are widely used to simplify policy learning, yet even a tiny adjustment to them is expensive to evaluate because of the sharply increasing cost of training. To this end, we propose a hindsight reward tweaking approach, designing a novel paradigm for deep reinforcement learning that models the influence of reward functions within a near-optimal space. We simply extend the input observation with a condition vector linearly correlated with the effective environment reward parameters and train the model in the conventional manner, except that reward configurations are randomized, obtaining a hyper-policy whose characteristics are sensitively regulated over the condition space. We demonstrate the feasibility of this approach and study one of its potential applications, policy performance boosting, on multiple MuJoCo tasks.
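The conditioning step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the uniform sampling range, the `spread` parameter, and the weighted-sum reward are all assumptions made for illustration. It only shows one plausible way to randomize reward weights per episode, map them linearly to a condition vector, and append that vector to the observation.

```python
import numpy as np


def sample_reward_params(rng, base, spread):
    """Draw reward weights uniformly within base * (1 +/- spread).

    Assumption: every entry of `base` is nonzero and 0 < spread.
    """
    base = np.asarray(base, dtype=np.float64)
    return base * (1.0 + rng.uniform(-spread, spread, size=base.shape))


def condition_vector(params, base, spread):
    """Map reward weights linearly to a condition in [-1, 1].

    The base configuration maps to the zero vector.
    """
    base = np.asarray(base, dtype=np.float64)
    params = np.asarray(params, dtype=np.float64)
    return (params / base - 1.0) / spread


def augment_observation(obs, cond):
    """Append the condition vector to the raw observation."""
    return np.concatenate([np.asarray(obs, dtype=np.float64), cond])


def shaped_reward(terms, params):
    """Hypothetical reward: weighted sum of per-step reward terms."""
    return float(np.dot(params, terms))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = [1.0, 0.1]            # hypothetical base reward weights
    spread = 0.5                 # hypothetical randomization range
    for episode in range(3):
        params = sample_reward_params(rng, base, spread)  # new config per episode
        cond = condition_vector(params, base, spread)
        obs = np.zeros(4)                                 # stand-in for an env observation
        policy_input = augment_observation(obs, cond)
        print(episode, policy_input.shape, cond)
```

Under these assumptions, a policy trained on `policy_input` sees a different reward configuration in each episode. After training, the condition entries could be varied at evaluation time to compare behaviors without retraining, which is one reading of the "hindsight" tweaking the abstract refers to.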

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2109.02332 [cs.LG]
	(or arXiv:2109.02332v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.02332

Submission history

From: Ning Wei
[v1] Mon, 6 Sep 2021 10:06:48 UTC (1,358 KB)
