Vision Transformer for Learning Driving Policies in Complex Multi-Agent Environments

Kargar, Eshagh; Kyrki, Ville

Computer Science > Machine Learning

arXiv:2109.06514 (cs)

[Submitted on 14 Sep 2021]


Abstract: Driving in a complex urban environment is a difficult task that requires a complex decision policy. To make informed decisions, one must understand the long-range context of the scene and the importance of other vehicles. In this work, we propose using a Vision Transformer (ViT) to learn a driving policy in urban settings from bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism learns an attention map over the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms learn faster, reaching higher performance with less data than the baselines.
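To make the idea in the abstract concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes a toy 8x8 BEV occupancy grid, splits it into non-overlapping patches, embeds each patch linearly, pools the patch tokens with a single attention query (a stand-in for a full multi-layer ViT), and maps the pooled feature to Q-values from which a DQN-style greedy action is chosen. All names (`split_into_patches`, `q_values`, `random_params`), the grid size, and the random untrained weights are assumptions made for illustration.

```python
import math
import random


def split_into_patches(image, patch):
    """Split an H x W grid into flattened non-overlapping patches (row-major)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "grid must divide evenly into patches"
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def q_values(image, patch, params):
    """Return (Q-values, attention weights over patches) for one BEV grid."""
    tokens = [matvec(params["embed"], p) for p in split_into_patches(image, patch)]
    dim = len(params["query"])
    # Scaled dot-product scores between one query vector and every patch token.
    scores = [
        sum(q * t for q, t in zip(params["query"], tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    weights = softmax(scores)  # attention map: one weight per patch
    pooled = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(dim)]
    return matvec(params["head"], pooled), weights


def random_params(patch, dim, n_actions, seed=0):
    """Hypothetical untrained weights; a real agent would learn these with DQN."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": rand(dim, patch * patch),
        "query": [rng.uniform(-1.0, 1.0) for _ in range(dim)],
        "head": rand(n_actions, dim),
    }


# Toy BEV grid: 1.0 marks a cell occupied by another vehicle (assumed encoding).
bev = [[0.0] * 8 for _ in range(8)]
bev[1][6] = 1.0
bev[5][2] = 1.0

params = random_params(patch=4, dim=6, n_actions=3)
q, attn = q_values(bev, 4, params)
action = max(range(len(q)), key=q.__getitem__)  # greedy action, as in DQN evaluation
```

Here `attn` holds one weight per patch and sums to 1, which is the sense in which an attention map can indicate which regions of the scene, and hence which surrounding cars, matter for the next decision. With untrained weights the values carry no meaning; the sketch only shows the data flow.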

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Cite as:	arXiv:2109.06514 [cs.LG]
	(or arXiv:2109.06514v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.06514

Submission history

From: Eshagh Kargar
[v1] Tue, 14 Sep 2021 08:18:47 UTC (1,910 KB)
