Vision-Language Navigation: A Survey and Taxonomy

Wu, Wansen; Chang, Tao; Li, Xinmeng

arXiv:2108.11544 (cs)

[Submitted on 26 Aug 2021 (v1), last revised 2 Apr 2022 (this version, v3)]

Abstract: Vision-Language Navigation (VLN) tasks require an agent to follow human language instructions to navigate in previously unseen environments. This challenging field, which involves problems in natural language processing, computer vision, robotics, and other areas, has spawned many excellent works focusing on various VLN tasks. This paper provides a comprehensive survey and an insightful taxonomy of these tasks based on the different characteristics of the language instructions they use. Depending on whether the navigation instructions are given once or multiple times, this paper divides the tasks into two categories, i.e., single-turn and multi-turn tasks. Single-turn tasks are further subdivided into goal-oriented and route-oriented tasks, based on whether the instructions designate a single goal location or specify a sequence of multiple locations. Multi-turn tasks are subdivided into passive and interactive tasks, based on whether or not the agent is allowed to question the instruction. These tasks require different capabilities of the agent and entail various model designs. We identify the progress made on these tasks and examine the limitations of existing VLN models and task settings. Finally, we discuss several open issues in VLN and point out future opportunities, i.e., incorporating knowledge into VLN models and implementing them in the real physical world.
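
The taxonomy described in the abstract amounts to a two-level decision rule over properties of a task's instructions: first the number of instruction turns, then one branch-specific criterion. The Python sketch below encodes that rule as an illustration only. The class `InstructionTraits`, its fields, and the function `classify_vln_task` are hypothetical names introduced here; they do not come from the paper or from any released code, and real tasks may need more nuance than three boolean/integer traits capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionTraits:
    """Hypothetical summary of how a VLN task delivers its instructions."""
    num_instruction_turns: int  # how many times instructions are given (>= 1)
    specifies_route: bool       # True if instructions name a sequence of locations
    agent_may_ask: bool         # True if the agent may question the instruction


def classify_vln_task(traits: InstructionTraits) -> tuple[str, str]:
    """Map instruction traits to (top-level category, subcategory).

    Assumption: each sub-criterion is consulted only within its own branch,
    mirroring how the abstract applies them.
    """
    if traits.num_instruction_turns < 1:
        raise ValueError("a VLN task must provide at least one instruction")
    if traits.num_instruction_turns == 1:
        # Single-turn: a single goal location vs. a sequence of locations.
        sub = "route-oriented" if traits.specifies_route else "goal-oriented"
        return ("single-turn", sub)
    # Multi-turn: can the agent question the instruction or not?
    sub = "interactive" if traits.agent_may_ask else "passive"
    return ("multi-turn", sub)


if __name__ == "__main__":
    example = InstructionTraits(1, specifies_route=True, agent_may_ask=False)
    print(classify_vln_task(example))  # ('single-turn', 'route-oriented')
```

Fields that are irrelevant to a branch (for example `agent_may_ask` for a single-turn task) are deliberately ignored, so the sketch stays a faithful two-level tree rather than a flat cross-product of all traits.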

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2108.11544 [cs.CV]
	(or arXiv:2108.11544v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.11544

Submission history

From: Wansen Wu
[v1] Thu, 26 Aug 2021 01:51:18 UTC (36,864 KB)
[v2] Wed, 1 Sep 2021 01:05:29 UTC (8,927 KB)
[v3] Sat, 2 Apr 2022 02:12:14 UTC (4,619 KB)