Who's Waldo? Linking People Across Text and Images

Cui, Claire Yuqing; Khandelwal, Apoorv; Artzi, Yoav; Snavely, Noah; Averbuch-Elor, Hadar

Computer Science > Computer Vision and Pattern Recognition

arXiv:2108.07253 (cs)

[Submitted on 16 Aug 2021 (v1), last revised 17 Aug 2021 (this version, v2)]

Abstract: We present a task and benchmark dataset for person-centric visual grounding, the problem of linking between people named in a caption and people pictured in an image. In contrast to prior work in visual grounding, which is predominantly object-based, our new task masks out the names of people in captions in order to encourage methods trained on such image-caption pairs to focus on contextual cues (such as rich interactions between multiple people), rather than learning associations between names and appearances. To facilitate this task, we introduce a new dataset, Who's Waldo, mined automatically from image-caption data on Wikimedia Commons. We propose a Transformer-based method that outperforms several strong baselines on this task, and are releasing our data to the research community to spur work on contextual models that consider both vision and language.
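
The name-masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the person names in a caption are already known (for example, from an upstream entity tagger). The function name `mask_names`, the `[NAME]` placeholder, and the example caption are illustrative assumptions, not taken from the paper's released code or data.

```python
import re


def mask_names(caption: str, names: list[str], placeholder: str = "[NAME]") -> str:
    """Replace every listed person name in the caption with a placeholder.

    Hypothetical helper: the names are assumed to be given, not detected here.
    """
    # Mask longer names first so "Ann Lee" is handled before the shorter "Ann".
    for name in sorted(names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        caption = re.sub(pattern, placeholder, caption)
    return caption


# Made-up caption used only for illustration.
caption = "Ann Lee shakes hands with Bob Ray while Ann smiles."
print(mask_names(caption, ["Ann", "Ann Lee", "Bob Ray"]))
```

With the names hidden, a model can no longer rely on a memorized name-to-face association and has to use context such as "shakes hands with" to decide which pictured person each mention refers to.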

Comments:	Published in ICCV 2021 (Oral).
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2108.07253 [cs.CV]
	(or arXiv:2108.07253v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.07253

Submission history

From: Apoorv Khandelwal
[v1] Mon, 16 Aug 2021 17:36:49 UTC (12,919 KB)
[v2] Tue, 17 Aug 2021 15:55:12 UTC (4,828 KB)
