Web Image Context Extraction with Graph Neural Networks and Sentence Embeddings on the DOM tree

Dang, Chen; Randrianarivo, Hicham; Fournier-S'Niehotta, Raphaël; Audebert, Nicolas


arXiv:2108.11629 (cs)

[Submitted on 26 Aug 2021]


Authors: Chen Dang (QR), Hicham Randrianarivo (QR), Raphaël Fournier-S'Niehotta (CNAM, CEDRIC - VERTIGO), Nicolas Audebert (CNAM, CEDRIC - VERTIGO)


Abstract: Web Image Context Extraction (WICE) consists of obtaining the textual information that describes an image from the content of the surrounding webpage. A common preprocessing step before performing WICE is to render the content of the webpage. When done at a large scale (e.g., for search engine indexing), rendering may become very computationally costly (up to several seconds per page). To avoid this cost, we introduce a novel WICE approach that combines Graph Neural Networks (GNNs) and Natural Language Processing models. Our method relies on a graph model containing both node types and text as features. The model is fed through several blocks of GNNs to extract the textual context. Since no labeled WICE dataset with ground truth exists, we train and evaluate the GNNs on a proxy task that consists of finding the text semantically closest to the image caption. We then interpret importance weights to find the most relevant text nodes and define them as the image context. Thanks to GNNs, our model is able to encode both structural and semantic information from the webpage. We show that our approach gives promising results to help address the large-scale WICE problem using only HTML data.
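
To make the abstract's two ingredients concrete, the sketch below builds a DOM graph from raw HTML without rendering (nodes carry a type and optional text) and then picks the text node closest to an image caption, as in the proxy task. It is an illustration under stated assumptions, not the authors' implementation: the class `DomGraphBuilder`, the helpers `bow_cosine` and `closest_text_node`, and the sample HTML are all hypothetical, and a bag-of-words cosine similarity stands in for the sentence embeddings and GNN blocks used in the paper.

```python
from collections import Counter
from html.parser import HTMLParser
import math

# Elements that never have a closing tag, so they must not stay on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: turns raw HTML into a node list and parent-child edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # each node: {"type": tag name or "#text", "text": str}
        self.edges = []   # (parent_index, child_index) pairs
        self.stack = []   # indices of currently open elements

    def _add_node(self, node_type, text=""):
        idx = len(self.nodes)
        self.nodes.append({"type": node_type, "text": text})
        if self.stack:
            self.edges.append((self.stack[-1], idx))
        return idx

    def handle_starttag(self, tag, attrs):
        idx = self._add_node(tag)
        if tag not in VOID_TAGS:
            self.stack.append(idx)

    def handle_endtag(self, tag):
        # Close the most recent open element with this tag, if there is one.
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.nodes[self.stack[pos]]["type"] == tag:
                del self.stack[pos:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, skip empty runs
        if text:
            self._add_node("#text", text)


def bow_cosine(a, b):
    """Bag-of-words cosine similarity (assumption: stand-in for sentence embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def closest_text_node(nodes, caption):
    """Index of the text node most similar to the caption, or None if there is none."""
    text_ids = [i for i, n in enumerate(nodes) if n["type"] == "#text"]
    return max(text_ids, key=lambda i: bow_cosine(nodes[i]["text"], caption),
               default=None)


# Made-up page and caption, for illustration only.
html = ("<html><body><div><img src='cat.jpg'>"
        "<p>A grey cat sleeping on a sofa</p></div>"
        "<p>Subscribe to our newsletter</p></body></html>")
builder = DomGraphBuilder()
builder.feed(html)
builder.close()
best = closest_text_node(builder.nodes, "cat sleeping on the sofa")
print(builder.nodes[best]["text"])
```

In a setup closer to the one the abstract describes, the node types and text embeddings would become node features and `edges` would define the graph passed to the GNN blocks; the nearest-text lookup here only mimics the proxy-task target.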

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Image and Video Processing (eess.IV)
Cite as:	arXiv:2108.11629 [cs.CV]
	(or arXiv:2108.11629v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.11629
Journal reference:	GEM: Graph Embedding and Mining - ECML/PKDD Workshops, Sep 2021, Bilbao, Spain

Submission history

From: Nicolas Audebert [via CCSD proxy]
[v1] Thu, 26 Aug 2021 07:49:28 UTC (2,843 KB)
