Is it Time to Replace CNNs with Transformers for Medical Images?

Matsoukas, Christos; Haslum, Johan Fredin; Söderberg, Magnus; Smith, Kevin

arXiv:2108.09038 (cs)

[Submitted on 20 Aug 2021]

Abstract: Convolutional Neural Networks (CNNs) have reigned for a decade as the de facto approach to automated medical image diagnosis. Recently, vision transformers (ViTs) have appeared as a competitive alternative to CNNs, yielding similar levels of performance while possessing several interesting properties that could prove beneficial for medical imaging tasks. In this work, we explore whether it is time to move to transformer-based models or if we should keep working with CNNs - can we trivially switch to transformers? If so, what are the advantages and drawbacks of switching to ViTs for medical image diagnosis? We consider these questions in a series of experiments on three mainstream medical image datasets. Our findings show that, while CNNs perform better when trained from scratch, off-the-shelf vision transformers using default hyperparameters are on par with CNNs when pretrained on ImageNet, and outperform their CNN counterparts when pretrained using self-supervision.

Comments:	Originally published at the ICCV 2021 Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2108.09038 [cs.CV]
	(or arXiv:2108.09038v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.09038

Submission history

From: Christos Matsoukas
[v1] Fri, 20 Aug 2021 08:01:19 UTC (1,378 KB)
