Cross-token Modeling with Conditional Computation

Lou, Yuxuan; Xue, Fuzhao; Zheng, Zangwei; You, Yang


arXiv:2109.02008 (cs)

[Submitted on 5 Sep 2021 (v1), last revised 14 Jan 2022 (this version, v3)]



Abstract: Mixture-of-Experts (MoE), a conditional computation architecture, has achieved promising performance by scaling the local module (i.e., the feed-forward network) of the transformer. However, scaling the cross-token module (i.e., self-attention) is challenging because training becomes unstable. This work proposes Sparse-MLP, an all-MLP model that applies sparsely activated MLPs to cross-token modeling. Specifically, in each Sparse block of our all-MLP model, we apply two stages of MoE layers: one with MLP experts that mix information within channels along the image patch dimension, the other with MLP experts that mix information within patches along the channel dimension. In addition, by proposing an importance-score routing strategy for MoE and redesigning the shape of the image representation, we further improve our model's computational efficiency. Experimentally, our models are more computationally efficient than Vision Transformers at comparable accuracy. Our models also outperform MLP-Mixer by 2.5% in ImageNet top-1 accuracy with fewer parameters and lower computational cost. On the downstream tasks CIFAR-10 and CIFAR-100, our models still achieve better performance than the baselines.
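
The two-stage structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name MoEMLP, the function sparse_block, the top-1 softmax gate, the GELU activation, the residual connections, and all sizes are assumptions made for illustration. The paper's importance-score routing and its redesigned image representation are not reproduced here. The sketch only shows how one MoE stage can route whole channels (mixing along the patch dimension) while the other routes whole patches (mixing along the channel dimension).

```python
import math
import random


def rand_matrix(rows, cols, rng):
    # Small random weights; the scale 0.02 is an arbitrary choice.
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def matvec(v, w):
    # v: list of length n, w: n x m nested list -> list of length m.
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def gelu(a):
    return 0.5 * a * (1.0 + math.erf(a / math.sqrt(2.0)))


class MoEMLP:
    """Hypothetical top-1 routed mixture of two-layer MLP experts."""

    def __init__(self, dim, hidden, num_experts, rng):
        self.gate = rand_matrix(dim, num_experts, rng)
        self.experts = [
            (rand_matrix(dim, hidden, rng), rand_matrix(hidden, dim, rng))
            for _ in range(num_experts)
        ]

    def route(self, v):
        # Assumed routing rule: pick the expert with the highest gate probability.
        probs = softmax(matvec(v, self.gate))
        expert = max(range(len(probs)), key=probs.__getitem__)
        return expert, probs[expert]

    def __call__(self, rows):
        out = []
        for v in rows:
            expert, prob = self.route(v)
            w1, w2 = self.experts[expert]
            hidden = [gelu(a) for a in matvec(v, w1)]
            # Only the selected expert runs; its output is scaled by the gate value.
            out.append([prob * a for a in matvec(hidden, w2)])
        return out


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sparse_block(x, moe_patches, moe_channels):
    # x has shape num_patches x num_channels.
    # Stage 1: each channel (a vector over patches) is routed to one expert.
    x = add(x, transpose(moe_patches(transpose(x))))
    # Stage 2: each patch (a vector over channels) is routed to one expert.
    return add(x, moe_channels(x))


rng = random.Random(0)
num_patches, num_channels = 4, 6
x = [[rng.gauss(0.0, 1.0) for _ in range(num_channels)] for _ in range(num_patches)]
moe_patches = MoEMLP(num_patches, 8, 2, rng)
moe_channels = MoEMLP(num_channels, 8, 2, rng)
y = sparse_block(x, moe_patches, moe_channels)
print(len(y), len(y[0]))  # 4 6
```

Because each input vector activates a single expert, adding experts in this sketch increases the parameter count without increasing the work done per vector, which is the general idea behind conditional computation.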

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2109.02008 [cs.LG]
	(or arXiv:2109.02008v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.02008

Submission history

From: Yuxuan Lou
[v1] Sun, 5 Sep 2021 06:43:08 UTC (4,504 KB)
[v2] Wed, 8 Sep 2021 20:10:22 UTC (4,540 KB)
[v3] Fri, 14 Jan 2022 08:06:11 UTC (1,003 KB)
