Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Dutta, Subhabrata; Gautam, Tanya; Chakrabarti, Soumen; Chakraborty, Tanmoy


arXiv:2109.15142 (cs)

[Submitted on 30 Sep 2021 (v1), last revised 27 Oct 2021 (this version, v3)]

Abstract: The Transformer and its variants have been proven to be efficient sequence learners in many different domains. Despite their staggering success, a critical issue has been the enormous number of parameters that must be trained (ranging from $10^7$ to $10^{11}$), along with the quadratic complexity of dot-product attention. In this work, we investigate the problem of approximating the two central components of the Transformer, multi-head self-attention and point-wise feed-forward transformation, with a reduced parameter space and lower computational complexity. We build upon recent developments in analyzing deep neural networks as numerical solvers of ordinary differential equations. Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers. We perform exhaustive experiments with TransEvolve on well-known encoder-decoder as well as encoder-only tasks. We observe that the degree of approximation (or, inversely, the degree of parameter reduction) affects performance differently depending on the task. In the encoder-decoder regime, TransEvolve delivers performance comparable to the original Transformer, while in encoder-only tasks it consistently outperforms the Transformer along with several subsequent variants.
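The ODE and multi-particle reading that the abstract builds on can be sketched as follows. This is a generic illustration of that analogy, not TransEvolve itself: each token is treated as a particle, self-attention as a pairwise interaction, and a residual layer as one explicit Euler step of size h, split into an interaction sub-step and a per-particle sub-step. The identity query/key/value projections, the single head, the tanh per-particle term standing in for the feed-forward block, and all function names are simplifying assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interaction(particles: List[Vector]) -> List[Vector]:
    """Pairwise 'force' on each particle: single-head scaled dot-product
    attention with identity projections (an assumption, not the paper's model).
    Every particle attends to every other one, so the cost is O(n^2 * d)."""
    d = len(particles[0])
    out = []
    for xi in particles:
        weights = softmax([dot(xi, xj) / math.sqrt(d) for xj in particles])
        out.append([sum(w * xj[k] for w, xj in zip(weights, particles))
                    for k in range(d)])
    return out


def convection(x: Vector) -> Vector:
    """Hypothetical per-particle term standing in for the point-wise FFN."""
    return [math.tanh(v) for v in x]


def euler_layer(particles: List[Vector], h: float = 1.0) -> List[Vector]:
    """One residual layer read as an explicit Euler step of size h:
    first the interaction sub-step, then the per-particle sub-step."""
    attn = interaction(particles)
    mid = [[x + h * a for x, a in zip(xi, ai)] for xi, ai in zip(particles, attn)]
    return [[y + h * c for y, c in zip(yi, convection(yi))] for yi in mid]


def evolve(particles: List[Vector], num_layers: int, h: float = 1.0) -> List[Vector]:
    """Stacking num_layers layers corresponds to num_layers integration steps."""
    for _ in range(num_layers):
        particles = euler_layer(particles, h)
    return particles


tokens = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
print(evolve(tokens, num_layers=3, h=0.1))
```

With h = 1 each step matches the familiar residual form x + F(x); the quadratic term in `interaction` is the cost that, per the abstract, TransEvolve aims to avoid recomputing at every stacked layer.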

Comments:	NeurIPS 2021 (spotlight)
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2109.15142 [cs.LG]
	(or arXiv:2109.15142v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.15142

Submission history

From: Subhabrata Dutta
[v1] Thu, 30 Sep 2021 14:01:06 UTC (139 KB)
[v2] Sun, 3 Oct 2021 07:21:07 UTC (139 KB)
[v3] Wed, 27 Oct 2021 07:33:47 UTC (140 KB)