Audiomer: A Convolutional Transformer For Keyword Spotting

Sahu, Surya Kant; Mitheran, Sai; Kamdar, Juhi; Gandhi, Meet

arXiv:2109.10252 (cs)

This paper has been withdrawn by Surya Kant Sahu

[Submitted on 21 Sep 2021 (v1), last revised 1 Feb 2022 (this version, v4)]

Abstract: Transformers have seen an unprecedented rise in Natural Language Processing and Computer Vision tasks. However, in audio tasks, they are either infeasible to train due to the extremely large sequence length of audio waveforms or incur a performance penalty when trained on Fourier-based features. In this work, we introduce an architecture, Audiomer, where we combine 1D Residual Networks with Performer Attention to achieve state-of-the-art performance in keyword spotting with raw audio waveforms, outperforming all previous methods while being computationally cheaper and parameter-efficient. Additionally, our model has practical advantages for speech processing, such as inference on arbitrarily long audio clips owing to the absence of positional encoding. The code is available at this https URL.

Comments:	The results and claims made are incorrect due to data leakage and an erroneous split of datasets
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2109.10252 [cs.LG]
	(or arXiv:2109.10252v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.10252

Submission history

From: Surya Kant Sahu
[v1] Tue, 21 Sep 2021 15:28:41 UTC (198 KB)
[v2] Tue, 7 Dec 2021 00:17:07 UTC (250 KB)
[v3] Fri, 7 Jan 2022 06:11:48 UTC (253 KB)
[v4] Tue, 1 Feb 2022 09:32:15 UTC (1 KB) (withdrawn)
