The JHU submission to VoxSRC-21: Track 3

Cho, Jaejin; Villalba, Jesus; Dehak, Najim

arXiv:2109.13425 (eess)

[Submitted on 28 Sep 2021]

Abstract: This technical report describes the Johns Hopkins University speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21), Track 3: Self-supervised speaker verification (closed). Our overall training process is similar to the one proposed by the first-place team in last year's VoxSRC2020 challenge. The main difference is that our initial model is trained with distillation with no labels (DINO), a non-contrastive self-supervised method recently proposed in computer vision (CV), which outperformed last year's contrastive learning approach based on momentum contrast (MoCo). With this initialization, only a few iterations are needed in the iterative clustering stage, where pseudo labels for supervised embedding learning are updated based on clusters of the embeddings generated by a model that is continually fine-tuned over the iterations. In the final stage, a Res2Net50 is trained on the final pseudo labels from the iterative clustering stage. This is our best model submitted to the challenge, achieving equal error rates (EER) of 1.89%, 6.50%, and 6.89% on the VoxCeleb1 test O, VoxSRC-21 validation, and VoxSRC-21 test trials, respectively.
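The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure can be obtained from verification trial scores, the following pure-Python function sweeps a decision threshold over the observed scores. It is an illustration only, not the challenge's official scoring tool or the authors' code; the function name `compute_eer` and the toy scores and labels are hypothetical.

```python
def compute_eer(scores, labels):
    """Return the equal error rate (as a fraction) for verification trials.

    scores: similarity score per trial (higher means "same speaker").
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    Hypothetical helper for illustration; not the official VoxSRC scorer.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for thr in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        far = false_accepts / n_nontarget  # false acceptance rate
        frr = false_rejects / n_target     # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy trials (made-up values): four target and four non-target scores.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
toy_eer = compute_eer(toy_scores, toy_labels)
print(f"EER = {100 * toy_eer:.2f}%")
```

On the toy trials, a threshold of 0.6 accepts one of the four non-target trials and rejects one of the four target trials, so both error rates are 25%. The sweep is quadratic in the number of trials, which is fine for a sketch; real trial lists would normally be scored with a sorted, single-pass computation.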

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2109.13425 [eess.AS]
	(or arXiv:2109.13425v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2109.13425

Submission history

From: Jaejin Cho
[v1] Tue, 28 Sep 2021 01:30:10 UTC (26 KB)