A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis

Lin, Liwei; Kong, Qiuqiang; Jiang, Junyan; Xia, Gus


arXiv:2108.03456 (cs)

[Submitted on 7 Aug 2021]


Abstract: We propose a unified model for three inter-related tasks: 1) to separate individual sound sources from a mixed music audio signal, 2) to transcribe each sound source to MIDI notes, and 3) to synthesize new pieces based on the timbre of the separated sources. The model is inspired by the fact that when humans listen to music, our minds can not only separate the sounds of different instruments but also simultaneously perceive high-level representations such as score and timbre. To mirror this capability computationally, we designed a pitch-timbre disentanglement module based on a popular encoder-decoder neural architecture for source separation. The key inductive biases are vector quantization for the pitch representation and pitch-transformation invariance for the timbre representation. In addition, we adopted a query-by-example method to achieve zero-shot learning, i.e., the model can perform source separation, transcription, and synthesis for unseen instruments. The current design focuses on audio mixtures of two monophonic instruments. Experimental results show that our model outperforms existing multi-task baselines, and that the transcribed score serves as a powerful auxiliary for separation tasks.
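The abstract names vector quantization as the inductive bias for the pitch representation but does not describe how it is implemented. As a minimal sketch of the general technique only (not the authors' module), the snippet below snaps each continuous feature vector to its nearest entry in a small codebook, so that every frame is assigned a discrete index. The function name `quantize`, the 2-D toy codebook, and the frame values are all hypothetical and chosen purely for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook: each entry stands in for one discrete pitch symbol.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical continuous per-frame features (e.g. outputs of some encoder).
frames = [(0.1, -0.2), (0.9, 0.2), (0.2, 0.8), (0.7, 0.9)]

# Each frame is replaced by the index of its nearest codeword.
indices = [quantize(frame, codebook)[0] for frame in frames]
print(indices)  # prints [0, 1, 2, 3]
```

In a learned model the codebook would be trained rather than fixed, and the discrete indices could then be read as note-like symbols; how the paper maps them to MIDI notes is not stated in the abstract.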

Comments: Accepted by ISMIR 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2108.03456 [cs.SD] (or arXiv:2108.03456v1 [cs.SD] for this version)
DOI: https://doi.org/10.48550/arXiv.2108.03456

Submission history

From: Liwei Lin
[v1] Sat, 7 Aug 2021 14:28:21 UTC (4,146 KB)
