High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR

Banerjee, Sourav; Agarwal, Ayushi; Ghosh, Promila

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2412.00055 (eess)

[Submitted on 24 Nov 2024]

Abstract: Automatic Speech Recognition (ASR) systems in the clinical domain face significant challenges, notably the need to recognise specialised medical vocabulary accurately and to meet stringent precision requirements. We introduce United-MedASR, a novel architecture that addresses these challenges by integrating synthetic data generation, precision ASR fine-tuning, and semantic enhancement. United-MedASR constructs a specialised medical vocabulary by synthesising data from authoritative sources such as ICD-10 (International Classification of Diseases, 10th Revision), MIMS (Monthly Index of Medical Specialties), and FDA databases. This enriched vocabulary is used to fine-tune the Whisper ASR model for clinical use. To increase processing speed, we incorporate Faster Whisper for high-speed inference. In addition, a customised BART-based semantic enhancer handles intricate medical terminology, further improving accuracy. This layered approach establishes new benchmarks in ASR performance, achieving a Word Error Rate (WER) of 0.985% on LibriSpeech test-clean and 0.26% on Europarl-ASR EN Guest-test, with robust performance on TED-LIUM (0.29% WER) and FLEURS (0.336% WER). The architecture is adaptable and can be replicated in other domains, making it a versatile basis for domain-specific ASR systems.
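For reference, WER under its standard definition is WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the minimum-cost alignment between hypothesis and reference, and N is the number of reference words; a reported 0.985% therefore corresponds to roughly one word error per hundred reference words. The minimal Python sketch below computes it via word-level Levenshtein distance. The function name `word_error_rate`, the lowercase-and-split tokenisation, and the example sentence are illustrative assumptions; the exact text normalisation used for the reported results is not stated in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption (illustrative only): text is lowercased and split on
    whitespace; published benchmarks usually apply a fuller normaliser.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical clinical utterance: the drug name is split into two words.
    ref = "patient prescribed metformin five hundred milligrams twice daily"
    hyp = "patient prescribed met forming five hundred milligrams twice daily"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints: WER = 0.250
```

Here one mis-segmented drug name costs a substitution plus an insertion, i.e. 2 errors over 8 reference words, which illustrates why accurate recognition of domain vocabulary dominates WER in medical transcription.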

Comments:	15 pages
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2412.00055 [eess.AS]
	(or arXiv:2412.00055v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2412.00055

Submission history

From: Sourav Banerjee
[v1] Sun, 24 Nov 2024 17:02:48 UTC (1,326 KB)
