Abstract
Building a high quality automatic speech recognition (ASR) system with limited training data has been a challenging task particularly for a narrow target population. Open-sourced ASR systems, trained on sufficient data from adults, are susceptible on seniors’ speech due to acoustic mismatch between adults and seniors. With 12 hours of training data, we attempt to develop an ASR system for socially isolated seniors (80+ years old) with possible cognitive impairments. We experimentally identify that ASR for the adult population performs poorly on our target population and transfer learning (TL) can boost the system’s performance. Standing on the fundamental idea of TL, tuning model parameters, we further improve the system by leveraging an attention mechanism to utilize the model’s intermediate information. Our approach achieves 1.58% absolute improvements over the TL model.
Original language | English (US) |
---|---|
Pages (from-to) | 7003-7007 |
Number of pages | 5 |
Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Volume | 2021-June |
DOIs | |
State | Published - 2021 |
Event | 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada Duration: Jun 6 2021 → Jun 11 2021 |
Keywords
- Attention mechanism
- Automatic speech recognition
- Senior population
- Small training data
- Transfer learning
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering