OPTIMIZE WAV2VEC2'S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODEL'S ATTENTION PATTERN

Liu Chen, Meysam Asgari, Hiroko H. Dodge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transformer-based automatic speech recognition (ASR) systems have proven successful when large training datasets are available. In medical research, however, we must build ASR systems for atypical populations, e.g., pre-school children with speech disorders, using small training datasets. To increase training efficiency on small datasets, we optimize the architecture of Wav2Vec 2.0, a Transformer variant, by analyzing its pre-trained model's block-level attention patterns. We show that block-level patterns can serve as an indicator for narrowing down the optimization direction. To ensure the reproducibility of our experiments, we use Librispeech-100-clean as training data to simulate the limited-data condition. We apply two techniques, a local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations. Our optimized architecture outperforms the vanilla architecture by about 1.8% absolute word error rate (WER) on dev-clean and 1.4% on test-clean.
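The abstract names three technical ingredients: analyzing a pre-trained model's block-level attention patterns, a local attention mechanism, and cross-block parameter sharing. The two PyTorch sketches below illustrate these ideas; the checkpoint name, window size, block count, and per-block summary statistic are illustrative assumptions, not the configurations used in the paper.

First, a minimal sketch of inspecting block-level attention patterns, assuming the HuggingFace "facebook/wav2vec2-base" checkpoint and mean attention distance as the per-block summary:

import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

wave = torch.randn(1, 16000)  # 1 s of 16 kHz audio (random stand-in)
with torch.no_grad():
    out = model(wave, output_attentions=True)

# out.attentions holds one (batch, heads, frames, frames) tensor per block.
for b, att in enumerate(out.attentions):
    frames = att.size(-1)
    pos = torch.arange(frames, dtype=torch.float)
    dist = (pos[None, :] - pos[:, None]).abs()      # |query - key| in frames
    mean_dist = (att * dist).sum(-1).mean().item()  # expected attention span
    print(f"block {b:2d}: mean attention distance = {mean_dist:.1f} frames")

Second, a minimal sketch of the two optimization techniques: a banded (local) self-attention mask plus a single encoder layer whose weights are reused across all blocks:

import torch
import torch.nn as nn

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask letting each frame attend only within +/- `window`
    positions (True = blocked)."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window

class SharedLocalEncoder(nn.Module):
    """Encoder that reuses ONE layer's weights for every block
    (cross-block parameter sharing) and applies the local mask."""
    def __init__(self, d_model=768, nhead=8, num_blocks=12, window=16):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_blocks = num_blocks
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = local_attention_mask(x.size(1), self.window).to(x.device)
        for _ in range(self.num_blocks):  # same parameters every block
            x = self.layer(x, src_mask=mask)
        return x

# Usage: a batch of 2 utterances, 200 frames of 768-dim features.
out = SharedLocalEncoder()(torch.randn(2, 200, 768))
print(out.shape)  # torch.Size([2, 200, 768])

Parameter sharing shrinks the encoder's parameter count by roughly the number of blocks, and the local mask restricts each frame's receptive field per block; both act as capacity constraints that can help when training data is scarce.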

Original language: English (US)
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7112-7116
Number of pages: 5
ISBN (Electronic): 9781665405409
DOIs
State: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: May 23, 2022 - May 27, 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 5/23/22 - 5/27/22

Keywords

  • Transformer
  • architecture optimization
  • attention pattern
  • automatic speech recognition
  • self-supervised learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
