OPTIMIZE WAV2VEC2'S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODEL'S ATTENTION PATTERN

Liu Chen, Meysam Asgari, Hiroko H. Dodge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transformer-based automatic speech recognition (ASR) systems have proven successful when large training datasets are available. In medical research, however, we must build ASR systems for atypical populations, e.g., pre-school children with speech disorders, using small training datasets. To increase training efficiency on small datasets, we optimize the architecture of Wav2Vec 2.0, a Transformer variant, by analyzing its pre-trained model's block-level attention patterns. We show that block-level patterns can serve as an indicator for narrowing down the optimization direction. To ensure the reproducibility of our experiments, we use Librispeech-100-clean as training data to simulate the limited-data condition. We apply two techniques, a local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations. Our optimized architecture outperforms the vanilla architecture by about 1.8% absolute word error rate (WER) on dev-clean and 1.4% on test-clean.
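The abstract names three technical ingredients: analyzing a pre-trained model's block-level attention patterns, a local attention mechanism, and cross-block parameter sharing. The two PyTorch sketches below illustrate these ideas; the checkpoint name, window size, block count, and per-block summary statistic are illustrative assumptions, not the configurations used in the paper.

First, a minimal sketch of inspecting block-level attention patterns, assuming the HuggingFace "facebook/wav2vec2-base" checkpoint and mean attention distance as the per-block summary:

import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

wave = torch.randn(1, 16000)  # 1 s of 16 kHz audio (random stand-in)
with torch.no_grad():
    out = model(wave, output_attentions=True)

# out.attentions holds one (batch, heads, frames, frames) tensor per block.
for b, att in enumerate(out.attentions):
    frames = att.size(-1)
    pos = torch.arange(frames, dtype=torch.float)
    dist = (pos[None, :] - pos[:, None]).abs()      # |query - key| in frames
    mean_dist = (att * dist).sum(-1).mean().item()  # expected attention span
    print(f"block {b:2d}: mean attention distance = {mean_dist:.1f} frames")

Second, a minimal sketch of the two optimization techniques: a banded (local) self-attention mask plus a single encoder layer whose weights are reused across all blocks:

import torch
import torch.nn as nn

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask letting each frame attend only within +/- `window`
    positions (True = blocked)."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window

class SharedLocalEncoder(nn.Module):
    """Encoder that reuses ONE layer's weights for every block
    (cross-block parameter sharing) and applies the local mask."""
    def __init__(self, d_model=768, nhead=8, num_blocks=12, window=16):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_blocks = num_blocks
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = local_attention_mask(x.size(1), self.window).to(x.device)
        for _ in range(self.num_blocks):  # same parameters every block
            x = self.layer(x, src_mask=mask)
        return x

# Usage: a batch of 2 utterances, 200 frames of 768-dim features.
out = SharedLocalEncoder()(torch.randn(2, 200, 768))
print(out.shape)  # torch.Size([2, 200, 768])

Parameter sharing shrinks the encoder's parameter count by roughly the number of blocks, and the local mask restricts each frame's receptive field per block; both act as capacity constraints that can help when training data is scarce.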

Original language: English (US)
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7112-7116
Number of pages: 5
ISBN (Electronic): 9781665405409
DOIs
State: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: May 23, 2022 - May 27, 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 5/23/22 - 5/27/22

Keywords

  • Transformer
  • architecture optimization
  • attention pattern
  • automatic speech recognition
  • self-supervised learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
