Improving ASR systems for children with autism and language impairment using domain-focused DNN transfer techniques

Robert Gale; Liu Chen; Jill Dolata; Jan Van Santen; Meysam Asgari

doi:10.21437/Interspeech.2019-3161

Institute on Development and Disability

Research output: Contribution to journal › Conference article › peer-review

22 Scopus citations

Abstract

This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years and diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data in which children perform the Clinical Evaluation of Language Fundamentals Recalling Sentences task, we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. To begin, we aim to find the best proportional training rates of the DNN layers. Our best configuration yields a 29.38% word error rate (WER). Using this configuration, we explore the effects of quantity and similarity of data augmentation in transfer learning. We augment our training with portions of the OGI Kids' Corpus, adding 4.6 hours of typically developing speakers aged kindergarten through 3rd grade. We find that 2nd grade data alone - approximately the mean age of the target data - outperforms other grades and all the sets combined. Doubling the data for 1st, 2nd, and 3rd grade, we again compare each grade as well as pairs of grades. We find the combination of 1st and 2nd grade performs best at a 26.21% WER.
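The abstract reports its results as word error rates. As a reading aid, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), together with the relative reduction between the two reported figures. The function names and the example sentences are hypothetical and not taken from the paper, and the relative reduction is simple arithmetic on the two reported numbers, not a result stated by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; the paper's own scoring tool is not
    described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up sentences: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # The two WER figures reported in the abstract, in percent.
    print(relative_reduction(29.38, 26.21))
```

On the two reported figures, the move from 29.38% to 26.21% WER works out to a relative reduction of roughly 10.8%.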

Original language	English (US)
Pages (from-to)	11-15
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2019-September
DOIs	https://doi.org/10.21437/Interspeech.2019-3161
State	Published - 2019
Event	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration	Sep 15 2019 → Sep 19 2019

Keywords

Autism spectrum disorder
Children speech recognition
Deep neural network
Language impairment
Speech recognition
Transfer learning

ASJC Scopus subject areas

Language and Linguistics
Human-Computer Interaction
Signal Processing
Software
Modeling and Simulation

Access to Document

10.21437/Interspeech.2019-3161

Cite this

Improving ASR systems for children with autism and language impairment using domain-focused DNN transfer techniques. / Gale, Robert; Chen, Liu; Dolata, Jill et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2019-September, 2019, p. 11-15.

@article{2cde72a7b1ec4d1a9cc425a3f9cddb18,

title = "Improving ASR systems for children with autism and language impairment using domain-focused DNN transfer techniques",

abstract = "This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years and diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data in which children perform the Clinical Evaluation of Language Fundamentals Recalling Sentences task, we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. To begin, we aim to find the best proportional training rates of the DNN layers. Our best configuration yields a 29.38% word error rate (WER). Using this configuration, we explore the effects of quantity and similarity of data augmentation in transfer learning. We augment our training with portions of the OGI Kids' Corpus, adding 4.6 hours of typically developing speakers aged kindergarten through 3rd grade. We find that 2nd grade data alone - approximately the mean age of the target data - outperforms other grades and all the sets combined. Doubling the data for 1st, 2nd, and 3rd grade, we again compare each grade as well as pairs of grades. We find the combination of 1st and 2nd grade performs best at a 26.21% WER.",

keywords = "Autism spectrum disorder, Children speech recognition, Deep neural network, Language impairment, Speech recognition, Transfer learning",

author = "Gale, Robert and Chen, Liu and Dolata, Jill and {Van Santen}, Jan and Asgari, Meysam",

note = "Publisher Copyright: Copyright {\textcopyright} 2019 ISCA; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 ; Conference date: 15-09-2019 Through 19-09-2019",

year = "2019",

doi = "10.21437/Interspeech.2019-3161",

language = "English (US)",

volume = "2019-September",

pages = "11--15",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Improving ASR systems for children with autism and language impairment using domain-focused DNN transfer techniques

AU - Gale, Robert

AU - Chen, Liu

AU - Dolata, Jill

AU - Van Santen, Jan

AU - Asgari, Meysam

PY - 2019

Y1 - 2019

N2 - This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years and diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data in which children perform the Clinical Evaluation of Language Fundamentals Recalling Sentences task, we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. To begin, we aim to find the best proportional training rates of the DNN layers. Our best configuration yields a 29.38% word error rate (WER). Using this configuration, we explore the effects of quantity and similarity of data augmentation in transfer learning. We augment our training with portions of the OGI Kids' Corpus, adding 4.6 hours of typically developing speakers aged kindergarten through 3rd grade. We find that 2nd grade data alone - approximately the mean age of the target data - outperforms other grades and all the sets combined. Doubling the data for 1st, 2nd, and 3rd grade, we again compare each grade as well as pairs of grades. We find the combination of 1st and 2nd grade performs best at a 26.21% WER.

KW - Autism spectrum disorder

KW - Children speech recognition

KW - Deep neural network

KW - Language impairment

KW - Speech recognition

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85074712046&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074712046&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2019-3161

DO - 10.21437/Interspeech.2019-3161

M3 - Conference article

AN - SCOPUS:85074712046

SN - 2308-457X

VL - 2019-September

SP - 11

EP - 15

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

Y2 - 15 September 2019 through 19 September 2019

ER -
