Multirate ASR models for phone-class dependent N-best list rescoring

Venkata R. Gadde; Kemal Sönmez; Horacio Franco

doi:10.1109/ASRU.2005.1566513

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Speech comprises a variety of acoustical phenomena occurring at differing rates. Fixed-rate ASR systems in effect assume a constant temporal rate of information flow, because they accumulate uniform statistics in proportion to a sound's duration. The usual analysis window length of 25-30 milliseconds is a time-frequency resolution compromise: it aims to follow changes in the spectral trajectories reasonably quickly while still providing enough samples to estimate the harmonic structure. In this work, we describe a technique that improves a recognizer built on this compromise by augmenting it with information from multiple-rate spectral models that emphasize either better time resolution or better frequency resolution. The main idea is to use the hypotheses generated by a fixed-rate recognizer to determine the appropriate model rate for a segment of the speech waveform. This is realized by rescoring N-best lists with a phone-dependent, posterior-like score computed from acoustical models that use different temporal windows. We report results on the NIST Evaluation 2002 dataset and demonstrate that the rescoring method yields word error rate (WER) improvements over a baseline system.
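The abstract does not give the scoring details, so the following Python is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each N-best hypothesis carries a phone segmentation from the fixed-rate recognizer, that a hypothetical phone-class table picks a window rate per phone (short windows for stops, long windows for vowels), and that the "posterior-like" score can be approximated by normalizing the rate-specific log-likelihood of the hypothesized phone over the phone inventory. The names `PHONE_CLASS`, `RATE_FOR_CLASS`, `posterior_like`, `rescore`, the `weight` parameter and the toy likelihood function are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field

# HYPOTHETICAL phone inventory and phone-class -> window-rate table.
# Assumption for illustration only: short windows (better time resolution)
# for stops, long windows (better frequency resolution) for vowels.
PHONE_CLASS = {"p": "stop", "t": "stop", "k": "stop", "aa": "vowel", "iy": "vowel"}
RATE_FOR_CLASS = {"stop": "short", "vowel": "long"}


@dataclass
class Hypothesis:
    words: str
    base_score: float  # log score from the fixed-rate (baseline) recognizer
    segments: list = field(default_factory=list)  # [(phone, start_frame, end_frame), ...]


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def posterior_like(phone, seg, loglik, rate):
    """Approximate log P(phone | segment) by normalizing the rate-specific
    log-likelihood of the hypothesized phone over the whole phone inventory."""
    scores = {p: loglik(rate, p, seg) for p in PHONE_CLASS}
    return scores[phone] - log_sum_exp(list(scores.values()))


def rescore(nbest, loglik, weight=1.0):
    """Re-rank hypotheses by baseline score plus weighted multirate scores."""
    def total(h):
        extra = 0.0
        for phone, start, end in h.segments:
            rate = RATE_FOR_CLASS.get(PHONE_CLASS.get(phone))
            if rate is None:  # no multirate model for this phone class
                continue
            extra += posterior_like(phone, (start, end), loglik, rate)
        return h.base_score + weight * extra
    return sorted(nbest, key=total, reverse=True)


# Toy stand-in for rate-specific acoustic models: the "true" phone for a
# (rate, segment) pair scores 0.0, every other phone scores -5.0.
TRUTH = {("short", (0, 5)): "t", ("long", (5, 20)): "iy"}


def toy_loglik(rate, phone, seg):
    return 0.0 if TRUTH.get((rate, seg)) == phone else -5.0


nbest = [
    Hypothesis("key", -10.0, [("k", 0, 5), ("iy", 5, 20)]),  # baseline's first choice
    Hypothesis("tea", -11.0, [("t", 0, 5), ("iy", 5, 20)]),
]
print([h.words for h in rescore(nbest, toy_loglik)])  # ['tea', 'key']
```

In this toy setup the short-window model contradicts the baseline's "k", so the multirate term flips the ranking; with `weight=0.0` the baseline order is returned unchanged. In a real system the likelihoods would presumably come from acoustical models trained on features computed with the different window lengths, and the weight would be tuned on held-out data.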

Original language	English (US)
Title of host publication	Proceedings of ASRU 2005
Subtitle of host publication	2005 IEEE Automatic Speech Recognition and Understanding Workshop
Publisher	IEEE Computer Society
Pages	157-161
Number of pages	5
ISBN (Print)	0780394798, 9780780394797
DOIs	https://doi.org/10.1109/ASRU.2005.1566513
State	Published - 2005
Externally published	Yes
Event	ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop - Cancun, Mexico (Nov 27 2005 → Dec 1 2005)

Publication series

Name	Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Volume	2005

Other

Other	ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Country/Territory	Mexico
City	Cancun
Period	11/27/05 → 12/1/05

ASJC Scopus subject areas

General Engineering

Cite this

Gadde, V. R., Sönmez, K., & Franco, H. (2005). Multirate ASR models for phone-class dependent N-best list rescoring. In Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop (pp. 157-161). Article 1566513 (Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop; Vol. 2005). IEEE Computer Society. https://doi.org/10.1109/ASRU.2005.1566513

Multirate ASR models for phone-class dependent N-best list rescoring. / Gadde, Venkata R.; Sönmez, Kemal; Franco, Horacio.
Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. IEEE Computer Society, 2005. p. 157-161, 1566513 (Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop; Vol. 2005).

Gadde, VR, Sönmez, K & Franco, H 2005, Multirate ASR models for phone-class dependent N-best list rescoring. in Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop, 1566513, Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop, vol. 2005, IEEE Computer Society, pp. 157-161, ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop, Cancun, Mexico, 11/27/05. https://doi.org/10.1109/ASRU.2005.1566513

Gadde VR, Sönmez K, Franco H. Multirate ASR models for phone-class dependent N-best list rescoring. In Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. IEEE Computer Society. 2005. p. 157-161. 1566513. (Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop). doi: 10.1109/ASRU.2005.1566513

@inproceedings{9a9bec0dc5ff4d63bb053d1aa46b72ff,
  title = "Multirate ASR models for phone-class dependent N-best list rescoring",
  abstract = "Speech comprises a variety of acoustical phenomena occurring at differing rates. Fixed-rate ASR systems assume in effect a constant temporal rate of information flow via incorporating uniform statistics in proportion to a sound's duration. The usual tradeoff window length of 25-30 milliseconds represents a time-frequency resolution compromise, which aims to allow reasonable speed for following changes in the spectral trajectories and sufficient number of samples to estimate the harmonic structure. In this work, we describe a technique to augment a recognizer that uses this compromise with information from multiple-rate spectral models that emphasize either better time or better frequency resolution in order to improve performance. The main idea is to use the hypotheses generated by a fixed-rate recognizer to determine the appropriate model rate for a segment of the speech waveform. This is realized through a technique based on rescoring of N-best lists with acoustical models using different temporal windows by a phone-dependent posterior-like score. We report results on the NIST Evaluation 2002 dataset, and demonstrate that the rescoring method produces word error rate (WER) improvements in a baseline system.",
  author = "Gadde, {Venkata R.} and Kemal S{\"o}nmez and Horacio Franco",
  year = "2005",
  doi = "10.1109/ASRU.2005.1566513",
  language = "English (US)",
  isbn = "0780394798",
  series = "Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop",
  publisher = "IEEE Computer Society",
  pages = "157--161",
  booktitle = "Proceedings of ASRU 2005",
  note = "ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop ; Conference date: 27-11-2005 Through 01-12-2005",
}

TY  - GEN
T1  - Multirate ASR models for phone-class dependent N-best list rescoring
AU  - Gadde, Venkata R.
AU  - Sönmez, Kemal
AU  - Franco, Horacio
PY  - 2005
Y1  - 2005
N2  - Speech comprises a variety of acoustical phenomena occurring at differing rates. Fixed-rate ASR systems assume in effect a constant temporal rate of information flow via incorporating uniform statistics in proportion to a sound's duration. The usual tradeoff window length of 25-30 milliseconds represents a time-frequency resolution compromise, which aims to allow reasonable speed for following changes in the spectral trajectories and sufficient number of samples to estimate the harmonic structure. In this work, we describe a technique to augment a recognizer that uses this compromise with information from multiple-rate spectral models that emphasize either better time or better frequency resolution in order to improve performance. The main idea is to use the hypotheses generated by a fixed-rate recognizer to determine the appropriate model rate for a segment of the speech waveform. This is realized through a technique based on rescoring of N-best lists with acoustical models using different temporal windows by a phone-dependent posterior-like score. We report results on the NIST Evaluation 2002 dataset, and demonstrate that the rescoring method produces word error rate (WER) improvements in a baseline system.
AB  - Speech comprises a variety of acoustical phenomena occurring at differing rates. Fixed-rate ASR systems assume in effect a constant temporal rate of information flow via incorporating uniform statistics in proportion to a sound's duration. The usual tradeoff window length of 25-30 milliseconds represents a time-frequency resolution compromise, which aims to allow reasonable speed for following changes in the spectral trajectories and sufficient number of samples to estimate the harmonic structure. In this work, we describe a technique to augment a recognizer that uses this compromise with information from multiple-rate spectral models that emphasize either better time or better frequency resolution in order to improve performance. The main idea is to use the hypotheses generated by a fixed-rate recognizer to determine the appropriate model rate for a segment of the speech waveform. This is realized through a technique based on rescoring of N-best lists with acoustical models using different temporal windows by a phone-dependent posterior-like score. We report results on the NIST Evaluation 2002 dataset, and demonstrate that the rescoring method produces word error rate (WER) improvements in a baseline system.
UR  - http://www.scopus.com/inward/record.url?scp=33846247945&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33846247945&partnerID=8YFLogxK
U2  - 10.1109/ASRU.2005.1566513
DO  - 10.1109/ASRU.2005.1566513
M3  - Conference contribution
AN  - SCOPUS:33846247945
SN  - 0780394798
SN  - 9780780394797
T3  - Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
SP  - 157
EP  - 161
BT  - Proceedings of ASRU 2005
PB  - IEEE Computer Society
T2  - ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Y2  - 27 November 2005 through 1 December 2005
ER  - 