Stability and accuracy in Incremental Speech Recognition

Ethan O. Selfridge; Iker Arizmendi; Peter A. Heeman; Jason D. Williams

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

35 Scopus citations

Abstract

Conventional speech recognition approaches usually wait until the user has finished talking before returning a recognition hypothesis. This results in spoken dialogue systems that are unable to react while the user is still speaking. Incremental Speech Recognition (ISR), where partial phrase results are returned during user speech, has been used to create more reactive systems. However, ISR output is unstable and so prone to revision as more speech is decoded. This paper tackles the problem of stability in ISR. We first present a method that increases the stability and accuracy of ISR output, without adding delay. Given that some revisions are unavoidable, we next present a pair of methods for predicting the stability and accuracy of ISR results. Taken together, we believe these approaches give ISR more utility for real spoken dialogue systems.

Original language	English (US)
Title of host publication	Proceedings of the SIGDIAL 2011 Conference
Subtitle of host publication	12th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Pages	110-119
Number of pages	10
State	Published - 2011
Event	12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2011 - Portland, OR, United States
Duration	Jun 17 2011 → Jun 18 2011

Publication series

Name	Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Other

Other	12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2011
Country/Territory	United States
City	Portland, OR
Period	6/17/11 → 6/18/11

ASJC Scopus subject areas

Computer Graphics and Computer-Aided Design
Computer Vision and Pattern Recognition
Human-Computer Interaction
Modeling and Simulation

Cite this

Selfridge, E. O., Arizmendi, I., Heeman, P. A., & Williams, J. D. (2011). Stability and accuracy in Incremental Speech Recognition. In Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 110-119). (Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue).

Stability and accuracy in Incremental Speech Recognition. / Selfridge, Ethan O.; Arizmendi, Iker; Heeman, Peter A. et al.
Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2011. p. 110-119 (Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue).

Selfridge, EO, Arizmendi, I, Heeman, PA & Williams, JD 2011, Stability and accuracy in Incremental Speech Recognition. in Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 110-119, 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2011, Portland, OR, United States, 6/17/11.

Selfridge EO, Arizmendi I, Heeman PA, Williams JD. Stability and accuracy in Incremental Speech Recognition. In Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2011. p. 110-119. (Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue).

Selfridge, Ethan O. ; Arizmendi, Iker ; Heeman, Peter A. et al. / Stability and accuracy in Incremental Speech Recognition. Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2011. pp. 110-119 (Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue).

@inproceedings{e843980d716e49a2bbf59560656f5611,
  title = "Stability and accuracy in Incremental Speech Recognition",
  abstract = "Conventional speech recognition approaches usually wait until the user has finished talking before returning a recognition hypothesis. This results in spoken dialogue systems that are unable to react while the user is still speaking. Incremental Speech Recognition (ISR), where partial phrase results are returned during user speech, has been used to create more reactive systems. However, ISR output is unstable and so prone to revision as more speech is decoded. This paper tackles the problem of stability in ISR. We first present a method that increases the stability and accuracy of ISR output, without adding delay. Given that some revisions are unavoidable, we next present a pair of methods for predicting the stability and accuracy of ISR results. Taken together, we believe these approaches give ISR more utility for real spoken dialogue systems.",
  author = "Selfridge, {Ethan O.} and Iker Arizmendi and Heeman, {Peter A.} and Williams, {Jason D.}",
  year = "2011",
  language = "English (US)",
  isbn = "9781937284107",
  series = "Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
  pages = "110--119",
  booktitle = "Proceedings of the SIGDIAL 2011 Conference",
  note = "12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2011 ; Conference date: 17-06-2011 Through 18-06-2011",
}

TY - GEN

T1 - Stability and accuracy in Incremental Speech Recognition

AU - Selfridge, Ethan O.

AU - Arizmendi, Iker

AU - Heeman, Peter A.

AU - Williams, Jason D.

PY - 2011

Y1 - 2011

N2 - Conventional speech recognition approaches usually wait until the user has finished talking before returning a recognition hypothesis. This results in spoken dialogue systems that are unable to react while the user is still speaking. Incremental Speech Recognition (ISR), where partial phrase results are returned during user speech, has been used to create more reactive systems. However, ISR output is unstable and so prone to revision as more speech is decoded. This paper tackles the problem of stability in ISR. We first present a method that increases the stability and accuracy of ISR output, without adding delay. Given that some revisions are unavoidable, we next present a pair of methods for predicting the stability and accuracy of ISR results. Taken together, we believe these approaches give ISR more utility for real spoken dialogue systems.

AB - Conventional speech recognition approaches usually wait until the user has finished talking before returning a recognition hypothesis. This results in spoken dialogue systems that are unable to react while the user is still speaking. Incremental Speech Recognition (ISR), where partial phrase results are returned during user speech, has been used to create more reactive systems. However, ISR output is unstable and so prone to revision as more speech is decoded. This paper tackles the problem of stability in ISR. We first present a method that increases the stability and accuracy of ISR output, without adding delay. Given that some revisions are unavoidable, we next present a pair of methods for predicting the stability and accuracy of ISR results. Taken together, we believe these approaches give ISR more utility for real spoken dialogue systems.

UR - http://www.scopus.com/inward/record.url?scp=84863229187&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863229187&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84863229187

SN - 9781937284107

T3 - Proceedings of the SIGDIAL 2011 Conference: 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue

SP - 110

EP - 119

BT - Proceedings of the SIGDIAL 2011 Conference

T2 - 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2011

Y2 - 17 June 2011 through 18 June 2011

ER -
