Evaluating models of vowel perception

Michelle R. Molis

doi:10.1121/1.1943907

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

There is a long-standing debate concerning the efficacy of formant-based versus whole spectrum models of vowel perception. Categorization data for a set of synthetic steady-state vowels were used to evaluate both types of models. The models tested included various combinations of formant frequencies and amplitudes, principal components derived from excitation patterns, and perceptually scaled LPC cepstral coefficients. The stimuli were 54 five-formant synthesized vowels that had a common F1 frequency and varied orthogonally in F2 and F3 frequency. Twelve speakers of American English categorized the stimuli as the vowels /ɪ/, /ʊ/, or /ɝ/. Results indicate that formant frequencies provided the best account of the data only if nonlinear terms, in the form of squares and cross products of the formant values, were also included in the analysis. The excitation pattern principal components also produced reasonably accurate fits to the data. Although a wish to use the lowest-dimensional representation would dictate that formant frequencies are the most appropriate vowel description, the relative success of richer, more flexible, and more neurophysiologically plausible whole spectrum representations suggests that they may be preferred for understanding human vowel perception.
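
As a rough illustration of the "squares and cross products of the formant values" mentioned in the abstract, the sketch below expands a pair of formant values into linear, squared, and cross-product predictors. It is not the author's analysis code: the function names and the example frequencies are hypothetical, and the abstract does not say how the formant values were scaled or which statistical model was fitted to them.

```python
# Minimal sketch (assumption): quadratic expansion of two formant predictors.
# The names `quadratic_terms` and `design_matrix` are illustrative only.

def quadratic_terms(f2, f3):
    """Return [F2, F3, F2^2, F3^2, F2*F3] for a single stimulus."""
    return [f2, f3, f2 * f2, f3 * f3, f2 * f3]


def design_matrix(stimuli):
    """Expand a list of (F2, F3) pairs into rows of quadratic predictors."""
    return [quadratic_terms(f2, f3) for f2, f3 in stimuli]


if __name__ == "__main__":
    # Hypothetical (F2, F3) values in kHz, not taken from the paper.
    example_stimuli = [(1.0, 2.5), (1.5, 2.5), (1.5, 3.0)]
    for row in design_matrix(example_stimuli):
        print(row)
```

Each row could then serve as the predictor vector for whatever categorization model is being compared; with only the first two columns the model is linear in F2 and F3, while the remaining three columns allow curved category boundaries.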

Original language	English (US)
Pages (from-to)	1062-1071
Number of pages	10
Journal	Journal of the Acoustical Society of America
Volume	118
Issue number	2
DOIs	https://doi.org/10.1121/1.1943907
State	Published - Aug 2005
Externally published	Yes

ASJC Scopus subject areas

Arts and Humanities (miscellaneous)
Acoustics and Ultrasonics

Cite this

@article{a5044cba28184ea99d7261f5c3c93670,

title = "Evaluating models of vowel perception",

abstract = "There is a long-standing debate concerning the efficacy of formant-based versus whole spectrum models of vowel perception. Categorization data for a set of synthetic steady-state vowels were used to evaluate both types of models. The models tested included various combinations of formant frequencies and amplitudes, principal components derived from excitation patterns, and perceptually scaled LPC cepstral coefficients. The stimuli were 54 five-formant synthesized vowels that had a common F1 frequency and varied orthogonally in F2 and F3 frequency. Twelve speakers of American English categorized the stimuli as the vowels /ɪ/, /ʊ/, or /ɝ/. Results indicate that formant frequencies provided the best account of the data only if nonlinear terms, in the form of squares and cross products of the formant values, were also included in the analysis. The excitation pattern principal components also produced reasonably accurate fits to the data. Although a wish to use the lowest-dimensional representation would dictate that formant frequencies are the most appropriate vowel description, the relative success of richer, more flexible, and more neurophysiologically plausible whole spectrum representations suggests that they may be preferred for understanding human vowel perception.",

author = "Molis, {Michelle R.}",

note = "Funding Information: This research was conducted as part of a doctoral dissertation at the University of Texas at Austin. The work was supported by NIH (R01 DC00427-13,-14). The author wishes to thank Randy L. Diehl, Marjorie R. Leek, James M. Hillenbrand, and one unnamed reviewer for comments on an earlier draft. The opinions or assertions contained herein are the private views of the author and are not to be construed as official or as reflecting the views of the Department of the Army or the Department of Defense.",

year = "2005",

month = aug,

doi = "10.1121/1.1943907",

language = "English (US)",

volume = "118",

pages = "1062--1071",

journal = "Journal of the Acoustical Society of America",

issn = "0001-4966",

publisher = "Acoustical Society of America",

number = "2",

}

TY - JOUR

T1 - Evaluating models of vowel perception

AU - Molis, Michelle R.

N1 - Funding Information: This research was conducted as part of a doctoral dissertation at the University of Texas at Austin. The work was supported by NIH (R01 DC00427-13,-14). The author wishes to thank Randy L. Diehl, Marjorie R. Leek, James M. Hillenbrand, and one unnamed reviewer for comments on an earlier draft. The opinions or assertions contained herein are the private views of the author and are not to be construed as official or as reflecting the views of the Department of the Army or the Department of Defense.

PY - 2005/8

Y1 - 2005/8

AB - There is a long-standing debate concerning the efficacy of formant-based versus whole spectrum models of vowel perception. Categorization data for a set of synthetic steady-state vowels were used to evaluate both types of models. The models tested included various combinations of formant frequencies and amplitudes, principal components derived from excitation patterns, and perceptually scaled LPC cepstral coefficients. The stimuli were 54 five-formant synthesized vowels that had a common F1 frequency and varied orthogonally in F2 and F3 frequency. Twelve speakers of American English categorized the stimuli as the vowels /ɪ/, /ʊ/, or /ɝ/. Results indicate that formant frequencies provided the best account of the data only if nonlinear terms, in the form of squares and cross products of the formant values, were also included in the analysis. The excitation pattern principal components also produced reasonably accurate fits to the data. Although a wish to use the lowest-dimensional representation would dictate that formant frequencies are the most appropriate vowel description, the relative success of richer, more flexible, and more neurophysiologically plausible whole spectrum representations suggests that they may be preferred for understanding human vowel perception.

UR - http://www.scopus.com/inward/record.url?scp=23744457105&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=23744457105&partnerID=8YFLogxK

U2 - 10.1121/1.1943907

DO - 10.1121/1.1943907

M3 - Article

C2 - 16158661

AN - SCOPUS:23744457105

SN - 0001-4966

VL - 118

SP - 1062

EP - 1071

JO - Journal of the Acoustical Society of America

JF - Journal of the Acoustical Society of America

IS - 2

ER -
