Robustness to telephone handset distortion in speaker recognition by discriminative feature design

Larry P. Heck; Yochai Konig; M. Kemal Sönmez; Mitch Weintraub

doi:10.1016/S0167-6393(99)00077-1

Research output: Contribution to journal › Article › peer-review

60 Scopus citations

Abstract

A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone handset mismatch between training and testing. The algorithm requires neither stereo recordings of speech during training nor manual labeling of handset types either in training or testing. Results on the 1998 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpus show relative improvements as high as 28% for the new multilayered perceptron (MLP)-based features as compared to a standard mel-cepstral feature set with cepstral mean subtraction (CMS) and handset-dependent normalizing impostor models.
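The feature transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not give layer sizes, activation functions, or which layer's outputs are used as features, so everything below is an assumption: a single hidden `tanh` layer whose activations stand in for the transformed features, and a softmax output over a hypothetical set of training speakers. Discriminative training (for example, minimizing cross-entropy against speaker labels on handset-mismatched data) is not shown; the weights here are random.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(weights, biases, x):
    """Compute W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_features(frame, hidden, output):
    """Return (hidden-layer features, speaker posteriors) for one frame."""
    h = [math.tanh(v) for v in affine(*hidden, frame)]
    posteriors = softmax(affine(*output, h))
    return h, posteriors

# Assumed sizes for illustration only; none of these come from the paper.
N_INPUT, N_HIDDEN, N_SPEAKERS = 17, 8, 4

rng = random.Random(0)
hidden = init_layer(N_INPUT, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_SPEAKERS, rng)

# Stand-in for one frame of input features (e.g. mel-cepstra).
frame = [rng.gauss(0.0, 1.0) for _ in range(N_INPUT)]
features, posteriors = mlp_features(frame, hidden, output)
```

In this sketch, `features` would be passed to a downstream speaker model in place of the raw input frame, while `posteriors` would only be used to compute the training loss.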

Original language	English (US)
Pages (from-to)	181-192
Number of pages	12
Journal	Speech Communication
Volume	31
Issue number	2
DOIs	https://doi.org/10.1016/S0167-6393(99)00077-1
State	Published - Jun 2000
Externally published	Yes

ASJC Scopus subject areas

Software
Modeling and Simulation
Communication
Language and Linguistics
Linguistics and Language
Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

@article{17dfcdd1c16b4675a06ea10b553f1228,

title = "Robustness to telephone handset distortion in speaker recognition by discriminative feature design",

abstract = "A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone handset mismatch between training and testing. The algorithm requires neither stereo recordings of speech during training nor manual labeling of handset types either in training or testing. Results on the 1998 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpus show relative improvements as high as 28% for the new multilayered perceptron (MLP)-based features as compared to a standard mel-cepstral feature set with cepstral mean subtraction (CMS) and handset-dependent normalizing impostor models.",

author = "Heck, {Larry P.} and Konig, Yochai and S{\"o}nmez, {M. Kemal} and Weintraub, Mitch",

year = "2000",

month = jun,

doi = "10.1016/S0167-6393(99)00077-1",

language = "English (US)",

volume = "31",

pages = "181--192",

journal = "Speech Communication",

issn = "0167-6393",

publisher = "Elsevier",

number = "2",

}

TY - JOUR

T1 - Robustness to telephone handset distortion in speaker recognition by discriminative feature design

AU - Heck, Larry P.

AU - Konig, Yochai

AU - Sönmez, M. Kemal

AU - Weintraub, Mitch

PY - 2000/6

Y1 - 2000/6

N2 - A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone handset mismatch between training and testing. The algorithm requires neither stereo recordings of speech during training nor manual labeling of handset types either in training or testing. Results on the 1998 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpus show relative improvements as high as 28% for the new multilayered perceptron (MLP)-based features as compared to a standard mel-cepstral feature set with cepstral mean subtraction (CMS) and handset-dependent normalizing impostor models.

UR - http://www.scopus.com/inward/record.url?scp=0033746018&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033746018&partnerID=8YFLogxK

U2 - 10.1016/S0167-6393(99)00077-1

DO - 10.1016/S0167-6393(99)00077-1

M3 - Article

AN - SCOPUS:0033746018

SN - 0167-6393

VL - 31

SP - 181

EP - 192

JO - Speech Communication

JF - Speech Communication

IS - 2

ER -
