Including dynamic and phonetic information in voice conversion systems

Helenca Duxans; Antonio Bonafonte; Alexander Kain; Jan Van Santen

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations

Abstract

Voice Conversion (VC) systems modify a speaker's voice (source speaker) so that it is perceived as if another speaker (target speaker) had uttered it. Previously published VC approaches using Gaussian Mixture Models [1] perform the conversion on a frame-by-frame basis using only spectral information. In this paper, two new approaches are studied in order to extend GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model, so the transformation is carried out according to sequences of frames. Then, phonetic information is introduced in the training of the VC system. Objective and perceptual results compare the performance of the proposed systems.
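For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of frame-by-frame GMM-based conversion as it is commonly formulated in the general VC literature: each source frame is mapped by a posterior-weighted sum of per-component linear regressions. It is not taken from the paper and may differ from the formulation in [1]. The names `Component`, `convert_frame`, and `convert_utterance` are hypothetical, the covariance blocks are assumed diagonal for brevity, and the model parameters are toy values rather than trained ones.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """One Gaussian of a joint source/target GMM (diagonal blocks assumed)."""
    weight: float
    mean_x: List[float]   # source-speaker mean
    mean_y: List[float]   # target-speaker mean
    var_x: List[float]    # source variances (diagonal of Sigma_xx)
    cov_yx: List[float]   # diagonal of the cross-covariance Sigma_yx


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Sequence[float], gmm: Sequence[Component]) -> List[float]:
    """P(component i | x), normalized in the log domain for stability."""
    logs = [math.log(c.weight) + log_gauss_diag(x, c.mean_x, c.var_x)
            for c in gmm]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x: Sequence[float], gmm: Sequence[Component]) -> List[float]:
    """Map one source frame: posterior-weighted per-component regressions."""
    post = posteriors(x, gmm)
    y = [0.0] * len(x)
    for p, c in zip(post, gmm):
        for d in range(len(x)):
            slope = c.cov_yx[d] / c.var_x[d]
            y[d] += p * (c.mean_y[d] + slope * (x[d] - c.mean_x[d]))
    return y


def convert_utterance(frames: Sequence[Sequence[float]],
                      gmm: Sequence[Component]) -> List[List[float]]:
    """Frame-by-frame conversion: every frame is mapped independently."""
    return [convert_frame(x, gmm) for x in frames]


# Toy two-component model over 2-dimensional spectral features (made-up values).
toy_gmm = [
    Component(0.5, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5]),
    Component(0.5, [4.0, 4.0], [6.0, 6.0], [1.0, 1.0], [0.5, 0.5]),
]
converted = convert_utterance([[0.0, 0.0], [4.0, 4.0]], toy_gmm)
```

Because `convert_utterance` treats each frame in isolation, the output for a frame does not depend on its neighbours. That independence is the limitation the abstract points to when it proposes carrying out the transformation over sequences of frames.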

Original language	English (US)
Title of host publication	8th International Conference on Spoken Language Processing, ICSLP 2004
Publisher	International Speech Communication Association
Pages	1193-1196
Number of pages	4
State	Published - 2004
Event	8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration	Oct 4 2004 → Oct 8 2004

Other

Other	8th International Conference on Spoken Language Processing, ICSLP 2004
Country/Territory	Korea, Republic of
City	Jeju, Jeju Island
Period	10/4/04 → 10/8/04

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language

Cite this

Duxans, H, Bonafonte, A, Kain, A & Van Santen, J 2004, Including dynamic and phonetic information in voice conversion systems. in 8th International Conference on Spoken Language Processing, ICSLP 2004. International Speech Communication Association, pp. 1193-1196, 8th International Conference on Spoken Language Processing, ICSLP 2004, Jeju, Jeju Island, Korea, Republic of, 10/4/04.

@inproceedings{af8a9f07b71c46bbb09eaca4c4e0641b,
  title = "Including dynamic and phonetic information in voice conversion systems",
  abstract = "Voice Conversion (VC) systems modify a speaker voice (source speaker) to be perceived as if another speaker (target speaker) had uttered it. Previous published VC approaches using Gaussian Mixture Models [1] performs the conversion in a frame-by-frame basis using only spectral information. In this paper, two new approaches are studied in order to extend the GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model. So, the transformation is carried out according to sequences of frames. Then, phonetic information is introduced in the training of the VC system. Objective and perceptual results compare the performance of the proposed systems.",
  author = "Helenca Duxans and Antonio Bonafonte and Alexander Kain and {Van Santen}, Jan",
  year = "2004",
  language = "English (US)",
  pages = "1193--1196",
  booktitle = "8th International Conference on Spoken Language Processing, ICSLP 2004",
  publisher = "International Speech Communication Association",
  note = "8th International Conference on Spoken Language Processing, ICSLP 2004 ; Conference date: 04-10-2004 Through 08-10-2004",
}

TY - GEN
T1 - Including dynamic and phonetic information in voice conversion systems
AU - Duxans, Helenca
AU - Bonafonte, Antonio
AU - Kain, Alexander
AU - Van Santen, Jan
PY - 2004
Y1 - 2004
N2 - Voice Conversion (VC) systems modify a speaker voice (source speaker) to be perceived as if another speaker (target speaker) had uttered it. Previous published VC approaches using Gaussian Mixture Models [1] performs the conversion in a frame-by-frame basis using only spectral information. In this paper, two new approaches are studied in order to extend the GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model. So, the transformation is carried out according to sequences of frames. Then, phonetic information is introduced in the training of the VC system. Objective and perceptual results compare the performance of the proposed systems.

UR - http://www.scopus.com/inward/record.url?scp=84994241109&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994241109&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84994241109
SP - 1193
EP - 1196
BT - 8th International Conference on Spoken Language Processing, ICSLP 2004
PB - International Speech Communication Association
T2 - 8th International Conference on Spoken Language Processing, ICSLP 2004
Y2 - 4 October 2004 through 8 October 2004
ER -
