Abstract
We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations, and thus high compression ratios, are possible; (ii) resynthesized speech is free of concatenation errors; (iii) the degree of articulation can be controlled explicitly; and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.
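The interpolation idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the sigmoid shape of the transition weight, the hypothetical names `transition_weight` and `synthesize_unit`, the per-feature `midpoints` and `slopes` parameters, and the toy two-feature basis vectors. The sketch only shows how per-feature weights can make features move between two neighboring basis vectors at different times, and how changing the frame count or the slope might stand in for changes in duration and articulation effort.

```python
import math


def transition_weight(t, midpoint, slope):
    """Sigmoid weight rising from 0 to 1 around `midpoint` (t in [0, 1]).

    The sigmoid is an assumed shape, chosen only because it is non-linear.
    """
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))


def synthesize_unit(left, right, midpoints, slopes, n_frames):
    """Interpolate each feature between two neighboring basis vectors.

    left, right: basis vectors (equal-length lists), one value per feature.
    midpoints, slopes: per-feature weight parameters, so each feature can
    switch targets at its own time (the "asynchronous" part).
    n_frames: number of output frames; changing it changes the duration.
    """
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        frame = []
        for l, r, m, s in zip(left, right, midpoints, slopes):
            w = transition_weight(t, m, s)
            frame.append((1.0 - w) * l + w * r)
        frames.append(frame)
    return frames


# Toy basis vectors for two hypothetical features.
left = [1.0, 0.0]
right = [0.0, 1.0]

# Feature 0 crosses over at t = 0.4, feature 1 later at t = 0.6.
normal = synthesize_unit(left, right, [0.4, 0.6], [12.0, 12.0], n_frames=11)

# Same weights sampled over more frames: a longer unit.
longer = synthesize_unit(left, right, [0.4, 0.6], [12.0, 12.0], n_frames=21)

# Shallower slopes: targets are undershot, a stand-in for reduced effort.
lax = synthesize_unit(left, right, [0.4, 0.6], [4.0, 4.0], n_frames=11)
```

In this sketch, duration is changed by resampling the same weight curves over a different number of frames, while a shallower slope keeps the trajectory from fully reaching either basis vector. Whether the paper modifies its transition weights in this way is not stated in the abstract.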
Original language | English (US) |
---|---|
Title of host publication | EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology |
Publisher | International Speech Communication Association |
Pages | 329-332 |
Number of pages | 4 |
State | Published - 2003 |
Event | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland; Duration: Sep 1 2003 → Sep 4 2003 |
ASJC Scopus subject areas
- Computer Science Applications
- Software
- Linguistics and Language
- Communication