Prosodic Processing

Jan van Santen; Taniya Mishra; Esther Klabbers

doi:10.1007/978-3-540-49127-9_23

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Chapter

14 Scopus citations

Abstract

Speech synthesis systems have to generate natural-sounding speech output from text. One of the key aspects of speech is prosody, which must be both natural (i.e., sounding like a human) and meaningful (i.e., sounding like a human who understands the contents of the text). The computation of prosody from text can be divided into the computation of prosodic tags from text and the computation of acoustic speech features from these tags. This chapter focuses on the latter. It provides an overview of prosody in human-human communication, including the communicative functions of prosody and the acoustic correlates. Discussed next is a historical overview of the various methods that have been used for prosody generation in speech synthesis, as well as of current methods. Special attention is paid to prosody generation in unit selection synthesis methods, in which large corpora are searched for fragments of speech that match the phonemes and prosodic tags computed from text and that optimize various cost functions, and in which prosody is not modeled and speech not modified. We conclude the chapter by advocating hybrid approaches in which search capabilities of unit selection methods are combined with the speech modification methods from more-traditional approaches.

Original language	English (US)
Title of host publication	Springer Handbooks
Publisher	Springer
Pages	471-488
Number of pages	18
DOIs	https://doi.org/10.1007/978-3-540-49127-9_23
State	Published - 2008

Publication series

Name	Springer Handbooks
ISSN (Print)	2522-8692
ISSN (Electronic)	2522-8706

Keywords

Intonational Phrase
Pitch Accent
Pitch Contour
Speech Synthesis System
Unit Selection

ASJC Scopus subject areas

General

Cite this

@inbook{3b9304e5e3b948cdb5f002112e68e833,

title = "Prosodic Processing",

abstract = "Speech synthesis systems have to generate natural-sounding speech output from text. One of the key aspects of speech is prosody, which must be both natural (i.e., sounding like a human) and meaningful (i.e., sounding like a human who understands the contents of the text). The computation of prosody from text can be divided into the computation of prosodic tags from text and the computation of acoustic speech features from these tags. This chapter focuses on the latter. It provides an overview of prosody in human-human communication, including the communicative functions of prosody and the acoustic correlates. Discussed next is a historical overview of the various methods that have been used for prosody generation in speech synthesis, as well as of current methods. Special attention is paid to prosody generation in unit selection synthesis methods, in which large corpora are searched for fragments of speech that match the phonemes and prosodic tags computed from text and that optimize various cost functions, and in which prosody is not modeled and speech not modified. We conclude the chapter by advocating hybrid approaches in which search capabilities of unit selection methods are combined with the speech modification methods from more-traditional approaches.",

keywords = "Intonational Phrase, Pitch Accent, Pitch Contour, Speech Synthesis System, Unit Selection",

author = "{van Santen}, Jan and Mishra, Taniya and Klabbers, Esther",

note = "Publisher Copyright: {\textcopyright} 2008, Springer-Verlag Berlin Heidelberg.",

year = "2008",

doi = "10.1007/978-3-540-49127-9_23",

language = "English (US)",

series = "Springer Handbooks",

publisher = "Springer",

pages = "471--488",

booktitle = "Springer Handbooks",

}

TY - CHAP

T1 - Prosodic Processing

AU - van Santen, Jan

AU - Mishra, Taniya

AU - Klabbers, Esther

PY - 2008

Y1 - 2008

N2 - Speech synthesis systems have to generate natural-sounding speech output from text. One of the key aspects of speech is prosody, which must be both natural (i.e., sounding like a human) and meaningful (i.e., sounding like a human who understands the contents of the text). The computation of prosody from text can be divided into the computation of prosodic tags from text and the computation of acoustic speech features from these tags. This chapter focuses on the latter. It provides an overview of prosody in human-human communication, including the communicative functions of prosody and the acoustic correlates. Discussed next is a historical overview of the various methods that have been used for prosody generation in speech synthesis, as well as of current methods. Special attention is paid to prosody generation in unit selection synthesis methods, in which large corpora are searched for fragments of speech that match the phonemes and prosodic tags computed from text and that optimize various cost functions, and in which prosody is not modeled and speech not modified. We conclude the chapter by advocating hybrid approaches in which search capabilities of unit selection methods are combined with the speech modification methods from more-traditional approaches.

AB - Speech synthesis systems have to generate natural-sounding speech output from text. One of the key aspects of speech is prosody, which must be both natural (i.e., sounding like a human) and meaningful (i.e., sounding like a human who understands the contents of the text). The computation of prosody from text can be divided into the computation of prosodic tags from text and the computation of acoustic speech features from these tags. This chapter focuses on the latter. It provides an overview of prosody in human-human communication, including the communicative functions of prosody and the acoustic correlates. Discussed next is a historical overview of the various methods that have been used for prosody generation in speech synthesis, as well as of current methods. Special attention is paid to prosody generation in unit selection synthesis methods, in which large corpora are searched for fragments of speech that match the phonemes and prosodic tags computed from text and that optimize various cost functions, and in which prosody is not modeled and speech not modified. We conclude the chapter by advocating hybrid approaches in which search capabilities of unit selection methods are combined with the speech modification methods from more-traditional approaches.

KW - Intonational Phrase

KW - Pitch Accent

KW - Pitch Contour

KW - Speech Synthesis System

KW - Unit Selection

UR - http://www.scopus.com/inward/record.url?scp=84868924957&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84868924957&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-49127-9_23

DO - 10.1007/978-3-540-49127-9_23

M3 - Chapter

AN - SCOPUS:84868924957

T3 - Springer Handbooks

SP - 471

EP - 488

BT - Springer Handbooks

PB - Springer

ER -
