TY - GEN
T1 - A novel pitch decomposition method for the generalized linear alignment model
AU - Langarani, Mahsa Sadat Elyasi
AU - Klabbers, Esther
AU - Van Santen, Jan
PY - 2014
Y1 - 2014
N2 - Superpositional models of intonation typically propose decomposing fundamental frequency (F0) contours into phrase curves and accent curves, aligned with phrases and left-headed feet, respectively. Extracting these component curves from F0 contours without making undue assumptions is challenging. We propose a novel method for decomposing pitch curves, based on the assumption that accent curves can be described by combining skewed normal distributions and sigmoid functions. In contrast to an earlier pitch decomposition algorithm ('PRISM'), this allows for simple joint optimization of phrase and accent curve parameters, using fewer parameters. The proposed method was evaluated on three speech corpora containing: (1) synthetically generated pitch curves, (2) all-sonorant utterances, and (3) utterances containing both sonorant and non-sonorant speech sounds. The root weighted mean squared error is small, and, on the corpus for which comparable data are available, is significantly smaller than for PRISM.
AB - Superpositional models of intonation typically propose decomposing fundamental frequency (F0) contours into phrase curves and accent curves, aligned with phrases and left-headed feet, respectively. Extracting these component curves from F0 contours without making undue assumptions is challenging. We propose a novel method for decomposing pitch curves, based on the assumption that accent curves can be described by combining skewed normal distributions and sigmoid functions. In contrast to an earlier pitch decomposition algorithm ('PRISM'), this allows for simple joint optimization of phrase and accent curve parameters, using fewer parameters. The proposed method was evaluated on three speech corpora containing: (1) synthetically generated pitch curves, (2) all-sonorant utterances, and (3) utterances containing both sonorant and non-sonorant speech sounds. The root weighted mean squared error is small, and, on the corpus for which comparable data are available, is significantly smaller than for PRISM.
KW - prosody modeling
KW - superpositional model
KW - text-to-speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84905229466&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905229466&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2014.6854067
DO - 10.1109/ICASSP.2014.6854067
M3 - Conference contribution
AN - SCOPUS:84905229466
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2584
EP - 2588
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -