Ircam-Centre Pompidou

Record view

Detailed view

Document type	Journal article
Title	Symbolic Modeling of Prosody: From Linguistics to Statistics
Main author	Nicolas Obin
Co-author	Pierre Lanchantin
Published in	IEEE/ACM Transactions on Audio, Speech and Language Processing 2015, Vol. 23, no. 3
Peer-reviewed	Yes
Pages	pp. 588-599
Year	2015
Editorial status	Unpublished
Abstract	The assignment of prosodic events (accent and phrasing) from the text is crucial in text-to-speech synthesis systems. This paper addresses the combination of linguistic and metric constraints for the assignment of prosodic events in text-to-speech synthesis. First, a linguistic processing chain is used to provide a rich linguistic description of a text. Then, a novel statistical representation based on a hierarchical HMM (HHMM) is used to model the prosodic structure of a text: the root layer represents the text, each intermediate layer a sequence of intermediate phrases, the pre-terminal layer the sequence of accents, and the terminal layer the sequence of linguistic contexts. For each intermediate layer, a segmental HMM and information fusion are used to fuse the linguistic and metric constraints for the segmentation of a text into phrases. A set of experiments conducted on multi-speaker databases with various speaking styles shows that the rich linguistic representation drastically improves the assignment of prosodic events, and that the fusion of linguistic and metric constraints significantly improves over standard methods for the segmentation of a text into phrases. These constitute substantial advances that can be further used to model the speech prosody of a speaker, a speaking style, and emotions for text-to-speech synthesis.
Keywords	text-to-speech synthesis / speech prosody / speaking style / prosodic events / surface/deep syntactic parsing / hierarchical HMMs / segmental HMMs / Dempster-Shafer fusion
Team	Analyse et synthèse sonores
Call number	Obin15a
Online version URL	http://architexte.ircam.fr/textes/Obin15a/index.pdf

© Ircam - Centre Pompidou 2005.