Ircam-Centre Pompidou

Recherche

Recherche simple

Recherche avancée

Panier électronique

Votre panier ne contient aucune notice

Connexion à la base

(Identifiez-vous pour accéder aux fonctions de mise à jour. Utilisez votre login-password de courrier électronique)

Entrepôt OAI-PMH

Soumettre une requête

	Consulter la notice détaillée
	Version complète en ligne
	Version complète en ligne accessible uniquement depuis l'Ircam
	Ajouter la notice au panier
	Retirer la notice du panier

English version

(full translation not yet available)

Liste complète des articles

Consultation des notices

Vue détaillée

Catégorie de document	Contribution à un colloque ou à un congrès
Titre	On the Generalization of Shannon Entropy for Speech Recognition
Auteur principal	Nicolas Obin
Co-auteur	Marco Liuni
Colloque / congrès	IEEE workshop on Spoken Language Technology. Miami : 2012
Comité de lecture	Oui
Année	2012
Statut éditorial	Non publié
Résumé	This paper introduces an entropy-based spectral representation as a measure of the degree of noisiness in audio signals, complementary to the standard MFCCs for audio and speech recognition. The proposed representation is based on the Rényi entropy, which is a generalization of the Shannon entropy. In audio signal representation, Rényi entropy presents the advantage of focusing either on the harmonic content (prominent amplitude within a distribution) or on the noise content (equal distribution of amplitudes). The proposed representation outperforms all other noisiness measures - including Shannon and Wiener entropies - in a large-scale classification of vocal effort (whispered-soft/normal/loud-shouted) in the real scenario of multi-language massive role-playing video games. The improvement is around 10% in relative error reduction, and is particularly significant for the recognition of noisy speech - i.e., whispery/breathy speech. This confirms the role of noisiness for speech recognition, and will further be extended to the classification of voice quality for the design of an automatic voice casting system in video games.
Mots-clés	information theory / spectral entropy / speech recognition / expressive speech / voice quality / video games
Equipe	Analyse et synthèse sonores
Cote	Obin12e
Adresse de la version en ligne	http://architexte.ircam.fr/textes/Obin12e/index.pdf

© Ircam - Centre Pompidou 2005.