Ircam-Centre Pompidou

Detailed view

Document category	Contribution to a symposium or conference
Title	Timbre Characterisation and Recognition with Combined Stationary and Temporal Features
Main author	Shlomo Dubnov
Co-author	Xavier Rodet
Symposium / conference	ICMC: International Computer Music Conference. Ann Arbor, October 1998
Peer review	Undetermined
Copyright	Ircam - Centre Georges-Pompidou
Year	1998
Editorial status	Published
Abstract	Classification and generation of sound require a modeling approach that takes into account, in addition to the common sound features, the statistical behaviour of the sound components. Such statistics include the stationary random fluctuations in amplitude and frequency that occur during sustained portions of the sound and the stochastic behaviour of sound during its lifetime. In our work so far we have considered statistical models of the variations that occur during a sustained portion of the sound. Various aspects, such as phase coupling and its relation to Higher Order Statistical (HOS) analysis, were investigated and shown to be important for sound characterisation. The purpose of the current work is to extend this research towards modeling the temporal behaviour of sound. We consider a unified model that combines spectral and HOS features, and apply a new method for comparing the temporal evolutions of these features. The applications envisioned are broad and include characterisation for analysis/synthesis, coding and sound database retrieval. In order to understand the problems in comparing sounds, one must note that there are different temporal scales for sound behaviour. These include short-term correlations related to the timbral properties (such as formants), correlations due to the pitch period, slower modulations such as vibrato, expressive inflections, and transitions between different notes. Thus a sequence that might seem stationary on one time scale departs from stationarity and ergodicity on another. This situation poses a problem for assessing the right probability function for the sequence of samples. Moreover, for purposes of classification, similarity measures between sounds are usually based upon specific models (like Markov models of a certain order) or a priori knowledge of the parametric shape of the probability distribution, a dependence which we would like to avoid.
A possible solution to this problem is to consider the Markovian property at different time scales by using multiple features and capturing their temporal behaviours. Thus, we consider a model composed of features that represent stationary segments (states) and transitions between these states. For the short-time description of the sound we use a set of spectral envelopes (Mel Frequency Cepstral Coefficients (MFCC), as in speech), which allow for up to 90% data reduction in sound representation. Moreover, a vector quantisation (VQ) procedure further reduces the set of envelopes by optimally representing the complete dataset with just a few typical envelopes. In order to capture the information present in the higher cepstral coefficients as well, additional parameters were used. These higher cepstral coefficients correspond to the excitation signal (also called the residual). Variations in the fundamental frequency and HOS parameters that describe the residual properties (such as kurtosis, which is related to phase coupling) were used. The investigation into the temporal structure of the signal was done along two lines: 1) The short-time temporal evolution is described by specific features such as cepstral "difference" and "acceleration". The evolution is considered in terms of transitions between "typical" envelopes found by VQ. This method gives excellent performance for limited data sets such as isolated notes by matching both the instantaneous spectral shapes and their evolution. 2) For the long-term behaviour of the signal we applied information-theoretic tools for classification of the feature sequences. Using the Ziv-Merhav "universal" sequence classification method, the cross-entropy comparison is done without estimation of a specific Markov model. The model requires long feature sequences to reveal its structure and is applicable to complex sounds such as note sequences and some non-musical sounds.
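As an illustration of the short-time description in point 1), the following is a minimal sketch and not the authors' implementation. It assumes the cepstral frames and the VQ codebook are already available; the function names (`delta`, `quantise`, `transition_counts`), the central-difference formula and the toy one-dimensional data are all assumptions made for this example.

```python
from collections import Counter


def delta(frames):
    """Central difference along time; the first and last frames are replicated at the edges."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def quantise(frames, codebook):
    """Map each frame to the index of its nearest codeword (squared Euclidean distance)."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
            for f in frames]


def transition_counts(symbols):
    """Count transitions between consecutive codeword indices."""
    return Counter(zip(symbols, symbols[1:]))


# Toy data (hypothetical): four one-dimensional "cepstral" frames and a
# two-entry codebook standing in for the "typical" envelopes found by VQ.
cepstra = [[0.0], [1.0], [4.0], [9.0]]
codebook = [[0.0], [10.0]]

difference = delta(cepstra)         # cepstral "difference"
acceleration = delta(difference)    # cepstral "acceleration"
symbols = quantise(cepstra, codebook)
transitions = transition_counts(symbols)
```

Two sounds could then be compared on both their codeword sequences and their transition counts; how the abstract's method weights the two is not stated here, so that step is left out.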
The model, classification scheme and refinements for specific types of sounds will be presented in the paper.
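For the long-term comparison, the idea of classifying without fitting a Markov model can be sketched through cross-parsing: a test sequence is cut into the longest phrases that occur in a reference sequence, and fewer phrases indicate a closer match. The sketch below is a simplified illustration under stated assumptions, not the scheme presented in the paper. The function names, the brute-force substring search and the normalisation `c * log2(n) / n` are choices made for this example only.

```python
import math


def _occurs(x, phrase):
    """True if `phrase` (a tuple of symbols) appears as a contiguous run in x."""
    m = len(phrase)
    return any(tuple(x[i:i + m]) == phrase for i in range(len(x) - m + 1))


def cross_parse_count(z, x):
    """Number of phrases when z is parsed into the longest substrings found in x."""
    count = 0
    i = 0
    n = len(z)
    while i < n:
        length = 0
        # Extend the current phrase while it still occurs somewhere in x.
        while i + length < n and _occurs(x, tuple(z[i:i + length + 1])):
            length += 1
        i += max(length, 1)  # a symbol never seen in x forms a phrase of length 1
        count += 1
    return count


def cross_entropy_estimate(z, x):
    """Assumed normalisation: phrase count times log2(n), divided by n."""
    n = len(z)
    if n == 0:
        raise ValueError("empty test sequence")
    return cross_parse_count(z, x) * math.log2(n) / n


def classify(z, references):
    """Return the label of the reference sequence with the lowest estimate."""
    return min(references, key=lambda name: cross_entropy_estimate(z, references[name]))


# Toy symbol sequences (hypothetical), e.g. VQ codeword indices written as letters.
label = classify(list("abababab"),
                 {"ab": list("abababababab"), "b": list("bbbbbbbb")})
```

The brute-force search is quadratic or worse in sequence length; for the long feature sequences the abstract calls for, a suffix structure would be the natural replacement.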
Keywords	Timbre / Characterisation / Recognition / Vector Quantization / Universal Classification
Team	Sound Analysis and Synthesis (Analyse et synthèse sonores)
Shelf mark	Dubnov98b

© Ircam - Centre Pompidou 2005.