Ircam-Centre Pompidou



    Document type: Contribution to a symposium or conference
    Title: Timbre Characterisation and Recognition with Combined Stationary and Temporal Features
    Main author: Shlomo Dubnov
    Co-author: Xavier Rodet
    Conference: ICMC: International Computer Music Conference. Ann Arbor, October 1998
    Peer review: Undetermined
    Copyright: Ircam - Centre Georges-Pompidou
    Year: 1998
    Editorial status: Published

    Classification and generation of sound require a modeling approach that takes into account, in addition to the common sound features, the statistical behaviour of the sound components. Such statistics include the stationary random fluctuations in amplitude and frequency that occur during sustained portions of the sound, and the stochastic behaviour of the sound over its lifetime. So far, our work has considered statistical models of the variations that occur during a sustained portion of the sound. Various aspects, such as phase coupling and its relation to Higher Order Statistical (HOS) analysis, were investigated and shown to be important for sound characterisation. The purpose of the current work is to extend this research towards modeling the temporal behaviour of sound. We consider a unified model that combines spectral and HOS features, and apply a new method for comparing the temporal evolutions of these features. The envisioned applications are broad and include characterisation for analysis/synthesis, coding and sound database retrieval. To understand the problems in comparing sounds, one must note that sound behaves on several different temporal scales. These include short-term correlations related to timbral properties (such as formants), correlations due to the pitch period, slower modulations such as vibrato, expressive inflections, and transitions between different notes. Thus a sequence that seems stationary on one time scale may depart from stationarity and ergodicity on another. This makes it difficult to determine the right probability function for the sequence of samples. Moreover, for classification purposes, similarity measures between sounds are usually based on specific models (such as Markov models of a certain order) or on a priori knowledge of the parametric shape of the probability distribution, a dependence we would like to avoid.
A possible solution to this problem is to consider the Markovian property at different time scales by using multiple features and capturing their temporal behaviour. We therefore consider a model composed of features that represent stationary segments (states) and the transitions between these states. For the short-time description of the sound we use a set of spectral envelopes (Mel Frequency Cepstral Coefficients (MFCC), as in speech), which allow up to 90% data reduction in the sound representation. Moreover, a vector quantisation (VQ) procedure further reduces the set of envelopes by optimally representing the complete dataset with just a few typical envelopes. Additional parameters were used to capture the information present in the higher cepstral coefficients as well. These higher cepstral coefficients correspond to the excitation signal (also called the residual). The additional parameters are variations in the fundamental frequency and HOS parameters that describe the residual properties (such as kurtosis, which is related to phase coupling). The temporal structure of the signal was investigated along two lines: 1) The short-time temporal evolution is described by specific features such as cepstral "difference" and "acceleration". The evolution is considered in terms of transitions between the "typical" envelopes found by VQ. This method gives excellent performance for limited data sets such as isolated notes, by matching both the instantaneous spectral shapes and their evolution. 2) For the long-term behaviour of the signal, we applied information-theoretic tools to classify the feature sequences. Using the Ziv-Merhav "universal" sequence classification method, the cross-entropy comparison is done without estimating a specific Markov model. The model requires long feature sequences to reveal its structure and is applicable to complex sounds such as note sequences and some non-musical sounds.
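
As an illustration of the short-time features named above, the following is a minimal sketch and not the authors' implementation. It assumes cepstral frames are already available as lists of floats. The function names (`delta`, `quantise`, `kurtosis`), the central-difference formula and the squared Euclidean distance are all assumptions made for this example. Applying `delta` twice gives an "acceleration" track, `quantise` maps each frame to its nearest "typical" envelope in a given codebook, and `kurtosis` is the plain normalised fourth moment of a residual signal.

```python
from typing import List, Sequence

Frame = Sequence[float]


def delta(frames: Sequence[Frame]) -> List[List[float]]:
    """Central difference along time; the first and last frames are replicated
    at the edges (an assumption of this sketch)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def quantise(frames: Sequence[Frame], codebook: Sequence[Frame]) -> List[int]:
    """Index of the nearest codeword (squared Euclidean distance) per frame."""
    def dist(u: Frame, v: Frame) -> float:
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


def kurtosis(x: Sequence[float]) -> float:
    """Normalised fourth central moment m4 / m2**2 (3.0 for a Gaussian)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for a constant signal")
    return m4 / (m2 * m2)
```

The index sequence returned by `quantise` is the kind of discrete symbol stream on which state transitions can then be compared.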
The model, classification scheme and refinements for specific types of sounds will be presented in the paper.
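
The long-term comparison mentioned in the abstract can be sketched as follows. This is a hedged illustration in the spirit of Ziv-Merhav cross-parsing, not the procedure used in the paper. A test sequence `z` is cut into the longest phrases that occur somewhere in a reference sequence `x`. Fewer phrases mean that `z` is better "explained" by `x`. The normalisation `count * log2(n) / n`, the naive substring search and the names `cross_parse_count`, `cross_entropy_estimate` and `classify` are assumptions of this sketch. The inputs are taken to be symbol sequences such as VQ codeword indices.

```python
import math
from typing import Dict, Hashable, Sequence

Symbols = Sequence[Hashable]


def cross_parse_count(z: Symbols, x: Symbols) -> int:
    """Number of phrases when z is parsed into longest substrings of x."""
    i, phrases = 0, 0
    while i < len(z):
        best = 0
        # Naive search: longest match of z[i:] starting at any position of x.
        for j in range(len(x)):
            k = 0
            while i + k < len(z) and j + k < len(x) and z[i + k] == x[j + k]:
                k += 1
            best = max(best, k)
        i += max(best, 1)  # a symbol unseen in x forms a phrase of length 1
        phrases += 1
    return phrases


def cross_entropy_estimate(z: Symbols, x: Symbols) -> float:
    """Rough cross-entropy estimate in bits per symbol (assumed scaling)."""
    n = len(z)
    if n < 2:
        raise ValueError("need at least two symbols")
    return cross_parse_count(z, x) * math.log2(n) / n


def classify(z: Symbols, references: Dict[str, Symbols]) -> str:
    """Label of the reference sequence with the lowest estimate."""
    return min(references, key=lambda name: cross_entropy_estimate(z, references[name]))
```

No Markov model of a fixed order is fitted at any point, which is the property the abstract highlights. The quadratic search is acceptable only for short sequences; a suffix structure would be needed for long ones.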

    Keywords: Timbre / Characterisation / Recognition / Vector Quantization / Universal Classification
    Team: Analyse et synthèse sonores (Sound Analysis and Synthesis)
    Call number: Dubnov98b

    © Ircam - Centre Pompidou 2005.