Ircam-Centre Pompidou

    %0 Thesis
    %A Villavicencio, Fernando
    %T Conversion de la voix de haute qualité
    %D 2010
    %C Paris
    %I Université Paris 6 (UPMC)
    %F Villavicencio10a
    %K speech synthesis
    %K speech analysis
    %K cepstral analysis
    %K linear prediction
    %X This dissertation addresses the field known as Voice Conversion. This technology refers to the ability to modify the perceived voice identity of a speaker so that it resembles that of a specific target speaker. A Voice Conversion system basically consists of analyzing the source speech and modifying it after its timbre information (spectral envelope) has been converted, a conversion commonly achieved by statistical modeling. However, current approaches rarely yield natural-sounding speech. The conversion process can introduce degradations, and a reduction in the overall quality of the converted speech is generally perceived. In addition, the conversion effect is not considered fully satisfactory, since the converted speech is not always perceived as similar to the target speaker's voice. Finally, the speech signals used so far have been restricted to low-to-medium-quality sample rates (8 to 16 kHz). These problems can be attributed mainly to insufficient performance of the source-target mapping of the timbre features, as well as to inefficient modeling and modification of the timbre information. In particular, the spectral envelope models used to represent the timbre features, typically based on Linear Prediction or cepstral analysis (MFCC), exhibit systematic errors and cannot, in general, be considered to efficiently estimate the underlying transfer function of the signal (source-filter model). Accordingly, we consider that these techniques cannot achieve proper extraction and modeling of the timbre information. The goal of this research was to apply Voice Conversion to high-quality speech. Our main interests were improving the quality of current systems and using high-quality speech. To this end, we focused on the study of improved spectral envelope modeling and timbre modification.
    The benefits of a cepstrum-based technique known as True Envelope for efficient envelope estimation were studied and experimentally verified. To evaluate conversion performance, a model including perceptual criteria and accurate target information was defined to replace the classical error measure based on poorly estimated envelope parameters. The improved envelope models were applied to a Voice Conversion framework based on Gaussian Mixture Modeling, resulting in better timbre conversion performance. A strategy to automatically select the order of the envelope models was also derived, improving the extraction of the source timbre features. Finally, a technique for improved synthesis of timbre-modified speech, based on LP-PSOLA and a Line Spectral Frequencies parameterization, was proposed. The resulting Voice Conversion methodology showed improved objective and subjective performance compared to the classical methodology based on Linear Prediction.
    %1 8
    %2 1

    © Ircam - Centre Pompidou 2005.