New Possibilities in Sound Analysis and Synthesis
Xavier Rodet, Philippe Depalle, Guillermo Garcia
ISMA 95, Dourdan (France), 1995
Copyright © ISMA 1995
Abstract
In this presentation we exemplify the emergence of new possibilities in sound analysis
and synthesis with three novel developments carried out by the Analysis/Synthesis
team at IRCAM. These examples address three main activities in our domain, and they have reached a large public making, or simply listening to, music.
The first example concerns synthesis
using physical models. We have determined the behavior of a class of models, in terms
of stability, oscillation, periodicity, and finally chaos, leading to a better control
of these models in truly innovative musical applications.
The second example concerns additive synthesis, essentially based on the analysis
of natural sounds. We have developed a new additive method based on spectral envelopes
and the inverse Fast Fourier Transform, which we name FFT⁻¹ and which provides a solution to the main difficulties of the classical method.
Commercial applications are announced for this year, for professional users and soon
for a larger public.
The last example is an original work on the recreation of a castrato voice by means
of sound analysis, processing and synthesis. It was done to produce the soundtrack for a film and a CD about Farinelli,
the famous castrato of the eighteenth century. The CD and the film have reached millions
of people all around the world.
1. Introduction
Since the birth of computer music technology in the 1950s, new possibilities in
the analysis and synthesis of sound have emerged from research institutions and have
gradually come into public use. In this presentation we exemplify this emergence
by focusing on three novel developments carried out by the Analysis/Synthesis team
at IRCAM. These examples are typical in that they address three main activities in
our domain: synthesis, analysis and processing. They are typical also because they
have issued from laboratories and reached not only professionals but also a large public
making, or simply listening to, music.
The first example concerns synthesis using the class of models known as physical models [1], [2].
Compared to signal models, such as additive synthesis, physical models have only recently received as much attention,
partly because their nonlinear nature makes them very complex to construct and handle. But for musical applications, one should not merely build
models and deliver them to musicians. It is indispensable to understand the models in depth,
to conceive abstractions of them and to propose explanations useful
to the users. In particular, this comprehension is indispensable for elaborating
controls of synthesis models which are at the same time efficient and musically pertinent
[3]. Consequently, we have studied a family of differential and integral delay equations
which retain the essence of the behavior of certain classes of instruments with sustained sounds, such as wind, brass and string instruments [4]. We have determined
the behavior of our models in terms of stability, oscillation, periodicity and
finally chaos, leading to a better control of these models in truly innovative musical
applications.
The second example concerns one of the oldest methods of computer music, called additive
synthesis, that is, the summation of time-varying sinusoidal components, essentially
based on the analysis of natural sounds [5], [6]. Despite rapid gains in computational accuracy and
performance, the state of the art in affordable single-chip real-time solutions to
the problem of additive synthesis offers only 32 oscillators. Since hundreds of sinusoids
are required for a single low-pitched note of the piano, for example, current single-chip
solutions fall short by a factor of at least 20. The development and use
of additive synthesis have also been discouraged by other drawbacks. Firstly, amplitude
and frequency variations of the sinusoidal components are commonly described in terms
of breakpoint functions. When the number of partials is large, control by the user
of each individual breakpoint function becomes impractical. Another argument against
such breakpoint functions is that, for the voice and certain musical instruments, a spectral
envelope [7], [8] captures most of the important behavior of the partials. Finally,
the usual oscillator method for additive synthesis does not provide an efficient
way of adding colored noise to sinusoidal partials, which is needed to successfully synthesize
speech and the Japanese Shakuhachi flute, for example. This is why we have developed
a new additive synthesis method based on spectral envelopes and the inverse Fast Fourier Transform, which we name FFT⁻¹ and which provides a solution to the difficulties we have mentioned
[9], [10]. Commercial applications should appear this year for professional users
and soon for a larger public [11].
The last example is an original work on the recreation of a castrato voice by means
of sound analysis, processing and synthesis [12]. It was done to produce the soundtrack for a film and a CD about
Farinelli, the famous castrato of the eighteenth century. The film, directed by Gérard
Corbiau, and the CD, produced by AUVIDIS, bring back to life a repertoire which could not be sung anymore. The musical consultant of the film, Marc David, has recovered
unpublished scores from the French National Library. This example is particularly interesting
for several reasons. First, it is extremely difficult to synthesize high-quality
(concert or CD) singing voice [13]; secondly, forty minutes of a new castrato-like singing
voice were produced on a tight schedule; thirdly, it is the first application
of a technique analogous to morphing; and finally, the CD and the film have reached millions of people all around the world.
2. Analysis, processing and synthesis
The work described here has been done at the Institut de Recherche et de Coordination
Acoustique/Musique (IRCAM), Paris, by the Analysis/Synthesis team. Since IRCAM is
devoted to musical research and production, our main goal, on a short term and a
long term basis, is artistic as well as scientific in nature. Computer generated music
occupies a central role in artistic production at IRCAM, in the form of aid and tools
for musicians and composers in the process of research and of the production of musical
compositions. Among the computer music activities that our team is devoted to, we can
mention sound analysis and processing, natural sound simulation, and the creation of new
types of sounds [14]. For these different musical objects to be useful for
musical composition, it should be possible to modify them at will, to introduce such effects
as expressivity or different playing techniques. Therefore, sounds should be defined
by models, which will be considered as instruments.
In the synthesis of musical sound, as in many other fields of simulation, the concept
of model is essential for a better comprehension and use of the properties of sound
analysis and synthesis methods. In particular, it is necessary to understand the
structure of the space of instrumental
sounds.
Analysis of musical sounds is aimed at getting some precise information concerning
the signal itself, the way it has been produced or the way it is perceived [15],
[8]. Such information can be very specific, as in the case of the fundamental
frequency, or fairly general, such as all the information needed to rebuild
a quasi-identical sound with some synthesis technique [8].
Synthesis of musical sounds can be viewed as a two stage process. In the second stage,
the synthesizer itself computes a sound signal according to the value of parameters.
In the first stage, the determination of adequate values of these parameters is essential for getting the desired sound output. For example, parameters may be obtained
by some analysis technique [6] or generated by rules [16], [17], [18], [19], [20], [21].
Processing of musical sounds can be divided into two classes. On the one hand, a processing
system may be independent of the signal: this is the case for modulation, filtering
or reverberation. On the other hand, a processing system may consist of an analysis
stage producing time-dependent parameters, a modification of these parameters, and a
synthesis stage from the modified parameters. A well-known example of such processing
uses the so-called phase vocoder [22], [23].
An analysis or synthesis method always refers to some model of sound representation
and of sound production. This model can be the so-called physical model
[24], [1], which is explicitly based on the physical laws that govern the evolution
of the physical system producing the sound. Or it can be a model of the sound signal,
consisting of one or several parametric structures adapted to represent
and reproduce time-domain and/or frequency-domain characteristics of the studied sounds
[25]. As these signal models include few constraints, they are simple, general and low cost.
This is the case of an oscillator reproducing a periodic waveform extracted from a natural sound.
In between signal models and physical models, one can find models that share properties
with both classes: this is the case of a lattice filter as a model of the vocal tract,
or of some simplified physical models as proposed by J. Smith. But a more general
way to encompass both classes of models in the same formalism is the so-called State
Space representation studied in our team [26], [27]. One can always gradually transform a signal model into a physical one (or inversely)
by including (or removing) constraints on the structure of the model [28]. According
to the previous requirements, our models rely on descriptions of perceptually relevant features of the sound, of its Short Time Fourier Transform (STFT), or of the
system that has produced the sound. This description goes from the more general features,
e.g. the spectral envelope, to the more subtle details, e.g. those of the STFT such as harmonicity, partials and noise [7]. Parameters of a model are usually
updated at a rather low rate called the frame rate or parameter rate (typically lower
than 200 Hz). Considering that the purpose is to allow musicians to process existing
sounds or to create new sounds, the control parameters have to be intuitive, direct and
easy for musicians to use.
3. Understanding and Control of Physical Models
3.1 Introduction
This section describes an approach to the functioning of musical instruments from
the point of view of the theory of nonlinear dynamical systems. Our work provides
theoretical results on instruments, their models and on a class of equations with
delay, as well as on sound synthesis itself. Experimental and practical results open new
sonic possibilities in terms of sound material and in terms of the control of sound
synthesis which is particularly important for performers and composers of contemporary
music.
The complexity of physical models comes partly from their nonlinear nature. We try
to define resemblances and differences between several classes of instruments. This
approach is necessary for artistic production, which cannot be confined to traditional
instruments and for which we have to understand the structure of the space of instrumental
sounds. To fulfil these requirements, we have highlighted a characteristic common
to instruments with sustained sounds, i.e. the existence of delay terms in the equations
of models [29]. As a consequence, we have studied a family of differential and integral delay equations which are particularly difficult and are not well understood [30].
We have determined the behavior of our models, in terms of stability, oscillation,
periodicity, and finally chaos. Moreover we have found analytically some conditions
for these behaviors. We have realised digital simulations of our models on a workstation.
We have shown that observing in real time the solutions of an equation and their
properties, such as the Fourier spectrum, while changing parameter values, is a powerful tool for mathematical exploration.
An interesting finding is the control of the chaotic behavior of our models [31]. It
previously seemed that chaotic sounds could not be of any musical interest. On the
contrary, we have found that these signals exhibit very interesting properties, such
as a clearly perceived pitch or an intermittent type of behavior [32]. We have also shown
that signals which are mathematically chaotic can be heard in very different ways.
What could be named the "proportion of chaos" can be faint or predominant, from sounds
perceived as harmonic without noise up to essentially noisy sounds. Finally, we want
to control the "proportion of chaos" in synthetic signals, opening a new field of
fascinating research and application [33].
3.2 Mathematical Model
The trumpet is an example of an instrument uncommonly difficult to model, because
of the complicated features of its nonlinear elements, the lips and the air flow between the lips,
which are not easy to measure. We have studied such a model of the behavior of the
trumpet and we have tested its operation rather extensively [34]. Without further simplifications,
this system of five nonlinear differential equations would be nearly impossible
to understand and control. Merely computing a numerical solution would be very unsatisfactory
for our musical and artistic purposes. This is why we have started to study the basic
behavior of classes of instruments. In the case of the trumpet or of the clarinet, seen from
the mouthpiece, the bore appears roughly as a delay line with a sign-inversion reflection
and some low-pass filtering [29]. The basis of the oscillatory behavior is to be
found in the coupling of the passive linear part with the nonlinear reed, and similarly
for string instruments, where the delay comes from transverse waves along the string.
In this way, many sustained musical instruments can be described by an autonomous
system of integral and differential delay equations. These equations are extremely
difficult and have not received as much attention as ordinary differential equations
[35]. However, a feedback-loop formulation sheds some light on their properties [36].
One of the simplest such systems is written, for x ∈ ℝ, with an instantaneous nonlinearity g:

    x(t) = h * g(x(t − τ))     (1)

where g: ℝ → ℝ, τ ∈ ℝ is some time delay, h: ℝ → ℝ is an impulse response and * is
the convolution operator. Even in the case of this simplified equation (1), the
solutions and their stability are known only partially and in restricted cases [37].
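To make the behavior of equation (1) concrete, the following minimal discrete-time sketch in Python (our illustration, not the real-time simulator of [38]) runs the feedback loop numerically: a circular delay line realizes x(t − τ), a piecewise-linear Chua-type map with slopes s1 and s2 stands in for g, and a one-pole lowpass is an assumed, deliberately crude choice for h.

    import numpy as np

    def simulate_delay_loop(n_samples, sr=44100, f0=220.0,
                            s1=2.0, s2=-0.5, x_break=1.0, lp_coef=0.3):
        """Discrete-time sketch of equation (1): a circular delay line provides
        x(t - tau), a piecewise-linear Chua-type map g (slope s1 near the origin,
        slope s2 beyond +/- x_break) is the instantaneous nonlinearity, and a
        one-pole lowpass stands in for the convolution with h."""
        delay = max(1, int(sr / (2.0 * f0)))   # half-period delay: pitch ~ f0 with inversion
        buf = 1e-3 * np.random.randn(delay)    # small noise to start the oscillation
        y_lp = 0.0
        out = np.empty(n_samples)
        for n in range(n_samples):
            x_del = buf[n % delay]             # x(t - tau) from the circular buffer
            if abs(x_del) <= x_break:          # piecewise-linear nonlinearity g
                g = s1 * x_del
            else:
                g = np.sign(x_del) * (s1 - s2) * x_break + s2 * x_del
            y_lp += lp_coef * (g - y_lp)       # one-pole lowpass, crude stand-in for h
            buf[n % delay] = -y_lp             # sign-inversion reflection of the bore
            out[n] = y_lp
        return out

    signal = simulate_delay_loop(44100)        # one second; varying s1 and s2 moves
                                               # the loop between periodic and chaotic regimes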
We have found a similarity between this model and the so-called Time-Delayed Chua's Circuit,
a modification of the famous Chua's circuit, governed by the same equation (1), which
we have simulated in real time [38]. While the original Chua's circuit [39] happens
to be relatively difficult to control and does not offer as rich a palette of timbres
as wished for musical applications, the Time-Delayed Chua's Circuit is much richer
and more flexible. A large variety of sounds can be produced by the system. This is due
to the combination of the rich dynamics of the nonlinear map with the numerous
states represented by the delay line τ, as opposed to the minimal number of states of the original circuit. Information is
contained in these states, and very complex signal patterns come out of the interaction
of the states through the nonlinear function. The feedback loop can also be viewed as
a stabilisation loop added to the original Chua's circuit to render its control much easier [40].
The delay in natural instruments can be viewed in the same way.
In the case of the clarinet, the reed can be considered as massless, i.e. the nonlinearity
is instantaneous, and the system is described by (1). In the case of the flute [41], [42], it seems
that there is essentially one nonlinearity but two feedback loops with different
delays. This still complies with equation (1), but the open-loop transfer function
becomes a complicated combination of the influences of the two loops. In the case of the trumpet
or of the voice, the reed can no longer be considered as massless, i.e. the nonlinearity
is not instantaneous [43]. Therefore, the model now consists of the nonlinear coupling of a feedback loop and a mass oscillating with one or several degrees of freedom
[44]. It seems that some important characteristics of the timbre of each of the previous
classes of instruments, particularly in the transients, are related to the corresponding basic structure just described.
3.3 Some results about the nonlinearity and the linear element
Let us consider a map g such that the origin O is a fixed point, with a slope s1 about O and a slope s2 at some distance from O. Two important characteristics of the sound, transient onset
velocity and richness, are controlled by the slopes s1 and s2 [38]. A.N. Sharkovsky [45] has shown analytically that the time-delayed Chua's circuit
exhibits a remarkable period-adding phenomenon: in some regions of the (s1, s2) plane, the system has a stable limit cycle with period respectively 2, 3, 4, etc.
Between every two consecutive periodic regions, the system exhibits chaotic
behavior. We have shown that in the k-periodic regions, the harmonics k, 2k, 3k, etc.
are absent [32]. This is an interesting result from a musical
point of view as well. The map g can be simulated by a polynomial nonlinearity or, better, by a rational-function nonlinearity
[33].
Let us consider our system (1). The open-loop transfer function is

    G(jω) = H(jω) e^(−jωτ),

where H is the transfer function of h. The system does not oscillate if the limit
value 1/s1 lies to the left of all intersections of the Nyquist plot of G(jω) with the real axis. In the absence of the filter h, or with a zero-phase filter,
the delay leads to an oscillation frequency f0 = 1/(2τ). On the other hand, the supplementary delay added by the filter h can move the oscillation
frequency away from 1/(2τ).
The intersections of G(jω) with the negative real axis define the frequencies of the modes of the instrument.
The system generally oscillates at the frequency of the strongest mode. If the argument
of G(jω) is different from zero, the modes can be moved away from harmonic positions. We
have shown that when g is not odd-symmetric and there is simultaneously a filter h, even partials can appear. When
g is not very far from odd symmetry, if the argument of G(jω) is zero, then the even harmonic partials are of small amplitude (clarinet). If the
argument of G(jω) is different from zero, then the even harmonic partials can be of large amplitude
(saxophone). The case where the argument of G(jω) is different from zero can lead to surprising results which look like quasi-periodicity
or inharmonicity.
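The graphical test above can be tried numerically. The following Python sketch (an illustration under the assumption that h is a one-pole lowpass; all parameter values are ours) samples G(jω) = H(jω) e^(−jωτ) on a dense frequency grid and locates its crossings of the negative real axis, i.e. the candidate mode frequencies:

    import numpy as np

    tau = 1.0 / 440.0                                   # loop delay in seconds
    fc = 2000.0                                         # assumed lowpass cutoff (Hz)
    freqs = np.linspace(1.0, 8000.0, 200000)            # analysis grid in Hz
    w = 2 * np.pi * freqs
    H = 1.0 / (1.0 + 1j * w / (2 * np.pi * fc))         # one-pole lowpass transfer function
    G = H * np.exp(-1j * w * tau)                       # open-loop transfer function

    # sign changes of Im(G) where Re(G) < 0 locate the negative-real-axis crossings
    cross = np.where((G.imag[:-1] * G.imag[1:] < 0) & (G.real[:-1] < 0))[0]
    print("mode frequencies (Hz):", np.round(freqs[cross][:5], 1))
    print("|G| at the crossings: ", np.round(np.abs(G[cross])[:5], 3))
    # the loop can oscillate at a mode where |G| exceeds 1/s1; without the filter,
    # the first crossing would sit exactly at f0 = 1/(2*tau) = 220 Hz

The filter's phase lag shifts the crossings slightly below the multiples of 1/(2τ), which illustrates how h moves the modes away from harmonic positions.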
Physical models often have more than one oscillating solution for a given setting
of their parameters. If the solution reached by the system is unpredictable, the
usage of a physical model will be rather difficult in a real-time musical performance.
Therefore, another of our goals is to study and limit the numerous stable solutions of
physical models. We have found that a path toward such a goal could possibly be
based upon the low-pass character of the linear element [33].
3.4 Hopf bifurcation and periodic solutions
The Graphical Stability Test given above is valid as long as we can partition our
system into an instantaneous nonlinearity and a linear feedback loop. Since we are interested in periodic oscillations,
we mention a more general method which allows us to prove the existence of a periodic
solution when it occurs, and which provides estimates for the frequency and amplitude of the oscillation. It also applies to an even more general class of systems encountered
with sophisticated physical models of instruments. The Graphical Hopf Theorem [36]
and its algebraic version apply to a nonlinear multiple feedback loop system where g is C⁴. Under certain conditions on the nonlinearity g and the open-loop transfer function G, this theorem provides the existence, uniqueness
and a test for the unique stable periodic solution required in our application.
3.5 Chaotic signals and musical applications
We have simulated our systems on a workstation [46] and we have written a graphical user interface allowing easy experimentation with
the parameter values and display of the output signal, of its Fourier spectrum, etc.
Harmonic sounds and, in chaotic regions, noisy
sounds are obtained. Noisy sounds exhibit the simultaneous presence of harmonic components
and noise in the signal [32]. This is very interesting, since this occurs for the
majority of natural instruments and since it is relatively difficult to model in
a way which is useful for musical purposes. The noisy and sinusoidal components coming
from our system are correlated and fuse together. The control of the "proportion
of chaos" in the signal provides musicians with the possibility to control precisely
the amount of chaotic or noisy components which they introduce into the signal [33]. Chaotic sounds, even when they
are extremely noisy, keep some of the harmonic structure derived from the fundamental
frequency that corresponds to the delay τ. The persistence of the harmonic structure in the chaotic signal is heard as a pitch
of the noisy sound! Moreover, the value of the pitch and the amount of tonal sound
perceived as compared to noise can easily be controlled. Finally, the gradual passage
from one periodic region to the next gives very innovative sounds, changing progressively from harmonicity to chaos but keeping at will more or less of the harmonic structure
induced by the delay line.
4. Spectral Envelopes and Inverse FFT Synthesis
4.1. Introduction
Many musical sound signals may be described as a combination of a pseudo-periodic
waveform and of colored noise [14]. The pseudo-periodic part of the signal can be
viewed as a sum of sinusoidal components, named partials, with time-varying frequency
and amplitude. Some of the first attempts at sound synthesis
were based on the method called additive synthesis,
that is the summation of time-varying sinusoidal components [5]. This signal modelling
approach inherits a rich history of signal processing techniques. As an example,
we have developed methods to automatically analyze sounds in terms of partials and
noise, which can then be applied directly to additive synthesis [47]. In the sinusoidal
model, harmonic or inharmonic partials are easy to synthesize, and the partial parameters (frequency
and amplitude) map easily into the human perceptual space, are meaningful,
and are easily understood by musicians. Thus, additive synthesis is accepted as perhaps the most powerful and flexible method. However, its development and use have been
discouraged by severe drawbacks. This is why we have developed a new additive synthesis
method based on spectral envelopes and the inverse Fast Fourier Transform, named FFT⁻¹ [9], [10].
The first drawback of the classical oscillator method
of additive synthesis is the computation cost: a low-pitched piano note can sometimes
have more than a hundred partials. The FFT⁻¹ method provides a gain of 10 to 30 over the classical method. The second drawback of the oscillator method is the difficulty
of introducing precisely controlled noisy components, which are very important for
realistic sounds and musical timbres. Our method makes noisy components easy to describe and cheap to compute. Last but not least, controlling hundreds of sinusoids
is a great challenge for the computer musician. A scheme based on spectral envelopes
renders this control simpler, more direct and more user friendly.
4.2. The oscillator method
Additive synthesis is usually done with a bank of sinusoidal oscillators. Let us
call J the number of partials of the signal to be computed at a certain time, that
is, at a certain sample n. Let the frequency, the amplitude and the phase of the
jth partial, 1 ≤ j ≤ J, be named respectively fj, aj and φj. More precisely, since they are functions of time, i.e. of n, we write them fj[n], aj[n] and φj[n]. Usually, fj[n] and aj[n] are obtained at each sample by linear interpolation of the breakpoint functions
which describe the evolution of fj and aj. The phase offset is redundant with the frequency and for simplicity we ignore it here.
For a sampling rate Sr, the jth partial is therefore defined by

    cj[n] = aj[n] sin(Φj[n]),  with  Φj[n] = Φj[n−1] + 2π fj[n] / Sr,

and the signal to be computed is

    s[n] = Σj=1…J cj[n].

In the oscillator method, the instantaneous frequency and amplitude are calculated
first, by interpolation. Then the phase Φj[n] is computed. A table lookup is used to obtain the sinusoidal value of this phase,
and this value is multiplied by aj[n]. Finally, cj[n] is added to the values of the j − 1 partials already computed. The computation
cost of the oscillator method is of the form α·J per sample, where α is the cost of at least 5 additions, 1 table lookup, 1 modulo 2π and 1 multiplication. Even though it is possible to modify the sinusoidal oscillator
in order to produce wide or narrow band-limited random signals by combined amplitude
and phase modulation, this has rarely been done.
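For reference, here is a direct Python transcription of this oscillator bank (np.sin standing in for the table lookup; the per-sample parameter arrays are assumed to be already interpolated from the breakpoint functions):

    import numpy as np

    def oscillator_bank(freqs, amps, sr=44100):
        """Oscillator method: freqs and amps are (n_samples, J) arrays of
        per-sample values for J partials; phases are accumulated modulo 2*pi
        and the J sinusoids are summed."""
        phase = np.zeros(freqs.shape[1])
        out = np.zeros(freqs.shape[0])
        for n in range(freqs.shape[0]):
            phase = (phase + 2 * np.pi * freqs[n] / sr) % (2 * np.pi)
            out[n] = np.sum(amps[n] * np.sin(phase))    # cost ~ alpha * J per sample
        return out

    # toy usage: three harmonics of 220 Hz fading out over one second
    n, J = 44100, 3
    f = np.tile(220.0 * np.arange(1, J + 1), (n, 1))
    a = np.linspace(1.0, 0.0, n)[:, None] * np.array([1.0, 0.5, 0.25])
    s = oscillator_bank(f, a)

The inner loop makes the α·J per-sample cost visible: every partial pays for its own phase update, sine evaluation, multiplication and accumulation.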
4.3. Inverse Fast Fourier Transform additive synthesis
In our method [9], the computation of the partials is not done by a bank of oscillators
but by an inverse Fast Fourier Transform (FFT⁻¹) of short-term spectra (STS) Sl[k] into the corresponding time-domain signals swl[n]. To better explain the method, let us first consider the analysis by FFT as
used in the Phase Vocoder [22], which is familiar to many people: the signal s[n]
is first cut into successive overlapping frames sl[m]. Each frame is multiplied by a so-called window
signal w[m], such as the Hanning window. With an appropriate choice of w and of the
overlap factor d, s[n] can be exactly reconstructed from the windowed frames
swl[m] by the so-called overlap-add method [23]. Then the complex STS of each frame is computed by FFT, leading to a
succession of complex-valued spectra Sl[k]. Now the FFT⁻¹ method is easily understood as just the inverse process: start from the spectra Sl[k], compute their inverse FFT to get the swl[m], and overlap-add them in order to obtain the time-domain signal s[n].
For reasons of efficiency, a partial is represented in a spectrum by a few points
of non-negligible magnitude, typically K = 7. To build the contribution of a given
partial in the STS Sl[k], we only have to compute these K spectral values and add them to Sl[k]. If N is the size of a frame (typically N = 256), we find here a gain in computation
roughly proportional to N·d/K = 36. Other implementation optimisations are given
in [9], [10] and [48]. Simply note that the number K of significant values in the
spectrum W of the window can be optimally adjusted by use of an auditory model. As a simple
example, partials with low amplitude require a smaller K.
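The following is a hedged Python sketch of the core of the method (ours, not the optimized implementation of [9], [48]): the oversampled window transform is precomputed as a table, each partial deposits K bins into the short-term spectrum, the Hermitian symmetry is completed so the output is real, and the inverse FFT frames are overlap-added. Frame-rate parameters, a Hanning window and a hop of N/2 are assumptions of this sketch.

    import numpy as np

    # offline: oversampled transform of the synthesis window, stored as a table
    N, OS, K = 256, 128, 7                     # frame size, oversampling, bins per partial
    win = np.hanning(N)
    Wos = np.fft.fft(win, N * OS)              # oversampled window spectrum

    def build_frame_spectrum(freqs, amps, phases, sr=44100):
        """One short-term spectrum Sl[k]: each partial adds K bins of the
        window transform, centred on its fractional bin position."""
        S = np.zeros(N, dtype=complex)
        for f, a, ph in zip(freqs, amps, phases):
            b0 = f * N / sr                                     # fractional centre bin
            k0 = int(round(b0))
            for k in range(k0 - K // 2, k0 + K // 2 + 1):
                if 0 < k < N // 2:
                    idx = int(round((k - b0) * OS)) % (N * OS)  # table lookup, wraps for k < b0
                    S[k] += 0.5 * a * np.exp(1j * ph) * Wos[idx]
        S[N - 1:N // 2:-1] = np.conj(S[1:N // 2])               # Hermitian symmetry (real output)
        return S

    def synthesize(frames, sr=44100, hop=N // 2):
        """Inverse FFT each short-term spectrum and overlap-add the frames."""
        out = np.zeros(hop * (len(frames) - 1) + N)
        for l, (f, a, ph) in enumerate(frames):
            out[l * hop:l * hop + N] += np.fft.ifft(build_frame_spectrum(f, a, ph, sr)).real
        return out

    # toy usage: one 440 Hz partial; phases advance by 2*pi*f*hop/sr per frame so
    # the overlap-added frames are coherent (Hanning at hop N/2 sums to ~1)
    sr, hop = 44100, N // 2
    f, a = np.array([440.0]), np.array([0.5])
    y = synthesize([(f, a, 2 * np.pi * f * l * hop / sr) for l in range(20)], sr, hop)

The per-partial cost is K complex accumulations plus one FFT⁻¹ shared by all partials of a frame, which is where the gain over the oscillator bank comes from.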
In our FFT⁻¹ synthesis method, we can introduce noise components precisely, in any frequency band,
narrow or wide, and with any amplitude. We simply add into the STS under construction,
at the proper places, bands of STSs of a w-windowed white noise signal. This is easy and
inexpensive if the STFT of the noise has been computed and stored in a table before the beginning
of the synthesis stage. There exist analysis methods [49], [25], [47], [6]
to separate the noise components from the sinusoidal ones, allowing the preparation
of data for the noise component STSs.
The FFT⁻¹ algorithm has been implemented on the MIPS RISC processor of the SGI Indigo [46],
[48]. In terms of cost, one of the critical elements is the construction of an STS
Sl. By careful coding, many of the performance-enhancing features of modern processors
[50], [51] may be used to efficiently implement the critical inner loop. The SGI
Indigo implementation takes advantage of the ability of the R4000 to overlap the
execution of integer address operations, floating point additions and multiplications, delayed
writes and multiple operand fetches into cache lines. It is interesting that the
table for the oversampled window transform is small enough to fit into on-chip data
caches of modern processors. This is not the case for the larger sinusoid table required
in standard oscillator based additive synthesis. A detailed comparison of different
implementations is given in [48].
4.4. Control by spectral envelopes
In usual implementations of additive synthesis, fj[n] and aj[n] are obtained at each sample by linear interpolation of breakpoint functions
which describe the evolution of fj and aj versus time. When the number of partials is large, control by the user of each individual
breakpoint function becomes impossible in practice. But in the case of the voice
and of certain instruments, a source-filter model [7], [8] is a better representation of some of the behavior of the partials. Then the amplitude of a component is a
function of its frequency, i.e. of the transfer function of the filter [7], [15]. That
is, the amplitude aj depends on some spectral function, named the spectral envelope, at the frequency fj. The amplitude variation induced by frequency variations such as vibrato can sometimes
be very large [13]. To take these amplitude variations into account, a breakpoint
function of time may need many breakpoints.
Moreover, the amplitude of a partial is then not an intrinsic property of the timbre,
independent of other characteristics such as the fundamental frequency. Amplitudes stored
in breakpoint functions of time also disallow modifications of fundamental frequency
or vibrato.
On the contrary, a spectral envelope can be described as an analytical function
of a few parameters, whatever the number of partials it is used for. It can vary with
some of its parameters for effects such as spectral tilt or spectral centroid changes,
known to be related to loudness and brilliance. Spectral envelopes can be obtained automatically
by different methods, e.g. Linear Prediction analysis [8]. If the amplitudes and frequencies
of the partials are already known from sinusoidal analysis, Generalized Discrete Cepstral analysis [52], [53] provides reliable envelopes, the smoothness
of which can be adjusted according to the order. We can use spectral envelopes defined
at specific instants, for example the beginning and the end of the attack, sustain
and decay of a note, etc. Then at any instant, the spectral envelope to be used is
obtained by interpolation between two successive recorded envelopes [21]. Frequencies,
phases and noise components can also be described by similar envelopes, which we call
generalized spectral envelopes [19].
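The following toy Python fragment illustrates this control scheme (all envelope values and names are illustrative, not data from the paper): two stored envelopes are interpolated, and each partial reads its amplitude off the result, so a vibrato on fj automatically produces the corresponding amplitude modulation.

    import numpy as np

    def partial_amps(f_partials, env_freqs, env_a, env_b, alpha):
        """Amplitudes read off a spectral envelope: env_a and env_b are two
        stored envelopes (dB at the frequencies env_freqs, e.g. attack and
        sustain), mixed by 0 <= alpha <= 1 and evaluated at each partial
        frequency, so aj = E(fj)."""
        env = (1 - alpha) * env_a + alpha * env_b           # envelope interpolation
        return 10 ** (np.interp(f_partials, env_freqs, env) / 20.0)

    # a 6 Hz vibrato on the partial frequencies now automatically produces the
    # corresponding amplitude modulation through the envelope (toy values)
    env_f = np.array([0, 500, 1000, 2000, 4000, 8000], float)
    env_attack = np.array([0, -3, -10, -20, -35, -50], float)
    env_sustain = np.array([0, -6, -14, -28, -45, -60], float)
    f0 = 220.0 * (1 + 0.01 * np.sin(2 * np.pi * 6 * np.linspace(0, 1, 100)))
    partials = np.outer(f0, np.arange(1, 11))               # 10 harmonics over time
    amps = partial_amps(partials, env_f, env_attack, env_sustain, alpha=0.5)

Note that the envelope is described by six numbers here, however many partials are synthesized, which is exactly the economy of control argued for above.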
4.5. Applications
Our new method of additive synthesis by FFT⁻¹ [54] brings a solution to the three main difficulties of classical additive synthesis.
The processing time (calculation cost) can be divided by a large factor. It is easy
and inexpensive to introduce noise precisely, in any frequency band, narrow or wide,
and with any amplitude. Control is made easier by the use of spectral envelopes
instead of the time functions classically used for additive synthesis. Under the
name F*A*R [11], the company OberheimDigital has developed a real-time multi-timbral
instrument based on FFT⁻¹. This instrument has all the possibilities of present-day synthesizers, plus many
others such as precise modification of sampled sounds, and speech and singing voice
synthesis.
5. The recreation of the voice of a castrato: Farinelli
5.1. Introduction
The recreation of a castrato voice by means of sound analysis, processing and
synthesis has been done for a musical film about Farinelli, the famous castrato of
the eighteenth century [12]. The film, directed by Gérard Corbiau, and the CD of
the soundtrack, produced by AUVIDIS, bring back to life a repertoire which could not be sung
anymore. The musical consultant of the film, Marc David, has recovered unpublished scores
from the French National Library. Forty minutes of processed singing voice have been
produced with the high audio quality needed for a CD.
Castrati were generally well known for the special timbre of their voices. Their voices had not changed with puberty and, with maturity, castrati's
lung capacity, chest size, physical endurance and strength were generally greater
than those of normal males. Farinelli could sustain a note for longer than one minute
and could sing long phrases of more than two hundred notes without seeming to take
a breath. Their small and supple larynx, along with their short vocal cords, allowed
them to vocalise over three and a half octaves and to sing with great vocal flexibility: to sing large intervals rapidly, cascading scales and trills. All the more so as
castrati were selected from among the best child singers and trained very intensively.
The castrati's specific repertoire takes their high-level singing technique into account
and is therefore extremely difficult to sing. There are practically no recorded references:
the last castrato recorded less than one hour of singing voice on wax cylinders, and this historical recording has little technical utility due to its poor quality.
Nevertheless, we have taken into account the physical characteristics of the vocal
production system of the castrati, the global aesthetic of the historical recording
and descriptions found in the literature. The voice has also been designed according
to the wishes of the film and music producers.
5.2. Recording and editing of the voice
Two voices were chosen: a counter-tenor, Derek Lee Ragin, and a coloratura soprano,
Eva Godlevska, with similar and good baroque singing techniques. The recording was
made in the concert hall "L'Arsenal" in Metz, France, with the orchestra Les Talens Lyriques
conducted by Christophe Rousset. Due to artistic constraints, sound engineer
J.C. Gaberel was obliged to record voice and orchestra simultaneously, despite the
evident interest of a multitrack recording. One consequence is the presence of orchestra components only 20 to 30 dB below the average level of the singing voices.
This constraint obliged us to build very robust processing methods. The recording
was made on a Nagra IV-D machine with 20-bit precision. The remarkable editing,
sometimes note by note, was done by J.C. Gaberel on a Sonic Solutions machine.
5.3. Processing
First, as one of the artistic specifications was to make the final processed voice
sound close to the counter-tenor's, we modify the coloratura-soprano parts to match
the counter-tenor timbre. This voice morphing constitutes the main and critical step of the scheme. Secondly, we give the voice
a more juvenile aspect by using global modifications. For instance, we attenuate
some high-frequency bands to reduce the kind of breathiness found in Derek Lee Ragin's
voice. We also make the voice sound brighter by modifying the spectrum envelope. Vowel timbres
depend not only upon phonemes, but also upon pitch and intensity. Thus, a reference
database composed of all the phoneme-pitch-intensity combinations of the two voices
had to be set up.
Then the musical phrases to be processed must be segmented and labelled in terms
of singer, phoneme, pitch, power, and begin and end times. Precise fundamental frequency
estimation is made by the algorithm described in [55]. A first segmentation pass
is performed automatically on the fundamental frequency evolution by a method recently developed
in our team [56]; then the begin and end times and the labels of the vowels are adjusted
by hand in a second pass. Our voice morphing first consists in modifying the spectral envelope of the soprano voice to match that of the counter-tenor voice. This is achieved
by frequency-domain filtering (the phase vocoder S.V.P. [22]). As the scores were written
for castrati, most of the songs are high-pitched, and it is well known that in this case the frequency response of the vocal tract is poorly estimated. Voice
morphing would then not reach the target timbre, and the transformation could emphasize
some partials of the orchestra. One could imagine computing a Discrete Cepstrum envelope [57], but the coloratura soprano often changes the shape of her vocal
tract continuously when singing a cascade of notes on the same vowel. In addition, the tremolo
correlated with the vibrato makes spectral envelope estimation even more difficult.
Under such conditions, instantaneous spectrum envelopes become useless. In the middle-range
frequencies (2.5 to 5 kHz), the spectrum envelope shape remains constant in
time for a given vowel and note; its global amplitude is modulated, and this effect is
emphasized by the loudness. It follows that average spectrum envelopes are a good means
of coping with these fluctuations. In the upper-range frequencies (greater than 5 kHz),
the average level is perceptually more important than the precise shape of the spectrum. Therefore, we use the shape of the envelope weighted by a coefficient in order to
control the breathiness of the voice.
We build the filter frequency response in the low frequencies by using additive synthesis
parameters [47]. These parameters, the amplitudes and frequencies of the partials, are
used to impose on the processed sound the same relative amplitudes between harmonics
as those of the corresponding phoneme stored in the database. Moreover, the frequency
parameters are used to draw a frequency response which only acts in the vicinity
of each voice partial, in order to leave the partials of the orchestra unchanged. The
width of each active band of the frequency response is computed according to the frequency deviation
due to the vibrato within the temporal window used by the phase vocoder. The phase vocoder
represents the sound without any loss of information and allows the application of any precise frequency response.
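As an illustration of the kind of frequency response just described (a Python sketch under our own toy envelopes, not the S.V.P. code), the following fragment builds a correction filter equal to the target/source envelope ratio inside a band around each voice partial, and unity elsewhere:

    import numpy as np

    def voice_morph_response(bins_hz, env_f, src_db, tgt_db, partials_hz, bw_hz):
        """Gain = target/source envelope ratio, active only within bw_hz around
        each voice partial; elsewhere 0 dB, so orchestra components between
        the partials pass unchanged."""
        gain_db = np.interp(bins_hz, env_f, tgt_db) - np.interp(bins_hz, env_f, src_db)
        near = np.zeros(bins_hz.shape, dtype=bool)
        for p in partials_hz:
            near |= np.abs(bins_hz - p) <= bw_hz / 2        # active band per partial
        return np.where(near, 10 ** (gain_db / 20.0), 1.0)

    # toy usage on a 1024-bin grid at 44.1 kHz; the bandwidth would be set from
    # the vibrato-induced frequency deviation inside the analysis window
    bins = np.linspace(0, 22050, 1024)
    env_f = np.array([0, 1000, 2500, 5000, 10000, 22050], float)
    src = np.array([0, -8, -18, -30, -45, -60], float)      # source envelope (toy)
    tgt = np.array([0, -5, -12, -26, -40, -55], float)      # target envelope (toy)
    H = voice_morph_response(bins, env_f, src, tgt, 520.0 * np.arange(1, 20), bw_hz=60.0)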
6. Conclusion
The developments detailed in this presentation exemplify new possibilities in sound
analysis and synthesis. On one hand, theoretical results have been obtained, e.g.
in the domain of physical models. These results allow the construction of better
computer instruments for musicians, by improving the versatility and the control of these
instruments, and by offering new sonic possibilities such as chaotic sounds.
On the other hand, applications which were impractical a few years ago are now
not only possible in research centers but also made available to a large public.
This is the case for additive synthesis, revivified by the FFT⁻¹ method. Experience with implementations on affordable desktop workstations has led
to a real-time multi-timbral instrument based on FFT⁻¹. It has all the possibilities of present-day synthesizers, such as the sound quality
of sampling, plus many others such as precise and unlimited modification of recorded
sounds, and speech and singing voice synthesis. The recreation of a new singing voice,
which would have been considered out of reach until recently, has been made possible
by improved processing techniques, for instance precise sinusoidal partial analysis and frequency-domain
filtering. This result not only brings back to life
the castrato repertoire, which could not be sung anymore, but also reaches a public even
larger than amateur musicians. There is no doubt that other techniques will see
similar developments and success in the near future. Music will see a widespread
use of such computer-generated sounds and computer-assisted composition.
References
1.
"Modèles Physiques, Création Musicale et Ordinateurs
", Proceedings of the Colloquium on Physical Modeling, ACROE, Genoble,France, Oct.
1990, Editions de la Maison de Sciences de l'Homme, Paris, France, 1992.
2.
Keefe, D., "Physical Modeling of Wind Instruments", Computer Music Journal, MIT Press,
Vol 16 No. 4, pp. 57-73, Winter 1992.
3.
Cook, P., "A meta-wind-instrument physical model", Proc. International Computer Music
Conference, San Jose, pp. 273-276, Oct. 1992.
4.
Fletcher, N.H., Rossing, T.D., The Physics of Musical Instruments
, Springer Verlag, 1991.
5.
Risset, J.C., Mathews, M.V., "Analysis of musical-instrument tones", Physics Today,
22(2):23-30, Feb. 1969.
6.
Depalle, Ph., Garcia, G., Rodet, X., "Tracking of partials for additive sound synthesis
using hidden Markov models", IEEE ICASSP-93,
Minneapolis, Minnesota, Apr. 1993.
7.
Rodet, X., Depalle Ph., Poirot, G., "Speech Analysis and Synthesis Methods Based on
Spectral Envelopes and Voiced/Unvoiced Functions", European Conf. on Speech Technol.,
Edinburgh, U.K., Sept. 87.
8.
Depalle Ph., "Analyse, Modélisation et Synthèse des sons fondées sur le Modèle Source-Filtre",
Thèse de Doctorat de l'Université du Maine, Le Mans, Déc. 1991, 175p.
9.
Rodet, X., "Spectral Envelopes and Inverse FFT Synthesis", Proc. AES, San Francisco,
1992.
10.
Depalle, Ph., Rodet, X. "A new additive synthesis method using inverse Fourier transform
and spectral envelopes", Proc. of ICMC, San Jose, California, Oct. 1992.
11.
"F*A*R Fourier Analysis Resynthesis, Tecnology Dossier", OberheimDigital, 1994
12.
Depalle Ph., G. Garcia, Rodet, X., "A virtual castrato (?!)", Proc. of ICMC, Copenhagen
Oct. 1994.
13.
Bennett, G., Rodet, X., "Synthesis of the Singing Voice", in Current Directions in Computer Music Research
, ed. M.V. Mathews & J.R. Pierce, MIT Press, 1989.
14.
Rodet, X., "Analysis and Synthesis Models for Musical Applications", IEEE Workshop
on application of digital signal processing to audio and acoustics, Oct. 1989, New
Paltz, New-York, USA.
15.
Rodet, X., Depalle Ph., "Use of LPC Spectral Estimation for Analysis, Processing
and Synthesis", 1986 Workshop on Appl. of Digital Sig. Process. to Audio and Acoust.,
New-Paltz, New York, Sep. 1986
16.
Cointe P., Rodet X., "FORMES: an Object & Time Oriented System for Music Composition
and Synthesis", Conf. Rec. 1984 ACM Symp. on Lisp and Functional Programing, Austin,
Texas, Aug. 1984.
17.
Rodet, X., Barrière, J.B., Potard, Y., "The Chant Project: from the synthesis of
the sung voice to synthesis in general", Computer Music Journal, MIT Press, Fall 84.
18.
Rodet, X., Depalle Ph., "Synthesis by Rule: LPC Diphones and Calculation of Formant
Trajectories", IEEE-ICASSP, Tampa, Fl., March 85.
19.
Rodet, X., Depalle Ph., Poirot, G., "Diphone Sound Synthesis", Int. Computer Music
Conference, Koeln, RFA, Sept. 88.
20.
Depalle Ph., Rodet X., Poirot,G., "Energy and Articulation Rules for Improving Diphone
Speech Synthesis", Proc. ESCA Int. Conf. on Speech Synthesis, Autrans,France, Sept.
90.
21.
Depalle Ph., Rodet, X., T. Galas, G. Eckel "Generalized Diphone Control", Proc. of
ICMC, Tokyo Sept. 1993, pp. 184-187.
22.
Depalle Ph. and Poirot, G., "A modular system for analysis, processing and synthesis
of sound signals", Proc. of the Int. Comp. Music Conf., Montreal, Canada, 1991.
23.
Moulines, E., Laroche, J., "Non-parametric methods for pitch-scale and time-scale
modification of speech", Speech Communication 16 (1995) 175-205.
24.
Smith, J.O., "Efficient simulation of the reed-bore and bow-string mechanism", Proc
1986 Int. Computer Music Conf., P. Berg, eds., Computer Music Assoc., San Francisco,
pp. 275-280, 1986.
25.
Serra, X. "A system for sound analysis/transformation/synthesis based on a deterministic
plus stochastic decomposition", Philosophy Dissertation, Stanford University, Oct.
1989.
26.
Depalle, Ph., Rodet, X., Matignon, D., "State-Space Models for Sound Synthesis", IEEE
ASSP Workshop on Appl. of Digital Sig. Process. to Audio and Acoust., Mohonk, New
Paltz, New York, Nov. 1991.
27.
Matignon, D., Depalle, Ph., Rodet, X., "State space models for wind-instrument synthesis",
Proc. International Computer Music Conference, San Jose, pp. 273-276, Oct. 1992.
28.
Rodet, X., Depalle Ph., "Modèles de Signaux et Modèles Physiques d'Instruments",
Proc. of the Colloq. on Physical Modeling, Grenoble, France, October 1990, Editions
de la Maison de Sciences de l'Homme, Paris, France, 1992.
29.
McIntyre, M.E. et al., "On the Oscillations of Musical Instruments", JASA 74 (5),
Nov. 83
30.
Hale, J.K., "Dynamics and Delays", in Delay Differential Equations and Dynamical
Systems, Proc., 1990, S. Busenberg & M. Martelli (Eds.), Lecture Notes in Mathematics
1475, Springer Verlag, 1991.
31.
Madan, R.N., "Learning chaotic phenomena from Chua's circuit", Proc. 35th Midwest
Symp. on Circuits and Systems, Washington, D.C., August 9-12, 1992.
32.
Rodet, X., "Flexible yet Controllable Physical Models: a Nonlinear Dynamics Approach",
Rodet, X., Proc. Int. Computer Music Conference, Tokyo, 10-15 Sept. 1993.
33.
Rodet, X., "Stability/Instability of Periodic Solutions and Chaos in Phyical Models
of Musical Instruments", Proc Int. Computer Music Conference, Copenhaguen, Sept.
1994.
34.
Rodet, X., Depalle Ph., "A physical model of lips and trumpet", Proc. International
Computer Music Conference, San Jose, pp. 132-135, Oct. 1992.
35.
Ivanov, A.F., Sharkovsky, A.N., "Oscillations in Singularly Perturbed Delay Equations",
in Dynamics Reported
, C. Jones, U. Kirchgraber & H.O. Walther edit., Springer Verlag, pp. 164-224, 1992.
36.
A. I. Mees, "Dynamics of feedback systems
", Wiley, 1981.
37.
Chow, S. N., Green, D. Jr., "Stability, Multiplicity and Global Continuation of Symmetric
Periodic Solutions of a Nonlinear Volterra Integral Equation", Japan Journal of Applied
Mathematics, Vol. 2, No. 2, pp. 433-469, Dec. 85.
38.
Rodet, X., "Models of Musical Instruments from Chua's Circuit with Time Delay", IEEE
Trans. on Circ. and Syst., Special Issue on Chaos in nonlinear electronic circuits,
Sept. 1993.
39.
Chua, L. O., Lin, G.-N., "Canonical Realization of Chua's Circuit Family", IEEE trans.
Circuits & Syst., Vol. CAS-37 (July. 1990) No. 7, pp 885-902.
40.
Rodet, X., "Applications of Chua's Circuit to Sound, Music and Musical Instruments",
Proc. 1994 Int. Symp. on Nonlinear Theory and its Applications, Hawaii,
Dec. 1994.
41.
Verge, M.P., "Jet Oscillations and jet drive in recorder-like instruments", Acta
Acustica 2 (1994), pp 403-419.
42.
Rodet, X., "Basic structure and real-time implementation of J.M. Verge's flute model",
internal report, IRCAM, May 1995.
43.
Rodet, X., Steinecke, I., "One and two mass models oscillations for voice and instruments",
unpublished internal report, IRCAM, March 1994.
44.
Rodet, X., "One and two mass models oscillations for voice and instruments", to appear in
Proc. Int. Computer Music Conference, Banff, Canada, Sept. 1995.
45.
Sharkovsky, A.N., Maistrenko, Yu., Deregel, Ph., Chua, L.O., "Dry Turbulence from
a time-delayed Chua's Circuit", in J. of Crts., Syst. and Comp., Special Issue on
Chua's Circuit: a Paradigm for Chaos, Vol. 3, No. 2, June 1993.
46.
Freed, A., "Tools for Rapid Prototyping of Music Sound Synthesis Algorithms and Control
Strategies", Proc. Int. Comp. Music. Conf., San José, CA, USA, Oct. 1992
47.
García, G., "Analyse des signaux sonores en termes de partiels et de bruit. Extraction
automatique des trajets fréquentiels par des modèles de Markov cachés", Mémoire de
DEA en automatique et traitement de signal, Orsay, July 1992.
48.
Freed, A., Rodet, X., Depalle Ph., "Synthesis and Control of Hundreds of Sinusoidal
Partials on a Desktop Computer without Custom Hardware", Proc. ICMC, Tokyo, 1993.
49.
McAulay, R.J., Quatieri, T.F., "Speech analysis/synthesis based on a sinusoidal
representation", IEEE Trans. on Acoust., Speech and Signal Proc., vol ASSP-34, pp.
744-754, Aug 1986.
50.
Hennessy, J.L., Patterson, D.A., 1990, "Computer Architecture: A Quantitative Approach",
Morgan Kaufmann, Palo Alto, CA.
51.
Lee, E. A., "Programmable DSP Architectures", IEEE ASSP Magazine, October 1988
52.
Galas, T., Rodet, X., "A parametric Model of Speech Signals:Application to High Quality
Speech Synthesis by Spectral and Prosodic Modifications", Proc. ICSLP, Kobe, Japan,
1990, p.801-804.
53.
Galas, T., Rodet, X., "Generalized Functional Approximation for Source-Filter System
Modeling", Proc. Eurospeech, Genova, 1991, p.1085-1088.
54.
Freed, A., Goldstein, M., Goodwin, M., Lee, M., McMillen, K., Rodet, X., Wessel, D.,
Wright, M., "Real-Time Additive Synthesis Controlled by a Mixture of Neural-Networks
and Direct Manipulation of Physical and Perceptual Attributes", Proc Int. Computer
Music Conference, Copenhagen, Sept. 1994.
55.
Rodet, X., Doval, B., "Estimation of Fundamental Frequency of Musical Sound Signals",
IEEE ICASSP, May 1991, Toronto.
56.
Cerveau, L., "Segmentation de phrases musicales à partir de la fréquence fondamentale",
Rapport de DEA ATIAM, IRCAM, June 1994.
57.
Galas T., Rodet X., "A new power spectrum estimation method: applications to acoustic
signals", IEEE Workshop on Appl. of Digit. Sig. Process. to Audio & Acoust., Oct.
1989, New Paltz, New-York, USA.