Server © IRCAM - CENTRE POMPIDOU 1996-2005.
All rights reserved for all countries.
Macintosh Graphical Interface and Improvements to Generalized Diphone Control and Synthesis
Xavier Rodet, Adrien Lefevre
Proceedings of the International Computer Music Conference, Hong Kong, 1996
Copyright © Ircam - Centre Georges-Pompidou 1996
Generalized diphone control is a powerful means of building a musical phrase
from dictionaries of analysed sound units by concatenating and articulating
them. The Diphone program, developed at IRCAM for diphone control, has been
extended to handle fundamental frequency, noise components, spectral
envelopes and parallel sequences. A new graphical user interface on Macintosh
is presented. It communicates with Diphone through a simple command language.
It supports managing dictionaries, building and articulating
parallel sequences of diphones, and editing parameter values.
Generalized diphone control has been presented in [Rodet 93]. A spoken sentence
can be modelled as a succession of transient sounds (called diphones). Any
sentence can be reconstructed from a dictionary of diphones. In order that
duration and fundamental frequency can be modified, an analysis-synthesis
model, such as source-filter [Rodet 88], is used. Therefore, it is not the
diphone sound signal itself which is stored in the dictionary, but the
corresponding control signals of the analysis-synthesis model. A
sentence is obtained by concatenation of control signals of a sequence of
diphones. In [Rodet 93] we have extended the concept of diphone control to
musical sounds in general. A dictionary can include any segment of sound
considered as an atom for the intended musical use. Such an atom, in a
representation such as the additive or the source-filter representation, is
called a segment. Diphone control does not rely on a particular
synthesis technique. We have focused on the additive model, i.e. a sum of
sinusoidal partials with time-varying frequencies and amplitudes defined at
frame times. A segment is a data structure containing, in the
additive case, the frame times and the associated parameters for a
segment of sound. A musical phrase is obtained by concatenating basic segments
to produce a new segment from which a sound signal is computed by the additive
synthesizer (Fig. 2). In [Rodet 93], we have detailed the Diphone
program written at Ircam for diphone control and synthesis. We now present
recent improvements to diphone control and synthesis, and a Graphical User
Interface with novel characteristics, both from the conceptual and the
implementation point of view.
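The segment structure and its concatenation in the additive case can be illustrated with a minimal sketch. This is not the Diphone implementation: the names `Frame`, `Segment`, `concatenate` and `synthesize` are hypothetical, every frame is assumed to carry the same number of partials, and the oscillator bank with linear interpolation between frames is only one simple way to realize additive synthesis.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    time: float          # frame time in seconds
    freqs: List[float]   # partial frequencies in Hz
    amps: List[float]    # partial linear amplitudes


@dataclass
class Segment:
    frames: List[Frame]

    @property
    def duration(self) -> float:
        return self.frames[-1].time - self.frames[0].time


def concatenate(a: Segment, b: Segment, gap: float) -> Segment:
    """Append b after a; `gap` is the time between a's last and b's first frame."""
    offset = a.frames[-1].time + gap - b.frames[0].time
    shifted = [Frame(f.time + offset, f.freqs, f.amps) for f in b.frames]
    return Segment(a.frames + shifted)


def synthesize(seg: Segment, sample_rate: int = 8000) -> List[float]:
    """Sum of sinusoids; parameters are interpolated linearly between frames."""
    n_partials = len(seg.frames[0].freqs)
    phases = [0.0] * n_partials
    out: List[float] = []
    for f0, f1 in zip(seg.frames, seg.frames[1:]):
        n = int(round((f1.time - f0.time) * sample_rate))
        for i in range(n):
            x = i / n  # position between the two frames, in [0, 1)
            sample = 0.0
            for k in range(n_partials):
                freq = f0.freqs[k] + x * (f1.freqs[k] - f0.freqs[k])
                amp = f0.amps[k] + x * (f1.amps[k] - f0.amps[k])
                sample += amp * math.sin(phases[k])
                phases[k] += 2.0 * math.pi * freq / sample_rate
            out.append(sample)
    return out
```

In this sketch the gap between two concatenated segments is filled by the same linear interpolation that is used between frames, which gives a crude form of articulation between the two segments.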
A segment must also contain its original (time-varying) fundamental
frequency, say F0(t), since a desired fundamental frequency trajectory, say
G0(t), is obtained by applying the transposition factor G0(t)/F0(t) to the
sinusoidal partial frequencies (Fig. 2). For music written in terms of notes
with well defined pitch, the segment paradigm applies nicely to each segment
with constant written pitch. Since notes can be largely independent of phones
or other timbre attributes, a musical phrase has to be defined by two sequences
of segments, one for the pitch, the other for timbre attributes such as phones.
We say that these two sequences are parallel (Fig. 3) since neither
should necessarily impose its metric structure on the other. Vibrato, by
contrast, tends to be synchronized with notes. Consequently, vibrato frequency
and excursion can be defined by the same segments as pitch or by sub-segments
of pitch segments. Finally, G0(t) is computed by applying the defined vibrato
to the pitch values given by the pitch segments. Other articulations, such as
portamento or loudness, are implemented in the same way.
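The transposition by G0(t)/F0(t) can be sketched as follows. The function names are hypothetical, and the sinusoidal vibrato with a relative excursion is an assumption made for illustration; the text only states that vibrato frequency and excursion are applied to the pitch values.

```python
import math
from typing import List


def target_f0(pitch_hz: float, t: float,
              vib_rate_hz: float, vib_excursion: float) -> float:
    """G0(t): written pitch modulated by an assumed sinusoidal vibrato.

    vib_excursion is a relative depth, e.g. 0.03 for +/- 3 %.
    """
    return pitch_hz * (1.0 + vib_excursion
                       * math.sin(2.0 * math.pi * vib_rate_hz * t))


def transpose_partials(freqs: List[float], f0: float, g0: float) -> List[float]:
    """Apply the transposition factor G0/F0 to the partial frequencies of one frame."""
    ratio = g0 / f0
    return [f * ratio for f in freqs]
```

For example, `transpose_partials([220.0, 440.0, 660.0], 220.0, 330.0)` scales every partial by 1.5, so inharmonic deviations present in the analysed segment are preserved in proportion.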
Random components of sounds, like flute noise or voice fricatives, are not
correctly represented as sinusoids with parameters recorded in a segment, but
can be represented as white noise filtered through a time-varying spectral
envelope. To be used at the synthesis stage, the values of the noise spectral
envelope at all frame times are also stored in a segment (Fig. 2). In a
preliminary stage, dictionaries of diphones, i.e. of segments, have to
be constituted (Fig. 1). First, an additive+noise analysis [Depalle 93] is
performed on the sound recordings. Secondly, the analysis data are segmented
according to the segment time limits chosen by the musician. Finally the
segments are stored in dictionaries.
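The segmentation step can be sketched as below, assuming the analysis output is a list of (time, parameters) frames and the musician's time limits are given as named (start, end) pairs. The name `cut_segments` and this data layout are hypothetical; the actual Diphone dictionary format is not described here.

```python
from typing import Dict, List, Tuple

# One analysis frame: (time in seconds, parameters of that frame).
AnalysisFrame = Tuple[float, dict]


def cut_segments(frames: List[AnalysisFrame],
                 limits: Dict[str, Tuple[float, float]]
                 ) -> Dict[str, List[AnalysisFrame]]:
    """Cut analysis data into named segments with frame times rebased to 0.

    A frame belongs to a segment when start <= time < end.
    """
    dictionary: Dict[str, List[AnalysisFrame]] = {}
    for name, (start, end) in limits.items():
        dictionary[name] = [(t - start, params)
                            for t, params in frames
                            if start <= t < end]
    return dictionary
```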
Another improvement brought to the Diphone program is the use of spectral
envelopes for sinusoidal partials as well. A source-filter model is well suited
for certain classes of sounds such as the voice. In this case, sinusoidal
partial amplitudes are determined by the value of the spectral envelope at the
frequencies of the partials. When a segment is synthesized, these
amplitudes have to be recomputed whenever the fundamental frequency is modified.
Therefore, the spectral envelopes of sinusoidal partials also have to be stored
in a segment and used at the synthesis stage to compute the amplitudes of the
partials (Fig. 2).
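Recomputing partial amplitudes from a stored spectral envelope can be sketched as follows. Two assumptions are made for illustration only: the envelope of a frame is stored as sorted (frequency, amplitude) breakpoints with linear interpolation between them, and the partials are exactly harmonic at multiples of the new fundamental. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Envelope = List[Tuple[float, float]]  # sorted (frequency in Hz, amplitude)


def envelope_at(env: Envelope, freq: float) -> float:
    """Linear interpolation of the envelope, clamped at both ends."""
    xs = [x for x, _ in env]
    i = bisect_right(xs, freq)
    if i == 0:
        return env[0][1]
    if i == len(env):
        return env[-1][1]
    (x0, y0), (x1, y1) = env[i - 1], env[i]
    return y0 + (y1 - y0) * (freq - x0) / (x1 - x0)


def partial_amplitudes(env: Envelope, g0: float, n_partials: int) -> List[float]:
    """Amplitudes of harmonic partials k * g0 read from the envelope."""
    return [envelope_at(env, k * g0) for k in range(1, n_partials + 1)]
```

When the fundamental frequency changes, the partials move along the frequency axis while the envelope stays fixed, which is why the envelope rather than the original amplitudes has to be kept in the segment.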
Fig. 1: Analysis data are segmented into segments stored in dictionaries
Finally, the Diphone concatenation and articulation program is given a textual
interface in the form of a simple Command Language giving access to all its
facilities. In this way, Diphone can easily be used and tested separately and is
totally independent of any GUI, which is usually platform dependent.
Fig. 2: Concatenation and synthesis of a sequence of segments
3. A Macintosh Graphical User Interface
A Macintosh Graphical User Interface (GUI), named MacDiph, has been built to
provide easy access to the Diphone program. It has been tested on PowerPC and
Macintosh-68K platforms. It is written in C++, compiled with
Metrowerks-CodeWarrior IDE 1.4, and built on the Metrowerks-PowerPlant set of
classes. From these classes, two groups of classes have been derived. They are
not specific to MacDiph but designed for general graphic programming. The first
group is aimed at displaying and editing graphs and objects, such as diphone
sequences. The second group of classes is aimed at displaying and editing tree
structures, such as dictionaries of diphones and their constituents, i.e.
instruments, composite segments and basic segments. Finally, a break-point
function editor is being built for the control-signals contained in segments.
The different tools provided by these classes are, as much as possible,
compliant with the Macintosh Human Interface Guidelines (Inside
Macintosh). In particular, they offer copy, cut and paste, as well as
drag-and-drop facilities, and follow WYSIWYG guidelines.
We have taken care to separate the GUI from the Diphone program itself. MacDiph
communicates with Diphone by using the Command Language mentioned above. This
also makes it possible to run Diphone on a fast Unix platform and
drive it, through the network, from MacDiph running on a relatively slow
Macintosh. This connection is implemented with sockets. Segments can also be
read and written by MacDiph for display and editing. Since segments can be huge
data structures and, unlike the Command Language, cannot be handled directly
by users, this is done through binary streams.
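To illustrate why a binary stream suits segment data, the sketch below packs and unpacks a segment with a purely hypothetical layout: a little-endian header with two 32-bit counts (frames, partials), followed for each frame by the frame time, the partial frequencies and the partial amplitudes as 64-bit floats. This is not the format used between MacDiph and Diphone, which is not specified here.

```python
import struct
from typing import List, Tuple

# One frame: (time, partial frequencies, partial amplitudes).
FrameT = Tuple[float, List[float], List[float]]

HEADER = "<II"  # hypothetical header: number of frames, number of partials


def pack_segment(frames: List[FrameT]) -> bytes:
    """Serialize frames; all frames must have the same number of partials."""
    n_partials = len(frames[0][1]) if frames else 0
    chunks = [struct.pack(HEADER, len(frames), n_partials)]
    for time, freqs, amps in frames:
        chunks.append(struct.pack(f"<{1 + 2 * n_partials}d",
                                  time, *freqs, *amps))
    return b"".join(chunks)


def unpack_segment(data: bytes) -> List[FrameT]:
    """Inverse of pack_segment."""
    n_frames, n_partials = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    fmt = f"<{1 + 2 * n_partials}d"
    size = struct.calcsize(fmt)
    frames: List[FrameT] = []
    for _ in range(n_frames):
        values = struct.unpack_from(fmt, data, offset)
        frames.append((values[0],
                       list(values[1:1 + n_partials]),
                       list(values[1 + n_partials:])))
        offset += size
    return frames
```

The resulting bytes could be written to a file or sent over a socket unchanged; a fixed byte order keeps the stream portable between the Macintosh and the Unix machine.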
MacDiph provides the usual database functions for a set of diphone
dictionaries, i.e. browsing through different dictionaries, displaying their
content, modifying them, selecting instruments and segments, building new
dictionaries and saving them (Fig. 3).
The drag-and-drop paradigm is used to move segments between dictionaries and
to build various parallel sequences. A sequence can contain basic segments and
sub-sequences. Segments and sequences are represented as graphical objects
(Fig. 4) directly displaying their characteristics, i.e. duration, center,
interpolation portion between successive segments, loudness and articulation
speed. Click-and-drag allows these characteristics to be changed easily in a
WYSIWYG style.
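The nesting of basic segments and sub-sequences can be sketched as a small tree structure. The class and field names are hypothetical and only a few of the listed characteristics are modelled; in particular, the interpolation portion is carried as data but, as a simplifying assumption, does not shorten the total duration here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class BasicSegment:
    name: str
    duration: float             # seconds
    interpolation: float = 0.0  # interpolation portion toward the next segment
    loudness: float = 1.0


@dataclass
class Sequence:
    items: List[Union["BasicSegment", "Sequence"]] = field(default_factory=list)

    def flatten(self) -> List[BasicSegment]:
        """Basic segments in playing order, sub-sequences expanded."""
        out: List[BasicSegment] = []
        for item in self.items:
            if isinstance(item, Sequence):
                out.extend(item.flatten())
            else:
                out.append(item)
        return out

    @property
    def duration(self) -> float:
        return sum(s.duration for s in self.flatten())


def schedule(seq: Sequence) -> List[Tuple[str, float]]:
    """Start time of every basic segment, placed end to end."""
    t = 0.0
    starts: List[Tuple[str, float]] = []
    for s in seq.flatten():
        starts.append((s.name, t))
        t += s.duration
    return starts
```

Two parallel sequences, one for pitch and one for timbre, would simply be two such trees scheduled independently over the same time axis.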
Fig. 3: Two parallel sequences and Fundamental frequency evolution
Parameter evolution, such as sinusoidal partial frequency or fundamental
frequency evolution, as stored in segments or as computed from a sequence, can
be displayed and edited in graphical windows (Fig. 5) placed under the sequence
windows for easy visualisation of synchrony. Modify, cut, copy and paste are
fully supported on sequences and on parameter evolution.
Fig. 4: Management of a dictionary of instruments, segments and basic segments
MacDiph and Diphone constitute a promising tool for musicians. On the one hand,
diphone control offers new possibilities for precise and powerful control of
synthesis, which could not be obtained in any other way. The ability to build and
articulate complicated sequences of segments from diverse origins appears to be
an attractive feature. On the other hand, MacDiph implements a new direct
representation and handling of segments in terms of intuitive graphical
objects. In contrast to discrete values, such as notes, the control of
continuous quantities [Rodet 84] has always been a difficulty in computer
music. MacDiph and Diphone should bring some help in that domain by
establishing a close connection between discrete and continuous
[Rodet 84] X. Rodet, P. Cointe, Formes: Composition and Scheduling of
Processes, Computer Music Journal, MIT Press, Vol. 8, No. 3, Fall 1984.
[Rodet 85] X. Rodet, P. Depalle, Synthesis by Rule: LPC Diphones and
Calculation of Formant Trajectories, IEEE-ICASSP, Tampa, Fl., March 1985.
[Rodet 88] X. Rodet, P. Depalle, G. Poirot, Diphone Sound Synthesis,
Int. Computer Music Conference, Koeln, FRG, Sept. 1988.
[Depalle 93] P. Depalle, G. García, X. Rodet, Tracking of
partials for additive sound synthesis using hidden Markov models, IEEE
ICASSP-93, Minneapolis, Minn., Apr. 1993.