IRCAM - Centre Pompidou
Server © IRCAM - Centre Pompidou 1996-2005.
Tous droits réservés pour tous pays. All rights reserved.

Context Effects in Timbre Space

Sophie Donnadieu, Stephen McAdams, Suzanne Winsberg

ICMPC 94, Liège, 1994
Copyright © ICMPC 1994


Multidimensional analysis techniques attempt to account for the mental representation of timbre in terms of a "timbre space" in which the distance separating sound objects corresponds to their degree of perceptual dissimilarity. In a first experiment, a four-dimensional space was found using a multidimensional analysis with the program CLASCAL. Nevertheless, this kind of model is often criticized for not taking into account certain cognitive effects that might influence the subjective judgments on which it is based. In a second experiment, such effects were evaluated with respect to the space obtained previously. We were also able to verify qualitatively the models of Tversky (1972) and Sjöberg (1972), according to which similarity is defined, respectively, as a function of the match between common and distinctive features of two compared objects or by reference to a common class and a universe of classes. Three sets of nine synthesized timbres, each varying little along one of the axes of the original space, were constructed. Four timbres were shared by each pair of sets. A fourth set, a superset containing all 15 timbres, was also presented. With a geometric analysis we observed that for each of the "reduced" sets, the minimization of variation along one dimension resulted in its disappearance or attenuation in the resulting timbre space, indicating that the reduced variation that remained was virtually ignored compared to the superset space. These effects are consistent with the predictions of the classificatory models of Tversky and Sjöberg.


Numerous debates on timbre point to the difficulty of defining it. The American Standards Association proposes: "that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar". Further, timbre is a multidimensional attribute of auditory sensation and cannot be measured on a single continuum (soft-loud or low-high). The problem has thus been to discover the number of dimensions that it possesses, as well as their psychophysical nature. Multidimensional scaling (MDS) techniques have been fruitful in the study of perceptual relations among timbres (Grey, 1977). These geometric models determine a Euclidean space (in an appropriate number of dimensions) within which different timbres can be arranged such that the distances separating them correspond as closely as possible to listeners' judgments of their relative dissimilarities. This representation is called a "timbre space" and its axes are interpreted as the perceptual dimensions of timbre. Nevertheless, timbre could be defined not only by a certain number of continuous dimensions but also by discrete features that may be unique to each class of sound event. According to the hybrid model EXSCAL (Winsberg & Carroll, 1988), a specificity value representing the presence of such features can be added to the Euclidean distance separating two stimuli in a space of common dimensions. Krumhansl (1989) applied this model to the analysis of a three-dimensional timbre space for a set of synthetic timbres that simulated traditional instruments as well as hybrids between them. The three axes appeared qualitatively to correspond to attack quality (temporal envelope), to spectral flux (the evolution of the spectral envelope over time), and to brightness (spectral envelope). In addition, some timbres had an associated "specificity" weight indicating the presence of other features affecting their perception.
Recent acoustic analyses have determined the acoustic parameters underlying the common perceptual dimensions of this space (Krimphoff et al., 1994). The "brightness" dimension is highly correlated (r=0.94) with the spectral center of gravity. The "attack quality" dimension is strongly correlated (r=0.94) with the log of the attack time. The putative "spectral flux" dimension turns out to be poorly correlated with a variety of measures of spectral evolution, but is well correlated with the spectral fine structure measured either as the rms deviation of the component amplitudes from a global spectral envelope (r=0.85) or as the ratio between the amplitudes of even and odd harmonics (r=0.71).
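The extended Euclidean model mentioned above can be illustrated with a minimal sketch. It assumes the form in which the specificities of the two stimuli are added to the squared distance on the common dimensions before taking the square root; the text above says only that a specificity value is added, so this exact form, the function name, and the coordinates are assumptions made for illustration.

```python
import math

def extended_distance(x_i, x_j, s_i=0.0, s_j=0.0):
    """Distance between two timbres under an extended Euclidean model.

    x_i, x_j: coordinates on the common dimensions.
    s_i, s_j: non-negative specificities (0 means no unique features).
    Assumed form: sqrt(sum of squared coordinate differences + s_i + s_j).
    """
    common = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.sqrt(common + s_i + s_j)

# Hypothetical coordinates on three common dimensions; not data from the study.
timbre_a = (0.0, 0.0, 0.0)
timbre_b = (3.0, 4.0, 0.0)

d_plain = extended_distance(timbre_a, timbre_b)               # common dimensions only
d_specific = extended_distance(timbre_a, timbre_b, s_j=11.0)  # timbre_b has a unique feature
```

With both specificities at zero the model reduces to an ordinary Euclidean distance; a non-zero specificity pushes a timbre away from all others without moving it along any shared axis.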

Some authors, however, consider that the application of geometric models to dissimilarity data is not appropriate, in the sense that such models cannot account for certain context effects that might influence dissimilarity judgments (Tversky, 1977; Sjöberg, 1979). Tversky and Sjöberg propose classificatory models based on a conception of the similarity relation that is very different from that in geometric models. According to Tversky, the degree of similarity is defined by a matching function between the common and distinctive features of two compared objects. For Sjöberg, similarity is determined by reference to a common class and a universe, which is in turn defined by a set of mutually exclusive classes. In both conceptions, the extension of a stimulus context by the introduction of a new, different stimulus would result in an increase in the similarities among the objects that formed part of the original, "reduced" stimulus set. Tversky claims that features shared by all the objects in the original set acquire a greater "diagnosticity value" in the extended context, since they are shared only by the subset of objects drawn from the reduced context and not by the new objects in the extended context. Sjöberg proposes that the new objects may not be integrated into the class of objects from the original set, which would have the effect of expanding the "universe" by the creation of new classes. He showed that the expansion of the universe results in an increase in judged similarity among objects belonging to a common class. No research to date has demonstrated such effects in a multidimensional space according to the geometric approach.
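The feature-matching account and its predicted context effect can be sketched as follows. The similarity function uses the usual contrast form (weighted common features minus weighted distinctive features). The diagnosticity rule is a deliberately crude assumption for illustration only: a feature shared by every object in the current context gets weight 0, any other feature gets weight 1. The instruments, feature names, and parameter values are hypothetical.

```python
def diagnostic_weights(context):
    """Toy rule (assumption): a feature shared by every object in the
    context carries no weight; every other feature has weight 1."""
    all_features = set().union(*context.values())
    shared_by_all = set.intersection(*context.values())
    return {f: (0.0 if f in shared_by_all else 1.0) for f in all_features}

def contrast_similarity(a, b, weights, theta=1.0, alpha=0.5, beta=0.5):
    """Weighted common features minus weighted distinctive features."""
    def salience(features):
        return sum(weights[f] for f in features)
    return theta * salience(a & b) - alpha * salience(a - b) - beta * salience(b - a)

# Hypothetical feature sets; not the stimuli of the study.
reduced = {
    "flute": {"soft_attack", "breathy"},
    "oboe": {"soft_attack", "nasal"},
}
extended = dict(reduced, guitar={"sharp_attack", "plucked"})

sim_reduced = contrast_similarity(
    reduced["flute"], reduced["oboe"], diagnostic_weights(reduced))
sim_extended = contrast_similarity(
    extended["flute"], extended["oboe"], diagnostic_weights(extended))
```

In the reduced context "soft_attack" is shared by every object and contributes nothing. Once the guitar is added it becomes diagnostic, so the flute and oboe come out as more similar, which is the direction of effect described above.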

The present study investigated the possibility of such effects in the context of a geometric space. The idea is to test for these effects in a multidimensional timbre space by evaluating the influence of context changes on dissimilarity judgments between timbres. These effects should be reflected in changes in the metric distances between timbres in the geometric space as a function of the kind of contextual change. The timbre space upon which this study is based was determined by Donnadieu (1992) using 18 of the synthetic timbres employed by Krumhansl (1989), of which 12 were imitations of traditional instruments and six were hybrids. The pitch, loudness, and subjective duration of all sounds were equalized. The musical training of the 58 subjects participating in that study varied from nonmusician to professional musician. The MDS analysis used the CLASCAL model developed by Winsberg & De Soete (1993), which allows for weights on both the common dimensions and the specificities for latent classes of subjects. This analysis revealed a four-dimensional space without specificities and a single latent class of subjects. The first, second, and fourth dimensions correlated significantly with the dimensions of Krumhansl's space (attack, brightness, and spectral flux, respectively). The first and third dimensions of this space, however, were also significantly correlated. Based on this space, four contexts were constructed. Each of the three "reduced" contexts comprised a set of timbres varying little on one of the four dimensions. Each of these contexts shared four timbres with each of the other reduced contexts. The "extended" context comprised the full set of 15 timbres appearing in the reduced contexts. According to the classificatory models, the dissimilarities between the four timbres shared by each pair of reduced contexts should be greater in the reduced contexts than in the extended context.
We were also interested in knowing whether a change in context influences the dissimilarity between timbres along a single dimension (the dimension along which variation was minimized) or along several dimensions. Another hypothesis is that the minimization of variation along one of the four dimensions may cause subjects to no longer use that dimension in their evaluations of dissimilarity. According to this hypothesis, the reduced contexts would be best modeled by three-dimensional solutions and the extended context by a four-dimensional solution.


Stimuli. Fifteen timbres played on a Yamaha TX802 FM synthesizer were drawn from the 18 studied by Donnadieu (1992). Twelve were imitations of traditional instruments and three were hybrids. Pitch, loudness, and subjective duration were equalized. Three sets of nine timbres (A, B, C), each varying little on one dimension of the original space (1, 2, and 4, respectively), were selected such that each set shared four timbres with each other set (Fig. 1). A fourth set, the superset (D), comprised all 15 timbres appearing in the reduced contexts.
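The counts in this design can be checked with a short sketch. It reads "shared four timbres" as exactly four per pair of sets; under that reading, inclusion-exclusion (27 - 12 + t = 15) implies that no timbre belongs to all three sets. The integer labels 0 to 14 are hypothetical stand-ins; the actual assignment of timbres to sets is the one in Fig. 1.

```python
from itertools import combinations
from math import comb

# Hypothetical timbre labels 0..14; the real assignment is shown in Fig. 1.
shared_ab = set(range(0, 4))    # four timbres common to A and B
shared_ac = set(range(4, 8))    # four timbres common to A and C
shared_bc = set(range(8, 12))   # four timbres common to B and C

set_a = shared_ab | shared_ac | {12}   # one timbre unique to A
set_b = shared_ab | shared_bc | {13}   # one timbre unique to B
set_c = shared_ac | shared_bc | {14}   # one timbre unique to C
superset_d = set_a | set_b | set_c

sizes = [len(s) for s in (set_a, set_b, set_c, superset_d)]
overlaps = [len(x & y) for x, y in combinations((set_a, set_b, set_c), 2)]

pairs_shared = comb(4, 2)      # pairs among four shared timbres
pairs_reduced = comb(9, 2)     # pairs within one reduced context
pairs_superset = comb(15, 2)   # pairs within the superset
```

The pair counts are the ones that reappear in the analyses: six pairs among the four timbres shared by two reduced contexts, and 36 pairs among the nine timbres of a reduced context.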

Figure 1. Timbre sets for the "reduced" contexts extracted from the four-dimensional space of Donnadieu (1992).

Procedure. The stimuli were presented diotically over a Sony Monitor K240 headset. Subjects were seated in a sound-treated room in front of the computer screen. After the subject had read the instructions, the timbres composing the context to be tested were presented in random order to give a sense of the range of possible variation. Fifteen practice trials were chosen from the test context, including pairs for which the timbres had varying distances separating them in the original four-dimensional space. On each trial, a pair of timbres was presented and the subjects were asked to judge the degree of dissimilarity on a scale from 1 (very similar) to 9 (very dissimilar). The order of presentation of the sounds in the pair and the presentation order of the pairs were randomized. Each pair was presented only once in a given block. Two blocks for each context were presented within each test session. All subjects participated in four sessions corresponding to the four contexts. The presentation order of contexts A, B, and C was counterbalanced across subjects. Context D was always presented last. A mean interval of 11 days separated successive sessions in order to reduce the possible influence of the previous context on judgments for the current context. Twenty-seven subjects participated in the experiment (15 nonmusicians and 12 musicians).
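The trial structure described in the procedure can be sketched as below: each block contains every pair once, with both the order of the pairs and the order of the two sounds within a pair randomized. The function name, the integer labels, and the fixed seed are assumptions for illustration; the software actually used in the experiment is not described in the text.

```python
import random
from itertools import combinations

def make_block(timbres, rng):
    """Return one block: every unordered pair once, in random order,
    with the within-pair presentation order also randomized."""
    trials = []
    for first, second in combinations(timbres, 2):
        if rng.random() < 0.5:
            first, second = second, first
        trials.append((first, second))
    rng.shuffle(trials)
    return trials

rng = random.Random(0)            # fixed seed only to make the sketch reproducible
context = list(range(9))          # hypothetical labels for a nine-timbre context
session = [make_block(context, rng) for _ in range(2)]   # two blocks per session
```

For a nine-timbre context this gives 36 trials per block and 72 per session; the 15-timbre context would give 105 trials per block.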


Analysis of variance. The first hypothesis concerned the effect of context on dissimilarity judgments. If subjects judge the six pairs of timbres common to each pair of reduced contexts and to the extended context differently according to the context within which they are presented, a significant interaction between pairs and contexts should be observed, indicating differential effects of the perceptual dimensions. The second hypothesis concerned the nature of the context effect: dissimilarity judgments should be generally higher in the reduced contexts than in the extended context, which would result in a significant effect of context. Three repeated measures ANOVAs were performed on factors Pair (6) and Context (3) for contexts A, B, and D, for A, C, and D, and for B, C, and D. The dependent variable was the summed dissimilarity judgments across the two presentation blocks for each pair of timbres in each context. For contexts A/B/D, no significant effect of Context was observed, suggesting a lack of global effect of context. However, a significant Pair X Context interaction (F(2,50)=2.8; p=.01) was found. For the other combinations of contexts (A/C/D and B/C/D), neither the effect of context nor the interaction was significant.

Repeated measures ANOVAs were also performed on the dissimilarity judgments for all pairs among the 9 timbres common to a given reduced context and the extended context (Pair (36) X Context (2)). A significant effect of context was only observed for the A/D set (F(1,25)=6.1; p=.02): dissimilarities were higher in the A context than in the D context. A significant interaction between pair and context was found for all three sets (A/D: F(35,875)=1.4; p=.05; B/D: F(35,875)=1.9; p=.001; C/D: F(35,875)=2.5; p<.0001). These interactions show that the effect of context depends on the pair of timbres being judged and may indicate differential effects of the underlying dimensions.
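The degrees of freedom reported for the Pair (36) X Context (2) analyses can be related to the design with a small sketch. It assumes a fully within-subject two-factor ANOVA with no sphericity correction, which the text does not state explicitly. The subject count of 26 is simply the value that reproduces the reported degrees of freedom; the text lists 27 participants and does not explain the difference.

```python
def within_subject_df(levels_a, levels_b, n_subjects):
    """(effect df, error df) for each term of a two-factor,
    fully within-subject ANOVA without sphericity correction."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    df_subjects = n_subjects - 1
    return {
        "A": (df_a, df_a * df_subjects),
        "B": (df_b, df_b * df_subjects),
        "AxB": (df_ab, df_ab * df_subjects),
    }

# Pair has 36 levels, Context has 2; n_subjects=26 is an inferred value.
df = within_subject_df(levels_a=36, levels_b=2, n_subjects=26)
```

Here `df["B"]` corresponds to the Context effect and `df["AxB"]` to the Pair X Context interaction.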

Multidimensional scaling with CLASCAL. According to the third hypothesis, we should observe the disappearance of one dimension in the reduced contexts. The MDS analyses for each of the three reduced contexts yielded spaces in three dimensions. A four-dimensional space was found for the extended context. These results support the hypothesis.

Correlations among the dimensions of the different spaces. In order to verify that the missing dimensions in the reduced contexts corresponded to the dimensions on which the variation was minimized, correlations were computed between the positions of the timbres along each dimension of the extended and reduced contexts. According to the disappearance hypothesis, the dimensions of A should not correlate significantly with dimension 1 of the extended context. The same should hold for context B and dimension 2 of context D, as well as for context C and dimension 4 of context D. The dimensions of A correlated significantly with dimensions 2, 3, and 4 of D. The dimensions of B correlated most highly with dimensions 1, 3, and 4 of D, though B's first dimension also correlated at -0.67 with D's dimension 2. The dimensions of C correlated most highly with dimensions 1, 2, and 3, though C's dimension 2 also correlated at -0.60 with D's dimension 4. We may conclude from these results that dimension 1 of context D disappeared in context A, but that the variance along dimensions 2 and 4 in context D was only attenuated in contexts B and C, respectively.
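The comparison of dimensions across spaces amounts to a table of Pearson correlations between coordinate vectors, which can be sketched as follows. The function names and the coordinates are hypothetical; the study's own coordinates are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length coordinate lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(reduced_space, extended_space):
    """Correlate every reduced-context dimension with every extended-context one.

    Each space is a list of dimensions; each dimension lists the coordinates
    of the same timbres in the same order.
    """
    return [[pearson_r(r_dim, e_dim) for e_dim in extended_space]
            for r_dim in reduced_space]

# Hypothetical coordinates for four timbres (not data from the study).
reduced_space = [[1.0, 2.0, 3.0, 4.0]]
extended_space = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
table = correlation_table(reduced_space, extended_space)
```

Because the direction of an axis in such a space is arbitrary, a strongly negative coefficient indicates the same kind of correspondence as a strongly positive one; it is the magnitude that matters when matching dimensions.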

Analysis of context effects on the multidimensional structures. In order to study the effects of context manipulation on the multidimensional structures, we examined the Euclidean distances between each of the six pairs of timbres common to a pair of reduced contexts and the extended context. The goal was to see if a change in context created by minimizing variation along a given dimension affected the distances between timbres along this dimension or along other dimensions of the space as well. This comparison requires a good correlation between the dissimilarity judgments and the Euclidean distances. The analysis reveals that all of the correlations were significant at a criterion of p=0.01 except two that were significant at p=0.05. The mean explained variance (r²) was 91.5%. The structure of relations between the four common timbres in their three possible contexts was analyzed qualitatively. In each case one of the four timbres served as a reference point for lining up the structures in the different contexts for visual comparison. In general, the distances between the four timbres varied along the different dimensions and not systematically along the minimized dimension. Furthermore, the distance separating the timbres along the minimized dimension was not systematically greater in context D than in the reduced contexts.
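The question of whether a context change alters distances along one dimension or along several can be made concrete by splitting the separation between two timbres into per-dimension gaps. The sketch below assumes that the dimensions of the two spaces have already been matched (for example through their correlations); the coordinates and names are hypothetical.

```python
import math

def per_dimension_gaps(x_i, x_j):
    """Absolute separation of two timbres on each matched dimension."""
    return [abs(a - b) for a, b in zip(x_i, x_j)]

def euclidean(x_i, x_j):
    """Overall Euclidean distance built from the per-dimension gaps."""
    return math.sqrt(sum(g ** 2 for g in per_dimension_gaps(x_i, x_j)))

# Hypothetical positions of the same two timbres on three matched dimensions.
reduced = {"t1": (0.0, 1.0, 2.0), "t2": (3.0, 1.0, 6.0)}
extended = {"t1": (0.0, 0.0, 2.0), "t2": (3.0, 4.0, 2.0)}

gaps_reduced = per_dimension_gaps(reduced["t1"], reduced["t2"])
gaps_extended = per_dimension_gaps(extended["t1"], extended["t2"])
gap_change = [e - r for r, e in zip(gaps_reduced, gaps_extended)]

dist_reduced = euclidean(reduced["t1"], reduced["t2"])
dist_extended = euclidean(extended["t1"], extended["t2"])
```

In this constructed case the overall distance is the same in both contexts while the gaps on the second and third dimensions trade places, which is why the overall distance alone cannot show which dimensions a context change affects.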

Interpretation and Discussion

The aim of this research was to evaluate possible context effects in a timbre space established by a geometric approach (Donnadieu, 1992), as well as to evaluate qualitatively the theoretical positions of the classificatory models of Tversky and Sjöberg. The analysis of variance indicates that a context effect was only found for a comparison between contexts A and D for the set of nine shared timbres. In this case the dissimilarity ratings were greater in context A than in context D. This result supports the classificatory models, which predict an increase in similarity among the nine timbres of context A within the expanded D context. The nine timbres in context A varied little on the attack quality dimension: all sounds came from simulated wind instruments, which have rather soft attacks. Presenting these instrument sounds alone may have allowed the subjects to focus on finer differences in attack quality. However, when they were presented with other, more percussive sounds of struck and plucked instruments (a different class of attacks) in context D, the perceived similarity among them increased, causing a decrease in dissimilarity ratings, as predicted by the classificatory models. Sjöberg (1972) has also shown a similar effect with musical instruments. However, this effect is not observed for contexts B and C. This absence of effect may be due to the fact that the dimensions related to brightness and spectral fine structure cannot be considered in terms of discrete features, but rather in terms of continuous dimensions. In fact, the timbres are spread along the attack dimension in a discontinuous fashion, tending to cluster into two groups: blown and bowed sounds in one class and plucked and struck sounds in the other. These same timbres are more continuously distributed along the other two dimensions. Further research is needed to see if the strong correlation between position along this dimension and the log attack time parameter found by Krimphoff et al. (1994) really implies a continuous dimension.

The hypothesis that a minimization of variation along one dimension should correspond to the disappearance of that dimension was verified, with the qualification that in some cases the dimension seemed to be merely attenuated. In fact, the presentation of a set of timbres that vary little along a dimension leads subjects to take this dimension of variation into account to a lesser degree in their dissimilarity ratings, which does not support the predictions of the classificatory models. MDS analyses gave only three-dimensional structures for the reduced contexts and a four-dimensional structure for the expanded context. Correlations between these spaces indicate that the dimension that was absent or attenuated was indeed the minimized dimension. This effect is clearer for context A than for contexts B and C. The physical interdependence of certain dimensions in multidimensional timbre space may not allow subjects to evaluate the dissimilarities between timbres along a single one of its defining dimensions. It may be that in the case of timbre, the "brightness" and "spectral fine structure" dimensions are not separable, both being of a spectral nature. The partial correlation between the second dimension of the four-dimensional space and the brightness and spectral fine structure dimensions of Krumhansl's study would argue in favor of this hypothesis. This result is partially consistent with the classificatory models, according to which a change in context should correspond to a change in the degree of similarity. In our case this would correspond not to the disappearance of a dimension but to a change in scale of the degree of similarity along the minimized dimension. In other words, the dimension considered as a succession of features should remain just as salient for the subjects. However, the dimensions tend to disappear or to be attenuated without the degree of similarity varying in a systematic fashion in the direction observed by Tversky and Sjöberg.


Donnadieu, S. (1992). Perception du timbre et classes latentes. Master's thesis. Laboratoire de Psychologie Expérimentale (CNRS URA 316), Université René Descartes, Paris.

Grey, J.M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270-1277.

Krimphoff, J., McAdams, S. & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II: Analyses acoustiques et quantifications psychophysiques. Proceedings of the 3rd French Congress of Acoustics, Toulouse (in press).

Krumhansl, C.L. (1989). Why is musical timbre so hard to understand? In J. Nielzen & O. Olsson (Eds.), Structure and Electroacoustic Sound and Music (pp. 43-53). Amsterdam: Elsevier (Excerpta Medica 846).

Sjöberg, L. (1979). A classificatory theory of similarity. Psychological Research, 40, 223-247.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Winsberg, S. & Carroll, J.D. (1988). A quasi-nonmetric method for multidimensional scaling via an extended Euclidean model. Psychometrika, 53, 217-229.

Winsberg, S. & de Soete, G. (1993). A latent class approach to fitting the weighted Euclidean model, CLASCAL. Psychometrika, 58, 315-330.
