Server © IRCAM - CENTRE POMPIDOU 1996-2005.
All rights reserved for all countries.
Proceedings of the 6th Annual Meeting of the International Society for Psychophysics, 1990
Copyright © International Society for Psychophysics 1990
Thresholds for detection of frequency modulation coherence and incoherence were measured. The task consisted of detecting a "figure" composed of 1 or more frequency components embedded in a multi-component "ground". The relations among components were either harmonic or inharmonic. Components were modulated in frequency with a random waveform. Frequency modulation width was varied in an adaptive tracking procedure. The components of the figure were always modulated coherently so as to maintain constant frequency ratios. The components of the ground were either modulated coherently or independently of one another. Thresholds for coherent grounds were considered to reflect the sensitivity to modulation incoherence between figure and ground components. Thresholds for incoherent grounds were considered to reflect the sensitivity to modulation coherence among the figure components. The results show that thresholds in the coherent ground conditions depend primarily on the proximity of figure and ground components, suggesting that interactions within auditory channels are used to detect incoherent modulation. Thresholds for incoherent grounds are higher than those for coherent grounds and show that coherently modulated components must fall within the same auditory channels in order for the coherence to be detected, but that information about coherence can be integrated across channels to some extent.
Frequency modulation coherence has been shown to contribute to the perceptual segregation of concurrent sounds. Two kinds of information related to this phenomenon have been proposed: within-channel incoherence generated by adjacent components that derive from different sound sources and cross-channel coherence existing among components deriving from the same sound source. Whether the information that is encoded in the auditory system truly allows access to both of these properties of the stimulus is currently being debated. The experiments described below were designed to examine both kinds of information by controlling 1) the proximity among figure components and between figure and ground components, 2) the harmonicity (or periodicity) of all the components, 3) the coherence of modulation among ground components, and 4) the number of figure components. Combinations of these parameters allow us to understand the relative contributions of within-channel and cross-channel cues derived from the patterns of activity generated by frequency modulation in the auditory system.
The stimuli consisted of a series of 8, 9 or 15 equal-amplitude partials whose starting phases were randomized for each tone presented. The tones were presented at a level of 45 dB/component. In test tones, some of the partials composed the "figure" to be detected while the rest composed the ground. All components were modulated with a low-frequency random waveform (jitter) derived from a white noise signal low-pass filtered at 30 Hz. The same jitter waveform was applied to all components of the figure, though this waveform changed from tone to tone. The ground components were either coherently or incoherently modulated. For coherent grounds, the same waveform was applied to all components, while for incoherent grounds, independent waveforms were applied to each component. Figure jitters were always independent of ground jitters. Tones were synthesized at a sampling rate of 9 kHz and low-pass filtered at 3.6 kHz. Tone duration was 1 s with 200 ms raised cosine attack and decay ramps.
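The stimulus construction described above can be illustrated with a minimal Python sketch. It is not the original synthesis software. The sampling rate, duration, ramp length and 30 Hz jitter cutoff come from the text; the fundamental frequency `f0`, the brick-wall FFT filter, the interpretation of modulation width as a proportion of each component's frequency, and the function names `jitter` and `synthesize` are assumptions made for illustration. Presentation level and the 3.6 kHz output filter are not modeled.

```python
import numpy as np

FS = 9000        # sampling rate in Hz (from the text)
DUR = 1.0        # tone duration in s (from the text)
RAMP = 0.2       # raised-cosine attack/decay duration in s (from the text)
CUTOFF = 30.0    # low-pass cutoff of the jitter noise in Hz (from the text)
N = int(FS * DUR)

def jitter(rng, n=N):
    """Zero-mean, unit-rms random waveform band-limited to CUTOFF.

    Assumption: a brick-wall FFT filter; the paper does not specify the filter.
    """
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    spec[freqs > CUTOFF] = 0.0   # discard everything above the cutoff
    spec[0] = 0.0                # remove the DC term
    x = np.fft.irfft(spec, n)
    return x / np.sqrt(np.mean(x ** 2))

def synthesize(f0, ranks, figure_ranks, rms_width, coherent_ground, rng):
    """Sum of equal-amplitude, random-phase partials with frequency jitter.

    f0 is a hypothetical fundamental; rms_width is the rms modulation width
    expressed as a proportion of each component's frequency (assumption).
    """
    fig_j = jitter(rng)   # shared by all figure components
    gnd_j = jitter(rng)   # shared by ground components when the ground is coherent
    tone = np.zeros(N)
    for r in ranks:
        if r in figure_ranks:
            j = fig_j
        elif coherent_ground:
            j = gnd_j
        else:
            j = jitter(rng)   # independent waveform for each ground component
        # Proportional modulation keeps frequency ratios constant among
        # components that share the same jitter waveform.
        inst_freq = r * f0 * (1.0 + rms_width * j)
        phase = 2.0 * np.pi * np.cumsum(inst_freq) / FS + rng.uniform(0.0, 2.0 * np.pi)
        tone += np.sin(phase)
    k = int(FS * RAMP)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(k) / k))
    env = np.ones(N)
    env[:k] = ramp
    env[-k:] = ramp[::-1]
    return tone * env
```

For example, `synthesize(200.0, range(1, 16), {5, 6, 7, 8}, 0.01, True, np.random.default_rng(0))` would produce a 15-component tone with an adjacent 4-component figure starting at rank 5 in a coherent ground, at 1% rms modulation width; the 200 Hz fundamental is an arbitrary choice that keeps all partials below 3.6 kHz.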
Figure conditions. Figures were composed of 1, 4 or 7 frequency components. For 1-component figures, only the coherent ground was presented. The component ranks for these figures included partials numbered 3, 5, 7, 9, 11, 13; that is, one of these components was modulated independently of the other 14. For the other figure types, both coherent and incoherent grounds were presented. The components were either all adjacent or interleaved with those of the ground. In 4-component figures, the lowest component was either of rank 2 or 5 (labeled fig4/2 and fig4/5, respectively). In 7-component figures, the lowest component was always of rank 2 (fig7/2). The possible combinations of figure component adjacency and ground coherence are shown in Figure 1 for the fig4/5 figure.
Figure 1. Schematic representation of stimulus configurations for a 4-component figure (fig4/5) with all components belonging to the same harmonic series.
In each trial subjects heard two tones separated by a silence of 200 ms. In one tone a figure was modulated independently of the ground components and in the other, no figure was present, i.e. all components were modulated either coherently or incoherently depending on the ground coherence condition in the test tone. All components were modulated at the same rms modulation width in a given trial. The task was to detect the interval containing the embedded figure and press the appropriate button. A feedback light was illuminated after each response. A 1-up, 2-down adaptive procedure was used to determine figure detection threshold at 71% performance as a function of rms frequency modulation width. Each run consisted of 12 turnarounds of which the last 8 were averaged to estimate threshold. The step-size was adjusted for each condition and each subject such that it was no more than 25% of the threshold modulation width. Since the psychometric functions were non-monotonic, care was taken to start each condition at a modulation width positioned near the upper end of the lowest positively sloped portion of the function.
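The tracking rule above can be sketched in a few lines of Python. A 1-up, 2-down rule converges on the level where two consecutive correct responses are as likely as not, that is, where p^2 = 0.5 and p is about 0.707, which is the 71% point quoted in the text. The sketch below is only illustrative: the function name `run_staircase`, the `respond` callback standing in for the listener, the fixed step size and the lower bound on the modulation width are assumptions, not details of the original procedure.

```python
def run_staircase(respond, start, step, n_reversals=12, n_avg=8):
    """1-up, 2-down adaptive track over modulation width.

    respond(width) is a hypothetical callback returning True when the
    listener picks the correct interval. The threshold estimate is the
    mean of the last n_avg turnaround values.
    """
    width = start
    correct_in_row = 0
    direction = 0          # -1 = moving down, +1 = moving up, 0 = no move yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(width):
            correct_in_row += 1
            if correct_in_row < 2:
                continue   # need two correct in a row before stepping down
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = +1   # a single error steps the width up
        if direction != 0 and new_direction != direction:
            reversals.append(width)   # the track turned around at this width
        direction = new_direction
        # Assumption: the width is never allowed to drop below one step.
        width = max(width + new_direction * step, step)
    return sum(reversals[-n_avg:]) / n_avg
```

With an idealized observer such as `lambda w: w >= 0.45`, a start of 1.0 and a step of 0.1, the track descends and then oscillates around the observer's cutoff. A real listener responds probabilistically, so the estimate varies from run to run, which is why several estimates per condition are averaged.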
Prior to each run, the subject was allowed to explore the stimulus set at will since there were great differences in the perceptual cues that indicated the presence of the figure for the different experimental conditions. One button presented a tone pair in which the figure was in the first tone. Another button presented a tone pair with the figure in the second tone. A third button changed the modulation width. Conditions were loosely blocked by stimulus type. Runs were performed until thresholds stabilized and then 3 threshold estimates for each experimental condition were obtained for each subject. These estimates were then averaged across subjects.
Figure 2. Mean detection thresholds of 1-component figures embedded in harmonic and perturbed harmonic series as a function of component rank. Data points represent the mean of 3 thresholds for each of 4 subjects, with the exception of rank = 3 where one subject did not attain threshold.
Figure 2 displays the mean thresholds for single-component figures for both harmonic and perturbed harmonic stimuli. These reflect the sensitivity to incoherent modulation on a single component. Thresholds decreased with increasing rank up to 7 and then remained at about 0.3% rms modulation width above that rank. Once the 7th component in these stimuli is reached, adjacent components are separated by less than a critical bandwidth and the degree of interaction of incoherent components within an auditory channel has apparently attained its maximum. The same qualitative results hold for the perturbed harmonic series except that thresholds tend to be 2-4 times higher. Thus when the background is less periodic, a greater modulation width is needed before the incoherence can be detected, though it is still quite possible as is witnessed by the relatively low thresholds (~1%) for the higher ranked components.
Detection thresholds for three multi-component figure conditions are shown in Figure 3.
Figure 3. Mean detection thresholds for harmonic figures with 4 or 7 components. Vertical bars indicate +/-1 standard deviation. Means represent 3 thresholds for each of 4 subjects with the following exceptions: fig4/5+coherent+adjacent: 3 subjects; incoherent+interleaved: 1 subject.
A comparison of multi-component and single-component thresholds was made. Those for adjacent figures were higher than would be expected from the 1-component stimulus that is equivalent to the most proximal figure and ground components in the complex figure (e.g. compare fig4/2 in Fig. 3 with the 1-component figure at rank 5 in Fig. 2). The reverse is true for interleaved figures where the thresholds were lower than those for the most proximal single-component stimuli (e.g. compare fig4/2 with the 1-component figure at rank 9). These results also indicate that a simple within-channel incoherence detection mechanism would be too simplistic to explain the data.
For interleaved figures, only 1 of the 4 subjects could perform the task and only for two of the figure conditions. In cases where a threshold could be measured, it was very much larger than that for an adjacent figure in an incoherent ground. This indicates an inability to use cross-channel coherence unless some modulating periodicity information is present within a channel, as would be the case for the higher components of the two thresholds found for one subject. However, even here the modulation width values are so large as to fall completely outside of the range of natural jitter found in physical forced-vibration systems (0.4% in sustained musical instrument tones and generally less than 1.5% in trained voice in the absence of vibrato).
Detection thresholds for multi-component figures in three harmonicity conditions are shown in Figure 4.
Figure 4. Mean detection thresholds for multi-component figures with harmonic, perturbed harmonic or equal-Bark spaced components. Vertical bars indicate +/-1 standard deviation. Means represent 3 thresholds for each subject with the following exceptions: perturbed harm/incoherent/adjacent: 2 of 4 subjects; equal-spaced/coherent: 3 of 4 subjects.
One of the main results of this study can be gleaned from a comparison of interleaved figures embedded in coherent or incoherent grounds. In order for subjects to perform the figure detection task in an incoherent ground with an interleaved figure, they would need access to information on modulation coherence across auditory channels. The fact that they are generally unable to perform this task suggests that this information is not available in a raw form.
Another main result is that the auditory system is very sensitive to within-channel modulation incoherence even for inharmonic sounds, at least in the case of complex sounds composed of sinusoidal components and presented in quiet. It is also relatively sensitive to coherent modulation of adjacent figure components embedded in an incoherent ground even when these components have inharmonic relations. This raises some very interesting questions about auditory sensitivity to dynamic stimuli that cannot be totally explained by mechanisms based purely on detection of periodic activity in a given set of auditory channels (2).
The task of detecting frequency modulation coherence and incoherence may involve storing a representation of a coherence profile across channels and comparing it to the succeeding tone. This profile has the greatest clarity in the case of two independently modulated harmonic series as is attested to by the very low figure detection thresholds for harmonic figures in a coherent harmonic ground. When regions of coherence are periodic, the detection of incoherence is very sensitive. Likewise, when the ground is incoherent and the task is to detect a region of coherent modulation (adjacent disposition), this is more easily performed when the figure is harmonic, and thus periodic. However, this task can also be performed with inharmonic stimuli, though the thresholds increase with increasing departure of the series from harmonicity and with decreasing proximity of frequency components.
These results lead us to consider a model of frequency modulation coherence and incoherence detection based on the storage of a global activity profile that can then be compared with a succeeding tone. This profile must necessarily contain a great deal of fine-grained temporal information to account for the sensitivity of subjects to modulation coherence of harmonic as well as inharmonic sounds.