Server © IRCAM - Centre Pompidou 1996-2005. All rights reserved for all countries.
in Basic Issues in Hearing, Academic Press (1988)
Copyright © Academic Press 1988
While this seems intuitively obvious, previous work on high-pitched vowel identification in the presence of frequency vibrato has yielded ambiguous results ([Sundberg, 1977]). These researchers claimed that intonation contours or vibrato had slight, or even detrimental, effects on vowel identification. The present study demonstrates, to the contrary, that if the amplitude behavior of a given partial is coupled with its frequency behavior according to a given spectral envelope, this information can be used by the auditory system to discriminate and identify the vowel quality or timbre of complex harmonic sounds.
Eight peak-to-peak widths were used throughout: 0, 1, 2, 3, 4, 6, 8, and 10% of the partials' frequencies. A psychometric function was determined based on this dimension. The vibrato was a sinusoidal frequency modulation with a frequency of 6.5 Hz. The starting phase of the vibrato was randomly selected from 0, π/2, π, and 3π/2. This was necessary to avoid discrimination judgments based on counting the number of peaks in the amplitude envelope for the modulated pure tone condition, since with sine phase vibrato the amplitude would always increase at the beginning for one spectral envelope, and decrease for the other.
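The vibrato described above can be sketched as follows. This is a minimal illustration, not the authors' synthesis code: it assumes the modulation is symmetric about the partial's nominal frequency, so that a peak-to-peak width of w% gives an excursion of ±w/2%. The function names are hypothetical.

```python
import math
import random

VIBRATO_RATE_HZ = 6.5                      # from the text
WIDTHS_PERCENT = [0, 1, 2, 3, 4, 6, 8, 10]  # peak-to-peak widths from the text
START_PHASES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]


def instantaneous_frequency(center_hz, width_percent, start_phase, t):
    """Frequency of one partial at time t (seconds).

    Assumption: the excursion is symmetric, i.e. half the peak-to-peak
    width on each side of center_hz.
    """
    half_excursion_hz = center_hz * width_percent / 200.0
    angle = 2 * math.pi * VIBRATO_RATE_HZ * t + start_phase
    return center_hz + half_excursion_hz * math.sin(angle)


def random_start_phase(rng=random):
    """Pick one of the four starting phases with equal probability."""
    return rng.choice(START_PHASES)
```

With a starting phase of π/2 the partial begins at its upper frequency extreme, and with 3π/2 at its lower extreme, so the initial direction of the amplitude change no longer identifies the envelope.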
The two table-lookup spectral envelopes applied to the components, SE1 & SE2, are illustrated in Figure 1. They are like allophones of // with SE1 being closer to /o/ and SE2 being closer to /æ/. The envelopes were stored in a table which returned the instantaneous amplitude corresponding to the instantaneous frequency of each partial. In this way, the vibrato was coupled to an amplitude modulation defined by the spectral envelope function. The spectral envelopes consisted of 5 formants, 4 of which were identical in the two cases. The only difference was the center frequency of the second formant (F'2=1215 Hz for SE1, F"2=1485 Hz for SE2). The skirts of these 2 formants intersect at the center frequency of the second harmonic, and the main difference between them in the region just around this harmonic (at small vibrato widths) is a change in the sign of the spectral slope. The part of the slopes in the boxed area in Fig. 1 covers the range for a 20% peak-to-peak vibrato, and was constructed over this range such that the two envelopes are mirror images (in linear frequency) of one another about the 2nd harmonic frequency. The envelopes were also constructed so that the frequency-amplitude coupling for all other harmonics was identical in the 2 cases.
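The table-lookup coupling of frequency to amplitude can be illustrated with a short sketch. The real envelopes are five-formant shapes that are not reproduced here; the version below assumes a purely hypothetical local envelope that is linear in dB around the second harmonic, with SE2 built as the mirror image of SE1 about that harmonic. The 1350 Hz center is the midpoint of the two second-formant frequencies (1215 and 1485 Hz); the slope value and all function names are assumptions.

```python
import math

H2_HZ = 1350.0            # 2nd harmonic, midway between 1215 Hz and 1485 Hz
VIBRATO_RATE_HZ = 6.5     # from the text


def make_table(slope_db_per_hz, lo=1215.0, hi=1485.0, step=1.0):
    """Hypothetical local envelope: linear in dB, 0 dB at H2_HZ."""
    n = int(round((hi - lo) / step)) + 1
    freqs = [lo + i * step for i in range(n)]
    amps = [10 ** (slope_db_per_hz * (f - H2_HZ) / 20.0) for f in freqs]
    return freqs, amps


def lookup(table, freq_hz):
    """Linear interpolation in the table (extrapolates past the ends)."""
    freqs, amps = table
    step = freqs[1] - freqs[0]
    pos = (freq_hz - freqs[0]) / step
    i = max(0, min(len(freqs) - 2, int(pos)))
    frac = pos - i
    return amps[i] + frac * (amps[i + 1] - amps[i])


def coupled_amplitude(table, width_percent, start_phase, t):
    """Amplitude of the 2nd harmonic at time t, driven by its vibrato."""
    half = H2_HZ * width_percent / 200.0
    f = H2_HZ + half * math.sin(2 * math.pi * VIBRATO_RATE_HZ * t + start_phase)
    return lookup(table, f)


SE1 = make_table(-0.02)   # assumed slope: amplitude falls as frequency rises
SE2 = make_table(+0.02)   # mirror image of SE1 about H2_HZ
```

In this sketch the same frequency excursion produces amplitude modulations of opposite sign for the two envelopes, which is the only cue that distinguishes them at the second harmonic.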
Figure 1. The two spectral envelopes used in the experiment.
In the non-roving global amplitude condition, complex tones were presented at 75 dBA and sine tones at 58 dBA, the latter being equal to the intensity of that tone within the complex. The rms amplitudes of the stimuli at a given vibrato width (2 spectral envelopes at 4 vibrato starting phases) were identical. In the roving amplitude condition the tones were presented at the above intensities or at that value ±5 dB.
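The level selection in the two conditions might look like the sketch below. The text gives the nominal levels and the ±5 dB offsets; that the three levels were equally likely and drawn independently for each tone is an assumption, as are the function names.

```python
import random

NOMINAL_DBA = {"complex": 75.0, "sine": 58.0}   # from the text
ROVE_OFFSETS_DB = (-5.0, 0.0, 5.0)               # nominal level or +/- 5 dB


def presentation_level(tone_type, roving, rng=random):
    """Level in dBA for one tone.

    Assumption: in the roving condition the three offsets are equiprobable.
    """
    offset = rng.choice(ROVE_OFFSETS_DB) if roving else 0.0
    return NOMINAL_DBA[tone_type] + offset


def gain_for_offset(offset_db):
    """Linear amplitude scale factor corresponding to a level offset in dB."""
    return 10 ** (offset_db / 20.0)
```

A +5 dB rove scales the waveform amplitude by about 1.78, so a listener comparing absolute level within one frequency channel would be misled on many trials.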
The subject's task was to identify the interval containing the "different" pair by pressing a corresponding button. Feedback indicating the correct response was given.
Trials were presented in blocks of 80 with ten repetitions of each of the 8 vibrato widths in random order. The interval containing the "different" pair was counter-balanced across trials, as was the SE chosen for the "same" pair.
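One way to build such a block is sketched below. The text states only that the interval and the "same" envelope were counterbalanced across trials; balancing each factor 5/5 within every vibrato width of a block is an assumption made here for illustration, and the dictionary keys are hypothetical.

```python
import random

WIDTHS_PERCENT = [0, 1, 2, 3, 4, 6, 8, 10]   # from the text
REPS_PER_WIDTH = 10                          # 8 widths x 10 = 80 trials


def make_block(rng):
    """Return one shuffled block of 80 discrimination trials."""
    trials = []
    half = REPS_PER_WIDTH // 2
    for width in WIDTHS_PERCENT:
        # Assumed balancing: each factor split 5/5 within a width.
        intervals = [1] * half + [2] * half
        same_envelopes = ["SE1"] * half + ["SE2"] * half
        rng.shuffle(intervals)
        rng.shuffle(same_envelopes)
        for interval, same_se in zip(intervals, same_envelopes):
            trials.append({
                "width_percent": width,
                "different_interval": interval,
                "same_envelope": same_se,
            })
    rng.shuffle(trials)   # random order of widths within the block
    return trials


block = make_block(random.Random(0))
```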
Subjects completed several training blocks until their psychometric functions appeared to stabilize. The number of training blocks varied considerably between subjects (5-35). Then ten blocks were collected, giving a total of 100 2IFC judgments for a data point at each vibrato width. The % correct measures at each vibrato width were averaged across the 10 blocks in order to obtain the mean and standard deviation. This latter statistic was then used to evaluate the reliability of the 75% threshold on the psychometric function as described in the next section.
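The per-width averaging across blocks can be sketched as follows. The input format (one dictionary of % correct per block) is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which form was used.

```python
import statistics


def summarize_blocks(block_scores):
    """Mean and standard deviation of % correct at each vibrato width.

    block_scores: list of dicts, one per block, mapping
    vibrato width (%) -> % correct in that block.
    Returns a dict mapping width -> (mean, sample standard deviation).
    """
    summary = {}
    for width in sorted(block_scores[0]):
        values = [block[width] for block in block_scores]
        summary[width] = (statistics.mean(values), statistics.stdev(values))
    return summary
```

With ten trials per width in each block, every block score is a multiple of 10%, and the ten block scores per width pool to the 100 judgments behind each data point.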
Four subjects completed the non-roving conditions for complex and sine tones in that order. Afterward, two of these subjects completed the roving conditions for complex and sine tones.
Figure 2. Psychometric functions for 4 subjects (non-roving condition) showing % correct discrimination as a function of vibrato width in %f. The points with horizontal bars in the lower portion of the plot indicate the 75% point of the mean curve and the estimated range of standard error for each subject.
Figure 3. Psychometric functions for 2 subjects showing % correct discrimination as a function of vibrato width. Points with horizontal bars as in Figure 2.
The mean data for 4 subjects are shown for the non-roving condition for both complex and sine tones in Figure 2. All Ss attain near-perfect performance in the range of vibrato widths used, indicating that the stimulus difference is easily discriminable. All psychometric functions are monotonically increasing, indicating that the perceptual factor upon which discrimination is based varies with vibrato width. The range of threshold vibrato widths for the 4 Ss was 1.2-3.8% (16-51 Hz at the 2nd harmonic). For S1 & S4, complex thresholds are significantly lower than sine thresholds (p<.01; t-test). They are approximately equal for S2 & S3. This may indicate differences in decision strategies among the Ss. If complex thresholds were always less than sine thresholds, one might hypothesize that the same dynamic frequency-amplitude slope information was more easily interpreted in the global context given by the behavior of the other harmonics. Such may be the case for S1 & S4.
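The conversion from threshold width in percent to Hz at the second harmonic is simple arithmetic, shown here as a small check. It takes the second harmonic to be at 1350 Hz, the midpoint of the two second-formant center frequencies; the function name is hypothetical.

```python
H2_HZ = 1350.0   # 2nd harmonic frequency


def width_hz(width_percent, center_hz=H2_HZ):
    """Peak-to-peak vibrato width in Hz for a width given in percent."""
    return center_hz * width_percent / 100.0


low_hz = width_hz(1.2)    # lower end of the threshold range
high_hz = width_hz(3.8)   # upper end of the threshold range
```

These values are 16.2 Hz and 51.3 Hz, which round to the 16-51 Hz range quoted above.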
The mean data comparing roving and non-roving conditions for Ss 1 & 2 are shown in Figure 3. Both subjects attain near-perfect performance by 6% peak vibrato width in both of these conditions. For complex tones, the thresholds for roving amplitude stimuli appear to be greater than those without roving for both Ss. Only the difference for S1 is statistically significant (p<.01), though it is relatively small. For sine tones, neither S shows an effect of amplitude roving. Comparing sine with complex tones within the roving condition, S1 shows no difference while S2 does, complex tones having a higher threshold (p>.05).
We might conclude from these data that the difference in spectral envelope following for minimally different envelopes is indeed possible at relatively low vibrato widths (1.2-3.8%). This discrimination is easier for some Ss in the presence of a vowel-like spectral envelope on flanking harmonics than it is with a single sinusoid. While roving the global amplitude causes some deterioration in performance, its effect is quite small indicating that Ss are not using a within-frequency channel intensity discrimination strategy, but are extracting the spectral profile and storing it in short-term memory.
Ss were to identify the single tone presented on each trial. The tone had a randomly selected vibrato starting phase as in Experiment 1. No feedback was given. Trials were presented in blocks of 80 with ten repetitions of each of the vibrato widths. An equal number of SE1's and SE2's was presented in each block. Four Ss performed the experiment for complex tones and then sine tones in the non-roving condition. Two of these Ss then performed the roving condition.
Subjects completed training blocks until their psychometric functions (% correct identification as a function of vibrato width) stabilized. Ten blocks were then collected from which were determined the mean % correct and standard deviation across the 10 blocks. From these values the 75% threshold and range of its standard error were calculated from fitted spline curves as in Experiment 1.
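Locating the 75% point on a psychometric function can be sketched as below. The thresholds in the study were read from fitted spline curves; this sketch substitutes plain linear interpolation between neighboring data points, which is a simplification and not the authors' procedure. The function name is hypothetical.

```python
def threshold_75(widths, pct_correct, criterion=75.0):
    """Vibrato width at which % correct first reaches the criterion.

    Uses linear interpolation between adjacent points (a stand-in for
    the spline fit mentioned in the text). Returns None when the
    criterion is never reached within the measured range.
    """
    if pct_correct[0] >= criterion:
        return widths[0]
    points = list(zip(widths, pct_correct))
    for (w0, p0), (w1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return w0 + (criterion - p0) * (w1 - w0) / (p1 - p0)
    return None
```

The `None` return covers the case where a threshold lies beyond the stimulus range, so that it is reported as unmeasured rather than extrapolated.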
Reversing the trend in Experiment 1, Ss 2 & 3 have different thresholds for complex and sine tones in this identification task (p<.05 for S2; threshold for complex well beyond stimulus range for S3), while Ss 1 & 4 show no difference. For S2, the threshold is 0.6% for sines compared with 1.2% for complex tones. The true threshold for sines may well be lower still, since more data would be needed between 0 and 1% to estimate it accurately.
The psychometric functions for roving amplitudes (Ss 1 & 2) showed no differences between sine and complex tones nor between roving and non-roving conditions.
These data suggest that even very similar spectral envelopes can be successfully identified in the presence of vibrato (and with a severely restricted set of choices), when the fundamental frequency is fairly high and the spectral envelope is not well defined by the relative amplitudes of the frequency components in the absence of vibrato.
A possible criticism of this experiment may be that the stimulus set was too small, making the "identification" experiment one of "discrimination across trials". All Ss remarked during the experiment that when the vibrato width was small for several trials, they tended to shift their criterion toward the less bright of the 2 SE's. Subsequently, a large vibrato stimulus would be judged (sometimes erroneously) as SE2. These errors would raise the measured thresholds.
Figure 4. Psychometric functions for identification (non-roving condition). Points with horizontal bars as in Fig. 2.
Three of the four Ss (1,3,4) showed a tendency for discrimination thresholds to be lower than identification thresholds in the non-roving condition for both sine and complex tones. The general tendency seems to be for identification to require greater perceptual difference than discrimination. It seems intuitively obvious that sounds must be easily distinguishable in order to be correctly categorized and identified, but given the limited number of stimuli to be identified, these results must be interpreted with caution.
The apparently small difference in performance introduced by the presence of other harmonics around the 1350 Hz 2nd harmonic (whose frequency-amplitude coupling defines other regions of the spectral envelope) bears some consideration. For the 2 Ss who show this trend (Ss 1 & 4), the average increase in threshold when the flanking harmonics are removed is only on the order of 1% of the component frequency. This corresponds to an average increase of about 1 dB in the peak-to-peak amplitude fluctuation on the 2nd harmonic. What this may suggest is that the information already present in the modulated sine tone is sufficient to explain performance with the complex tones. The reports from Ss about what they listened to in order to make the judgments on sine tones varied. Ss 1 & 4 felt that they were listening for a tone color difference in the two sines. Ss 2 & 3 felt that they were listening to a pitch difference. This latter criterion is easily understood since the amplitude of the sine is greater at lower frequencies for SE1 and greater at higher frequencies for SE2. A strategy that consisted of accumulating a weighted pitch representation of the tone and deciding whether it was the higher or the lower would be successful, since the discrimination thresholds are above the frequency discrimination threshold if one were to measure the distance between the lower excursion of SE1 and the upper excursion of SE2. All Ss, however, claimed to use tone color or vowel quality as the cue with the complex tones. It is entirely possible that Ss 2 & 3 used different strategies in the two cases.
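The weighted-pitch strategy mentioned above can be made concrete with a rough sketch: average the instantaneous frequency over one vibrato cycle, weighting each sample by its amplitude. The envelope used here is a hypothetical one, linear in dB around 1350 Hz, and the slope value is an arbitrary assumption; the sketch is not a model the authors propose.

```python
import math


def weighted_pitch(center_hz, width_percent, slope_db_per_hz, n=1000):
    """Amplitude-weighted mean frequency over one vibrato cycle.

    Assumptions: symmetric sinusoidal vibrato, and a hypothetical
    envelope that is linear in dB with the given slope around center_hz.
    """
    half = center_hz * width_percent / 200.0
    num = 0.0
    den = 0.0
    for k in range(n):
        f = center_hz + half * math.sin(2 * math.pi * k / n)
        a = 10 ** (slope_db_per_hz * (f - center_hz) / 20.0)
        num += a * f
        den += a
    return num / den


se1_pitch = weighted_pitch(1350.0, 4, -0.02)   # louder at lower frequencies
se2_pitch = weighted_pitch(1350.0, 4, +0.02)   # louder at higher frequencies
```

Under these assumptions the SE1 tone yields a weighted mean below 1350 Hz and the SE2 tone one above it, which is the direction of the pitch difference reported by Ss 2 & 3.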
Whatever the mechanism responsible for this performance, it is clear that one must take into account the dynamic nature of the stimuli. The basilar membrane activity pattern proposed by Green (1983) as the stimulus structure used to make the profile comparison is never present at any given moment in these stimuli. It is thus necessary for the auditory system to accumulate it through time.