To appear in Journal of the Acoustical Society of America, 1997
Copyright © ASA 1997
PACS nos.: 43.66.Mk, 43.66.Jh, 43.66.Qp
The acoustic environment is composed of numerous sources of sound and one of the important tasks for any animal is to be able to perceive them separately and to perform actions with respect to them. This ability may be considered to involve the building up of a veridical mental representation of the sources of sound present in the environment, a representation that is then used to plan appropriate action. One class of processes that seems to be involved in this kind of perceptual organization is the connecting together through time of individual sound events that are emitted by the same source, a process called sequential auditory organization or auditory streaming (Bregman and Campbell, 1971; McAdams and Bregman, 1979; Bregman et al., 1990). Several acoustic factors have been shown to play a role in streaming, such as spectral discontinuity (Bregman et al., 1990), intensity discontinuity (van Noorden, 1977), and spatial discontinuity (Hartmann and Johnson, 1991). That is, a sequence composed of sounds that are more or less similar with respect to these three dimensions tends to be heard as a single sound stream, whereas a sequence that alternates between two regions of relatively distant values along one or more of these dimensions tends to be heard as two streams.
For most sound sources in the everyday world, the various sound dimensions give rise to unambiguous organizations into sound objects and allow a clear understanding of their behavior through time. In addition, this organization appears to take place automatically without much conscious or deliberate intervention on the part of the organism. An important question that arises in the face of this evidence is whether the processes of sequential auditory organization are part of our innate perceptual equipment (present at birth or maturing within the first few weeks of life) or whether they are acquired through experience in the acoustic world. From birth, infants have to deal with an acoustic environment in which many simultaneous sources of sound can compete. Moreover, it is from such noisy situations that infants must correctly extract highly relevant acoustic information such as their mother's voice among others speaking her native language. Surprisingly, there is almost no research on infants' capacity to organize their acoustic world in terms of separate sources, or to perceive auditory streams. Auditory development has received much interest (e.g. Trehub and Trainor, 1993; Werner and Rubel, 1992), particularly in terms of the development of speech perception (e.g. Jusczyk, 1997), but very few studies have addressed the problem of how infants perceive complex sounds as coherent units, similarly to what is called object perception in visual cognition (cf. Spelke, 1990; Spelke et al., 1992). Although scarce, the data available seem nonetheless to favor the existence of unlearned, basic mechanisms engaged in streaming very early in life. Some arguments in this direction have been proposed by Bregman (1990).
The results of two previous studies (Demany, 1982; Fassbender, 1993) are consistent with such an assumption. They have addressed the question directly by testing young infants in auditory conditions that lead adults to perceive sound sequences as organized into two different streams. Demany (1982) demonstrated with a visual fixation procedure that 1.5- to 3-month-old infants organize sound sequences according to spectral proximity (frequency region of pure tones). Fassbender (1993) further demonstrated with a non-nutritive sucking paradigm that 2- to 5.5-month-old infants organize sound sequences on the basis of frequency proximity, intensity similarity, and spectral similarity. From these results it appears that stream segregation processes are operative in the very first weeks of life, at least when streams of fairly simple sounds are separated according to a number of acoustic dimensions known to be effective in promoting segregation in adults. Other properties of sound sources that are powerful cues for detecting and recognizing auditory events in the natural environment, such as correlated multidimensional variation of source timbre and spatial location, have not yet been tested with infants. Thus the present study aimed to extend previous research in two ways. Firstly, we tested newborns to verify whether stream segregation is part of the perceptual apparatus that first encounters the acoustic world. And secondly, but more importantly, we tested their capacity to use the timbre and spatial position of naturalistic sounds to perceptually organize sound sequences in terms of two distinct auditory entities. The timbres of the sound sources (vibraphone and trumpet sounds) as well as their spatial locations were chosen to be sure that the differences were easily discriminable by young infants (see Clifton, 1992, for a review on sound localization in infants, and Trehub et al., 1990, concerning processing of timbre). 
In this study, we do not question infants' ability to distinguish vibraphone from trumpet or sounds coming from the right or left. We assume that this ability is part of the newborn's perceptual repertoire. Our question is whether such differences in timbre and spatial position are automatically used by newborns to perceive sound sequences as originating from two individual sources.
A melody discrimination paradigm similar to the one used by Demany (1982) was employed to probe stream formation in both newborn infants and adults. A test of infants' ability to discriminate a melody from its retrograde (or reverse order of the tones) was first performed in order to find stimulus conditions under which they could perform the discrimination using a non-nutritive sucking paradigm (Exps. 1 and 2). Similar conditions were presented to adults for comparison (Exp. 4). Then the discriminable melody patterns were presented under conditions that adults perceive as two separate sound streams organized on the basis of timbre and spatial position (Exp. 3 for newborns, Exp. 4 for adults). If the latter sequence is organized by newborns on the basis of timbral and spatial similarity, then the sequence discrimination demonstrated in Experiments 1 and 2 should fail in Experiment 3.
Repeating melodic patterns of four ascending or descending pitches were used (Fig. 1). The experiment was based on the ability to discriminate an initial melodic contour¹ (rising or falling) from its retrograde (falling or rising, respectively). Trehub et al. (1987) and Ferland and Mendelson (1989) have demonstrated that 9- to 11-month-old infants can discriminate and categorize these kinds of contours, although, to our knowledge, no data are available on this capacity in newborns.
Figure 1
The first such configuration is labeled 3/1 due to the fact that three tones are played by the vibraphone in one speaker and one tone is played by the trumpet in the other speaker. Under the hypothesis that the sequences would be organized into perceptual streams on the basis of timbre and/or spatial position of the tone events, it was expected that two streams would arise from these sequences: one with three pitches in a rising or falling pattern, and another with a single repeating pitch. Our experimental hypothesis was that the initial and retrograde patterns would be discriminable due to the difference of the melodic contours in the vibraphone stream. In fact, for this configuration, even if the subject paid no attention to timbre and spatial position, the sequences should be discriminable if the melodic contour has been stored and can be compared across sequences. The reason that two timbres and spatial positions are included in this sequence is to avoid confounding configurational differences between this sequence and the second one to be described below with differences in the number of timbres and spatial positions present in each sequence.
The second configuration is labeled 2/2 since the four pitches are split into two interleaved groups, with pitches 1 and 3 being assigned to the vibraphone and pitches 2 and 4 to the trumpet. Under the hypothesis that the sequences are organized on the basis of these parameters, two perceptual streams should result, each one consisting of a pattern alternating between two pitches. Our experimental hypothesis was that for this configuration, the initial and retrograde patterns would not be discriminable since each stream is perfectly symmetric in terms of its melodic pattern. Therefore, if the newborns can discriminate the 3/1 configuration and cannot discriminate the 2/2 configuration, this result would allow us to argue that they organize sound sequences on the basis of timbre and/or spatial position of the tone events.
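The logic of the two configurations can be checked with a short sketch. It assumes that timbre and position are attached to pitch identity (as in the 3/1 description), that each stream is heard as an endlessly repeating cycle, and that the relative timing between streams is not available to the listener. The function names are illustrative only, and the pitches are those given for the first experiment.

```python
# Sketch: does the rising pattern differ from its retrograde within any stream?
# Assumption: a repeating stream is identified only up to rotation of its cycle.

def split_streams(cycle, assignment):
    """Split one cycle of pitches into per-instrument sub-cycles, keeping order."""
    streams = {}
    for pitch in cycle:
        streams.setdefault(assignment[pitch], []).append(pitch)
    return streams

def same_cycle(a, b):
    """True if repeating `a` indefinitely equals repeating `b` up to a rotation."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(a[i:] + a[:i] == b for i in range(len(a)))

RISING = ["C4", "D4", "E4", "F#4"]
FALLING = RISING[::-1]

CONFIG_3_1 = {"C4": "vibraphone", "D4": "vibraphone",
              "E4": "vibraphone", "F#4": "trumpet"}
CONFIG_2_2 = {"C4": "vibraphone", "D4": "trumpet",
              "E4": "vibraphone", "F#4": "trumpet"}

def streams_differ(assignment):
    """True if at least one stream distinguishes the rising from the falling cycle."""
    up = split_streams(RISING, assignment)
    down = split_streams(FALLING, assignment)
    return any(not same_cycle(up[name], down[name]) for name in up)

print(streams_differ(CONFIG_3_1))  # vibraphone stream C-D-E vs E-D-C
print(streams_differ(CONFIG_2_2))  # each stream alternates between two pitches
```

Under these assumptions only the 3/1 configuration leaves a within-stream cue to contour direction, which is the prediction stated above.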
Stimuli were synthesized on a Yamaha TX802 FM Tone Generator controlled by a Macintosh computer. The musical instrument simulations were developed by Wessel, Bristow and Settel (1987). The stimuli were recorded on a stereo cassette tape.
Procedure
An habituation/dishabituation paradigm was used in which sound
presentation was controlled by the infant's sucking on a non-nutritive pacifier (Jusczyk, 1985; Floccia et al., 1997).
Each time the amplitude of the infant's suck exceeded a fixed threshold,
several cycles of a repeating four-tone sequence were played. It was decided to
present a minimum of three to four cycles in order to ensure that stream
segregation occurred, since Bregman (1978)
has shown that this process takes time. A computer program recorded the sucking
rate in one-minute periods. The habituation point was considered to have
occurred when the sucking rate fell by at least one-third for two consecutive
minutes compared to the one-minute period immediately preceding these two
minutes. At this point, the stimulus sequence was changed for the Experimental
group and remained identical for the Control group. The infants were considered
to have discriminated the sequence change if the difference in mean sucking
rate between the one-minute periods before and after habituation was
significantly greater for the Experimental group than for the Control group.
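The habituation criterion described above can be expressed as a minimal sketch. The function name and the list-of-rates input are hypothetical, the 15-minute limit and the rejection criteria are not modeled, and this is not the program used in the study.

```python
# Sketch of the habituation criterion: the sucking rate must fall by at least
# one-third, for two consecutive minutes, relative to the minute just before them.

def habituation_minute(rates, drop=1 / 3):
    """Return the index of the second low minute, or None if never reached.

    rates: sucks per minute, one value per one-minute period (hypothetical input).
    """
    for i in range(2, len(rates)):
        reference = rates[i - 2]            # minute preceding the two-minute window
        ceiling = reference * (1 - drop)    # highest rate that counts as a drop
        if reference > 0 and rates[i - 1] <= ceiling and rates[i] <= ceiling:
            return i
    return None

# Illustrative values only: the rate falls from 33 to 20 and 21 sucks/min.
print(habituation_minute([30, 33, 20, 21]))
```

In this sketch the stimulus change for the Experimental group would follow the returned minute.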
The tapes were played over loudspeakers placed on either side of the infant's head. They were elevated by about 20 cm and formed an azimuthal angle of approximately 120° that was bisected by the orientation of the infant's head. The stimuli were presented at a level of about 70 dBA in a single-walled sound isolation chamber. The frequency components of the complex sounds used would generally be well above pure tone thresholds in newborns when presented at this level (cf. Werner, 1992).
Babies were rejected during experimentation if they refused to suck on the pacifier, failed to reach the habituation criterion within 15 minutes, lost the pacifier during the three minutes prior to or following habituation, fell asleep, cried or became agitated.
Subjects
Subjects in Experiments 1-3 were newborn infants of three to four days
of age who were selected to participate in the experiment on the basis of their
health record during pregnancy, delivery, and the three or four days after
birth. The selection criteria included the following: 1) their weight at birth
had to exceed 2.8 kg, 2) the gestation period had to be at least 38 weeks, 3)
their APGAR score (a general measure of health and responsiveness) had to
attain the value of 10 at least by the fifth minute following birth, and 4)
they had to be in good health at the time of testing. After having obtained the
permission of the parents, infants were brought to the experimental situation
about 2 1/2 hours after their last feeding and 30 minutes to an hour before
their next one. The newborn infants were tested at the Baudelocque Maternity
Hospital in Paris.
Method
Stimulus sequences consisted of a repeating four-note pitch pattern
(C4-D4-E4-F#4, or its retrograde, F#4-E4-D4-C4) presented at a rate of 10
tones/s (an inter-tone onset interval of 100 ms). Each tone had a duration of
85 ms. Pitches were assigned to either the left or right channel in order to
effect spatial separation and to one of two instruments in order to effect a
separation based on source timbre (Fig. 1). In these sequences, the three
lowest pitches (C, D, E) were assigned to one channel and presented with the
vibraphone timbre. The highest pitch (F#) was presented with the trumpet timbre
in the other channel. The side of presentation of the timbres was
counterbalanced across subjects. The initial pattern was either the ascending
or descending contour, each being presented to an equal number of subjects.
The subjects were randomly assigned to one of two independent groups: Experimental, with contour change at habituation, and Control, with no contour change. Twenty subjects completed the experiment in each group for a total of 40 subjects. Data for 32 additional subjects were rejected: 17 did not habituate within 15 minutes, four produced insufficient or irregular sucking or spat out the pacifier, three fell asleep, seven cried or became agitated, and one was removed due to experimenter error.
Sound presentation was contingent upon the infant's sucking behavior. Each time a high-amplitude suck was detected by the computer, a ramped gate was opened that allowed the continuous sound sequence on tape to be heard. The ramp lasted half the duration of a 4-tone cycle. After the ramp, a minimum of 3 complete cycles was presented. If no further sucking was detected during this time the sequence was ramped off over half a cycle. Each high-amplitude suck detected during sound presentation resulted in the continuation of sound presentation for 1.2 sec following the suck. In general, sucking behavior in newborns occurs in bursts of regular sucking that last several seconds and have a rate of 1.5 to 2 sucks/sec. Thus when the infants in our study maintained a rate of at least 0.83 sucks/sec during a burst, the sound would be presented continuously during the burst.
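The timing values in this section follow from the stated tone rate, tone duration, and continuation window. The sketch below only reproduces that arithmetic; the variable names are ours.

```python
# Timing arithmetic for the sequences described above (values from the text).
TONE_RATE_PER_S = 10       # tones per second
TONES_PER_CYCLE = 4
TONE_DURATION_MS = 85
HOLD_AFTER_SUCK_S = 1.2    # sound continues this long after each detected suck

ioi_ms = 1000 / TONE_RATE_PER_S          # inter-tone onset interval
gap_ms = ioi_ms - TONE_DURATION_MS       # silent interval between tones
cycle_ms = ioi_ms * TONES_PER_CYCLE      # one four-tone cycle
ramp_ms = cycle_ms / 2                   # ramp lasts half a cycle
# Shortest presentation: ramp on, three complete cycles, ramp off.
min_presentation_ms = ramp_ms + 3 * cycle_ms + ramp_ms
# Lowest sucking rate that keeps the sound on without interruption.
min_rate_sucks_per_s = 1 / HOLD_AFTER_SUCK_S

print(ioi_ms, gap_ms, cycle_ms, ramp_ms, min_presentation_ms)
print(round(min_rate_sucks_per_s, 2))
```

At this tempo the 1.2-sec continuation window equals three complete cycles, and its reciprocal gives the 0.83 sucks/sec quoted above.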
Results and Discussion
The data for three subjects (two control and one experimental) were
subsequently rejected since their sucking rate had fallen below seven sucks/min
in the one-minute periods either preceding or succeeding the habituation point.
This rejection is justified by the fact that a very low sucking rate may give
rise firstly to very few stimulus cycles at a crucial point in the experiment,
and secondly to an exceedingly long silent interval between the temporally
adjacent presentations of the pre- and post-habituation stimulus sequences. The
average sucking rates for the two groups (18 control, 19 experimental) in the
three one-minute periods preceding and the two one-minute periods succeeding
the habituation point are shown in Figure 2. No difference between groups was
found in the three one-minute periods prior to habituation according to a
repeated measures ANOVA [F(2,70)<1], indicating similar behavior during the
habituation phase in both groups. It appears that the sucking rate of the
Experimental group increases slightly more than that of the Control group after
the habituation point, because the rate for the Experimental group is
slightly lower in the pre-habituation period. In order to test the amount of
change across the habituation point, a mixed ANOVA was performed with sucking
rate as dependent variable and with within-subjects factor Period (one-minute
periods before and after habituation) and between-subjects factor Group
(Control, Experimental). The change in average rate across the habituation
point for the Control group was not significantly different from that for the
Experimental group [F(1,35)<1]. Identical results are obtained if average
sucking rate is computed in two-minute periods preceding and succeeding
habituation, although additional subjects must be removed whose sucking rates
do not reach criterion in the newly included one-minute periods. These results
may be interpreted as indicating that newborn infants cannot discriminate these
rising and falling sequences.
In the absence of data on newborns for this kind of discrimination we chose the values of pitch interval and tempo used in this experiment since they generally give good rising/falling contour discrimination in adults. It is possible that either the sequences were too rapid or the pitch intervals too small (or both) for the infants to be able to acquire differentiable mental representations of these contours. Our goal in this experiment was not to study the respective effects of each of these sequence variables, but to find a melodic pattern for which original and retrograde versions could be discriminated in order to test our main hypothesis concerning stream organization. Therefore, a lower tempo and greater interval size were employed in Experiment 2.
Figure 2
Results and Discussion
The data for four control subjects were subsequently rejected since
their sucking rate had fallen below seven sucks/min in the one-minute periods
either preceding or succeeding the habituation point. Data for 20 experimental
and 16 control subjects were analyzed. According to a repeated measures ANOVA,
no difference between groups was found in the three one-minute periods prior to
habituation [F(2,68)<1]. A Period (2) X Group (2) mixed ANOVA was performed
as in Experiment 1 and revealed that the Experimental group's sucking rate
increased more across the habituation point than did that of the Control group
[F(1,34)=6.9, p<0.05]. This latter effect is slightly weaker if sucking rate
is computed on two-minute periods on either side of the habituation point
[F(1,30)=3.6, p=0.066]. These results indicate that newborn infants can
discriminate the rising and falling melodic contours used in this experiment.
Comparisons of the data for Experiments 1 and 2 by way of planned contrasts within a between-subjects ANOVA [Experiment (2) X Group (2)] with the difference in sucking rate between pre-habituation and post-habituation periods as dependent variable clearly show that: 1) across the two experiments performance was similar in the Control groups [F(1,69)<1] and marginally different between Experimental groups [F(1,69)=3.5, p=0.065, although this difference is nonsignificant if two-minute periods are used to compute mean sucking rate: F(1,61)=2.4], and 2) differences between Experimental and Control groups were not significant for Experiment 1 [F(1,69)<1] but were significant for Experiment 2 [F(1,69)=8.0, p<0.01]; the same pattern of results was found for sucking rates computed on two-minute periods. This suggests a greater sensitivity to melodic contour change for the slower tempo and larger pitch interval patterns. This difference is unlikely to be related to psychoacoustic limits of frequency discrimination. Frequency resolution is better than or equal to 4% for pure tones at three months of age when the sounds are presented at 40 dB SL (Olsho et al., 1987). Frequency resolution is lower at higher frequencies in early infancy and increases over the first year of life, but there is a much smaller effect of development at lower frequencies that are comparable to the fundamental frequencies in our stimuli (Spetner and Olsho, 1990). Further, Olsho (1985) has shown similar psychoacoustic tuning curves in 4-month-olds and adults, although it is unknown whether newborns also have similar tuning to adults.
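To make the comparison with the 4% resolution figure concrete, the following sketch computes the frequency step between adjacent pitches of the Experiment 1 pattern. It assumes equal temperament with A4 = 440 Hz; the actual tuning of the synthesizer is not stated, so the frequencies are nominal.

```python
# Sketch: size of the whole-tone steps C4-D4-E4-F#4 as a percentage of frequency.
# Assumptions (not from the text): equal temperament, A4 = 440 Hz.
A4_HZ = 440.0
MIDI_NUMBER = {"C4": 60, "D4": 62, "E4": 64, "F#4": 66}

def frequency_hz(note):
    """Nominal equal-tempered fundamental frequency of a note."""
    return A4_HZ * 2 ** ((MIDI_NUMBER[note] - 69) / 12)

def step_percent(low, high):
    """Frequency increase from `low` to `high`, in percent of `low`."""
    return (frequency_hz(high) / frequency_hz(low) - 1) * 100

pattern = ["C4", "D4", "E4", "F#4"]
steps = [step_percent(a, b) for a, b in zip(pattern, pattern[1:])]

for note in pattern:
    print(note, round(frequency_hz(note), 1), "Hz")
print([round(s, 1) for s in steps])  # each whole tone is a step of about 12%
```

Under these assumptions even the smaller intervals of Experiment 1 are roughly three times the 4% figure, which is in line with the argument that frequency resolution alone is unlikely to explain the difference between experiments.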
Figure 3
The better discrimination of slower sequences may also be due to limits in temporal resolution. Gap detection studies by Werner et al. (1992) have measured thresholds at about 60 ms between three and twelve months of age, while thresholds measured in adults are generally less than 10 ms. When measured in the presence of low-pass noise, these thresholds are up to 100 ms for 3- to 6-month-olds. Corresponding thresholds (d' = 1.0) in data from Trehub et al. (1995) for 6.5-month-old infants were just under 30 ms for tone pips. Again, to our knowledge no data are available on temporal resolution in newborns. The inter-tone intervals in our Experiments 1 and 2 were 15 and 115 ms, respectively. Marean and Werner (1991) have shown 20 dB of forward masking of a 1 kHz pure tone by a broad-band noise with an inter-tone interval of 20 ms and 12 dB of masking for an interval of 100 ms. While these data are for older infants, the resolution may be even worse at birth. Further research will be needed to tease apart the relative importance of these two factors in the perception of melodic contours by newborn infants and their applicability to discrimination of melodic sequences such as those employed in this study.
Method
The method was identical to that in the previous two experiments. The
tempo and pitches of the stimulus sequences were identical to those from
Experiment 2, but the 2/2 configuration of timbres and spatial positions was
used (Fig. 1). Twenty-four subjects completed the experiment in each of the
Experimental and Control groups. Data for 45 additional subjects were rejected:
10 did not habituate within 15 minutes, 18 produced insufficient or irregular
sucking or spat out the pacifier, five fell asleep, ten cried or became
agitated, and two were removed due to experimenter error.
Results and Discussion
The data for five subjects (four control and one experimental) were
subsequently rejected since their sucking rate had fallen below seven sucks/min
in the one-minute periods either preceding or succeeding the habituation point.
The data for the two groups (20 control, 23 experimental) are presented in
Figure 4. No difference between groups was found in the three one-minute periods
prior to habituation [F(2,82)=1.3, n.s.]. A mixed ANOVA [Period (2) X Group
(2)] indicates that the Experimental group's mean sucking rate did not increase
more across the habituation point than did that of the Control group
[F(1,41)<1], indicating that they did not discriminate rising and falling
contours in the 2/2 configuration. Identical results were found for sucking
rates computed on two-minute periods before and after the habituation point.
Data from Experiments 2 and 3 were compared by way of planned contrasts within a between-subjects ANOVA [Experiment (2) X Group (2)] with the difference in sucking rate between pre-habituation and post-habituation periods as dependent variable. Although rates were similar in the Control and Experimental groups across the two experiments [F(1,75)<1 in both cases], differences between Experimental and Control groups were significant for Experiment 2 [F(1,75)=5.2, p<0.05] but were not significant for Experiment 3 [F(1,75)<1]. The significant contrast for Experiment 2, derived from one-minute sucking periods, is weakened somewhat when two-minute periods are used. This difference may be due to the transient increase in response to novelty of the sequence change for the experimental group as can be seen in Figure 3. These results suggest that newborns do not discriminate rising from falling patterns when their events are distributed in the 2/2 configuration on the basis of timbre and spatial position.
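As a consistency check, the error degrees of freedom reported for the three newborn experiments can be recovered from the retained group sizes. The sketch assumes the usual between-subjects error term (total subjects minus number of groups); the helper name is ours.

```python
# Sketch: error degrees of freedom implied by the retained group sizes.

def error_df(group_sizes):
    """Between-subjects error df: total N minus number of groups."""
    return sum(group_sizes) - len(group_sizes)

EXP1 = (18, 19)   # control, experimental
EXP2 = (16, 20)
EXP3 = (20, 23)

# Period (2) X Group (2) analyses within each experiment.
print(error_df(EXP1), error_df(EXP2), error_df(EXP3))      # 35, 34, 41
# Three pre-habituation periods: (3 - 1) times the between-subjects error df.
print(2 * error_df(EXP1), 2 * error_df(EXP2), 2 * error_df(EXP3))
# Planned contrasts pooling two experiments (four groups each).
print(error_df(EXP1 + EXP2), error_df(EXP2 + EXP3))        # 69, 75
```

These values agree with the F ratios quoted in the text for the one-minute analyses.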
We can rule out a number of potential explanations for this difference between experiments: 1) The effect cannot be ascribed to general differences in the stimuli since the only difference between Experiments 2 and 3 is that one note in the pattern changed timbre and position (transforming the 3/1 configuration into the 2/2 configuration). Otherwise there was no change in stimulus complexity: the melodic patterns had the same pitches, the same intensity, the same tempo, and each had two timbres and two spatial positions. 2) Differences in time elapsed between the last presentation of an habituation stimulus and the first presentation of a new stimulus for the Experimental group can also be ruled out. A comparison of critical inter-stimulus intervals across the habituation point shows no significant differences between Control and Experimental groups within each experiment [Exp. 2: unpaired t(34)=-0.09, n.s.; Exp. 3: t(41)=-0.48, n.s.] nor differences between corresponding groups across the two experiments [Control: t(34)=1.00, n.s.; Experimental: t(41)=0.95, n.s.]. 3) Finally, differences between experiments in newborns' overall response rates are not responsible for the effect either. The planned contrasts described above demonstrate that there was no difference in global response rate between corresponding subject groups in the two experiments.
The most plausible explanation for the fact that the melodic contour is discriminated in one pattern and not the other is that sequential organization processes operating on the basis of timbre and spatial position do not give perceptual access to a discriminable contour in the 2/2 configuration, but do in the 3/1 configuration. This conclusion is weakened somewhat by the lack of significant interaction in the across-experiment ANOVA, but the results of the planned contrasts are consistent with the stream-segregation hypothesis.
Figure 4
Method
In the experiments with the newborns, stimulus presentation was
contingent upon high-amplitude sucking. The time interval between the last
presentation of the habituation sequence and the first post-habituation
sequence varied from 0.3 to 45.3 sec across all three experiments (M=11.2 sec,
s.d.=11.7 sec). The upper limit on this interval would be constrained by our
low-rate rejection criterion. In order to simulate the same kind of variation
for adult listeners, we therefore decided to present inter-sequence silences of
5, 15, and 25 sec for each experimental condition. These values are longer than
37%, 70%, and 84%, respectively, of all ISIs across the habituation point in
the newborn experiments.
The experiment was conducted in two 45-minute sessions, one with the fast tempo/small pitch interval condition and one with the slow tempo/large interval condition. The order of presentation of the sessions was counterbalanced across two groups of six non-musician listeners. In each session 48 trials were presented composed of two Configurations (3/1, 2/2), three inter-sequence Silences (5, 15, 25 sec), four Comparisons (rising/rising, falling/falling, rising/falling, falling/rising), and two Repetitions.
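The count of 48 trials per session follows from crossing the four factors. The sketch below enumerates the design; the ordering of trials within a session is not described in the text and is not modeled here.

```python
# Sketch: enumerate the trials of one adult session from the stated factors.
from itertools import product

CONFIGURATIONS = ("3/1", "2/2")
SILENCES_S = (5, 15, 25)
COMPARISONS = (("rising", "rising"), ("falling", "falling"),
               ("rising", "falling"), ("falling", "rising"))
REPETITIONS = (1, 2)

trials = list(product(CONFIGURATIONS, SILENCES_S, COMPARISONS, REPETITIONS))

n_same = sum(1 for _, _, (first, second), _ in trials if first == second)
n_different = len(trials) - n_same

print(len(trials), n_same, n_different)
```

Half of the trials are "same" pairs and half are "different" pairs, so each Configuration by Silence cell contains four trials of each kind.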
The subjects heard a warning signal followed 2 sec later by the initial sequence which faded in over 2 sec, played 10 cycles at full level, and then faded out over 2 sec. After a variable silence, the second sequence was presented in the same manner. The subject was to judge whether the order of the pitches (i.e. the pitch contour) in the two sequences was the same or different. One "same" and one "different" pair were presented for each initial sequence in each experimental condition. No feedback was given concerning the correct response.
The subjects in each group were tested collectively. They were seated in a sound-treated room in front of two loudspeakers that formed an azimuthal angle of between 60° and 90° depending on the subject's distance from the loudspeakers. The trumpet sound was always presented in the right speaker and the vibraphone in the left. Subjects marked their responses on an answer sheet.
Results and Discussion
For each condition and subject, a "true discrimination" score was
computed by subtracting the false-alarm rate from the hit rate across
Comparisons and Repetitions. This gives a performance score that varies between
0 (chance performance) and 1 (perfect performance) unless listeners
systematically respond incorrectly in which case the score can be negative. The
mean scores across Tempo/Interval and Presentation order are shown in Figure 5
as a function of inter-sequence Silence. The scores were submitted to a
four-way ANOVA with repeated within-subjects factors Tempo/Interval (2) X
Configuration (2) X Silence (3) and between-subjects factor Presentation order
of sessions (2). The only significant effects were Silent interval
[F(2,20)=3.94, p<0.05], indicating that performance decreases slightly
overall when the delay between target and comparison sequence is long, and
timbre/space Configuration [F(1,10)=74.48, p < 0.0001], demonstrating
clearly that subjects perform well with the 3/1 configuration and very poorly
with the 2/2 configuration (0.70 vs 0.11 globally), as was hypothesized at the
beginning. Thus, as measured in a paradigm requiring an explicit response from
adult subjects, the results are similar to those obtained with newborns,
suggesting in both cases that the sequences are organized into two streams on
the basis of timbre and spatial location of the sound events.
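The "true discrimination" score can be illustrated with a minimal sketch. It assumes that a hit is a "different" response to a pair that actually differs and that a false alarm is a "different" response to an identical pair; the function name and data layout are hypothetical.

```python
# Sketch: hit rate minus false-alarm rate for one subject in one condition.

def true_discrimination(trials):
    """trials: list of (is_different, responded_different) boolean pairs."""
    n_different = sum(1 for is_diff, _ in trials if is_diff)
    n_same = len(trials) - n_different
    hits = sum(1 for is_diff, resp in trials if is_diff and resp)
    false_alarms = sum(1 for is_diff, resp in trials if not is_diff and resp)
    return hits / n_different - false_alarms / n_same

# Illustrative responses only: 3 hits on 4 "different" pairs,
# 1 false alarm on 4 "same" pairs.
example = [(True, True), (True, True), (True, True), (True, False),
           (False, True), (False, False), (False, False), (False, False)]
print(true_discrimination(example))
```

A subject who answers "different" on every trial scores 0 on this measure, and a subject who systematically inverts the responses scores -1.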
Figure 5
Neither newborns nor adults discriminated original from retrograded 2/2 configurations (Exps. 3 and 4, respectively). This result, taken together with the capacity to discriminate similar 3/1 configurations in Experiment 2, suggests that the ability to discriminate melodic sequences depends on the way the events are organized into streams on the basis of timbre and/or spatial position of the sources. Work by Hartmann and Johnson (1991) on adults using an interleaved melody recognition task indicates that it is probably the timbre difference that is primarily responsible. Our study extends work reported by Demany (1982) and Fassbender (1993) for similar kinds of sequences presented to infants of 1.5 to 5.5 months of age. We are inclined to interpret the data as indicating that stream organization mechanisms are present at birth. However, the statistical weakness of the critical comparisons makes it clear that this kind of study needs to be replicated with other response paradigms, and perhaps with a paradigm in which streaming is demonstrated by successful discrimination, in order to confirm the present results, in which streaming is inferred from a lack of discrimination.
This study is the first to investigate stream organization in newborns.
Methodological limitations of research with newborn babies could be partly
responsible for the lack of work on such questions. Most experiments address
questions of the type "Are babies able to discriminate A from B?", A and B
being single stimulus events or categories. Questions about how newborns
construct percepts and organize complex stimuli are far more difficult to
address, as answers to them are often inferred from discrimination paradigms.
These latter present difficulties for long sequences even in adults, as is
witnessed by the relative lack of such research in this journal. In spite of
the tenuousness in interpreting newborn data on such complex perceptual
processing, the present data suggest, in accord with previous studies, that the
most basic processes for stream organization are operative very early in life.
Babies can build auditory streams on the basis of frequency proximity,
intensity similarity, and spectral similarity (Demany, 1982; Fassbender, 1993).
The present study suggests that, in addition, complex timbral and spatial
properties of sound sources may also be used as cues to form auditory streams
from sequences of events and to differentiate the streams thus formed. These
presumably unlearned processes (even if they are more affected by certain
qualities of the stimuli such as pitch interval and tempo than corresponding
processes in adults) would allow infants to perceptually structure their
acoustic environment.
We know that infants perceive visual objects and events according to innate constraints that allow them to organize their knowledge of objects and events earlier than they could do on the basis of their own experience with these objects (Baillargeon et al., 1990; Spelke et al., 1992). There are almost no comparable data on the way infants acquire knowledge about the acoustic world. Some innate predisposition seems to be at work for perceiving speech (Pinker, 1994). But what about more general-purpose mechanisms? To what extent is the human auditory system innately predisposed to use different acoustic cues in such a way that auditory scenes could be analyzed in terms of objects or coherent sources and not as a collection of acoustic dimensions that the infant would have to learn to relate and to combine through experience? Our results are compatible with the hypothesis that some basic processes are in place at birth, even if not completely developed. More research is needed to better understand how these basic processes participate in the early development of more "strategic" or "heuristic" processes such as those that are characterized as "schema-based" by Bregman (1990). For example, Newman and Jusczyk (1996) recently tested 7.5-month-old infants' abilities to extract speech information (words) delivered by a voice in the presence of another, competing voice, speaking simultaneously at different intensities. Several experiments showed that the infants could recognize target words, indicating that they were able to segregate two speech streams and to selectively and continuously attend to the target speech, at least when it was more intense than the background speech.
Given the extensive research with adults on such questions, it is imperative to explore infants' primary abilities in auditory cognition. We hope that the present research will encourage further work in this little-explored domain.
Bregman, A. S. (1978). "Auditory streaming is cumulative," J. Exp.
Psychol.: Human Percept. Perf. 4, 380-387.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual
Organization of Sound (MIT Press, Cambridge, MA).
Bregman, A. S. and Campbell, J. (1971). "Primary auditory stream
segregation and perception of order in rapid sequences of tones," J. Exp.
Psychol. 89, 244-249.
Bregman, A. S., Liao, C., and Levitan, R. (1990). "Auditory grouping
based on fundamental frequency and formant peak frequency," Can. J. Psychol.
44, 400-413.
Clifton, R. K. (1992). "The development of spatial hearing in human
infants," in Developmental Psychoacoustics, edited by L. A. Werner and E. W.
Rubel (American Psychological Association, Washington, DC), pp. 135-158.
Demany, L. (1982). "Auditory stream segregation in infancy," Infant
Behav. Dev. 5, 261-276.
Fassbender, C. (1993). Auditory grouping and segregation processes in
infancy (Kaste Verlag, Norderstedt).
Ferland, M. B. and Mendelson, M. J. (1989). "Infants' categorization of
melodic contour," Infant Behav. Dev. 12, 341-355.
Floccia, C., Christophe, A., and Bertoncini, J. (1997). "High-amplitude
sucking and newborns: The quest for underlying mechanisms," Journal of
Experimental Child Psychology 64, 175-198.
Hartmann, W. M. and Johnson, D. (1991). "Stream segregation and
peripheral channeling," Music Perception 9, 155-184.
Jusczyk, P. W. (1985). "The high-amplitude sucking technique as a
methodological tool in speech perception research," in Measurement of
Audition and Vision in the First Year of Life: A Methodological Overview,
edited by G. Gottlieb and N. A. Krasnegor (Ablex, Norwood, NJ), pp. 195-222.
Jusczyk, P. W. (1997). The Discovery of Spoken Language (MIT
Press, Cambridge, MA).
Marean, G. C. and Werner, L. A. (1991). "Forward masking functions of
3-month-old infants," J. Acoust. Soc. Am. 89, 1914(A).
McAdams, S., Bertoncini, J., and Bobrow, J. (1990). "Organization and
discrimination of repeating sound sequences by newborn infants," J. Acoust.
Soc. Am. 88, S91.
McAdams, S. and Bregman, A. S. (1979). "Hearing musical streams,"
Comput. Mus. J. 3(4), 26-43.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J.
(1995). "Perceptual scaling of synthesized musical timbres: Common
dimensions, specificities, and latent subject classes," Psychol. Res.
58, 177-192.
Newman, R. S. and Jusczyk, P. W. (1996). "The cocktail party effect in
infants," Percept. Psychophys. 58, 1145-1156.
Olsho, L. W. (1985). "Infant auditory perception: Tonal masking," Infant
Behav. Dev. 8, 371-384.
Olsho, L. W., Koch, E. G., and Halpin, C. F. (1987). "Level and age
effects in infant frequency discrimination," J. Acoust. Soc. Am. 82,
454-464.
Pinker, S. (1994). The Language Instinct (W. Morrow, New York).
Spelke, E. S. (1990). "Principles of object perception," Cognitive
Science 14, 29-56.
Spelke, E. S., Breinlinger, K., Macomber, J., and Jacobson, K. (1992).
"Origins of knowledge," Psychol. Rev. 99, 605-632.
Spetner, N. B. and Olsho, L. W. (1990). "Auditory frequency resolution
in human infancy," Child Dev. 61, 632-652.
Trehub, S. E., Endman, M. W., and Thorpe, L. A. (1990).
"Infants' perception of timbre: Classification of complex tones by spectral
structure," Journal of Experimental Child Psychology 49, 300-313.
Trehub, S. E., Schneider, B. A., and Henderson, J. L. (1995). "Gap
detection in infants, children, and adults," J. Acoust. Soc. Am. 98,
2532-2541.
Trehub, S. E., Thorpe, L. A., and Morrongiello, B. A. (1987).
"Organization processes in infants' perception of auditory patterns," Child
Dev. 58, 741-749.
Trehub, S. E. and Trainor, L. J. (1993). "Listening strategies in
infancy: The roots of music and language development," in Thinking in Sound:
The Cognitive Psychology of Human Audition, edited by S. McAdams and E. Bigand
(Oxford University Press, Oxford), pp. 278-327.
van Noorden, L. P. A. S. (1977). "Minimum differences of level and
frequency for perceptual fission of tone sequences ABAB," J. Acoust. Soc. Am.
61, 1041-1045.
Werner, L. A. (1992). "Interpreting developmental psychoacoustics," in
Developmental Psychoacoustics, edited by L. A. Werner and E. W. Rubel
(American Psychological Association, Washington, DC), pp. 47-88.
Werner, L. A., Marean, G. C., Halpin, C. F., Spetner, N. B., and Gillenwater,
J. M. (1992). "Infant auditory temporal acuity: Gap detection," Child
Dev. 63, 260-272.
Werner, L. A. and Rubel, E. W. (Eds.) (1992). Developmental
Psychoacoustics (American Psychological Association, Washington, DC).
Wessel, D. L., Bristow, D., and Settel, Z. (1987). "Control of phrasing
and articulation in synthesis," in Proceedings of the 1987 International
Computer Music Conference, pp. 108-116 (Computer Music Association, San
Francisco).
FIGURE CAPTIONS
Figure 1. Three cycles of each of the four-tone repeating stimulus patterns used in the melodic discrimination paradigm are shown. Pitch corresponds to the vertical dimension and time to the horizontal dimension. Timbre and spatial location of the sound sources are shown by the form and shading of the events. Speakers were placed to the right and left of the infant's head.
Figure 2. Results for Experiment 1. Mean sucking rate is shown for one-minute sample periods before and after the habituation criterion was attained. Separate curves are shown for Experimental (melodic contour change at habituation) and Control (no change) groups. Vertical bars represent +/- one standard error.
Figure 3. Results for Experiment 2 (see Fig. 2 caption).
Figure 4. Results for Experiment 3 (see Fig. 2 caption).
Figure 5. Results for Experiment 4. Mean "true" discrimination scores (hit rate minus false alarm rate) are shown as a function of duration of the silent interval separating the two sequences to be compared in a trial. (Chance performance is zero for this score.) The data for the two stimulus configurations (3/1 and 2/2), averaged across tempo/interval size conditions, are shown as separate curves. Vertical bars represent +/- one standard error.
a) Portions of these results were presented at the 120th Meeting of the Acoustical Society of America, San Diego.
1 While we conceive of the dimension of discrimination as being
"contour" in this study, it should be pointed out that we cannot separate
melodic interval pattern discrimination from discrimination of the contour of
ups and downs in the melody here since the two covary, i.e., we do not present
similar contours that vary in interval pattern and absolute pitch, as did Trehub et al. (1987).
Running title: Auditory streaming in newborns