Abstract
Many applications and practices involving recorded sound rely on the segmentation and concatenation of fragments of audio streams. In collaborations with composers and sound artists, we have observed that a recurrent musical event or sonic shape is often identified by the temporal evolution of its sound features. We would like to contribute to the development of a novel segmentation method, based on the evolution of audio features, that can be adapted to a given audio material through interaction with the user. As a first step, a prototype of a semi-supervised, interactive segmentation tool was implemented. With this prototype, the user provides a partial annotation of the stream to be segmented. In an interactive loop, the system builds models of the morphological classes the user defines. These models are then used to produce an exhaustive segmentation of the stream, generalizing the user's annotation. This relies on Segmental Models, which have been adapted and implemented for sound streams represented by a set of audio descriptors (MFCCs). The main novelty of this study is the use of real data, drawn from various audio materials, to build the models of the morphological classes. A specific method for building our global model is defined, combining learning paradigms with the integration of user knowledge. The overall approach is validated through experiments on both synthesized streams and real-world materials (environmental sounds and music pieces). A qualitative, less formal validation also emerges from the feedback of the composers who worked with us throughout the internship.
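To make the underlying idea of segmenting a stream by the evolution of its audio features concrete, the following minimal sketch computes an MFCC trajectory and derives segment boundaries from it. It is illustrative only: it uses librosa's agglomerative boundary detection as a simplified stand-in for the Segmental Models used in this work, and the audio file path and segment count are placeholder values.

```python
# Illustrative sketch (not the thesis implementation): feature-based segmentation
# of an audio stream using MFCC descriptors. Segmental Models are replaced here
# by librosa's agglomerative boundary detection, purely to show the idea of
# segmenting by feature evolution. The file path is a hypothetical placeholder.
import librosa

AUDIO_PATH = "example_stream.wav"   # hypothetical input file
N_SEGMENTS = 8                      # desired number of segments (user-chosen)

# Load the stream and compute its MFCC trajectory (one 13-dim vector per frame).
y, sr = librosa.load(AUDIO_PATH, sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Group temporally contiguous frames with similar feature content into segments.
boundary_frames = librosa.segment.agglomerative(mfcc, N_SEGMENTS)
boundary_times = librosa.frames_to_time(boundary_frames, sr=sr)

print("Segment start times (s):", boundary_times.round(2))
```

In the interactive setting described above, such automatically derived boundaries would instead come from class models learned from the user's partial annotations rather than from a fixed, unsupervised clustering criterion.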