This paper deals with the automatic generation of music audio summaries from signal analysis without the use of any other information. The strategy employed here is to consider the audio signal as a succession of “states” (at various scales) corresponding to the structure (at various scales) of a piece of music. This is, of course, only applicable to certain kinds of musical genres based on some kind of repetition. From the audio signal, we first derive dynamic features representing the time evolution of the energy content in various frequency bands. These features constitute our observations from which we derive a representation of the music in terms of “states”. Since human segmentation and grouping performs better upon subsequent hearings, this “natural” approach is followed here. The first pass of the proposed algorithm uses segmentation in order to create “templates”. The second pass uses these templates in order to propose a structure of the music using unsupervised learning methods (Kmeans and hidden Markov model). The audio summary is finally constructed by choosing a representative example of each state. Further refinements of the summary audio signal construction, uses overlapadd, and a tempo detection/ beat alignment in order to improve the audio quality of the created summary.