This paper presents the virtual sound scene rendering and user interaction aspects of Carrouso, a European Union IST project for recording, transmitting, and rendering 3D sound scenes in the MPEG-4 format using the Wave Field Synthesis (WFS) method. On the encoder side, the sound sources are recorded as monophonic, dry signals, and parameters describing the room acoustics of the recording environment are encoded as a separate MPEG-4 stream. In the decoder, the transmitted room acoustic description defines how the sounds should be perceived in the virtual sound environment created at the renderer. Because the source signals and the room acoustic description are transmitted separately, the room acoustic properties of the sound scenes can be modified interactively; together with the WFS rendering technique, this makes possible interactive 3D music performances that are audible to many users simultaneously. We present the Carrouso framework and concentrate on user interface and rendering issues in the case where the acoustic environment is represented by a set of perceptual room acoustic parameters.