Dom Schlienger: Acoustic Localisation Techniques for Interactive and Locative Audio Applications



Reviewing the literature on positioning systems using acoustic source localisation principles for Interactive and Locative Audio Applications (ILAA), it becomes clear that Acoustic Localisation and Positioning Systems (ALPS) implemented on ubiquitous devices can provide an alternative to Motion Capture systems (MoCap) wherever multiple speakers are part of an application. Providing background on and defining the notion of ILAA, this paper argues that, based on comparisons of recent applications in the literature, ALPS can provide competitive alternatives to MoCap, the system prevalently used in ILAA.

1 Introduction

With the advent of multi-track recording technology and its possibilities for spatial distribution, the relation between the origin of a sound and the listener was essentially abstracted. Today, with location-aware mobile technology and almost ubiquitous internet access, we can use the position of a moving object in space as a dynamic parameter and map it, arguably, to anything. In the case of musical applications these mappings can be used for musical expression, for example to track gestural control for hyper instruments, or to let a dancer control musical parameters through spatial cues.

The use of these spatial interfaces for musical applications is well documented in the proceedings of the international conference on New Interfaces for Musical Expression (NIME), for example. The importance of digital positioning technology in general is attested by the yearly IEEE conference on Indoor Positioning and Indoor Navigation, and by the conference on Ubiquitous Positioning, Indoor Navigation and Location Based Service.

In Interactive and Locative Audio Applications (ILAA), Motion Capture (MoCap) is the most commonly applied principle, using cameras to track movement. However, the question is whether it is always the best choice. In our previous work (Schlienger 2012), we found evidence that for some applications acoustic localisation techniques might perform equivalently to MoCap and often at a fraction of the cost.

In some scenarios acoustic localisation techniques could clearly outperform MoCap. MoCap requires a line of sight between the tracked object and the camera. If something obstructs this line, position data cannot be obtained. An analogous situation for acoustic localisation, where an object obstructs the signal path between microphone and loudspeaker, is a lot less problematic due to the physical nature of the audio signal. In fact, the principle’s feasibility is well documented (Schlienger 2012; Janson et al. 2010; Rishabh et al. 2012; Mandal et al. 2005; Filonenko et al. 2010; Raykar et al. 2005; Tervo 2012), despite its rare implementation in ILAA. The implementation is particularly straightforward when multiple speakers are already part of the system, as is often the case in ILAA.

By comparing the performance of Acoustic Localisation and Positioning Systems (ALPS) with the documented performance of MoCap, we can further evaluate the principle’s suitability for ILAA, and contribute to a growing body of work which advocates its implementation. ALPS are often cheaper and usually make better use of existing technology within ILAA, in accordance with the notion of ubiquitous computing, as our findings show.

This paper provides background on ILAA, followed by a discussion of technical aspects of ALPS and how they compare to MoCap and some MoCap hybrid systems. It then goes on to discuss the limits of MoCap for ILAA and why ALPS could provide alternatives, followed by a section on future work and concluding remarks.

2 Background

2.1 Interactive and Locative Audio Applications

In the present context, ILAA are audio applications in which the position or the change of position of an object in physical space gives a cue for some process to happen. In general, ILAA could also include audio applications where some process changes the physical position of something in the space, but for this paper the former definition shall apply.

The potential uses of positioning systems for interactive art, multimedia and interactive spatial music are well documented (Zvonar 2005) and early concepts go back to the emergence of electroacoustic music in the 1950s. Some examples are the potentiomètre d’espace (Manning 2004), the spatialisation schemes of Varèse et al. for the Philips Pavilion at the Expo in Brussels 1958 (Sonhors 2014) or Stockhausen’s for Kontakte (Stockhausen 1959). Admittedly, these spatialisations were probably interactive in nature merely because digital multitrack control was absent. Had computers been available at the time to control dozens of audio tracks automatically, this option would probably have been embraced, effectively disposing of interactive control.

Already in 1941, in a more mainstream but no less ground-breaking vein, the spatialisation of the soundtrack for Disney’s Fantasia, Fantasound (Garity 1941), used several operators to control the panning of separate tracks over multiple speakers. More recent examples of ILAA in the mainstream are interactive video game controllers, like Wii and Kinect, where spatial interactivity provides increased immersiveness, notably through surround sound (Schlienger 2012; Collins 2007, 2008) used to create a virtual space. The virtual space shall be defined here (similar to the definition in Normandeau 2009) as the audio space created for the listener by the composer or, in paraphrase for the gaming environment, by the developer. Virtual audio space can, but does not have to, be congruent with a real existing sound field in space. For the discussion on the suitability of positioning technology for ILAA, this generalisation is helpful, as it provides a common denominator for space as a concept in electroacoustic music, video games and spatial sound reproduction.

The recent presentations held by Normandeau and Siegel at Sibelius Academy, both guest performers at the 2013 SibA MuTe Fest (Centre for Music and Technology 2013), exemplify the interest of electroacoustic composers in ILAA. Compositions like Movements in Possible Histories or a Composition for 24 Windows (Mureddu and Mela 2011), also at Sibelius Academy (our research environment); research conducted at SARC, Belfast; Locus Sonus Laboratoire Art Audio, Aix-en-Provence; Zentrum für Kunst und Medientechnologie, Karlsruhe; and the activities of NIME, to name just a few, show the importance of the field for composers and musicians.

Electroacoustic spatial composition is just one area of ILAA where positioning technology can provide interaction. The ubiquity of mobile technology facilitates locative games like Papa Sangre (Wikipedia 2014), for example. Spatial audio interaction is a growing area in music technology, and is becoming increasingly pervasive. Various ongoing projects at institutions like Pervasive Media Studio Bristol are examples thereof (Pervasive Media Studio 2014).

2.2 Acoustic Localisation and Positioning Systems

Using the principle of acoustic source localisation, we can obtain the distance between a speaker and a microphone by measuring the time delay of the acoustic signal, as we know the speed of sound through air a priori. If we know the distances to multiple speakers, a 3-D position can be computed by trilateration. The accuracy of such a system lies within the low decimetre range and its latency is the same as for audio recording systems, in the low millisecond range. The latency derives from the buffer length applied: shortening the buffer reduces the latency, at the cost of the area covered. This is useful for applications where small gestural movements need to be tracked, for example in the case of instrument control. The system is thus scalable. The presence of multiple loudspeakers is a prerequisite, which is met in most ILAA. Other than that, all that is required is a microphone. This makes the ALPS principle inexpensive.
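The relations described above can be illustrated with a minimal sketch. It is not the implementation of any system cited here: the speed of sound of 343 m/s, the speaker layout, the sample rate and all function names are assumptions chosen for illustration, and the solver handles only the idealised case of exactly four non-coplanar speakers with noise-free delays.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed value)


def delay_to_distance(delay_s, c=SPEED_OF_SOUND):
    """Distance travelled by sound during delay_s seconds."""
    return c * delay_s


def buffer_range(buffer_samples, sample_rate, c=SPEED_OF_SOUND):
    """Return (largest distance whose delay fits in one buffer, buffer duration)."""
    latency_s = buffer_samples / sample_rate
    return c * latency_s, latency_s


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def locate(speakers, distances):
    """Position from four non-coplanar speakers and the distances to them.

    Subtracting the first sphere equation from the other three leaves a
    linear 3x3 system, which is solved here with Cramer's rule.
    """
    (x0, y0, z0), d0 = speakers[0], distances[0]
    A, b = [], []
    for (x, y, z), d in zip(speakers[1:4], distances[1:4]):
        A.append([2 * (x - x0), 2 * (y - y0), 2 * (z - z0)])
        b.append(d0**2 - d**2 + x**2 + y**2 + z**2 - x0**2 - y0**2 - z0**2)
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("speakers are coplanar; position is ambiguous")
    position = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        position.append(det3(m) / det)
    return tuple(position)


# Hypothetical room: four speakers and one microphone (metres).
speakers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 2.5)]
microphone = (1.0, 2.0, 1.2)

# Simulated time delays, then the position recovered from them.
delays = [math.dist(s, microphone) / SPEED_OF_SOUND for s in speakers]
estimate = locate(speakers, [delay_to_distance(t) for t in delays])
print(estimate)  # close to the microphone position, up to rounding

# A 512-sample buffer at 48 kHz lasts about 10.7 ms and spans about 3.66 m.
print(buffer_range(512, 48_000))
```

With measured rather than simulated delays, more than four speakers and a least-squares fit would normally be preferable, since measurement noise makes the sphere equations inconsistent.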

ALPS have not often been implemented to date. Rishabh et al. presented a system which uses controlled ambient sound, i.e. pseudo-random noise, as a signal to measure the time delays (Rishabh et al. 2012). Random noise can be problematic for use in ILAA, as the noise would need to be masked at all times by the audible signal. The system described in Janson et al. (2010) compares arrival times of a distinct signal on networked devices, applying a multiple-receiver principle. The systems in Mandal et al. (2005) and Filonenko et al. (2010) measure signals outside the limits of audible sound, making them effectively ultrasound. Our suggestion concerns a one-receiver, multiple-sender model, using the audible airborne sound (which is already part of an ILAA) directly as a measuring signal, as described in Schlienger (2012). ALPS are rarely implemented like this, particularly not by measuring the time delays of the signal carrying the audio content of an audio application itself, despite the feasibility of this method documented in Schlienger (2012); Janson et al. (2010); Rishabh et al. (2012); Mandal et al. (2005); Filonenko et al. (2010); Raykar et al. (2005); Tervo (2012).
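One common way to measure such a time delay is to cross-correlate the signal sent to a loudspeaker with the microphone recording and take the lag of the strongest match. The text does not state which estimator is used, so the following is only a sketch under that assumption. The function name estimate_delay, the synthetic noise standing in for audio content, the 48 kHz sample rate and the speed of sound are all illustrative.

```python
import random

SAMPLE_RATE = 48_000     # Hz (assumed)
SPEED_OF_SOUND = 343.0   # m/s (assumed)


def estimate_delay(reference, recorded, max_lag):
    """Return the lag in samples at which recorded best matches reference.

    Brute-force cross-correlation over lags 0..max_lag; recorded must hold
    at least len(reference) + max_lag samples for full overlap at every lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(recorded) - lag)
        score = sum(reference[i] * recorded[lag + i] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for the audio content played over one loudspeaker.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Simulated recording: delayed by 140 samples, attenuated, slightly noisy.
true_lag = 140
recorded = [0.0] * true_lag + [0.5 * x for x in reference] + [0.0] * 260
recorded = [x + rng.uniform(-0.05, 0.05) for x in recorded]

lag = estimate_delay(reference, recorded, max_lag=400)
distance_m = lag / SAMPLE_RATE * SPEED_OF_SOUND
print(lag, distance_m)
```

Real programme material is far less noise-like than this test signal, and several loudspeakers play at once, so a practical system would need a more robust estimator than this plain correlation.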

Our previous work on the suitability of positioning systems for ILAA in general compared specifications in the literature to user requirements obtained from early findings of an ongoing survey. The survey shows that optical tracking in the form of MoCap provides good solutions as long as the requirement for line of sight between tracking device and tracked object does not cause issues. When it does, respondents resort to hybrid systems, where MoCap is combined with auxiliary systems using dead reckoning principles. Dead reckoning systems predominantly consist of inertial sensors, so the literature often uses the term inertial navigation system. However, dead reckoning methods also include the use of compasses, which makes dead reckoning the more general term. Dead reckoning systems usually require frequent updating with absolute data in ILAA, which predestines them for hybrid systems. On their own, they are usually only applied where absolute position is not necessary.

Furthermore, the survey showed respondents’ concerns about tracking as a privacy-sensitive issue. Having an opt-in choice is regarded as important. This can easily be achieved with the one-receiver, multiple-sender model: control over privacy stays with the person holding the mobile device, that is, the microphone.

3 Comparing MoCap to ALPS

In the discussion on the suitability of ALPS for ILAA one cannot get around the fact that the currently predominant optical tracking systems are generally considered satisfactory. Not many people who use MoCap criticise it, as there is not much to compare it to. Thus, in many cases it is the best available option.

However, the requirement of line of sight between a tracked device and a camera is an issue for many conceptual possibilities, particularly when the tracked devices are to be on members of a crowd (Ulyate and Bianciardi 2001). For the systems described by Normandeau for the Klangdom using ZirkOsc (Zirkonium 2014), positioning information from ALPS could also open new possibilities. In the examples discussed in Siegel (2011) on interactive applications, where dance controls music, ALPS could provide interesting and dynamic possibilities.

In view of audio-mobility, the dependence of MoCap on cameras means that ad-hoc networks of ubiquitous devices using MoCap are unrealistic. The environment needs to be controlled and the tracked object defined by colour or shape. Even if line of sight could be established between mobile devices, in a multi-user environment this would require some form of choreography. Despite its popularity, MoCap has a considerable disadvantage compared with ALPS, which is inherent to the very nature of the camera and constitutes its limitation as an interface to gather spatial data. The camera provides a 2-dimensional view of space and the further away from the camera an object is, the less information we gain about it.

The following example should not be misunderstood as a criticism of the artistic quality of the work in question. The tracking system applied by Siegel for Two Hands, (Not Clapping) (Siegel 2010) was originally developed as an interface for dancers to interact with music, wherein a movement in the field of the camera is digitally registered. But interaction through this interface requires the dancer to adapt the performance to the interface’s characteristics. Due to the optical perspective of the camera, a moving object close to the camera causes more change in pixels than an object further away. That is, it provides a warped perspective. Thus the interface’s resolution for a movement in space is not uniform but depends on the distance from the camera. As innovative as this is for the performance of Two Hands, (Not Clapping), the fact that the interface weighs a particular amount of movement in one area of the performance space differently to the same amount in a different area makes it a poor interface for 3-D interaction. In a spatial interface, an amount of movement should be mappable to a pro rata equivalent. If this is not the case, additional information is needed to differentiate between a small cow very close and a large cow far away. ALPS could provide better results (N.B., not for Two Hands, (Not Clapping), where the warped perspective is an idiosyncratic part of the composition, but for a more general spatial human-computer interface). This is particularly pertinent if the performance space in question is supposed to be understood as 3-dimensional, which is usually the case in ILAA, especially in implementations in ad-hoc networks. In some MoCap systems, depth information is added by using multiple cameras. For ILAA, this is a cost which, in the presence of multiple loudspeakers, could be avoided by using ALPS.
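The dependence on distance can be made concrete with an idealised pinhole camera, in which the size of a projection in pixels is the focal length times the real size divided by the depth. This model and the focal length of 800 pixels are assumptions for illustration only; they do not describe the camera system used by Siegel.

```python
def projected_px(size_m, depth_m, focal_px=800.0):
    """Size in pixels of an object or movement of size_m metres, parallel
    to the image plane, at depth_m metres from an ideal pinhole camera."""
    return focal_px * size_m / depth_m


# The same half-metre sideways step registers very differently:
near = projected_px(0.5, 1.0)   # 1 m from the camera
far = projected_px(0.5, 4.0)    # 4 m from the camera
print(near, far)

# Without depth information, size and distance cannot be separated:
small_cow_close = projected_px(1.0, 2.0)
large_cow_far = projected_px(3.0, 6.0)
print(small_cow_close == large_cow_far)
```

In this model a single camera reports identical values for the two cows, which is why a second camera or another source of depth information is needed to tell them apart.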

It is evident that every medium influences the message by its characteristics (Normandeau 2009). This might not necessarily be negative, but it is understood that this influence can warp the data, as exemplified above. If the system is supposed to be a good interface according to the notion of ubiquitous computing, the influence of its characteristics, its visibility, has to be reduced (Weiser 1994).

4 Future Work and Concluding Remarks

Looking at the existing literature and the early findings of our own research, the impression prevails that ALPS provide a competitive alternative for many spatially interactive applications where audio is diffused over multiple speakers. The possibilities arising from ALPS implementation in ad-hoc networks stand in contrast with what is achievable with MoCap. MoCap’s limitations as a ubiquitous interface for ILAA are further compounded by the intrinsic distortion of spatial data through the 2-dimensional depiction of space by a camera.

In the next steps of the research, it is considered of paramount importance to find ways of establishing requirements directly from the potential uses in musical practice. As ALPS are rarely implemented for ILAA, not much material is available to document precisely how the system will be used by musicians or other early adopters, so insight needs to be gained experimentally. As very little music of a spatially interactive nature exists, free improvisation suggests itself as a means to study the interaction of space and audio. Using free improvisation as a methodology to explore the relation of organised sound and space is in itself a very interesting but vast field, which, as a means to establish user requirements, shall be explored in future work.



References

Sonhors. 2014. “Spatialisation et image: la mise en espace des sons concrets et électroniques.” Accessed December 1, 2014.

Collins, Karen. 2007. An introduction to the participatory and non-linear aspects of video games audio. Helsinki: Helsinki University Press.

Collins, Karen, ed. 2008. From Pac-Man to Pop Music: Interactive Audio in Games and New Media. Aldershot: Ashgate.

Filonenko, Viacheslav, Charlie Cullen and James Carswell. 2010. “Ultrasonic positioning on mobile phones.” In International Conference on Indoor Positioning and Indoor Navigation (IPIN), Zürich, Switzerland-IEEE Xplore.

Centre for Music and Technology, Sibelius Academy Helsinki, 2013. Mute fest 2013. Accessed December 1, 2014.

Garity, Wm. E., and J. N. A. Hawkins. 1941. “Fantasound.” Journal of the Society of Motion Picture Engineers. Accessed December 1, 2014.

Janson, T., C. Schindelhauer and J. Wendeberg. 2010. “Self-localization application for iPhone using only ambient sound signals.” In International Conference on Indoor Positioning and Indoor Navigation (IPIN), Zürich, Switzerland-IEEE Xplore.

Mandal, A., C.V. Lopes, T. Givargis, A. Haghighat, R. Jurdak, and P. Baldi. 2005. “Beep: 3D indoor positioning using audible sound.” In Consumer Communications and Networking Conference, 2005. CCNC. 2005 Second IEEE, 348-353.

Manning, Peter. 2004. Electronic and Computer Music. New York: Oxford University Press.

Mureddu, Libero and Martti Mela. 2011. Movements in possible histories or a composition for 24 windows. Accessed December 1, 2014.

Normandeau, Robert. 2009. “Timbre spatialisation: The medium is the space.” Organised Sound 14: 277-285.

Raykar, V. C., I. V. Kozintsev, and R. Lienhart. 2005. “Position calibration of microphones and loudspeakers in distributed computing platforms.” Speech and Audio Processing, IEEE Transactions on 13 (1), 70-83.

Rishabh, I., D. Kimber, and J. Adcock. 2012. “Indoor localization using controlled ambient sounds.” In International Conference on Indoor Positioning and Indoor Navigation (IPIN), Zürich, Switzerland-IEEE Xplore.

Schlienger, Dominik. 2012. “Indoors and local positioning systems for interactive and locative audio applications.” Accessed December 1, 2014.

Siegel, Wayne. 2010. Two hands (not clapping). Accessed December 1, 2014.

Siegel, Wayne. 2011. Dancing the Music. Oxford: Oxford University Press.

Stockhausen, Karlheinz. 1959. Kontakte. Accessed December 1, 2014.

Pervasive Media Studio. 2014. Pervasive media studio. Accessed December 1, 2014.

Tervo, Sakari. 2012. Localization and tracing of early acoustic reflections. Ph.D. thesis, Aalto University School of Science.

Ulyate, Ryan and David Bianciardi. 2001. “The interactive dance club: Avoiding chaos in a multi participant environment.” In NIME ’01 Proceedings of the 2001 conference on New interfaces for musical expression, edited by I. Poupyrev, M. J. Lyons, S. Fels, and T. Blaine. Seattle (USA). Accessed December 1, 2014.

Weiser, Mark. 1994. Building invisible interfaces. Presentation Slides.

Wikipedia. 2014. Papa sangre. Wikipedia, the free encyclopedia. Accessed December 1, 2014.

Zirkonium. 2014. Zentrum für Kunst und Medientechnologie Karlsruhe. Accessed December 1, 2014.

Zvonar, R. 2005. “A history of spatial music.” Accessed December 1, 2014.

Dom Schlienger is a musician and composer-researcher. He graduated with an MSc in Audio Production from UWE Bristol in 2012, where he also completed his BSc in Creative Music Technology in 2010. Now a doctoral student at Sibelius Academy Helsinki, Finland, he works on the development of an indoor positioning system for interactive audio applications on ubiquitous devices. He holds a graduate residency at Pervasive Media Studio Bristol, UK and also works as a freelance sound engineer and composer/sound designer for video and film.
