The aim of this project is to investigate, develop and demonstrate new ways to make sense of large amounts of everyday sound, focussing on real-world non-music, non-speech sounds and soundscapes. In this way we will realize latent value in existing sound and broadcast archives, enable more productive interaction with sound data, and improve the lives of people in their sound environment.
To achieve this aim, our specific objectives are:
- To investigate and develop new machine learning and signal processing methods to analyse sounds and soundscapes;
- To investigate how to use other modalities, such as vision and text, that bring complementary information to improve the analysis of, and interaction with, sound data;
- To investigate human sound perception and cognition in the context of sound data understanding, including emotional response, attention and context;
- To build a research software framework and datasets to encourage other researchers to contribute to the field; and to build a set of software demonstrators to illustrate the outcomes to potential users;
- To create a network of national and international partners from academia and industry, to realize the potential of current and future research and applications in making sense of sound data.
In this project we will investigate how to make sense of sound data, focussing on how to convert sound recordings into understandable and actionable information: specifically, how to allow people to search, browse and interact with sounds.
Increasing quantities of sound data are now being gathered: in sound and audiovisual archives, through sound sensors such as city soundscape monitoring, and as soundtracks on user-generated content. For example, the British Library (BL) Sound Archive holds over a million discs and thousands of tapes; the BBC has some 1 million hours of digitized content; smart cities such as Santander (Spain) and Assen (Netherlands) are beginning to wire themselves up with large numbers of distributed sensors; and 100 hours of video (with sound) are uploaded to YouTube every minute.
However, the ability to understand and interact with all this sound data is hampered by a lack of tools allowing people to “make sense of sounds” based on the audio content. For example, in a sound map, users may be able to search for sound clips by geographical location, but not by “similar sounds”. In broadcast archives, users must typically know which programme to look for, and listen through to find the section they need. Manually-entered textual metadata may allow text-based searching, but such metadata typically refer only to the entire clip or programme, are often ambiguous, and are hard to scale to large datasets. In addition, browsing sound data collections is a time-consuming process: without the help of e.g. key frame images available from video clips, each sound clip has to be “auditioned” (listened to) to find what is needed, and where the point of interest can be found. Radio programme producers currently have to train themselves to listen to audio clips at up to double speed to save time in the production process. Clearly better tools are needed.
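To illustrate the kind of content-based “similar sounds” search described above, the sketch below ranks clips by the similarity of simple spectral features rather than by metadata. This is a deliberately minimal toy example with hand-rolled band-energy features and synthetic tones, not the project’s method; the function names and feature choice are illustrative assumptions.

```python
import numpy as np

def band_energy_features(clip, n_bands=8):
    """Summarise a clip as its average magnitude in n_bands frequency bands.
    A deliberately simple stand-in for richer audio features."""
    spectrum = np.abs(np.fft.rfft(clip))
    bands = np.array_split(spectrum, n_bands)
    feats = np.array([b.mean() for b in bands])
    return feats / (np.linalg.norm(feats) + 1e-12)  # unit-normalise

def most_similar(query, library):
    """Return the index of the library clip whose feature vector is
    closest to the query's, by cosine similarity."""
    q = band_energy_features(query)
    sims = [q @ band_energy_features(clip) for clip in library]
    return int(np.argmax(sims))

# Synthetic example: two similar low tones and one high tone.
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
low_a = np.sin(2 * np.pi * 220 * t)
low_b = np.sin(2 * np.pi * 230 * t)
high = np.sin(2 * np.pi * 3000 * t)

library = [low_b, high]
print(most_similar(low_a, library))  # the 230 Hz clip is closest → 0
```

In a real system the feature extractor would be replaced by learned representations, and the linear scan by an approximate nearest-neighbour index, but the search-by-content principle is the same.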
To address these needs, we will investigate and develop new signal processing methods to analyse sound and audiovisual files, new interaction methods to search and browse through sets of sound files, and new methods to explore and understand the criteria searchers use when searching, selecting and interacting with sounds. Our perceptual work will also investigate people’s emotional responses to sounds and soundscapes, assisting sound designers or producers to find audio samples with the effect they want to create, and informing the development of public policy on urban soundscapes and their impact on people.
The research and tools produced in this project have a wide range of potential beneficiaries, including both professional users and the general public. Archivists who are digitizing content into sound and audiovisual archives will benefit from new ways to visualize and tag archive material. Radio or television programme makers will benefit from new ways to search through recorded programme material and databases of sound effects to reuse, and new tools to visualize and repurpose archive material once identified. Sound artists and musicians will benefit from new ways to find interesting sound objects, or collections of sounds, to use as part of compositions or installations. Educators will benefit from new ways to find material on particular topics (machines, wildlife) based on its sound properties rather than metadata. Urban planners and policy makers will benefit from new tools to understand the urban sound environment, and people living in those urban environments will benefit through improved city sound policies and better designed soundscapes, making the urban environment more pleasant. Among the general public, many people are now building their own archives of recordings, in the form of videos with soundtracks, and may in future include photographs with associated sounds (audiophotographs). This research will help people make sense of the sounds that surround us, and the associations and memories that they bring.
- Audio Analytic Ltd
- City, University of London
- NYU Steinhardt
- Pompeu Fabra University
- Queen Mary, University of London
- University of Groningen