
Principal Investigator

Mark Plumbley

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Professor of Signal Processing in the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey. He has investigated a wide range of audio signal processing methods, including automatic music transcription (Abdallah & Plumbley, 2006) and audio source separation (Nesbit et al, 2011), using techniques such as sparse representations (Plumbley et al, 2010) and high-resolution NMF (Badeau & Plumbley, 2014). He led the D-CASE data challenge on Detection and Classification of Acoustic Scenes and Events (Stowell et al, 2015), his work on bird sound classification (Stowell & Plumbley, 2014) was widely featured in the news media, and his collaborative article on best practices for scientific computing (Wilson et al, 2014) was the most-read PLoS Biology article of 2014. He is PI on the EPSRC project Musical Audio Repurposing using Source Separation, is the lead academic on the Innovate UK projects Audio Data Exploration and Advanced Smart Microphone with the company Audio Analytic, and has been PI on several other EPSRC grants. Before joining Surrey (January 2015), Plumbley was Director of the world-leading Centre for Digital Music (C4DM) at Queen Mary University of London.

Co-Investigators

Philip Jackson

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Senior Lecturer at CVSSP, where he leads the Machine Audition Lab. His research in audio and speech processing has contributed to projects (e.g., Columbo, BALTHASAR, DANSA, QESTRAL, UDRC and POSZ) in the acoustics of speech production (Jackson & Shadle, 2001), audio source separation (Liu et al, 2013; Alinaghi et al, 2014), audio-visual processing for speech enhancement and visual speech synthesis, and spatial aspects of subjective sound quality evaluation (Coleman et al, 2014; Conetta et al, 2014), and he has over 100 academic publications. He is Surrey's technical lead on the EPSRC-funded S3A Programme Grant on spatial audio, responsible for object-based audio production.

Wenwu Wang

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Reader in Signal Processing at CVSSP and Co-Director of the Machine Audition Lab in CVSSP. His research interests include audio source separation (Liu et al, 2013; Alinaghi et al, 2014), blind dereverberation (Jan & Wang, 2012), sparse signal processing (Dai et al, 2012), machine learning (Barnard et al, 2014), and audio-visual signal processing (Kilic et al, 2015). He has over 150 publications in these areas including two books: Machine Audition: Principles, Algorithms and Systems (2010) and Blind Source Separation: Advances in Theory, Algorithms and Applications (2014). He has been a PI and Co-I on several EPSRC projects, including Multimodal Blind Source Separation for Robot Audition (EPSRC EP/H012842/1) and Audio and Video Based Speech Separation of Multiple Moving Sources (EPSRC EP/H050000/1).

Krystian Mikolajczyk

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Reader in Robot Vision in CVSSP. He has more than 80 publications in top-tier computer vision and machine learning forums, in areas such as feature extraction, object recognition, and tracking, including kernel-based methods (Yan et al, 2011), audio-visual annotation (Yan et al, 2014), and deep canonical correlation analysis (Yan & Mikolajczyk, 2015). He received the Longuet-Higgins Prize in 2014 for his contribution to invariant local image descriptors. He chaired BMVC 2012 and IEEE AVSS 2013. His team regularly places at the top of retrieval challenges, including TRECVid 2008, the Visual Object Classes Challenge 2008 & 2010, and ImageCLEF 2010.

David Frohlich

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Director of the Digital World Research Centre (DWRC) at Surrey and Professor of Interaction Design. He joined the Centre in 2005 from HP Labs to establish a new research agenda on user-centred innovation in digital media technology. His work explores a variety of new media futures relating to digital storytelling, personal media collections, and community news and arts. David has a longstanding interest in sound through his work on Audiophotography as a new media form (Frohlich, 2004; Frohlich & Fennell, 2007): this led to the first published study of the domestic soundscape on the Sonic Interventions project (Oleksik et al, 2008), the Com-Me toolkit (Frohlich et al, 2012) for creating audiophoto narratives on the Community Generated Media project (EP/H042857/1), and the design of audiopaper documents on the Interactive Newsprint and Light Tags projects (EP/I032142/1, EP/K503939/1).

Bill Davies

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Expert in soundscapes, room acoustics and perception, with 27 years' experience in acoustic and psychoacoustic research methods. He led the EPSRC Positive Soundscape Project (EP/E011624/1), a large consortium project to develop new ways of evaluating soundscapes, involving artists, social scientists and acousticians. Project outputs included new methods to measure soundscape perception, advice for urban planners, a major art exhibition, radio programmes, and a soundscape sequencer toy for people to gain a greater understanding of their aural environment (Davies et al, 2013). He edited a special edition of Applied Acoustics on soundscapes, and sits on ISO TC43/SC1/WG54, which produces standards on soundscape assessment. He led the Defra project NANR200 advising on UK soundscape policy (Payne et al, 2009). Davies is currently Vice-President (International) of the Institute of Acoustics, and leads work on the perception of complex auditory scenes on the EPSRC-funded S3A Programme Grant.

Trevor Cox

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Professor of Acoustic Engineering and a past President of the Institute of Acoustics (IOA). He was awarded the IOA's Tyndall Medal in 2004. He has been an investigator on EPSRC projects on room acoustics, signal processing, perception and public engagement, including EP/J013013/1 on the perception and automatic detection of audio recording errors using perceptual testing and blind signal processing methods (Jackson et al, 2014). Cox leads the qualitative and quantitative perceptual work on the EPSRC S3A Programme Grant. He was an EPSRC Senior Media Fellow, has presented 21 science documentaries on BBC Radio, and has authored articles for The Guardian, New Scientist and Sound on Sound. His popular science book Sonic Wonderland was published in 2014. He pioneered psychoacoustic testing as a method for engaging the public, working with BBC R&D and the British Science Association on theme tunes (Davies et al, 2011), a technique that will be exploited in this project.

Postdoctoral Researchers

Oliver Bones

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

A Research Fellow with research interests in sound perception and auditory science. He completed his PhD at the University of Manchester with Chris Plack, researching the neural basis for individual differences in the perception of musical consonance. He then took a postdoctoral research position with Patrick Wong at the Chinese University of Hong Kong, researching pitch perception in tone-language speakers, before joining the Acoustic Research Centre at the University of Salford as a Research Fellow working with Bill Davies and Trevor Cox.

Qiang Huang

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

A Research Fellow working on the Making Sense of Sounds project. He began his PhD research in speech recognition and natural language processing at the University of East Anglia. He has worked on several EPSRC projects on speech recognition, information retrieval, sports video analysis and multimodal dialogue systems. He now focuses on multimodal information processing for sound understanding.

Yong Xu

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research Fellow at the University of Surrey. He received his PhD from the University of Science and Technology of China (USTC) in 2015. He was a visiting researcher at the Georgia Institute of Technology, USA, from September 2014 to May 2015, completed a short internship at the Bosch research center, USA, and worked at IFLYTEK from April 2015 to April 2016. He serves as a reviewer for ICASSP, IJCNN, EUSIPCO, DSP, the Audio Engineering Society conference, IEEE/ACM Transactions on Audio, Speech and Language Processing, IEEE Signal Processing Letters, and Speech Communication. His papers have 455 citations.

Yin Cao

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research Fellow at CVSSP, University of Surrey. He received his PhD in 2013 from the Institute of Acoustics, Chinese Academy of Sciences, majoring in active noise control of sound and vibration. He then conducted postdoctoral research at Brigham Young University from 2013 to 2015, after which he returned to the Institute of Acoustics, Chinese Academy of Sciences, as an associate research fellow. He serves as a reviewer for the Journal of the Acoustical Society of America, Applied Acoustics, and the Noise Control Engineering Journal. His research interests range from air acoustics to signal processing.

Saeid Safavi

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research Fellow at CVSSP, University of Surrey. He received an MEng in 2009 and a PhD in 2014 in Electrical Engineering from the University of Birmingham, UK, where he also worked as a postdoctoral researcher for a year in 2015. From 2016 to 2017 he was a research fellow at the University of Hertfordshire, UK, on the EU H2020 project 'Objective Control for TAlker VErification'. After that project ended, he joined the University of Surrey in 2018 for another EU H2020 project, 'Audio Commons'. He developed a framework to automatically predict the perceived level of reverberation directly from audio files recorded in uncontrolled environments. He has published over 30 peer-reviewed papers in leading journals and international conferences in the areas of speech processing, machine learning, human perception, accent recognition and automatic speaker characterisation.

Simone Graetzer

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Research Fellow with research interests in acoustics, particularly in speech acoustics, psychoacoustics, room acoustics, spatial audio, and signal processing of speech and audio. Previously, she was a Research Associate at the Acoustics Research Unit (ARU) at the University of Liverpool, where she worked with Carl Hopkins on speech security, speech intelligibility and speech enhancement. Prior to joining the ARU, she was a Research Associate in the Voice Biomechanics and Acoustics Laboratory at Michigan State University, where she worked on a National Institutes of Health (NIH) funded project concerning speech accommodation in occupational settings and acoustical environments. She is a member of the Institute of Acoustics (IOA) and the Acoustical Society of America (ASA), and is on the steering committee of the UK Acoustics Network. She was the recipient of a Young Investigator Award for promising young acousticians (Acoustical Society of America, 2015).

Research Software Developer

Christian Kroos

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Cognitive scientist with a focus on algorithm development. He was awarded an MA and a PhD in Phonetics and Speech Communication, with Logic as a minor, by Ludwig-Maximilians-Universität (Munich, Germany). Over the last two decades he has conducted research in Germany (Ludwig-Maximilians-Universität), Japan (ATR, Kyoto), the USA (Haskins Laboratories, New Haven, CT), Australia (Western Sydney University, Sydney & Curtin University, Perth) and the UK, spanning cognitive science, artificial intelligence, robotics and the arts.

Associated Researchers

Lara Harris

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Technician supporting a team of research fellows on two large EPSRC-funded projects, Making Sense of Sounds and S3A, primarily responsible for gathering perceptual data in listening experiments. Lara has industry experience in R&D of electronic safety products, aeroacoustics, and automotive infotainment, but has always been most interested in audio technology. She gained a PhD from the Institute of Sound and Vibration Research, University of Southampton, researching the objective and perceptual assessment of bass reproduction accuracy in mix monitors. Lara subsequently contributed to a chapter on this subject in the second edition of the textbook Loudspeakers: For Music Recording and Reproduction (Newell & Holland, 2018, Focal Press).

Iwona Sobieraj

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Qiuqiang Kong

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

PhD student working on non-speech audio processing using deep learning methods.