Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360 Images

Abstract

Recent progress in Virtual Reality (VR) and Augmented Reality (AR) allows us to experience various VR/AR applications in our daily lives. To maximise user immersion in VR/AR environments, plausible spatial audio reproduction synchronised with visual information is essential. In this paper, we propose a simple and efficient system that estimates room acoustics for plausible spatial audio reproduction using 360 cameras for VR/AR applications. A pair of 360 images is used to estimate room geometry and acoustic properties. A simplified 3D geometric model of the scene is estimated by depth estimation from the captured images and semantic labelling using a convolutional neural network (CNN). The acoustics of the real environment are characterised by frequency-dependent acoustic predictions for the scene. Spatially synchronised audio is then reproduced based on the estimated geometric and acoustic properties of the scene. The reconstructed scenes are rendered with the synthesised spatial audio as VR/AR content. The estimated room geometry and simulated spatial audio are evaluated against actual measurements and against audio calculated from ground-truth Room Impulse Responses (RIRs) recorded in the rooms.
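The abstract does not detail the acoustic model. As a rough, hedged illustration of how an estimated room geometry plus per-surface, frequency-dependent absorption coefficients (for example, assigned from semantic labels) can yield frequency-dependent acoustic predictions, the sketch below applies Sabine's textbook reverberation-time formula, T60 = 0.161 V / A, to a cuboid room. This is a swapped-in classic estimator, not the authors' method. The cuboid assumption, the surface names, the octave bands and the function name `sabine_rt60` are all hypothetical choices for illustration.

```python
# Hypothetical illustration only: Sabine reverberation time per octave band
# for a cuboid room. This is NOT the paper's acoustic model; it simply shows
# how geometry + frequency-dependent absorption gives frequency-dependent output.

BANDS_HZ = [125, 250, 500, 1000, 2000, 4000]  # assumed octave bands

SURFACES = ["floor", "ceiling", "wall_front", "wall_back", "wall_left", "wall_right"]


def sabine_rt60(width, depth, height, absorption):
    """Return a list of RT60 values (seconds), one per band in BANDS_HZ.

    width, depth, height: cuboid dimensions in metres.
    absorption: dict mapping each surface name in SURFACES to a list of
    absorption coefficients (0..1), one per band.
    """
    volume = width * depth * height
    areas = {
        "floor": width * depth,
        "ceiling": width * depth,
        "wall_front": width * height,
        "wall_back": width * height,
        "wall_left": depth * height,
        "wall_right": depth * height,
    }
    rt60 = []
    for band in range(len(BANDS_HZ)):
        # Equivalent absorption area A = sum over surfaces of S_i * alpha_i (m^2)
        total_absorption = sum(areas[s] * absorption[s][band] for s in SURFACES)
        rt60.append(0.161 * volume / total_absorption)
    return rt60


if __name__ == "__main__":
    # Placeholder coefficients, not measured material data.
    uniform = {s: [0.1] * len(BANDS_HZ) for s in SURFACES}
    for f, t in zip(BANDS_HZ, sabine_rt60(5.0, 4.0, 3.0, uniform)):
        print(f"{f} Hz: {t:.2f} s")
```

Sabine's formula ignores source/listener positions and early reflections, so a full spatial audio renderer would need a richer model (e.g. geometric or wave-based simulation); the sketch only shows the dependence of reverberation on room volume and surface absorption.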

Paper

Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360 Images
Hansung Kim, Luca Remaggi, Philip J.B. Jackson and Adrian Hilton
IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR) 2019

Citation

    @inproceedings{Kim:IEEEVR:2019,
        AUTHOR = "Hansung Kim and Luca Remaggi and Philip J.B. Jackson and Adrian Hilton",
        TITLE = "Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360 Images",
        BOOKTITLE = "IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR)",
        YEAR = "2019",
    }

Data

Public datasets used in this paper can be found in the S3A Audio-Visual Scene Analysis datasets and resources.

Acknowledgments

This work was supported by the EPSRC Programme Grant S3A: Future Spatial Audio for an Immersive Listener Experience at Home (EP/L000539/1) and the BBC as part of the BBC Audio Research Partnership. Details about the data underlying this work, along with the terms for data access, are available from: http://dx.doi.org/10.15126/surreydata.00812228