Updated: 7 June 2018
Contact people: Hansung Kim and Luca Remaggi
This web page distributes the Audio-Visual Scene Analysis datasets created in the EPSRC S3A project.
The page includes the data needed to reproduce the results of the papers below, and it is indexed with DOI 10.15126/surreydata.00812228.
The source of the datasets should be acknowledged in all publications in which they are used, by referencing papers [2] and [3] and this web site:
The Listening Room is a controlled listening environment, approximately 5.6m x 5.0m x 2.9m, with loudspeakers surrounding a central listening position.
Data:
Input 360 camera image - Top
Input 360 camera image - Bottom
Estimated Depth Map
Object recognition result
Reconstructed 3D geometry model with [3] (OBJ format)
Reconstructed 3D geometry model with [4] (OBJ format)
Recorded room impulse responses (SOFA format)
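The reconstructed room models are plain Wavefront OBJ files, so any standard 3D tool can load them. As an illustration, a minimal pure-Python parser for the vertex and face records (the file contents below are a made-up example, not data from this dataset; real room models may also contain normals, texture coordinates, and materials):

```python
# Minimal Wavefront OBJ parser: extracts vertices ("v" records) and
# faces ("f" records, which use 1-based vertex indices).
def parse_obj(text):
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":        # vertex: v x y z
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":      # face: f i j k ... (may be i/t/n triples)
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces

# Tiny hand-written example (not dataset data): a single triangle.
sample = """v 0 0 0
v 1 0 0
v 0 1 0
f 1 2 3"""
verts, faces = parse_obj(sample)
print(len(verts), faces)   # 3 [(0, 1, 2)]
```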
Rendering Results - Music
Original music recorded in an anechoic chamber
Music rendered with the ground-truth RIR
Music rendered with the RIR synthesized by [3]
Music rendered with the RIR synthesized by [4]
Rendering Results - Speech
Original speech recorded in an anechoic chamber
Speech rendered with the ground-truth RIR
Speech rendered with the RIR synthesized by [3]
Speech rendered with the RIR synthesized by [4]
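Rendering a dry (anechoic) recording in a room amounts to convolving it with the room impulse response. A minimal sketch with NumPy (the signals below are synthetic placeholders, not the dataset files):

```python
import numpy as np

def render_with_rir(dry, rir):
    """Convolve an anechoic signal with a room impulse response.

    The output has length len(dry) + len(rir) - 1, i.e. the dry
    signal extended by the reverberation tail.
    """
    return np.convolve(dry, rir)

# Synthetic placeholders: a short "dry" signal and a decaying RIR.
dry = np.array([1.0, 0.5, 0.25])
rir = np.array([1.0, 0.3, 0.1, 0.05])   # direct path + early decay
wet = render_with_rir(dry, rir)
print(len(wet))   # 6
```

For the real data, the dry signal would be one of the anechoic recordings above and the RIR one channel read from the SOFA files; the same convolution applies per channel.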
The Usability Lab is representative of a typical domestic living room, approximately 5.6m x 5.2m x 2.9m.
Data:
Input 360 camera image - Top
Input 360 camera image - Bottom
Estimated Depth Map
Object recognition result
Reconstructed 3D geometry model with [3] (OBJ format)
Reconstructed 3D geometry model with [4] (OBJ format)
Recorded room impulse responses (SOFA format)
Rendering Results - Music
Original music recorded in an anechoic chamber
Music rendered with the ground-truth RIR
Music rendered with the RIR synthesized by [3]
Music rendered with the RIR synthesized by [4]
Rendering Results - Speech
Original speech recorded in an anechoic chamber
Speech rendered with the ground-truth RIR
Speech rendered with the RIR synthesized by [3]
Speech rendered with the RIR synthesized by [4]
The Meeting Room is another room representative of a typical domestic living room, approximately 5.6m x 4.3m x 2.3m.
Data:
Input 360 camera image - Top
Input 360 camera image - Bottom
Estimated Depth Map
Object recognition result
Reconstructed 3D geometry model with [3] (OBJ format)
Reconstructed 3D geometry model with [4] (OBJ format)
Recorded room impulse responses (SOFA format)
Rendering Results - Music
Original music recorded in an anechoic chamber
Music rendered with the ground-truth RIR
Music rendered with the RIR synthesized by [3]
Music rendered with the RIR synthesized by [4]
Rendering Results - Speech
Original speech recorded in an anechoic chamber
Speech rendered with the ground-truth RIR
Speech rendered with the RIR synthesized by [3]
Speech rendered with the RIR synthesized by [4]
The Meeting Room is another room representative of a typical domestic living room, approximately 5.6m x 4.3m x 2.3m.
Data:
Input 360 camera image - Top
Input 360 camera image - Bottom
Estimated Depth Map
Object recognition result
Reconstructed 3D geometry model with [4] (OBJ format)
Recorded room impulse responses (SOFA format)
Rendering Results - Music
Original music recorded in an anechoic chamber
Music rendered with the ground-truth RIR
Music rendered with the RIR synthesized by [3]
Music rendered with the RIR synthesized by [4]
Rendering Results - Speech
Original speech recorded in an anechoic chamber
Speech rendered with the ground-truth RIR
Speech rendered with the RIR synthesized by [3]
Speech rendered with the RIR synthesized by [4]
Depth estimation code (requires OpenCV to compile)
Room Model Reconstruction for [3] (Windows Executable and sample data)
Full Reconstruction pipeline for [4] (Windows Executable and Samples only)
Full Reconstruction pipeline for [4] (Full code included)
* A simple user-interactive version of the room model reconstruction, which does not require the SegNet setup, will be released in May.
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
EPSRC S3A
To contact us: Dr. Hansung Kim (h.kim@surrey.ac.uk)