The NAVVS dataset contains a variety of short volumetric sounding actions. It provides a valuable resource for multimodal research and testing under realistic conditions. The dataset covers ten different actions, designed for both semantic and acoustic diversity. For each action, four 2-second takes are available, giving a total of forty audio-visual clips.
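As a quick sanity check of the clip count described above, the following minimal sketch enumerates every action/take pair. The identifiers of the form `action01_take1` are placeholders invented for illustration; the real action names and file naming used in the download are not specified here and may differ.

```python
from itertools import product

NUM_ACTIONS = 10       # ten actions, as stated in the description
TAKES_PER_ACTION = 4   # four takes per action
TAKE_SECONDS = 2       # each take lasts 2 seconds


def clip_ids():
    """Return placeholder identifiers for every (action, take) pair.

    The naming scheme is hypothetical, not the dataset's actual layout.
    """
    actions = range(1, NUM_ACTIONS + 1)
    takes = range(1, TAKES_PER_ACTION + 1)
    return [f"action{a:02d}_take{t}" for a, t in product(actions, takes)]


clips = clip_ids()
total_seconds = len(clips) * TAKE_SECONDS

print(len(clips), "clips,", total_seconds, "seconds of material in total")
```

Comparing such a list against the files actually received is one simple way to confirm that a download is complete, once the placeholders are replaced with the real names.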
The scenes were captured at the Centre for Vision, Speech & Signal Processing (CVSSP) of the University of Surrey (UK) using multiple cameras and multiple microphones. For more information, please refer to the paper (see below) or contact the authors. Along with the volumetric textured instances and the stereo audio mix of the final clips, additional data is provided: the separate microphone audio channels, raw images from the 16 UHD cameras, binary masks, camera calibration data, coarse visual hull reconstruction, and volumetric stereo refinement.
Note that the dataset requires registration; please check the licence information below.
If you have already received the username and password, you can download the NAVVS dataset from this link.
The dataset is free for research use only.
This agreement must be confirmed by a senior representative of your organisation. To access and use this data you agree to the following conditions:
The copyright of the NAVVS dataset is owned by the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. The data must not be redistributed. Permission is hereby granted to use the NAVVS dataset for academic purposes only, provided that it is referenced in publications related to its use as follows:
H. Stenzel, D. Berghi, M. Volino and P.J.B. Jackson, "Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction," 2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 2021, pp. 637-638, doi: 10.1109/VRW52623.2021.00201.
@inproceedings{Stenzel:IEEEVR:2021,
  AUTHOR = "Stenzel, Hanne and Berghi, Davide and Volino, Marco and Jackson, Philip J.B.",
  TITLE = "Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction",
  BOOKTITLE = "2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)",
  YEAR = "2021",
  PAGES = "637--638",
  DOI = "10.1109/VRW52623.2021.00201"
}