Updated: 18 November 2015
Contact people: Teo de Campos and Qingju Liu
This page includes data that enables the reproduction of results of the papers below, and it is indexed with DOI 10.15126/surreydata.00807708
Before downloading any material from this page, please ensure that you have filled in the registration form and agreed to its terms.
Dataset
VML BRIR recordings
VML Kinect and binaural recordings
- Kinect2 XEF data file (requires Kinect for Windows SDK v2.0)
- JSON file containing skeleton tracking results obtained by Kinect SDK v2.0
- Raw RGB and Depth image sequences extracted from the XEF file
- Kinect SDK v2.0 skeletal tracking results in Matlab and JSON formats.
- Raw head tracking results from Kinect2 on sequence 1.
- Left channel and right channel in binaural recordings.
- Kinect2 XEF data file (requires Kinect for Windows SDK v2.0)
- JSON file containing skeleton tracking results obtained by Kinect SDK v2.0
- Raw RGB and Depth image sequences extracted from the XEF file
- Kinect SDK v2.0 skeletal tracking results in Matlab and JSON formats.
- Raw head tracking results from Kinect2 on sequence 2.
- Left channel and right channel in binaural recordings.
- Kinect2 XEF data file (requires Kinect for Windows SDK v2.0)
- JSON file containing skeleton tracking results obtained by Kinect SDK v2.0
- Raw RGB and Depth image sequences extracted from the XEF file
- Full ground truth annotation data
- Kinect SDK v2.0 skeletal tracking results in Matlab and JSON formats.
- Raw head tracking results from Kinect2 on sequence 3.
- Left channel and right channel in binaural recordings.
- Kinect2 XEF data file (requires Kinect for Windows SDK v2.0)
- JSON file containing skeleton tracking results obtained by Kinect SDK v2.0
- Raw RGB and Depth image sequences extracted from the XEF file
- Kinect SDK v2.0 skeletal tracking results in Matlab and JSON formats.
- Raw head tracking results from Kinect2 on sequence 4.
- Left channel and right channel in binaural recordings.
VisLab Kinect recordings
- Kinect2 XEF data file (requires Kinect for Windows SDK v2.0)
- Kinect SDK v2.0 skeletal tracking results in Matlab and JSON formats.
- Raw head tracking results from Kinect2 on sequence 2.
A total of 24 Genelec 1032B loudspeakers were positioned in a circle around the scene centre, with a 1 m radius and 15 degrees apart. The Cortex MK2 binaural microphone stands at the centre of the room with an ear height of 165 cm. BRIRs were recorded at 48 kHz using the log sine sweep method.
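The log sine sweep measurement mentioned above can be sketched as follows. This is a minimal illustration of the exponential-sweep deconvolution technique (Farina's method), not the authors' actual measurement code; the sweep duration and frequency range here are assumed demo values. In a real measurement, the recorded microphone signal, rather than the sweep itself, would be convolved with the inverse filter to obtain the impulse response.

```python
import numpy as np

fs = 48000          # sample rate, matching the 48 kHz BRIR recordings
T = 2.0             # sweep duration in seconds (assumed demo value)
f1, f2 = 20.0, 20000.0
R = np.log(f2 / f1)

# Exponential (log) sine sweep.
t = np.arange(int(T * fs)) / fs
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1.0))

# Inverse filter: time-reversed sweep with an amplitude envelope that
# compensates the sweep's pink (1/f) energy distribution.
inv = sweep[::-1] * np.exp(-t * R / T)

# Deconvolve via FFT convolution. Convolving the sweep with its own
# inverse filter yields an approximate delta at lag zero, i.e. near
# sample len(sweep) - 1 of the full convolution.
n = 2 * len(sweep) - 1
ir = np.fft.irfft(np.fft.rfft(sweep, n) * np.fft.rfft(inv, n), n)
peak = int(np.argmax(np.abs(ir)))
```

The key property of the exponential sweep is that harmonic distortion products separate out before the main impulse response, which is why it is preferred over white-noise or linear-sweep excitation for room measurements.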
Data:
We recorded a dataset for spatial audio production in living room conditions, where each speech signal is an audio object. The data were recorded at a TV/film studio built following professional media production standards. The room contains furniture and measures 244 x 396 x 242 cm, very similar to a typical living room. As with typical TV/film production sets, its ceiling and one of the walls are missing, though the set is assembled inside a larger room. The reverberation time of the room is about 430 ms. The binaural microphone, a Cortex MK2 head and torso simulator (HATS), is located in the centre of the room with an ear height of 165 cm. The depth sensor, a Kinect2, faces the Cortex MK2 at a distance of 329 cm and a height of 170 cm, just outside the open side of the recording room so as to get a full view.
Four sequences were recorded, lasting about 10 minutes in total and involving two actors, denoted Actor and Actress, with heights of about 190 cm and 160 cm respectively. The audio materials were randomly selected TIMIT sentences. The VML setup is illustrated in the following figure.
Sequence 1 (VML_male_circle)
In Sequence 1, Actor started at the position highlighted by the small circle, facing the centre (i.e. the HATS), and walked slowly, sideways, anti-clockwise along the circular trajectory while reading the audio materials. After completing one circle, he returned clockwise to the starting point.
Below is an RGB colour image from Sequence 1.
Data:
Sequence 2 (VML_female_circle)
In Sequence 2, Actress repeated the process of Sequence 1 at a faster pace.
Below is an RGB colour image from Sequence 2.
Data:
Sequence 3 (VML_MF_rectangle)
In Sequence 3, Actor walked back and forth along the L-shaped path highlighted with the single solid line, while Actress walked along the L-shaped path highlighted with the double solid line.
Below is a pair of an RGB colour image and the associated depth image from Sequence 3.
Sequence 4 (VML_MF_lines)
In Sequence 4, Actor walked along the single dashed line and Actress along the double dashed line, each at their preferred pace, both facing the centre of the room (where the HATS was located) while reading concurrently.
Below is an RGB colour image from Sequence 4.
Data:
Sequence 1 (VisLab_random)
Below is a pair of an RGB colour image and the associated depth image from Sequence 1.
Head Tracking Data Format
Data description for the above two sequences, both in Matlab format. Each mat file contains two parts:
1. allTimeStamp: time stamp for each depth frame, with or without a person being tracked
2. headPoses: head tracking results; each row represents one frame in the following format: [time_stamp person_ID good_head x y z Qr Qx Qy Qz]
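Given the row layout above, the headPoses array can be split into per-person tracks. The sketch below is an assumption-labelled illustration, not part of the dataset tooling: it parses a synthetic N x 10 array with the documented column order, keeping only frames flagged as good. Real data would be loaded first, e.g. with `scipy.io.loadmat(...)['headPoses']`.

```python
import numpy as np

# Column layout per row of headPoses (from the description above):
# [time_stamp person_ID good_head x y z Qr Qx Qy Qz]

def split_head_poses(head_poses):
    """Return a per-person dict of (timestamps, positions, quaternions),
    keeping only frames where good_head == 1."""
    out = {}
    good = head_poses[head_poses[:, 2] == 1]
    for pid in np.unique(good[:, 1]).astype(int):
        rows = good[good[:, 1] == pid]
        out[pid] = (rows[:, 0], rows[:, 3:6], rows[:, 6:10])
    return out

# Synthetic three-row example standing in for an actual mat file.
demo = np.array([
    [0.00, 1, 1,  0.1, 1.65, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.04, 1, 0,  0.0, 0.00, 0.0, 1.0, 0.0, 0.0, 0.0],  # bad frame, dropped
    [0.04, 2, 1, -0.2, 1.60, 2.1, 1.0, 0.0, 0.0, 0.0],
])
tracks = split_head_poses(demo)
```

The quaternion columns (Qr, Qx, Qy, Qz) give head orientation and can be converted to Euler angles or rotation matrices with standard libraries if needed.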
Related publications
Person tracking using audio and depth cues
PDF
The four sequences in the VML Kinect and binaural recordings were used.
Bibtex entry
@InProceedings{liu_etal_iccvw2015,
  author    = {Qingju Liu and Teofilo {de Campos} and Wenwu Wang and Philip Jackson and Adrian Hilton},
  title     = {Person tracking using audio and depth cues},
  booktitle = {Proceedings of the {ICCV} Workshop on {3D} Reconstruction and Understanding with Video and Sound},
  year      = {2015},
  address   = {Santiago, Chile},
  month     = {December 17},
  publisher = {{IEEE}},
  url       = {www.vis.uky.edu/~song/AVS},
  doi       = {10.1109/ICCVW.2015.97}
}
Demos
Single-speaker tracking
Two-speaker tracking
Identity Association using PHD Filters in Multiple Head Tracking with Depth Sensors
PDF
Sequence 3 (VML_MF_rectangle) in the VML recordings and Sequence 1 (VisLab_random) in the VisLab Kinect recordings were used.
Source code
Bibtex entry
@InProceedings{liu_etal_icassp2016,
  author    = {Qingju Liu and Teofilo {de Campos} and Wenwu Wang and Adrian Hilton},
  title     = {Identity Association using PHD Filters in Multiple Head Tracking with Depth Sensors},
  booktitle = {Proceedings of the {ICASSP}},
  year      = {2016},
  address   = {Shanghai, China},
  month     = {20-25 March},
  publisher = {{IEEE}},
  doi       = {10.1109/ICASSP.2016.7471928},
  url       = {http://www.icassp2016.org/Papers/ViewPapers_MS.asp?PaperNum=3519}
}
Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion
All VML sequences were used.
Bibtex entry
@Article{liu_etal_journal2017,
  author  = {Qingju Liu and Wenwu Wang and Teofilo {de Campos} and Philip Jackson and Adrian Hilton},
  title   = {Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion},
  journal = {Preprint under review},
  year    = {2017}
}