MV3DHumans Dataset

Yue Zhang, Akin Caliskan, Adrian Hilton and Jean-Yves Guillemaut

Centre for Vision, Speech and Signal Processing, University of Surrey

Dataset Example


To overcome the shortage of real-world multi-view multiple-people datasets, we introduce a new synthetic dataset named Multi-View 3D Humans (MV3DHumans). MV3DHumans is a large-scale synthetic image dataset generated for multi-view multiple-people detection, labelling and segmentation tasks. It contains 1200 scenes, each with 4, 6, 8 or 10 people, captured by 16 cameras with overlapping fields of view. The dataset provides RGB images at a resolution of 640 × 480, along with ground-truth annotations including bounding boxes, instance masks and multi-view correspondences, as well as camera calibration parameters. For further details, please refer to the README.txt file enclosed with the dataset.
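Since the dataset ships camera calibration parameters for each of the 16 views, a common use is projecting 3D points into each camera image. The exact calibration file format is described in the enclosed README.txt; the sketch below assumes a standard pinhole model with an intrinsic matrix K and extrinsic rotation R and translation t (the numeric values here are illustrative, not taken from the dataset).

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X into pixel coordinates
    using a pinhole camera model: x = K (R X + t)."""
    x_cam = R @ X + t                 # transform into camera coordinates
    uv = K @ x_cam                    # homogeneous image coordinates
    return uv[:2] / uv[2]             # perspective division -> (u, v)

# Illustrative calibration for a 640 x 480 image (not actual dataset values):
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                         # camera aligned with world axes
t = np.array([0.0, 0.0, 5.0])         # camera 5 units behind the scene

u, v = project_point(K, R, t, np.array([0.5, -0.2, 0.0]))
print(u, v)  # → 370.0 220.0, inside the 640 x 480 image
```

Repeating this projection for all 16 cameras is one way to relate the provided multi-view correspondence annotations to pixel locations in each view.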


The dataset is freely available under the following terms and conditions:

  1. All original images and associated data provided may be used for non-commercial research purposes only.
  2. The source of the datasets must be acknowledged in all publications where they are used. This should be done by referencing all of the following:
  3. The data may not be redistributed.


To access, download and/or use the data, you must agree to the terms and conditions of the Licence Agreement stated above. If you agree to them, please click here to download the dataset.