SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation

This dataset is a part of the following publication:

"SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation"
Moira Shooter, Charles Malleson, Adrian Hilton
International Journal of Computer Vision (2023). https://doi.org/10.1007/s11263-023-01946-z

Paper - Code

Data description

Each version of the dataset includes 500 synthetic dog videos of 175 frames each (87,500 frames in total), together with 2D ground truth: bounding box coordinates, 33 keypoint labels and segmentation maps. There are 6 versions of the dataset: clean_plate, w_assets, w_assetsPlusPeople, w_people, wo_fur_clean_plate, wo_groundplane. For example, the clean_plate version includes images with an HDRI and a ground geometry that represents the floor. No distractors, such as 3D assets or people, are present in the background.
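
As a quick sanity check on the sizes above, the following sketch lists the six version names and recomputes the frame counts. The constant and function names are illustrative and are not part of the released scripts; the total across all versions simply assumes that every version has the same 500 x 175 layout, as stated above.

```python
# Version names as listed in the data description.
VERSIONS = [
    "clean_plate",
    "w_assets",
    "w_assetsPlusPeople",
    "w_people",
    "wo_fur_clean_plate",
    "wo_groundplane",
]

VIDEOS_PER_VERSION = 500  # synthetic dog videos per version
FRAMES_PER_VIDEO = 175    # frames per video


def frames_per_version() -> int:
    """Number of frames in a single version of the dataset."""
    return VIDEOS_PER_VERSION * FRAMES_PER_VIDEO


def frames_all_versions() -> int:
    """Frames across all versions, assuming each has the same layout."""
    return len(VERSIONS) * frames_per_version()


print(frames_per_version())   # 87500
print(frames_all_versions())  # 525000
```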

Download

All the data can be downloaded by clicking the following link: SyDogVideo2024_full.tar.gz.
You can also download the individual datasets if you are only interested in one of them: Cleanplate2024.tar.gz, Assets2024.tar.gz, People2024.tar.gz, AssetsPeople2024.tar.gz, WoGroundplane2024.tar.gz, WoFurCleanplate2024.tar.gz. The training, validation and test splits (JSON) can be downloaded separately from the following link: SplitAnnotations2024.tar.gz.
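
Once an archive has been downloaded, it can be unpacked with Python's standard tarfile module. This is a minimal sketch: the function name and the destination directory are hypothetical, and the archive is assumed to be an ordinary gzip-compressed tar file, as the .tar.gz extension suggests.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    # Assumes the archive sits in the current working directory.
    archive = Path("Cleanplate2024.tar.gz")
    if archive.exists():
        # "data" is a hypothetical target directory.
        members = extract_archive(str(archive), "data")
        print(f"extracted {len(members)} entries")
```

Only extract archives obtained from the official download links, since extractall writes whatever paths the archive contains.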

The folder structure is described below:

[clean_plate, ..., wo_groundplane]/
These folders contain the video data, e.g. RGB data and labels. The subfolders hold the sequences for each type of dog, e.g. dog2, labrador, etc. Because the dataset was generated using the Unity Perception package, each subfolder (e.g. dog2) has the following structure, which includes the keypoint labels (Dataset[ID]), RGB data (RGB[ID]) and semantic segmentation information (SemanticSegmentation[ID]). These labels were processed using code from the following link and transformed into JSON annotation files. For the JSON annotation files, please refer to the vid_annotations folder.
- dog2/
  - Dataset[ID]
  - Logs
  - RGB[ID]
  - SemanticSegmentation[ID]
- labrador/
- pug/
- pitbull/
- wolf/
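
The layout above can be traversed with pathlib. The sketch below assumes that [ID] stands for an arbitrary suffix, so it matches subfolders by prefix (Dataset*, RGB*, SemanticSegmentation*); the function name and the shape of the returned dictionary are illustrative and are not taken from the released scripts.

```python
from pathlib import Path

# Folder-name prefixes described above; whatever follows the prefix is
# treated as an opaque ID (an assumption about the [ID] placeholder).
PREFIXES = ("Dataset", "RGB", "SemanticSegmentation")


def index_version(version_dir: str) -> dict[str, dict[str, list[Path]]]:
    """Map each dog folder to its Dataset*/RGB*/SemanticSegmentation* subfolders."""
    index: dict[str, dict[str, list[Path]]] = {}
    for dog_dir in sorted(Path(version_dir).iterdir()):
        if not dog_dir.is_dir():
            continue  # skip stray files next to the dog folders
        index[dog_dir.name] = {
            prefix: sorted(p for p in dog_dir.glob(f"{prefix}*") if p.is_dir())
            for prefix in PREFIXES
        }
    return index
```

For example, index_version("clean_plate")["dog2"]["RGB"] would return the RGB folders found under clean_plate/dog2, provided the data has been extracted into the current directory.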
vid_annotations/
Contains the annotations for each video sequence, e.g. video-000-clean_plate-dog2.json
The JSON files are named as follows: video-[videoID]-[dataset_type]-[dog_type].json
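
The naming pattern above can be parsed with a regular expression. This sketch assumes that the video ID is numeric and that neither the dataset type nor the dog type contains a hyphen, which holds for the example file name and the six version names; the function and class names are illustrative. The internal layout of the JSON files is not described here, so the sketch stops at the file name.

```python
import re
from typing import NamedTuple, Optional

# video-[videoID]-[dataset_type]-[dog_type].json
_PATTERN = re.compile(
    r"^video-(?P<video_id>\d+)-(?P<dataset_type>[^-]+)-(?P<dog_type>[^-]+)\.json$"
)


class VideoName(NamedTuple):
    video_id: str      # kept as a string to preserve leading zeros
    dataset_type: str  # e.g. "clean_plate"
    dog_type: str      # e.g. "dog2"


def parse_annotation_name(filename: str) -> Optional[VideoName]:
    """Split an annotation file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return VideoName(**match.groupdict())


print(parse_annotation_name("video-000-clean_plate-dog2.json"))
# VideoName(video_id='000', dataset_type='clean_plate', dog_type='dog2')
```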
split_annotations/within_dataset/
Contains the training/test split files for each dataset, e.g. {[dataset_type]}.json, which represent the sequences used.
scripts
access_data.py: shows how to access the data for training, or just the individual video sequences.

If you use this dataset, please cite:

@article{Shooter2023,
author={Shooter, Moira
and Malleson, Charles
and Hilton, Adrian},
title={SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation},
journal={International Journal of Computer Vision},
year={2023},
month={Dec},
day={29},
abstract={We aim to estimate the pose of dogs from videos using a temporal deep learning model as this can result in more accurate pose predictions when temporary occlusions or substantial movements occur. Generally, deep learning models require a lot of data to perform well. To our knowledge, public pose datasets containing videos of dogs are non existent. To solve this problem, and avoid manually labelling videos as it can take a lot of time, we generated a synthetic dataset containing 500 videos of dogs performing different actions using Unity3D. Diversity is achieved by randomising parameters such as lighting, backgrounds, camera parameters and the dog's appearance and pose. We evaluate the quality of our synthetic dataset by assessing the model's capacity to generalise to real data. Usually, networks trained on synthetic data perform poorly when evaluated on real data, this is due to the domain gap. As there was still a domain gap after improving the quality of the synthetic dataset and inserting diversity, we bridged the domain gap by applying 2 different methods: fine-tuning and using a mixed dataset to train the network. Additionally, we compare the model pre-trained on synthetic data with models pre-trained on a real-world animal pose datasets. We demonstrate that using the synthetic dataset is beneficial for training models with (small) real-world datasets. Furthermore, we show that pre-training the model with the synthetic dataset is the go to choice rather than pre-training on real-world datasets for solving the pose estimation task from videos of dogs.},
issn={1573-1405},
doi={10.1007/s11263-023-01946-z},
url={https://doi.org/10.1007/s11263-023-01946-z}
}

License Agreement

License file is in license.txt

All original images and associated data provided may be used for non-commercial research purposes only.
The source of the datasets must be acknowledged in all publications where they are used.
The data may not be redistributed.

Questions

For any questions about this dataset, please contact Moira Shooter.