We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the "Romeo and Juliet" drama captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides content well suited to object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues, two-person conversations, and interactions with considerable movement and occlusion, yielding 30 sequences captured from a total of 22 different points of view and two 16-element microphone arrays. Additionally, we provide voice activity labels, 2D face bounding boxes for each camera view, 2D pose detection keypoints, 3D tracking data of the actors' mouths, and dialogue transcriptions.
The scenes were captured at the Centre for Vision, Speech & Signal Processing (CVSSP) of the University of Surrey (UK) with the aid of two twin Audio-Visual Array (AVA) Rigs. Each AVA Rig is a custom device consisting of a 16-element microphone array and 11 cameras fixed on a flat perspex sheet. For more information, please refer to the paper (see below) or contact the authors.
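As a quick sanity check on the figures above, the following minimal Python sketch derives the capture totals from the per-rig numbers. The class and variable names are hypothetical and chosen only for illustration, and the stream count assumes that every sequence was recorded by all cameras on both rigs, which the description suggests but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvaRig:
    """Hypothetical model of one Audio-Visual Array (AVA) rig."""
    cameras: int = 11      # cameras fixed on the perspex sheet
    microphones: int = 16  # elements in the microphone array


NUM_SEQUENCES = 30           # sequences in the dataset
rigs = (AvaRig(), AvaRig())  # the two twin rigs

# Totals across both rigs.
total_views = sum(rig.cameras for rig in rigs)
total_mics = sum(rig.microphones for rig in rigs)

# Assumption: every sequence is captured by every camera.
video_streams = NUM_SEQUENCES * total_views

print(f"views: {total_views}, microphones: {total_mics}, "
      f"video streams: {video_streams}")
```

Under that assumption, the two rigs account for the 22 points of view and 32 microphone channels in total, and the dataset would contain 660 individual video streams.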
This is the author’s version of the paper. It is posted here for your personal use. This paper is published under a Creative Commons Attribution (CC-BY) license. The definitive version was published in the ACM Digital Library, https://doi.org/10.1145/3565516.3565522
Tragic Talkers paper
The datasets are free for research use only.
This agreement must be confirmed by a senior representative of your organisation. To access and use this data you agree to the following conditions:
The copyright of the TragicTalkers dataset is owned by The Centre for Vision Speech and Signal Processing, University of Surrey, UK. The data should not be redistributed. Permission is hereby granted to use the TragicTalkers dataset for academic purposes only, provided that it is referenced in publications related to its use as follows:
D. Berghi, M. Volino and P. J. B. Jackson, "Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research," European Conference on Visual Media Production (CVMP), 2022, doi: 10.1145/3565516.3565522.
@inproceedings{Berghi:2022:TragicTalkers,
  AUTHOR = "Berghi, Davide and Volino, Marco and Jackson, Philip J. B.",
  TITLE = "Tragic {T}alkers: A {S}hakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research",
  BOOKTITLE = "European Conference on Visual Media Production (CVMP)",
  PUBLISHER = "Association for Computing Machinery",
  YEAR = "2022",
  DOI = "10.1145/3565516.3565522"
}