Abstract

This paper introduces Deep4D, a compact generative representation of shape and appearance learnt from captured 4D volumetric video sequences of people. 4D volumetric video achieves highly realistic reproduction, replay and free-viewpoint rendering of actor performance from multiple-view video acquisition systems. A deep generative network is trained on 4D video sequences of an actor performing multiple motions to learn a generative model of the dynamic shape and appearance. We demonstrate that the proposed generative model provides a compact encoded representation capable of high-quality synthesis of 4D volumetric video with two orders of magnitude compression. A variational encoder-decoder network is employed to learn an encoded latent space that maps from 3D skeletal pose to 4D shape and appearance. This enables high-quality 4D volumetric video synthesis to be driven by skeletal motion, including skeletal motion capture data. The encoded latent space supports the representation of multiple sequences with dynamic interpolation to transition between motions. We therefore introduce Deep4D motion graphs, a direct application of the proposed generative representation. Deep4D motion graphs allow real-time interactive character animation whilst preserving the plausible realism of movement and appearance from the captured volumetric video. Deep4D motion graphs implicitly combine multiple captured motions into a unified representation for character animation from volumetric video, allowing novel character movements to be generated with dynamic shape and appearance detail.
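To make the pose-conditioned variational encoder-decoder idea concrete, the following is a minimal sketch of that kind of network, assuming a PyTorch implementation. The class name (PoseToShapeAppearanceVAE), layer sizes, joint count, vertex count and texture resolution are all illustrative assumptions, not the architecture used in the paper.

# Minimal sketch (illustrative, not the authors' implementation):
# a variational encoder-decoder mapping 3D skeletal pose to a compact
# latent code, decoded to per-frame mesh vertices and a texture map.
import torch
import torch.nn as nn

class PoseToShapeAppearanceVAE(nn.Module):
    def __init__(self, num_joints=25, latent_dim=32,
                 num_vertices=5000, tex_res=64):
        super().__init__()
        pose_dim = num_joints * 3  # flattened 3D joint positions
        # Encoder: skeletal pose -> parameters of a Gaussian latent
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        # Decoder: latent code -> mesh vertices and texture map
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
        )
        self.to_vertices = nn.Linear(512, num_vertices * 3)
        self.to_texture = nn.Linear(512, 3 * tex_res * tex_res)
        self.num_vertices = num_vertices
        self.tex_res = tex_res

    def forward(self, pose):
        h = self.encoder(pose.flatten(1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterisation trick: sample z ~ N(mu, sigma^2)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        d = self.decoder(z)
        vertices = self.to_vertices(d).view(-1, self.num_vertices, 3)
        texture = self.to_texture(d).view(-1, 3, self.tex_res, self.tex_res)
        # Training would combine a reconstruction loss on vertices/texture
        # with a KL divergence term on (mu, logvar).
        return vertices, texture, mu, logvar

# Usage: one frame of 25 3D joint positions -> shape and appearance
model = PoseToShapeAppearanceVAE()
pose = torch.randn(1, 25, 3)
verts, tex, mu, logvar = model(pose)

Because the decoder is driven only by the compact latent code, the same structure also suggests how multiple captured motions can share one latent space, with interpolation in that space providing transitions between motions as in the Deep4D motion graphs described above.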

Paper

Deep4D: A Compact Generative Representation for Volumetric Video
João Regateiro, Marco Volino, Adrian Hilton

Frontiers in Virtual Reality 2021



Citation

@ARTICLE{Deep4D,
	AUTHOR={Regateiro, João and Volino, Marco and Hilton, Adrian},
	TITLE={Deep4D: A Compact Generative Representation for Volumetric Video},
	JOURNAL={Frontiers in Virtual Reality},
	VOLUME={2},
	PAGES={132},
	YEAR={2021},
	URL={https://www.frontiersin.org/article/10.3389/frvir.2021.739010},
	DOI={10.3389/frvir.2021.739010},
	ISSN={2673-4192},
}

Acknowledgments

This research was supported by the EPSRC “Audio-Visual Media Research Platform Grant” (EP/P022529/1), “Polymersive: Immersive Video Production Tools for Studio and Live Events” (InnovateUK 105168), and the UKRI EPSRC grant “AI4ME: AI for Personalised Media Experiences” (EP/V038087/1). The authors would also like to thank Adnane Boukhayma for providing the “Thomas” dataset used for evaluation. The work presented was undertaken at CVSSP.