We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency, semantic and motion information. We further leverage recent advances in 3D pose estimation to constrain the joint semantic instance segmentation and 4D temporally coherent reconstruction. This enables per person semantic instance segmentation of multiple interacting people in complex dynamic scenes. Extensive evaluation of the joint visual scene understanding framework against state-of-the-art methods on challenging indoor and outdoor sequences demonstrates a significant (≈40%) improvement in semantic segmentation, reconstruction and scene flow accuracy.
U4D: Unsupervised 4D Dynamic Scene Understanding
Armin Mustafa, Chris Russell and Adrian Hilton
ICCV 2019
@INPROCEEDINGS{Mustafa19,
  title = {U4D: Unsupervised 4D Dynamic Scene Understanding},
  year = {2019},
  booktitle = {ICCV},
  author = {Mustafa, A. and Russell, C. and Hilton, A.}
}