In this paper, we propose an approach to indoor scene understanding from observation of people in single-view spherical video. Our approach takes as input a centrally located spherical video capture of an indoor scene and estimates the 3D localisation of human actions performed throughout the long-term capture. The central contribution of this work is a deep convolutional encoder-decoder network, trained on a synthetic dataset, that reconstructs regions of affordance from captured human activity. The predicted affordance segmentation is then integrated into 3D space to compose a reconstruction of the complete 3D scene. The mapping learnt between human activity and affordance segmentation demonstrates that omnidirectional observation of human activity can be applied to scene understanding tasks such as 3D reconstruction. We show that our approach, using only observation of people, performs well against previous approaches, allowing reconstruction of occluded regions and labelling of scene affordances.


Human-Centric Scene Understanding from Single View 360 Video
Sam Fowler, Hansung Kim and Adrian Hilton
3DV 2018


SegNet Model


			author = {Fowler, S. and Kim, H. and Hilton, A.},
			title = {Human-Centric Scene Understanding from Single View 360 Video},
			booktitle = {3DV},
			year = {2018}


This work was supported by the EPSRC Programme Grant S3A: Future Spatial Audio for an Immersive Listener Experience at Home (EP/L000539/1) and the BBC as part of the BBC Audio Research Partnership.

Related Work by Sam Fowler

Towards Complete Scene Reconstruction from Single-View Depth and Human Motion, BMVC 2017
Affordance Surface Segmentation from Video of Human Activity in Indoor Scenes