Towards optimal non-rigid surface tracking

ECCV, 2012

Martin Klaudiny, Adrian Hilton
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
M.Klaudiny@surrey.ac.uk, A.Hilton@surrey.ac.uk

Abstract

This paper addresses the problem of optimal alignment of non-rigid surfaces from multi-view video observations to obtain a temporally consistent representation. Conventional non-rigid surface tracking performs frame-to-frame alignment, which is subject to the accumulation of errors and therefore drifts over time. Recently, non-sequential tracking approaches have been introduced which re-order the input data based on a dissimilarity measure. One or more input sequences are represented as a tree, which reduces the alignment path length. This limits drift and increases robustness to large non-rigid deformations. However, jumps may occur in the aligned mesh sequence where tree branches meet, due to independent error accumulation along each branch. Optimisation of the tree for non-sequential tracking is proposed to minimise the errors in temporal consistency due to both drift and jumps. A novel cluster tree enforces sequential tracking in local segments of the sequence while allowing global non-sequential traversal among these segments. This provides a mechanism to create a tree structure which reduces the number of jumps between branches and limits the length of branches. Comprehensive evaluation is performed on a variety of challenging non-rigid surfaces including faces, cloth and people. This demonstrates that the proposed cluster tree achieves better temporal consistency than the previous sequential and non-sequential tracking approaches. Quantitative ground-truth comparison on a synthetic facial performance shows reduced error with the cluster tree.
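To illustrate how a dissimilarity measure can re-order frames into a tree, the following is a minimal sketch of the minimum spanning tree variant of non-sequential tracking. It is not the cluster tree proposed in the paper. The function names and the toy dissimilarity values are hypothetical, and the sketch assumes a symmetric frame-to-frame dissimilarity matrix has already been computed.

```python
import heapq


def build_tracking_tree(dissimilarity, root=0):
    """Build a minimum spanning tree over frames with Prim's algorithm.

    dissimilarity: symmetric square matrix (list of lists) of
        frame-to-frame dissimilarity values (hypothetical input).
    root: index of the reference frame where tracking starts.
    Returns parent, where parent[i] is the frame from which frame i
    would be aligned (None for the root).
    """
    n = len(dissimilarity)
    parent = [None] * n
    in_tree = [False] * n
    # Heap entries: (edge cost, frame to add, frame it is tracked from).
    heap = [(0.0, root, None)]
    while heap:
        _, frame, source = heapq.heappop(heap)
        if in_tree[frame]:
            continue
        in_tree[frame] = True
        parent[frame] = source
        for other in range(n):
            if not in_tree[other]:
                heapq.heappush(heap, (dissimilarity[frame][other], other, frame))
    return parent


def path_length(parent, frame):
    """Number of alignment steps from the root to the given frame."""
    steps = 0
    while parent[frame] is not None:
        frame = parent[frame]
        steps += 1
    return steps


# Toy example: frame 3 resembles frame 0 (e.g. the surface returns
# to a similar shape), so it is tracked directly from the root.
D = [
    [0.0, 1.0, 4.0, 0.5],
    [1.0, 0.0, 1.0, 4.0],
    [4.0, 1.0, 0.0, 2.0],
    [0.5, 4.0, 2.0, 0.0],
]
parent = build_tracking_tree(D, root=0)
```

In this toy case, sequential tracking would reach frame 3 after three alignment steps, whereas the tree reaches it in one. Frames 2 and 3 are adjacent in time but lie on different branches, which is where the jumps described above can appear.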

Keywords

dense motion capture, non-rigid surface alignment, non-sequential tracking, minimum spanning tree, cluster tree, dissimilarity

Materials

Main paper: PDF
Supplementary text: PDF
Videos: high quality (280 MB), low quality (45 MB), video spotlight (1.3 MB)

Videos are in AVI format, encoded with the H.264 codec from the ffmpeg package.

Acknowledgements

We would like to thank Thabo Beeler from the Computer Graphics Laboratory at ETH Zurich for providing the DisneyFace dataset and their results (http://graphics.ethz.ch/publications/papers/paperBee11.php).
This work was partly supported by the EU ICT project SCENE and the EPSRC Visual Media Platform Grant.