High-detail temporally consistent 3D capture of facial performance

PhD thesis

Martin Klaudiny
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
M.Klaudiny@surrey.ac.uk

Abstract

Capturing a realistic digital copy of a facial performance is highly important for film and television production. It allows high-quality replay of the performance under different conditions, such as new illumination or a new viewpoint. The performance model can be altered by space-time editing or used to build and drive a facial animation rig. This thesis presents a novel system for capturing high-detail 4D models of facial performances. A geometric model without appearance is reconstructed from videos of an actor's face recorded from multiple views in a controlled studio environment. The focus is on achieving temporal consistency and a high level of detail in the 4D performance model, both of which are crucial for use in film production.

A baseline method for dense surface tracking in multi-view image sequences is investigated for facial performance capture. Evaluation shows the limitations of previous sequential methods, which provide accurate temporal alignment only for faces with a painted random pattern. A novel robust sequential tracking method is proposed to handle weak skin texture and rapid non-rigid facial motions. However, gradual accumulation of frame-to-frame alignment errors still results in significant drift of the tracked mesh. A non-sequential tracking framework is therefore introduced, which processes an input sequence according to a tree derived from a measure of dissimilarity between all pairs of frames. A novel cluster tree enables a balance between sequential drift and non-sequential jump artefacts. Comprehensive evaluation shows temporally consistent mesh sequences with very little drift for highly dynamic facial performances. Improvements are also demonstrated on whole-body performances and cloth deformation.
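To illustrate the idea of ordering frames by a tree built from pairwise dissimilarity, the following is a minimal sketch that uses a plain minimum spanning tree (Prim's algorithm). This is a swapped-in simplification, not the cluster tree proposed in the thesis. The function names, the integer frame descriptors and the absolute-difference dissimilarity are all hypothetical and chosen only for illustration.

```python
def minimum_spanning_tree(dissimilarity, root=0):
    """Prim's algorithm on a dense symmetric dissimilarity matrix.

    Returns parent[i] for every frame i; the root has parent None.
    """
    n = len(dissimilarity)
    parent = [None] * n
    best = [float("inf")] * n      # cheapest known link into the tree
    in_tree = [False] * n
    best[root] = 0.0
    for _ in range(n):
        # Pick the frame outside the tree with the cheapest link.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dissimilarity[u][v] < best[v]:
                best[v] = dissimilarity[u][v]
                parent[v] = u
    return parent


def traversal_order(parent, root=0):
    """Breadth-first order in which frames would be visited from the root."""
    children = {i: [] for i in range(len(parent))}
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    order, queue = [], [root]
    while queue:
        u = queue.pop(0)
        order.append(u)
        queue.extend(children[u])
    return order


# Hypothetical 1D frame descriptors: frames 0 and 2 look alike, as do 1 and 3.
descriptors = [0, 10, 1, 11]
dissimilarity = [[abs(a - b) for b in descriptors] for a in descriptors]

parent = minimum_spanning_tree(dissimilarity, root=0)
order = traversal_order(parent, root=0)
```

In this toy case the tree links frame 0 directly to frame 2 rather than to its temporal neighbour, frame 1, so tracking would jump between similar frames instead of following time order. A pure minimum spanning tree can produce long branches or abrupt jumps, which is the kind of trade-off the cluster tree is described as balancing.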

Photometric stereo with colour lights is used for capturing pore-level skin detail. An original error analysis of the technique is conducted for image noise and calibration errors. The proposed markerless capture system for facial performances combines photometric stereo with non-sequential surface tracking based on the cluster tree. A practical capture setup is constructed from standard video equipment without active illumination or high-speed recording. Errors in the photometric normals are corrected using the temporally aligned mesh sequence. The resulting 3D models, enhanced by the normal maps, capture fine skin dynamics such as wrinkling. The models also show high temporal consistency, with minimal drift in comparison to previous approaches. Qualitative and quantitative comparison with the best state-of-the-art system shows comparable results.
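As a rough illustration of photometric stereo with colour lights, the sketch below recovers a surface normal from a single RGB pixel. It assumes a Lambertian surface and a known, invertible 3x3 matrix M that maps a unit normal to the observed colour (M would combine light directions, sensor response and surface colour, and is assumed to come from calibration). The matrix values, function names and test pixels are hypothetical, and this is not the thesis pipeline, which additionally corrects the normals using the tracked mesh.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, c):
    """Solve m @ x = c for a 3x3 system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("calibration matrix is singular")
    solution = []
    for k in range(3):
        # Replace column k of m with the right-hand side c.
        mk = [[c[r] if col == k else m[r][col] for col in range(3)]
              for r in range(3)]
        solution.append(det3(mk) / d)
    return solution


def normal_from_rgb(m, rgb):
    """Estimate a unit normal from one RGB pixel, assuming rgb = s * (m @ n)."""
    g = solve3(m, rgb)
    length = math.sqrt(sum(x * x for x in g))
    if length == 0.0:
        raise ValueError("pixel carries no shading information")
    # Normalising removes the unknown brightness scale s.
    return [x / length for x in g]


# Hypothetical calibration: each row is the response of one colour channel.
M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [-1.0, -1.0, 1.0]]

# Pixel generated from the normal (0, 0, 1) at half brightness.
n_front = normal_from_rgb(M, [0.5, 0.5, 0.5])
# Pixel generated from the tilted normal (0.6, 0, 0.8).
n_tilted = normal_from_rgb(M, [1.4, 0.8, 0.2])
```

Because the three colour channels are captured in one exposure, a normal can be estimated for every video frame, which is consistent with the abstract's point that no high-speed recording is needed. Noise in the pixel values or errors in M propagate directly into the normal, which is what the error analysis mentioned above concerns.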

Keywords

facial performance capture, dense motion capture, non-sequential surface tracking, photometric stereo with colour lights

Materials

Thesis: PDF

Experimental results discussed in the thesis are shown in the supplementary videos. The videos are in AVI format, encoded with the H.264 codec from the ffmpeg package. They are organised by chapter and relate to individual datasets or pieces of empirical analysis.

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7