Michael J. Black

Title: Expressive human body models for communication and interaction

Human bodies in computer vision have often been an afterthought. Human pose is typically represented by 10-12 body joints in 2D or 3D. The joints of the body, however, do not capture all that we need to understand human behavior. In our work we have focused on 3D body shape, represented as a triangulated mesh. Shape gives us information about a person related to their health, age, fitness, and clothing size. But shape is also useful because our body surface is critical for our physical interactions with the world. We cannot interpenetrate objects and we have to make contact to manipulate the world. Consequently, we developed the SMPL body model, which is widely used in academia and industry. It is compact, posable, and compatible with most graphics packages. It is also differentiable and easy to integrate into optimization or deep learning methods. While popular, SMPL has drawbacks for representing human actions and interactions. Specifically, the face does not move and the hands are rigid. To facilitate the analysis of human actions, interactions, and emotions, we have trained a new expressive 3D model of the human body, SMPL-X, with fully articulated hands and an expressive face, using thousands of 3D scans. We estimate the parameters of SMPL-X directly from images. Specifically, we estimate 2D image features bottom-up and then optimize the SMPL-X model parameters to fit the features top-down. In related work, we address hand-object interaction by training a neural network to simultaneously regress hand and object pose and shape. A key novelty is a loss function that enforces physical constraints on contact and penetration. These methods represent a step towards automatic expressive human capture from monocular RGB data.
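
As a rough illustration of the top-down fitting step described above, the sketch below fits a toy model to 2D keypoints by minimizing reprojection error with gradient descent. It is not SMPL-X and not the method from the talk: it assumes a fixed, made-up set of five 3D joints and optimizes only a weak-perspective camera scale and 2D translation, whereas the real approach optimizes body pose, shape, and expression parameters. All names (project, fit_to_keypoints, template) are hypothetical.

```python
import numpy as np

def project(joints3d, scale, trans):
    """Weak-perspective projection (an assumption): drop depth, scale, shift."""
    return scale * joints3d[:, :2] + trans

def fit_to_keypoints(joints3d, keypoints2d, steps=500, lr=0.1):
    """Minimize the mean squared 2D reprojection error by gradient descent."""
    scale, trans = 1.0, np.zeros(2)
    xy = joints3d[:, :2]
    n = len(xy)
    for _ in range(steps):
        # Residual between projected model joints and detected keypoints.
        residual = project(joints3d, scale, trans) - keypoints2d
        # Analytic gradients of (1/n) * sum of squared residuals.
        grad_scale = 2.0 * np.sum(residual * xy) / n
        grad_trans = 2.0 * residual.sum(axis=0) / n
        scale -= lr * grad_scale
        trans -= lr * grad_trans
    return scale, trans

# Toy "skeleton": five hypothetical 3D joints, not taken from any body model.
template = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.1],
    [-0.5, 0.8, 0.0],
    [0.5, 0.8, 0.2],
    [0.0, -1.0, 0.0],
])

# Synthetic "bottom-up" detections generated from known camera parameters.
true_scale, true_trans = 2.0, np.array([0.3, -0.2])
detections = project(template, true_scale, true_trans)

est_scale, est_trans = fit_to_keypoints(template, detections)
```

In this toy setting the objective is quadratic, so plain gradient descent recovers the parameters used to generate the detections. A full body-model fit is non-convex and typically also needs priors on pose and shape.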

Suggested readings: