Andrew Fitzgibbon

Title: Learning about Shape.

Abstract: Vision is naturally concerned with shape. If we could recover a stable and compact representation of object shape from images, we would expect it to help with numerous vision tasks. Just the silhouette of an object is a strong cue to its identity, and the silhouette is generated by its 3D shape. In computer vision, many representations have been explored: collections of points, “simple” shapes like ellipsoids or polyhedra, algebraic surfaces and other implicit surfaces, generalized cylinders and ribbons, and piecewise (rational) polynomial representations like splines and NURBS. When recovering shape from measurements, there is at first sight a natural hierarchy of stability: lines and planes can represent very little but may be robustly recovered from data, then come conic sections and quadric surfaces, splines with fixed knots, and general piecewise representations. I will show, however, that one can pass almost immediately to piecewise representations without loss of robustness. In particular, I shall show how a popular representation in computer graphics (subdivision curves and surfaces) may readily be fit to a variety of image data using the technique for ellipse fitting introduced by Gander et al. in 1994.

Many of these shape representations can be embedded more or less straightforwardly into probabilistic shape spaces, and recovery (a.k.a. “learning”) of one such space is the goal of the experimental part of this talk. I will show that such learning cannot proceed by blockwise estimation (“first find the shapes, then fit the distribution”) but must be approached as a large simultaneous optimization problem. Given these tools, I will show how we can address the previously difficult problem of recovering 3D shape from multiple silhouettes, and the considerably harder problem which arises when the silhouettes are not from the same object instance but from members of an object class, for example 30 images of different dolphins, each in a different pose. This requires that we simultaneously learn the shape space and the projections from each instance into its image. This simultaneous optimization is reminiscent of the bundle adjustment problem in computer vision, and indeed our most recent application, to tracking the human hand, makes good use of the Ceres Solver.
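
As a rough illustration of what "simultaneous optimization" can mean for curve fitting, the sketch below fits a circle to 2D points by treating each point's position on the curve (an angle) as an unknown and solving for all angles together with the curve parameters in one Gauss-Newton loop. This follows my reading of the parametric geometric fit in the ellipse-fitting work cited above; it is not code from the talk. A circle is used instead of an ellipse or subdivision curve only to keep the example short, and the names `fit_circle_joint` and `solve_linear`, the centroid initialisation, and the small damping term are all assumptions made for illustration.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_circle_joint(points, iters=50, damping=1e-9):
    """Jointly estimate centre (cx, cy), radius r and one angle per point.

    Minimises sum_i |(cx + r cos(phi_i), cy + r sin(phi_i)) - p_i|^2
    over all 3 + n unknowns at once with (lightly damped) Gauss-Newton.
    """
    n = len(points)
    # Illustrative initialisation: centroid, mean distance, polar angles.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    r = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n
    phi = [math.atan2(p[1] - cy, p[0] - cx) for p in points]

    m = 3 + n  # unknowns: cx, cy, r, phi_0 .. phi_{n-1}
    for _ in range(iters):
        JTJ = [[0.0] * m for _ in range(m)]
        JTe = [0.0] * m
        for i, (x, y) in enumerate(points):
            c, s = math.cos(phi[i]), math.sin(phi[i])
            # Residuals (model minus data) for the x and y coordinates.
            errs = (cx + r * c - x, cy + r * s - y)
            # Sparse Jacobian rows: column index -> partial derivative.
            rows = ({0: 1.0, 2: c, 3 + i: -r * s},
                    {1: 1.0, 2: s, 3 + i: r * c})
            for row, e in zip(rows, errs):
                for a, va in row.items():
                    JTe[a] += va * e
                    for b, vb in row.items():
                        JTJ[a][b] += va * vb
        for k in range(m):
            JTJ[k][k] += damping  # keeps the normal equations well posed
        step = solve_linear(JTJ, [-g for g in JTe])
        cx, cy, r = cx + step[0], cy + step[1], r + step[2]
        phi = [phi[i] + step[3 + i] for i in range(n)]
    return cx, cy, r

if __name__ == "__main__":
    angles = [0.1, 0.9, 1.7, 2.8, 3.6, 4.4, 5.2, 5.9]
    pts = [(2 + 3 * math.cos(t), -1 + 3 * math.sin(t)) for t in angles]
    print(fit_circle_joint(pts))
```

The dense normal equations here are only reasonable for a handful of points. Each residual touches just the shared curve parameters and one private angle, which is the same sparsity pattern that bundle adjustment solvers exploit at scale.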