Otmar Hilliges

Title: Deep Autoregressive Generative Modelling of Human Activity

Abstract:
Generative modelling of natural time-series data (audio, video, etc.) is of great interest for many computer vision and machine learning tasks. It requires learning the dynamics of the objects and humans captured in the input and predicting their future states. Such models could benefit problems ranging from vision and graphics to reinforcement learning, planning and robotics. For example, the outcome of an action performed by an agent could be predicted by generating future observations, allowing for an internal evaluation of candidate actions. In computer vision research, much attention has been devoted to generative modelling of individual images via GANs or VAEs; generative modelling of time-series data, however, has received comparatively little attention. In this lecture we will discuss the simple yet effective class of autoregressive generative models, which have recently achieved impressive results in both image (PixelCNN/RNN) and audio (WaveNet) generation. We will then discuss recent research that enhances the modelling capacity of such models via the introduction of stochastic latent variables, combining the advantages of autoregressive models (simplicity and computational efficiency) with those of variational RNNs (modelling capacity and robustness). Finally, we will discuss how to apply autoregressive models to a variety of interesting tasks.
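
The core idea behind the autoregressive models mentioned above (PixelCNN/RNN, WaveNet) is the chain-rule factorisation p(x_1, …, x_T) = ∏_t p(x_t | x_1, …, x_{t-1}), where each element is generated conditioned on all previous ones. As a minimal illustrative sketch (not material from the lecture itself), the toy model below uses a fixed first-order (bigram) conditional table in place of the deep networks those models actually learn:

```python
import math
import random

# Toy autoregressive model over a two-symbol vocabulary. Deep AR models
# (PixelCNN/RNN, WaveNet) learn p(x_t | x_{<t}) with neural networks
# conditioned on the full history; here a hand-picked bigram table stands
# in for the learned conditional so the sketch stays self-contained.
VOCAB = ["a", "b"]

# p(x_t | x_{t-1}); "<s>" marks the start of a sequence.
COND = {
    "<s>": {"a": 0.5, "b": 0.5},
    "a":   {"a": 0.9, "b": 0.1},
    "b":   {"a": 0.2, "b": 0.8},
}

def log_likelihood(seq):
    """Chain-rule decomposition: log p(seq) = sum_t log p(x_t | x_{t-1})."""
    prev, total = "<s>", 0.0
    for sym in seq:
        total += math.log(COND[prev][sym])
        prev = sym
    return total

def sample(length, rng):
    """Ancestral sampling: draw x_t from p(. | x_{t-1}), left to right."""
    prev, out = "<s>", []
    for _ in range(length):
        r, acc = rng.random(), 0.0
        for sym in VOCAB:
            acc += COND[prev][sym]
            if r <= acc:
                out.append(sym)
                prev = sym
                break
    return out
```

Replacing the lookup table with a network that conditions on the entire history (masked convolutions in PixelCNN, dilated causal convolutions in WaveNet) gives the deep variants; the variational extensions discussed in the lecture additionally introduce stochastic latent variables into each conditional.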