Video-based Face Recognition

Rama Chellappa, University of Maryland, College Park

Pavan Turaga, Arizona State University

September 23, 2012



Several studies in neuroscience have established that humans use the movement of faces to mitigate the harsh effects of non-optimal viewing conditions, such as low resolution, occlusion, and poor illumination [1]. However, devising computational models that can exploit motion has been much more challenging, owing to the simultaneous challenges of detection, tracking, motion modeling, and matching. Video-based face recognition holds the promise of more accurate and robust recognition performance. In this tutorial, we present an overview of models and algorithms that address these issues, with the hope of fostering further research into this unique problem.

Tutorial Slides (final slides will be available after the tutorial)

  1. Face tracking and recognition in camera networks, Rama Chellappa (PDF of presentation)

  2. Manifold models for video-based face recognition, Pavan Turaga (PDF of presentation)


Joint tracking and recognition of faces using particle filters

Temporal information in videos can be exploited for simultaneous tracking and recognition of faces, without performing these tasks sequentially. There are several advantages to performing them jointly. In the tracking-then-recognition framework, estimating registration parameters between a test face and a template face, compensating for appearance variations due to changes in viewpoint, illumination, etc., and then voting on the individual recognition results can amount to an ad hoc solution. By effectively exploiting temporal information, the tracking-and-recognition framework performs all of these steps in an integrated manner. We will present a general framework using particle filters [3] for this task.
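To make the idea concrete, the following is a minimal toy sketch of a particle filter over a joint state consisting of a face location and an identity label: each particle carries a candidate position and a hypothesized identity, weights come from an appearance likelihood of the cropped patch against gallery templates, and resampling concentrates particles on the best joint hypothesis. The function and parameter names (`joint_track_recognize`, `sigma_motion`, `sigma_obs`) are illustrative and not from [3], and the motion model is a simple random walk rather than the richer deformation models used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_track_recognize(frames, gallery, n_particles=300,
                          sigma_motion=2.0, sigma_obs=0.2):
    """Toy particle filter over the joint state (row, col, identity).

    frames: list of 2D grayscale arrays; gallery: dict identity -> 2D template.
    Returns the mean position estimate and a posterior over identities.
    """
    ph, pw = next(iter(gallery.values())).shape
    H, W = frames[0].shape
    ids = list(gallery.keys())
    # Initialize particles uniformly over positions and identities.
    xs = rng.uniform(0, H - ph, n_particles)
    ys = rng.uniform(0, W - pw, n_particles)
    ks = rng.integers(0, len(ids), n_particles)
    w = np.full(n_particles, 1.0 / n_particles)
    for frame in frames:
        # Propagate positions by a random walk; identity is fixed per particle.
        xs = np.clip(xs + rng.normal(0, sigma_motion, n_particles), 0, H - ph)
        ys = np.clip(ys + rng.normal(0, sigma_motion, n_particles), 0, W - pw)
        # Weight each particle by appearance likelihood of its patch
        # against the template of its hypothesized identity.
        for i in range(n_particles):
            r, c = int(xs[i]), int(ys[i])
            patch = frame[r:r + ph, c:c + pw]
            err = np.mean((patch - gallery[ids[ks[i]]]) ** 2)
            w[i] = np.exp(-err / (2 * sigma_obs ** 2))
        w /= w.sum()
        # Resample to concentrate on high-likelihood joint hypotheses.
        idx = rng.choice(n_particles, n_particles, p=w)
        xs, ys, ks = xs[idx], ys[idx], ks[idx]
        w = np.full(n_particles, 1.0 / n_particles)
    posterior = {ident: float(np.mean(ks == j)) for j, ident in enumerate(ids)}
    return (xs.mean(), ys.mean()), posterior
```

Because identity is part of the particle state, the posterior over identities sharpens over frames as tracking improves, rather than being decided from a single registered frame.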

Manifold models of face appearance

Most face recognition approaches rely on a static model of appearance for each individual subject. The simplest appearance model is a single static image of the person. Such models are of limited utility in video-based face recognition, where subjects may be imaged under varying viewpoints, illuminations, expressions, etc. Thus, instead of a static image, a sufficiently long video that encompasses several variations in facial appearance lends itself to building more robust appearance models. In this context, we will discuss the appearance manifold representation [2] and the shape-illumination manifold [4], and how these manifold models can be used for joint detection, tracking, and recognition.
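A minimal sketch of the piecewise-linear idea behind appearance manifolds [2]: cluster a subject's frames into pose clusters, fit a local PCA subspace to each cluster, and score a probe frame by its minimum reconstruction error over those local subspaces. The plain k-means and PCA choices, and all names, are illustrative simplifications of the probabilistic formulation in [2].

```python
import numpy as np

def build_appearance_manifold(frames, n_clusters=3, subspace_dim=2, rng=None):
    """Approximate a subject's appearance manifold by local PCA subspaces.

    frames: array-like of shape (n_frames, dim), one vectorized face per row.
    Returns a list of (mean, basis) pairs, one per pose cluster.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    X = np.asarray(frames, dtype=float)
    # Simple k-means to partition frames into pose clusters.
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(20):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    manifold = []
    for k in range(n_clusters):
        if not np.any(labels == k):
            continue
        Xk = X[labels == k]
        mu = Xk.mean(0)
        # Local PCA basis for this cluster via SVD of centered frames.
        _, _, Vt = np.linalg.svd(Xk - mu, full_matrices=False)
        manifold.append((mu, Vt[:subspace_dim].T))
    return manifold

def distance_to_manifold(x, manifold):
    """Minimum PCA reconstruction error of x over the local subspaces."""
    best = np.inf
    for mu, B in manifold:
        r = x - mu
        best = min(best, np.linalg.norm(r - B @ (B.T @ r)))
    return best
```

Recognition then assigns a probe frame (or, by accumulation, a probe video) to the gallery subject whose manifold is nearest.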

Matching videos using parametric dynamical models

The dynamic signature of a face, in the form of idiosyncratic gestures or expressions, also plays an important role in identification. We will discuss models that can encode such variations and how recognition can be performed with them. Specifically, we will consider a dynamical-systems approach to modeling videos of faces, which results in a joint appearance- and dynamics-based matching between a probe video and a gallery video. The matching of dynamical models will be presented in depth using tools from differential geometry and manifold theory [5].
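One common instantiation, in the spirit of [5], fits a linear dynamical system to each face video, summarizes the system by the column span of a finite observability matrix, and compares videos via the geodesic distance between these subspaces on the Grassmann manifold. This sketch uses a simple SVD-based system identification; the function names are illustrative.

```python
import numpy as np

def fit_lds(Y, d=3):
    """Fit a linear dynamical system x_{t+1} = A x_t, y_t = C x_t to Y (dim x T)."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :d]                                  # observation matrix
    X = np.diag(S[:d]) @ Vt[:d]                   # latent state trajectory
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])      # least-squares dynamics
    return A, C

def observability_subspace(A, C, n=5):
    """Stack [C; CA; ...; CA^{n-1}] and orthonormalize: a point on a Grassmannian."""
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    Q, _ = np.linalg.qr(O)
    return Q

def grassmann_distance(Q1, Q2):
    """Geodesic distance via principal angles between the two subspaces."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(theta))
```

Two videos generated by the same underlying dynamics yield (up to a change of state basis) the same observability subspace, so the Grassmann distance is invariant to that ambiguity, which is precisely why the subspace, rather than the raw (A, C) pair, is the right object to compare.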


Prof. Rama Chellappa received the B.E. (Hons.) degree from the University of Madras, India, in 1975 and the M.E. (Distinction) degree from the Indian Institute of Science, Bangalore, in 1977. He received the M.S.E.E. and Ph.D. degrees in Electrical Engineering from Purdue University, West Lafayette, IN, in 1978 and 1981, respectively. Since 1991, he has been a Professor of Electrical Engineering and an affiliate Professor of Computer Science at the University of Maryland, College Park. He is also affiliated with the Center for Automation Research (Director) and the Institute for Advanced Computer Studies (Permanent Member). In 2005, he was named a Minta Martin Professor of Engineering. Prior to joining the University of Maryland, he was an Assistant Professor (1981-1986), Associate Professor (1986-1991), and Director of the Signal and Image Processing Institute (1988-1990) at the University of Southern California, Los Angeles. Over the last 30 years, he has published numerous book chapters and peer-reviewed journal and conference papers. He has co-authored and co-edited books on MRFs, face and gait recognition, and collected works on image processing and analysis. His current research interests are face recognition, clustering and video summarization, 3D modeling from video, image and video-based recognition of objects, events and activities, dictionary-based inference, compressive sensing, and hyperspectral processing.

Prof. Chellappa has received several awards, including an NSF Presidential Young Investigator Award, four IBM Faculty Development Awards, an Excellence in Teaching Award from the School of Engineering at USC, and two paper awards from the International Association for Pattern Recognition. He received the Society, Technical Achievement, and Meritorious Service Awards from the IEEE Signal Processing Society, and the Technical Achievement and Meritorious Service Awards from the IEEE Computer Society. At the University of Maryland, he was elected a Distinguished Faculty Research Fellow and a Distinguished Scholar-Teacher, and received an Outstanding Innovator Award from the Office of Technology Commercialization and an Outstanding GEMSTONE Mentor Award. He received the Outstanding Faculty Research Award and the Poole and Kent Teaching Award for Senior Faculty from the College of Engineering. In 2010, he was recognized as an Outstanding Electrical and Computer Engineer by Purdue University. He is a Fellow of the IEEE, the International Association for Pattern Recognition, the Optical Society of America, and the American Association for the Advancement of Science. He holds three patents.

Prof. Chellappa served as an associate editor of four IEEE Transactions, as a Co-Editor-in-Chief of Graphical Models and Image Processing, and as the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence. He has also served as a co-guest editor for six special issues published by leading journals and Transactions. He served as a member of the IEEE Signal Processing Society Board of Governors and as its Vice President of Awards and Membership, and as a General and Technical Program Chair for several IEEE international and national conferences and workshops. He is a Golden Core Member of the IEEE Computer Society and served a two-year term as a Distinguished Lecturer of the IEEE Signal Processing Society. Recently, he completed a two-year term as the President of the IEEE Biometrics Council.

Pavan Turaga is an Assistant Professor in Arts, Media and Engineering, and Electrical, Computer and Energy Engineering at Arizona State University. He received the B.Tech. degree in electronics and communication engineering from the Indian Institute of Technology Guwahati, India, in 2004, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 2007 and 2009, respectively. He then spent two years as a Research Associate at the Center for Automation Research, UMD. His research interests are in statistics and machine learning with applications to computer vision and pattern analysis. His work includes human activity analysis from videos, biometrics, video summarization, dynamic scene analysis, and statistical inference on manifolds for these applications. He was awarded the Distinguished Dissertation Fellowship by UMD in 2009, and was selected to participate in the Emerging Leaders in Multimedia Workshop by IBM, New York, in 2008.


[1] A. J. O'Toole, D. Roark, and H. Abdi, "Recognizing moving faces: A psychological and neural synthesis," Trends in Cognitive Sciences, 6:261-266, 2002.

[2] K.-C. Lee, J. Ho, M.-H. Yang, and D. J. Kriegman, "Visual tracking and recognition using probabilistic appearance manifolds," Computer Vision and Image Understanding, 99(3):303-331, 2005.

[3] S. Zhou, V. Krueger, and R. Chellappa, "Probabilistic recognition of human faces from video," Computer Vision and Image Understanding (special issue on Face Recognition), 91:214-245, 2003.

[4] O. Arandjelovic and R. Cipolla, "Face recognition from video using the generic shape-illumination manifold," Proc. 9th European Conference on Computer Vision, Graz, Austria, LNCS 3954, pp. 27-40, Springer, 2006.

[5] P. K. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa, "Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, 33(11):2273-2286, 2011.


Various representations for video-based face recognition. Figures courtesy of [2, 3, 5].