Tuesday, March 22, 8:30am
Jamie Shotton, Microsoft Research Cambridge
Title: Body Part Recognition: Making Kinect Robust
Abstract: Last November, Microsoft launched Kinect for Xbox 360 (http://www.xbox.com/kinect), a revolution in gaming where your whole body becomes the controller: you need not hold any device or wear anything special. Human pose estimation has long been a “grand challenge” of computer vision, and Kinect is the first product that meets the speed, cost, accuracy, and robustness requirements to take pose estimation out of the lab and into the living room. In the first two months after launch it sold over 8 million units. In this talk we will discuss some of the technology behind Kinect, detailing our new approach, which forms one of its core algorithms: body part recognition. Deriving from our earlier work that uses machine learning to recognize categories of objects in photographs, body part recognition uses a classifier to label each pixel from the Kinect depth-sensing camera as belonging to a part of the body: head, left hand, right knee, etc. Computing this pixel-wise classification is extremely efficient, as each pixel can be processed independently on the GPU. The classifications can then be pooled across pixels to produce hypotheses of 3D body joint positions for use by a skeletal tracking algorithm. We designed the system to be robust in two ways in particular. First, we train the system on a powerful cluster using a vast and highly varied training set of synthetic images, to ensure the system works for all ages, body shapes and sizes, clothing, and hair styles. Second, the recognition does not rely on any temporal information, so the system can initialize from arbitrary poses and avoids catastrophic loss of track, enabling extended gameplay for the first time.
Bio: Jamie Shotton's research interests include object recognition, machine learning, human pose estimation, gesture and action recognition, and medical imaging. He has published papers in all the major computer vision conferences and journals, with a focus on object detection by modelling contours, semantic scene segmentation exploiting both appearance and semantic context, and dense object part layout constraints. His demo on real-time semantic scene segmentation won the best demo award at CVPR 2008. More recently, he has investigated how many of the ideas from visual object recognition and machine learning can be applied in new ways. In human pose estimation, he architected the human body part recognition algorithm that drives Xbox Kinect’s skeletal tracking. In medical imaging, he has published papers on the automatic recognition of organs and other anatomical structures from CT data, with a view to simplifying and speeding up the radiologist’s workflow.
More information is available here: http://research.microsoft.com/en-US/people/jamiesho/default.aspx
Wednesday, March 23, 8:30am
Brad Duchaine, Dartmouth College
Title: Exploring human social perception via deficits and disruptions
Abstract: I will discuss findings from studies involving developmental prosopagnosics,
acquired prosopagnosics, and transcranial magnetic stimulation that shed light on the organization
of face processing, body processing, and visual recognition more generally.
Thursday, March 24, 3:10pm
Jonathan Gratch, USC
Title: So she's smiling, now what?
Abstract: Face and gesture research has made enormous progress in recognizing human
nonverbal signals, but still faces important challenges in understanding the social meaning and
significance of such cues. In this talk I will discuss a number of successes and some failures
in using expression and gesture recognition techniques in a variety of human-human and
human-computer social contexts. On the one hand, I will describe research on virtual humans
(interactive digital characters) that can engage users in rich face-to-face interactions. I will
describe evidence that endowing these artifacts with the ability to recognize and respond to human
nonverbal cues has important social effects such as enhanced mutual understanding and
persuasiveness. On the other hand, by facilitating the annotation of human nonverbal behavior,
face and gesture research is revolutionizing the study of human social processes and providing new
insights into theories of human social behavior. These two topics work hand-in-hand, as theories
of human social processes have important implications, not only for the design of interactive
virtual humans, but for face and gesture research as well. I will end by describing how social
psychological theory, especially theories of emotion, has implications for research in the
automatic recognition and understanding of human social signals.