Purpose:
Thompson et al. (Invest Ophthalmol Vis Sci. 2003;44:5035-5042) demonstrated that faces can be recognized on a 32x32 retinal implant array; however, this requires that the face be represented at full resolution. Here we develop a real-time system that automatically detects, tracks, and zooms in on faces in video to facilitate face and expression recognition on such a device.
Methods:
Our system is based on an improved cascade classifier that predicts a face/non-face label for every sub-window (across locations and scales) in each video frame. The classifier's input is a set of Haar-like image features computed from each sub-window; its output is integrated with a Hidden Markov Model that imposes smoothness on the position and scale of the displayed face, keeping the face stable over time despite movements of the camera or the face. Model parameters are estimated from a standard face dataset by supervised learning. Detected face regions are cropped and normalized to a target size for display on a sub-retinal implanted device or a large screen.
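To make the temporal-smoothing idea concrete, the sketch below shows one simple way per-frame detections could be stabilized. The abstract describes a Hidden Markov Model over face position and scale; this stand-in instead uses exponential smoothing with a reset on large jumps, which captures the same behavior (jitter averaged away, genuine moves followed). The `Box` type, `smooth_track` function, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # face center, x (pixels)
    y: float  # face center, y (pixels)
    s: float  # face scale (side length, pixels)

def smooth_track(detections, alpha=0.3, jump_thresh=2.0):
    """Stabilize a sequence of per-frame face detections.

    Small frame-to-frame jitter is blended into the current track
    (exponential smoothing with weight `alpha`); a detection farther
    than `jump_thresh` face-widths from the track is treated as a real
    move of camera or face and re-seeds the track. A simplified
    stand-in for the HMM smoothing described in the abstract.
    """
    track, cur = [], None
    for det in detections:
        if cur is None:
            cur = Box(det.x, det.y, det.s)  # first frame: adopt detection
        else:
            dist = ((det.x - cur.x) ** 2 + (det.y - cur.y) ** 2) ** 0.5
            if dist > jump_thresh * cur.s:
                cur = Box(det.x, det.y, det.s)  # large move: reset track
            else:
                cur = Box(  # small jitter: blend toward new detection
                    cur.x + alpha * (det.x - cur.x),
                    cur.y + alpha * (det.y - cur.y),
                    cur.s + alpha * (det.s - cur.s),
                )
        track.append(cur)
    return track
```

With jittery detections around a fixed face, the smoothed track holds nearly still; a detection several face-widths away snaps the display to the new location in one frame.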
Results:
The system reliably detects and robustly tracks faces at distances of 0.5 to 5 meters in a normal indoor environment. The figure shows our system's output for a single frame of a live sequence at high resolution, together with low-resolution (32x32) versions of the whole frame and of the detected face sub-window. The zoomed face window provides rich, informative cues for identity and expression recognition.
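The final crop-and-normalize step can be sketched as follows. This is a minimal, dependency-free illustration that crops a square face region from a grayscale frame and resamples it to 32x32 with nearest-neighbor sampling; the function name and signature are hypothetical, and a deployed system would likely use area averaging or another anti-aliased resampling instead.

```python
def crop_and_resize(frame, x0, y0, size, out=32):
    """Crop a size x size face region at (x0, y0) from a grayscale
    frame (a list of pixel rows) and resample it to out x out pixels.

    Nearest-neighbor sampling keeps the sketch self-contained; it maps
    each output pixel (r, c) back to source pixel
    (y0 + r*size//out, x0 + c*size//out).
    """
    return [
        [frame[y0 + (r * size) // out][x0 + (c * size) // out]
         for c in range(out)]
        for r in range(out)
    ]
```

Applied to the detected face sub-window, this yields the 32x32 face image sent to the low-resolution display, regardless of how large the face appears in the camera frame.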
Conclusions:
A top-down, saliency- and face-based fixation system has been built to give retinal prosthesis recipients the ability to zoom in on faces at varying distances, which can improve face and expression recognition by selectively sending the most relevant information to a low-resolution device.
Keywords: image processing • vision and action • quality of life