April 2011
Volume 52, Issue 14
ARVO Annual Meeting Abstract
Face Detection And Tracking In Video To Facilitate Face Recognition In A Visual Prosthesis
Author Affiliations & Notes
  • Xuming He
    NICTA, CRL, Canberra, ACT, Australia
    ANU, Canberra, Australia
  • Chunhua Shen
    NICTA, CRL, Canberra, ACT, Australia
    ANU, Canberra, Australia
  • Nick Barnes
    NICTA, CRL, Canberra, ACT, Australia
    ANU, Canberra, Australia
  • Footnotes
    Commercial Relationships  Xuming He, NICTA (E); Chunhua Shen, NICTA (E), Patent held on related technology (P); Nick Barnes, NICTA (E), Patent held on related technology (P)
  • Footnotes
    Support  Bionic Vision Australia and Australian Research Council (ARC), NICTA, Dept Broadband, Communications and the Digital Economy, Australian Government
Investigative Ophthalmology & Visual Science April 2011, Vol.52, 4972. doi:

Thompson et al. (Invest Ophthalmol Vis Sci. 2003;44:5035-5042) demonstrated that faces can be recognized on a 32x32 retinal implant array; however, this requires the face to be represented at full resolution within the array. Here we develop a real-time system that automatically detects, tracks, and zooms into faces in video to facilitate face and expression recognition on such a device.


Our system is based on an improved cascade classifier that predicts a face/non-face label for every sub-window (across locations and scales) in each video frame. The classifier's input is a set of Haar-like image features computed from each sub-window; its output is integrated with a Hidden Markov Model that imposes smoothness on the position and scale of the displayed face, keeping the face stable over time despite movement of the camera or the face. Model parameters are estimated from a standard face dataset using supervised learning. Detected face regions are cropped and normalized to the target size for display on a sub-retinally implanted device or a large screen.
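The detect-then-smooth idea above can be sketched in miniature. This is not the authors' implementation: it is an illustrative Viterbi smoother over a discretized 1-D face position (a real system would smooth position and scale jointly over 2-D windows), with all function names, state spaces, and costs chosen purely for illustration.

```python
def smooth_track(detections, positions, jump_cost=10.0):
    """Viterbi smoothing of noisy per-frame face positions.

    detections: per-frame detected position (float), or None for a missed frame
    positions:  discrete candidate positions (the HMM state space)
    jump_cost:  penalty per unit of window movement between frames
    Returns the minimum-cost sequence of smoothed positions.
    """
    n = len(positions)

    # Emission cost: squared distance from a state to the raw detection.
    # A missed frame constrains nothing, so its emission cost is zero.
    def emit(t, s):
        d = detections[t]
        return 0.0 if d is None else (positions[s] - d) ** 2

    cost = [emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(detections)):
        new_cost, ptr = [], []
        for s in range(n):
            # Best predecessor state, penalizing jumps in window position.
            best, arg = min(
                (cost[p] + jump_cost * abs(positions[s] - positions[p]), p)
                for p in range(n)
            )
            new_cost.append(best + emit(t, s))
            ptr.append(arg)
        cost, back = new_cost, back + [ptr]

    # Backtrack from the cheapest final state.
    s = min(range(n), key=cost.__getitem__)
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [positions[s] for s in path]
```

Fed a detection sequence with one spurious jump, e.g. `smooth_track([0.0, 0.1, 5.0, 0.0, 0.1], [0, 1, 2, 3, 4, 5])`, the smoother keeps the displayed window nearly still rather than following the outlier, which is the behavior the HMM provides in the full system.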


The system reliably detects and robustly tracks faces at distances of 0.5 to 5 meters in a normal indoor environment. The figure shows results of our system for a single frame from a live sequence at high resolution, together with a low-resolution (32x32) version of both the whole frame and the detected face sub-window. The zoomed face window provides rich, informative cues for identity and expression recognition.
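The crop-and-normalize step that produces the 32x32 rendering can be approximated with simple block averaging. This is an illustrative sketch only; the function names and the plain nested-list image representation are assumptions, not the authors' code.

```python
def crop(image, x, y, w, h):
    """Extract a w x h sub-window of a grayscale image (list of rows)."""
    return [row[x:x + w] for row in image[y:y + h]]


def downsample(image, out_h=32, out_w=32):
    """Downsample a grayscale image to out_h x out_w by averaging the
    source pixels that fall into each output cell (block averaging)."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        r0 = i * in_h // out_h
        r1 = max((i + 1) * in_h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = j * in_w // out_w
            c1 = max((j + 1) * in_w // out_w, c0 + 1)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Downsampling the whole frame this way loses facial detail, whereas downsampling only the detected face sub-window preserves it at the same 32x32 budget, which is the point of the zooming step. A production system would typically use proper interpolation (e.g. bilinear) rather than block averaging.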


A top-down, saliency- and face-based fixation system has been built to give retinal prosthesis recipients the ability to zoom onto faces at varying distances, improving face recognition by selectively sending the most relevant information to low-resolution devices.

Keywords: image processing • vision and action • quality of life 
