Abstract
We have constructed an inexpensive, video-based, motorized tracking
system that learns to track a head. It uses real-time graphical user
inputs or an auxiliary infrared detector as supervisory signals to
train a convolutional neural network. The inputs to the neural
network consist of normalized luminance and chrominance images and
motion information from frame differences. Subsampled images are also
used to provide scale invariance. During the online training phase,
the neural network rapidly adjusts the input weights according to
the reliability of the different channels in the surrounding
environment. This quick adaptation allows the system to robustly
track a head even when other objects are moving within a cluttered
background.
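
As a rough illustration of the input channels described above, the following minimal sketch builds luminance, chrominance, and frame-difference images at two scales. The specific choices here are assumptions made for illustration and are not taken from the system itself: luminance as the mean of R, G, and B, zero-mean unit-variance normalization, chrominance as the ratios r/(r+g+b) and g/(r+g+b), and subsampling by 2x2 block averaging. All function names are hypothetical.

```python
from statistics import mean, pstdev

def luminance(rgb):
    # Assumed definition: plain average of the three color components.
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb]

def normalize(img):
    # Assumed normalization: zero mean, unit variance over the whole image.
    flat = [v for row in img for v in row]
    mu = mean(flat)
    sigma = pstdev(flat) or 1.0  # avoid division by zero on flat images
    return [[(v - mu) / sigma for v in row] for row in img]

def chrominance(rgb):
    # Assumed definition: intensity-normalized red and green ratios.
    def ratios(r, g, b):
        s = r + g + b
        return (r / s, g / s) if s else (1.0 / 3.0, 1.0 / 3.0)
    pairs = [[ratios(*px) for px in row] for row in rgb]
    chroma_r = [[p[0] for p in row] for row in pairs]
    chroma_g = [[p[1] for p in row] for row in pairs]
    return chroma_r, chroma_g

def frame_difference(prev, curr):
    # Motion cue: absolute luminance change between consecutive frames.
    return [[abs(c - p) for p, c in zip(pr, cr)] for pr, cr in zip(prev, curr)]

def subsample(img):
    # Halve the resolution by averaging 2x2 blocks (odd edges are dropped).
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def input_channels(prev_rgb, curr_rgb, levels=2):
    # Returns one dict of channel images per scale, finest scale first.
    y_prev, y_curr = luminance(prev_rgb), luminance(curr_rgb)
    chroma_r, chroma_g = chrominance(curr_rgb)
    pyramid = [{
        "luma": normalize(y_curr),
        "chroma_r": chroma_r,
        "chroma_g": chroma_g,
        "motion": frame_difference(y_prev, y_curr),
    }]
    for _ in range(levels - 1):
        pyramid.append({name: subsample(img) for name, img in pyramid[-1].items()})
    return pyramid
```

Calling input_channels(prev_frame, curr_frame) on two consecutive RGB frames (lists of rows of (r, g, b) tuples) yields the per-scale channel images that a network of this kind could take as input; the learned weighting of those channels is not modeled here.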