Highlights
The objective of this project is to identify handwritten characters
at a very high speed and level of accuracy.
- The project utilizes neural networks and a suite of character
  processing algorithms.
- The prototype system offers high performance.
- MIT has obtained patent rights on the technology.
- Prototype software is available for evaluation purposes.

About the Project
The project, headed by Dr. Amar Gupta, focuses on developing
Intelligent Character Recognition technology for automating
recognition of handwritten numerals on bank checks.

This system is divided into three stages: preprocessing,
recognition, and postprocessing.

A typical bank check contains several components (e.g., name and
address, date, and handwritten legal amount). One of the most
important components is the courtesy amount block (CAB), which
contains the dollar amount of the check. For each check, a scanner
generates a pixel image of the courtesy amount block. The image
then undergoes preprocessing.

The first step in preprocessing is segmentation, which divides the
amount into its individual digits. Punctuation marks such as commas
and periods are identified based on their location, alignment, and
size within the segment image. Once these symbols are established,
each segmented digit is passed on toward the recognition stage.
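
The segmentation step can be illustrated with a small sketch. It is not the project's algorithm: it assumes a binary image stored as a list of rows (1 = ink), splits the amount wherever a column contains no ink, and flags a segment as punctuation using a hypothetical rule based on size and vertical location. The function names and the `max_height_ratio` threshold are assumptions made for illustration.

```python
def split_on_blank_columns(image):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(image[0])
    has_ink = [any(row[c] for row in image) for c in range(width)]
    spans, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a new segment begins
        elif not ink and start is not None:
            spans.append((start, c))       # a blank column closes it
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def looks_like_punctuation(image, span, max_height_ratio=0.35):
    """Hypothetical rule: a short mark confined to the lower half of the block."""
    rows = [r for r, row in enumerate(image) if any(row[span[0]:span[1]])]
    height = rows[-1] - rows[0] + 1
    total = len(image)
    return height <= max_height_ratio * total and rows[0] >= total / 2


# A full-height stroke in columns 0-1 and a single dot at the bottom of column 4.
image = [[1, 1, 0, 0, 0, 0, 0] for _ in range(5)] + [[1, 1, 0, 0, 1, 0, 0]]
spans = split_on_blank_columns(image)
flags = [looks_like_punctuation(image, s) for s in spans]
```

Splitting on blank columns cannot separate touching digits, which is one reason a real system needs the ability to resegment.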

The next step in preprocessing normalizes each digit to a standard
size of 16x16 pixels. This is followed by slant correction, in
which the character is rotated to an upright, vertical position.
Next, the character is thinned and then thickened. Thinning reduces
the numeral to a bare skeleton one pixel thick; a good skeleton
retains the connectivity and structural features of the original
pattern. Once the skeleton is defined, it is thickened to a width
of two pixels.
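
Two of these steps, size normalization and thickening, are simple enough to sketch. The code below is an illustration under stated assumptions rather than the project's implementation: images are binary lists of rows, normalization uses nearest-neighbour sampling of the ink bounding box, and thickening is a dilation with a 2x2 neighbourhood. Slant correction and thinning are omitted, and the function names are hypothetical.

```python
def normalize(image, size=16):
    """Crop to the ink bounding box and rescale to size x size pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:                            # blank input: return a blank grid
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    # Nearest-neighbour sampling: each output pixel copies one source pixel.
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


def thicken(skeleton):
    """Dilate a one-pixel skeleton with a 2x2 neighbourhood."""
    h, w = len(skeleton), len(skeleton[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if skeleton[r][c]:
                for dr in (0, 1):
                    for dc in (0, 1):
                        if r + dr < h and c + dc < w:
                            out[r + dr][c + dc] = 1
    return out


# A vertical one-pixel stroke in column 3 becomes two pixels wide.
stroke = [[1 if c == 3 else 0 for c in range(8)] for _ in range(8)]
thick = thicken(stroke)
grid = normalize([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
```

With this neighbourhood, horizontal and vertical strokes come out exactly two pixels wide, while diagonal strokes come out slightly thicker.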

After preprocessing, each 16x16 segment is passed to a
neural-network-based recognizer that reads the character. The
network has been trained on a large number of feature vectors and
histograms. The first layer of the network consists of 256 input
nodes, one for each element in the 16x16 matrix. These input nodes
are connected to 40 hidden nodes, which perform the actual
recognition computation. The hidden nodes are in turn connected to
10 output nodes, one for each of the digits 0-9.
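
The 256-40-10 layout described above can be sketched as a forward pass. Only the layer sizes come from the description. The sigmoid activation, the bias terms, and the random placeholder weights are assumptions, and a working recognizer would load trained weights instead.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 256, 40, 10   # layer sizes from the description


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Placeholder weights; the last entry of each row is an assumed bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    # zip stops at len(inputs), so the trailing bias is added separately.
    return [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs)))
            for w in weights]


def recognize(segment, hidden_w, output_w):
    """Flatten a 16x16 segment and return (best digit, all 10 outputs)."""
    pixels = [float(p) for row in segment for p in row]
    if len(pixels) != N_INPUT:
        raise ValueError("expected a 16x16 segment")
    hidden = layer_forward(hidden_w, pixels)
    outputs = layer_forward(output_w, hidden)
    digit = max(range(N_OUTPUT), key=outputs.__getitem__)
    return digit, outputs


rng = random.Random(0)
hidden_w = init_layer(N_INPUT, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUT, rng)
segment = [[0] * 16 for _ in range(16)]
digit, outputs = recognize(segment, hidden_w, output_w)
```

With untrained weights the returned digit is meaningless; the sketch only shows how data flows from 256 pixels through 40 hidden nodes to 10 outputs.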

Each segment is also sent through a second neural network trained
with "negative templates" of the same set of histograms as the
primary network. Following recognition, the outputs of the two
networks are compared, and if they conflict, the courtesy amount
image is resegmented. For example, suppose the first network
recognizes an image as a "4" because it falls within the acceptable
parameters for a "4", although it is not in fact a "4". Since the
result from the "negative" network would indicate that the input
image could not be a "4", the courtesy amount image would be
resegmented and processed again.
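
The comparison between the two networks might look like the following sketch. The output encoding of the "negative" network is not specified in this description, so the code assumes it produces one score per digit, where a high score means "this image is not that digit". The function name and the `veto_threshold` value are likewise hypothetical.

```python
def networks_conflict(primary_outputs, negative_outputs, veto_threshold=0.5):
    """Return (digit, conflict).

    digit    -- index of the strongest output of the primary network
    conflict -- True when the negative network's score for that digit
                reaches veto_threshold, i.e. it rules the digit out
    """
    digit = max(range(len(primary_outputs)), key=primary_outputs.__getitem__)
    return digit, negative_outputs[digit] >= veto_threshold


# The primary network says "4", but the negative network rules "4" out.
primary = [0.1] * 10
primary[4] = 0.9
vetoing = [0.1] * 10
vetoing[4] = 0.8
agreeing = [0.1] * 10

conflict_case = networks_conflict(primary, vetoing)
agree_case = networks_conflict(primary, agreeing)
```

A conflict would send the courtesy amount image back to segmentation rather than accepting the digit.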

The neural network provides a high level of accuracy. To gauge its
ability to correctly identify each digit, postprocessing produces a
confidence measure for every recognized digit. If the confidence
falls below a certain threshold, the digit is rejected, and either
the entire courtesy amount is resegmented and processed again or
the user intervenes to recognize the digit.
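
The rejection logic can be sketched as a small decision function. The confidence measure used here (the gap between the two strongest outputs), the threshold, and the rule of retrying segmentation a fixed number of times before asking the user are all assumptions. The description only states that low-confidence digits are rejected and then either resegmented or referred to the user.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    RESEGMENT = "resegment"
    ASK_USER = "ask_user"


def confidence(outputs):
    """Assumed measure: gap between the strongest and second-strongest output."""
    best, runner_up = sorted(outputs, reverse=True)[:2]
    return best - runner_up


def decide(outputs, attempts_so_far, min_confidence=0.5, max_attempts=2):
    """Accept a confident digit; otherwise retry segmentation, then ask the user."""
    if confidence(outputs) >= min_confidence:
        return Action.ACCEPT
    if attempts_so_far < max_attempts:
        return Action.RESEGMENT
    return Action.ASK_USER


clear = [0.05] * 10
clear[7] = 0.95                      # one dominant output
ambiguous = [0.05] * 10
ambiguous[1], ambiguous[7] = 0.60, 0.55   # two near-equal candidates
```

Here `decide(clear, 0)` accepts the digit, while `decide(ambiguous, 0)` requests resegmentation and `decide(ambiguous, 2)` hands the digit to the user.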