REMAKING THE NEURAL NET: A PERCEPTRON LOGIC UNIT. M. Van Alstyne.
Independent Project; MIT Lincoln Laboratory; Rm M209D; Box 73; Lexington,
MA 02173 USA.
A three-node network can compute XOR if transcendental activation functions
are used. Because the logistic function increases monotonically, it cannot
play the same role as a sine in performing an intrinsically periodic
operation. While four-node, hidden-unit logistic solutions exist, none
achieves a solution in three nodes. The nonmonotonic network examined here,
however, learns to solve all eight binary operations and even to discriminate
between the inputs 0,1 and 1,0, which the binary operators treat identically.
Further, it learns any such computation from the same initial state. Where
the two approaches overlap, sine-node networks require, on average, fewer
than half the training iterations of logistic-node networks.
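To make the claim concrete, the short Python sketch below (not part of the
original abstract) implements one reading of the three-node unit: an output
node that applies a sine to the weighted sum of the two inputs, with an
optional resting bias added to the output. The weights w1 = w2 = pi/2 are a
hand-picked exact XOR configuration chosen for illustration, not necessarily
the solution learned in the experiments reported here.

    import math

    def sine_unit(x1, x2, w1, w2, resting_bias=0.0):
        # Three-node network: two input nodes feeding a single sine output node.
        # The resting bias is added to the node's output, not to its net input.
        return math.sin(w1 * x1 + w2 * x2) + resting_bias

    # Hand-picked exact XOR weights: sin(pi/2 * (x1 + x2)) yields 0, 1, 1, ~0.
    w1 = w2 = math.pi / 2
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(f"{x1} XOR {x2} -> {sine_unit(x1, x2, w1, w2):.6f}")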
Figure 1 (left) shows a linear error plot; Figure 2 (right) shows a squared
error plot.
The paradigm, in achieving this uniform initial state, differs from standard
back propagation in its use of linear rather than quadratic error. A linear
error term results in a remarkably smooth error contour with an infinite
number of global minima. More interesting still, this contour is essentially
the same irrespective of the operation being learned. Under this
representation, each binary operation lies at a fairly constant distance
from other operations in error space. Learning takes place by "lifting"
and "shifting" the error contour to find an appropriate zero point
while deforming the original shape only slightly. A linear error contour
is depicted in Fig. 1 and a contrasting squared error contour is depicted
(for "AND" only) in Fig. 2.
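As a rough reconstruction of the surface in Fig. 1 (again assuming the output
form sin(w1*x1 + w2*x2) plus a resting bias, with the bias held at zero so
that the hand-picked XOR solution above is a true zero of the error), the
sketch below evaluates the linear error over the two input weights and checks
a few of the periodically repeating global minima.

    import math

    def linear_error(w1, w2, bias, patterns):
        # Sum of absolute errors of the sine unit over a pattern set.
        return sum(abs(t - (math.sin(w1 * x1 + w2 * x2) + bias))
                   for (x1, x2), t in patterns)

    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    # Zero-error points recur every 2*pi along each weight axis, e.g. at
    # (pi/2, pi/2), (pi/2 + 2*pi, pi/2), (pi/2 + 4*pi, pi/2), ...
    for k in range(3):
        w = math.pi / 2 + 2 * math.pi * k
        print(f"E({w:.3f}, {math.pi / 2:.3f}) = "
              f"{linear_error(w, math.pi / 2, 0.0, XOR):.2e}")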
Figure 3 gives a vector representation in spherical coordinates of solutions
to the eight binary operations. To achieve an exact solution, a resting bias
is added to the sine term, bringing negative output values to zero and
positive values to one. Figure 1 shows the basic error as a function of the
two weights from the input nodes with a resting bias of pi/2. As the network
learns resting biases and connection weights this shape warps, but at a
constant bias it remains identical for all permutations on a three-node
network. Figure 2 illustrates a second advantage of a linear over a quadratic
error function by showing the comparatively more complicated shape of the
quadratic surface.
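Under the same reading (a resting bias added to the node's output), the
sketch below verifies one hand-picked exact configuration in which the bias
does the work just described: a bias of 1 lifts the sine's -1 output to 0 and
its 0 output to 1, giving XNOR exactly. The operation and weights are
illustrative choices, not necessarily those plotted in Figure 3.

    import math

    def sine_unit(x1, x2, w1, w2, resting_bias):
        # The resting bias contributes to the node's output, not to its net input.
        return math.sin(w1 * x1 + w2 * x2) + resting_bias

    # Hand-picked exact XNOR: w1 = w2 = -pi/2, resting bias = 1.
    # sin(-pi/2) = -1 is lifted to 0; sin(0) and sin(-pi) are lifted to 1.
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        y = sine_unit(x1, x2, -math.pi / 2, -math.pi / 2, 1.0)
        print(f"{x1} XNOR {x2} -> {y:.6f}")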
Figure 3 -- This shows vector representations of all eight logic operations.
Using a resting bias, this configuration has achieved error values as low as
1e-15. A resting bias functions as a source of activation from a node
emitting a constant signal; it contributes to node output, however, not to
node input. Unlike the bias term of the logistic function, it therefore
requires a slightly modified learning rule, a modest penalty for the
additional accuracy it contributes. While a sinusoidal activation function
has the
biologically unusual property of periodicity, it can be restricted to one
phase by combining it with a multiplicative factor inversely proportional
to the magnitude of its connection weights.
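The abstract does not spell out the modified learning rule, so the sketch
below shows only one plausible adaptation of back propagation to a linear
error term and a single sine node with a resting bias on its output: the
weight updates keep the usual cos(net) derivative of the sine, while the bias
update drops that factor because the bias bypasses the sine. The starting
values, step size, and iteration count are arbitrary illustrative choices,
and the phase-restricting multiplicative factor mentioned above is omitted.

    import math

    # Targets for XOR; any other binary operation just swaps this target column.
    PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    w1, w2, bias = 0.3, 0.5, 0.1   # arbitrary small starting point
    lr = 0.02

    for _ in range(5000):
        for (x1, x2), t in PATTERNS:
            net = w1 * x1 + w2 * x2
            e = t - (math.sin(net) + bias)   # linear error contributes |e|
            s = (e > 0) - (e < 0)            # subgradient sign of |e|
            # Descend the linear error; the resting bias skips the cos(net)
            # factor because it is added after the sine, not inside it.
            w1 += lr * s * math.cos(net) * x1
            w2 += lr * s * math.cos(net) * x2
            bias += lr * s

    for (x1, x2), t in PATTERNS:
        y = math.sin(w1 * x1 + w2 * x2) + bias
        print(f"{x1},{x2} -> {y:+.3f} (target {t})")

With a fixed step size the parameters chatter in a small neighbourhood of an
exact solution, so the printed outputs land near, rather than exactly on,
0 and 1.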
If it can be shown that more difficult problems are similarly free of local
minima, a transcendental network would prove a powerful learning tool, since
any continuous function may be approximated using sine waves. Simple
number, parity, and complement problems have proven tractable, and an experiment
to learn to recognize a digitized calculator display is ongoing.
References:
[1] Rumelhart, D.E., Hinton, G.E., and Williams, R.J.; "Learning Internal
Representations by Error Propagation;" Parallel Distributed Processing,
Vol. I; ed. Rumelhart, D.E., and McClelland, J.L.; MIT Press, Cambridge,
MA; 1986.
[2] Minsky, M.L., and Papert, S.; Perceptrons, Expanded Edition; MIT Press,
Cambridge, MA; 1988.
[3] Burr, D.J.; "A Neural Network Digit Recognizer;" Proc. IEEE International
Conference on Systems, Man, and Cybernetics; October 1986.
[4] Dym, H., and McKean, H.P.; Fourier Series and Integrals; Academic Press,
New York; 1972.