NASA Technical Paper 3372
1993

Neural-Network-Directed Alignment of Optical Systems Using the Laser-Beam Spatial Filter as an Example

Arthur J. Decker, Michael J. Krasowski, and Kenneth E. Weiland
Lewis Research Center
Cleveland, Ohio

National Aeronautics and Space Administration
Office of Management
Scientific and Technical Information Program
Contents

Summary
Introduction
Automation of Optical System Alignment
  Description of Alignment Process
  Theories and Models of Spatial Filter Alignment
  Linear Mappings and Table Lookup
  Artificial Neural Networks
    Introduction to neural networks
    Backpropagation-trained networks
    Counterpropagation networks
    Preclassifiers and systems of neural networks
    Backpropagation-trained network and counterpropagation network using an Adaptive Resonance Theory 2 preclassifier
Experimental Setup and Procedures
  Development of Training Sets
    Training set design for optical alignment
    Dimensionless training sets
  Experimental Procedure for Neural-Net-Directed Alignments
    Apparatus and software
    Neural-network-directed alignments of a spatial filter
    Neural-network-directed alignments of a model spatial filter
Results and Discussion
  Comparison of Training the Back-Propagation-Trained Network and the Counter-Propagation Network
  Alignment Tests With Neural Network Systems
Concluding Remarks
Appendix A - Theory and Models for Beam-Smoothing Spatial Filter
  Fresnel Diffraction Theory
  Region A Model - Filter Out of Focus
  Region B Model - Filter Focused or Nearly Focused
Appendix B - Visualization of Spatial Filter Alignment
  Standardization
  Example of Visualization Process
  Final Comments on Visualization
Appendix C - Symbols
References
Summary

This report describes an effort at NASA Lewis Research Center to use artificial neural networks to automate the alignment and control of optical measurement systems. Specifically, it addresses the use of commercially available neural network software and hardware to direct alignments of the common laser-beam-smoothing spatial filter. The report presents a general approach for designing alignment records and combining these into training sets to teach optical alignment functions to neural networks and discusses the use of these training sets to train several types of neural networks. Neural network configurations used include the adaptive resonance network, the backpropagation-trained network, and the counterpropagation network. This work shows that neural networks can be used to produce robust sequencers. These sequencers can learn by example to execute the step-by-step procedures of optical alignment and also can learn adaptively to correct for environmentally induced misalignment. The long-range objective is to use neural networks to automate the alignment and operation of optical measurement systems in remote, harsh, or dangerous aerospace environments. This work also shows that when neural networks are trained by a human operator, training sets should be recorded, training should be executed, and testing should be done in a manner that does not depend on intellectual judgments of the human operator.
Introduction

This report describes an effort at NASA Lewis Research Center to use neural networks to automate the alignment and control of optical measurement systems. This project, which was supported by the Earth-to-Orbit Propulsion Instrumentation Working Group of NASA, was begun in 1989 because of a need to make optical measurements near an operating test bed of the Space Shuttle Main Engine where the environment is intolerable for humans. Aerospace measurement environments can be characterized in terms of two challenges. The first challenge is the optical access of areas of interest in the experiment, rig, or facility. The second challenge, the one that we address in this report, is that hands-on alignment, adjustment, and control is often difficult or impossible. Although hands-on adjustment
of these systems is frequently necessary during a test, human safety considerations often prevent access during testing. This may mean, for example, that a test must be shut down for a period in order to do the required adjustments. Obviously, it would be less costly and far more efficient to automate these adjustments. For a typical alignment the usual procedure is for a human operator to control the illumination of an extended region visually by using the beam pattern for alignment clues. This procedure is not necessarily trivial. For example, four mirror mounts, each with 3 rotational degrees of freedom, have 12!, or 479 001 600, possible orderings of an alignment sequence involving all 12 degrees of freedom. A human operator, of course, imposes many constraints to restrict the number of possible moves. A particular mount can be aligned to center the beam on another mount and then locked. Several moves can be made sequentially to move the hot spot of a beam in a horizontal direction only. Nevertheless, even a simple optical measurement system may require frequent random adjustments of 3 to 6 degrees of freedom. A recent test of the simplest off-axis reference-beam holography setup, with components already laid out, required between 50 and 100 translational and rotational motions to bring the setup into alignment. Solving the problem of automated alignment then requires automating the learned human skill of pattern-directed operation of a complex system of controls. An approach to this problem has recently become available. This approach involves a method of parallel processing referred to as an artificial neural network (ref. 1). As will be discussed later, an artificial neural network can learn to map a general set of input patterns into an appropriate set of patterns of output control actions. The advantage of using a neural network is that, like a human operator, it can learn an alignment procedure, a control law, or any other mapping by example.
It is not necessary to discover a mathematical representation of the mapping by human analytic processes. The human operator need only know by experience a representative set of input patterns and output control information. Such a set of patterns and information is called a training set. However, the procedure is not quite that straightforward: the person training the neural network must know or discover the composition of the input pattern. Part of this work consisted of discovering an optical alignment paradigm (ref. 2).
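A training set of the kind just described can be pictured concretely. The record layout below is purely illustrative: the field names, pattern length, and one-hot control encoding are assumptions for this sketch, not the format actually used in this work.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AlignmentRecord:
    """One human-directed alignment step: what was seen and what was done."""
    beam_pattern: List[float]    # digitized beam intensities (input pattern a)
    control_action: List[float]  # assumed one-hot choice of X-, Y-, or Z-control (output b)

# A training set is simply a collection of such input/output pairs,
# recorded while a human operator aligns the component.
training_set = [
    AlignmentRecord(beam_pattern=[0.1, 0.9, 0.2, 0.1], control_action=[1.0, 0.0, 0.0]),  # adjust X
    AlignmentRecord(beam_pattern=[0.4, 0.4, 0.4, 0.5], control_action=[0.0, 0.0, 1.0]),  # adjust Z
]

assert all(len(r.control_action) == 3 for r in training_set)
```

The point of the structure is only that each record pairs an observed pattern with the control action a human chose; the network must infer the mapping from many such pairs.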
Application of neural networks to the optical alignment problem consisted of several tasks:
(1) Study of the theory of neural networks and their application to optical pattern recognition (refs. 3 and 4)
(2) Acquisition of neural network development systems (refs. 5 to 7)
(3) Selection of a benchmark component for automated alignment - the spatial filter commonly used for laser beam smoothing and signal isolation (ref. 8)
(4) Generation of human-directed alignment records to use for training neural networks or several systems of neural networks
(5) Testing the systems of neural networks - comparing neural-network-directed alignments of the spatial filter with human-directed alignments
Figure 1. - Disassembled spatial filter assembly.
Figure 2. - Spatial filter assembly.
The spatial filter was chosen as a benchmark component because it is a simple component whose alignment is pattern controlled. Its theory, models, and pattern visualization were important for this effort, but the neural network procedures and alignment paradigm that resulted are considered to be quite generally applicable. The types of neural networks used included the backpropagation-trained network (BPN) (ref. 9), the counterpropagation network (CPN) (ref. 10), the Euclidean Preclassifier (ref. 11), and the Adaptive Resonance Technique 2 (ART2) (ref. 12). This report discusses these networks in the order mentioned. The theory of the laser-beam-smoothing spatial filter is given in appendixes A and B, and a list of symbols is given in appendix C.
Automation of Optical System Alignment

Description of Alignment Process
The spatial filter (fig. 1) and its alignment are understood, in principle, from the science of physical optics. Figure 1 is a photograph of a disassembled spatial filter of the type used to clean up laser beams in the laboratory. Figure 2 shows the assembled spatial filter, which consists of a microscope objective (typically 20x) that focuses a laser beam onto a pinhole (typically 10 μm in diameter). Scattered light (from dust particles on the lenses, for example) does not generally pass through the pinhole. An aligned spatial filter thereby filters the scattered light from the laser beam. A person learns how to align these spatial filters with some practice. There are two alignment procedures required: (1) centering the laser beam on and making it coaxial with the microscope objective and (2) aligning the pinhole with the microscope objective. The first procedure, which is done
with the pinhole removed, can be complex; it may require the adjustment of several mirrors, prisms, or beam elevators. Only the second procedure is considered for automation in this report. For the second procedure, the pinhole is inserted, and the laser beam is centered and focused onto it. The spatial filter assembly (figs. 1 and 2) supports the pinhole in an X-Y-translation stage with micrometer adjustments. The microscope objective is in a separate Z-axis stage for focusing. This Z-axis stage is the large, knurled cylinder shown in figures 1 and 2. These three controls allow the focal spot (beam waist for gaussian laser beams) to be centered on the pinhole. If the first alignment procedure has been done correctly, the X-, Y-, and Z-controls can independently adjust the X, Y, and Z coordinates of the focal spot relative to the center of the pinhole. On rare occasions, the total Z-motion might be as large as 1000 μm, and the X- or Y-motion as large as 300 μm for the spatial filter containing a 20x microscope objective and a 10-μm pinhole. Final adjustments, however, might be as small as 1 μm. The human approach to executing this second procedure varies quite a bit. Appendix A contains the theoretical interpretation of the beam pattern observed. For the following description, which we believe is typical, albeit inexact, figure 3 shows the appearance of the laser beam at the various stages of alignment.
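This second procedure can be caricatured as a feedback loop over the three micrometer controls. The sketch below is a toy model only: the starting offsets (chosen within the quoted worst-case ranges), the correct-the-largest-error-first strategy, and the step-size rule are all assumptions. It merely illustrates that such a loop drives every offset within the quoted 1-μm resolution.

```python
def align(offsets, resolution=1.0, max_steps=10000):
    """Toy alignment loop: offsets = [x, y, z] misalignments in micrometers."""
    offsets = list(offsets)
    steps = 0
    while max(abs(c) for c in offsets) > resolution and steps < max_steps:
        worst = max(range(3), key=lambda i: abs(offsets[i]))  # correct largest error first
        move = max(resolution, abs(offsets[worst]) / 2.0)     # assumed step-size heuristic
        offsets[worst] -= move if offsets[worst] > 0 else -move
        steps += 1
    return offsets, steps

# Assumed starting misalignment within the worst-case ranges quoted in the text.
final, n = align([300.0, -150.0, 1000.0])
assert all(abs(c) <= 1.0 for c in final)
```

A human operator does the same thing less mechanically, substituting pattern recognition for the numeric offsets that this toy loop reads directly.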
Figure 3. - Appearance of filtered laser beam at selected stages of alignment process for spatial filter. (Panels: initial appearance (focus backed off); region "A" appearance (diffraction rings); pattern prior to focus adjustment; typical result after focus adjustment; region "B" pattern; aligned filter.)

First, the room lights are turned off, and the focus is backed off toward the laser. This process allows the laser beam to fill the pinhole. Then the output of the filter is projected onto a white card for viewing. Internal reflections can occur, and faint multiple beams might be observed. The main beam will have a bright spot and a pattern of concentric rings, but initially this pattern may be too faint to be seen. In fact, internal reflections may produce beams that are brighter than the main beam. An experienced person can still tell the difference between the main beam and the internal reflections. The region A model of appendix A is an attempt to describe the pattern of this main beam (region A and region B are used to designate regions of alignment space). Typically, the human operator begins centering the main beam by moving the X- or Y-control in the same direction that the beam is to move. Usually, the largest X- or Y-error is corrected first. The beam usually brightens considerably during these corrections, and the ring pattern becomes visible. Internal reflections become negligible. Once the beam is centered, the focus is then moved toward the viewing card. The beam's reflection from the card normally will move away from the center during this process, unless the pinhole is exactly centered on the beam. In addition, the beam normally will become fainter. The focus or Z-axis control is operated until the beam is near the edge of the field of view or until the beam starts becoming too faint. The centering procedure is then repeated with the X- and Y-controls followed by the focusing procedure with the Z-control. At some point, the appearance of the beam will change significantly. The pronounced ring pattern may disappear, the beam may become asymmetrical, the central spot may appear to be larger and softer, or the beam may brighten substantially. The region B model of appendix A represents this behavior. Essentially, in this region the beam is too small to fill the pinhole uniformly. Although the spatial filter is almost aligned, the beam patterns vary much more for a given control change than they do in region A. In region B, beams that are apparently off center or asymmetrical require slight adjustments in X or Y. Beams that are essentially circular require slight adjustments of Z. An experienced person learns to recognize the action required by a particular beam pattern. At some point, the beam will appear the same as an unfiltered gaussian laser beam. The filter is then aligned, and the human operator stops making adjustments. The task of automating the alignment of an optical system
can be approached in various ways. In each, a given input (expressed as a vector including the present state of the system, past alignment actions, and estimates of future output) produces the output (expressed as a vector including control settings) that is used to direct the next action by the effector, which can be human or electromechanical. The object is to always move closer to the aligned state. The process of examining the current state of alignment and performing an appropriate alignment step is repeated until the alignment is complete. One way to automate the alignment process would be to model the process theoretically. Such a model might be used directly in a control system as a transfer function. Another approach would be to discover a linear mapping between the whole range of possible input vectors and corresponding output vectors. A third method would be to store the pairwise input-output data in a lookup table. The controller would simply access the desired output control from the table. Severe difficulties exist with all of these methods. The next sections briefly detail the methods and describe the difficulties.

Theories and Models of Spatial Filter Alignment

We have a theory and models of the spatial filter (appendix A) and a model for visualizing the alignment process (appendix B). What prevents us from developing alignment control systems from these theories and models alone? There are many reasons, but the key words are "credibility" and "practical difficulty." Some of the reasons follow:
(1) The models are based on a diffraction integral theory that ignores coupling between components of the electromagnetic field.
(2) The diffraction integrals themselves are simplified to a paraxial or small-angle approximation.
(3) The spatial filter model is assumed to have thin lenses, zero-thickness apertures, and loss-free, reflection-free, internal surfaces.
(4) The theory (eqs. (A1) to (A11)) is ill-posed. The beam pattern is computed from the control information (misalignment coordinates), whereas the objective is to obtain control information from the beam pattern.
(5) The models are approximations of the theory itself. Two distinct alignment regions are proposed, and the nature of the transition between the two is ignored.
(6) The numerical calculations from the models are complicated and require significant computer time.
(7) The application of the theory assumes linear detectors (the beam patterns may differ, although the detector readouts are the same); hysteresis, backlash, and other nonlinear mechanical errors are not considered.
This discussion applies to the spatial filter, but it is characteristic of all attempts to use physical theories and models to understand complex systems. Ill-posedness, numerical complexity, nonlinearities, simplifications, and an incomplete understanding of the human-machine interface greatly devalue this exercise. Nevertheless, some physical understanding of the complex system to be controlled is essential to train the neural networks discussed in the following sections. As an example, we show in appendix B that, theoretically, the laser beam and spatial filter can be made dimensionless. This discovery suggests that the training sets for the neural networks can be supplied in dimensionless form, thereby making the neural networks trained by them applicable to more general laser-beam, spatial-filter combinations.

Linear Mappings and Table Lookup
The process of automating the alignment of a spatial filter can be considered as executing a sequence of nonlinear mappings. Linear mappings and transformations (ref. 13), by contrast, have numerical evaluations represented by the matrix equation

    b = Wa   (1)

where a and b are column vectors with numbers of components m and n, respectively, and W is an n by m matrix. The use of equation (1) in science or engineering implies a linear system or a system that can be linearized.
One possible solution to a mapping problem is to use a table lookup procedure to compare an input pattern with a representative collection of input patterns and to interpolate in a table of input-output pairs to determine the output control information. The major drawback with table lookup is that entries in the table are considered to be independent. The training and memory requirements can be quite large.
As an example, consider one of the laser-beam, spatial-filter combinations used for the work reported herein. This combination consisted of a 30-mW helium-neon laser and a spatial filter with a 20x microscope objective and a 10-μm pinhole. As stated before, the total ranges of the X, Y, and Z mechanical motions were contained in a volume approximately 300 μm by 300 μm by 1000 μm. The mechanical resolution in X, Y, or Z was about 1 μm, so there were about 90x10^6 distinct positions. Each position had a beam pattern. Labeling the beam patterns based on the 1024-pixel, 8-bit characterizations of the beam patterns (the resolution used in appendix B for visualization) would require 92.16 GB of address space. Storing, for example, a 2-bit representation of the control to be selected for operation (X, Y, or Z) would then require an additional 23.04 GB of storage space. For adequate table lookup results, the whole range of input space would have to be covered with the stored alignment records. The obvious objections to this method are that no human operator would ever collect 90x10^6 training examples for any system and that memory requirements are unacceptably large.
The table lookup method can be modified to use less data. The modifications, which are described in terms of a lookup table space, wherein input vectors are called points, follow:
(1) Eliminate points that have zero probability.
(2) Divide the space into volume elements that are sized to be visited with equal probability.
(3) Determine an exemplar for each volume element.
(4) Develop a lookup algorithm that associates a point with its exemplar.
(5) Read out the output pattern associated with the exemplar.
(6) Interpolate between outputs, particularly if a point has a nonzero probability of being associated with more than one exemplar.
The objection to using this procedure is the same as for using theories and models: detailed, specialized knowledge is required.
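The modified lookup procedure amounts to a nearest-exemplar search. The following sketch illustrates steps (3) to (5) with an assumed toy table of two-component exemplars; a real table for the spatial filter would use digitized beam-pattern vectors as inputs.

```python
def nearest_exemplar(point, exemplars):
    """Steps (4)-(5): associate a point with its exemplar and read out the output."""
    def dist2(p, q):
        # Squared Euclidean distance between a point and an exemplar input.
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    best = min(exemplars, key=lambda e: dist2(point, e["input"]))
    return best["output"]

# Assumed toy table: one exemplar per volume element (step 3), each paired
# with its stored output control pattern.
table = [
    {"input": (0.0, 0.0), "output": "adjust X"},
    {"input": (1.0, 0.0), "output": "adjust Y"},
    {"input": (0.0, 1.0), "output": "adjust Z"},
]

assert nearest_exemplar((0.1, -0.2), table) == "adjust X"
assert nearest_exemplar((0.9, 0.2), table) == "adjust Y"
```

Even in this small form, the specialized knowledge shows up in the choice of exemplars and distance measure, which is exactly the objection raised above.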
Artificial Neural Networks

Introduction to neural networks. - Neural networks are a new approach to a very old problem: extracting and implementing the mapping or transformation of a set of input vectors into a set of output vectors. That operation is expressed in functional form as
    b = f(a)   (2)

The components of an input vector a might consist of sensor values and results of operations on sensor values from times in the past, components of the output vector b or results of operations on components of the output vector b from times in the past, and
modelderived estimates of future components of b. The components of an output vector b might consist of control settings and sensor values. The function f represents a general combination of processes. In a sense, discovering or implementing f is a primary task of both instrumentation and controls personnel. However, measurement errors and noise change the task somewhat: the objective is to discover or to implement an fthat minimizes the meansquareerror E between the mapping or transformation and the set of target output vectors. In general, we want to minimize the expression
    E = ⟨|b - f(a)|²⟩   (3)

where ⟨ ⟩ represents an expectation value.

Figure 5. - Concept of optical realization of figure 4. One output node shown; fiber transmissions proportional to the weights w.

The discipline of artificial neural networks does not yet have standard terminology (ref. 14). However, neural networks can be viewed as special cases of networks of independent, parallel operating, interconnected processors. The term "parallel distributed processing," which is the title of a fundamental reference (ref. 3), also describes the discipline. This report adopts the operational definition that is shown in figure 4. Artificial neural networks are biologically inspired
(refs. 15 and 16). The terms "synapse," "axon," and "neuron" are used occasionally as in figures 4 and 5. A neuron, or node, has weighted inputs or synapses. In software simulations these weights, by convention, are assumed to belong to the node, and they are part of its local memory. A node sums its weighted inputs. In addition, during training the node can be allowed to add a bias term to that sum. A major change from the linear transformation is that the output or axon value is a nonlinear function of the weighted sum. The important property of the nonlinear function (sometimes called an activation function) is that the network generates internal degrees of freedom. The nonlinear function used in the work
reported herein is deterministic, or reversible. The output (axon value) usually is transmitted to other nodes, but it also can be fed back as a weighted input to its own node. It can have external inputs for resetting it or for applying external data. Nodes can fire or put out new information synchronously or asynchronously. Interconnections can be fairly arbitrary. The internal degree-of-freedom generating ability is responsible for the very complex behavior of most nonlinear systems. A nonlinear system with a few input variables acts like a linear system with a much larger number of variables. The nonlinear mechanics or fluid dynamics of these systems is complex because of this phenomenon (ref. 17). Feedforward networks use this phenomenon to perform the general mappings given by equations (2) and (3). The summations and nonlinear activations constitute an engine, and the weights constitute the knowledge, control information, and program for the engine. Calculating the weights efficiently, accurately, and stably is a primary topic of research and development. This report is concerned with the use of commercially available neural network systems to learn and direct optical alignment. The algorithm used to learn the training set is a weight-calculation algorithm. The neural-network generalization of the linear mapper discussed previously is called a feedforward network. Such a network (fig. 6) is arranged in layers of nodes, where a layer receives inputs only from the previous layer and transmits outputs only to the next layer. In addition, unlike the linear mapper, the feedforward network uses multiple layers. There are a number of modifications of this architecture in use, but the architecture just described allows us to explain the general value of neural networks.

Figure 4. - Linear transformation described by neural network terminology.

To cause a neural network to learn, a training program presents input and output vectors to the network one pair at a time. For each iteration, the network is allowed to adjust the connection weights using its particular weight-calculation algorithm. The abstract objective is to determine weights
Figure 6. - Feedforward artificial neural network. (a) Network. (b) Sample neuron in hidden layer, computing f(w11 a1 + w12 a2 + bias).
that no longer require adjustment as the entire set of input/output pairs is applied. Nonlinear differential equations or nonlinear difference equations (for the discrete time case) can be used to describe neural networks as functions of iteration number. Equations (2) and (3) for stable mappings (supposing such mappings exist) should be changed to the equations

    b = g(a,W)   (4)

and

    E = ⟨|b - g(a,W)|²⟩   (5)

where W is an array of weights whose values are to be determined by training. If the weights are varied or stepped, a differential equation or difference equation replaces equation (4). The differential equation is given by

    db/dt = g(a, W, dW/dt) - b   (6)
where both a and W are allowed to vary with the iteration number. When b is fed back as an input to the neural network, it becomes an argument of the nonlinear function g. The problem stated here is an inverse problem; it is generally ill-posed even for linear systems such as those encountered in computed tomography (ref. 18). Fortunately, the accuracy or uniqueness of the weights is not important for the alignment problem as long as they generate correct output vectors. There are two approaches to developing neural-network architectures and weight-calculation algorithms. The first and oldest approach is phenomenological: take clues from biology and cognitive science (the psychology of learning). This approach is very much aligned with artificial intelligence. An example is the Adaptive Resonance Technique 2 (ART2) that we used in our alignment studies and which we discuss later. References 3 and 15 discuss the psychological and biological viewpoints thoroughly.
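The feedforward mapping b = g(a,W) of equation (4), and a descent search for the weights, can be sketched numerically. Everything concrete below is an assumption for illustration: the two-layer architecture, the sigmoid activation, the toy training pairs, and the learning rate. The finite-difference descent stands in for, but is not, the backpropagation algorithm discussed next.

```python
import math
import random

def forward(a, W1, W2):
    """Two-layer feedforward pass; sigmoid hidden activation (assumed)."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, a))))
              for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def energy(pairs, W1, W2):
    """Summed squared output error over the training set (an energy function)."""
    return sum(sum((bo - bt) ** 2 for bo, bt in zip(forward(a, W1, W2), b))
               for a, b in pairs)

random.seed(0)
pairs = [([0.0, 1.0], [1.0]), ([1.0, 0.0], [0.0])]   # assumed toy training set
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)]]

e0 = energy(pairs, W1, W2)

# Steepest descent on the energy surface, one weight at a time,
# with a finite-difference gradient estimate.
eps, rate = 1e-4, 0.1
for _ in range(200):
    for W in (W1, W2):
        for row in W:
            for j in range(len(row)):
                row[j] += eps
                e_plus = energy(pairs, W1, W2)
                row[j] -= eps
                grad = (e_plus - energy(pairs, W1, W2)) / eps
                row[j] -= rate * grad

e_final = energy(pairs, W1, W2)
assert e_final < e0   # the descent lowers the energy
```

The weights play exactly the role described above: the summations and activations are the engine, and training only changes W until the energy settles into a minimum.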
The second approach is to apply techniques from the dynamics of nonlinear systems (ref. 17). The most common approach defines an energy function (sometimes referred to as the "cost function" or "Lyapunov function") in terms of the weights. The objective is to design or discover an artificial neural network and energy function where the energy function has minima. Modified steepest-descent techniques are used to determine the weights that correspond to the energy minima. Feedforward networks are trained by minimizing an energy defined as the mean square error between the generated outputs and the training outputs, as in the expression

    E = Σ_t (b(W,a_t) - b_t)²   (7)
where the sum is over the N samples in the training set. The vector b_t is a training vector, and the vector b is the actual output vector resulting from weights W and a training input a_t. Topologically, E is imagined to be a surface in a space whose dimension equals the number of weights. The surface has dimples or valleys that represent the minima. There are two possible complications. The first is that E can have multiple minima, so a solution for W may not represent the lowest of the minima. The second complication is that the algorithm may not converge, and the solution may oscillate wildly. Therefore, the trajectory of a point in the weight's phase space (the space of W and dW/dt or W and its increments) may be chaotic (ref. 19). Artistic adjustments are sometimes necessary to control these complications during a training procedure. The dynamics of nonlinear systems, in general, has become a topic of major research interest. The energy minimization method is the basis for the backpropagation algorithm. That algorithm was used for our alignment study and will be discussed later. The error (eq. (7)) cannot be expected to decrease to zero for an actual training set, even when convergence is to the lowest minimum. This phenomenon can be understood in that training sets, in general, are characterized implicitly by a joint probability function or joint distribution function P(ba), where the concatenation ba means that the components of b and a are the arguments of P. The use of this function accounts for the facts that (1) human generators of input/output pairs do not execute exactly the same control actions b every time they encounter essentially the same input pattern a and (2) that there are groups of scattered input patterns a which result in essentially the same control action b. Statistical analysis of training sets can be very complicated. Inputs are not necessarily independent of one another; outputs are not necessarily independent of one another.
Joint probability distributions of components of a and b are not necessarily normal. Nevertheless, neural networks trained with statistically simple training sets perform in a statistically optimum manner (ref. 6). That is, the neural networks will
(1) Minimize the mean square error of equation (3)
(2) Exhibit Bayesian performance (ref. 4)
(3) Produce outputs with the maximum likelihood of being correct (ref. 4)
One way to perform a statistically simple test of a neural network is to generate training sets where the components of an input a belong to one of several multivariate normal distributions (ref. 20). The bivariate normal density for an input vector a = (a1, a2) is given by the equation
    f(a1,a2) = [1 / (2π σ1 σ2 sqrt(1 - ρ²))]
               × exp{ -[1 / (2(1 - ρ²))] [(a1 - u1)²/σ1²
               - 2ρ(a1 - u1)(a2 - u2)/(σ1 σ2) + (a2 - u2)²/σ2²] }   (8)

where σ1 and σ2 are the standard deviations of the two components, u1 and u2 are the means of the two components, and ρ is the correlation coefficient. The function f is a probability density function. Multivariate normal distributions for any number of components are defined in reference 20.
Output vectors can be defined in at least two ways. A separate output node can be associated with each distribution, or the output vector can contain a binary code of the distribution. For example, the first and fourth distributions of a four-distribution training set could be designated with output vectors (1, 0, 0, 0) and (0, 0, 0, 1) or with output vectors (0, 0) and (1, 1). Training consists of associating each input vector with its most probable distribution and executing one of the training approaches discussed earlier.
Testing is based on the recognition that the normal distributions overlap. An input vector has a nonzero probability of belonging to any distribution. One test procedure would be to
(1) Use a random number generator to select the components of the input vector a.
(2) Compute the probability densities that a belongs to the various distributions.
(3) Select as Bayes winner the distribution associated with the largest probability density.
(4) Use the neural network to select a distribution.
(5) Count a difference between the neural network winner and the Bayes winner as an error.
(6) Repeat this process a large number of times.
(7) Compare the measured number of errors with the number predicted from the Bayes rule.
of randomly
selected
rule is applied by assuming that with unity probability. We comby each distribution, with the largest prob
ability. We then use the Bayes rule to compute the probability that given a, the winning distribution is i. We subtract this result from unity to calculate the probability that the losing distributions would be caused by a. This process is repeated for a large number of a values. The sum rounded off is an estimate of the expected number of errors consistent with the Bayes rule. One of the neural
network
development
(ref. 6) provides demonstration examples for several neural network architectures.
packages
used
of this procedure In every example,
the performances of the neural networks approach that of the Bayes classifier. However, the training sets used for optical alignment will be more complex than those used for these tests. There will be correlations between different output components as well as different input components. Distributions frequently will not be normal. The networks will generate outputs not in the original training set. Nevertheless, we make the following general comments without formal mathematical proof. Neural networks or systems of neural networks exist to perform the mappings defined by the minimization of equation (3). The most important property of these networks is their ability to generate internally, in some sense, enough degrees of freedom to meet the linear system's requirement for linearly independent exemplars. These neural networks are trained by an iterative procedure that minimizes an energy function. This energy is defined in terms of a training set of exemplars; it may be the mean square error between the exemplars and the outputs. For automated optical alignment, the training set is generated by a human operator. The training is influenced by the idiosyncratic and stochastic behavior of that operator. Hence, isolated output examples of the neural network may be erroneous. We must design a system to depend on the average behavior of a neural network and to be resistant to, or recoverable from, occasional bad directions. Because neural networks are complex nonlinear systems, hard to interpret, chaotic behavior might sometimes occur during the training process.
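The Bayes-rule test procedure described above can be sketched in Python. This is an illustrative sketch only: the two bivariate normal distributions are hypothetical test parameters, and a nearest-mean rule stands in for a trained neural network.

```python
import math
import random

def bivariate_normal_pdf(a1, a2, u1, u2, s1, s2, rho):
    # Equation (8): bivariate normal probability density.
    q = ((a1 - u1)**2 / s1**2
         - 2 * rho * (a1 - u1) * (a2 - u2) / (s1 * s2)
         + (a2 - u2)**2 / s2**2)
    norm = 1.0 / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho**2))
    return norm * math.exp(-q / (2 * (1 - rho**2)))

# Hypothetical test distributions: (u1, u2, s1, s2, rho).
DISTS = [(0.0, 0.0, 1.0, 1.0, 0.0), (2.0, 2.0, 1.0, 1.0, 0.0)]

def bayes_winner(a):
    # Steps (2) and (3): densities for each distribution, largest wins.
    densities = [bivariate_normal_pdf(a[0], a[1], *d) for d in DISTS]
    return max(range(len(DISTS)), key=lambda i: densities[i]), densities

def network_winner(a):
    # Step (4) stand-in for the trained network: nearest distribution mean.
    return min(range(len(DISTS)),
               key=lambda i: (a[0] - DISTS[i][0])**2 + (a[1] - DISTS[i][1])**2)

def run_test(n_trials, seed=0):
    rng = random.Random(seed)
    measured, predicted = 0, 0.0
    for _ in range(n_trials):
        k = rng.randrange(len(DISTS))            # true generating distribution
        u1, u2, s1, s2, _ = DISTS[k]
        a = (rng.gauss(u1, s1), rng.gauss(u2, s2))
        winner, densities = bayes_winner(a)
        if winner != network_winner(a):          # step (5): count disagreements
            measured += 1
        predicted += 1.0 - densities[winner] / sum(densities)  # equation (9)
    return measured, predicted
```

With equal, uncorrelated distributions the nearest-mean rule coincides with the Bayes winner, so the measured disagreement count is zero while the equation-(9) sum still predicts the intrinsic Bayes error rate.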
The use of neural networks to implement mappings appears straightforward at this point. However, once the network is trained, there is still the problem of executing the electrical and mechanical alignment. Before that problem can be tackled, however, and even before the network can be trained, the architecture of the training set itself has to be designed. What clues does a human operator use to select an alignment step? What output information is necessary to execute the alignment step? What output information is used to select a next alignment step? Clearly, knowing something about optics and optical alignment is important. Neural networks cannot replace expert knowledge entirely. The design of training sets for optical systems, and for optical alignment in particular, is discussed in the next section. Although Maxwell's equations are linear in the electric and magnetic fields, classical optics produces many nonlinear processes. Examples are

(1) Gain saturation and feedback in optical cavities
(2) Nonlinear constitutive relations
(3) Nonlinear sensors, detectors, and recording materials
(4) Ray tracing
(5) Nonlinear practices in data handling
(6) Alignment, adaptation, and control
(7) Acousto-optic and electro-optic switching
(8) Photometry and physiological optics
(9) Nonlinear mappings from physical causes to optical effects

Nonlinear processes complicate the use of optics for studying linear phenomena. One viewpoint of neural networks is that they make nonlinear activities "transparent to the user." The optical alignment process is a very good test case. It has the potential to involve all the nonlinearities mentioned previously. This process also tolerates the property of a trained neural network being correct on average. Because alignment is accomplished in a series of steps, a single erroneous step need not destroy the entire process.

Backpropagation-trained networks.
Back propagation refers to a variety of algorithms used to train feedforward networks of the kind shown in figure 6. These algorithms find the minimum of an energy function expressed in weight coordinates. The method is essentially an incremental steepestdescent search. Damping in the form of socalled momentum terms, smoothing terms, or averaging is added to prevent oscillatory, or even chaotic, trajectories. The energy function mentioned is the mean square error between the actual outputs of a feedforward network and the corresponding training outputs. A good derivation of the unmodified algorithm is provided in reference 4. This algorithm treats weights in the output and hidden layers slightly differently. The unmodified algorithm is executed one training record at a time. The notation varies from reference to reference.
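The incremental steepest-descent search with momentum damping might be sketched, for a toy feedforward network, as follows. This is an illustrative sketch only: the layer sizes and coefficients are hypothetical, a sigmoid activation is assumed (for which f′(s) = f(s)(1 − f(s))), and it is not the commercial packages' implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyBackpropNet:
    """One hidden layer; trained one record at a time with a momentum term."""

    def __init__(self, n_in, n_hid, n_out, alpha=0.3, momentum=0.5, seed=0):
        rng = random.Random(seed)
        self.alpha = alpha          # learning rate
        self.momentum = momentum    # momentum coefficient (assumed name)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
        self.m1 = [[0.0] * n_in for _ in range(n_hid)]   # previous increments
        self.m2 = [[0.0] * n_hid for _ in range(n_out)]

    def forward(self, a):
        # Weighted sums and sigmoid outputs, layer by layer.
        self.h = [sigmoid(sum(w * x for w, x in zip(row, a))) for row in self.w1]
        self.b = [sigmoid(sum(w * x for w, x in zip(row, self.h))) for row in self.w2]
        return self.b

    def train_record(self, a, bt):
        b = self.forward(a)
        # Output deltas: delta_i = f'(s_i)(bt_i - b_i); f' = b(1 - b) for sigmoid.
        d_out = [bo * (1.0 - bo) * (t - bo) for bo, t in zip(b, bt)]
        # Hidden deltas: delta_j = f'(s_j) * sum_i delta_i * w_ij.
        d_hid = [self.h[j] * (1.0 - self.h[j]) *
                 sum(d_out[i] * self.w2[i][j] for i in range(len(d_out)))
                 for j in range(len(self.h))]
        # Momentum-damped weight increments.
        for i, di in enumerate(d_out):
            for j, hj in enumerate(self.h):
                inc = self.alpha * di * hj + self.momentum * self.m2[i][j]
                self.w2[i][j] += inc
                self.m2[i][j] = inc
        for j, dj in enumerate(d_hid):
            for k, ak in enumerate(a):
                inc = self.alpha * dj * ak + self.momentum * self.m1[j][k]
                self.w1[j][k] += inc
                self.m1[j][k] = inc
        return sum((t - bo) ** 2 for bo, t in zip(b, bt))
```

Many passes are made through a training set while the mean square error is monitored; the momentum term pulls each increment toward the trend established by the previous increments.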
The algorithm for updating the weights connecting the output layer with the adjacent hidden layer is given by the equations

    w_ij^(t+1) = w_ij^t + α δ_i l_j    (10)

where

    δ_i = f′(s_i)(b_ti − b_i)    (11)
and where s_i is the total weighted input at output node i, l_j is the output of node j in the adjacent hidden layer, f is the nonlinear activation function, b_i is the output of output node i, b_ti is the training output for node i, the superscript t is an iteration number, and α is the learning rate. Note that δ_i depends on the derivative of the error. The total weighted input to output node i can include a bias term. The bias term is thought of as a weight times a unity input. Equation (12) redefines δ for the calculation of weights for the adjacent hidden layer and subsequent hidden layers:
    δ_j = f′(s_j) Σ (i = 1 to N) δ_i w_ij    (12)

The δ's in the sum are from the previous layer; the weights are the old (nonincremented) weights from the layer above. Note that the learning rate α might vary from one layer to the next. The application of the algorithm is summarized as follows:
(1) Apply an input vector.
(2) Calculate the weighted inputs s_i to the nodes of the next layer.
(3) Use the activation function to calculate the outputs l_i of the nodes.
(4) Calculate the derivatives f′(s_i) of the activation function at each node.
(5) Continue until calculations produce an output vector b.
(6) Use equations (10) and (11) and the training vector b_t to calculate δ_i and the increments in weights.
(7) Substitute the δ_i calculated in step (6) into equation (12) to calculate the δ_k in the next layer.
(8) Substitute the δ_k calculated in step (7) into equation (10) to calculate the increment in weights.
(9) Repeat steps (7) and (8) to propagate the calculation of weight increments backward through the network.
(10) Update the weights and execute the process for the next record.

This bare algorithm may work in some cases, but it is likely to lead to erratic behavior. One solution is to calculate the weight increments for all training records and to average these increments before updating the weights. This approach constitutes a steepest-descent search for an energy involving the entire training set and is the correct approach implied by equation (5). A second solution is to add a momentum term proportional to the last increment:

    Δw_ij^t = α δ_i l_j + η Δw_ij^(t−1)    (13)

The purpose of these steps is to prevent the weight-space trajectory from deviating drastically from the average; the trajectory is pulled toward the trend established by the previous increments. The commercial package identified by reference 6 uses a slight modification of the momentum principle. The previous increment is multiplied by a coefficient β, and the learning coefficient α is multiplied by 1 − β. This technique is called smoothing, and β is called the smoothing coefficient. The commercial package in reference 5 was used with momentum, and the commercial package in reference 6 was used with smoothing. When it did give adequate results, the backpropagation algorithm was slow to converge. Frequently, between 10 000 and 100 000 passes through the training set were used. The parameters were changed during training. Training might start with α = 0.4 and β = 0.8 and end with α = 0.05 and β = 0.4. The backpropagation algorithm was able to learn individual alignments fairly well, but was unable to perform well when trained with the complete training sets.

Counterpropagation networks.
The counterpropagation network (CPN) (refs. 6 and 10) attempts to use the neural network approach to map equation (2) by table lookup. One version (ref. 6) uses a neural network to adaptively learn a table, perform lookup, and interpolate. CPN embodies its inventors' philosophy of how table lookup should be executed. The objectives and operations of CPN can be understood by adopting a non-Euclidean geometrical viewpoint. We start with a set of records of the type R = (I, T). The input vector I has seven elements for the spatial filter example, and
the training vector T has five elements for the spatial filter example. Now, imagine that we have a seven-dimensional Cartesian coordinate system. The input vectors can be plotted in this space, and a particular grid point in this hyperspace may or may not represent a realistic input vector. This problem was discussed in the Linear Mappings and Table Lookup section. Figure 7 shows mechanical alignment errors for members of a training set.

[Figure 7. Misalignment coordinates for a helium-neon training set: (a) z focus error versus x alignment error; (b) z focus error versus y alignment error. Alignment errors in μm.]

Now, imagine that we perform the following operations:
(1) Choose a volume in the space of I that contains the training set.
(2) Note that the volume will contain a finite number N of grid points and that these points may or may not be close to training set vectors.
(3) Now, distort the Cartesian grid in the volume.
(4) Pull each grid point into the midst of a group of input vector points so that (a) An input vector belongs to a grid point if its Euclidean distance from that point is smaller than its Euclidean distance from any other point. (b) Each of the N grid points has about the same number of input vectors.
(5) Now, average the training vectors T associated with each of the N sets of input vectors tied to a grid point and associate that average with the grid point.

Steps (1) to (5) create a table.
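The table-building steps can be sketched in Python as a one-shot assignment (the actual Kohonen layer adapts iteratively, as discussed later; function and variable names here are illustrative):

```python
def build_table(grid_points, records):
    """Assign each record (I, T) to its nearest grid point and store the
    average of the associated training vectors T for that grid point."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    buckets = {k: [] for k in range(len(grid_points))}
    for inp, train in records:
        nearest = min(range(len(grid_points)),
                      key=lambda k: dist2(grid_points[k], inp))
        buckets[nearest].append(train)

    table = {}
    for k, members in buckets.items():
        if members:                      # average the member training vectors
            n = len(members)
            table[k] = tuple(sum(c) / n for c in zip(*members))
    return table
```

For example, with grid points [(0,), (10,)] and records [((1,), (2.0,)), ((2,), (4.0,)), ((9,), (8.0,))], the first two records fall to the first grid point and their training vectors average to (3.0,).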
To perform a table lookup by grid point:
(1) Generate an input vector I.
(2) Determine the nearest grid point by the Euclidean measure.
(3) Read the training vector average entered for that grid point.

The reading in lookup step (3) is the estimate of the mapping. This procedure allows only N levels: one for each grid point. Interpolation can be used to eliminate this quantization:
(1) Determine the n nearest grid points and their Euclidean distances S_i.
(2) Evaluate a weight factor for each grid point given by

    e_i = S_i^(−r) / Σ_j S_j^(−r)    (14)

where r is usually equal to "1."
(3) Average the entries at the grid points by using the weight factors in step (2).

The interpolated value in step (3) is now the estimate of the mapping. Essentially, CPN is a neural network that executes these operations. If we use the geometrical metaphor, each grid point is represented by a node, or neuron, in a layer called the Kohonen layer as shown in figure 8.

[Figure 8. Counterpropagation network (CPN): input vector, Kohonen nodes and weights, Grossberg nodes, output vector.]

The components of the grid point associated with a node are represented by a weight vector W. Every Kohonen node is fully connected to the input layer because each node must measure its Euclidean distance |W − I| from the input vector. The average values of the training vectors associated with each Kohonen node are stored in a second layer called a Grossberg layer. The number of nodes or neurons in a Grossberg layer equals the number of elements in the training vector T and output vector O. Each Grossberg node is fully connected to the Kohonen nodes (as in fig. 8), and each connection is weighted.
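The interpolative lookup with the equation-(14) weight factors can be sketched in Python. This is an illustrative sketch; `grid_points` and `table` are assumed to come from a prior table-building step, and r = 1 is taken as the default.

```python
def interpolated_lookup(inp, grid_points, table, n=2, r=1):
    """Equation (14): e_i = S_i**-r / sum_j S_j**-r over the n nearest
    grid points, then a weighted average of the stored table entries."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    # The n nearest grid points (table keys) and their distances S_i.
    nearest = sorted(table, key=lambda k: dist(grid_points[k], inp))[:n]
    S = [dist(grid_points[k], inp) for k in nearest]
    if any(s == 0.0 for s in S):          # exact hit: plain table lookup
        return table[nearest[0]]
    inv = [s ** -r for s in S]
    e = [w / sum(inv) for w in inv]       # weight factors e_i
    dim = len(next(iter(table.values())))
    return tuple(sum(e[i] * table[k][d] for i, k in enumerate(nearest))
                 for d in range(dim))
```

Midway between two grid points the weight factors are equal and the lookup returns the mean of the two stored averages, eliminating the N-level quantization of the plain lookup.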
The Grossberg layer is a linear layer. Each node has a vector U of N weight values: one weight value for each Kohonen node. The Grossberg weights corresponding to a particular Kohonen node are, of course, the averaged components of the training vectors associated with that node. In the noninterpolative mode, the outputs of the Kohonen nodes are zeros except for the winner (the least-Euclidean-distance node), which produces a "1." In this mode, the dot product of the vector Z of the Kohonen outputs with each weight vector U of the Grossberg nodes produces the correct output. In the interpolative mode, several weighted winners are enabled; their outputs are the signals e_i defined earlier. The output of a Grossberg node is then an average of more than one of its weights. The counterpropagation network is trained in two stages. The Kohonen layer is trained first; then the Grossberg layer is trained. Or in geometrical terms, the space of the input vectors is distorted first; then the training vectors associated with a grid point are averaged. Each training process is iterative. The Kohonen layer is adapted to the space of input vectors. Each Kohonen node starts out with a weight vector. The grid is not necessarily Cartesian as in the earlier conceptual discussion. The Euclidean distances |W − I| are calculated for each node, and the minimum is selected as winner. The winner is allowed to change its weights slightly to reduce the distance slightly. This fractional change is the winner's learning rate α. This process can be continued, and the space will be distorted. However, there is a problem. Some grid points might not be close enough to an input vector to ever modify the grid points' weights, and contrary to the objective, some nodes might end up with many more than their share of input vectors. This "unconscionable" result is prevented by adding a property called conscience.
The winning rates of the nodes are monitored, and a node is shut down if it has too large a winning rate. Other nodes can win and adjust their weights. A node is not actually turned off. Instead, bias terms are added to the distances to increase or decrease them artificially, thereby imparting a win-rate-dependent disadvantage or advantage.
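The conscience mechanism might be sketched in Python as follows. This is a minimal, hypothetical version in which a bias proportional to each node's excess win rate is added to its distance; the bias strength `gamma` is an assumed parameter.

```python
class ConscienceWinner:
    """Pick the winning node by biased distance; nodes that win too
    often accumulate a penalty so that other nodes can win."""

    def __init__(self, n_nodes, gamma=0.5):
        self.wins = [0] * n_nodes
        self.total = 0
        self.gamma = gamma                # bias strength (assumed)

    def winner(self, distances):
        target = 1.0 / len(self.wins)     # desired win rate for every node
        biased = []
        for k, d in enumerate(distances):
            rate = self.wins[k] / self.total if self.total else target
            # Positive bias (disadvantage) for nodes winning above target,
            # negative bias (advantage) for nodes winning below target.
            biased.append(d + self.gamma * (rate - target))
        w = min(range(len(biased)), key=lambda k: biased[k])
        self.wins[w] += 1
        self.total += 1
        return w
```

If one node is always nearest, its rising win rate raises its biased distance until the other nodes begin to win, so the win rates equalize over time.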
Once all the Kohonen nodes are winning at about the same rate without the need for conscience, it is time to train the Grossberg layer. In the package used for this work (ref. 6), Grossberg training occurs continuously. A fraction α of the difference between the weight and the corresponding component of the training vector is added to the weight. The coefficient α, which is called the Grossberg learning rate, is kept large while the Kohonen nodes are learning. The Grossberg nodes then effectively flip from one training value to the next. Once the Kohonen nodes have learned, the learning rate α is reduced to a very small value. The Grossberg weights for a particular Kohonen node then tend to the average of the components of the member training vectors. For CPN, the effectiveness of the learning process is determined from a defined mean square error. In general, the two architectures performed equally well, or equally badly, given the same training sets. CPN is interesting because its table-lookup, geometrical viewpoint is different from a viewpoint discussed previously. The previous viewpoint was that neural networks learn general mappings because the nonlinear activation function effectively transforms the training vectors into linearly independent forms. The CPN viewpoint is one of learning discriminants (ref. 4). A point that is enclosed by a volume defined by discriminant surfaces in some space is considered to be associated most likely with the properties of that volume. The properties in CPN are contained in an average training vector. Another concept is transforming a space to bring spatially separated vectors into proximity (ref. 4). In reference 4, the layers of the neural network move different input vectors with essentially the same output into the same volume of some multidimensional space.

Preclassifiers and systems of neural networks.
From a practical viewpoint, acceptable alignments of the laser-beam-smoothing spatial filter required using a preclassifier as a step in creating a system of neural networks. These systems are discussed next. Preclassifiers use so-called unsupervised learning to group vectors into classes; that is, preclassifiers learn classes of vectors without being taught by example. This means, of course, that a classification procedure must be built into the neural network architecture from the beginning. A Euclidean preclassifier was used for this work. The Kohonen layer in CPN is an example of a Euclidean distance preclassifier. Each class has an exemplar, and each class member has a minimum Euclidean distance from the exemplar of that class. Learning consists of setting up the exemplars for the chosen number of classes, and interrogation consists of classifying a set of vectors. The Kohonen layer adaptively moves grid points (exemplars) in the space of the input vectors until the classes (Kohonen nodes) are uniformly occupied. For the work discussed herein, classification ended up being based mainly, but not entirely, on the states of the digital components of the input vectors. Those components are three input nodes for the control operated previously and one node for the pattern class. The control operated previously is none, Z (focus), X, or Y. The pattern class is region A or B. One system, which was trained with the helium-neon training set, contained 13 classes. Those classes are identified in table I.

TABLE I. CLASSES FOR EUCLIDEAN PRECLASSIFIER, HELIUM-NEON TRAINING SET

  Class   Control operated last   Pattern (region)   Number of training records (a)
    1     NONE (b)                A                   15
    2     Y                       A                   78
    3     Z                       A                   10
    4     X                       A                   45
    5     Z                       A                   28
    6     Z                       B                   49
    7     Z                       A                   19
    8     Y                       B                   20
    9     X                       B                   19
   10     NONE (b)                A                   16
   11     Z                       B                    4
   12     NONE (b)                A                    7
   13     X                       A                   37

  (a) Each training record consists of one input vector and one output vector.
  (b) Beginning of alignment.

The training sets constructed from the 13 classes were used to train 13 BPN's. These feedforward networks contained a seven-node input layer for the seven-element input vector, one seven-node hidden layer, and a five-node output layer for the five-element output vector. A sigmoidal function, which was used as the nonlinear activation function, is defined by the equation

    f(x) = 1 / (1 + e^(−Ax))    (15)

where A determines the gain. As A increases, the sigmoidal function approaches the unit step function. Sigmoidal functions were used for all the BPN mappers discussed in this report. The commercial package identified in reference 5 performed very well once it was trained. Training times for this all-software package were very long. Overnight training sessions were required for some networks. This package is useful for learning about neural networks. However, the systems discussed next were trained more rapidly with a coprocessor-based system (ref. 6).

Backpropagation-trained network and counterpropagation network using an Adaptive Resonance Theory 2 preclassifier.
The Adaptive Resonance Theory 2 (ART2) preclassifier produced classes for the helium-neon training set that did not differ drastically from the classes produced by the Euclidean preclassifier. These classes are tabulated in
table II. A comparison of table I with table II shows, for example, that the Euclidean preclassifier puts all the region A, previous operation = Y records into a single class of 78 records, whereas ART2 creates two classes, one with 57 records and the other with 21 records. Both preclassifiers create three classes for the region A, previous operation = none records. The distribution of records among the three classes is slightly different for the ART2 and Euclidean preclassifiers. ART2 (refs. 6 and 12) monitors how well input vectors agree with those supplied in the original training set. It also embodies a significant philosophy of neural networks. This philosophy is likely to be important for optical applications; therefore, ART2 will be discussed briefly even though it was used only as a preclassifier.

TABLE II. CLASSES FOR ART2 PRECLASSIFIER, HELIUM-NEON TRAINING SET

  Class   Control operated last   Pattern (region)   Number of training records (a)
    1     NONE (b)                A                   15
    2     Y                       A                   21
    3     Z                       A                   20
    4     Z                       A                   31
    5     X                       A                   63
    6     Y                       A                   57
    7     Z                       B                   53
    8     Y                       B                   20
    9     X                       B                   19
   10     NONE (b)                A                   11
   11     NONE (b)                A                   12
   12     X                       A                    9
   13     Z                       A                    6

  (a) Each training record consists of one input vector and one output vector.
  (b) Beginning of alignment.
The following comments might be useful in reading the literature about the Adaptive Resonance Theory (ART). The terminology used to describe ART2 in the references differs from the terminology used so far. As previously mentioned, nonstandardized terminology is a problem in this field, which only recently has been used for applications. ART is derived from ad hoc efforts to combine theories from biology and cognitive science with artificial neural networks, and its significance may be a little hard to understand by those not versed in neuronal biology. In addition, some people (such as biologists) may dislike the use of biological terms for artificial systems. For example, in ART a layer of nodes with adjustable weights is called an adaptive filter rather than a layer of neurons, and the nodes are said to use competitive learning. Competitive learning means that only one node in a group wins the right to learn during an iteration. In that sense, a least-Euclidean-distance criterion is competitive learning. ART attempts to incorporate or emulate in a single system the formation of a properly scaled short-term memory from
an input, the comparison of that short-term memory with long-term memory (bottom-up weights), the selection of a best long-term memory, the generation of an expected value for the short-term memory from long-term memory (top-down weights), a comparison between the expected and actual short-term memories (vigilance), and a decision based on that comparison. The decision based on vigilance may be (1) to accept the triggered long-term memory as correct and to modify it slightly for the new input, (2) to shut down that memory node and look for a better match, or (3) to shut down the active memory nodes and select a new node (class) in the long-term memory. This process is complicated by analog computation in several loops, by variable gain factors, and by stabilities that are established independently, with different time scales, and in different loops. In effect, recall, comparison, and learning occur continuously, but with different time constants. ART attempts to add two features not used explicitly for this project: continuous learning or adaptation and the comparison of memory-generated expectations with inputs. The second feature appears implicitly during the alignment of a spatial filter, as is explained later. In contrast, our systems of neural networks are designed to learn training sets. Once our systems are trained, they are used to respond to every vector-generated input, and they generate an output vector in response to every input vector. There is no attempt to check the validity of the output. We assume that a good training set will be inclusive; therefore, input vectors will fall in the space of the training vectors. We also assume that a slightly bad move will not be fatal, but will be corrected in the next move. These assumptions have proven, so far, to be correct for spatial filter alignments. Suppose that there is a chance that the nature of our input vectors will change.
Ideally, we would like to detect that change and to learn the appropriate input/output combination as rapidly as possible. The following approach could be considered. Train two feedforward networks with the training set. But for the second feedforward network, reverse the roles of the input and output vectors. That is, use the training set output vector as the input and the training set input vector as the training vector. During operations, an input vector is applied to the first network. That network generates an output vector, the output vector is applied to the second network, and a new vector appears at the system input. Then the two vectors at the system input are compared. A certain agreement is required as specified by an adjustable vigilance parameter. The system output is used for control if the agreement is good; otherwise the system is halted and learning is activated. Agreement, within the error set by the vigilance parameter, is called resonance. Learning is called adapting the resonance. There are significant problems with this procedure. One problem is that mappings are not one-to-one. Consider a network that is to learn the exclusive OR (XOR) operation. Both (0, 0) and (1, 1) map into a zero output. Which pair should
be used in the reverse direction?
The solution
is to iterate the
network twice. Train the reverse network to produce a (1, 1). Then, a zero output will generate that input. Now, feed the (1, 1) through again. Everything repeats on the second iteration. The processing loop must be stabilized, and there must be normalization since signal levels will vary. The nature of discrepancies must be considered in normalization. A large error in one node might be a reason to halt the network, whereas the same total error distributed over several nodes might not be a reason to halt the network. Unsupervised learning bypasses a problem with supervised learning (which is based on input-vector/output-vector training sets). Systems trained with supervised learning will fail if the training set was not designed correctly or if the training set does not contain enough examples. Unsupervised learning or pattern classification, on the other hand, is designed to accept and learn new patterns. ART is intended for pattern classification. There are several ART architectures;
ART2 consists
of two
versions (refs. 6 and 12). This project used ART2 to assign the training set records to classes. The vigilance parameter ρ was used to determine the number of classes. This parameter, which is in the range (0, 1), was compared with an expression containing the cosine of the angle between vectors derived from the input and the expected input vectors. Good agreement between these vectors produces an expression value close to "1." The value decreases as agreement deteriorates. A vigilance parameter that is too large produces too many classes, and a vigilance parameter that is too small produces too few classes. The helium-neon training set required ρ = 0.98 for 13 classes, and one argon-ion training set required ρ = 0.99 for 12 classes. These numbers apply to the training sets in dimensionless form. Although ART2 was used primarily for preclassification, it did detect input vectors that deviated significantly from the training sets. Processing would halt during interrogation if the input could not be classified at the training vigilance. However, this problem did not occur often. The solution was to reduce the vigilance slightly to allow classification to
proceed. The classes established by ART2 were used to train both BPN and CPN networks. In general, a system consisting of the preclassifier and 12 or 13 BPN or CPN mappers could learn the training set adequately. Given a trained system of artificial neural networks, the remaining task is to test that system with an actual alignment of the beamsmoothing spatial filter or with a model of that alignment. Those experimental procedures are discussed next. The neural networks discussed here embody the ad hoc or anecdotal viewpoints of their inventors. However, all the systems of neural networks tested, regardless of the viewpoints of their creators, could learn a training set equally well, if they could learn it at all. We were unable to train any single, isolated neural network architecture adequately.
Combinations of neural networks were required to learn the training sets. We suspect that the design of the training set, which combined digital and analog representations, may have been partly responsible.
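A vigilance-style class assignment might be sketched in Python as follows. This is an illustrative cosine-similarity classifier only, not the ART2 dynamics described above (which involve analog loops, gains, and separate time constants); the function names are hypothetical.

```python
def cosine(u, v):
    # Cosine of the angle between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def classify_with_vigilance(vectors, rho):
    """Assign each vector to the best-matching exemplar class, or start a
    new class when the best cosine match falls below the vigilance rho."""
    exemplars, labels = [], []
    for v in vectors:
        if exemplars:
            best = max(range(len(exemplars)),
                       key=lambda k: cosine(exemplars[k], v))
            if cosine(exemplars[best], v) >= rho:
                labels.append(best)
                continue
        exemplars.append(v)          # new class with v as its exemplar
        labels.append(len(exemplars) - 1)
    return labels, exemplars
```

Raising the vigilance produces more classes, consistent with the ρ = 0.98 (13 classes) and ρ = 0.99 (12 classes) settings reported above.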
Experimental Setup and Procedures

Development of Training Sets

Some previous work has been done to develop optical
alignment training sets (ref. 2). The inputs of the first training sets consisted of the control last operated (X, Y, Z, or none), the x-y position of the beam bright spot on a reflector, and the average brightness. The outputs consisted of the control to be operated (X, Y, Z, or none), the new x-y position of the beam bright spot, and an estimate of the new brightness. The network was expected to learn the following sequence: zero x and y, reduce the focusing error z subject to the beam remaining on the reflector, and repeat the process until the beam brightness equals the previously known maximum. However, nets trained with this training set became locked in loops. The trained network would direct an alignment procedure that would return again and again to the same unaligned condition. The problem was that there are at least two classes of beam patterns (region A and region B in appendix A). The control to be operated and the expected behavior are different for the two classes, even though the other components of the input vector are the same. The training set written from a human operator's exemplars must include the input clues and control actions actually employed by the human operator. There is a need to discover what items to include in the input and output vectors and when to include them. Models of how human operators anticipate the outcome of their actions are applicable to adaptive neural networks. The difference between the anticipated outcome and the actual outcome is important in adapting the alignment procedure to different setups. The personal models of a particular human operator do not even have to be scientifically rational as long as they work. Defining personal models probably would involve an interview process similar to that used in developing the rules for an expert system. If it is possible for the input and output vectors to include all the components in the following lists, an adequate neural-net-controlled alignment should be possible.
The input vector should contain:

(1) Previous control action and consequences
(2) Beam coordinates
(3) Beam pattern
(4) Beam brightness
(5) Expected consequences of control actions
The output vector should contain:

(1) Control action to be taken
(2) Characteristics to be set (beam coordinates, beam pattern, or beam brightness)
(3) Characteristics to be estimated (again, the beam coordinates, beam pattern, or beam brightness)
The use of neural networks that incorporate item (5) from the input vector and item (3) from the output vector is part of intelligent adaptive control and is an area of substantial ongoing research and development (ref. 21). Input item (5) requires that physical or procedural models be incorporated in the network, thereby somewhat compromising the concept of a human-trained-only system of neural networks. The work reported herein uses only a subset of the features in these lists.

Training set design for optical alignment. A major challenge when using neural networks is the design of an appropriate training set. This requirement demands an intimate knowledge of the application. In our study, the development of an approach to designing training sets for optical alignment involved some trial and error. Only the final designs are discussed here.

First, consider the input vector. The previous control action and consequences are represented by a three-component vector containing zeros and ones. The possible values are (0, 0, 0) for the start of an alignment (no previous control action), (1, 0, 0) for previous operation of the z-axis or focus control, (0, 1, 0) for previous operation of the x-axis control of the pinhole position, and (0, 0, 1) for previous operation of the y-axis control of the pinhole position. The beam coordinates consisted of the x and y positions of the brightest point on the beam. These coordinates were measured relative to crosshairs drawn on the diffusely reflecting card. Initially, the pinhole was removed, and the beam was centered on these crosshairs. This step, as mentioned, is not part of the alignment procedure to be automated. Then, the pinhole was inserted, and the spatial filter was aligned carefully. The focus control was backed off about the same distance for each alignment, and the X and Y controls were set at random values. This procedure initialized an alignment ((0, 0, 0) in the previous paragraph).
Figures 7 and 9 show data for which the beam positions were not measured so carefully. Hence, those data show more scatter and required more steps per alignment than subsequent training sets. It usually is very easy to measure the beam position in region A because the bright spot is typically surrounded by diffraction rings (fig. 3). The only exception is early in the alignment when the beam is faint and internal reflections can be mistaken for the main beam. Beam coordinates are sometimes difficult to define in region B. The intensity distribution can appear to be fairly uniform yet not symmetrical. It can have a wispy texture. Sometimes, the measurement of a beam coordinate is little more than a guess.

Figure 9. Misalignment coordinates for one alignment: (a) z alignment error versus x alignment error; (b) z alignment error versus y alignment error (errors in micrometers).
Most of the alignment steps occur in region A, but the complexity of region B makes the spatial filter an excellent benchmark component for testing neural networks. We characterized the beam pattern by only 1 bit: a "0" for region A and a "1" for region B. There was little point in using a more complex characterization in the absence of a machine vision system. Future work will probably require more complex characterizations, at least in region B, and perhaps to handle internal reflections at the beginning of region A.

The beam brightness was measured as follows. The 1-cm² sensor of a power meter was centered on the bright spot, and the average power in microwatts per square centimeter was measured after each control action. The base-10 logarithm of this power was used as the beam brightness in a somewhat rudimentary attempt to emulate an operator's visual response. The brightness was then renormalized easily for different laser powers and for different distances of the reflecting card. There was no attempt to anticipate the consequences of the control actions. It is important to ask to what extent human operators use a personal mental model of the alignment procedure to anticipate the consequences of their actions. As a rule, the outputs of that model should be inputs to the neural network. Expected consequences of control actions should be investigated as an input for future research.

Now, consider the output vector. The control action to be taken is again represented by a three-component vector. The choices are (0, 0, 0) for no action (the appropriate choice when the alignment is complete), (1, 0, 0) for operation of the z-axis or focus control, (0, 1, 0) for operation of the pinhole's x-axis position control, and (0, 0, 1) for operation of the pinhole's y-axis position control. The only characteristics to be set are the x and y coordinates of the beam's bright spot. This choice is certainly adequate for region A; region B alignments eventually might require setting some beam pattern parameters. Typically, the human operator zeros x or y when the x or y control action is selected. Both x and y are allowed to increase when the focus or z-axis control action is selected. They are allowed to increase until the distance of the beam bright spot from the center reaches a maximum permissible value; that maximum is the output value when selecting the focus control. Beam brightness is the only characteristic to be estimated. That characteristic was fed back as an input during simulated alignments of the spatial filter, which required a model-generated choice for the beam pattern.

The design of a training set record is then summarized as follows. There is a seven-component input vector consisting of three previous-control-action nodes, two beam-coordinate nodes, one beam-pattern node, and one beam-brightness node. There is a five-component output vector consisting of three control-action nodes, one beam-position node, and one beam-brightness node. A training record then contains 12 elements, and training records are sequenced to form an alignment. Alignments are executed from random starting positions to form a training set. Two slight modifications of this design were investigated: (1) a training set where the change in brightness was used in place of the brightness and (2) a training set where the beam brightness and beam coordinates were made dimensionless.

Dimensionless training sets. Human operators can align spatial filters for different gaussian beam parameters, laser powers, distances of the viewing card, microscope objectives, pinhole diameters, and mechanical designs of the spatial filter assembly. The dimensionless training set is a very primitive attempt to emulate this generality. Learning the general procedure by example is the ultimate goal. The beam incident on the viewing card is approximately a single spatial and single temporal eigenmode of the electromagnetic field. Generic eigenmodes are represented in terms of a limited number of variables, and these variables can be made dimensionless. Unfortunately, the spatial mode changes during the alignment process. Dividing the alignment process into two regions, called A and B in appendix A, greatly simplifies describing alignments, but changes still are particularly noticeable in region B. There is not a unique dimensionless training record. The region A and region B models were used to guide the creation of the following particular example. The dimensionless variables differ between regions A and B. Control identifications, such as (0, 1, 0), are unaffected, of course. The 1-bit identification of the beam pattern also is unaffected. The position coordinates X, Y are replaced by
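The 12-element record layout just described can be sketched as follows (the helper names are mine, not from the report; brightness is the base-10 logarithm of the measured power, as described above):

```python
import math

# One-hot codes for the control actions, as defined in the text:
# (0,0,0) none/start, (1,0,0) Z (focus), (0,1,0) X, (0,0,1) Y.
CONTROL_CODE = {"none": (0, 0, 0), "Z": (1, 0, 0), "X": (0, 1, 0), "Y": (0, 0, 1)}

def input_vector(prev_control, x, y, region_b, power_uw_cm2):
    """Seven-component input: 3 previous-control nodes, 2 beam
    coordinates, 1 beam-pattern bit, 1 log-brightness node."""
    return list(CONTROL_CODE[prev_control]) + [
        x, y, int(region_b), math.log10(power_uw_cm2)]

def output_vector(next_control, position, brightness):
    """Five-component output: 3 control-action nodes, 1 position node
    (x, y, or distance, depending on the control), 1 brightness node."""
    return list(CONTROL_CODE[next_control]) + [position, brightness]

# A training record is one input vector followed by one output vector:
record = input_vector("none", 20.0, 14.0, False, 5.55) + \
         output_vector("X", 0.0, 0.56)
```

The numeric values in the example record are illustrative, not measurements from the report.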
x' = Xf/(Zws)  and  y' = Yf/(Zws)  in region A

and by corresponding expressions, based on equation (B6), in region B.

The dimensionless coordinates for region A follow from the geometrical optics interpretation discussed in appendix A. The coordinate is essentially the angular location (X/Z, Y/Z) of the bright spot normalized with respect to the polar angle (ws/f) of the 1/e² edge of the beam. The square of this coordinate also is the argument of the exponential in equation (A13) in appendix A after applying the region A assumptions to that equation. The geometrical optics interpretation does not depend on the wavelength; therefore, the wavelength does not appear in the definition of the dimensionless coordinates for region A. The coordinates refer to a single point (the
position of the beam bright spot); therefore, D also does not appear in the definition. The pinhole diameter D does affect the spread of the diffraction ring pattern about the bright spot.

The dimensionless coordinates for region B are based on equation (B6) in appendix B but are not as easy to define because the shape of this mode varies significantly. The dimensionless groupings in the exponential and in the terms following the exponential are different. The position of a bright spot, if one occurs, will be determined by the bracketed terms. This suggests using the definition

x' = [region B definition from the bracketed terms of equation (B6); illegible in this copy]   (16)

Brightness is a sum of the logarithms of factors such as those in equation (A13) or (A15) in appendix A or in equation (B6) in appendix B. Hence, normalization (or renormalization) is accomplished by subtracting these logarithms expressed in dimensionless variables. The first step is to express the beam power P in terms of the maximum axial intensity at the reflecting card. The intensity averaged over the 1/e² diameter also could be used, but the effect is constant and not important in dimensional analysis. Equation (A2) is used for this purpose by substituting Z in place of δz. The result is given by

√(2P/π) = (Zws/f)√(Imax)   (17)

This result is substituted in equation (A13) or (A15). These equations are affected by the mechanical misalignment variables that are unknown during the formation of a training set. In particular, w' depends on δz. If we choose δz = 0 to form a dimensionless parameter, the quantity to be subtracted from the brightness elements in the experimental training sets is then given by

[equation (18), involving the pinhole diameter D and the beam radii wf and ws; illegible in this copy]

These transformations are done on the experimental training set; then the transformed training set is used to train the neural network. The neural network is interrogated by forming an input vector, presenting it to the network, and performing the inverse transformation on the output vector. In addition, scaling transformations can be performed to use the full dynamic range available from the networks. These activities were performed by C-language functions, and the results were stored internally.

Experimental Procedure for Neural-Net-Directed Alignments

Apparatus and software. The experimental setup consisted of the beam-smoothing spatial filter (figs. 1 and 2) with two combinations of the microscope objective and pinhole; a helium-neon and an argon ion laser together with mirrors to direct the laser beams to the spatial filter; a diffusely reflecting card with crosshairs for observing and centering the patterns of the laser beams; a power meter for measuring the average beam power; several experimentally generated training sets, which were produced by Kenneth E. Weiland (one of the authors); commercially supplied neural network hardware and software (refs. 5 and 6); and an AT microcomputer. Figure 10 is a photograph of the experimental setup. Most of the experimental work was done with the second package (ref. 6) of hardware and software. That package included a coprocessor that was installed in a PC/AT microcomputer.
Figure 10. Typical experimental setup for acquiring training sets for the spatial filter alignment.
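The region A transformations described earlier can be sketched as follows. This is a hedged illustration: the function names are mine, and the brightness offset of equation (18) is left as a parameter because its exact form is not reproduced here.

```python
def dimensionless_region_a(x_pos, y_pos, z_dist, f, w_s):
    """Region A coordinates: the angular location (X/Z, Y/Z) of the
    bright spot normalized by the polar angle ws/f of the 1/e^2 beam
    edge, i.e. x' = X f / (Z ws) and y' = Y f / (Z ws)."""
    return x_pos * f / (z_dist * w_s), y_pos * f / (z_dist * w_s)

def normalize_brightness(log_brightness, offset):
    """Brightness entries are logarithms, so renormalization is a
    subtraction; `offset` stands in for the dimensionless quantity of
    equation (18), which depends on the optical setup."""
    return log_brightness - offset

# Hypothetical numbers (consistent length units): bright spot at
# (20.0, 14.0), card distance Z = 500, focal length f = 10, ws = 1.
xp, yp = dimensionless_region_a(20.0, 14.0, 500.0, 10.0, 1.0)
```

Because the brightness is stored as a logarithm, the inverse transformation needed when interrogating the network is simply the corresponding addition.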
The package could be accessed at three levels: a menu-driven level, a C-language function level, and a compiler level. The menu-driven level supported two algorithms for learning mappings: BPN and CPN. The C-language function level supported 17 algorithms. It also could operate systems of neural networks by passing the outputs of one network to the inputs of the next. The compiler could create new neural-network architectures. All three levels were used, but most of the data reported herein were collected using the C-language function level. The package identified in reference 5 also was used to create systems of neural networks. This package relied entirely on software, and it required long training times. Nevertheless, its trained systems of neural networks performed as well as those of the other package. The reference 5 package also offered a different viewpoint. Recall that the nonlinear activation functions of networks internally generate the degrees of freedom needed to handle the mappings. In contrast, the reference 5 package can perform nonlinear operations on the input vector and supply the results of these nonlinear operations as additional inputs. In some cases, a mapping can be learned by a single-layer network when this procedure is used (ref. 4).

The experimental procedure is straightforward: select a format for the training set members or records, execute a large number of alignments recording the training set record at each step, select a neural network or combination of neural networks to learn the training set, execute the training algorithm for the neural networks while monitoring some measure of learning success, note the difficulty and time for training, and test the effectiveness of training by using the neural networks to direct the alignment of a spatial filter. The experimenter also must learn the architectures and programming languages for the neural computers. This report discusses neural-network-directed alignments.
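The input-augmentation idea can be illustrated with a toy problem (this example is mine, not from the report): XOR is not linearly separable, but a single-layer (linear) model learns it once the product x1·x2 is supplied as an additional input.

```python
# XOR targets for the four binary input pairs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def features(x1, x2):
    # Augmented input: bias, the raw inputs, and one nonlinear term.
    return [1.0, x1, x2, x1 * x2]

# Train a single linear layer by plain gradient descent on squared error.
w = [0.0, 0.0, 0.0, 0.0]
for _ in range(2000):
    for (x1, x2), t in samples:
        phi = features(x1, x2)
        y = sum(wi * pi for wi, pi in zip(w, phi))
        for i in range(4):
            w[i] += 0.1 * (t - y) * phi[i]

predictions = [round(sum(wi * pi for wi, pi in zip(w, features(x1, x2))))
               for (x1, x2), _ in samples]
```

An exact solution exists (weights 0, 1, 1, -2 on the augmented features), so the linear layer converges to the XOR mapping without any hidden layer.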
Neural-network-directed alignments of a spatial filter. Two kinds of experiments were conducted: neural-network-directed alignments of actual spatial filters and neural-network-controlled alignments of a model spatial filter. In the directed alignments, the trained neural network was used to pass alignment instructions to a human operator. The human operator executed the instructions and then passed a vector of inputs back to the neural network. The neural network then generated another instruction. This experiment is not a neural-network-controlled alignment of the spatial filter. Neural-network-controlled alignments of the spatial filter, with actuator-driven degrees of freedom, are, of course, the real goal of the overall research and development effort. The configuration of the neural network and of its parameters was determined by the designer, who monitored its learning progress.

Neural-network-directed alignments were performed on the same setups used to create the alignment records for training. The same person who created the training sets also executed these alignments as directed by the system of neural networks. In preparation for the experiment, the spatial filter was aligned carefully. Then, the focus was backed off typically about 700 to 900 μm, and X and Y were randomly turned. The misadjustments of X and Y were not so large as to make the beam invisible. The beam characteristics and previous history were specified, measured, and used to create input vectors as described in the Training set design for optical alignment section. The input vector was then relayed to the computer operator, who entered it. The AT computer contained the neural network coprocessor (ref. 6) and its software (refs. 5 and 6). Generally, the system of neural networks responded immediately with an output vector. Occasionally, an input vector could not be classified at the same level of vigilance as was used to prepare the training sets. Then it was necessary to reduce the vigilance slightly to force classification and routing of the input vector to a mapper. The output instructions were then relayed to the person performing the alignment as follows:

(1) The control to be operated (Z (focus), X, Y, or none)
(2) If control = X, the new x position
(3) If control = Y, the new y position
(4) If control = Z (focus), the new distance from the center
The control to be operated was selected by rounding off the first three components of the output vector. The largest value ≥0.5 was rounded up to 1.0. A tie was settled by rounding Z first and X second. All other values were equated to 0.0. The predicted brightness was recorded for comparison with the actual brightness. The predicted brightness was not used by the person performing the alignment.

The focus adjustment sometimes required a modification of these output instructions. This modification decreased the value of the experiment because it required interpretation after the training was completed. When Z (focus) was adjusted, the beam could not always be forced to the distance indicated by the output vector. Hence, action on step (4) of the procedure was modified as follows:

(4a) If possible, the predicted distance was set.
(4b) If the brightness decreased too much, the distance at which the beam was barely visible was set.
(4c) If the brightness increased without the predicted increase in distance, the distance at which the brightening stopped was set.

The person performing the alignment executed the instructions and then measured a new input vector. Sometimes, photographs were recorded of the card-reflected beam. This
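The rounding rule just described can be written out explicitly (a sketch; the function name is mine):

```python
def round_control(z, x, y):
    """Round the first three output components to a one-hot control
    choice: the largest value >= 0.5 becomes 1.0, ties settled in the
    order Z first, then X; everything else becomes 0.0. If no value
    reaches 0.5, the result (0, 0, 0) means no action."""
    values = [z, x, y]
    best = max(values)      # index() below keeps the earliest maximum on
    if best < 0.5:          # ties, which realizes the Z-first, X-second rule
        return (0, 0, 0)
    winner = values.index(best)
    return tuple(1.0 if i == winner else 0.0 for i in range(3))
```

For example, a tie such as (0.7, 0.7, 0.1) rounds to the Z control, and an output whose components all fall below 0.5 rounds to "none".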
procedure was continued until the system of neural networks issued a "halt" or "all done" output vector, defined by (0, 0, 0) in the first three components. The person executing the instructions designated the beam as region A or B and measured the position of the beam bright spot relative to crosshairs. The systems of neural networks also were used to align a model spatial filter, as discussed next.

Neural-network-directed alignments of a model spatial filter. Some versions of the software contained an option to complete an alignment from an initial input vector. This option was used to test the self-consistency of the system of networks; that is, it determined whether a system of networks would proceed to an aligned state on the basis of only a string of its own input and output vectors. In contrast, the alignment of an actual spatial filter, as discussed in the previous section, provided continuous corrections through remeasurements of the input vector. A second application of alignment completion was used to provide rapid visualization of an alignment with the procedures discussed in appendix B. For the self-consistency test, the output vector, the input vector, and a very coarse model were used to create a new input vector. The following procedure was used:

(1) The first three components of the new input vector were created from the first three components of the output vector by rounding off as defined in the previous section.
(2) If X was selected, the new X value and the old Y value were selected for the next two components.
(3) If Y was selected, the old X value and the new Y value were selected for the next two components.
(4) If Z (focus) was selected, then the old X and Y values were multiplied by the new distance from the center divided by the old distance from the center, and the results were used as the next two components.
(5) The beam pattern, given by the next component, was determined from the brightness. Region B was defined to occur with an output brightness greater than or equal to 3.39.
(6) The brightness in the new input vector was taken, of course, from the last component of the output vector.
(7) The new input vector was used to plot the new beam position and profile.
(8) The new input vector was sent to the system of networks for another iteration.
(9) The process was halted when the first three components of the output vector rounded off to (0, 0, 0).

In step (4) there is a danger of division by zero. However, the original training set and the model training set involved sufficient errors in zeroing that division by zero never occurred.

Results and Discussion

Comparison of Training the Back-Propagation-Trained Network and the Counter-Propagation Network

ART2 was used to preclassify the argon ion training set into 12 classes. These classes were then taught to both the back-propagation-trained network (BPN) and the counterpropagation network (CPN). The entire training set was rendered dimensionless, and the individual classes were rescaled prior to training the mappers. Rescaling places elements of the vectors in the range (-0.9, 0.9). The two architectures were equally effective at learning these training sets, as can be seen by examining table III. Table III tabulates mean square errors for each class for BPN and CPN. The mean square error is measured after training by performing one more pass through the training set.

TABLE III. - COMPARISON OF BPN WITH CPN: DIMENSIONLESS AND RESCALED ARGON-ION TRAINING SETS

  Class   Number of              Mean square error
          training records(a)     BPN        CPN
    1            50             0.006006   0.004297
    2            50              .079269    .072757
    3            51              .120310    .123502
    4            49              .008547    .002877
    5            40              .062840    .084559
    6            39              .009437    .000000
    7            41              .098409    .090904
    8             3              .000000    .000000
    9             5              .000001    .000000
   10             5              .082420    .081007
   11             6              .000002    .000000
   12             2              .000000    .000000

(a) Each training record consists of one input and one output vector.

Table IV lists the types of input vectors characterizing the different classes of table III. The largest mean square errors appear for the larger classes where X or Y was the control last operated. Although the input vectors in a class are characterized by the same values of the first three components and of the sixth component, brightness and positions can vary substantially. The output vectors can vary in the control to be operated, the brightness, and the position. Mappers are limited by a statistically best performance. The fact that two completely different forms of mappers achieve the same performance indicates that this limit has been reached. BPN and CPN learned the class training sets equally well; they can be compared according to other criteria. Both networks were trained with 12 000 iterations (passes through the
training sets). The training of CPN was faster, but it required more frequent adjustments of parameters. The CPN parameters (the Kohonen learning rate, a parameter called the bias multiplier, and the Grossberg learning rate) (ref. 6) were adjusted six times during training. The BPN parameters were adjusted three times. CPN must be large for large training sets, whereas the size of BPN does not change. All the BPN nets had 7 input nodes, 14 nodes in one hidden layer, and 5 output nodes. The CPN network used to learn the third class contained 2548 bytes. The total sizes of the network files for the third class were 1744 bytes for BPN and 4260 bytes for CPN. One advantage of CPN is its large number of Kohonen nodes: accuracy increases as the number of active Kohonen nodes increases. CPN requires somewhat arbitrary choices of the number of Kohonen nodes to use for interpolation and of the value of the interpolation exponent r. The relationship between accuracy and the number and sizes of hidden layers in BPN is difficult to discern.

TABLE IV. - CHARACTERIZATION OF TRAINING CLASSES FROM DIMENSIONLESS AND RESCALED ARGON-ION TRAINING SETS

  Class   Number of              Control         Pattern
          training records(a)    last operated   (region)
    1            50                 None            A
    2            50                  Y              A
    3            51                  X              A
    4            49                  Z              A
    5            40                  X              A
    6            39                  Z              B
    7            41                  Y              A
    8             3                  Z              B
    9             5                  X              B
   10             5                  Z              B
   11             6                  Y              B
   12             2                  Z              B

(a) Each training record consists of one input and one output vector.

Alignment Tests With Neural Network Systems

The test of a trained system of neural networks was whether that system could direct the alignment of a laser-beam-smoothing spatial filter. The results of such tests are discussed in the following section. The objective for this work is to automate the alignment and operation of optical measurement systems in inaccessible aerospace environments. The only acceptable test, therefore, is to demonstrate alignments of optical components. The following alignment test is representative; it clearly shows some problems and indicates direction for future work.

The system tested was used to generate tables III and IV. Figure 11 is a photographic record of the alignment test (ref. 2) that was recorded with the version of the system that used CPN as a mapper. A neural network system consists of a preclassifier and about 10 mappers. CPN was used with the interpolation technique discussed in the description of CPN earlier. Up to six Kohonen nodes were allowed to participate in determining the output, where the interpolation exponent was r = 2. These values were chosen by trial-and-error tests on the original training set. BPN mappers, which also successfully directed the alignment of the spatial filter, do not require these adjustments.

Table V lists the neural net alignment in training set form. There are three differences between the form of this table and the form of the typical training set. First, the beam pattern is listed as "A" or "B" for region A or B rather than as "0" or "1." Second, the brightness in the output vector generally differs from the brightness in the subsequent input vector. The output brightness is predicted, and the input brightness is measured. Third, the distance predicted for an adjustment of Z (focus) may not equal the distance achieved because of limitations in the apparatus, as discussed in an earlier section.

Table V can be compared with figure 11, but the following comments must be kept in mind. Figure 11 contains an imaging reversal: right is interchanged with left, and top is interchanged with bottom. In addition, saturation makes it difficult to show the bright spot; therefore, the beam may appear to be off center when the bright spot is actually centered. The center of an image must be overexposed to bring out the ring pattern. The region A versus region B judgment was made by the human operator actually executing the alignment instructions. The same human operator recorded the training sets, as stated previously. The person operating the computer and relaying the instructions was in a different room.

Figure 11 is to be read in television raster fashion from left to right and top to bottom. The first three frames of figure 11, representing the first three lines in table V, clearly are region A patterns. Multiple diffraction rings are visible in all three frames. The broken appearances of the diffraction rings may be caused by spatial variations in the sensitivity of the film used to record the photographs. They also could be anomalous diffraction effects from dirt or pinhole imperfections, because the simple theory applies to a perfectly circular, undamaged, clean pinhole. The operator classified the next two frames as region A frames also. However, the photograph shows them to be region B frames. Nevertheless, the system of neural networks could order essentially correct moves in spite of the
pattern error.

Figure 11. Photographic record of neural-net-directed alignment of spatial filter.

TABLE V. - NEURAL-NETWORK-DIRECTED ALIGNMENT IN TRAINING RECORD FORM FOR ARGON-ION TRAINING SET AND CPN MAPPERS

                    Input                                     Output
  Control    Position    Pattern   Brightness    Control    Position     Brightness
  (Z, X, Y)   x     y    (region)                (Z, X, Y)  (x, y, or
                                                             distance)
  0 0 0     20.0  14.0      A        0.744       0 1 0         0.0         0.560
  0 1 0       .0  14.0      A         .602       0 0 1          .0         1.338
  0 0 1       .0    .0      A        1.415       1 0 0        16.0         1.810
  1 0 0      6.0    .0      A        3.415       0 1 0          .0         3.504
  0 1 0       .0    .0      A        3.613       1 0 0          .0         3.585
  1 0 0       .5    .0      B        3.602       0 1 0          .0         3.556
  0 1 0       .0    .0      B        3.633       0 0 0          .0         3.574

An earlier training set did not incorporate the 1 bit of pattern (region) information, yet a system of neural networks trained with it could direct alignments in many cases. That system, however, did not direct complete alignments; it would occasionally get stuck and oscillate back and forth between a pair of states, with states in between. One state of this pair was a region A pattern, and the other state was a region B pattern. Without the pattern information, the two states produced identical input vectors, and treating them as identical caused the system to oscillate rather than proceed to an aligned state. Hence, we decided that the alignment judgments had to be made with the pattern information included, on the basis of the region A, region B theory.

There are two points leading from the preceding discussion. The first is that a serial alignment process appears to be robust in the sense that occasional errors and bad decisions do not destroy the whole process. The second point is really the main point of this whole report: neural nets are intended to learn the whole alignment process by example, based on the example of a skilled craftsman rather than on academic theories of physical processes. The need for the operator to interpret the beam pattern must be regarded as a defect, and the only way to avoid this defect is to acquire a training set for a complete system. A complete system would consist of a machine vision system, for example, a charge-coupled device (CCD) camera together with a frame grabber, and electromechanical actuators. A complete system would record a training set of input-output vectors without substantial intervention; the only contribution would be by the skilled operator, and the system would be adapted to the operator's knowledge.

Incidentally, the same system of neural networks also was fairly successful at directing alignments of a model of the laser-beam-smoothing spatial filter. Table VI contains an alignment sequence in which BPN was the mapper. The BPN alignment started from the same point as with CPN and followed nearly the same path.
TABLE VI. - NEURAL-NETWORK-DIRECTED ALIGNMENT IN TRAINING RECORD FORM FOR ARGON-ION TRAINING SET AND BPN MAPPERS

                    Input                                     Output
  Control    Position    Pattern   Brightness    Control    Position     Brightness
  (Z, X, Y)   x     y    (region)                (Z, X, Y)  (x, y, or
                                                             distance)
  0 0 0     20.0  14.0      A        0.744       0 1 0         0.0         0.442
  0 1 0       .0  14.0      A         .602       0 0 1          .0         1.354
  0 0 1       .0    .0      A        1.415       1 0 0        16.0         1.892
  0 0 1      6.0    .0      A        3.415       0 1 0          .0         3.584
  0 1 0       .0    .0      A        3.613       1 0 0          .0         3.594
  1 0 0       .5    .0      B        3.602       0 1 0          .0         3.570
  0 1 0       .0    .0      B        3.633       0 0 0          .0         3.570
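The kind of recording error discussed next, an input vector whose previous-control components do not match the preceding output vector, can be caught mechanically. A sketch (my own check, not part of the reported software), using the control columns of table VI:

```python
# Each row pairs the "control last operated" from the input vector with
# the "control to be operated" from the output vector (Z, X, Y one-hots).
# In a consistent alignment record, row i's input control must equal
# row i-1's output control. The rows follow table VI.
rows = [
    ((0, 0, 0), (0, 1, 0)),
    ((0, 1, 0), (0, 0, 1)),
    ((0, 0, 1), (1, 0, 0)),
    ((0, 0, 1), (0, 1, 0)),   # input should have been (1, 0, 0), i.e. Z
    ((0, 1, 0), (1, 0, 0)),
    ((1, 0, 0), (0, 1, 0)),
    ((0, 1, 0), (0, 0, 0)),
]

def inconsistent_lines(records):
    """Return 1-based line numbers where the input's previous-control
    code disagrees with the preceding line's output control."""
    return [i + 1 for i in range(1, len(records))
            if records[i][0] != records[i - 1][1]]
```

Run on these rows, the check flags line 4, the experimental error noted in the discussion.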
The result in the previous section was that BPN and CPN learned the 12 sets of training vectors equally well. The most noteworthy observation is that line 4 of table VI contains an experimental error. In this line, Y was entered as the last control operated rather than Z, as indicated by the previous output vector. The BPN network, which was erroneously consulted, still directed the correct move. The argon-ion-trained network was trained to precisely zero x or y whenever one or the other occurred with a nonzero value. The helium-neon training set is different. That training set was originally constructed by measuring errors in the mechanical drives for each alignment step. Crosshairs were not used to zero the beam coordinates. Figures 7 and 9 were constructed from this original training set. Later, the beam coordinates were recovered by making a second pass through the training set. The precision of that procedure was limited because of the nonlinear effects in the mechanical drives. The helium-neon training set gives results closer to human alignment, yet it is less efficient. Another point is that CPN learned the helium-neon training set better than BPN. With CPN, we can choose any number of Kohonen nodes, up to the maximum. In effect, we can have one node for each training entry. Generally, we need Kohonen nodes equal to one-third or more of the training entries.

Concluding Remarks

An important conclusion drawn from this work is that a neural-network-controlled alignment process should be trained and tested in its entirety. Ideally, the environment for training and testing will be the environment for the final application. Training and testing should be nonverbal. The alignment expert should view the light pattern on a monitor attached to the machine vision system used by the network. The alignment expert should perform alignment actions via the actuators used by the network, and the training set should be recorded automatically. Then the trained system should be tested by how well it completes alignments without human intervention. Any procedure that programs a network with weights learned in a laboratory is a weak procedure. The neural-network-directed alignments of the spatial filter described herein do not meet this standard. The training sets were designed to be recorded by the alignment expert, and they incorporated the human expert's interpretation of the beam pattern. The alignment tests also required human interpretation and human translation of the output vectors into mechanical actions. The discussion of tables V and VI shows how imperfect interpretations can be.

Nevertheless, the neural-network-directed alignments produced some important conclusions and some motivations for additional work. The approach mandated in the first paragraph of this section is one conclusion. Another conclusion is that neural-network-controlled sequencers are very robust in the sense that they tolerate mistakes. The alignment path has an excellent chance of recovering from an erroneous move and proceeding to an aligned state. This single property is a good reason for continuing the development of neural networks for alignment. This work also points out the importance of adaptive systems. The long-range goal for such systems is the control of adaptive optics. A more realistic near-term project is to develop and test a system to correct for misalignment induced by the environment. Vibrationally induced misalignment is an example. This project is consistent with the program objective of automating alignments in remote, harsh, and dangerous environments and can probably be demonstrated with commercially available equipment. The objective is quite different from the spatial filter alignment discussed in this report. The alignment of the spatial filter requires proceeding in steps from a misaligned state to an aligned state. The adaptive system would need to learn, detect, and correct for transitions from aligned to misaligned states.

The adaptive resonance theory (ART) discussed in this report is one system for using unsupervised learning to detect and classify new states of misalignment. Learning to respond correctly to these new states of misalignment is the essence of adaptation. This type of learning is one step removed from the training algorithms already discussed. Essentially, the system of neural networks must learn how to learn the correct response to the misalignment. The textbook approach is to have a separate network that contains and implements a model of learning. A primitive approach might consist of interrupting the experiment when a new state is detected and classified. (As discussed in this report, ART2 does this automatically via the vigilance parameter.) A human operator
One conclusion
is that CPN is the network
small data sets.
A more
would
existing technologies is always a factor in developing new technologies. Neural networks should have a significant role in processing_opticai data. A nonlinear network has the ability to reconstitute a compressed data set. In effect, feedforward networks have unlimited internal degrees of freedom that can store and resupply missing data, provided that the data can be reconstituted by some definite rule. The neural network, of course, learns the rule implicitly, transparently, and by example. It will simply classify the inputs in the Bayesian sense, at best, if it cannot discover a rule. Neural networks offer an efficient way to study data compression. The payoff can be enormous in the aerospace field. A current system uses holograms that must be recorded through windows. The hologram must be processed; then the information is measured comparatively slowly from 29 different views. Adata compression to three views, followed by reconstitution, would allow holography to be replaced by highspeed electronic interferometry. The retention of all the views, but with a few measurements per view, would permit the use of fiberoptic interferometry, thereby eliminating the need for windows and solving the optical access problem in the aerospace field.
then teach the networks
a correct
response,
and the
experiment (actually the network development) would continue. As mentioned, some people_consider the creation of full adaptation to be the most important area for research. The use of neural networks to compensate for environmentally caused misalignment in a component used in the field would be an ideal demonstration project. This is a very appropriate point to recite some conclusions about the current resources for implementing neural networks. The current software and hardware are slow, and they have comparatively small memories (in relation to their human operator counterparts). The project suggested in the previous paragraph might require or benefit from some customized, frontend hardware for rapid acquisition and classification of misalignment states. It is a mistake, however, to claim that present neural network demonstrations are not real because they use digital computers. The neural network architectures are quite real, but the d!gital computer cannot take advantage of the ability of groups of identical neurons to be updated at the same time. The digital computer essentially updates these neurons one at a time. The conceptual difficulty is easily eliminated by recognizing that true neural networks also update neurons one at a time, if a short enough time interval is selected. Certainly, neural network applications will benefit from the development of parallel hardware. The comparison of BPN and CPN suggests some interest
important
conclusion
of choice
for
is that table
lookup may be a significant, superior alternative to neural networks for a long time. Large memories are becoming inexpensive and algorithms such as CPN are available for learning and organizing tables. The role of table lookup in mapping and directing the alignment of spatial filters was discussed with figures 7 and 9. Effective competition from
about resources and technologies. CPN is to table lookup when the number of or neurons equals the number of entries. The performs weighted interpolation. CPN peras BPN for the relatively small training sets
Finally, a fully automated spatial filter alignment must be demonstrated. Neural networks are indeed the expert systems of craftsmanship. They learn by example. Only the complete system can avoid the need for the verbal intercourse that spoils this example learning. A person with an optical systems background will feel very comfortable with neural networks. It works very well for patternbased processes, is well constructed for research in adaptive optics, and has
that were used for the experiments. The advantage of CPN is that the number of weighted connections increases in proportion to the size of the training set, whereas the number of weighted connections, in BPN
potential for processing optical data. The development of supporting technologies and competition from existing technologies will set the timetable for applications of neural networks.
increases in proportion to a power of the size of the input vector. This effect did not create a problem for the sevenelement input vectors used for the spatial filter alignments. It certainly will create a problem for input vectors containing tens of thousands of elements, as discussed in connection
National Aeronautics and Space Administration Lewis Research Center
with the use of neural
Cleveland,
ing conclusions fully equivalent Kohonen nodes Grossberg layer formed as well
22
networks
for processing
optical
data.
Ohio 44135,
May 1993
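The concluding remarks note that CPN with one Kohonen node per training entry is fully equivalent to table lookup, with the Grossberg layer performing weighted interpolation. That equivalence can be sketched in C. The node count, stored vectors, and inverse-distance weighting rule below are illustrative assumptions, not the configuration used in this study.

```c
#include <assert.h>

/* Sketch of counterpropagation (CPN) recall used as an organized
 * lookup table.  Each Kohonen node stores one training input, and
 * the Grossberg layer stores the paired output.  With one node per
 * training entry and a single winner, recall IS table lookup;
 * letting every node contribute with inverse-distance weights gives
 * the weighted interpolation performed by the Grossberg layer.
 * Dimensions and the weighting rule are illustrative assumptions. */

#define NODES 3   /* one Kohonen node per training entry */
#define NIN   2   /* length of the input vector */

static const double kohonen[NODES][NIN] = {
    {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}
};
static const double grossberg[NODES] = { 10.0, 20.0, 30.0 };

static double dist2(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < NIN; i++)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

/* Single-winner recall: pure table lookup. */
double cpn_lookup(const double *x)
{
    int best = 0;
    for (int i = 1; i < NODES; i++)
        if (dist2(x, kohonen[i]) < dist2(x, kohonen[best]))
            best = i;
    return grossberg[best];
}

/* Interpolating recall: Grossberg outputs blended with
 * inverse-distance weights over all Kohonen nodes. */
double cpn_interpolate(const double *x)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < NODES; i++) {
        double w = 1.0 / (dist2(x, kohonen[i]) + 1e-9);
        num += w * grossberg[i];
        den += w;
    }
    return num / den;
}
```

An input near a stored entry returns that entry's output exactly in lookup mode, while interpolation mode returns a blend weighted toward the nearest entries; the stored connections grow only with the number of training entries, which is the scaling advantage over BPN noted above.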
Appendix A

Theory and Models for Beam-Smoothing Spatial Filter

The derivation of the Fresnel diffraction theory (ref. 22) of the beam-smoothing spatial filter is straightforward, but the theory is hard to use for calculations. This difficulty should not be surprising because the patterns observed in the step or two before alignment is achieved are complex. In this appendix, the diffraction theory is used to generate easy-to-handle approximate models of the alignment of a spatial filter. This appendix has three sections: the general diffraction theory of the spatial filter, a model of the alignment process when the filter is out of focus, and a model of the alignment process when the filter is focused or nearly focused.

Fresnel Diffraction Theory

This discussion (fig. 12) assumes the use of a gaussian laser beam (ref. 23) that is coaxial with the optical axis of the spatial filter assembly. A thin lens of focal length f replaces the microscope objective commonly used to focus the laser beam, and aberrations are ignored. The pinhole, or spatial filter, is assumed to be in the x-y plane at z = 0. The center of the circular pinhole is assumed to be misaligned with (x,y) coordinates δx and δy, and the center of the beam waist is assumed to be misfocused with the z-coordinate δz. The result of a successful alignment is to zero or nearly zero these coordinates. The remaining parameters are the pinhole diameter D, the distance (z-coordinate) Z at which the beam pattern is observed or measured, and the laser-beam characteristics.

[Figure 12: Simplified diagram of the spatial filter setup, showing the laser, the microscope objective (modeled as a thin lens), the pinhole, the beam waist with misfocus δz, and the bright spot of the filtered beam on a white card with crosshairs at distance Z from the pinhole.]

The laser-beam characteristics are defined as follows. The laser beam has a wavelength λ and a 1/e² radius w_s at the lens aperture, the beam waist appears at the focus at distance f from the lens, and the vignetting effect of the lens is ignored. The beam waist then has a 1/e² radius given by

    w = λf / (π w_s)                                                  (A1)

The beam power is called P.

We obtain a formal expression for the beam pattern by mathematically propagating the beam to the pinhole, multiplying the pattern by the pinhole aperture function, and then mathematically propagating the apertured beam to the viewing plane at distance Z. Constant phase factors are not retained because they cancel in evaluating the intensity. The field at the pinhole plane z = 0, minus constant phase factors, is given by

    u(ρ) = (2P/π)^(1/2) (1/w') e^(−Aρ²)                               (A2)

where

    A = 1/w'² + jπ/(λR)                                               (A3)

    w' = w [1 + (λδz/(πw²))²]^(1/2)                                   (A4)

    R = δz [1 + (πw²/(λδz))²]                                         (A5)

    ρ² = x² + y²                                                      (A6)

This field (eq. (A2)) is multiplied by a circular aperture function C(x − δx, y − δy) that is centered on the pinhole and is unity inside the pinhole and zero outside the pinhole. The Fraunhofer diffraction integral is then used to represent the field at the observation plane. Capital letters X, Y, Z represent the coordinates of the observation plane. The scalar field in the observation plane is then given by

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) ∫ C(ρ − δ) e^(−Aρ²) e^(−j(2π/λZ)(X·ρ)) dρ     (A7)

where

    ρ = (x, y)        δ = (δx, δy)        X = (X, Y)

Equation (A7) is evaluated in part by applying the shift and convolution theorems of Fourier transformations (ref. 22). The result is given by

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (πD/2A) ∫ e^(−j2πF'·δ) [J₁(πD|F'|)/|F'|] e^(−(π²/A)(F − F')²) dF'     (A8)

where

    F = (X/λZ, Y/λZ)

The symbol J_n represents a Bessel function of the first kind of order n. Integrals are evaluated over the entire domain of the variable of integration. Equation (A8) can be evaluated formally with the assistance of a table of integrals (ref. 24). The first step is to integrate with respect to the polar angle. The result is given by

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (π²D/A) e^(−(π²/A)F²) ∫₀^∞ I₀(BF') J₁(πDF') e^(−(π²/A)F'²) dF'     (A9)

where

    B² = (4π⁴/A²)F² − 4π²δ² − j(8π³/A)(F·δ)

The symbol I_n represents a modified Bessel function, where I₀(x) = J₀(jx). The exponential preceding the integral sign represents the beam profile in the absence of the filter. The integral itself is interpreted as a beam-profile, beam-position modification factor; it represents the main effect of filter misalignment. The integral in equation (A9) can be evaluated formally, and the result is given by

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (πD²/4) e^(−(π²/A)F²) Σ_{m=0}^{∞} (1/m!) (B²A/4π²)^m c_m     (A10)

where

    c_m = 1 + Σ_{i=1}^{m} [(m!)² / (((m − i)!)² i!(i + 1)!)] (−π²D²/B²)^i     (A11)
Equations (A1) to (A11) represent the diffraction theory of the process of aligning a spatial filter. It is more convenient in this report to use approximate models based on the theory. The model discussed next represents the alignment process when the filter is well out of focus.

Region A Model - Filter Out of Focus

Most alignment steps occur when the spatial filter is substantially out of focus. For example, for the typical values f = 8300 μm, λ = 0.5145 μm, and w_s = 1750 μm, the beam 1/e² diameter will exceed a typical pinhole diameter of 10 μm when δz = 20 μm. Because the misfocus can be as large as 1000 μm at the start of an alignment process, the beam diameter can exceed the pinhole diameter over 98 percent of the focus adjustment range. A model of this region can be generated with equation (A7). First the equation is transformed to the center of the pinhole, and then the remaining quadratic exponential under the integral is assumed to deviate negligibly from 1.0. The resulting integral is easily evaluated to yield the following result:

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (πD²/4) e^(−Aδ²) [2J₁(πD|F − jAδ/π|) / (πD|F − jAδ/π|)]     (A12)

The model is especially simple if the imaginary terms in the argument of the Bessel function and in the denominator of equation (A12) are ignored. The result is given by

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (πD²/4) e^(−Aδ²) [2J₁((πD/λZ)|X + (Z/R)δ|) / ((πD/λZ)|X + (Z/R)δ|)]     (A13)

Choosing R = δz produces an especially simple result. The center of the diffraction ring pattern is then located at

    X_c = −(Z/δz) δ                                                   (A14)

This center is exactly that determined from geometrical optics by drawing a line from the center of the waist, through the center of the pinhole, to the observation plane. Equations (A13) and (A14) define the region A model for a beam that is out of focus. It is equally easy to produce a model when the beam is nearly focused. That model is discussed next.

Region B Model - Filter Focused or Nearly Focused

The model is developed directly from equation (A9). As stated, the exponential in front of the integral is the beam profile with the pinhole removed, and the integral can be thought of as a shape modification factor. In region B, I₀ is assumed to vary slowly in comparison with J₁. It is removed from the integral with an argument evaluated at the first maximum of J₁, which occurs at πDF' = 1.8. The integral remaining is approximately DA/4π, and the overall result is given by the equation

    U(X) = (2P/π)^(1/2) (1/w') (1/λZ) (πD²/4) e^(−(π²/A)F²) I₀(1.8B/(πD))     (A15)

The I₀ in equation (A15) constitutes a complex shape function and also permits the intensity maximum to occur off axis, as is sometimes observed in region B. Equation (A15) is the region B model. The models, like the diffraction theory, are approximate. Criteria for switching from region A to region B are somewhat arbitrary. Nevertheless, neural networks trained with these models perform well.
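The region A model lends itself to a compact numerical sketch. The following C fragment evaluates the normalized Airy-type profile of equation (A13); constant amplitude factors are dropped so that only the relative shape is meaningful, and the series evaluation of J₁ and the function names are illustrative assumptions rather than code from the study.

```c
#include <assert.h>
#include <math.h>

static const double PI = 3.14159265358979323846;

/* Shape of the region A (out-of-focus) model: the Airy-type
 * profile 2*J1(v)/v of eq. (A13), with constant amplitude factors
 * dropped.  Illustrative sketch, not code from the study. */

static double bessel_j1(double x)
{
    /* power series J1(x) = sum_k (-1)^k (x/2)^(2k+1) / (k!(k+1)!),
       adequate for the moderate arguments used here */
    double term = x / 2.0, sum = term;
    for (int k = 1; k < 60; k++) {
        term *= -(x * x) / (4.0 * k * (k + 1.0));
        sum += term;
    }
    return sum;
}

static double airy_profile(double v)   /* 2*J1(v)/v -> 1 as v -> 0 */
{
    if (fabs(v) < 1e-9)
        return 1.0;
    return 2.0 * bessel_j1(v) / v;
}

/* Relative intensity at offset x from the pattern center X_c, for
 * pinhole diameter D, wavelength lambda, and card distance Z, all
 * in micrometers. */
double regionA_intensity(double x, double D, double lambda, double Z)
{
    double a = airy_profile(PI * D * x / (lambda * Z));
    return a * a;
}
```

The profile is unity at the pattern center and falls toward the first dark ring, reproducing the ring structure that guides the x-y alignment steps in region A.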
Appendix B

Visualization of Spatial Filter Alignment

A person who aligns a spatial filter looks at and is guided by the diffuse reflection of the filtered and diverging laser beam. The subjective appearance of that beam depends on the beam shape and beam parameters discussed in appendix A. The appearance also depends on the beam power, the properties of the diffuser, room lighting, and nonideal effects such as escaping reflections that originate inside the filter assembly. Similar comments apply to any other visualization method such as a camera and a monitor. Definite, although occasionally erroneous, decisions must be made on the basis of such subjective visualizations. This alignment example requires that the observer (human or mechanical) decide on a value of beam position, on whether the beam pattern resembles the region A model or region B model, and on some estimate of brightness. Even the choice of the position of the diffuser is a somewhat arbitrary decision. The interpretation of visual information is a good subject for neural networks. The work reported in this report used visualization for demonstrations only; standardization was used to reduce arbitrariness.

Standardization

The standardization is defined as follows:

(1) The beam pattern shall be presented in a 32 by 32 pixel format.
(2) The unfiltered beam shall be centered in the 32 by 32 pixel window when the spatial filter (pinhole) is removed.
(3) The overall magnification of the beam recording system shall be set such that the 32 by 32 pixel window is 4w' on a side, where w' is the 1/e² radius of the unfiltered laser beam.
(4) The value of a pixel shall be based on the logarithm of irradiance (intensity) or a corresponding photometric quantity.

The radius w' in item (3) can be estimated by replacing δz in equation (A4) with Z, which is the distance from the aligned pinhole to the diffuser.

Example of Visualization Process

There are many ways to display the beam pattern in a manner consistent with the standard described in the previous section. The following method was used to display a schematic representation of the filtered laser beam during neural-network-directed alignments of the spatial filter. A C-language function was created to be used with EGA graphics software. The function was designed either to plot an externally supplied 32 by 32 pixel array or to compute and plot such an array from a seven-element input vector of the kind discussed in the Development of Training Sets section. Inputs to the function were (1) a pointer to the seven-element input array, (2) a pointer to the array of pixels, (3) the 1/e² radius w_s of the laser beam at the lens, (4) the wavelength λ of the laser beam, (5) an estimate of the maximum logarithm of intensity encountered during the alignment, (6) an estimate of the minimum logarithm of intensity encountered during the alignment, (7) the distance Z to the diffuser, (8) the diameter D of the pinhole, (9) the focal length f of the lens, and (10) a flag indicating whether the array of pixels was to be supplied externally or calculated internally.

Element 5 of the input vector (in C language, the first element is indexed as zero) decided the form of the internal calculation if one was called for. Equation (A13) in appendix A was adopted if element 5 = 0, indicating a region A pattern of rings. Positions on the diffuser were expressed in units of Z, and the 1/e² radius w_s was made dimensionless in units of f, as in

    w_f = w_s / f                                                     (B1)

The pinhole diameter D was made dimensionless in units of wavelength λ, as in

    d_λ = D / λ                                                       (B2)

The logarithm of intensity at the center of the pattern was estimated as element 6. In the experiments, element 6 was computed from an average intensity measured over a fairly large detector size. The main effect of ignoring that fact is equivalent to adding the same offset to all region A values. The Bessel function was computed with a commercially available software package. The 32 by 32 pixels were stepped out horizontally and vertically from −2w_f to 2w_f in the normalized x and y position variables.

A much more complex region B pattern must be calculated when element 5 = 1. An already simplified expression for this pattern is given by equation (A15) in appendix A. This expression was simplified further to provide a convenient expression for visualization. However, the dimensionless groupings defined by equations (B1) to (B4) are applicable to the full region A and region B models. These groupings also suggest the training of neural networks using dimensionless inputs. A dimensionless position and a dimensionless x-y misalignment are defined by the equations

    X_Z = X / Z                                                       (B3)

and

    δ_D = δ / D                                                       (B4)

Equation (A15) is replaced by a proportionality with the first two terms of a small-argument approximation of the Bessel function substituted. That proportionality is given by

    U(X) ∝ e^(−(π²/A)F²) [1 + (1/4)(1.8B/(πD))²]                      (B5)

where the assumptions near focus are w' ≈ w and A ≈ 1/w². The proportionality expressed in the dimensionless variables of equations (B1) to (B4) is given by

    U(X_Z) ∝ exp[−(X_Z/w_f)²] {1 + 3.24 [X_Z/(π d_λ w_f²) − jδ_D]²}   (B6)

The intensity for both regions A and B is proportional to the field times its complex conjugate:

    I(X_Z) ∝ U(X_Z) U*(X_Z)                                           (B7)

The position of the beam bright spot (elements 3 and 4) and the logarithm of intensity (element 6) are combined with equations (B6) and (B7) to derive the components of δ. These components of δ are substituted in equation (B6), and the array of pixels is calculated as for region A.

The array of pixels was represented on an EGA monitor by associating a 4 by 5 array of screen pixels with each of the calculated pixels. The four vertical pixels and the five horizontal pixels correspond to the aspect ratio of the screen. The actual emulation of the beam on the monitor is somewhat arbitrary: the color (red, green, blue, or yellow) is chosen from the wavelength. The logarithm of intensity at the beam bright spot ranges between the minimum and maximum values supplied to the C-function; however, an actual camera would have a variable iris whose setting might range from fully opened at the minimum brightness to nearly closed at the maximum brightness. The iris effect was inserted as a brightness-dependent offset to the logarithm of the intensity at the bright spot. The brightness of a calculated pixel was represented by the number of screen pixels illuminated in the 4 by 5 representation of the calculated pixel. The iris correction and the order of illumination of the screen pixels were adjusted by trial and error in an effort to create the visual effect of an actual alignment of the spatial filter.

Final Comments on Visualization

The C-function just described takes many seconds to compute and display a beam profile, in spite of the simplifications, standardization, and artistic license mentioned. The fact that a beam profile can be represented with dimensionless variables is the most important finding of appendix B. A neural network trained with dimensionless inputs is intended to work for any spatial filter and laser beam combination.
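The dimensionless groupings of equations (B1) to (B4) can be collected in a small helper like the following C sketch; the struct layout and names are illustrative assumptions, not part of the visualization function described above.

```c
#include <assert.h>

/* The dimensionless groupings of eqs. (B1) to (B4).  A network
 * trained on these inputs is intended to transfer to any spatial
 * filter and laser beam combination.  The struct layout and names
 * are illustrative assumptions; all lengths must share one unit
 * before the ratios are formed. */

struct dimensionless {
    double wf;      /* (B1)  w_f = w_s / f          */
    double dlam;    /* (B2)  d_lambda = D / lambda  */
    double xz;      /* (B3)  X_Z = X / Z            */
    double deltaD;  /* (B4)  delta_D = delta / D    */
};

struct dimensionless make_inputs(double ws, double f, double D,
                                 double lambda, double X, double Z,
                                 double delta)
{
    struct dimensionless d;
    d.wf     = ws / f;
    d.dlam   = D / lambda;
    d.xz     = X / Z;
    d.deltaD = delta / D;
    return d;
}
```

Because every group is a pure ratio, the same trained network can be reused when the focal length, pinhole, or wavelength changes, which is the point made in the final comments above.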
Appendix C

Symbols

Lengths, including wavelength, are normally expressed in millimeters and occasionally in micrometers. Vectors and matrices are denoted by boldfaced type.

A         area of iris aperture, or coefficient in sigmoid function (eq. (15)), or complex coefficient in gaussian exponent (eq. (A3)), or in reference to region A model
a         input vector (eq. (2))
B         coefficient (eq. (A9)), or in reference to region B model
b         output vector (eq. (2))
b_t       training vector
C( )      circular aperture function
D         diameter of pinhole (spatial filter)
d_λ       D/λ
d_i       Euclidean distance between vector and grid point i (eq. (7))
E         mean square error (eq. (3))
e_i       weight of node i when CPN is used with interpolation
F         (X/λZ, Y/λZ)
f         focal length of microscope objective
f( )      mapping function (eq. (2))
I         another expression for input vector
I( )      irradiance or intensity function
I_max     maximum irradiance measured in unfiltered beam at white card
I₀( )     modified Bessel function
J_n( )    Bessel function of the first kind and order n
N         number of training records
n         exponent used in formula for calculating weight factors when CPN is used in interpolation mode
o         another symbol for output vector
o_j       output of node j
P         probability density, or beam power
P(a|i)    conditional probability or probability density (eq. (9))
R         radius of curvature of gaussian beam wave front (eq. (A5))
r         training record (I, T), where I is an input vector and T is a training vector
S_i       total weighted input at node i
s_i       outputs of Kohonen nodes
T         training vector
t         time, or iteration index
U( )      scalar field at observation plane
u         mean value (eq. (8))
u( )      scalar field at pinhole
V(a, b; c; x)   hypergeometric function
W         linear mapping or transformation (eq. (1)), or matrix of neural network weights (eq. (4))
w         gaussian beam waist 1/e² radius (eq. (A1))
w'        gaussian beam 1/e² radius (eq. (A4))
w_f       w_s/f
w_ij      weight at node i for signal from node j
w_s       1/e² radius of laser beam at microscope objective
X         Cartesian x-coordinate of beam bright spot, or x-position control on spatial filter assembly
X_Z       (X/Z, Y/Z)
x         x-position in pinhole plane
Y         Cartesian y-coordinate of beam bright spot, or y-position control on spatial filter assembly
y         y-position in pinhole plane
Z         Cartesian z-coordinate, or distance from pinhole (spatial filter) to white card, or focus control
α         momentum coefficient (backpropagation algorithm), or smoothing coefficient in CPN algorithm; a similar coefficient replaces momentum in one version of the backpropagation algorithm
δ         (δx, δy)
δ_D       δ/D
δ_i       error used in backpropagation algorithm (eq. (10))
δx        x displacement of pinhole from optical axis
δy        y displacement of pinhole from optical axis
δz        focus error
η         learning rate (eq. (10))
λ         wavelength of laser beam
ρ         correlation coefficient (eq. (8)), or vigilance parameter used by ART2, or radial distance in plane of pinhole (eqs. (A2) and (A6))
ρ (bold)  (x, y)
σ         standard deviation (eq. (8))
| |       absolute value or magnitude of enclosed terms
< >       statistical expectation value of enclosed terms
References

1. Maren, A.J.: A Logical Topology of Neural Networks. Proceedings of the Second Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN-AIND 91, M.L. Padgett, ed., SPIE Vol. 1515, Society for Computer Simulation International, San Diego, CA, 1991, pp. 29-36.
2. Decker, A.J.: Self-Aligning Optical Systems. Appl. Opt., vol. 31, no. 22, Aug. 1, 1992, pp. 4339-4340.
3. Rumelhart, D.E.; and McClelland, J.L.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations. MIT Press, Cambridge, MA, 1986.
4. Pao, Y.H.: Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, MA, 1989.
5. N-NET 210 vended by AI WARE, Inc., 11000 Cedar Avenue, Cleveland, OH 44106.
6. ANZA PLUS vended by HechtNielsen, Inc., 5501 Oberlin Drive, San Diego, CA 92121-1718.
7. Fuzzy Logic Comparator vended by Micro Devices, 5695-B Beggs Road, Orlando, FL 32810-2603.
8. Seasholtz, R.G.; Oberle, L.G.; and Weikle, D.H.: Optimization of Fringe-Type Laser Anemometers for Turbine Engine Component Testing. AIAA Paper 84-1459, 1984 (Also NASA TM-83658, 1984).
9. Rumelhart, D.E.; Hinton, G.E.; and Williams, R.J.: Learning Internal Representations by Error Propagation. Report ICS-8506, Institute for Cognitive Science, California University, San Diego, CA, 1985.
10. Hecht-Nielsen, R.: Counterpropagation Networks. Appl. Opt., vol. 26, no. 23, Dec. 1, 1987, pp. 4979-4984.
11. Pao, Y.H.: Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, MA, 1989, pp. 214-217.
12. Carpenter, G.A.; and Grossberg, S.: ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns. Appl. Opt., vol. 26, no. 23, Dec. 1, 1987, pp. 4919-4930.
13. Beaumont, R.A.; and Ball, R.W.: Introduction to Modern Algebra and Matrix Theory. Holt, Rinehart, and Winston, New York, 1954.
14. Padgett, M.L.; and Roppel, T.A.: WNN91 Perspectives on Neural Networks. Proceedings of the Second Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN-AIND 91, M.L. Padgett, ed., SPIE Vol. 1515, Society for Computer Simulation International, San Diego, CA, 1991, pp. 3-6.
15. McClelland, J.L.; and Rumelhart, D.E.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 2: Psychological and Biological Models. MIT Press, Cambridge, MA, 1986.
16. Koch, C.; and Segev, I.: Methods in Neuronal Modeling: From Synapses to Networks. MIT Press, Cambridge, MA, 1989.
17. Moon, F.C.: Chaotic Vibrations: An Introduction for Applied Scientists and Engineers. Wiley, New York, 1987.
18. Natterer, F.: The Mathematics of Computerized Tomography. Wiley, New York, 1986.
19. Manausa, M.E.; and Lacher, R.C.: Chaos and the Step-Size Dilemma in the Back-Prop Learning Algorithm. Proceedings of the Second Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN-AIND 91, M.L. Padgett, ed., SPIE Vol. 1515, Society for Computer Simulation International, San Diego, CA, 1991, pp. 153-160.
20. Thomas, J.B.: An Introduction to Statistical Communication Theory. Wiley, New York, 1969, pp. 52-64.
21. White, D.; and Sofge, D.: NSF Workshop on Aerospace Applications of Neurocontrol. Proceedings of the Second Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN-AIND 91, M.L. Padgett, ed., SPIE Vol. 1515, Society for Computer Simulation International, San Diego, CA, 1991, pp. 523-527.
22. Goodman, J.W.: Introduction to Fourier Optics. McGraw-Hill, New York, 1968.
23. Koechner, W.: Solid State Laser Engineering. Springer-Verlag, New York, 1976, pp. 176-178.
24. Gradshteyn, I.S.; and Ryzhik, I.M.: Table of Integrals, Series, and Products; Corrected and Enlarged Edition. Prepared by A. Jeffrey. Academic Press, New York, 1980.
REPORT DOCUMENTATION PAGE                         Form Approved OMB No. 0704-0188

2. REPORT DATE: August 1993
3. REPORT TYPE AND DATES COVERED: Technical Paper
4. TITLE AND SUBTITLE: Neural-Network-Directed Alignment of Optical Systems Using the Laser-Beam Spatial Filter as an Example
5. FUNDING NUMBERS: WU-590-21-11
6. AUTHOR(S): Arthur J. Decker, Michael J. Krasowski, and Kenneth E. Weiland
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Lewis Research Center, Cleveland, Ohio 44135-3191
8. PERFORMING ORGANIZATION REPORT NUMBER: E-7524
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Aeronautics and Space Administration, Washington, D.C. 20546-0001
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: NASA TP-3372
11. SUPPLEMENTARY NOTES: Responsible person, Arthur J. Decker, (216) 433-3639.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified - Unlimited; Subject Category 35
13. ABSTRACT (Maximum 200 words): This report describes an effort at NASA Lewis Research Center to use artificial neural networks to automate the alignment and control of optical measurement systems. Specifically, it addresses the use of commercially available neural network software and hardware to direct alignments of the common laser-beam-smoothing spatial filter. The report presents a general approach for designing alignment records and combining these into training sets to train neural networks by example. It discusses the training and use of the backpropagation-trained network, the counterpropagation network, and the adaptive resonance network for this alignment example. The work shows that neural-network-directed sequencers can learn to execute the step-by-step alignment procedures of optical systems and that these sequencers are robust. It also shows that training sets should be recorded, and testing should be done, in a manner that does not depend on the intellectual judgments of a human operator. The long-range objective is to use neural networks that learn adaptively to correct for environmentally induced misalignment and to automate the alignment of optical measurement systems in remote, harsh, or dangerous environments.
14. SUBJECT TERMS: Neural networks; Computer software; Optics
15. NUMBER OF PAGES: 36
16. PRICE CODE: A03
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified

NSN 7540-01-280-5500          Standard Form 298 (Rev. 2-89), Prescribed by ANSI Std. Z39-18