
Iterative Learning Control for Discrete Time Systems with Exponential Rate of Convergence

Notker Amann, David H. Owens and Eric Rogers

Report Number: 95/14


August 8, 1995

Centre for Systems and Control Engineering, University of Exeter, North Park Road, Exeter EX4 4QF, Devon, United Kingdom.

For more information on Centre activities contact Professor D.H.Owens Tel: 01392 - 263689/263628/263650

Fax: 01392 - 217965

Research funded by the UK Engineering and Physical Sciences Research Council under contract No. GR/H/48286

Abstract

An algorithm for Iterative Learning Control is proposed based on an optimization principle used by other authors to derive gradient type algorithms. The new algorithm is a descent algorithm and has potential benefits which include realization in terms of Riccati feedback and feedforward components. This realization also has the advantage of implicitly ensuring automatic step size selection and hence guaranteeing convergence without the need for empirical choice of parameters. The algorithm achieves a geometric rate of convergence for invertible plants. One important feature of the proposed algorithm is the dependence of the speed of convergence on weight parameters appearing in the norms of the signals chosen for the optimization problem.

Keywords: Iterative learning control, 2D systems, optimal control, singular optimal control, reference-input tracking, descent methods.

Contents

1 Introduction
2 Norm Optimal Iterative Learning Control
2.1 Problem Formulation
2.2 The Optimal Learning Algorithm
2.3 The Proof of Convergent Learning
2.4 The Rate of Convergence
3 The Practical Application of the Algorithm
3.1 The Causal Formulation of the System
3.2 Simulation Examples
4 Conclusions
A Regularity Conditions on the Plant

List of Figures

1 Plot of the bound of convergence as function of the design parameter $\rho$ for several values of $\sigma_0$
2 Block diagram of proposed Iterative Learning Control algorithm
3 Simulation of the proposed algorithm for plant (41) with $\rho = 10$
4 Simulation of the proposed algorithm for plant (41) with $\rho = 1$
5 Optimal Iterative Learning Control applied to the nonlinear system from [15]

1 Introduction

Iterative Learning Control considers systems that repetitively perform the same task with a view to sequentially improving accuracy. Examples of this idea can be found in [3, 7, 8, 9, 10, 11, 15, 16, 17, 21] and include the general area of trajectory following in robotics. The specified task is regarded as the tracking of a given reference signal $r(t)$ or output trajectory for an operation on a specified time interval. The objective of Iterative Learning Control is to use the repetitive nature of the process to progressively improve the accuracy with which the operation is achieved by changing the control input iteratively from trial to trial.

Iterative Learning Control was originally introduced in 1984 by Arimoto et al. [3] who presented an algorithm that generated the new trial control input by adding a “correction” term to the control input of the previous trial. Iterative Learning Control has since then been further explored but is still underdeveloped. A recent textbook about Iterative Learning Control [15] includes a good literature survey to 1992.

The technical difficulty of Iterative Learning Control lies in the two-dimensionality (in the mathematical sense) of the overall system [9, 17]. The two dimensions are the trial index $k$ and the elapsed time $t$ during a trial. It is obviously desirable to have notions of stability with respect to both dimensions in a precisely defined sense (see [19] for some related ideas in the theory of repetitive dynamical systems). Whilst stability in the $t$-direction has the simple and usual interpretation, stability in the $k$-direction is taken to be equivalent to convergence of the Iterative Learning Control algorithm, defined as follows:

Definition 1 An Iterative Learning Control process is convergent if and only if it constructs a sequence of control inputs $\{u_k(t)\}_{k \ge 0}$ which, when applied to the plant, produces an output sequence $\{y_k(t)\}_{k \ge 0}$ with the property that the following limits exist:

$$ \lim_{k \to \infty} y_k(t) = r(t), \qquad \lim_{k \to \infty} u_k(t) = u_\infty(t) \qquad \forall t \in [0, N] \tag{1} $$

In this paper, a new convergent Iterative Learning Control approach is proposed that can be realised in terms of current trial feedback mechanisms combined with feedforward of previous trial data. The approach is based on the conceptual idea of splitting the two-dimensional dynamics into two separate one-dimensional dynamics. This is done by introducing a performance criterion as the basis of specifying the control input to be used on each trial. The algorithm uses the criterion to evaluate the performance of the system on a given trial by “averaging over time” and hence removing the dimension of time from the analysis. The performance criterion is then used to construct and solve an optimization problem whose solution is the proposed control input for the new trial.

The use of optimality criteria in Iterative Learning Control is not new to this paper. For continuous time systems, Furuta and Yamakita [8] have used a steepest-descent algorithm to minimize the $L_2$ norm of the tracking error. As a steepest-descent optimization method, their approach is guaranteed to converge provided that the “step length” is judiciously chosen on each trial. The results presented in this paper represent an improvement on their algorithm with the added bonus that convergence is guaranteed without the need to choose any step length parameters. In [7], an optimization problem related to the one in this paper is proposed. Its precise form is however not adapted to the special characteristics of Iterative Learning Control. It also does not make use of the current error and hence does not have a feedback form.

One of the main advantages of the proposed algorithm is that it achieves a reduction of the norm of the error at each step if the plant is known, and the rate of reduction can be influenced by available design parameters. Most of the other algorithms currently available cannot make any statements about the rate of convergence, even if the plant is fully known.

This paper is related to more abstract results for general systems [2], differing in the concentration on the discrete time case and the much higher level of detailed analysis.

The outline of the remaining sections is as follows. In section 2, the mathematical problem formulation and the proposed learning algorithm are presented and the main properties of the algorithm are derived. In section 3, the causal algorithm for linear, discrete-time plants is presented, together with a discussion of the design parameters and illustrative simulation results.

2 Norm Optimal Iterative Learning Control

2.1 Problem Formulation

The following sampled-time system is considered:

$$ x(t+1) = A x(t) + B u(t), \qquad y(t) = C x(t), \qquad x(0) = x_0, \qquad 0 \le t \le N, \qquad x \in \mathbb{R}^n, \; u \in \mathbb{R}^m, \; y \in \mathbb{R}^p \tag{2} $$

The state space matrices $A$, $B$, $C$ are assumed to be time-invariant for simplicity. It is however straightforward to extend all results to time-varying systems, as this involves only time indexing of the matrices in the following derivations. Comments about exceptions for time-varying systems will be given where appropriate. This system has the following well-known solution [4]:

$$ y(t) = C A^t x_0 + \sum_{i=0}^{t-1} C A^{t-1-i} B u(i) \tag{3} $$

Because only finite time intervals (with $N < \infty$ samples) are considered in Iterative Learning Control, one can rewrite this in “vector” form by building super-vectors $y$ and $u$ from $y(t)$ and $u(t)$ as follows (the super-vectors are marked by the omission of the argument time):

$$ y = \begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix}, \qquad u = \begin{bmatrix} u(0) \\ u(1) \\ \vdots \\ u(N-1) \end{bmatrix} \tag{4} $$

Note that signals at different times are used for the input and output vectors. This leads to an equivalent formulation of (2) as a matrix equation, introduced in a similar fashion in [11]:

$$ y = y_0 + G u \tag{5} $$

with the matrix $G \in \mathbb{R}^{(pN) \times (mN)}$ defined as

$$ G = \begin{bmatrix} CB & 0 & \cdots & 0 \\ CAB & CB & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ CA^{N-1}B & CA^{N-2}B & \cdots & CB \end{bmatrix} \tag{6} $$

and the vector of initial condition response $y_0 = \left[ (CA)^T \; (CA^2)^T \; \cdots \; (CA^N)^T \right]^T x_0$.

The matrix $G$ is a block lower-triangular matrix with a special structure. In the time-invariant case, the first column determines the whole matrix. A matrix with this structure is known as a Toeplitz matrix [12]. The plant input/output operator is hence just a matrix, mapping $mN$-vectors to $pN$-vectors. It may be a very large matrix, but this is not a problem as $G$ does not appear in the final calculations. It is only required for the theoretical analysis. It is noted that the matrix $G$ is invertible in the SISO case if and only if $CB \ne 0$. If the system has a delay, i.e. $CB = 0$, then it can be regularised, as shown in appendix A.

Consider now a tracking problem where the reference trajectory or desired output is denoted by $r(t)$, given for $1 \le t \le N$ (assuming a relative degree of one for simplicity of presentation). The tracking error is defined as
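The lifted description (5)-(6) can be checked numerically against a direct simulation of (2). The following is a minimal sketch, assuming NumPy and a time-invariant plant; the function names `lifted_plant`, `initial_response` and `simulate` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def lifted_plant(A, B, C, N):
    """Build the (p*N) x (m*N) block lower-triangular Toeplitz matrix G of (6)."""
    m = B.shape[1]
    p = C.shape[0]
    # Markov parameters C A^j B for j = 0..N-1 (first block column of G)
    markov = []
    Aj_B = B.copy()
    for _ in range(N):
        markov.append(C @ Aj_B)
        Aj_B = A @ Aj_B
    G = np.zeros((p * N, m * N))
    for row in range(N):            # block row <-> output sample y(row + 1)
        for col in range(row + 1):  # block column <-> input sample u(col)
            G[row * p:(row + 1) * p, col * m:(col + 1) * m] = markov[row - col]
    return G

def initial_response(A, C, x0, N):
    """Stack CA x0, CA^2 x0, ..., CA^N x0 into the super-vector y0."""
    out = []
    x = x0.copy()
    for _ in range(N):
        x = A @ x
        out.append(C @ x)
    return np.concatenate(out)

def simulate(A, B, C, x0, u_seq):
    """Simulate (2) for inputs u(0..N-1) of shape (N, m); return stacked y(1..N)."""
    x = x0.copy()
    ys = []
    for u in u_seq:
        x = A @ x + B @ u
        ys.append(C @ x)
    return np.concatenate(ys)
```

For any input sequence `u_seq`, `simulate(A, B, C, x0, u_seq)` should agree with `initial_response(A, C, x0, N) + lifted_plant(A, B, C, N) @ u_seq.reshape(-1)` up to rounding. Forming $G$ explicitly is only convenient for small $N$; the text notes that $G$ is not needed in the final calculations.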

$$ e = r - y = r - Gu - y_0 = (r - y_0) - Gu \tag{7} $$

where $e$ and $r$ are super-vectors, defined analogously to $y$. Hence, without loss of generality, it is possible to replace $r$ by $r - y_0$ in the analysis and thence assume that $y_0 = 0$ or, equivalently, $x_0 = 0$.

It is clear that any Iterative Learning Control procedure, if convergent, solves the equation $r = G u_\infty$ for $u_\infty$. If $G$ is invertible, then the formal solution is just $u_\infty = G^{-1} r$. A basic assumption of the Iterative Learning Control paradigm is that the direct inversion of $G$ is not acceptable.

Inversion of a dynamical system is regarded as an impractical solution because it explicitly requires exact knowledge of the plant. This would make the approach sensitive to uncertainty and other disturbances. An estimated inverse could be used to calculate an initial guess for the input, but this input must then be improved on by the iterative learning procedure that uses a more or less exact model of the plant and can cope with disturbances.

The problem can easily be seen to be equivalent to the solution of the optimisation problem

$$ u_\infty = \arg\min_{u} \left\{ \|e\|^2 : e = r - y, \; y = Gu \right\} \tag{8} $$

This can be interpreted as a singular optimal control problem [6, 22] that, by its very nature, needs an iterative solution. This iterative solution is traditionally seen as a problem in numerical analysis but, in the context of this paper, it is seen as an experimental procedure. The difference between the two viewpoints is the fact that an experimental procedure has an implicit causality structure that is not naturally there in numerical computation. A formal definition of causality for Iterative Learning Control systems is as follows:

Definition 2 An Iterative Learning Control algorithm is causal if, and only if, the value of the input at time $t$ on the $(k+1)$th trial/experiment is computed only from data that is available from the $(k+1)$th trial in the time interval $[0, t]$ and from previous trials on the whole of the time interval $[0, N]$.

(Note: This process is not causal in the classical sense, as data from times $t' > t$ can be used, but only from previous trials.)

2.2 The Optimal Learning Algorithm

There is an infinity of potential iterative procedures to solve the optimization problem (8). The approach taken in this paper is to construct and describe in detail an algorithm with the following important properties:

1. Achieving a reduction of the norm of the error at each step,
2. Ensuring automatic choice of step size, and
3. Including the potential for improved robustness through the use of causal feedback of current trial data and feedforward of data from previous trials.

More precisely, the algorithm proposed here, on completion of the $k$th trial$^2$, calculates the control input on the $(k+1)$th trial as the solution of the minimum norm optimization problem

$$ u_{k+1} = \arg\min_{u_{k+1}} \left\{ J_{k+1}(u_{k+1}) : e_{k+1} = r - y_{k+1}, \; y_{k+1} = G u_{k+1} \right\} \tag{9} $$

where the “performance index” or optimality criterion used is defined to be

$$ J_{k+1}(u_{k+1}) = \|e_{k+1}\|^2_{\mathcal{Y}} + \|u_{k+1} - u_k\|^2_{\mathcal{U}} \tag{10} $$

where the norms $\|\cdot\|$ are the appropriate norms for the input and output spaces $\mathcal{U}$ and $\mathcal{Y}$, respectively. These spaces are $\ell_2$-spaces of $m$- and $p$-vectors on $[0, N-1]$ and $[1, N]$ or, isometrically equivalent, spaces of vectors with $mN$ and $pN$ real elements. Using the more familiar formulation where the norms have been written out as sums, the performance index is

$$ J_{k+1} = \sum_{t=1}^{N} [r(t) - y_{k+1}(t)]^T Q(t) [r(t) - y_{k+1}(t)] + \sum_{t=0}^{N-1} [u_{k+1}(t) - u_k(t)]^T R(t) [u_{k+1}(t) - u_k(t)] \tag{11} $$

The weight matrices $Q(t)$ and $R(t)$ must be symmetric and positive definite for all $t$. The cost function (10) is equivalent to (11) if the norms used are induced from the following inner products. Using a block-diagonal matrix $Q$ with $Q(t)$ on the diagonal and similarly a matrix $R$ with $R(t)$ on the diagonal, the definitions of the inner products $\langle \cdot, \cdot \rangle$ in $\mathcal{Y}$ and $\mathcal{U}$ are

$$ \langle y_1, y_2 \rangle_{\mathcal{Y}} = y_1^T Q y_2 = \sum_{t=1}^{N} y_1(t)^T Q(t) y_2(t) \tag{12} $$

$$ \langle u_1, u_2 \rangle_{\mathcal{U}} = u_1^T R u_2 = \sum_{t=0}^{N-1} u_1(t)^T R(t) u_2(t) \tag{13} $$

The initial control $u_0 \in \mathcal{U}$ can be arbitrary in theory but, in practice, it will be chosen to be a good first guess at the solution of the problem.

2 To identify signals from different trials, signals are indexed with the trial number as subscript.

The problem can be interpreted as the determination of the $(k+1)$th trial control input as an input that reduces the tracking error in an optimal way whilst not deviating too much from the control input used on the $k$th trial. The relative weighting of these two objectives is absorbed into the matrices $Q$ and $R$. The immediate benefits of the approach are apparent from the simple interlacing result that

$$ \|e_{k+1}\|^2 \le J_{k+1}(u_{k+1}) \le \|e_k\|^2 \qquad \forall k \ge 0 \tag{14} $$

which follows from optimality and the fact that the (non-optimal) choice of $u_{k+1} = u_k$ would lead to the relation $J_{k+1}(u_k) = \|e_k\|^2$. The result states that the algorithm is a descent algorithm, as the norm of the error is monotonically non-increasing in $k$, and $\|e_{k+1}\| = \|e_k\|$ implies that the algorithm has terminated, i.e. the input remains unchanged.

The controller on the $(k+1)$th trial is obtained with vector differential calculus from the required stationarity condition

$$ \frac{1}{2} \frac{\partial J_{k+1}}{\partial u_{k+1}} = -G^T Q e_{k+1} + R (u_{k+1} - u_k) = 0 \tag{15} $$

Since $R(t) > 0 \; \forall t$ guarantees the existence of the inverse, the optimal control input is

$$ u_{k+1} = u_k + R^{-1} G^T Q e_{k+1} \qquad \forall k \ge 0 \tag{16} $$

The learning controller $R^{-1} G^T Q$ is equivalent to the adjoint operator $G^*$ of $G$ with respect to the weighted inner products (12) and (13) [13]. One can consider $G^*$ as an abbreviation for $R^{-1} G^T Q$. This equation is the formal update relation for the proposed new Iterative Learning Control algorithm. It is only a “formal” update law because the transpose in this law implies that $u_{k+1}(t) - u_k(t)$ depends on values of $e_{k+1}(t')$ for $t < t' \le N$, just as $y(t)$ depends in (3) only on values of $u(t')$ for $0 \le t' < t$. The algorithm cannot therefore be implemented in this form. An equivalent form of this learning law that can actually be implemented in an algorithm is derived in a later section. Before this, general properties of this learning algorithm are addressed.

Using $e = r - Gu$ gives the tracking error update relation

$$ e_{k+1} = (I + G G^*)^{-1} e_k \qquad \forall k \ge 0 \tag{17} $$

and the recursive relation for the input evolution

$$ u_{k+1} = (I + G^* G)^{-1} (u_k + G^* r) \qquad \forall k \ge 0 \tag{18} $$

This last relationship is a form of Levenberg-Marquardt [14] or modified Newton iteration which is familiar in the context of numerical analysis [18], particularly the least-squares fitting of parameters appearing nonlinearly in models, but in this case it is used for a dynamical system.

The algorithm has a number of other useful properties. For example, monotonicity immediately gives the result that the following limits exist:

$$ \lim_{k \to \infty} \|e_k\|^2 = \lim_{k \to \infty} J_k(u_k) =: J_\infty \ge 0 \tag{19} $$

This implies furthermore that

$$ \lim_{k \to \infty} \|u_{k+1} - u_k\|^2 = 0 . \tag{20} $$

The existence of the limits suggests that the algorithm has a form of convergence property. The details are discussed in the next subsection.
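The input recursion (18) and the descent property (14) can be illustrated in lifted form. The sketch below assumes NumPy, constant symmetric positive definite weights supplied as full matrices `Q` and `R`, and that forming $G$ explicitly is acceptable for a small illustration; `norm_optimal_ilc` is a hypothetical name, and this is the formal (non-causal) update rather than the implementable form derived later.

```python
import numpy as np

def norm_optimal_ilc(G, Q, R, r, u0, trials):
    """Iterate u_{k+1} = (I + G*G)^{-1} (u_k + G* r) with G* = R^{-1} G^T Q.

    Returns the final input and the weighted squared error norms
    e_k^T Q e_k for k = 0..trials.
    """
    G_adj = np.linalg.solve(R, G.T @ Q)          # adjoint operator G* = R^{-1} G^T Q
    M = np.eye(G.shape[1]) + G_adj @ G           # I + G* G
    u = u0.copy()
    norms = []
    for _ in range(trials):
        e = r - G @ u                            # tracking error of the current trial
        norms.append(float(e @ Q @ e))
        u = np.linalg.solve(M, u + G_adj @ r)    # input update (18)
    e = r - G @ u
    norms.append(float(e @ Q @ e))
    return u, norms
```

By (14), the returned list `norms` should be non-increasing for any plant matrix `G` and any reference `r`; no step length has to be chosen.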

2.3 The Proof of Convergent Learning

The most interesting and important result regarding the convergence of the error sequence is as follows.

Proposition 1 (Convergence in norm to zero) If either $\ker G^T = \{0\}$ or $r \in \operatorname{range} G$, then the Iterative Learning Control tracking error sequence $\{e_k\}$ converges in norm to zero.

That is, if the plant is ‘regular’ (or was regularised, see appendix A) then the Iterative Learning Control algorithm has guaranteed convergence of learning.

Proof: It was shown above that $u_{k+1} - u_k \to 0$ in norm as $k \to \infty$ and hence

$$ 0 = \lim_{k \to \infty} (u_{k+1} - u_k) = \lim_{k \to \infty} R^{-1} G^T Q e_{k+1} \tag{21} $$

If $\ker G^T = \{0\}$, then there exists no non-zero $e$ such that $G^T e = 0$ and it follows that $\lim_{k \to \infty} e_{k+1} = 0$, noting that $Q$ and $R$ are nonsingular.

If $\ker G^T \ne \{0\}$ but $r \in \operatorname{range} G$, the proposition is proved using (19) by showing that $J_\infty = 0$ holds. Combining the learning law (16) with the cost criterion (10), it follows that

$$ J_k = e_k^T Q e_k + e_k^T Q G R^{-1} G^T Q e_k = \langle e_k, (I + G G^*) e_k \rangle \tag{22} $$

Using the error transition law (17), it is easy to show that $e_k = (I + G G^*)^{-k} e_0$ and therefore

$$ J_k = \langle e_k, e_{k-1} \rangle = \langle (I + G G^*)^{-k} e_0, (I + G G^*)^{k} e_{2k-1} \rangle = \langle e_0, e_{2k-1} \rangle . \tag{23} $$

If $r \in \operatorname{range} G$, there exists (at least) one $u^*$ such that $r = G u^*$. One can hence express $e_0 = r - G u_0$ as $e_0 = G (u^* - u_0)$ and get the result by substituting from (16):

$$ \lim_{k \to \infty} J_k = \lim_{k \to \infty} (u^* - u_0)^T G^T Q e_{2k-1} = \lim_{k \to \infty} (u^* - u_0)^T R (u_{2k-1} - u_{2k-2}) = 0 . \tag{24} $$

$\Box$

Note that a basic result from linear algebra states that $\ker G^T = \{0\}$ implies $\operatorname{range} G = \mathcal{Y}$, i.e. the plant operator $G$ is such that all vectors $y \in \mathcal{Y}$ have a pre-image in $\mathcal{U}$. Proposition 1 states in other words that the learning algorithm achieves a zero terminal error if it is possible at all. It is however a different matter how fast this limit error is reached, a question considered in the next section.

2.4 The Rate of Convergence

Proposition 1 considered the tracking error. The convergence of the input sequence $\{u_k\}$ is another issue. Firstly note that, from a mathematical point of view, even if the tracking error goes to zero, this does not imply convergence of the input sequence in $\mathcal{U}$ unless the mapping $G$ is one-to-one. The next proposition makes this more precise. Secondly, it is a very different question how rapidly the tracking error goes to zero. This question is also clarified in this section.

Proposition 2 (Convergence of the Input)

1. The sequence $\{u_k\}_{k \ge 0}$ has the property that

$$ \lim_{k \to \infty} \|G^* (r - G u_{k+1})\|_{\mathcal{U}} = 0 , \tag{25} $$

i.e. it minimizes in the limit the error in a least-squares sense.

2. If, moreover, $G^* G$ has an inverse with norm $1/\sigma^2$ in $\mathcal{U}$ or, equivalently, there exists a $\sigma^2 > 0$ such that

$$ \|G u\|^2_{\mathcal{Y}} \ge \sigma^2 \|u\|^2_{\mathcal{U}} \qquad \forall u \in \mathcal{U} \tag{26} $$

then the input sequence converges in norm to $u_\infty = (G^* G)^{-1} G^* r \in \mathcal{U}$. The convergence is bounded by a geometric relation of the form

$$ \|u_{k+1} - u_\infty\| \le \frac{1}{1 + \sigma^2} \|u_k - u_\infty\| \tag{27} $$

Proof:

1. It has been noted that the sequence $\{u_{k+1} - u_k\}_{k \ge 0}$ converges in norm to zero in $\mathcal{U}$. The limit (25) follows trivially from the identity $u_{k+1} - u_k = G^* e_{k+1} = G^* (r - G u_{k+1})$.

2. If $G^* G$ has an inverse, then the sequence $\{(G^* G)^{-1} (u_{k+1} - u_k)\} = \{(G^* G)^{-1} G^* r - u_{k+1}\}$ also converges to zero in $\mathcal{U}$ as required and the limit point of $u_k$ is $u_\infty = (G^* G)^{-1} G^* r$. The geometric bound can be proved starting from (26) with the string of inequalities:

$$ \|G u\|^2 \ge \sigma^2 \|u\|^2 \tag{28} $$

$$ \langle u, (I + G^* G) u \rangle \ge (1 + \sigma^2) \langle u, u \rangle \tag{29} $$

$$ \|u\| \, \|(I + G^* G) u\| \ge (1 + \sigma^2) \|u\|^2 \tag{30} $$

Let now $u = u_{k+1} - u_\infty \in \mathcal{U}$. Using the relationships $u_{k+1} = (I + G^* G)^{-1} (u_k + G^* r)$ and $u_\infty = (G^* G)^{-1} G^* r$, it follows that:

$$ u = u_{k+1} - u_\infty = (I + G^* G)^{-1} (u_k - u_\infty) \tag{31} $$

Inserting this into the above inequality finally yields the desired bound:

$$ \|u_k - u_\infty\| \ge (1 + \sigma^2) \|u_{k+1} - u_\infty\| \tag{32} $$

$\Box$

Corollary 3 (Rate of Convergence) If $G$ has an inverse with norm $1/\sigma$ then the error sequence converges exponentially with the following rate:

$$ \|e_{k+1}\| \le \frac{1}{1 + \sigma^2} \|e_k\| \tag{33} $$

The first part of the proposition shows that, in the limit, the input sequence minimizes the error in a least-squares sense, even if $G$ is singular or non-square. If it is assumed that $G$ is square and nonsingular, then it is guaranteed that there exists a scalar $\sigma > 0$ such that (26) holds. This number is known as the smallest singular value of $G$. It is evident from (6) that $\sigma$ is strictly greater than zero if and only if the terms on the diagonal of $G$ are nonsingular, as discussed in appendix A.

The second part of the proposition and the corollary make an even stronger point. They state that the algorithm achieves an exponential rate of convergence for a “regular” plant. For the practical use of the algorithm, it is very important to note that $\sigma$, and hence the rate of convergence (27) and (33), can be changed arbitrarily with the design weights $Q$ and $R$. This is evident if the definition of $\sigma$ is rewritten as the equivalent expression

$$ \sigma^2 \le \frac{u^T G^T Q G u}{u^T R u} \qquad \forall u \in \mathcal{U}, \; u \ne 0 . \tag{34} $$

If $R = \rho R_0$ where $R_0$ is fixed and the scalar $\rho$ is a variable parameter, it follows that $\sigma^2 = \sigma_0^2 / \rho$ where $\sigma_0$ is the smallest singular value corresponding to $R_0$. The parameter $\rho$ thus provides complete control over the convergence rate: the smaller $\rho$ is, the faster is the convergence rate of the input. Fig. 1 shows the dependence of the bound on the rate of convergence on $\rho$ for several values of $\sigma_0$. For example, to get a guaranteed reduction of the error of about $1/2$ at each step, $\rho$ should be chosen in the order of magnitude of $\sigma_0$. Simulation results suggest however that this may be overcautious, as the smallest singular value stems from a worst-case consideration and the convergence for typical reference signals is (at least initially) much faster than guaranteed by the bound (33). This control over the convergence rate is one of the biggest advantages of this algorithm and stands in contrast to other algorithms, such as [3, 21], where there is typically neither an exponential rate of convergence nor any possibility to increase the speed of convergence.

[Plot “Bound on Rate of Convergence”: the bound, labelled $\rho / (\rho + \sigma_0)$, on a vertical axis from 0 to 1 against $\rho$ on a logarithmic axis from $10^{-2}$ to $10^{2}$, with curves for $\sigma_0 = 0.01, 0.1, 1, 10, 100$.]

Figure 1: Plot of the bound of convergence as function of the design parameter $\rho$ for several values of $\sigma_0$
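The dependence of the guaranteed rate on the weights can be explored numerically. The following is a minimal sketch, assuming NumPy, a square lifted plant matrix `G`, and constant symmetric positive definite weights; `sigma_squared` and `rate_bound` are hypothetical helper names. It computes the largest $\sigma^2$ satisfying (34) as the smallest eigenvalue of a symmetric matrix and evaluates the bound $1/(1+\sigma^2)$ of (27) and (33) for $R = \rho R_0$.

```python
import numpy as np

def sigma_squared(G, Q, R):
    """Largest sigma^2 with u^T G^T Q G u >= sigma^2 u^T R u for all u, cf. (34)."""
    # With R = L L^T and v = L^T u, (34) becomes an ordinary symmetric eigenproblem
    # for S = L^{-1} G^T Q G L^{-T}.
    L = np.linalg.cholesky(R)
    Linv_Gt = np.linalg.solve(L, G.T)            # L^{-1} G^T
    S = Linv_Gt @ Q @ Linv_Gt.T
    return float(np.linalg.eigvalsh(S)[0])       # eigvalsh returns ascending eigenvalues

def rate_bound(G, Q, R0, rho):
    """Guaranteed contraction factor 1 / (1 + sigma^2) for the weight R = rho * R0."""
    return 1.0 / (1.0 + sigma_squared(G, Q, rho * R0))
```

Scaling $R_0$ by $\rho$ divides the value returned by `sigma_squared` by $\rho$, so `rate_bound` decreases as $\rho$ is reduced, in line with the discussion above. The bound is a worst-case figure; the observed reduction for a particular reference may be considerably better.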

3 The Practical Application of the Algorithm

3.1 The Causal Formulation of the System

As mentioned above, the algorithm in the form (16) is not causal in the usual sense, but only in the Iterative Learning Control sense as defined in section 2.1. It can, however, be converted into a computational procedure. This is shown next.

The optimal solution $u_{k+1}$ was found to be $u_{k+1} = u_k + G^* e_{k+1}$. The adjoint operator $G^*$ can be transformed into a conventional dynamical system of the form (2) by noting that the transpose contains the operation of time-reversal plus an appropriate change of the state-space parameters. The equation $u_{k+1} - u_k = R^{-1} G^T Q e_{k+1}$ containing the adjoint operator $G^*$ becomes the familiar costate system [5]:

$$ \begin{aligned} \psi_{k+1}(t) &= A^T \psi_{k+1}(t+1) + C^T Q(t+1) e_{k+1}(t+1), \qquad \psi_{k+1}(N) = 0 \\ u_{k+1}(t) &= u_k(t) + R^{-1}(t) B^T \psi_{k+1}(t), \qquad N > t \ge 0 \end{aligned} \tag{35} $$

This system has a terminal condition (at $t = N$) instead of an initial condition, marking it (as expected) as an anti-causal representation of the solution. It cannot therefore be implemented in this form, but a causal implementation can be found when assuming full state knowledge. The optimal control is transformed by writing for the costate

$$ \psi_{k+1}(t) = -K(t) \left( I + B R^{-1}(t) B^T K(t) \right)^{-1} A [x_{k+1}(t) - x_k(t)] + \xi_{k+1}(t) . \tag{36} $$

Standard techniques then yield the matrix gain $K(t)$ as the solution of the familiar discrete matrix Riccati equation on the interval $t \in [0, N-1]$:

$$ K(t) = A^T K(t+1) A + C^T Q(t+1) C - A^T K(t+1) B \left( B^T K(t+1) B + R(t+1) \right)^{-1} B^T K(t+1) A \tag{37} $$

with the terminal condition $K(N) = 0$. This equation is independent of the inputs, states and outputs of the system. In contrast, the predictive or feedforward term $\xi_{k+1}(t)$ is generated by

$$ \xi_{k+1}(t) = \left( I + K(t) B R^{-1}(t) B^T \right)^{-1} \left( A^T \xi_{k+1}(t+1) + C^T Q(t) e_k(t) \right), \qquad \xi_{k+1}(N) = 0 \tag{38} $$

The predictive term is hence driven by the tracking error on the previous (i.e. the $k$th) trial. With these terms, the input update law becomes

$$ u_{k+1}(t) = u_k(t) - \left( B^T K(t) B + R(t) \right)^{-1} B^T K(t) A [x_{k+1}(t) - x_k(t)] + R^{-1}(t) B^T \xi_{k+1}(t) . \tag{39} $$

This is hence a causal Iterative Learning Control algorithm consisting of current trial full state feedback combined with feedforward from the previous trial output tracking error data. This representation of the solution is causal because (37) and (38) can be solved off-line, between trials, by reverse time simulation using available previous trial data. For a time-invariant system, the matrix $K(t)$ for $0 \le t < N$ in fact needs to be computed only once before the sequence of trials begins. Fig. 2 shows a diagram of the closed loop in the proposed Iterative Learning Control algorithm. With this formulation, the storage of input, error and state of one trial is required.
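The statement that the adjoint operator is realised by the reverse-time costate system (35) can be checked without forming $G$. The sketch below assumes NumPy, a time-invariant plant and constant weights `Q` and `R`; the name `adjoint_update` and the variable `psi` for the costate are choices made here for illustration. It evaluates the anti-causal form (35) only, not the causal Riccati realisation (36)-(39).

```python
import numpy as np

def adjoint_update(A, B, C, Q, R, e):
    """Return du(t) = R^{-1} B^T psi(t) for t = 0..N-1 from the costate system (35).

    e has shape (N, p) with e[t - 1] holding the error sample e(t), t = 1..N.
    """
    N = e.shape[0]
    psi = np.zeros(A.shape[0])                   # terminal condition psi(N) = 0
    du = np.zeros((N, B.shape[1]))
    for t in range(N - 1, -1, -1):               # reverse-time sweep t = N-1, ..., 0
        psi = A.T @ psi + C.T @ Q @ e[t]         # e[t] stores e(t + 1)
        du[t] = np.linalg.solve(R, B.T @ psi)    # u_{k+1}(t) - u_k(t)
    return du
```

Stacking the rows of the result should reproduce $R^{-1} G^T Q e$ computed with the block-diagonal weights and the matrix $G$ of (6), at a cost that grows linearly in $N$ instead of quadratically.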

[Block diagram: the optimal learning controller, consisting of a memory that provides the signals from the last trial, an optimal predictor and an optimal feedback block, in closed loop with the plant $G$.]

Figure 2: Block diagram of proposed Iterative Learning Control algorithm

3.2 Simulation Examples

In this section, two simulation examples are shown. The first one illustrates how the rate of convergence can be arbitrarily changed with this algorithm and the second demonstrates robustness with respect to nonlinearities of the plant.

The first example is similar to an example in [10]. The plant represents the linearised model of a single-link robot manipulator with a link of mass $m$, length $l$ and friction coefficient $v$. The linearised, sampled plant is

$$ x(t+h) = \begin{bmatrix} 0 & 1 \\ \dfrac{vh}{ml^2} - 1 & 2 - \dfrac{vh}{ml^2} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ \dfrac{h^2}{ml^2} \end{bmatrix} u(t) \tag{40} $$

$$ y(t) = \begin{bmatrix} 0 & 1 \end{bmatrix} x(t) \tag{41} $$

with parameters $v = 0.8$ kg m$^2$/sec, $l = 0.8$ m, $m = 1.5$ kg and sampling period $h = 0.01$ sec. The desired output trajectory is $r(t) = t^3 (4 - 0.3t) h$ in the interval $[0, 10]$, corresponding to $N = 1000$. The weight matrices are set to $Q(t) = R_0(t) = 1$. Two simulations are shown where only the parameter $\rho$ is changed. Fig. 3 shows the output and input from ten trials for $\rho = 10$ and Fig. 4 shows the same graphs for $\rho = 1$. As explained in section 2.4, the rate of convergence for a smaller value of $\rho$ is faster and this is clearly visible from the simulations. Both the outputs $y_k(t)$ and inputs $u_k(t)$ converge nearly immediately to good precision to the desired output and input, respectively, in Fig. 4 where $\rho$ is smaller. It is also evident from the norm of the errors: for $\rho = 10$, the norm of the error after ten trials is $\|e_{10}\|^2 = 2.15$ and for $\rho = 1$ it is $\|e_{10}\|^2 = 0.207$. Note that full state knowledge was assumed in the simulations. This would mean that measurements of position and velocity of the robot link need to be available.
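The effect of the parameter $\rho$ in this example can be explored with a short lifted-form simulation. The following is a rough sketch, assuming NumPy and several things not stated in the text: the plant matrices are taken as $A = \begin{bmatrix} 0 & 1 \\ vh/(ml^2) - 1 & 2 - vh/(ml^2) \end{bmatrix}$, $B = [0, \; h^2/(ml^2)]^T$, $C = [0 \; 1]$, the initial input is $u_0 = 0$, the reference is sampled at $t = h, 2h, \ldots, Nh$, and the formal update (18) is applied directly instead of the causal Riccati implementation. The function names are hypothetical, and the sketch is not expected to reproduce the reported error norms exactly.

```python
import numpy as np

def robot_plant(v=0.8, l=0.8, m=1.5, h=0.01):
    """Assumed reading of the sampled single-link manipulator model (40)-(41)."""
    a = v * h / (m * l**2)
    A = np.array([[0.0, 1.0], [a - 1.0, 2.0 - a]])
    B = np.array([[0.0], [h**2 / (m * l**2)]])
    C = np.array([[0.0, 1.0]])
    return A, B, C

def impulse_matrix(A, B, C, N):
    """Lower-triangular Toeplitz matrix G of a SISO plant, cf. (6)."""
    g = np.empty(N)
    x = B[:, 0].copy()
    for j in range(N):
        g[j] = (C @ x)[0]                        # Markov parameter C A^j B
        x = A @ x
    G = np.zeros((N, N))
    for j in range(N):
        G[j:, j] = g[:N - j]                     # column j is the shifted impulse response
    return G

def run_trials(rho, N=1000, h=0.01, trials=10):
    """Return ||e_k||^2 for k = 0..trials with Q = 1 and R = rho (R0 = 1)."""
    A, B, C = robot_plant(h=h)
    G = impulse_matrix(A, B, C, N)
    t = h * np.arange(1, N + 1)
    r = t**3 * (4.0 - 0.3 * t) * h               # reference trajectory
    M = np.eye(N) + G.T @ G / rho                # I + G* G with G* = G^T / rho
    u = np.zeros(N)                              # assumed initial input u_0 = 0
    errs = []
    for _ in range(trials + 1):
        e = r - G @ u
        errs.append(float(e @ e))
        u = np.linalg.solve(M, u + G.T @ r / rho)
    return errs
```

Calling `run_trials(10.0)` and `run_trials(1.0)` gives two sequences of squared error norms; under the stated assumptions both are non-increasing, and the sequence for the smaller $\rho$ lies below the other from the first trial onwards.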

[Two panels against time in seconds over $[0, 10]$: “Output and Reference” showing $y(t)$ and $r(t)$, and “Input” showing $u(t)$, with curves labelled by trial number.]

Figure 3: Simulation of the proposed algorithm for plant (41) with $\rho = 10$

[Two panels against time in seconds over $[0, 10]$: “Output and Reference” showing $y(t)$ and $r(t)$, and “Input” showing $u(t)$, with curves labelled by trial number.]

Figure 4: Simulation of the proposed algorithm for plant (41) with $\rho = 1$

The plant of the second example is the same as in [15]. The nonlinear model is as follows:

$$ y(t+1) = \frac{y(t) \, y(t-1) \, (y(t) + 2.5)}{1 + y(t)^2 + y(t-1)^2} + u(t) \tag{42} $$

and the desired output is

$$ r(t) = \begin{cases} t/50 & \text{for } 0 \le t \le 50 \\ (50 - t)/50 & \text{for } 50 < t \le 100 . \end{cases} \tag{43} $$

For the application of the learning algorithm, the plant is linearised around the stationary point $\bar{y} = 0.5$, $\bar{u} = 0$, leading to the linear approximation

$$ \hat{y}(t+1) = \tfrac{5}{6} \hat{y}(t) + \tfrac{2}{3} \hat{y}(t-1) + u(t) \tag{44} $$

with $\hat{y}(t) = y(t) - \bar{y}$. The linearised plant is used for the calculation of the Riccati gain $K(t)$ and the feedforward term $\xi_k(t)$ of (38). In the actual trials, the nonlinear plant is used and its state is observed with a state-estimator. The weight matrices are chosen to be $Q(t) = 1$ and $R(t) = 0.5$. Fig. 5 shows the result of the simulations. It is evident that the proposed method achieves good convergence even for this nonlinear plant.

Simulations for a range of first to fourth order plants have been carried out. They showed generally that good convergence can be obtained. In nearly all cases, good control over the $\ell_2$ norm of $e_k$ is achieved using variation of the single parameter $\rho$. Only non-minimum phase plants showed a slower convergence. The causes for this phenomenon were studied in [1], where it is indicated that non-minimum phase properties always slow down this form of algorithm, but convergence is still maintained. In the simulations, full state knowledge was assumed. If the state is not available, it must be estimated by an observer or, alternatively, output feedback schemes can be used. Overall, the new Iterative Learning Control algorithm was found to be highly successful.
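The coefficients of the linear approximation (44) can be confirmed by differentiating the nonlinear model numerically. The sketch below uses plain Python and assumes the model (42) reads $y(t+1) = y(t) y(t-1) (y(t) + 2.5) / (1 + y(t)^2 + y(t-1)^2) + u(t)$; the function names `plant` and `linearise` are hypothetical.

```python
def plant(y_now, y_prev, u):
    """One step of the nonlinear plant, read as (42)."""
    return y_now * y_prev * (y_now + 2.5) / (1.0 + y_now**2 + y_prev**2) + u

def linearise(y_bar=0.5, u_bar=0.0, eps=1e-6):
    """Central-difference partial derivatives of the plant at the stationary point."""
    d_now = (plant(y_bar + eps, y_bar, u_bar) - plant(y_bar - eps, y_bar, u_bar)) / (2 * eps)
    d_prev = (plant(y_bar, y_bar + eps, u_bar) - plant(y_bar, y_bar - eps, u_bar)) / (2 * eps)
    d_u = (plant(y_bar, y_bar, u_bar + eps) - plant(y_bar, y_bar, u_bar - eps)) / (2 * eps)
    return d_now, d_prev, d_u
```

Evaluating `plant(0.5, 0.5, 0.0)` returns 0.5, so the chosen point is indeed stationary, and `linearise()` returns values close to $5/6$, $2/3$ and $1$, matching the coefficients of $\hat{y}(t)$, $\hat{y}(t-1)$ and $u(t)$ in (44).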

[Four panels: “Outputs and Reference” showing $y(t)$ and $r(t)$ against time, “Errors” showing $e(t)$ against time, “Inputs” showing $u(t)$ against time, and “Performance criterion” on a logarithmic axis against the trial index.]

Figure 5: Optimal Iterative Learning Control applied to the nonlinear system from [15]. Linetypes: trials 1 and 10: solid, 2: dashed, 3: dotted, 6: dot-dashed, $r(t)$: marked with +.

4 Conclusions

This paper has proposed an Iterative Learning Control algorithm based on optimization principles and has provided a complete convergence analysis of the algorithm. The formal form of the results indicates the need to transform the representation of the solution into a causal representation for Iterative Learning Control implementation. This transformation has been provided. It was shown that the causal representation is a combination of feedback (of current trial data) and feedforward (of previous trial data).

A formal examination of general robustness issues was not included, but the experience in numerical analysis with the related Levenberg-Marquardt method and results of Iterative Learning Control simulations indicate that the algorithm possesses robustness to a useful degree. This is a topic of present research.

One specific advantage of the proposed algorithm and its interpretation as a linear quadratic tracking and disturbance accommodation problem is that the rate of decrease of the error can be influenced in a natural and intuitive way by several design parameters (weighting matrices in quadratic costs). This benefit is not available in other optimal descent methods, e.g. the steepest descent method in [8]. This is largely due to the simultaneous selection of step direction and step length of the proposed algorithm as compared to the steepest descent method where these are computed sequentially.

Issues which this paper did not address and possible topics of future research include questions of robustness, an extension for nonlinear plants and the possibility of improvement of the rate of convergence through more complicated optimality criteria.

A Regularity Conditions on the Plant

It is assumed in Theorem 1 that the kernel of $G^T$ is zero, i.e. that there exists no non-zero $y$ such that $G^T y = 0$. In terms of linear algebra, the assumption is that $G^T$ is monic, or equivalently, that $G$ is epic [23]. This assumption is easily related to the state-space parameters of the plant by considering (6).

In the SISO case, $G$ (and $G^T$) is square and non-singular if and only if $CB \neq 0$. The above assumption is therefore a restriction on the relative degree of the plant. Such an assumption is quite usual in the Iterative Learning Control literature [3, 15]. The physical situation is that if $CB = 0$, then the first output $y(1)$ cannot be affected by the first input $u(0)$. The first input only has an effect on later outputs $y(t)$, $t > 1$. Also, the last input $u(N-1)$ has no effect on any output during the trial. Because of this, $y(1)$ and $u(N-1)$ can be removed from the definitions of $y$ and $u$. Furthermore, if the relative degree of the system is $d$, i.e. if $CA^{i-1}B = 0$ for $0 < i < d$ and $CA^{d-1}B \neq 0$, then the first $d-1$ elements of $y$ and the last $d-1$ elements of $u$ must be removed. This ensures that $G$ remains square and becomes nonsingular.

In the MIMO case, $G^T$ can only be monic if there are at least as many inputs as outputs. If $\ker G^T = 0$, it is guaranteed that for each vector $r(t)$ there exists (at least) one input $u(t)$ that generates it. If $G$ is square and regular, then there exists exactly one input for each output. These "regularity" conditions are strongly related to the inverse of the plant. The procedure to make $G$ regular in the MIMO case is similar to Silverman's inversion algorithm [20]: as above, the first and last elements of $y$ and $u$, respectively, are omitted. Now, however, only some components of $y(1)$ and $u(N-1)$ must be removed, according to the rank of $CB$. The precise details are omitted because they do not form part of the main issue of this paper. Reference [21] gives some details about this problem in Iterative Learning Control for continuous time systems.

Finally, for time-varying systems, regularity of $G$ means additionally that the system does not change its relative degree during a trial. For example, if the system has relative degree 1 at time $t = 1$, then one must assume that $C(t)B(t-1) \neq 0$ for all $t \in [1, N]$.
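The SISO argument above can be checked numerically. The following is a minimal sketch, assuming zero initial state and a lifted matrix in which row $t$ of $G$ maps $u(0), \ldots, u(N-1)$ to $y(t)$, so that the entries of $G$ are the Markov parameters $CA^{k}B$. The helper names `lifted_plant` and `trim_for_relative_degree` and the numerical plant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lifted_plant(A, B, C, N):
    """Hypothetical helper: N x N lifted matrix G of a SISO plant (zero initial state).

    Row i corresponds to y(i+1) and column j to u(j), so G[i, j] = C A^(i-j) B for j <= i.
    """
    markov = []
    Ak = np.eye(A.shape[0])
    for _ in range(N):
        markov.append((C @ Ak @ B).item())  # Markov parameter C A^k B, k = 0..N-1
        Ak = A @ Ak
    G = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1):
            G[i, j] = markov[i - j]
    return G

def trim_for_relative_degree(G, d):
    """Drop the first d-1 outputs (rows) and the last d-1 inputs (columns)."""
    N = G.shape[0]
    return G[d - 1:, :N - d + 1]

# Illustrative plant (an assumption for this sketch, not from the paper).
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
N = 5

C1 = np.array([[1.0, 1.0]])  # CB = 1: relative degree 1
C2 = np.array([[1.0, 0.0]])  # CB = 0 and CAB = 1: relative degree 2

G1 = lifted_plant(A, B, C1, N)
G2 = lifted_plant(A, B, C2, N)
G2_trimmed = trim_for_relative_degree(G2, d=2)

print("rank G1:", np.linalg.matrix_rank(G1))
print("rank G2:", np.linalg.matrix_rank(G2))
print("trimmed shape:", G2_trimmed.shape, "det:", np.linalg.det(G2_trimmed))
```

With $CB \neq 0$ the matrix `G1` has full rank. With $CB = 0$ the diagonal of `G2` vanishes and the matrix is singular, while removing $y(1)$ and $u(N-1)$ leaves a square lower-triangular matrix with $CAB$ on its diagonal, which is nonsingular.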

References

[1] N. Amann and D. H. Owens. Non-minimum phase plants in iterative learning control. In Proc. 2nd Int. Conf. on Intelligent Systems Engineering, pages 107–112, Hamburg-Harburg, 1994.
[2] N. Amann, D. H. Owens, and E. Rogers. Iterative learning control using optimal feedback and feedforward actions. To appear in Int. J. Control, 1996.


[3] S. Arimoto, S. Kawamura, and F. Miyazaki. Bettering operation of dynamic systems by learning: a new control theory for servomechanism or mechatronic systems. In Proc. 23rd IEEE Conf. on Decision and Control, pages 1064–1069, Las Vegas, Nevada, 1984.
[4] K. J. Åström and B. Wittenmark. Computer controlled systems. Prentice Hall, Englewood Cliffs, N.J., second edition, 1990.
[5] M. Athans and P. L. Falb. Optimal Control. McGraw-Hill, New York, 1966.
[6] D. J. Bell and D. H. Jacobson. Singular Optimal Control Problems. Academic Press, New York, 1975.
[7] K. Buchheit, M. Pandit, and M. Befort. Optimal iterative learning control of an extrusion plant. In Proc. IEE Int. Conf. Control '94, pages 652–657, Coventry, 1994.
[8] K. Furuta and M. Yamakita. The design of a learning control system for multivariable systems. In Proc. IEEE Int. Symp. on Intelligent Control, pages 371–376, Philadelphia, Pennsylvania, 1987.
[9] Z. Geng, R. Carroll, and J. Xie. Two-dimensional model and algorithm analysis for a class of iterative learning control systems. Int. J. Control, 52(4):833–862, 1990.
[10] D.-H. Hwang, Z. Bien, and S.-R. Oh. Iterative learning control method for discrete-time dynamic systems. IEE Proceedings, Pt. D, 138(2):139–144, 1991.
[11] T. Ishihara, K. Abe, and H. Takeda. A discrete-time design of robust iterative learning controllers. IEEE Trans. on Systems, Man and Cybernetics, 22(1):74–84, 1992.
[12] T. Kailath. Linear Systems. Prentice Hall, Englewood Cliffs, N.J., 1980.
[13] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, New York, 1969.
[14] D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math., 11(2):431–441, 1963.
[15] K. L. Moore. Iterative Learning Control for Deterministic Systems. Advances in Industrial Control Series. Springer-Verlag, London, 1993.
[16] D. H. Owens. Iterative learning control – convergence using high gain feedback. In Proc. 31st IEEE Conf. on Decision and Control, pages 2545–2546, Tucson, AZ, 1992.
[17] D. H. Owens. 2D Systems theory and iterative learning control. In Proc. 2nd European Control Conf., pages 1506–1509, Groningen, 1993.
[18] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. The Art of Scientific Computing. Cambridge University Press, Cambridge, second edition, 1992.
[19] E. Rogers and D. H. Owens. Stability Analysis for Linear Repetitive Processes, volume 175 of Lecture Notes in Control and Information Sciences. Springer-Verlag, Berlin, 1992.
[20] L. M. Silverman. Inversion of multivariable linear systems. IEEE Trans. on Automatic Control, AC-14(3):270–276, 1969.

[21] T. Sugie and T. Ono. An iterative learning control law for dynamical systems. Automatica, 27(4):729–732, 1991.
[22] J. C. Willems, A. Kitapçi, and L. M. Silverman. Singular optimal control: A geometric approach. SIAM J. Control and Optimization, 24(2):323–337, 1986.
[23] W. M. Wonham. Linear Multivariable Control: a Geometric Approach, volume 10 of Applications of Mathematics. Springer-Verlag, New York, second edition, 1979.
