

COMMUNICATIONS IN INFORMATION AND SYSTEMS Vol. 2, No. 1, pp. 69-90, June 2002


ADAPTIVE CONTROL OF DISCRETE-TIME NONLINEAR SYSTEMS COMBINING NONPARAMETRIC AND PARAMETRIC ESTIMATORS∗

BRUNO PORTIER†

Abstract. In this paper, a new adaptive control law combining nonparametric and parametric estimators is proposed to control stochastic $d$-dimensional discrete-time nonlinear models of the form $X_{n+1} = f(X_n) + U_n + \varepsilon_{n+1}$. The unknown function $f$ is assumed to be parametric outside a given domain of $\mathbb{R}^d$ and fully nonparametric inside it. The nonparametric part of $f$ is estimated using a kernel-based method and the parametric part using the weighted least squares estimator. The asymptotic optimality of the tracking is established, together with some convergence results for the estimators of $f$.

Keywords. Adaptive tracking control; Kernel-based estimation; Nonlinear model; Stochastic systems; Weighted least squares estimator.

1. Introduction. In a recent paper, [Portier and Oulidi 2000] consider the problem of adaptive control of stochastic $d$-dimensional discrete-time nonlinear systems of the form ($d \in \mathbb{N}$):

(1) $X_{n+1} = f(X_n) + U_n + \varepsilon_{n+1}$

where $X_n$, $U_n$ and $\varepsilon_n$ are the output, input and noise of the system, respectively. The state $X_n$ is observed, the function $f$ is unknown, $\varepsilon_n$ is an unobservable noise, and the control $U_n$ is to be chosen so as to track a given deterministic reference trajectory $(X^*_n)_{n \ge 1}$. To meet this objective, Portier and Oulidi introduce an adaptive control law built on a kernel-based nonparametric estimator (NPE for short) $\widehat f_n$ of the function $f$. Following the certainty-equivalence principle, the desired control is given by

(2) $U_n = -\widehat f_n(X_n) + X^*_{n+1}$

However, to compensate for the possible lack of observations, which disrupts the NPE, some a priori knowledge about the function $f$ is required. In a recent paper, [Xie and Guo 2000] study scalar models of the form (1) and prove that, without any a priori knowledge about $f$ and estimating it with a nearest-neighbours method, only weakly explosive open-loop models (typically $f$ such that $|f(x) - f(y)| \le (3/2 + \sqrt{2})\,|x - y| + c$) can be stabilized by a feedback adaptive control law. Portier and Oulidi model the needed a priori knowledge by a known continuous function $\tilde f$ satisfying

(3) $\forall x \in \mathbb{R}^d, \quad \|f(x) - \tilde f(x)\| \le a_f\|x\| + A_f$ for some $a_f \in [0, 1/2[$ and $A_f < \infty$.

The adaptive tracking control is then given by

(4) $U_n = -\widehat f_n(X_n)\,\mathbf 1_{E_n}(X_n) - \tilde f(X_n)\,\mathbf 1_{\overline{E}_n}(X_n) + X^*_{n+1}$

where $E_n = \{x \in \mathbb{R}^d : \|\widehat f_n(x) - \tilde f(x)\| \le b_f\|x\| + B_f\}$ with $b_f \in\, ]a_f, 1 - a_f[$ and $B_f > A_f$, and $\overline{E}_n$ denotes the complement of $E_n$. From a theoretical point of view, the control law (4) combined with (3) ensures the global stability of the closed-loop system, which is the key point for obtaining the uniform almost sure convergence of the NPE $\widehat f_n$ over dilating sets of $\mathbb{R}^d$ and, from there, the tracking optimality. From a practical point of view, the knowledge of a function $\tilde f$ satisfying (3) plays a crucial role in the transient behaviour of the closed-loop model. Indeed, while $f$ is not yet well estimated by $\widehat f_n$, the control law (2) cannot always stabilize the process around the reference trajectory, and if the model is very unstable in open loop, the process can explode. In that case, the controller needs information that allows it to bring the process back around the reference trajectory. This information, given by $\tilde f$, is crucial since, thanks to condition (3), it ensures that the model driven by the control $U_n = -\tilde f(X_n) + X^*_{n+1}$ is globally stable. However, this scheme suffers from some drawbacks: the function $\tilde f$ can be unavailable or not well known, the set $E_n$ can be difficult to interpret, and, from a theoretical point of view, the asymptotic results require the uniform almost sure convergence of the NPE over dilating sets, which is obtained for a well-suited noise $\varepsilon$ and leads to slow convergence rates. The contribution of this paper is to provide an alternative way to handle a priori knowledge. We replace the set $E_n$ by a fixed, more explicit domain $\mathcal D$ of $\mathbb{R}^d$ containing the reference trajectory $(X^*_n)$. To cope with the inability of the NPE (due to its local nature) to deliver accurate information when only a few observations are available, we propose to use some parametric a priori knowledge about the function $f$ outside $\mathcal D$ (for example, linearity).

∗ Received on December 3, 2001; accepted for publication on June 11, 2002.
† Laboratoire de Mathématique – U.M.R. C 8628, “Probabilités, Statistique et Modélisation”, Université Paris-Sud, Bât. 425, 91405 Orsay Cedex, France, and IUT Paris, Université René Descartes, 143 av. de Versailles, 75016 Paris. E-mail: [email protected]
This a priori knowledge is not modelled by a given fixed function such as $\tilde f$ in the previous scheme; to give more flexibility, it depends on an unknown parameter to be estimated. The objective is to design a control law that brings the state back to $\mathcal D$ after an excursion outside $\mathcal D$. A convenient theoretical framework would consist in assuming that, outside $\mathcal D$, the function $f$ is approximately of the given parametric form. Nevertheless, for technical reasons, this framework cannot be addressed here. We shall see later that the stability result obtained in this paper is largely due to the ability of the adaptive controller based on the parametric estimator to stabilize the closed-loop system, and that it requires the exact knowledge of the parametric structure of $f$. Hence, this work should be considered as a first step towards the study of nonlinear stochastic systems using adaptive control laws that combine nonparametric and parametric estimators. We suppose that, outside a given domain $\mathcal D$, the function $f$ is of the form $f(x) = \theta^t x$, where $\theta$ is an unknown $d \times d$ matrix. The new adaptive control law is then of the form:

(5) $U_n = -\widehat f_n(X_n)\,\mathbf 1_{\mathcal D}(X_n) - \widehat\theta_n^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n) + X^*_{n+1}$

where $\widehat\theta_n$ denotes the parametric estimator of $\theta$ and $\overline{\mathcal D}$ the complement of $\mathcal D$. The resulting closed-loop model is given by

(6) $X_{n+1} - X^*_{n+1} = \big(f(X_n) - \widehat f_n(X_n)\big)\mathbf 1_{\mathcal D}(X_n) + \big(\theta - \widehat\theta_n\big)^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n) + \varepsilon_{n+1}$

From a theoretical point of view, only the almost sure uniform convergence of the NPE on fixed compact sets of $\mathbb{R}^d$ is now required, and poorer noises can be considered. However, to ensure the global stability of the closed-loop model (6), and then derive the uniform convergence of the NPE, we need good properties for the prediction errors associated with the parametric estimator. For this reason, we focus on the weighted least squares estimator (Bercu and Duflo, 1992), which is well suited to this purpose and for which convergence results were previously established.

Now, let us make some comments about adaptive control of discrete-time stochastic systems, which has been intensively studied during the past three decades. For linear models (ARX and ARMAX), the problem of adaptive tracking has been completely solved using both a slight modification of the extended least squares algorithm (Guo and Chen, 1991; Guo, 1994) and the weighted least squares algorithm (Bercu, 1995, 1998; Guo, 1996). For nonlinear systems, several authors have proposed interesting methods; neural-network-based methods, for example, have been increasingly used (Narendra and Parthasarathy, 1990; Chen and Khalil, 1995; Jagannathan et al., 1996). However, to our knowledge, no theoretical results are available to validate these approaches. More recently, [Guo 1997] examines the global stability of a class of discrete-time nonlinear models that are linear in the parameters but nonlinear in the output dynamics. He proves the global stability of the closed-loop system when the growth rate of the nonlinear function does not exceed that of a polynomial of degree < 4, the unknown parameter being estimated by the least squares estimator.
In addition, in the scalar case, he exhibits a counter-example showing that the closed-loop model is unstable if the degree is ≥ 4, even if the least squares estimator converges to the true parameter value. In a more recent work, [Bercu and Portier 2002] examine similar models and solve the problem of adaptive tracking; several convergence results for the least squares estimator are also provided. Adaptive control laws using nonparametric estimators have not been studied in depth. For a stable open-loop model of the form (1), [Duflo 1997] (see also Portier and Oulidi, 2000) proposes an asymptotically optimal adaptive tracking control law using persistent excitation.


When the control law (4) is used, [Poggi and Portier 2000, 2001] give several other statistical results, such as a pointwise central limit theorem for $\widehat f_n$ and a global and a local test for the linearity of $f$. Finally, let us mention that an adaptive control law using a nonparametric estimator has already been applied in a real-world application: [Hilgert et al. 2000] use such an approach to regulate the output gas flow rate of an anaerobic digestion process by adapting the liquid flow rate of an influent of industrial wine distillery wastewater.

The paper is organized as follows. In Section 2, we specify the model assumptions, the different estimators and the control law. Section 3 is devoted to the theoretical results (the proofs are postponed to the appendices). Finally, Section 4 contains an illustration by simulations. Our simulations, carried out for one simple real-valued model, indicate that the asymptotic results give a good approximation for moderate sample sizes.

2. Framework and assumptions. This section is devoted to the model assumptions, the definition of the different estimators and the adaptive control law.

2.1. Model assumptions. Let us denote $\mathcal D = \{x \in \mathbb{R}^d : \|x\| \le D\}$, where $D$ is a known positive constant and $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^d$. Let us consider model (1), where the initial conditions $X_0$ and $U_0$ are arbitrarily chosen and where the function $f$ satisfies the following hypothesis.

Assumption [A1]. The function $f$ is continuous and, for all $x \notin \mathcal D$, $f(x) = \theta^t x$, where $\theta$ is an unknown $d \times d$ matrix.

Remark 1. Under some convenient assumptions, the extension to $f(x) = \theta^t \varphi(x)$ can be handled, but this framework requires some specific proofs and is beyond the scope of the paper.

The noise $\varepsilon$ will satisfy either

Assumption [A2]. The noise $\varepsilon = (\varepsilon_n)_{n \ge 1}$ is a bounded martingale difference sequence with $\mathbb E\big[\varepsilon_{n+1}\varepsilon_{n+1}^t \mid \mathcal F_n\big] = \Gamma$, where $\Gamma$ is an invertible matrix and $\mathcal F_n$ is the $\sigma$-algebra generated by the events occurring up to time $n$.
or

Assumption [A2bis]. The noise $\varepsilon = (\varepsilon_n)_{n \ge 1}$ is a sequence of $d$-dimensional, independent and identically distributed random vectors, with zero mean and invertible covariance matrix $\Gamma$. Its distribution is absolutely continuous with respect to the Lebesgue measure, with a probability density function (p.d.f. for short) $p$ of class $C^1$ with compact support; $p$ and its gradient are bounded.

Assumption [A2] is not usual in the context of nonparametric estimation, where one usually considers [A2bis] without assuming that $\varepsilon$ is bounded. The boundedness of $\varepsilon$, which is not very restrictive from a practical point of view, is required here to ensure the boundedness of the NPE.


2.2. Nonparametric estimator of $f$. As in [Portier and Oulidi 2000], the unknown function $f$ is estimated using a recursive kernel-based estimator. For $x \in \mathbb{R}^d$, $f(x)$ is estimated by $\widehat f_n(x)$ defined by

(7) $\widehat f_n(x) = \dfrac{\sum_{i=1}^{n-1} i^{\alpha d} K\big(i^{\alpha}(X_i - x)\big)\,(X_{i+1} - U_i)}{\sum_{i=1}^{n-1} i^{\alpha d} K\big(i^{\alpha}(X_i - x)\big)}$

and $\widehat f_n(x) = 0$ if the denominator in (7) is equal to 0. The real number $\alpha$, called the bandwidth parameter, lies in $]0, 1/d[$, and $K$, called the kernel, is a probability density function satisfying the following assumption.

Assumption [A3]. $K : \mathbb{R}^d \to \mathbb{R}_+$ is a positive Lipschitz function with compact support, integrating to 1.

Nonparametric estimation is extensively studied and widely used in the time series context. Comprehensive surveys of density and regression function estimation can be found in [Silverman 1986] and [Härdle 1990], respectively.

2.3. Parametric estimator of $\theta$. It is well known in linear adaptive control that the choice of the parameter estimation algorithm is crucial and essentially depends on the control objective: identifying the model or solving the tracking problem. In this paper, we use the weighted least squares (WLS for short) estimator introduced by [Bercu and Duflo 1992]. This choice is governed by the properties of the prediction errors, which are simpler to manage and allow us to study the global stability of the closed-loop model easily. Let us mention that the stochastic gradient estimator proposed by [Goodwin et al. 1981] is also well suited for solving the tracking problem, but owing to its lack of consistency it is not convenient for identifying the function $f$ outside $\mathcal D$. The construction of the WLS estimator differs slightly from the usual one, since only the observations lying outside $\mathcal D$ are used for the updating. The WLS estimator $\widehat\theta_n$ is defined by

(8) $\widehat\theta_{n+1} = \widehat\theta_n + a_n S_n^{-1}(a)\,X_n \mathbf 1_{\overline{\mathcal D}}(X_n)\big(X_{n+1} - U_n - \widehat\theta_n^t X_n \mathbf 1_{\overline{\mathcal D}}(X_n)\big)^t$

(9) $S_n(a) = \sum_{k=0}^{n} a_k X_k X_k^t\,\mathbf 1_{\overline{\mathcal D}}(X_k) + S_{-1}$,

where $S_{-1}$ is a deterministic, symmetric and positive definite matrix. The initial value $\widehat\theta_0$ is arbitrarily chosen. The weighting sequence $(a_n)$ is chosen following [Bercu and Duflo 1992] and [Bercu 1995], i.e. $a_n = (\log d_n)^{-(1+\epsilon)}$ for some $\epsilon > 0$, where

(10) $d_n = \sum_{k=0}^{n} \|X_k\|^2\,\mathbf 1_{\overline{\mathcal D}}(X_k) + d_{-1}$, with $d_{-1} > 0$.
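For experimentation, the two estimators of Sections 2.2 and 2.3 can be sketched in the scalar case $d = 1$. This is a minimal illustration under stated assumptions, not the author's implementation: the triangular kernel, the helper names `npe` and `ScalarWLS`, and the default initial values $S_{-1} = 1$ and $d_{-1} = e$ (chosen by us so that $\log d_n \ge 1$ and $a_n$ is well defined) are hypothetical choices.

```python
import math


def kernel(u):
    """Triangular kernel: Lipschitz, compactly supported, integrates to 1 (hypothetical choice, cf. [A3])."""
    return max(0.0, 1.0 - abs(u))


def npe(x, X, U, alpha):
    """Recursive kernel estimator (7) of f(x) for d = 1.

    X = [X_0, ..., X_n], U = [U_0, ..., U_{n-1}]; the sums run over i = 1, ..., n - 1.
    """
    num = den = 0.0
    for i in range(1, len(X) - 1):
        w = i ** alpha * kernel(i ** alpha * (X[i] - x))  # i^{alpha d} K(i^alpha (X_i - x))
        num += w * (X[i + 1] - U[i])
        den += w
    return num / den if den > 0 else 0.0  # (7) sets the estimate to 0 when the denominator vanishes


class ScalarWLS:
    """WLS recursion (8)-(10) for d = 1, updated only by observations outside D = [-D, D]."""

    def __init__(self, D, theta0=0.0, S_init=1.0, d_init=math.e, eps=0.1):
        self.D = D
        self.theta = theta0  # initial value hat theta_0
        self.S = S_init      # S_{-1} > 0
        self.d = d_init      # d_{-1} > 0; e keeps log d_n >= 1 (our choice)
        self.eps = eps       # epsilon in a_n = (log d_n)^{-(1 + epsilon)}

    def update(self, x_n, u_n, x_next):
        if abs(x_n) <= self.D:  # X_n in D: the indicator vanishes, nothing is updated
            return self.theta
        self.d += x_n ** 2                           # d_n, eq. (10)
        a_n = math.log(self.d) ** (-(1 + self.eps))  # weight a_n
        self.S += a_n * x_n ** 2                     # S_n(a), eq. (9)
        innovation = x_next - u_n - self.theta * x_n
        self.theta += a_n * x_n * innovation / self.S  # eq. (8)
        return self.theta
```

For instance, feeding `ScalarWLS.update` with observations outside $\mathcal D$ generated by $X_{n+1} = 1.4\,X_n + U_n$ drives `theta` towards 1.4, while `npe` returns a weighted average of the past values $X_{i+1} - U_i$ observed near $x$.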

2.4. Control law. In order to solve the tracking problem, we introduce an excited adaptive control law based on the certainty-equivalence principle (Åström and Wittenmark, 1989). Adding an excitation noise is necessary to obtain the uniform strong consistency of $\widehat f_n$ when the noise $\varepsilon$ is bounded (Duflo, 1997; Portier and Oulidi, 2000). A similar persistently excited control is used in the ARMAX framework to obtain the consistency of the extended least squares estimator (Caines, 1985). Let $(X^*_n)_{n \ge 1}$ be a given bounded deterministic tracking trajectory, let $(\gamma_n)_{n \ge 1}$ be a sequence of positive real numbers decreasing to 0, and let $\eta = (\eta_n)_{n \ge 1}$ be a white noise. The excited adaptive tracking control is given by

(11) $U_n = X^*_{n+1} - \widehat f_n(X_n)\,\mathbf 1_{\mathcal D}(X_n) - \widehat\theta_n^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n) + \gamma_{n+1}\eta_{n+1}$

where $\widehat f_n(x)$ is the kernel-based estimator of $f(x)$ and $\widehat\theta_n$ is the WLS estimator of $\theta$. The tracking trajectory $(X^*_n)_{n \ge 1}$, the vanishing sequence $(\gamma_n)_{n \ge 1}$ and the exciting noise $\eta$ have to be chosen so that the following assumptions are satisfied.

Assumptions [A4].
– The tracking trajectory $(X^*_n)_{n \ge 1}$ converges to a finite limit $x^* \in \mathcal D$;
– The sequence $(\gamma_n)_{n \ge 1}$ is such that $\gamma_n^{-1} = O\big((\log n)^a\big)$ for some $a > 0$;
– The noise $\eta = (\eta_n)_{n \ge 1}$ is a sequence of $d$-dimensional, independent and identically distributed random vectors with zero mean and a finite moment of order 2, also independent of $X_0$ and $\varepsilon$. The distribution of $\eta$ is absolutely continuous with respect to the Lebesgue measure, with a probability density function $q > 0$ of class $C^1$; $q$ and its gradient are bounded.

The choice of $\gamma_n$ and $\eta_n$ governs the convergence rates of the NPE and of the tracking. A short discussion of that choice is given in the following section.

3. Theoretical results.

3.1. Stability of the closed-loop model. Let us now present the theoretical results. The first one states that the control law (11), built with the NPE $\widehat f_n$ and the WLS estimator, stabilizes the closed-loop model.

Theorem 3.1. Assume that [A1] to [A4] hold. Then the closed-loop model is globally stable, that is,

(12) $\sum_{k=1}^{n} \|X_k\|^2 = O(n)$ a.s.

Moreover, the parametric prediction errors satisfy

(13) $\sum_{k=1}^{n} \big\|(\widehat\theta_k - \theta)^t X_k\big\|^2\,\mathbf 1_{\overline{\mathcal D}}(X_k) = o(n)$ a.s.

Proof. The proof is given in Appendix A. Of course, results (12) and (13) still hold if we assume [A2bis] instead of [A2]. The global stability (12) is the key point for proving convergence results for the NPE. Result (13) indicates that the parametric prediction errors behave well; it will be useful for proving the asymptotic optimality of the tracking.


3.2. Optimality of the tracking. To establish the tracking optimality, we need a uniform convergence result for the NPE, and the main difficulty of the proof is to show that the denominator of $\widehat f_n(x)$ is strictly positive on any compact set of $\mathbb{R}^d$. This point is usually easy to establish when the distribution of $\varepsilon$ is absolutely continuous with respect to the Lebesgue measure with a p.d.f. $p > 0$. However, as $\varepsilon$ is assumed to be bounded, we use the excitation noise $\eta$ and its p.d.f. $q > 0$ to ensure that the denominator of $\widehat f_n(x)$ remains strictly positive on any compact set of $\mathbb{R}^d$. Because of the vanishing sequence $(\gamma_n)$, a condition linking $(\gamma_n)$ and the decay of $q$ is now required.

Assumption [A5]. The p.d.f. $q$ of the noise $(\eta_n)$ and the vanishing sequence $(\gamma_n)$ are such that there exists a sequence of positive real numbers $(\delta_n)_{n \ge 1}$ decreasing to 0, with $\delta_n^{-1} = O\big((\log n)^b\big)$ for some $b > 0$, satisfying, for any $B < \infty$ and any $n \ge 1$,

(14) $\gamma_n^{-d} \inf_{\|z\| \le B} q(\gamma_n^{-1} z) \ge c\,\delta_n$

where $c$ is a positive constant.

Remark 2. By choosing well-suited $\eta$ and $(\gamma_n)$, it is always possible to find a sequence $(\delta_n)$ satisfying condition (14). For example, in the case $d = 1$, let us choose $\eta$ such that its p.d.f. $q$ satisfies $q(x) \ge \mathrm{cte}/(1 + x^4)$. Then condition (14) holds with $\delta_n = \gamma_n^3$: indeed, for $\gamma_n \le 1$ and $|z| \le B$, $\gamma_n^{-1} q(\gamma_n^{-1} z) \ge \mathrm{cte}\,\gamma_n^3/(\gamma_n^4 + B^4) \ge \mathrm{cte}\,\gamma_n^3/(1 + B^4)$.

Theorem 3.2. Assume that [A1] to [A5] hold. Then, for $\alpha < 1/(2d)$, $\widehat f_n$ converges uniformly almost surely to $f$: for any $A < \infty$,

(15) $\sup_{\|x\| \le A} \big\|\widehat f_n(x) - f(x)\big\| = o\left(\dfrac{n^{-\alpha}\gamma_n^{-d}}{\delta_n}\right) + O\left(\dfrac{n^{\beta - 1}}{\delta_n}\right)$ a.s.,

where $\beta \in\, ]1/2 + \alpha d, 1[$. Moreover, the tracking is asymptotically optimal, i.e.

(16) $\dfrac{1}{n}\sum_{k=1}^{n} \|X_k - X^*_k\|^2 \xrightarrow[n\to\infty]{\text{a.s.}} \operatorname{trace}(\Gamma)$,

and $\widehat\Gamma_n = \dfrac{1}{n}\sum_{k=1}^{n} (X_k - X^*_k)(X_k - X^*_k)^t$ is a strongly consistent estimator of $\Gamma$.

Proof. The proof is given in Appendix B.

Remark 3. If assumption [A2] is replaced by [A2bis], the term $n^{-\alpha}\gamma_n^{-d}$ reduces to $n^{-\alpha}$ (see Remark B.1 in Appendix B). In that case, the convergence rate given by (15) is slower than the one obtained in a nonadaptive context by [Duflo 1997] or [Senoussi 1991, Senoussi 2000]. The loss is due to the term $\delta_n$, which comes from Assumption [A5] ($\delta_n \equiv 1$ in Duflo and Senoussi).

Remark 4. Starting from result (B.20) of Appendix B, which means that

$\sum_{k=1}^{n} \big\|f(X_k) + U_k - X^*_{k+1}\big\|^2 = o(n)$ a.s.


where $U_k$ is given by (11), we can obtain other interesting statistical results following the work of [Poggi and Portier 2000]. More precisely, under assumptions [A1], [A2bis] and [A3] to [A5], a multivariate pointwise central limit theorem for $\widehat f_n(x)$ and a test for the linearity of $f$ can be derived.

3.3. A supplementary result about the WLS estimator. As expected, the consistency of the parametric estimator is not required to establish the tracking optimality. Nevertheless, if we are interested in estimating the model outside $\mathcal D$, it is possible to obtain some convergence results for the WLS estimator, provided the noise $\varepsilon$ satisfies [A2bis] instead of [A2].

Theorem 3.3. Assume that [A1], [A2bis] and [A3] to [A5] hold. Assume also that $L = \mathbb E\big[(\varepsilon_1 + x^*)(\varepsilon_1 + x^*)^t\,\mathbf 1_{\overline{\mathcal D}}(\varepsilon_1 + x^*)\big]$ is invertible. Then (13) is improved to

(17) $\sum_{k=1}^{n} \big\|(\widehat\theta_k - \theta)^t X_k\big\|^2\,\mathbf 1_{\overline{\mathcal D}}(X_k) = o\big((\log n)^{1+\epsilon}\big)$ a.s.

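where $\epsilon$ is given by the weighting sequence $(a_n)_{n \ge 1}$. In addition, we have

(18) $\big\|\widehat\theta_n - \theta\big\|^2 = O\left(\dfrac{(\log n)^{1+\epsilon}}{n}\right)$ a.s.,

(19) $\sqrt{n}\,\big(\widehat\theta_n - \theta\big) \xrightarrow[n\to\infty]{\mathcal L} \mathcal N\big(0, L^{-1} \otimes \Gamma\big)$.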

Proof. The proof is given in Appendix C. These convergence results for the WLS estimator hold when the support of the noise $\varepsilon$ is large enough to guarantee that the process visits $\overline{\mathcal D}$ sufficiently often, even when it is stabilized around $x^*$. Let us also mention that, as expected, the asymptotic variance of $\widehat\theta_n$ is larger than the one obtained if model (1) were fully parametric. Indeed, in that case, the matrix $L$ equals $\Gamma + x^*(x^*)^t$, leading to a smaller asymptotic covariance matrix.

4. Simulation experiments. Since only asymptotic results are available, in this section we illustrate the behaviour of the adaptive control law (11) on realizations of moderate sample size. We focus on the quality of the tracking as well as on the behaviour of the different estimates. Let us examine the following real-valued simulated nonlinear model:

(20) $X_{n+1} = \big(1.4 + 0.5\sin(X_n/3)\exp\big(-(X_n - 118)^2/50\big)\big)X_n + U_n + \varepsilon_{n+1}$

with $\varepsilon_n \sim \mathcal N(0, 2^2)$, $X_0 = 5$ and $U_0 = 0$. This model, although unrealistic, is interesting because identification is difficult and the open loop is very explosive.


Moreover, it satisfies the assumptions of the theoretical results established above. Let us denote by $f$ the real-valued function defined by $f(x) = 1.4\,x + 0.5\sin(x/3)\exp\big(-(x - 118)^2/50\big)\,x$. The graph of $f$ is given by the dotted line in Fig. 3. This function is linear for large $x$ and highly nonlinear for small $x$. Here, the parameter $\theta$ equals 1.4. The domain $\mathcal D$ is defined by $\mathcal D = \{x \in \mathbb{R} : |x| \le 260\}$ and contains the tracking trajectory, defined by $X^*_n = x^* - (x^* - X^*_0)\exp(-\tau n)$ with $\tau = -(1/100)\log(0.05)$, $X^*_0 = 20$ and $x^* = 113$. This kind of tracking trajectory is common; it is such that the deviation between $X^*_n$ and $x^*$ is 5% of its initial value when $n = 100$.

For the nonparametric estimation of $f$, we take the bandwidth parameter $\alpha = 1/2$ and use the Gaussian kernel, with the usual normalization by the estimated standard deviation of the process. However, this choice is not relevant once the process is stabilized at $x^*$, because during the transient phase of the tracking, the poor initial estimation of $f(x)$ often keeps the process far from the tracking trajectory. Therefore, the normalization must be slightly modified: we compute the standard deviation of the process using only the most recent observations, namely $X_{n-51}, \dots, X_n$, which yields a slightly modified version of the NPE (7). This choice of normalization plays a crucial role since it allows the estimator to forget the transient phase, whereas the original normalization leads to a large empirical standard deviation and hence to a kernel-based estimator with too wide a bandwidth.

Let us describe the updating scheme of the estimates $\widehat f_n$ and $\widehat\theta_n$.
• At time $n = 0$, the initial state $X_0$ lies in $\mathcal D$. Let $n_0$ be the first $n$ such that $X_n \notin \mathcal D$. Until $n_0$, we update both $\widehat f_n$ and $\widehat\theta_n$. Updating $\widehat\theta_n$ gives an approximate idea of the true parameter value.
• Afterwards, for all $n \ge n_0$, if $X_n \in \mathcal D$, only $\widehat f_n$ is updated; otherwise, only $\widehat\theta_n$ is updated.

The preliminary estimation of $\theta$ should accelerate the convergence of $\widehat\theta_n$.

Let us now comment on the results obtained. The study is based on 200 realizations of length $n = 200$. We can distinguish two kinds of realizations: those for which the process $X$ takes one or two values outside $\mathcal D$ (83%) and those for which it does not leave $\mathcal D$ (17%). In the first case, as can be seen for one realization (Fig. 1 to Fig. 4), the tracking is good until the nonlinear part of the model generates the observations (Fig. 1). Since the function $f$ is not yet well estimated in the zone of the nonlinearity, the controller cannot stabilize the process at the reference trajectory. Later on, the process takes a value outside $\mathcal D$. Since the WLS estimator is close to the true parameter value (Fig. 4), the controller can bring the process back within $\mathcal D$ (Fig. 1). This situation occurs once or twice. Afterwards, the process remains within $\mathcal D$, the estimation of $f$ improves steadily, and the controller stabilizes the process at $x^*$ and finally meets the control objective: the quantity $(1/750)\sum_{k=251}^{1000}(X_k - X^*_k)^2$ equals 4.15, to be compared with the noise variance, equal to 4. As already observed by [Poggi and Portier 2000], Fig. 2 shows that the control effort is moderate on the time interval $[0, 100]$, since there the open-loop system is close to a linear system that is easy to control. The control effort is then very high, since the open-loop system is locally highly unstable, leading to the control burden (large slope). This behaviour is nevertheless as expected. In Fig. 3, one can appreciate the quality of the functional estimation of $f$ on $[0, 125]$, which explains the good tracking performance. The function $f$ is not well estimated on $[125, 200]$ because very few observations fall there.

[Fig. 1: panel "Tracking", y-axis "Process X(n)". The process $X_n$ superimposed with the tracking trajectory.]
[Fig. 2: panel "Adaptive Control Law", y-axis "Control U(n)". The corresponding adaptive tracking control $U_n$.]
[Fig. 3: panel "Estimation based on 1000 observations", y-axis "f and its NPE". The true function (dotted line) superimposed with its NPE.]
[Fig. 4: panel "SG Estimator". Parametric estimation.]
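The reference trajectory and the function $f$ of this section are easy to reproduce. The sketch below (plain Python; all names are ours) checks the 5% property at $n = 100$ and, as a purely hypothetical benchmark that is not the adaptive law (11), simulates (20) under an oracle certainty-equivalence controller that knows the true $f$. With this oracle, $X_{n+1} - X^*_{n+1} = \varepsilon_{n+1}$ exactly, so the criterion $(1/750)\sum_{k=251}^{1000}(X_k - X^*_k)^2$ fluctuates around the noise variance 4, which is the level the reported value 4.15 should be compared with. The oracle also replaces the initial control $U_0 = 0$ by $X^*_1 - f(X_0)$, a simplification of ours.

```python
import math
import random

X0_STAR, X_STAR = 20.0, 113.0          # X*_0 and x*
TAU = -math.log(0.05) / 100            # tau = -(1/100) log(0.05)


def reference(n):
    """Tracking trajectory X*_n = x* - (x* - X*_0) exp(-tau n)."""
    return X_STAR - (X_STAR - X0_STAR) * math.exp(-TAU * n)


def f(x):
    """Function f of model (20)."""
    return (1.4 + 0.5 * math.sin(x / 3) * math.exp(-(x - 118) ** 2 / 50)) * x


def oracle_tracking_cost(n_steps=1000, burn_in=250, sigma=2.0, seed=0):
    """Simulate (20) under the hypothetical oracle control U_n = X*_{n+1} - f(X_n).

    Returns (1/750) * sum_{k=251}^{1000} (X_k - X*_k)^2 for the default arguments.
    """
    rng = random.Random(seed)
    x = 5.0                                     # X_0 = 5
    total = 0.0
    for n in range(n_steps):
        u = reference(n + 1) - f(x)             # certainty equivalence with the TRUE f
        x = f(x) + u + rng.gauss(0.0, sigma)    # X_{n+1}, model (20), eps ~ N(0, 2^2)
        if n + 1 > burn_in:
            total += (x - reference(n + 1)) ** 2
    return total / (n_steps - burn_in)
```

The oracle cannot be implemented in practice since $f$ is unknown; it only indicates the best achievable tracking cost against which the adaptive controller is judged.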

[Fig. 5: panel "Tracking", y-axis "Process X(n)". The process $X_n$ superimposed with the tracking trajectory.]
[Fig. 6: panel "Adaptive Control Law", y-axis "Control U(n)". The corresponding adaptive tracking control $U_n$.]
[Fig. 7: panel "Estimation based on 1000 observations", y-axis "f and its NPE". The true function (dotted line) superimposed with its NPE.]
[Fig. 8: panel "SG Estimator". Parametric estimation.]

In the second case (Fig. 5 to Fig. 8), let us only mention that the tracking is better (Fig. 5) and that, since the process does not leave $\mathcal D$, the parameter $\theta$ is not well estimated (Fig. 8).

Some notation. Let us specify some notation used in the rest of the paper. Let $\mathbb F = (\mathcal F_n)$ be the nondecreasing sequence of $\sigma$-algebras of events occurring up to time $n$. If $(M_n)$ is a square-integrable vector martingale adapted to $\mathbb F$, its increasing process is the predictable, increasing sequence of positive semi-definite matrices defined by

$\langle M\rangle_n = \sum_{k=1}^{n} \mathbb E\big[(M_k - M_{k-1})(M_k - M_{k-1})^t \mid \mathcal F_{k-1}\big]$, where $M_0 = 0$.
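The increasing process is the quantity that the martingale arguments of the appendices control. As a small illustration under our own assumptions (a scalar martingale transform whose predictable weight depends on the sign of $M_{k-1}$, driven by uniform noise on $[-1, 1]$, which is bounded as in [A2]), the sketch below accumulates $\langle M\rangle_n$ along each path and checks by Monte Carlo that $\mathbb E[M_n^2] = \mathbb E[\langle M\rangle_n]$.

```python
import random


def path(n, rng):
    """One path of M_n = sum_{k<=n} phi_{k-1} eps_k and of its increasing process <M>_n.

    phi_{k-1} only depends on M_{k-1} (predictable); eps_k is uniform on [-1, 1],
    so E[eps_k^2 | F_{k-1}] = 1/3.
    """
    var = 1.0 / 3.0
    m, bracket = 0.0, 0.0                  # M_0 = 0 and <M>_0 = 0
    for _ in range(n):
        phi = 1.0 if m >= 0 else 2.0       # hypothetical predictable weight
        bracket += phi ** 2 * var          # E[(M_k - M_{k-1})^2 | F_{k-1}]
        m += phi * rng.uniform(-1.0, 1.0)
    return m, bracket


def monte_carlo(n=50, reps=20000, seed=1):
    """Monte Carlo estimates of E[M_n^2] and E[<M>_n]."""
    rng = random.Random(seed)
    sum_sq = sum_bracket = 0.0
    for _ in range(reps):
        m, b = path(n, rng)
        sum_sq += m * m
        sum_bracket += b
    return sum_sq / reps, sum_bracket / reps
```

Unlike $M_n^2$, the bracket $\langle M\rangle_n$ is computed from the past only, which is what makes it usable in laws of large numbers such as those invoked in Appendix B.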


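Let us now define the prediction errors to be considered. The nonparametric prediction error $\pi_n(f)$ is defined by $\pi_n(f) = -\big(\widehat f_n(X_n) - f(X_n)\big)\mathbf 1_{\mathcal D}(X_n)$, and the parametric one $\pi_n(\theta)$ by $\pi_n(\theta) = -\big(\widehat\theta_n - \theta\big)^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n) = -\widetilde\theta_n^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n)$, where $\widetilde\theta_n = \widehat\theta_n - \theta$.

Appendix A: Proof of Theorem 3.1. By substituting (11) into (1) and using [A1], which gives $f(X_n) = f(X_n)\mathbf 1_{\mathcal D}(X_n) + \theta^t X_n \mathbf 1_{\overline{\mathcal D}}(X_n)$, we obtain

(A.1) $X_{n+1} = \pi_n + X^*_{n+1} + \gamma_{n+1}\eta_{n+1} + \varepsilon_{n+1}$

where $\pi_n$ is the global prediction error defined by $\pi_n = \pi_n(f) + \pi_n(\theta)$. Let us denote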

(A.2) $s_n = \sum_{k=0}^{n} \|X_k\|^2 + s_{-1}$, where $s_{-1} > \max\big(d_{-1}, \operatorname{trace}(S_{-1})\big)$.

By the strong law of large numbers, we easily prove that $n = O(s_n)$ a.s., which implies that $s_n \to \infty$ a.s. (see Duflo, 1997, Corollary 1.3.25, p. 28). Now, let us show that

(A.3) $\sum_{n=1}^{\infty} \|\pi_n(\theta)\|^2 / d_n < \infty$ a.s.

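Since $X_{n+1} - U_n = f(X_n)\mathbf 1_{\mathcal D}(X_n) + \theta^t X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n) + \varepsilon_{n+1}$, equation (8) can be rewritten as

(A.4) $\widetilde\theta_{n+1} = \widetilde\theta_n + a_n S_n^{-1}(a)\,X_n \mathbf 1_{\overline{\mathcal D}}(X_n)\big(\pi_n(\theta) + \varepsilon_{n+1}\big)^t$.

Setting $v_{n+1} = \operatorname{trace}\big(\widetilde\theta_{n+1}^t S_n(a)\,\widetilde\theta_{n+1}\big)$, we have

$v_{n+1} = v_n - a_n\big(1 - f_n(a)\big)\|\pi_n(\theta)\|^2 + a_n f_n(a)\|\varepsilon_{n+1}\|^2 - 2a_n\big(1 - f_n(a)\big)\langle \pi_n(\theta), \varepsilon_{n+1}\rangle$,

where $f_n(a) = a_n X_n^t S_n^{-1}(a) X_n\,\mathbf 1_{\overline{\mathcal D}}(X_n)$. Then, as $\sum_{n \ge 1} a_n f_n(a) < \infty$, we derive, by proceeding as in the proof of Theorem 1 of [Bercu 1995], that

(A.5) $\sum_{n \ge 1} a_n\big(1 - f_n(a)\big)\|\pi_n(\theta)\|^2 < \infty$ a.s.

and

(A.6) $\big\|S_n^{1/2}(a)\,\widetilde\theta_{n+1}\big\|^2 = O(1)$ a.s.

Finally, as $a_n\big(1 - f_n(a)\big) \ge (a_n^{-1} + d_n)^{-1} \ge (2d_n)^{-1}$ for large $n$, we derive from (A.5) that the WLS estimator satisfies (A.3).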
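The algebra behind (A.4) to (A.6) can be checked numerically in the scalar case $d = 1$, where $S_n(a)$, $\widetilde\theta_n$ and $\pi_n(\theta)$ are numbers. The sketch below is our own illustration, not part of the proof: it runs the recursion (8) to (10) on hypothetical data ($\theta = 1.4$, $X_n$ uniform on $[-5, 5]$, $\varepsilon_{n+1}$ uniform on $[-1, 1]$, $\mathcal D = [-1, 1]$, $S_{-1} = 1$, $d_{-1} = e$) and, at each update, compares $v_{n+1}$ with the right-hand side of its recursion, checks the scalar identity $a_n(1 - f_n(a)) = (a_n^{-1} + X_n^2/S_{n-1}(a))^{-1}$, and checks the gain bounds $a_n(1 - f_n(a)) \ge (a_n^{-1} + d_n)^{-1} \ge (2d_n)^{-1}$.

```python
import math
import random


def check_wls_identities(steps=500, D=1.0, eps=0.1, seed=3):
    """Scalar check of the v_{n+1} recursion and of the gain bounds used for (A.3).

    Returns (largest relative error found, whether all gain bounds held).
    """
    rng = random.Random(seed)
    theta, theta_hat = 1.4, 0.0
    S, d = 1.0, math.e                       # S_{-1} = 1, d_{-1} = e (hypothetical)
    worst, bounds_ok = 0.0, True
    for _ in range(steps):
        x = rng.uniform(-5.0, 5.0)           # X_n
        noise = rng.uniform(-1.0, 1.0)       # eps_{n+1}
        if abs(x) <= D:                      # X_n in D: no update, v_{n+1} = v_n
            continue
        tilde = theta_hat - theta            # tilde theta_n
        v_n = tilde ** 2 * S                 # v_n = tilde_n^2 S_{n-1}(a)
        pi = -tilde * x                      # pi_n(theta)
        d += x ** 2                          # d_n, eq. (10)
        a = math.log(d) ** (-(1 + eps))      # a_n
        S_prev, S = S, S + a * x ** 2        # S_{n-1}(a) -> S_n(a), eq. (9)
        f_n = a * x ** 2 / S                 # f_n(a)
        theta_hat += a * x * (pi + noise) / S  # (A.4), i.e. eq. (8)
        v_next = (theta_hat - theta) ** 2 * S
        rhs = (v_n - a * (1 - f_n) * pi ** 2 + a * f_n * noise ** 2
               - 2 * a * (1 - f_n) * pi * noise)
        gain = a * (1 - f_n)
        worst = max(worst,
                    abs(v_next - rhs) / max(1.0, abs(v_next)),
                    abs(gain - 1 / (1 / a + x ** 2 / S_prev)))
        bounds_ok = bounds_ok and gain >= 1 / (1 / a + d) >= 1 / (2 * d)
    return worst, bounds_ok
```

In this scalar setting the first gain bound uses $X_n^2/S_{n-1}(a) \le X_n^2 \le d_n$, which holds because $S_{-1} = 1$; the second one uses $a_n^{-1} = (\log d_n)^{1+\epsilon} \le d_n$.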


Therefore, as $s_n \ge d_n$ and $s_n$ increases to infinity a.s., we infer from (A.3) and Kronecker's lemma that

(A.7) $\sum_{k=1}^{n} \|\pi_k(\theta)\|^2 = o(s_n)$ a.s.

Now, to close the proof, let us show that $s_n = O(n)$ a.s. First, Lemma B.1 of [Portier and Oulidi 2000] ensures that, for all $x \in \mathbb{R}^d$ and all $n \ge 1$,

$\big\|\widehat f_n(x) - f(x)\big\| \le c_f + \|f(x)\| + \sup_{k \le n}\|\varepsilon_k\|$ a.s.

In addition, since $(\varepsilon_n)$ is bounded and $f$ is continuous, it follows easily that $\pi_n(f)$ is almost surely bounded. Then, starting from (A.1), there exists a finite constant $M_1$ such that $\|X_{n+1}\|^2 \le 8\big(\|\pi_n(\theta)\|^2 + \|\eta_{n+1}\|^2 + M_1\big)$ a.s., and therefore

(A.8) $s_{n+1} - s_1 \le 8\Big(\sum_{k=1}^{n} \|\pi_k(\theta)\|^2 + \sum_{k=1}^{n} \|\eta_{k+1}\|^2 + n M_1\Big)$ a.s.

Furthermore, as $\eta$ is independent and identically distributed (i.i.d. for short) with a finite moment of order 2,

(A.9) $\sum_{k=1}^{n} \|\eta_{k+1}\|^2 = O(n)$ a.s.

Using (A.7), we deduce from (A.8) that $s_n = o(s_n) + O(n)$, leading to $s_n = O(n)$ a.s., which establishes (12). We also deduce from (A.7) that

(A.10) $\sum_{k=1}^{n} \|\pi_k(\theta)\|^2 = o(n)$ a.s.

which gives (13). This last result will be useful for proving the optimality of the tracking (see Appendix B).

Appendix B: Proof of Theorem 3.2. The convergence results for the kernel-based estimator $\widehat f_n$ are now well known, following the work of [Duflo 1997], [Senoussi 2000] and [Portier and Oulidi 2000], and this proof follows the same scheme. Nevertheless, as $(\varepsilon_n)$ is not, as usual, a sequence of i.i.d. random vectors with a probability density function, some adaptations of the proof are required. Therefore, to keep the paper self-contained, the main technical points are recalled and some of them are detailed where necessary. Starting from (7), let us rewrite $\widehat f_n(x) - f(x)$ in the form

(B.1) $\widehat f_n(x) - f(x) = \dfrac{M^{\varepsilon}_n(x) + R_{n-1}(x)}{H_{n-1}(x)}\,\mathbf 1_{\{H_{n-1}(x) \ne 0\}} - f(x)\,\mathbf 1_{\{H_{n-1}(x) = 0\}}$

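where

$M^{\varepsilon}_n(x) = \sum_{i=1}^{n-1} i^{\alpha d} K\big(i^{\alpha}(X_i - x)\big)\,\varepsilon_{i+1}$,

$R_{n-1}(x) = \sum_{i=1}^{n-1} i^{\alpha d} K\big(i^{\alpha}(X_i - x)\big)\big(f(X_i) - f(x)\big)$,

$H_{n-1}(x) = \sum_{i=1}^{n-1} i^{\alpha d} K\big(i^{\alpha}(X_i - x)\big)$.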
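Decomposition (B.1) is purely algebraic: it only uses $X_{i+1} - U_i = f(X_i) + \varepsilon_{i+1}$ from model (1). It can be checked numerically in the scalar case. In the sketch below, the function $f$ (slope 1.4 outside $[-2, 2]$, cubic inside, hence continuous), the triangular kernel, the inputs and the uniform noise are hypothetical choices made only for the illustration.

```python
import random


def kernel(u):
    return max(0.0, 1.0 - abs(u))          # triangular kernel (hypothetical, satisfies [A3])


def f(x):
    # hypothetical continuous f: slope 1.4 outside [-2, 2], cubic inside (0.35 * 2^3 = 2.8)
    return 1.4 * x if abs(x) > 2.0 else 0.35 * x ** 3


def decomposition_gap(n=300, x=0.3, alpha=0.2, seed=7):
    """Return |(hat f_n(x) - f(x)) - RHS of (B.1)| and H_{n-1}(x), for d = 1."""
    rng = random.Random(seed)
    X, U, E = [0.0], [], [0.0]             # E[i] holds eps_i (E[0] is unused)
    for i in range(n):
        U.append(-f(X[i]) + rng.uniform(-1.0, 1.0))  # arbitrary inputs keeping X bounded
        E.append(rng.uniform(-1.0, 1.0))
        X.append(f(X[i]) + U[i] + E[i + 1])         # model (1)
    num = M = R = H = 0.0
    for i in range(1, n):                  # i = 1, ..., n - 1
        w = i ** alpha * kernel(i ** alpha * (X[i] - x))
        num += w * (X[i + 1] - U[i])       # numerator of (7)
        M += w * E[i + 1]                  # M^eps_n(x)
        R += w * (f(X[i]) - f(x))          # R_{n-1}(x)
        H += w                             # H_{n-1}(x)
    lhs = (num / H if H > 0 else 0.0) - f(x)
    rhs = (M + R) / H if H > 0 else -f(x)  # the two branches of (B.1)
    return abs(lhs - rhs), H
```

The gap is only floating-point rounding; the analytic work of the proof lies in bounding $M^{\varepsilon}_n$, $R_{n-1}$ and $H_{n-1}$ separately.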

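Following the well-known arguments, let us now study the convergence of $M^{\varepsilon}_n(x)$, $R_n(x)$ and $H_n(x)$.

Study of $M^{\varepsilon}_n(x)$. For $x \in \mathbb{R}^d$ and $n \ge 1$, $M^{\varepsilon}_n(x)$ is a square-integrable martingale adapted to $\mathbb F = (\mathcal F_n)_{n \ge 0}$, where $\mathcal F_n = \sigma(X_0, U_0, \varepsilon_1, \dots, \varepsilon_n)$. As $K$ is bounded and Lipschitz, we have, for any $x, y \in \mathbb{R}^d$ and $\delta \in\, ]0, 1[$,

(B.2) $n^{\alpha d}\,\big|K(n^{\alpha} X_n)\big| \le \mathrm{cte}\; n^{\alpha d}$,

(B.3) $n^{\alpha d}\,\big|K\big(n^{\alpha}(X_n - x)\big) - K\big(n^{\alpha}(X_n - y)\big)\big| \le \mathrm{cte}\; n^{\alpha d + \alpha\delta}\,\|x - y\|^{\delta}$.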

In addition, as $(\varepsilon_n)_{n \ge 1}$ has a finite conditional moment of order $> 2$, $M^{\varepsilon}_n(x)$ satisfies the assumptions of Proposition 3.1 of [Senoussi 2000] (or Corollary 3.VI.25 of [Duflo 1990], p. 154). Hence, for any positive constant $A < \infty$ and any $\beta \in\, ]1/2 + \alpha d, 1[$,

(B.4) $\sup_{\|x\| \le A} \big\|M^{\varepsilon}_n(x)\big\| = o(n^{\beta})$ a.s.

Before studying $R_n(x)$ and $H_n(x)$, let us establish the following lemma, which is useful in the sequel. Consider the new filtration $\mathbb G = (\mathcal G_n)_{n \ge 0}$, where $\mathcal G_n = \sigma(X_0, U_0, \varepsilon_1, \dots, \varepsilon_{n+1}, \eta_1, \dots, \eta_n)$.

Lemma B.1. Assume that [A1], [A2], [A3] and [A4] hold. For $x \in \mathbb{R}^d$, let us consider

(B.5) $M_n(x) = \sum_{i=1}^{n} i^{\lambda}\Big\{K\big(i^{\alpha}(X_i - x)\big) - \mathbb E\big[K\big(i^{\alpha}(X_i - x)\big) \mid \mathcal G_{i-1}\big]\Big\}$,

where $\lambda \in\, ]0, 1/2[$ and $\alpha \in\, ]0, 1/(2d)[$. Then, for any $A < \infty$ and $s \in\, ]1/2 + \lambda, 1[$,

(B.6) $\sup_{\|x\| \le A} |M_n(x)| = o(n^{s})$ a.s.

Proof. The proof is based on a uniform law of large numbers for martingales established in [Senoussi 2000] or [Duflo 1997]. We have

$$\langle M(0) \rangle_n \leq \sum_{i=1}^{n} \mathbb{E}\big[i^{2\lambda} K^2(i^\alpha X_i) \,/\, \mathcal{G}_{i-1}\big] \leq \sum_{i=1}^{n} i^{2\lambda} \int K^2\big(i^\alpha (\pi_{i-1} + X_i^* + \varepsilon_i + \gamma_i v)\big)\, q(v)\, dv.$$


After an easy change of variable, we obtain

$$\langle M(0) \rangle_n \leq \|q\|_\infty \|K\|_\infty \sum_{i=1}^{n} i^{2\lambda - \alpha d} \gamma_i^{-d}.$$

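Hence, $\mathbb{E}[\langle M(0) \rangle_n] \leq \text{cte}\; n^{1 + 2\lambda - \alpha d} \gamma_n^{-d} \leq \text{cte}\; n^{1 + 2\lambda}$. Let $x, y \in \mathbb{R}^d$. Since $K$ is bounded and Lipschitz, straightforward calculations give, for any $\tau \in \,]0, 1[$,

$$\langle M(x) - M(y) \rangle_n \leq \text{cte} \sum_{i=1}^{n} i^{2\lambda}\, \mathbb{E}\Big[\big|K\big(i^\alpha (X_i - x)\big) - K\big(i^\alpha (X_i - y)\big)\big|^\tau \,/\, \mathcal{G}_{i-1}\Big] \leq \text{cte} \sum_{i=1}^{n} i^{2\lambda + \alpha\tau}\, \|x - y\|^\tau.$$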

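Now, taking the expectation, we obtain

$$\mathbb{E}\big[\langle M(x) - M(y) \rangle_n\big] \leq \text{cte}\; \|x - y\|^\tau\, n^{1 + 2\lambda + \alpha\tau},$$

and the assumptions of Theorem 1.1 of Senoussi (or Proposition 6.4.33 of Duflo, p. 219) are a.s. fulfilled. Therefore, for any $A < \infty$ and $s > 1/2 + \lambda + \alpha\tau/2$,

$$\sup_{\|x\| \leq A} n^{-s} |M_n(x)| \xrightarrow[n \to \infty]{} 0 \quad \text{a.s.}$$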

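Finally, since $\tau \in \,]0, 1[$ is arbitrary, for any $s \in \,]1/2 + \lambda, 1[$ we can choose $\tau$ small enough that $s > 1/2 + \lambda + \alpha\tau/2$, which proves (B.6).

Study of $R_n(x)$. As $K$ is compactly supported, there exists a finite constant $c_K$ such that $K(y) = 0$ for $\|y\| \geq c_K$. From Assumption [A1], we deduce that $f$ is Lipschitz-continuous, that is, there exists a finite constant $c_f$ such that $\|f(x) - f(y)\| \leq c_f \|x - y\|$ for all $x, y \in \mathbb{R}^d$. Then, we infer that

$$\|R_n(x)\| \leq c_f \sum_{i=1}^{n} i^{\alpha d - \alpha} K\big(i^\alpha (X_i - x)\big)\, i^\alpha \|X_i - x\|\, \mathbb{1}_{\{i^\alpha \|X_i - x\| \leq c_K\}},$$

and, since $i^\alpha \|X_i - x\| \leq c_K$ wherever the indicator is non-zero, we deduce that $\|R_n(x)\| = O(T_n(x))$, where

$$T_n(x) = \sum_{i=1}^{n} i^{\alpha d - \alpha} K\big(i^\alpha (X_i - x)\big).$$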
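The bound $\|R_n(x)\| \leq c_f c_K T_n(x)$ behind $\|R_n(x)\| = O(T_n(x))$ holds path by path, whatever the values of the $X_i$. The Python sketch below checks it on an arbitrary sample in the scalar case; the Epanechnikov kernel ($c_K = 1$), the 1-Lipschitz function $f = \sin$ ($c_f = 1$) and the Gaussian sample are hypothetical choices.

```python
import math
import random


def kernel(u):
    """Epanechnikov kernel: K(y) = 0 for |y| >= c_K = 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


random.seed(2)
alpha, d, n, x = 0.2, 1, 1000, 0.25
f, c_f, c_K = math.sin, 1.0, 1.0                  # |sin a - sin b| <= |a - b|
xs = [random.gauss(0.0, 1.0) for _ in range(n)]   # any path works: the bound is deterministic

# R_n(x) = sum_i i^(alpha d) K(i^alpha (X_i - x)) (f(X_i) - f(x))
R = sum(i ** (alpha * d) * kernel(i ** alpha * (xi - x)) * (f(xi) - f(x))
        for i, xi in enumerate(xs, start=1))
# T_n(x) = sum_i i^(alpha d - alpha) K(i^alpha (X_i - x))
T = sum(i ** (alpha * d - alpha) * kernel(i ** alpha * (xi - x))
        for i, xi in enumerate(xs, start=1))
print(abs(R), c_f * c_K * T)
```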

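Now, let us decompose $T_n(x)$ in the form $M_n^T(x) + T_n^c(x)$, where

$$T_n^c(x) = \sum_{i=1}^{n} i^{\alpha d - \alpha}\, \mathbb{E}\big[K\big(i^\alpha (X_i - x)\big) \,/\, \mathcal{G}_{i-1}\big] = \sum_{i=1}^{n} i^{-\alpha} \gamma_i^{-d} \int K(t)\, q\big(\gamma_i^{-1}(i^{-\alpha} t + x - \pi_{i-1} - X_i^* - \varepsilon_i)\big)\, dt.$$

As $q$ is bounded and $K$ integrates to 1, we easily deduce that

$$\sup_{x \in \mathbb{R}^d} |T_n^c(x)| = O\big(\gamma_n^{-d} n^{1-\alpha}\big) \quad \text{a.s.} \tag{B.7}$$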
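The second expression for $T_n^c(x)$ comes from the change of variable $t = i^\alpha(\pi_{i-1} + X_i^* + \varepsilon_i + \gamma_i v - x)$, using that, given $\mathcal{G}_{i-1}$, $X_i$ is distributed as $\pi_{i-1} + X_i^* + \varepsilon_i + \gamma_i \eta_i$ with $\eta_i$ of density $q$ (as the integral in the proof of Lemma B.1 shows). The Python sketch below checks one term numerically in the scalar case, together with the bound $i^{-\alpha}\gamma_i^{-d}\|q\|_\infty$ that yields (B.7) after summation. The standard Gaussian $q$, the Epanechnikov kernel and all numerical values are hypothetical.

```python
import math


def kernel(u):
    """Epanechnikov kernel on [-1, 1] (integrates to 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def q(v):
    """Hypothetical density of eta_i: standard Gaussian, so ||q||_inf = q(0)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def trapezoid(g, lo, hi, m=20_000):
    """Composite trapezoidal rule for g on [lo, hi] with m sub-intervals."""
    h = (hi - lo) / m
    return h * (0.5 * g(lo) + sum(g(lo + k * h) for k in range(1, m)) + 0.5 * g(hi))


d, i, alpha, gamma = 1, 50, 0.2, 0.7   # hypothetical values of d, i, alpha, gamma_i
c, x = 0.4, 0.1                          # c plays the role of pi_{i-1} + X_i^* + eps_i

# i^(alpha d - alpha) E[K(i^alpha (X_i - x)) / G_{i-1}] with X_i = c + gamma * eta_i,
# integrated over the v-interval where the kernel argument lies in [-1, 1].
lo, hi = (x - c - i ** -alpha) / gamma, (x - c + i ** -alpha) / gamma
lhs = i ** (alpha * d - alpha) * trapezoid(
    lambda v: kernel(i ** alpha * (c + gamma * v - x)) * q(v), lo, hi)

# Same quantity after the change of variable t = i^alpha (c + gamma v - x).
rhs = i ** -alpha * gamma ** -d * trapezoid(
    lambda t: kernel(t) * q((i ** -alpha * t + x - c) / gamma), -1.0, 1.0)

bound = i ** -alpha * gamma ** -d * q(0.0)   # uses sup q = q(0) and integral of K = 1
print(lhs, rhs, bound)
```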


For $x \in \mathbb{R}^d$ and $n \geq 1$, $M_n^T(x)$ is a square integrable martingale to which Lemma B.1 applies with $\lambda = \alpha d - \alpha$. Then, for $A < \infty$ and $s' > 1/2 + \alpha d - \alpha$,

$$\sup_{\|x\| \leq A} |M_n^T(x)| = o\big(n^{s'}\big) \quad \text{a.s.} \tag{B.8}$$

Moreover, since $\alpha \in \,]0, 1/(2d)[$, $s'$ can be chosen such that $s' < 1 - \alpha$. Hence, from (B.7) and (B.8), we obtain that, for any $A < \infty$, $\sup_{\|x\| \leq A} |T_n(x)| = O\big(\gamma_n^{-d} n^{1-\alpha}\big)$ a.s., and therefore

$$\sup_{\|x\| \leq A} \|R_n(x)\| = O\big(\gamma_n^{-d} n^{1-\alpha}\big) \quad \text{a.s.} \tag{B.9}$$
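The choice of $s'$ between (B.8) and (B.9) requires the interval $]1/2 + \alpha d - \alpha,\ 1 - \alpha[$ to be non-empty, that is $\alpha d < 1/2$, which is exactly the condition $\alpha < 1/(2d)$. The small Python helper below (a hypothetical illustration using exact rational arithmetic) returns this interval, or `None` when it is empty.

```python
from fractions import Fraction


def s_prime_window(alpha, d):
    """Open interval ]1/2 + alpha*d - alpha, 1 - alpha[ of admissible s', or None if empty.

    The lower end comes from Lemma B.1 with lambda = alpha*d - alpha; the upper end makes
    M_n^T(x) = o(n^s') also o(n^(1 - alpha)).
    """
    lo = Fraction(1, 2) + alpha * d - alpha
    hi = 1 - alpha
    return (lo, hi) if lo < hi else None


print(s_prime_window(Fraction(1, 10), 2))   # (Fraction(3, 5), Fraction(9, 10))
print(s_prime_window(Fraction(1, 4), 2))    # None: alpha = 1/(2d) is excluded
```

The width of the window is $1/2 - \alpha d$, so it shrinks to nothing as $\alpha$ approaches $1/(2d)$.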

Remark B.1. If assumption [A2] is replaced by [A2bis], then

$$\sup_{\|x\| \leq A} \|R_n(x)\| = O\big(n^{1-\alpha}\big) \quad \text{a.s.}$$

Indeed, in that case, result (B.6) of Lemma B.1 holds for the filtration $(\mathcal{F}_n)$ instead of $(\mathcal{G}_n)$. In addition, the term $T_n^c(x)$ is then equal to

$$T_n^c(x) = \sum_{i=1}^{n} i^{-\alpha} \iint K(t)\, p\big(i^{-\alpha} t + x - \pi_{i-1} - X_i^* - \gamma_i v\big)\, q(v)\, dt\, dv$$

and, as $\|p\|_\infty < \infty$, we derive that $\sup_{x \in \mathbb{R}^d} |T_n^c(x)| = O\big(n^{1-\alpha}\big)$, which gives the desired result.

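Study of $H_n(x)$. We study $H_n(x)$ by proceeding as for $T_n(x)$. For $x \in \mathbb{R}^d$, let us set

$$H_n(x) = M_n^H(x) + \big(H_n^c(x) - J_n(x)\big) + J_n(x) \tag{B.10}$$

with $M_n^H(x) = H_n(x) - H_n^c(x)$ and

$$H_n^c(x) = \sum_{i=1}^{n} \gamma_i^{-d} \int K(t)\, q\big(\gamma_i^{-1}(i^{-\alpha} t + x - \pi_{i-1} - X_i^* - \varepsilon_i)\big)\, dt,$$

$$J_n(x) = \sum_{i=1}^{n} \gamma_i^{-d}\, q\big(\gamma_i^{-1}(x - \pi_{i-1} - X_i^* - \varepsilon_i)\big).$$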

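For $x \in \mathbb{R}^d$ and $n \geq 1$, $M_n^H(x)$ is a square integrable martingale adapted to $\mathbb{G}$. Then, by Lemma B.1 used with $\lambda = \alpha d$, we derive that, for $A < \infty$ and $s'' > 1/2 + \alpha d$,

$$\sup_{\|x\| \leq A} |M_n^H(x)| = o\big(n^{s''}\big) \quad \text{a.s.} \tag{B.11}$$

As $\|Dq\|_\infty < \infty$ and $\int \|t\|\, K(t)\, dt < \infty$, we have

$$\sup_{x \in \mathbb{R}^d} |H_n^c(x) - J_n(x)| = O\big(\gamma_n^{-(d+1)} n^{1-\alpha}\big) \quad \text{a.s.} \tag{B.12}$$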
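Each term of $H_n^c(x) - J_n(x)$ is controlled by the mean value theorem: since $\int K = 1$, with $z = x - \pi_{i-1} - X_i^* - \varepsilon_i$, $\big|\gamma_i^{-d}\int K(t)\,[q(\gamma_i^{-1}(i^{-\alpha}t + z)) - q(\gamma_i^{-1}z)]\,dt\big| \leq \|Dq\|_\infty\,\gamma_i^{-(d+1)} i^{-\alpha} \int \|t\| K(t)\,dt$, and summing over $i$ leads to (B.12). The scalar Python sketch below checks this per-term bound for a hypothetical standard Gaussian $q$ (for which $\|q'\|_\infty = q(1)$) and the Epanechnikov kernel (for which $\int |t| K(t)\,dt = 3/8$); the grid of values of $i$, $\gamma_i$ and $z$ is arbitrary.

```python
import math


def kernel(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def q(v):
    """Hypothetical density q: standard Gaussian; sup |q'| = q(1), attained at v = +-1."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def trapezoid(g, lo, hi, m=20_000):
    """Composite trapezoidal rule for g on [lo, hi] with m sub-intervals."""
    h = (hi - lo) / m
    return h * (0.5 * g(lo) + sum(g(lo + k * h) for k in range(1, m)) + 0.5 * g(hi))


d, alpha = 1, 0.2
DQ_SUP, T_MOMENT = q(1.0), 3.0 / 8.0       # sup|q'| and int |t| K(t) dt

worst = 0.0                                 # largest ratio |term difference| / bound
for i in (1, 10, 100, 1000):
    for gamma in (0.3, 1.0):
        for z in (-2.0, -0.5, 0.0, 0.7, 3.0):
            hc_term = gamma ** -d * trapezoid(
                lambda t: kernel(t) * q((i ** -alpha * t + z) / gamma), -1.0, 1.0)
            j_term = gamma ** -d * q(z / gamma)
            bound = DQ_SUP * gamma ** -(d + 1) * i ** -alpha * T_MOMENT
            worst = max(worst, abs(hc_term - j_term) / bound)
print(worst)   # stays below 1
```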


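From (A.1) together with (A.9) and (12), we deduce that

$$\sum_{j=1}^{n} \big\|\pi_{j-1} + X_j^* + \varepsilon_j\big\|^2 = O(n) \quad \text{a.s.} \tag{B.13}$$

Then, using Lemma A.1 of [Portier and Oulidi 2000], we obtain that there exists $c_1 > 0$ such that, for $R$ large enough,

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{\{\|\pi_{k-1} + X_k^* + \varepsilon_k\| \leq R\}} > c_1 > 0 \quad \text{a.s.} \tag{B.14}$$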
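One elementary way to see why a bound like (B.13) forces a positive fraction of the points $z_k = \pi_{k-1} + X_k^* + \varepsilon_k$ to stay in a ball of radius $R$ is the Chebyshev-type counting inequality $\#\{k \leq n : \|z_k\| > R\} \leq R^{-2}\sum_{k=1}^n \|z_k\|^2$: if $\sum_{k \leq n}\|z_k\|^2 \leq C n$, then $\frac{1}{n} \sum_{k=1}^n \mathbb{1}_{\{\|z_k\| \leq R\}} \geq 1 - C/R^2$, which is positive for $R > \sqrt{C}$. This is only an illustration, not necessarily the argument used in Lemma A.1 of [Portier and Oulidi 2000]; the Python sketch checks the inequality on a hypothetical Gaussian sample in $\mathbb{R}^2$.

```python
import random

random.seed(3)
n, R = 5000, 4.0
# Hypothetical points z_k in R^2 with i.i.d. N(0, 1.5^2) coordinates.
zs = [(random.gauss(0.0, 1.5), random.gauss(0.0, 1.5)) for _ in range(n)]

sq_norms = [zx * zx + zy * zy for zx, zy in zs]
frac_inside = sum(1 for s in sq_norms if s <= R * R) / n   # (1/n) #{k : ||z_k|| <= R}
chebyshev_lower = 1.0 - sum(sq_norms) / (n * R * R)       # 1 - (1/n) sum ||z_k||^2 / R^2
print(frac_inside, chebyshev_lower)
```

The inequality itself is deterministic; the sample only shows how loose it can be compared with the actual fraction.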

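Let $A < \infty$. For $x \in \mathbb{R}^d$ such that $\|x\| \leq A$, we have

$$J_n(x) \geq \sum_{j=1}^{n} \gamma_j^{-d} \inf_{\|z\| \leq A + R} q(\gamma_j^{-1} z)\; \mathbb{1}_{\{\|\pi_{j-1} + X_j^* + \varepsilon_j\| \leq R\}} \tag{B.15}$$

and, using Assumption [A5], we obtain that

$$J_n(x) \geq c_2\, \delta_n \sum_{j=1}^{n} \mathbb{1}_{\{\|\pi_{j-1} + X_j^* + \varepsilon_j\| \leq R\}} \tag{B.16}$$

where $c_2 > 0$. Then, from (B.14) together with (B.16), we deduce that, for any $A < \infty$,

$$\liminf_{n \to \infty} \frac{1}{n\, \delta_n} \inf_{\|x\| \leq A} J_n(x) > 0 \quad \text{a.s.} \tag{B.17}$$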

Finally, from the inequality

$$\inf_{\|x\| \leq A} H_n(x) \geq \inf_{\|x\| \leq A} J_n(x) - \sup_{\|x\| \leq A} |M_n^H(x)| - \sup_{\|x\| \leq A} |H_n^c(x) - J_n(x)|$$

together with (B.11), (B.12) and (B.17), we deduce that

$$\liminf_{n \to \infty} \frac{1}{n\, \delta_n} \inf_{\|x\| \leq A} H_n(x) > 0 \quad \text{a.s.} \tag{B.18}$$

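To close the proof of Part 1, it suffices to combine (B.4), (B.9) and (B.18).

Optimality of the tracking. Starting from (A.1), we have

$$\big\|X_{n+1} - X_{n+1}^*\big\|^2 = \|\pi_n\|^2 + 2\langle \pi_n, \varepsilon_{n+1} + \gamma_{n+1}\eta_{n+1} \rangle + \|\varepsilon_{n+1}\|^2 + \gamma_{n+1}^2 \|\eta_{n+1}\|^2 + 2\gamma_{n+1} \langle \eta_{n+1}, \varepsilon_{n+1} \rangle$$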
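The expansion above is the square of $\|a + b + \gamma c\|$ expanded by bilinearity, with $a = \pi_n$, $b = \varepsilon_{n+1}$, $c = \eta_{n+1}$ and $\gamma = \gamma_{n+1}$, assuming, as the expansion itself indicates, that (A.1) gives $X_{n+1} - X_{n+1}^* = \pi_n + \varepsilon_{n+1} + \gamma_{n+1}\eta_{n+1}$. The Python sketch below checks the identity on random vectors of $\mathbb{R}^3$ (arbitrary values, for illustration only).

```python
import random


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


random.seed(4)
d = 3
pi_n = [random.uniform(-1, 1) for _ in range(d)]
eps = [random.gauss(0, 1) for _ in range(d)]
eta = [random.gauss(0, 1) for _ in range(d)]
gamma = 0.37

err = [p + e + gamma * h for p, e, h in zip(pi_n, eps, eta)]   # X_{n+1} - X*_{n+1}
lhs = dot(err, err)
rhs = (dot(pi_n, pi_n)
       + 2 * dot(pi_n, [e + gamma * h for e, h in zip(eps, eta)])
       + dot(eps, eps)
       + gamma ** 2 * dot(eta, eta)
       + 2 * gamma * dot(eta, eps))
print(lhs, rhs)
```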

where $\langle \cdot, \cdot \rangle$ denotes the inner product on $\mathbb{R}^d$. By Theorem 3.2, we have, for any $A < \infty$, $\sup_{\|x\| \leq A} \|\hat f_n(x) - f(x)\| = o(1)$ a.s. In particular, we can take $A = D$ and then derive that

$$\sum_{k=1}^{n} \|\pi_k(f)\|^2 = o(n) \quad \text{a.s.} \tag{B.19}$$

In addition, as $\|\pi_n\|^2 = \|\pi_n(f)\|^2 + \|\pi_n(\theta)\|^2$, we infer from (A.10) and (B.19) that

$$\sum_{k=1}^{n} \|\pi_k\|^2 = o(n) \quad \text{a.s.} \tag{B.20}$$

Using (A.9) once again, it follows that

$$\sum_{k=1}^{n} \gamma_{k+1}^2 \|\eta_{k+1}\|^2 = O\Big(\sum_{k=1}^{n} \gamma_{k+1}^2\Big) = o(n) \quad \text{a.s.} \tag{B.21}$$

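Furthermore, using the Cauchy-Schwarz inequality, we deduce that, a.s.,

$$\Big|\sum_{k=1}^{n} \langle \pi_k, \varepsilon_{k+1} + \gamma_{k+1}\eta_{k+1} \rangle\Big| \leq \Big(\sum_{k=1}^{n} \|\pi_k\|^2 \times \sum_{k=1}^{n} \|\varepsilon_{k+1} + \gamma_{k+1}\eta_{k+1}\|^2\Big)^{1/2} = o(n)$$

and

$$\Big|\sum_{k=1}^{n} \gamma_{k+1} \langle \varepsilon_{k+1}, \eta_{k+1} \rangle\Big| \leq \Big(\sum_{k=1}^{n} \gamma_{k+1}^2 \|\eta_{k+1}\|^2\Big)^{1/2} \Big(\sum_{k=1}^{n} \|\varepsilon_{k+1}\|^2\Big)^{1/2} = o(n).$$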
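The two bounds above only combine the Cauchy-Schwarz inequality with growth rates already obtained: a sum that is $o(n)$ times a sum that is $O(n)$ is $o(n^2)$, whose square root is $o(n)$. The Python sketch below illustrates the first bound in the scalar case with the hypothetical choices $\|\pi_k\|^2 = k^{-1/2}$ (so that $\sum_{k \leq n}\|\pi_k\|^2 = o(n)$) and i.i.d. standard Gaussian $b_k$ standing for $\varepsilon_{k+1} + \gamma_{k+1}\eta_{k+1}$: the Cauchy-Schwarz bound holds and, divided by $n$, decreases towards 0.

```python
import math
import random

random.seed(5)


def cross_term_and_bound(n):
    """Return |sum_k pi_k b_k| and its Cauchy-Schwarz bound, for k = 1..n."""
    pis = [k ** -0.25 for k in range(1, n + 1)]          # pi_k^2 = k^(-1/2)
    bs = [random.gauss(0.0, 1.0) for _ in range(n)]      # stands for eps_{k+1} + gamma_{k+1} eta_{k+1}
    cross = abs(sum(p * b for p, b in zip(pis, bs)))
    bound = math.sqrt(sum(p * p for p in pis) * sum(b * b for b in bs))
    return cross, bound


results = {n: cross_term_and_bound(n) for n in (100, 10_000, 100_000)}
for n, (cross, bound) in results.items():
    print(n, cross / n, bound / n)   # bound / n behaves roughly like sqrt(2) n^(-1/4)
```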

Finally, combining these different results with a strong law of large numbers, we prove the tracking optimality. The strong consistency of $\hat\Gamma_n$ is obtained by proceeding as usual (see [Portier and Oulidi 2000] for example).

Appendix C: Proof of Theorem 3.3.

This appendix is concerned with the proof of some convergence results for the WLS estimator defined by (8) and (9). First, let us establish the following lemma.

Lemma C.1. Assume that [A1], [A2bis] and [A3] to [A5] hold. Let $g : \mathbb{R}^d \to \mathbb{R}$ be a function of class $C^2$ with bounded derivatives of order 2 and such that $|g(x)| \leq \text{cte}\,(1 + \|x\|^2)$. Then,

$$\frac{1}{n} \sum_{k=1}^{n} g(X_k) \xrightarrow[n \to \infty]{\text{a.s.}} \mathbb{E}[g(\varepsilon_1 + x^*)] \tag{C.1}$$

and

$$\frac{1}{n} \sum_{k=1}^{n} g(X_k)\, \mathbb{1}_D(X_k) \xrightarrow[n \to \infty]{\text{a.s.}} \mathbb{E}\big[g(\varepsilon_1 + x^*)\, \mathbb{1}_D(\varepsilon_1 + x^*)\big] \tag{C.2}$$

Proof. As $g$ is of class $C^2$ with bounded derivatives of order 2 and as $\varepsilon$ is bounded, we easily show, using a Taylor expansion, that

$$\frac{1}{n} \sum_{k=1}^{n} g(X_k) = \frac{1}{n} \sum_{k=1}^{n} g(\varepsilon_k + x^*) + O\Big(\frac{1}{n} \sum_{k=1}^{n} \big(\|\pi_{k-1}\|^2 + \|X_k^* - x^*\|^2 + \gamma_k^2 \|\eta_k\|^2\big)\Big) \tag{C.3}$$


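Then, using the results established to prove the tracking optimality in the previous appendix and a strong law of large numbers, we derive (C.1). To establish (C.2), denote by $\overline{D}$ the complement of $D$ and remark that

$$\frac{1}{n} \sum_{k=1}^{n} g(X_k)\, \mathbb{1}_D(X_k) = \frac{1}{n} \sum_{k=1}^{n} g(X_k) - \frac{1}{n} \sum_{k=1}^{n} g(X_k)\, \mathbb{1}_{\overline{D}}(X_k),$$

and let us rewrite $\sum_{k=1}^{n} g(X_k)\, \mathbb{1}_{\overline{D}}(X_k)$ as $M_n^g + R_n$, where

$$R_n = \sum_{k=1}^{n} \mathbb{E}\big[g(X_k)\, \mathbb{1}_{\overline{D}}(X_k) \,/\, \mathcal{F}_{k-1}\big] = \sum_{k=1}^{n} \iint g(t)\, \mathbb{1}_{\overline{D}}(t)\, p\big(t - X_k^* - \pi_{k-1} - \gamma_k v\big)\, q(v)\, dt\, dv.$$

For any $n \geq 1$, $M_n^g$ is a square integrable martingale adapted to $\mathbb{F}$. Its increasing process satisfies $\langle M^g \rangle_n = O(n)$ a.s. Therefore, using a strong law of large numbers for martingales, we deduce that $M_n^g = o(n)$ a.s. Now, as $\mathbb{E}\big[g(\varepsilon_1 + x^*)\, \mathbb{1}_{\overline{D}}(\varepsilon_1 + x^*)\big] = \int g(t)\, \mathbb{1}_{\overline{D}}(t)\, p(t - x^*)\, dt$ and $\|Dp\|_\infty < \infty$, $\int q(v)\, dv = 1$, $\int \|v\|\, q(v)\, dv < \infty$ and $\int |g(t)|\, \mathbb{1}_{\overline{D}}(t)\, dt < \infty$, we infer after an easy calculation that

$$\big|R_n - n\, \mathbb{E}\big[g(\varepsilon_1 + x^*)\, \mathbb{1}_{\overline{D}}(\varepsilon_1 + x^*)\big]\big| = o(n) \quad \text{a.s.} \tag{C.4}$$

Then,

$$\frac{1}{n} \sum_{k=1}^{n} g(X_k)\, \mathbb{1}_{\overline{D}}(X_k) \xrightarrow[n \to \infty]{\text{a.s.}} \mathbb{E}\big[g(\varepsilon_1 + x^*)\, \mathbb{1}_{\overline{D}}(\varepsilon_1 + x^*)\big] \tag{C.5}$$

and combining this result with (C.1), we obtain (C.2). We are now able to prove Theorem 3.3. Firstly, using Part 2 of Lemma C.1, we deduce that

$$\frac{1}{n} \sum_{k=1}^{n} X_k X_k^t\, \mathbb{1}_D(X_k) \xrightarrow[n \to \infty]{\text{a.s.}} L = \mathbb{E}\big[(\varepsilon_1 + x^*)(\varepsilon_1 + x^*)^t\, \mathbb{1}_D(\varepsilon_1 + x^*)\big] \tag{C.6}$$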
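The limit matrix $L$ in (C.6) is an expectation over the noise only, so it can be approximated by Monte Carlo once the law of $\varepsilon_1$, the point $x^*$ and the set $D$ are specified. The Python sketch below does this for purely hypothetical choices ($d = 2$, $\varepsilon_1$ uniform on $[-2, 2]^2$, $x^* = (1, 0)$, $D = \{\|x\| > 1\}$) and checks that the estimate is symmetric with a positive smallest eigenvalue, which is the invertibility of $L$ used in the next step.

```python
import math
import random

random.seed(6)
N = 200_000
x_star = (1.0, 0.0)


def in_D(y):
    """Hypothetical set D = {y : ||y|| > 1}."""
    return y[0] ** 2 + y[1] ** 2 > 1.0


# Monte Carlo estimate of L = E[(eps + x*)(eps + x*)^t 1_D(eps + x*)], eps ~ U([-2, 2]^2).
s11 = s12 = s22 = 0.0
for _ in range(N):
    y = (random.uniform(-2, 2) + x_star[0], random.uniform(-2, 2) + x_star[1])
    if in_D(y):
        s11 += y[0] * y[0]
        s12 += y[0] * y[1]
        s22 += y[1] * y[1]
L = [[s11 / N, s12 / N], [s12 / N, s22 / N]]

# Eigenvalues of the symmetric 2x2 matrix L in closed form.
mean = (L[0][0] + L[1][1]) / 2
radius = math.sqrt(((L[0][0] - L[1][1]) / 2) ** 2 + L[0][1] ** 2)
lam_min, lam_max = mean - radius, mean + radius
print(L, lam_min)
```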

Secondly, following [Bercu 1998], we derive that, as soon as $L$ is invertible,

$$\frac{S_n(a)}{n (\log n)^{-(1+\epsilon)}} \xrightarrow[n \to \infty]{\text{a.s.}} L. \tag{C.7}$$

In addition,

$$\frac{\lambda_{\min}(S_n(a))}{n (\log n)^{-(1+\epsilon)}} \xrightarrow[n \to \infty]{\text{a.s.}} \lambda_{\min}(L) > 0.$$

Finally, from results (A.5) and (A.6) together with (C.7), we deduce (17) and (18), respectively. The central limit theorem (19) is obtained using Lemma C.1 of [Bercu 1998].


Acknowledgments. The author would like to thank Professors Jean-Michel Poggi and Bernard Bercu for helpful discussions.

REFERENCES

[Åström and Wittenmark 1989] K. J. Åström and B. Wittenmark, Adaptive Control. Addison-Wesley, 1989.
[Bercu and Duflo 1992] B. Bercu and M. Duflo, Moindres carrés pondérés et poursuite. Ann. Inst. H. Poincaré, 29(1992), pp. 403–430.
[Bercu 1995] B. Bercu, Weighted estimation and tracking for ARMAX models. SIAM J. of Cont. and Opt., 33(1995), pp. 89–106.
[Bercu 1998] B. Bercu, Central limit theorem and law of iterated logarithm for least squares algorithms in adaptive tracking. SIAM J. of Cont. and Opt., 36(1998), pp. 910–928.
[Bercu and Portier 2002] B. Bercu and B. Portier, Adaptive control of parametric nonlinear autoregressive models via a new martingale approach. To appear in IEEE Trans. on Aut. Cont., 2002, Preprint Université Orsay 2001-59.
[Caines 1985] P. E. Caines, Linear Stochastic System. John Wiley, New York, Boston, 1985.
[Chen and Khalil 1995] F-C. Chen and H. K. Khalil, Adaptive control of a class of nonlinear discrete-time systems using neural networks. IEEE Trans. on Aut. Cont., 40(1995), pp. 791–801.
[Duflo 1990] M. Duflo, Méthodes Récursives Aléatoires. Masson, 1990.
[Duflo 1997] M. Duflo, Random Iterative Models. Springer Verlag, 1997.
[Goodwin et al. 1981] G. C. Goodwin, P. J. Ramadge, and P. E. Caines, Discrete time stochastic adaptive control. SIAM J. of Cont. and Opt., 19(1981), pp. 829–853.
[Guo and Chen 1991] L. Guo and H. F. Chen, The Åström-Wittenmark self-tuning regulator revisited and ELS-based adaptive trackers. IEEE Trans. on Aut. Cont., 36(1991), pp. 802–812.
[Guo 1994] L. Guo, Further results on least squares based adaptive minimum variance control. SIAM J. of Cont. and Opt., 32(1994), pp. 187–212.
[Guo 1996] L. Guo, Self convergence of weighted least squares with applications to stochastic adaptive control. IEEE Trans. on Aut. Cont., 41(1996), pp. 79–89.
[Guo 1997] L. Guo, On critical stability of discrete-time adaptive nonlinear control. IEEE Trans. on Aut. Cont., 42(1997), pp. 1488–1499.
[Härdle 1990] W. Härdle, Applied Nonparametric Regression. Cambridge University Press, 1990.
[Hilgert et al. 2000] N. Hilgert, J. Harmand, J.-P. Steyer, and J.-P. Vila, Nonparametric identification and adaptive control of an anaerobic fluidized bed digester. Control Engineering Practice, 8(2000), pp. 367–376.
[Jagannathan et al 1996] S. Jagannathan, F. L. Lewis, and O. Pastravanu, Discrete-time model reference adaptive control of nonlinear dynamical systems using neural networks. Int. Journ. of Cont., 64(1996), pp. 217–239.
[Narendra and Parthasarathy 1990] K. S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Net., 1(1990), pp. 4–27.
[Poggi and Portier 2000] J.-M. Poggi and B. Portier, Nonlinear adaptive tracking using kernel estimators: estimation and test for linearity. SIAM J. of Cont. and Opt., 39:3(2000), pp. 707–727.
[Poggi and Portier 2001] J.-M. Poggi and B. Portier, Asymptotic Local Test for Linearity in Adaptive Control. Stat. and Prob. Let., 55:1(2001), pp. 9–17.
[Portier and Oulidi 2000] B. Portier and A. Oulidi, Nonparametric estimation and adaptive control of functional autoregressive models. SIAM J. of Cont. and Opt., 39:2(2000), pp. 411–432.
[Senoussi 1991] R. Senoussi, Lois du logarithme itéré et identification. Thesis. University Paris-Sud, Orsay, 1991.
[Senoussi 2000] R. Senoussi, Uniform iterated logarithm laws for martingales and their application to functional estimation in controlled Markov chains. Stoch. Proc. and their Appl., 89:2(2000), pp. 193–212.
[Silverman 1986] B. W. Silverman, Density estimation for statistics and data analysis. Chapman and Hall, 1986.
[Xie and Guo 2000] L.-L. Xie and L. Guo, How much uncertainty can be dealt with by feedback? IEEE Trans. on Aut. Cont., 45(2000), pp. 2203–2217.