Discrete Learning Control with Application to Hydraulic Actuators

In this paper the robustness of a class of learning control algorithms to state disturbances, output noise, and errors in initial conditions is studied. We present a simple learning algorithm and exhibit, via a concise proof, bounds on the asymptotic trajectory errors for the learned input and the corresponding state and output trajectories. Furthermore, these bounds are continuous functions of the bounds on the initial condition errors, state disturbance, and output noise, and the bounds are zero in the absence of these disturbances.


Introduction
Learning control is a name attributed to a class of self-tuning processes whereby the system performance of a specified task improves based on the previous performances of identical tasks. This is an advantage when controlling systems that cannot be modelled accurately. The idea of a self-learning system is in itself aesthetically appealing in that it represents a significant step in the development of an intelligent, fully autonomous control system. A block diagram of a basic learning system is illustrated in Fig. 1. Here $u_k(t)$ denotes an input trajectory, $y_d(t)$ is the desired output trajectory from the plant, and $y_k(t)$ is the actual output due to $u_k(t)$. $L(\cdot)$ is the learning operator, which compares $y_d(t)$ and $y_k(t)$ and adds an update term to $u_k(t)$ to produce $u_{k+1}(t)$. In this paper the term "learning control" means offline learning in which all signals are defined over the finite time interval $[0, T]$ and the input modification is defined as follows:

$$u_{k+1}(t) = L(u_k(t), y_d(t), y_k(t)) \qquad (1)$$

where $L(\cdot)$ is a learning operator, $u_{k+1}(t)$ is the input at the $(k+1)$-th trial, $y_k(t)$ and $u_k(t)$ are the output and input at the $k$-th trial, respectively, and $y_d(t)$ is the desired trajectory. In Section two the learning operator will also be a function of $y_{k+1}(t)$, but for now Eq. (1) is considered.

Figure 1: Basic learning system
The trajectories are taken to be functions of $t \in [0, T]$ and the updates occur sequentially in time. The trajectories are supported on finite intervals of the time axis and the iteration from $k$ to $k+1$ occurs from one interval to the next. In this way, learning control uses practice to improve movement by altering the stored data at the execution of the previous learning trial and generating an optimal feedforward input to attain the desired motion. Advantageous features of learning control are that it is easy to implement and allows simple models and control schemes to be used while compensating for unmodelled dynamics and complex phenomena such as stiction.
The basic strategy of the classical techniques is to use an iteration of the kind in Eq. (1), where the operator $L(\cdot, \cdot)$ remains to be specified. For time-invariant mechanical systems, Arimoto et al. (1984) and Craig (1984) present conditions on the learning operator which guarantee system convergence upon repeated application of the learning algorithm. One shortcoming of these analyses is that they are small-signal analyses, which require the assumption that the initial trajectory (and thus all subsequent ones) lies in a neighbourhood of the desired trajectory. Togai and Yamano (1985) consider the problem of learning control for a discrete-time system by using gradient methods to optimize the learning operator. Mita and Kato (1985) and Kavli (1992) consider the learning control problem in the frequency domain. In model-based learning schemes (Atkeson and McIntyre, 1986), the inputs corresponding to the desired and actual trajectories are computed from estimated system parameters and the resulting input errors are fed to the learning operator. In this scheme the performance of the algorithm depends on the quality of the parameter estimates, and the scheme is shown in Hauser (1987) to be a special case of this more general approach. All these techniques are for linear, time-invariant systems.

Other researchers have considered the learning control problem for classes of non-linear systems. Both Hauser (1987) and Bondi et al. (1988) remove the assumption that the initial trajectory lies in the neighbourhood of the desired one by developing global analyses that prove convergence of the input sequence $u_k(t)$ from any initial trajectory. Another extension in Hauser (1987) allows time-varying systems. This is important because one usually wishes to improve the performance of the plant as much as possible using conventional feedback control methods. The learned input, $u_k(t)$, is a feedforward term which further improves the performance for a specific task. Thus, for most applications we have the situation shown in Fig. 2, where the learning algorithm operates on the system between $u_k(t)$ and $y_k(t)$, which is time-varying.

Figure 2: Learning control application with a feedback controller attached
Since learning control algorithms are iterative schemes, their robustness in the presence of disturbances, measurement noise and initialization errors is critical. There have been a number of efforts to establish the robustness of learning algorithms.
In Heinzinger et al. (1989, 1992) the robustness problem for the non-linear system given in Hauser (1987) is studied for a class of learning algorithms, and it is proven without any linearization that the learned input and the corresponding output trajectories converge to neighbourhoods of their desired trajectories. In Arimoto (1990) and Arimoto et al. (1990, 1991) robustness is proved based on a passivity analysis of robot dynamics. In Saab et al. (1993) the same update law is used as in Arimoto (1990) and Arimoto et al. (1990, 1991), but a broader class of systems is considered.
The learning control schemes presented in this paper are based on adaptively constructing a feedforward input history to the actuator, which will cancel the unknown repeatable portion of the dynamics. Since the construction of this feedforward input signal is not based on a model of any kind, the learned input may reflect any unknown complex function.
The paper is structured as follows: Section two presents a general robust discrete algorithm. Section three presents the application to hydraulic actuators, and Section four gives simulation results that confirm the theoretical results. Section five contains conclusions.

Robust Discrete Time Learning Controller (RDLC)
In this section a robust discrete learning algorithm for a class of time-varying, non-linear systems is presented. By robust is meant that, when state disturbances are present or there are errors in the initial conditions, the learning algorithm generates a sequence of inputs such that the asymptotic trajectory errors for the input, state, and output are bounded. In addition, these bounds are continuous functions of the bounds on the initial condition errors and the disturbances, and we quantify the degradation due to each of these factors.
The description of the system and the assumptions are similar to those in Hauser (1987). The proof technique is similar to many others (Hauser, 1987; Heinzinger et al., 1989, 1992) in that it proceeds in a straightforward manner, showing that we have a "contraction" on the input sequence, which implies the convergence results.
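The class of non-linear, time-varying systems considered is described by the following state-space equations:

$$\dot{x}_k(t) = f(x_k(t), t) + B(x_k(t), t)\,u_k(t) + \omega_k(t), \qquad y_k(t) = g(x_k(t), t) \qquad (2)$$

where, for all $t \in [0, T]$, $x_k(t)$ is the state, $u_k(t)$ the input, $y_k(t)$ the output, and $\omega_k(t)$ the state disturbance. Thus, for a given initial condition and control input on $[0, T]$, the state trajectory is obtained by integrating Eq. (2). In addition, the following properties are assumed.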
(A5) All functions are assumed to be measurable and integrable.

Assumption (A1) implies that, given an achievable desired output trajectory ($y_d$) and initial state ($x_d(0)$), there exist unique input ($u_d$) and state ($x_d$) trajectories corresponding to this output trajectory. Assumption (A4) on $g_x(\cdot, \cdot)$ implies that $g$ is uniformly globally Lipschitz in $x$ on $[0, T]$. The function $\omega_k(t)$ represents both deterministic and random disturbances of the system. It may be stiction, non-reproducible friction, modelling errors, etc. This is important to include since these are present in physical systems. Assumption (A2) restricts these disturbances to be bounded, but they may be discontinuous (e.g. stiction in mechanical systems).

The discrete learning control strategy is inspired by the works in Heinzinger et al. (1992) and Arimoto (1990). A motivation for the control strategy can be given by considering a simple first-order system, Eq. (3), where $u(t)$ is the input and $x_P(t)$ the output. The term $\nu(t)$ is introduced as a modelling error, completely unknown but upper bounded. Denoting $x_P(t)$ in the $k$-th work cycle by $x_k(t)$, and defining $\phi(t) = -\Gamma^{-1}(t)\nu(t)$, the dynamic formulation in Eq. (3) can be rewritten for the $k$-th cycle in terms of $x_k(t)$, $u_k(t)$, and $\phi(t)$. The function $\phi(t)$ represents the state disturbance, which is assumed to be bounded. By making use of Taylor's expansion, the output $x_k(t)$ at the time instant $t + \Delta t$ can be approximated by Eq. (5), and similarly at the $(k+1)$-th work cycle by Eq. (6). The input signal $u_{k+1}(t)$, which forces $x_{k+1}(t + \Delta t)$ to approach $x_d(t + \Delta t)$, may be solved for by replacing $x_{k+1}(t + \Delta t)$ by $x_d(t + \Delta t)$ in Eq. (6), provided that the function $\phi(t)$ is known; this gives Eq. (7).
Ignoring the variation of the unknown function $\phi(t)$ between two consecutive cycles, $\phi_{k+1}(t)$ in Eq. (7) may be eliminated by substituting $\phi_k(t)$ for $\phi_{k+1}(t)$. Since $\phi_k(t)$ may be found from Eq. (5), this substitution yields Eq. (8). Rearranging, and ignoring the variation of $\Gamma$ between consecutive cycles, Eq. (8) may be turned into a recursive learning control law, Eq. (9). This learning law is similar in form to the one in Tso and Ma (1993), derived for a robot manipulator. In deriving the learning control law in Eq. (9) some assumptions were made. Therefore, returning to the non-linear, time-varying system in Eq. (2), a more general learning update law, Eq. (10), is proposed. Including $\gamma$ in this law allows the influence of a bias term, see Heinzinger et al. (1992); Arimoto (1990). This may prevent the input from wandering too much initially. In addition, $\gamma$ may be allowed to vary with the iteration to further improve performance, but in this presentation $\gamma$ is fixed.
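The motivating argument for the first-order system can be sketched as a short worked derivation. The model form $\dot{x}_k = \Gamma^{-1}u_k + \phi_k$ (in particular the sign of the disturbance term) and the error shorthand $e_k(t) = x_d(t) - x_k(t)$ are assumptions made here for illustration, so the final line should be read as a sketch of the kind of recursion described, not necessarily the exact form of the learning law.

```latex
\begin{align*}
% First-order Taylor expansions at cycles k and k+1 (assumed model form)
x_k(t+\Delta t)     &\approx x_k(t)     + \Delta t\,\bigl[\Gamma^{-1}(t)\,u_k(t)     + \phi_k(t)\bigr], \\
x_{k+1}(t+\Delta t) &\approx x_{k+1}(t) + \Delta t\,\bigl[\Gamma^{-1}(t)\,u_{k+1}(t) + \phi_{k+1}(t)\bigr]. \\
% Require x_{k+1}(t+\Delta t) = x_d(t+\Delta t) and solve for the input
u_{k+1}(t) &= \Gamma(t)\left[\frac{x_d(t+\Delta t) - x_{k+1}(t)}{\Delta t} - \phi_{k+1}(t)\right]. \\
% Estimate the disturbance from the previous cycle
\phi_{k+1}(t) \approx \phi_k(t) &= \frac{x_k(t+\Delta t) - x_k(t)}{\Delta t} - \Gamma^{-1}(t)\,u_k(t). \\
% Substitute and collect terms
u_{k+1}(t) &= u_k(t) + \frac{\Gamma(t)}{\Delta t}
  \bigl[\,e_k(t+\Delta t) - e_k(t) + e_{k+1}(t)\,\bigr].
\end{align*}
```

Under these assumptions the update uses the previous-cycle error increment together with the current-cycle error $e_{k+1}(t)$, which is consistent with a learning operator that also depends on the output of trial $k+1$.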
For clarification of the remaining discussion, function parameters will be shown in subscript notation with the dependence on time implied unless otherwise stated. The constants $k_{gx}$, $k_{gt}$, $k_f$, $k_B$, and $k_g$ are Lipschitz constants for $g_x(\cdot, \cdot)$, $g_t(\cdot, \cdot)$, $f(\cdot, \cdot)$, $B(\cdot, \cdot)$, and $g(\cdot, \cdot)$, respectively. Now the main result of this section can be stated.
Theorem 1. Let the system described by Eq. (2) satisfy assumptions (A1)-(A5) and use the update law in Eq. (10). Given an attainable $y_d(\cdot)$, if $\left\| 1 - L\,\frac{1}{\Delta t}\int_t^{t+\Delta t} g_x B \, d\tau \right\| \le \rho < 1$, then as $k \to \infty$ the error between $u_k$ and $u_d$ is bounded. In addition, the state and output asymptotic errors are bounded. These bounds depend continuously on the bound on the initial state error, the bound on the state disturbance, and $\gamma$. As $b_{x0}$, $b_\omega$ and $\gamma$ tend to zero, these bounds also tend to zero.
Remark: If $\Delta t$ is chosen sufficiently small, the condition in Theorem 1 is equivalent to $\left\| 1 - L\, g_x B \right\| \le \rho < 1$.

Proof. From the system equation (2) and the update law in Eq. (10), the error for iterate $k+1$ can be written as Eq. (11). Inserting Eq. (12) into Eq. (11) gives Eq. (13). For the discrete update scheme, $u_k$ remains unchanged between consecutive sampling instants (i.e. $u_k(z) = u_k(t)$ for any $z \in [t, t + \Delta t]$), so Eq. (13) can be written in the equivalent form of Eq. (14). Taking norms and using the bounds, with $b_L$ and $b_{gx}$ denoting the norm bounds for $L(\cdot, \cdot)$ and $g_x(\cdot, \cdot)$, respectively, and then using the Lipschitz conditions, yields Eq. (18). Writing the integral expression for $x(t)$ obtained from Eq. (2), with the quantities in the integral being functions of $\tau$, and taking norms gives Eq. (19), where $b_B$ is the norm bound on $B(\cdot, \cdot)$, and $b_{ud}$ and $k_3$ are defined constants. The problem is now to obtain an explicit bound on the right-hand side of Eq. (19). For this purpose, the Bellman-Gronwall lemma is applied to Eq. (19), which gives Eq. (21) and, from Eq. (21), Eq. (22). Combining Eq. (21), Eq. (22) and Eq. (18), and multiplying the result, Eq. (24), by $e^{-\lambda t}$, gives the inequality in Eq. (25). The following norm, Eq. (26), is used to simplify the expression of the result.
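Definition 1. The $\lambda$-norm for a function $h$ defined on $[0, T]$ is

$$\|h\|_\lambda = \sup_{t \in [0, T]} e^{-\lambda t}\,\|h(t)\| \qquad (26)$$

Remark: From this definition it is seen that $\|h\|_\lambda \le \|h\|_\infty \le e^{\lambda T}\|h\|_\lambda$ for $\lambda > 0$ (where $\|h\|_\infty = \sup_{t \in [0, T]} \|h(t)\|$), implying that these two norms are equivalent.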
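The equivalence of the two norms can be checked numerically on a sampled signal. The following is a minimal sketch for a scalar-valued function; the test signal, the horizon `T`, and the value of `lam` are arbitrary choices made only for illustration.

```python
import math

def lambda_norm(h, ts, lam):
    """Sampled lambda-norm: max over t in ts of exp(-lam * t) * |h(t)|."""
    return max(math.exp(-lam * t) * abs(h(t)) for t in ts)

def sup_norm(h, ts):
    """Sampled sup-norm: max over t in ts of |h(t)|."""
    return max(abs(h(t)) for t in ts)

# Illustrative values (assumptions, not taken from the paper).
T, n = 3.0, 300
ts = [T * i / n for i in range(n + 1)]
h = lambda t: math.sin(2.0 * t) + 0.5 * t
lam = 2.0

n_lam = lambda_norm(h, ts, lam)
n_sup = sup_norm(h, ts)
upper = math.exp(lam * T) * n_lam

# The chain n_lam <= n_sup <= upper mirrors the remark above.
print(n_lam, n_sup, upper)
```

A larger `lam` weights early times more heavily, which is what allows the integral terms in the proof to be made small while the norms remain equivalent on a finite horizon.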
Using the $\lambda$-norm, and noticing that the integrals are strictly increasing, the inequality in Eq. (25) can be rewritten as Eq. (27), which reduces to Eq. (28), where $\varepsilon$ combines the norm bounds of the initial state errors, state disturbances, and bias contribution. Since $\rho < 1$, it is possible to find a $\lambda > k$ for which the contraction factor in this inequality is also less than 1.

Proof. Iterating Eq. (30) and using Lemma 1, it is seen that $u_k$ converges to the neighbourhood of $u_d$ of radius $\frac{1}{1 - \rho}\,\varepsilon$ with respect to the $\lambda$-norm. Thus

$$\limsup_{k \to \infty} \|u_d - u_k\|_\lambda \le \frac{1}{1 - \rho}\,\varepsilon$$

Using Eq. (21) and similar manipulations, a bound for the state error $\|x_d - x_k\|_\lambda$ may be obtained. The result for $y_k$ is obtained by using the fact that $g$ is Lipschitz in $x$, with $\|x_d - x_k\|_\lambda$ being bounded as above. Eq. (28) clearly illustrates the influence of the initial state error, state disturbance, and bias term in degrading the bound on the asymptotic errors. It is seen that this bound on the degradation is continuous in these factors. Furthermore, in the absence of these terms $\varepsilon = 0$, and the state converges to the desired trajectory.
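The role of the contraction can be illustrated with the scalar recursion $a_{k+1} \le \rho\,a_k + \varepsilon$, where $a_k$ stands for the input error norm at trial $k$. The sketch below iterates the worst case (equality) and compares it with the radius $\varepsilon/(1-\rho)$; the numerical values are arbitrary and chosen only for illustration.

```python
def iterate_bound(a0, rho, eps, n):
    """Iterate a_{k+1} = rho * a_k + eps, the worst case of the inequality."""
    a = a0
    history = [a]
    for _ in range(n):
        a = rho * a + eps
        history.append(a)
    return history

# Illustrative values (assumptions): contraction factor, disturbance level,
# and initial input error norm.
rho, eps, a0 = 0.6, 0.05, 10.0
hist = iterate_bound(a0, rho, eps, 200)
radius = eps / (1.0 - rho)

# Closed form of the same recursion after k steps, for comparison.
k = 5
closed_form = rho**k * a0 + eps * (1.0 - rho**k) / (1.0 - rho)

print(hist[k], closed_form, hist[-1], radius)
```

With `eps = 0` the same recursion decays geometrically to zero, which corresponds to convergence to the desired trajectory in the absence of disturbances, initial state errors, and bias.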
The following lemma gives an extension of the learning update law.
Lemma 2. If the learning law in Eq. (10) is replaced by a law that additionally involves an operator $K(\cdot, \cdot)$, with $K(\cdot, \cdot)$ bounded, then Theorem 1 still holds.
Proof. The proof proceeds as in the proof of Theorem 1, except that Eq. (18) is modified by adding $b_K k_g$.

Application to Hydraulic Actuators
In this section the learning algorithm is applied to the dynamics of a hydraulic actuator. The plant considered is limited to the class of valve-controlled hydraulic cylinder plants, as shown in Fig. 3. A servo valve controls the position of the hydraulic cylinder. The load is represented by a variable mass-spring-damper combination. The system shown in Fig. 3 is characterized by the highly non-linear nature of the servo valve pressure-flow curves and friction effects, a very low damping ratio, and dynamics that strongly depend on the operating point and the physical parameters describing the system. If the non-linear equations describing the system are linearized around an operating point $(x_{V0}, x_{P0}, P_{L0})$, the transfer function relating the spool position to the piston position may be written as Eq. (34). The coefficients in the transfer function (34) indicate the relation between the hydraulic natural frequency, damping, loop gain and the parameters defining the plant. The approximate dynamics of the overall system, consisting of the servo valve, cylinder, and load system, is obtained as Eq. (35). In general, the operating frequencies of the electrohydraulic servo actuator are much lower than the natural frequency of the servo valve, so that the dynamics of the servo valve can be neglected in the further analysis. From Eq. (34) and Eq. (35) it is then seen that a pure integrator and two sets of complex conjugate poles dominate the dynamics. That means we consider spring-type loads with high stiffness, and the transfer function from the input voltage to the load position can be written as Eq. (36), with $\Gamma_P = k_G \gamma_P$.

The most important characteristics of the model analysis are summarized below. For each operating point, the system dynamics may be sufficiently described by Eq. (36). The open loop gain is a non-linear function of the accelerated inertia load, friction and external force disturbances. The parameter variations in the valve-cylinder transfer function cause large variations in the damping and natural frequency as the operating point is changed. From the above, which is fundamental to hydraulic servo design, it may be concluded that if the loop gain in the control design is chosen with care, the dynamic model used for controller design may be reduced to the model in Eq. (37). The model used for control design is the one in Eq. (37), rewritten in the form of Eq. (38), where $\Gamma(t) = 1/\Gamma_P(P_S, P_L)$. The term $\nu(t)$ is introduced as a modelling error, completely unknown but upper bounded.
In Section two a class of non-linear, time-varying systems was considered. As mentioned before, this is significant because the result may be applied to a plant and feedback configuration as shown in Fig. 2. If the feedback controller is robust, then the system should have reasonable performance for every trial and will converge to the desired trajectory. If the control law $u$ is chosen in terms of $\hat{\Gamma}$, the estimate of $\Gamma$, and $u_L$, the learning term, then to formulate a system as in Eq. (2) we substitute this control law into Eq. (38) and define the new system as Eq. (39), or, written in the more general form, as Eq. (40). The update law is examined for the system in Eq. (39) and Eq. (40), for which assumption (A1) is clearly satisfied. Assuming (A2), (A4), and (A5) hold, (A3) is also satisfied, since the functions involved are then bounded. Theorem 1 implies that, given a desired trajectory, the input will converge, even in the presence of disturbances, to a neighbourhood of the desired input trajectory provided that $\left\| 1 - L\,\frac{1}{\Delta t}\int_t^{t+\Delta t} g_x B \, d\tau \right\| \le \rho < 1$. Assuming $\Delta t$ small, the condition becomes $\left\| 1 - L\,\Gamma^{-1} \right\| \le \rho < 1$, which gives a necessary condition on the accuracy of the dynamical model of $\Gamma$.
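The effect of the gain condition can be illustrated with a small trial-to-trial simulation. The sketch below is not the paper's simulation. It assumes a scalar plant $\dot{x} = u/\Gamma + \phi(t)$ discretized with a zero-order hold, a repeatable disturbance, no feedback term and no bias term, and an update of the form $u_{k+1}(t) = u_k(t) + (L/\Delta t)\,[e_k(t+\Delta t) - e_k(t) + e_{k+1}(t)]$. The names `run_trials` and `contraction_factor` and all numerical values are hypothetical.

```python
import math

def contraction_factor(L, gamma_true):
    """|1 - L / Gamma|, the small-dt form of the convergence condition."""
    return abs(1.0 - L / gamma_true)

def run_trials(gamma_true, L, n_trials=20, T=3.0, n_steps=300):
    """Return the max tracking error of each trial for the assumed plant."""
    dt = T / n_steps
    ts = [i * dt for i in range(n_steps + 1)]
    x_d = [math.sin(math.pi * t / T) ** 2 for t in ts]      # desired, x_d(0) = 0
    phi = [0.3 * math.sin(2.0 * math.pi * t) for t in ts]   # repeatable disturbance
    u_prev = [0.0] * n_steps   # first trial runs without a learned input
    e_prev = None
    max_errors = []
    for _ in range(n_trials):
        x = 0.0                # same initial state in every trial
        u = [0.0] * n_steps
        e = [0.0] * (n_steps + 1)
        e[0] = x_d[0] - x
        for i in range(n_steps):
            if e_prev is None:
                u[i] = u_prev[i]
            else:
                # previous-trial error increment plus current-trial error
                u[i] = u_prev[i] + (L / dt) * (e_prev[i + 1] - e_prev[i] + e[i])
            x += dt * (u[i] / gamma_true + phi[i])   # zero-order-hold step
            e[i + 1] = x_d[i + 1] - x
        max_errors.append(max(abs(v) for v in e))
        u_prev, e_prev = u, e
    return max_errors

# Gain estimate 30 % below the true value: |1 - 1.4 / 2.0| = 0.3 < 1.
errors = run_trials(gamma_true=2.0, L=1.4)
print(contraction_factor(1.4, 2.0), errors[0], errors[-1])
```

In this sketch any gain with $0 < L/\Gamma < 2$ satisfies the condition, so the estimate of $\Gamma$ only needs to be accurate to within a factor of two; gains outside that range violate the condition and are not expected to converge.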

Simulation Results
In this section a simulation study is performed to investigate the performance of the learning control schemes. A hydraulically driven two-link robot is used as the test facility. A sketch of this robot is shown in Fig. 4. The results of these simulations are seen in Figs. 5-8.
Looking at the tracking error plots, Figs. 5 and 7, the error shown in the first trial, i.e. the first 3 seconds, is without the influence of the learning term. After the first trial the learned feedforward signal is added, cf. Figs. 6 and 8, and as may be seen the rate of convergence for the learning controller is very fast. At the end of the second trial the tracking error is significantly decreased. Theoretically, according to the results of Theorem 1, the convergence rate could be increased by simply reducing the sampling interval $\Delta t$. However, the computation delay associated with particular hardware tends to violate the theoretical basis for deriving the learning control algorithm, and will hence militate against the use of too small a sampling interval in practice. Most learning controllers that decrease the magnitude of the error at the beginning of the learning process eventually result in error accumulation, so in practice it is desirable to stop the process in a finite time, with the error being as small as possible at this time. For the robust learning controller (RDLC), the bias term may be helpful, and varying the update operator as the iterations progress may further improve performance. The bias term, as discussed in Heinzinger et al. (1992), is initially useful to keep the input from wandering excessively, but with time it might be advantageous to decrease its influence by decreasing $\gamma$. Once the input has converged fairly well, decreasing the learning gain (the size of $L$), so that the input averages out random disturbances, may improve the accuracy of the input.

Conclusions
The learning update law presented in this paper implies that, as the iteration number approaches infinity, the trajectory errors are less than certain bounds, provided certain conditions are met. One major advantage of the presented learning algorithm is its fast convergence, which means that the learning process can be stopped, or the learning gain decreased, before error accumulation makes the system unstable.
Learning control itself cannot be used to stabilize a system or to change its performance for a general trajectory.Therefore, in applications it is desirable to use a robust feedback controller to improve the system performance (the motivation for considering time-varying systems).Learning control iteratively updates a feedforward term to provide a finer and finer "open loop" performance along a specific trajectory, thus it is not intended to make up for a poor feedback controller design.

Figure 3: Schematic diagram of the electro-hydraulic plant considered.

Figure 4: Sketch of the hydraulically driven two-link robot, which is used as the test facility.