Dynamic system multivariate calibration based on multirate sampling data

The statistical principal component regression (PCR) and chemometric partial least squares regression (PLSR) algorithms based on latent variables (LV) modeling are effective tools for handling ill-conditioned regression data. In many process related cases the data form time series, and it may then be possible to improve the prediction/estimation results by utilizing the autocorrelation in the observations. This can be done by use of estimators found from experimental data by use of a combination of statistical/chemometric and system identification methods. In important industrial cases, the response variables are product qualities which also in the experimental data are sampled at a low and possibly irregular rate, while the regressor variables are sampled at a higher rate. After a discussion of the options available, the paper shows how the autocorrelation of the regressor variables in such multirate sampling cases may be utilized by identification of latent variables based output error (LV + OE) estimators. An example using acoustic power spectrum regressor data is finally presented.


Introduction and problem statement
In many important practical cases there is a need to estimate primary system responses y i from known system inputs u and secondary system outputs y2 .In typical industrial cases the y i responses are product quality measurements that cannot be made available on-line, while u are manipulated process inputs and y 2 are various on-line process measurements.Dynamic estimators for this purpose may be identified from experimental u, y i and y2 data (Ergon, 1999b).
The experimental data are often characterized by few observations and a large number of nearly collinear y 2 variables.In the static case the statistical/chemometric methods PCR (Principal Component Regression) or PLSR (Partial Least Squares Regression) based on latent variables (LV) models provide powerful tools for handling such ill-conditioned problems.The experimental u and y 2 data are then collected in an X matrix, while the y i data are collected in a y vector (assuming a single response variable), and the X information is compressed into an estimated latent variables matrix T. In many process related cases the X and y data in ordinary PCR or PLSR form time series, and it may then be possible to improve the prediction/estimation results by utilizing the autocorrelation in the observations.After compression of the X data into T, this can basically be done by identification of FIR (Finite Impulse Response), ARMAX (AutoRegressive Moving Average with eXtra inputs), ARX (AutoRegressive with eXtra inputs) or OE (Output Error) type of models.The estimator may thus be found from experimental data by a combination of multivariate calibration methods used in statistics/chemometrics (e.g.Martens and Naes (1989) and Esbensen (2000)) and system identification methods (e.g.Ljung (1999), Söderstrøm and Stoica (1989) and Van Overshee and De Moor (1996)).
In some cases of practical interest it is possible to obtain experimental X data at a high sampling rate, while the experimental y data are obtained at a lower and possibly also irregular rate.In typical cases the y data are product quality measurements that require chemical or other forms of analyses, such that the results for practical and economical reasons cannot be made available at a high rate.The problem of the present paper is to find an identification method that handles such multivariate and multirate sampling time series data.
After an introductory discussion of the options available, a solution to the problem using an LV + OE model structure is presented.An example using acoustic power spectrum X data demonstrates the feasibility of this method.

Static multivariate calibration
As a background for a comparison of static and dynamic estimators, we start with a short presentation of static multivariate calibration methods based on latent variables models.We then focus on the common situation where the X variables are secondary outputs from a static system, and where no manipulated or otherwise known system inputs are available.Since u = 0, the regressor data is then X = [U Y2] = Y 2 , and the following is based on this assumption.

The static multivariate calibration problem
Assuming a static system with a scalar primary output or response variable yl and multivariate secondary outputs y 2 , the static calibration problem is to find an estimator b from experimental data that may be used to predict a new response y l,o from new observations y 2, o according to

Yi,o = Y2,o b
A classical example is the estimation of protein content in whole wheat kernels based on near infrared (NIR) spectroscopy (Norris, 1993).Here, the protein content is the primary output y i , while the NIR reflectance at a large number of frequencies gives rise to the y 2 variables.A process related example is the estimation of distillation product composition from a number of temperature measurements along the distillation tower (Mejdell and Skogestad, 1989).The fundamental problem in such cases is that the number of y2 variables may be much larger then the number of observations in the experimental data.A comprehensive discussion of the multivariate calibration problem is given in Martens and Nees (1989), while Esbensen (2000) focuses on the interpretational and practical application aspects of latent variables methods.With a large number p of y2 variables, this solution will be very noise and collinearity sensitive, and in practical applications the LS method will give satisfactory results only when p is well below N. A thorough discussion of this problem is given in Belsley (1991). (2)

Latent variables model
In many practical situations, fortunately, the multivariate y 2 variables are highly collinear, and the y2 information may then be compressed into small number a of estimated latent variables T = [i 1 i2 • • • ia ] T. The model underlying such data compression is the latent variables (LV) model where w1 , v 1,k and V 2,k are white noise sequences.After an appropriate similarity transformation the output equations may be written where W is orthonormal, e.g.W TW = I .
With N observations and

Regularized solutions
In PCR and PLSR (the Martens algorithm) the Y2 data matrix is compressed into a score matrix T by use of the factorization where E is a residual matrix.Here, the W weighting matrix is orthonormal, i.e.WTW = I, and the LS solution of ( 7) is thus From ( 5) we now find the ordinary LS estimator related to the 'r variables which results in fitted experimental responses according to (5) and ( 8) The regularized latent variables estimator thus becomes This LV estimator formulation is earlier given by Helland (1988), although not directly based on a LV model.We will return to this expression in connection with PCR and PLSR estimators using W = W PCR = P and W = WPLS.

Principal component regression
In PCR the weighting matrix is * = P, where P is the loading matrix related to the principal components of Y2 .We may find P from the singular value decomposition The latent variables estimates represented by the score matrix T are thus based on only Y2 information.

Partial least squares regression
Some of the latent variables estimates used in a PCR solution may be very weakly correlated with the response variable y i , and the PLSR solution to this is to make use of both Y2 and y i information in order to find an improved version of the W matrix.This is normally done step wise by finding one column W a of WPLS at a time (e.g.Martens and Naes, 1989), although a one-step procedure is recently developed (Di Ruscio, 2000).Note that T according to (8) is the non-orthogonal score matrix in the Martens PLSR algorithm, while the orthogonal score matrix in the Wold algorithm is somewhat differently defined (e.g.Martens and Naes, 1989).
The basic ad hoc idea in the step-wise PLSR algorithms is to find a given column vector Wa by minimization of the sample covariance between the corresponding score vector to = Y2wa according to (8) and the residual ea _ 1 of yi that is not explained by earlier columns w i , W 2 , *a _ i (in the original Wold and Martens algorithms also residuals of Y2 and not Y2 itself is used).A straightforward and simple step-wise PLSR algorithm is developed in Ergon and Esbensen (2001), where the relation between the LV estimator (11) and Kalman filtering theory is also discussed.The algorithm is as follows: 1. Set a= 1 and 80= y1.
This algorithm produces a sequence of predictors b i , 62 etc.The appropriate number of column vectors W i , W2, ... , WA to use must then be decided after validation against an independent data set.The score and loading matrices that are necessary for the data evaluation and interpretation that is an essential part of chemometrics may also be computed in the algorithm, or separately for specific values of a.

Alternatives for dynamic estimation
Dynamic system extensions of the PCR and PLSR algorithms have been discussed by a number of authors, see Ergon and Halstensen (2000) for further references.In relation to the present problem the alternatives available can be summarized as follows: • The FIR model Yk^' may be identified by use of LS regression, and this is possible also when most of the yk values are missing, in which case we use only the y; samples that are available, together with the corresponding past and present uk values.However, a large number of Markov parameters h, must normally be identified, and this requires a correspondingly large number of response observations.This is a serious drawback, especially since it often is a part of the problem that the number of response observations is very limited.In addition the FIR model is biased due to the lack of noise modeling.

• An ARMAX model of the type
where e k is a white noise sequence, may be identified by use of for example a prediction error method (e.g.Soderstrom and Stoica, 1989) or a subspace method (e.g.Van Overshee and De Moor (1996) and Di Ruscio ( 1997)).This is, however, not possible when y is sampled at a lower rate than a, and an ARMAX estimator is in any case non-optimal when also secondary system outputs are used as inputs (Ergon, 1999a).
• ARX models are ARMAX models without noise modeling, e.g. as given in (19) but with c l = c 2 = • • • = 0.In the present case they have the same drawbacks as ARMAX models, and bias due to the lack of noise modeling in addition.
• OE estimators as described below can be identified also when most of the ordinary y observations are missing and are theoretically optimal in the sense that they are unbiased and have minimized variances when secondary system outputs are used as estimator inputs (Ergon, 1999a).

Theoretical OE estimator
Assume the known discrete-time state-space model where Xk is the state vector.Here y 1,k for k = 1, 2, • • • , N are the primary response observations that in ordinary static PCR/PLSR with a single response variable would be collected in the y vector, while the known inputs u k and the secondary outputs y2,k would be collected in the X matrix.The process and measurement noise sequences Wk, Vl,k and 1/2,k are assumed to be independent and white.Note that the LV model ( 3) is a special case of this dynamic model.
(18) (20) Assuming the system matrices known and past and present y 1,k values not available, the optimal yl estimator is obtained by a Kalman filter driven by uk and y2,k .The result is where Here, KT is the Kalman gain found from the algebraic Riccati equation and ( 23) where T is the minimized prediction state estimate covariance (see e.g.Grewal and Andrews, 1993).The estimator equation ( 21) corresponds to the OE model where A(q -1 ) , B l (q -1 ) and B2 (q -1) are polynomials in the unit time delay operator q -1 , while 8, is a non-white stochastic process.
Note that for a first-order system with C 1 = 1 and uk = 0, equation ( 21) can be simplified to }1 1

Block diagram
The block diagram for the estimator ( 21) is shown as a part of Figure 1, where the notation O PCed indicates parameter values that are not necessarily correct (although some useful initial values are needed).The figure also illustrates the prediction error method (PEM) for system identification, adapted to minimize the error in the current estimate Ar id while considering both u and y 2 as input signals.
A parsimonious canonical form with C = [1 0 . .• 0] is assumed, and the output matrix is therefore not adjusted in the minimization procedure.It is further assumed that y, samples are available only at some of the uk and y 2,11 time instances k, resulting in a sequence of low and possibly irregular sampling rate observations yl,j.

Estimation error minimization
As indicated in Figure 1 PLANT is the estimation error, while N1 is the number of yi observations in the experimental data (see e.g.Soderstrom and Stoica (1989) for a general discussion of this prediction error method, and Ergon (1999a), for the case with also y 2 used as input).
Note that when not all y1,k values are available, the standard initial value procedure based on a least squares approach (Ljung, 1995) cannot be used.The initial value problem may therefore be the most difficult part of the problem, although several ad hoc approaches are available (Ergon, 1998(Ergon, , 1999b)).
Minimization of equation ( 27) can be performed only after a specification of the model order, and the normal procedure is then to select a chain of canonical structures with increasing model order, starting with a low-order model (Ljung, 1999).An alternative would be to find the model order by use of a subspace identification method (e.g.Di Ruscio, 1997), which would give an ARMAX type of model.However, this is not a feasible choice in cases with low primary output sampling rate.

Numerical algorithm
The practical identification can be performed by a modified version of the pem function in the System Identification Toolbox for use with Matlab, which uses an iterative Gauss-Newton algorithm (Ljung, 1995).The modifications are necessary in order to handle the multirate sampling case with only some of the y i,k observations available.

Latent variables solutions
With a large number of secondary y 2 variables, the number of parameters to identify will be correspondingly large.However, assuming collinear y 2 variables the Y2 data may be compressed into latent variables by use of principal component analysis (PCA) or PLSR.

Kp= PpCZP(PTC2PpC2P
with Pp determined by the algebraic Riccati equation The identification of this estimator is performed as indicated in Figure 1, only that the y 2,k data is replaced by ik as given by equation ( 29).For the special case of a first-order estimator, ( 26) is then modified into PLSR solution As an alternative to the PCA + OE estimator (30) we may use a PLSR + OE estimator with where WpLS is found by use of low sampling rate y2,k as well as y 1,k data.The first- order estimator (33) is then replaced by

Initial value determination
Identification of the OE estimator ( 16), the PCA + OE estimator (33) or the PLSR + OE estimator (35) requires useful initial parameter values.As already mentioned, the standard procedure based on a least squares solution (ARX model) is not feasible due to the lack of high sampling rate y 1 information, and one has to resort to ad hoc methods (Ergon,1999b).In the present case it is natural to look for a solution using a static PCR or PLSR estimator.(37) 1 -fq-l may be chosen such that the structure of the estimators ( 33) and ( 35) is obtained.
We may use this relation to find useful initial parameter values, and then let the values be free, or we may force the parameter values to comply to the relation during the entire minimization.

PCA + OE estimator
A comparison of ( 33) and ( 37) shows that f= A -AKP P TC 2 , m T = lip and h T = fmT resulting in the static estimator (with q _ 1 = 1) It is natural to compare this with the PCR estimator according to (11) and ( 12) where Y2,10W in the same way as y 1 contains only low sampling rate data.This comparison indicates the initial guesses and thus Here, the sign indeterminacy is due to the sign indeterminacy of the PCA involved.It now only remains to select fo and the sign, and this can be done after a systematic search for good validation results.

PLSR + OE estimator
In the same way as above, a comparison of ( 35) and ( 37) results in the initial values tions of the power spectrum of the sensor signal are collected in the X = Y 2 matrix, and calibrated against physical y = yl primary quantities like multi-component mixture concentrations, density etc., using for example a standard PLSR method (Esbensen et al., 1999).

Experiment
In an experiment on a test rig at Telemark University College, the flow rate of ordinary drinking water was measured by use of an ultrasonic flowmeter (Fuji Portaflow X) and used as the response variable y l , while the acoustic power spectral densities at 1024 frequencies originating from an accelerometer placed on a standard orifice plate were used as y 2 variables.
The flow rate was varied by means of a control valve with a control signal generated as filtered white noise.The resulting y 1,k and y2,k signals were recorded for 230 observations with a sampling interval T2 = 5.3 sec .No known inputs u were recorded.Each power spectrum representing an y 2 observation was formed as the mean value of 50 consecutive power spectra in the sampling interval.This standard procedure in static multivariate calibration based on acoustic data is necessary in order to obtain a reasonable noise level.Only every 5th yi sample was used in the estimator identification procedure, and the number of y1 observations in the modeling set were accordingly N1 = 46 .We thus consider a multirate system with y 2 mean sampling with intervals T2 = 5.3 sec , while yi is sampled with intervals T1 = 26.5 sec .We also consider a difficult case with only N1 = 46 samples of y 1 .The y1 signal in the modeling set is shown in Figure 2.
The estimator model was validated against a separate data set with 170 yl and y2 observations.The modeling and validation data sets were chosen in time intervals where no obvious outliers in the acoustic power spectrum were recorded.
Remark 8.1 The lise of every .5thob e,VLLtiOri ji v^ni high sampling rate data is of course not optimal.However, if the response variable had been for example a concentration of a certain component in a multi-component mixture, and the measurements had to be done through laboratory analyses, the response sampling rate would very likely for practical end economical reasons have been lower than the obtainable acoustic data sampling rate.Also the validation would in such a case have to be done against low sampling rate data.

Multirate identification
In order to test the method presented above, static PLSR and dynamic PCA + OE and PLSR + OE estimators were identified using the multirate modeling data described above.
Both an ordinary PLSR estimator bPLSR = WPLS(W PLS Y 2,low Y 2,low WPLS) ^ 1WPLS Y 2,low3'1 -WPLSbTLSR9 and first-order PCA + OE and PLSR + OE estimators according to (36) with i given by ( 29) and (34) were determined, using only the N1 = 46 response observations considered available.Here, Y2,10w contains only low sampling rate y 2,k data, while the PCA + OE and PLSR + OE dynamic estimators were identified by use of the high sampling rate y 2,k data.The best PCA + OE and PLSR results were obtained when (44) the model structures ( 33) and ( 35) were used only for the initial parameter values, i.e. when the parameters in (36) were left to move freely during the minimization.

Number of parameters
The number of independent parameters in the PLSR estimator is equal to the number of components a chosen (the number of latent variables), while the PCA + OE and PLSR + OE estimators has n + (n + 1)a parameters, where n is the model order (i.e. 5 parameters for a = 2 and n = 1).

Results
The best static PLSR estimator was obtained using the means of the last two Y2 observations, i.e. by computing the mean of 100 power spectra as Y2,k -(Y2,k + Y2,k -1)l2.The best PCA + OE and PLSR + OE estimators were obtained using the original means of 50 power spectra.The identified estimators were validated by use of the independent validation data set and computation of 1 "'2 _ E( i.e. all available validation data was used, including all validation data yl,k observations.The models for different numbers of components found in this way gave the validation RMSEVAL results shown in Figure 3.
As can be seen in Figure 3, the PCA + OE estimator gave an approximate 30% reduction of the validation RMSEVAL value at the optimal number of components a = 5, as compared with the PLSR result at a = 2.However, in a practical application we would most likely be content with 20% reduction, as given by the parsimonious PCA + OE and PLSR + OE estimators using only a = 2 components.Note that these estimators gave very similar results for a 5 4.
The PCA + OE validation response for a = 2 components is shown in Figure 4.Note that the estimated response follows the (in this case) measured response well also between the sampled values.This is also the case for the low sampling rate PLSR estimator, although the dynamic estimator gives a considerably lower validation RMSEVAL value.

Further results and discussion
The fact that we need two PCA or PLSR components to explain the main part of the variance in yi indicates that there is a second and independent source of the power spectrum variations in addition to the varying flow rate.This is most likely a temperature drift, or possibly air bubbles or particles in the water.Note that we use centered  data, with flow rate variations around a fixed mean value, i.e. power spectrum variations caused by such a second source may be present also without flow variations.Assuming first-order dynamics for the two major and independent acoustic sources, the model (20) will in such a case result in where a temperature drift can be modeled as integrated white noise by setting a22 = 1.After the data compression with W =P or W = Wp, S , this results in a theoretical primary response estimate on the form where k11 and k12 are Kalman gains.In the case that the second acoustic source does not result in a time series, i.e. when a 22 = 0, this is simplified to the form given by equation ( 26 The theoretical estimators ( 49) and ( 50) give a background for a discussion of the identified dynamic estimators, although this is somewhat complicated by the fact that the y 2 observations were obtained as mean values of various numbers of consecutive power spectra.The dynamic PLSR + OE estimator obtained with a= 2 components and based on 50 consecutive power spectra was [0.0652 0.1315] + [0.2345 0.2228] q -1 'i l,k Yl,k^k= 1-0.0837q-1 i2k] 10_3, which resulted in 20% reduction of the RMSEVAL value as compared with the static PLSR estimator based on the means of 100 consecutive power spectra (see Figure 3).For this estimator the initial values were chosen according to equations ( 40) and ( 41) with fo = 0.5 (the fp value was in no way critical), but the parameters were then allowed to move freely during the minimization.A comparison with the model in equation ( 49) shows that a q -2 term is missing in the denominator, i.e. we should have identified a second-order estimator.The reason for not doing so is the initial value problem discussed in Section 7.
When the parameter values in the PLSR + OE estimator were forced to comply with the theoretical estimators ( 35) and ( 50 However, this estimator gave no improvement in the RMSEVAL value as compared to the static PLSR estimator, which indicates that the second acoustic source gave time series data, i.e. that a22 0. The dynamic estimators may also be compared to the static estimators.With q -1 = 1 the PLSR + OE estimator (51) gives the corresponding static estimator y l = (0.32711 1 + 0.3867/i2)• 10 -3 , where X = [x 1 x 2] is orthogonal.A PLSR factorization using the Martens algorithm would result in Y2 = 'Nils, where the T matrix is non-orthogonal, and it is therefore more appropriate to use the orthogonal score matrix fw in the Wold algorithm.variations caused by flow variations, while spectrum 2 reflects the influence of the second underlying source of variation.Since T = Y2 *", is nearly orthogonal, these spectra are almost identical to the loading weight spectra based on W PLS = [w 1 w2].

Conclusion
High sampling rate X and low sampling rate y time series data may be used to identify an output error (OE) response estimator based on an underlying Kalman filter.The X variables may be both known system inputs uk and secondary system outputs y 2,k, while the y variables are the primary responses yid.
A high number of collinear X = [U Y 2] variables may be compressed by principal component analysis (PCA) or partial least squares regression (PLSR), resulting in PCA + OE and PLSR + OE methods.
A multirate sampling example with acoustic power spectrum Y2 data and flow meter yi data demonstrates the feasibility of these methods, although the interpretation of the results is complicated by the fact that the Y2 data are obtained as means of several consecutive power spectra.

Figure 1 .
Figure 1.Identification of OE estimator by estimation error minimization.

Figure 2 .
Figure 2. Measured flow rate (dotted) in the modeling set, with assumed low sampling rate measurements (o-marked).

Figure 3 .
Figure 3. Validation RMSEVAL results for static PLSR and dynamic PCA+OE and PLSR+OE estimators based on low sampling rate y i data.

Figure 4 .
Figure 4. Validation response for dynamic PCA + OE estimator based on a= 2 PCA compo- nents and low sampling rate y 1 data.The low sampling rate validation data is o-marked, while the (in this case) known intermediate validation values are shown by dotted line.The estimated values are shown by solid line.

Figure 5 .
Figure 5. Spectra for PLSR estimator with two components.