Noise Handling Capabilities of Multivariate Calibration Methods

The noise handling capabilities of principal component regression (PCR) and partial least squares regression (PLSR) are somewhat disputed issues, especially regarding regressor noise. In an attempt to indicate an answer to the question, this article presents results from Monte Carlo simulations assuming a multivariate mixing problem with spectroscopic data. Comparisons with the best linear unbiased estimator (BLUE) based on Kalman filtering theory are included. The simulations indicate that both PCR and PLSR perform comparatively well even at a considerable regressor noise level. The results are also discussed in relation to estimation of pure spectra for the mixing constituents, i.e. to identification of the data generating system. In this respect solutions to well-posed least squares problems serve as references.


Introduction
The noise handling capabilities of principal component regression (PCR) and partial least squares regression (PLSR) are somewhat disputed issues, especially regarding regressor noise (X-noise). In an attempt to indicate an answer to the question, this article presents results from Monte Carlo simulations assuming a typical multivariate calibration problem, where several constituents with unknown spectroscopic properties are mixed.
The performances of PCR and PLSR are certainly noise dependent, but to which degree? A more specific question is how well these methods handle noise of different levels, as compared with the theoretically best possible prediction results, which in the simulations are found by use of the best linear unbiased estimator (BLUE) based on Kalman filtering theory. The results are also discussed in relation to estimation of pure constituent spectra, i.e. to identification of the data generating system. In this respect solutions to well-posed least squares (LS) problems are used for comparisons.
The theoretical background based on latent variables (LV) modeling is summarized in Section 2, with references to more detailed treatments of PCR and PLSR. The simulated mixing problem and the simulation results are presented in Section 3, and conclusions are given in Section 4. Some details concerning PLSR modeling and constituent profile estimation are collected in Appendices A and B.

Model assumptions and problem statement
Assume centered data generated according to the LV model

y_k = C_1 z_k + f_k  (1)
x_k = C_2 z_k + e_k  (2)

where z_k ∈ R^A is a random vector of latent variables, i.e. the expectation E z_j z_k^T = 0 for all j ≠ k, and where y_k ∈ R^m is a vector of response variables, while x_k ∈ R^p is a vector of regressor variables. C_1 ∈ R^(m×A) and C_2 ∈ R^(p×A) are time-invariant matrices, while f_k and e_k are independent and random noise vectors of appropriate dimensions.
Also assume m ≤ A and independent components of z and y, i.e. diagonal expectations E z_k z_k^T and E y_k y_k^T. Without loss of generality we may then assume an LV representation such that

y_k = [I_m 0] z_k + f_k  (3)

i.e. we assume that each response variable is a latent variable plus some random noise. Collection of data from N observations in matrices Y ∈ R^(N×m) and X ∈ R^(N×p) thus gives

Y = Z_Y + F  (4)
X = Z_Y C_Y^T + Z_osc C_osc^T + E  (5)

where it is a part of the assumptions that A << N < p. The OSC notation is borrowed from recent articles on orthogonal signal correction (Wold et al., 1998; Fearn, 2000; Trygg and Wold, 2001; Westerhuis et al., 2001; Trygg, 2001). The matrix Z_osc C_osc^T thus contains the structured but Y-orthogonal information in X. With the assumptions given, the columns of C_2 may typically be scaled versions of pure constituent spectral profiles.
The assumption A << N < p makes it natural to use PCR or PLSR for calibration purposes, and we will in the following focus on the PCR and PLSR noise sensitivity in relation to two problems:
• The multivariate calibration problem of finding an estimator B for prediction of new responses from new regressor observations according to

ŷ_k^T = x_k^T B  (6)

• The problem of estimating the pure constituent profiles in C_2, i.e. the problem of identifying the data generating system.

B from ordinary least squares regression
The ordinary LS solution for B obtained from the data is (e.g. Johnson and Wichern, 1998)

B_LS = (X^T X)^{-1} X^T Y.  (7)

Under the present assumptions, with a large number p of x variables relative to the number N of observations, the underlying LS problem will be ill-posed. In this case there is a need for regularization, which can be based on LV modeling and PCR or PLSR as summarized below.
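To make the ill-posedness concrete, the following NumPy sketch (with purely illustrative dimensions and noise levels, not the article's settings) shows that with p > N the matrix X^T X in (7) is rank-deficient, so a minimum-norm pseudo-inverse solution is needed in place of a plain inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, A = 20, 50, 3  # fewer observations than regressors -> ill-posed LS

# hypothetical data following X = Z C2^T + E with small noise
Z = rng.standard_normal((N, A))
C2 = rng.standard_normal((p, A))
X = Z @ C2.T + 0.01 * rng.standard_normal((N, p))
Y = Z[:, :1]  # single response, noise-free here for illustration

# X^T X has rank at most N < p, so the inverse in (7) does not exist
XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)

# minimum-norm LS solution via the Moore-Penrose pseudo-inverse
B_ls = np.linalg.pinv(XtX) @ X.T @ Y
```

Since X generically has full row rank here, the minimum-norm solution reproduces the modeling responses exactly, which is precisely the overfitting that regularization by PCR or PLSR is meant to avoid.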

C2 from ordinary least squares regression
The columns C_Y of C_2 that are directly related to the responses Y can be found from (5) using LS regression according to

Ĉ_Y = X^T Y (Y^T Y)^{-1}.  (8)

Under the assumptions given, the underlying LS problem is well-posed. We will later discuss this result in relation to the PCR and PLSR methods.
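A minimal sketch of the well-posed estimate (8), on hypothetical simulated mixing data (the dimensions, noise levels and seed below are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 500
# hypothetical zero-mean mixing data: y = z_1 + f, x = C2 z + e
z = rng.standard_normal((N, 3))
C2 = rng.standard_normal((p, 3))
y = z[:, :1] + 0.01 * rng.standard_normal((N, 1))
X = z @ C2.T + 0.1 * rng.standard_normal((N, p))

# well-posed LS estimate of the column of C2 related to y, cf. eq. (8)
C_y_hat = X.T @ y @ np.linalg.inv(y.T @ y)
```

Note that the matrix to invert, y^T y, is only m x m (here 1 x 1), which is why this LS problem stays well-posed even though p >> N.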

Multivariate calibration model
Multivariate calibration using PCR or PLSR assumes a model

Y = T Q^T + F  (9)
X = T L^T + E  (10)

resulting from (4,5) through an unknown similarity transformation. Here, T is the matrix of scores, while L is the matrix of loadings. For the two problems under study we may note the following:
• PCR and PLSR use different factorizations of Z C_1^T and Z C_2^T, as summarized below.
• The pure constituent profiles in C_2 = [C_Y C_osc] may be confounded and scaled in L.

The Helland predictor
The PCR and PLSR regularizations are based on the latent variables model (9,10) above. The LS solution of (10) is

T = X L (L^T L)^{-1}  (11)

and from (9) and (11) we thus find the LS predictor related to the latent variables

Ŷ = T (T^T T)^{-1} T^T Y  (12)

which after some simplifications results in fitted experimental responses according to

Ŷ = X L (L^T X^T X L)^{-1} L^T X^T Y.  (13)

The regularized LV predictor B to be used in (6) thus becomes

B = L (L^T X^T X L)^{-1} L^T X^T Y.  (14)

This predictor was first presented by Helland (1988), although there not explicitly based on an LV model.
The problem now is to find L, or more realistically good estimates of L. A simple choice is L = I_p, which brings us back to the LS solution (7). Other choices give the PCR and PLSR solutions.
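The predictor (14) and the special case L = I_p can be checked numerically. The helper name below is illustrative, the data are random, and a well-posed case (p < N) is used so that the plain LS solution (7) exists:

```python
import numpy as np

def helland_predictor(X, Y, L):
    """Regularized LV predictor B = L (L^T X^T X L)^{-1} L^T X^T Y, cf. eq. (14)."""
    M = L.T @ X.T @ X @ L
    return L @ np.linalg.solve(M, L.T @ X.T @ Y)

rng = np.random.default_rng(2)
N, p = 50, 5  # well-posed case, so the plain LS solution (7) exists
X = rng.standard_normal((N, p))
Y = rng.standard_normal((N, 1))

B_helland = helland_predictor(X, Y, np.eye(p))   # L = I_p
B_ls = np.linalg.solve(X.T @ X, X.T @ Y)         # eq. (7)
```

With L = I_p the two expressions coincide algebraically, since (14) then collapses to (X^T X)^{-1} X^T Y.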

The PCR predictor
In PCR the loading matrix is L = P, where P is found from a principal component analysis (PCA) of X (e.g. Johnson and Wichern, 1998). We may also find T = U_1 S_1 and P = V_1 from the singular value decomposition (SVD)

X = U S V^T = U_1 S_1 V_1^T + U_2 S_2 V_2^T

where U_1 S_1 V_1^T contains the A first principal components. The latent variables represented by the score matrix T are thus based on X information only.
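A sketch of PCR in this formulation, on hypothetical simulated data: the scores and loadings are taken from a truncated SVD, and the loadings are plugged into the Helland form (14) with L = P. All dimensions and noise levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, A = 100, 200, 3
Z = rng.standard_normal((N, A))
C2 = rng.standard_normal((p, A))
X = Z @ C2.T + 0.1 * rng.standard_normal((N, p))
y = Z[:, :1] + 0.01 * rng.standard_normal((N, 1))

# truncated SVD: X ~ U1 S1 V1^T, scores T = U1 S1, loadings P = V1
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * s[:A]
P = Vt[:A].T

# PCR predictor via the Helland form (14) with L = P
B_pcr = P @ np.linalg.solve(P.T @ X.T @ X @ P, P.T @ X.T @ y)
```

Note that y is not used when choosing T and P here, which is exactly the property discussed for PLSR below.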

The PLSR predictor
Regarding PLSR we will discuss two algorithms:
• The original method (Wold et al., 1982) with an orthogonal score matrix T_W.
• The alternative method (e.g. Martens and Naes, 1989) with a non-orthogonal score matrix T_M.
Some of the latent variables represented in the PCR score matrix T may often be very weakly correlated with the response variable in y. The PLSR solution to this is to use both X and y information in order to find improved versions of T and L. In the Wold and Martens algorithms this is done by step-wise computations (e.g. Martens and Naes, 1989), but a one-step procedure is also available (Di Ruscio, 2000). The original orthogonal PLSR algorithm of Wold is based on the factorization

X = T_W P_W^T + E

where P_W is a special non-orthogonal loading matrix. Both T_W and the loading weight matrix W are orthogonal, and W^T W = I_A (see Appendix A for a detailed discussion). The non-orthogonal PLSR algorithm of Martens is based on the factorization

X = T_M W^T + E

where T_M is non-orthogonal, while W is the same as in the Wold algorithm. Since W^T P_W is a low-dimensional and invertible matrix, application of (14) with L = W W^T P_W and with L = W gives the same result, B_PLSR. See e.g. Martens and Naes (1989) for detailed descriptions of the algorithms, and Ergon and Esbensen (2001) for a new didactic version.
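The step-wise computation of the loading weight matrix W can be sketched as follows for a single response (PLS1). This is a simplified NIPALS-style illustration on hypothetical data, not a full reproduction of the Wold or Martens algorithms; the function name and all settings are assumptions, and the final predictor uses (14) with L = W:

```python
import numpy as np

def pls1_weights(X, y, A):
    """Loading weight matrix W by step-wise deflation (NIPALS-style PLS1 sketch)."""
    W = np.zeros((X.shape[1], A))
    Xa = X.copy()
    for a in range(A):
        w = Xa.T @ y                      # weight from X-y covariance
        w /= np.linalg.norm(w)
        t = Xa @ w                        # score vector
        p_load = Xa.T @ t / (t.T @ t)     # loading vector
        Xa = Xa - np.outer(t, p_load)     # deflate X
        W[:, a] = w.ravel()
    return W

rng = np.random.default_rng(4)
N, p, A = 100, 200, 3
Z = rng.standard_normal((N, A))
C2 = rng.standard_normal((p, A))
X = Z @ C2.T + 0.1 * rng.standard_normal((N, p))
y = Z[:, :1] + 0.01 * rng.standard_normal((N, 1))

W = pls1_weights(X, y, A)
# Helland form (14) with L = W gives the PLSR predictor
B_pls = W @ np.linalg.solve(W.T @ X.T @ X @ W, W.T @ X.T @ y)
```

The orthonormality W^T W = I_A mentioned above is a consequence of the deflation, and can be verified numerically.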

The optimal predictor
In order to obtain a basis for comparisons we need an optimal predictor formulation. The optimal predictor may be found by use of general Kalman filtering theory (e.g. Grewal and Andrews, 1993). We will, however, derive the optimal solution directly by introduction of the optimal state estimate related to the LV model (1,2),

ẑ_k = K x_k  (17)

where K is chosen such that the expectation E(z_k - ẑ_k)(z_k - ẑ_k)^T is minimized. With E z_k z_k^T = R_z and E e_k e_k^T = R_e we find (e.g. Gelb, 1974)

K = R_z C_2^T (C_2 R_z C_2^T + R_e)^{-1}

This intermediate result, derived from general Kalman filtering theory, was first presented by Berntsen (1988). The resulting optimal response estimate is ŷ_k = C_1 ẑ_k = C_1 K x_k, i.e. the optimal predictor is

B_KF = (C_2 R_z C_2^T + R_e)^{-1} C_2 R_z C_1^T.  (24)

Optimality here means that (24) gives the best linear unbiased estimate (BLUE), and the best possible estimate whatsoever assuming a Gaussian noise distribution (e.g. Grewal and Andrews, 1993). This predictor will be used as a source of reference in the simulations in Section 3.
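A numerical sketch of the BLUE gain, with assumed dimensions and noise covariances (not the article's settings). The check is the defining property of the gain, i.e. that K solves K (C_2 R_z C_2^T + R_e) = R_z C_2^T:

```python
import numpy as np

rng = np.random.default_rng(5)
p, A = 200, 3
C2 = rng.standard_normal((p, A))  # hypothetical pure-profile matrix
Rz = np.eye(A)                    # E z z^T, assumed identity
Re = 0.1 * np.eye(p)              # E e e^T, assumed white X-noise

# BLUE gain: K = Rz C2^T (C2 Rz C2^T + Re)^{-1}
K = Rz @ C2.T @ np.linalg.inv(C2 @ Rz @ C2.T + Re)

# with y = z + f, the optimal response prediction for one sample is y_hat = K x
z = rng.standard_normal((A, 1))
x = C2 @ z + np.sqrt(0.1) * rng.standard_normal((p, 1))
y_hat = K @ x
```

In the simulations below the true C_2, R_z and R_e are known by construction, which is what makes this theoretical reference predictor computable.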

Pure spectra estimation from PLSR and PCR results
Pure spectra estimates may be found from the well-conditioned LS solution (8), and there is thus no need to use the PCR and PLSR results for this purpose. It is, however, a central part of the PLSR algorithms that the first loading weight vector with a single response variable y_j is found as (e.g. Martens and Naes, 1989)

w_j1 = X^T y_j / (y_j^T X X^T y_j)^{1/2}

From (8) it thus follows that the column of C_2 corresponding to y_j is estimated as

Ĉ_2j = X^T y_j (y_j^T y_j)^{-1}

With the representation used in (4), i.e. Y = Z_Y + F, the first loading weight vector thus gives a scaled LS estimate of C_2j. In relation to the noise handling capabilities it is reassuring to know that a single-response PLSR (PLS1) under the given assumptions results in a pure spectrum estimate that is identical with the result from a well-posed LS problem (see also the simulation results in Section 3). For PCR the situation is more involved (see Appendix B).

Using Ĉ_Y from (8) we may also compute X - Y Ĉ_Y^T, and PCA/SVD of the result gives

X - Y Ĉ_Y^T = U_osc S_osc V_osc^T + E

Choosing Ẑ_osc = U_osc we will thus find the confounded and scaled profiles of the Y-orthogonal interferants in

Ĉ_osc = V_osc S_osc  (29)

Note that the scaled and sign-indeterminate profile of a single unknown interferant will be found directly from (29). For correct scaling we would need additional information.
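The residual-SVD estimation of the interferant profiles can be sketched as follows, on hypothetical data where Constituent 1 is the known response and Constituents 2 and 3 are interferants (all dimensions and noise levels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 200, 500
z = rng.standard_normal((N, 3))
C2 = rng.standard_normal((p, 3))
y = z[:, :1] + 0.01 * rng.standard_normal((N, 1))
X = z @ C2.T + 0.1 * rng.standard_normal((N, p))

# LS estimate (8) of the known profile, then SVD of the residual, cf. eq. (29)
C_y_hat = X.T @ y @ np.linalg.inv(y.T @ y)
R = X - y @ C_y_hat.T
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# confounded, scaled interferant profiles (two unknown interferants assumed)
C_osc_hat = Vt[:2].T * s[:2]
```

As in the article, the individual interferant profiles are only recovered up to an unknown rotation and scaling, so a subspace comparison (rather than a column-by-column one) is the meaningful check.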

Consequences of data centering and standardization
Centering of the data, i.e. using X ← X - X̄ and Y ← Y - Ȳ, where X̄ and Ȳ contain the column mean values, has no effect on the C_2 estimates according to (8) and (29). However, standardization of the columns of X and Y to unit variance does affect the C_2 estimates, and must thus be properly accounted for.

Monte Carlo simulation
The practical case behind the following simulation example could be a spectroscopic measurement of a solution with three different chemical constituents. A typical simulation result is shown in Fig. 1. Note the overlapping peaks and the considerable X-noise. The simulations are based on assumed discrete frequency spectra in the range 0 < f ≤ 500 frequency units (f.u.), with resonance frequencies f_1 = 200 f.u., f_2 = 250 f.u., f_3 = 300 f.u. and relative dampings ζ_1 = ζ_2 = ζ_3 = 0.05, and with C_2(f) ∈ R^(1×3). It is also assumed that the variations in the concentrations of Constituent 1, Constituent 2 and Constituent 3, denoted z_1,k, z_2,k and z_3,k, are independent and randomly generated zero-mean numbers with normal distributions and variances r_zz = E z_1,k^2 = E z_2,k^2 = E z_3,k^2 = 1. The noise terms e_k(f) are independent and randomly generated zero-mean numbers with normal distribution and equal variances r_ee = E e_k^2(f). Several r_ee values were used in the simulations.

Signal-noise-ratio for the X-data
The total signal-to-noise ratio (total SNR) for the X-data used in the simulations can be defined as the ratio between the total variances in the centered matrices Z C_2^T and E in (5). This gives the expected value (e.g. Johnson and Wichern, 1998)

SNR_total = trace(C_2 C_2^T) / (p r_ee)  (31)

The expected total SNR values for the different values of r_ee used in the simulations are given in Table 1. Note the very low total SNR for r_ee = 100. Also note, however, that the signal-to-noise ratio in the central part of the spectrum is better than that. The highest expected column SNR value is found at the frequency f = 250 as (in Matlab notation)

SNR_column = trace(C_2(250,:) C_2(250,:)^T) / r_ee

and is also included in Table 1.
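The two SNR expressions can be computed as in the following sketch. The C_2 matrix here is random and hypothetical, so the resulting numbers do not reproduce Table 1; the point is only the defining formulas and their scaling in r_ee:

```python
import numpy as np

rng = np.random.default_rng(8)
p = 500
C2 = 0.5 * rng.standard_normal((p, 3))  # hypothetical pure spectra

def total_snr(C2, r_ee):
    """Expected total SNR, cf. eq. (31): trace(C2 C2^T) / (p * r_ee)."""
    p = C2.shape[0]
    return np.trace(C2 @ C2.T) / (p * r_ee)

def column_snr(C2, r_ee, row):
    """Column SNR at one frequency channel (0-based row index)."""
    c = C2[row:row + 1, :]
    return (c @ c.T).item() / r_ee

snr_lo = total_snr(C2, 100.0)
snr_hi = total_snr(C2, 10.0)
snr_peak = column_snr(C2, 10.0, 249)  # channel at f = 250 in 1-based counting
```

Since r_ee enters only in the denominator, decreasing the noise variance by a factor of ten increases both SNR measures by exactly that factor.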

Case with a single response variable
A single response variable

y_k = z_2,k + f_k  (33)

was initially assumed. In a practical case this would mean that the primary response of interest is the concentration of one of the three chemical constituents (here Constituent 2), while the others are treated as interferants.
The total model with centered data is then given by (33) together with

x_k = C_2 z_k + e_k

where k = 1, 2, ..., N indicates sequences of y and x observations corresponding to different concentrations of the three constituents, and where C_2 ∈ R^(500×3), E z_k z_k^T = I_3, r_ff = E f_k^2 = 0.0001 and R_e = E e_k e_k^T = r_ee I_500.

Prediction ability. Based on a modeling set with N = 100 and a validation set with N_val = 1000 centered observations, M = 100 Monte Carlo runs gave the mean root mean square errors of prediction (RMSEP) shown in Fig. 2. Mean RMSEP values based on the theoretical Kalman predictor (24) are also plotted.
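A single-run sketch of this comparison (one realization rather than M = 100 runs, with smaller, illustrative dimensions and a hypothetical random C_2), computing validation RMSEP for PCR and for the BLUE predictor with R_z = I:

```python
import numpy as np

rng = np.random.default_rng(9)
p, A, N, N_val = 100, 3, 100, 1000
C2 = rng.standard_normal((p, A))  # hypothetical, not the article's line spectra
r_ee, r_ff = 1.0, 0.0001

def simulate(n):
    z = rng.standard_normal((n, A))
    y = z[:, 1:2] + np.sqrt(r_ff) * rng.standard_normal((n, 1))  # constituent 2
    X = z @ C2.T + np.sqrt(r_ee) * rng.standard_normal((n, p))
    return X, y

X, y = simulate(N)        # modeling set
Xv, yv = simulate(N_val)  # validation set

# PCR with A components via truncated SVD and the Helland form (14)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:A].T
B_pcr = P @ np.linalg.solve(P.T @ X.T @ X @ P, P.T @ X.T @ y)
rmsep_pcr = np.sqrt(np.mean((yv - Xv @ B_pcr) ** 2))

# theoretical BLUE predictor for comparison (true C2 and Rz = I assumed known)
K = C2.T @ np.linalg.inv(C2 @ C2.T + r_ee * np.eye(p))
rmsep_blue = np.sqrt(np.mean((yv - (Xv @ K.T)[:, 1:2]) ** 2))
```

Averaging such runs over many realizations and noise levels gives curves of the kind shown in Fig. 2 and Fig. 3.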
PLSR results at different noise levels and the corresponding results for PCR and the theoretical Kalman predictor (24) are shown in Table 2.
The corresponding PCR and PLSR results at different noise levels r_ee and with different numbers N of modeling observations are shown in Fig. 3. Not surprisingly, the predictors deteriorate for small values of N, especially at high noise levels. Note that the difference between PCR and PLSR is more pronounced at high noise levels, and that for large values of N the predictions seem to approach the theoretical Kalman predictions.
From Table 2 and Fig. 3 it may be concluded that both PCR and PLSR in this case handle X-noise well, as compared with the theoretical Kalman predictor, especially at noise levels up to r_ee = 10 (total SNR = 2.26), where for both PLSR and PCR the relative RMSEP increase due to noise is 12% for N = 100. For N = 400 the relative RMSEP increase due to noise is 9 to 10% at r_ee = 100 (total SNR = 0.23). See also Fig. 1 for an illustration of the noise levels.

Spectra estimation. An LS estimation according to (8), in a typical case with A = 3, N = 200 and r_ee = 10, resulted in the estimated spectral profile for Constituent 2 shown in Fig. 4a, while the profile estimates according to (29) for the unknown Constituents 1 and 3 are shown in Fig. 4b and 4c (sign-indeterminate and assuming the same scaling factor (y^T y)^{-1} as for Constituent 2). As can be seen, the known constituent profile is estimated fairly well, while the profiles of the two unknown constituents are confounded.

Case with two response variables
The output model (33) was in this case replaced by the two-response model

y_k = [z_1,k  z_2,k]^T + f_k

LS estimation according to (8), followed by PCA/SVD of the residual, gave in a typical case with N = 200 and r_ee = 10 the individually estimated profiles shown in Fig. 5a and b, while the profile for the unknown Constituent 3 is shown in Fig. 5c (assuming the same scaling factor (y^T y)^{-1} as for Constituent 1). In this case all constituent profiles are estimated fairly well, including the unknown interferant profile.

Conclusions
The noise handling capabilities of PCR and PLSR have been tested by simulations of a typical multivariate mixing problem, using spectra with p = 500 discrete frequencies and different X-noise levels. Comparisons with optimal Kalman predictors show that both PCR and PLSR perform well even at a considerable noise level (ca. 12% relative increase in RMSEP for N = 100 observations at a total signal-to-noise ratio SNR = 2.26, and ca. 10% relative increase in RMSEP for N = 400 observations at a total SNR = 0.23). Prediction errors due to X-noise as functions of total SNR and N are presented in Fig. 3. Corresponding tests on constituent profile LS + PCA/SVD estimation show similarly good noise handling capabilities.

Figure 1 .
Figure 1. Mean spectrum and standard deviations (Fig. a, dashed lines) plus a typical realization of a noise-free original spectrum (Fig. a, solid line), and corresponding centered and noise-corrupted spectra (Fig. b) of a mixture of three chemical constituents. The X-noise variances are here r_ee = 10 (Fig. b, solid line) and r_ee = 100 (Fig. b, dotted line) (see the relation to the signal-to-noise ratio below). The centered noise-free spectrum is shown by a dashed line in Fig. b.

Figure 2 .
Figure 2. Mean validation RMSEP values for different numbers of PLSR components, based on M = 100 Monte Carlo runs using N = 100 observations in the modeling set. Two different X-noise levels, r_ee = 10 and r_ee = 100, are used. The mean validation RMSEP values based on the theoretical Kalman predictor (24) for A = 3 components are included.

Figure 3 .
Figure 3. Mean validation PCR and PLSR results from M = 100 Monte Carlo runs using r_ee = 10, 32 and 100 (expected total SNR = 2.26, 0.71 and 0.23), A = 3 and different numbers N of modeling observations. The Kalman predictor results are shown by solid lines.

Figure 4.
Figure 4. Estimated spectral profiles for a single known constituent (Fig. a) and for two unknown constituents (Fig. b and c), using A = 3 components, noise variance r_ee = 10 and N = 200 observations in the modeling set. The known reference profiles are shown by dashed lines.

Figure 5.

Table 1 .
Expected total and maximum column SNR for different values of the X-noise variance r_ee.

Table 2 .
Mean validation PCR, PLSR and Kalman predictor results from M = 100 Monte Carlo runs using N = 100 observations in the modeling set, A = 3 components and different values of r_ee.