PLS score-loading plots for process understanding and monitoring

Principal component regression (PCR) based on principal component analysis (PCA) and partial least squares regression (PLSR) are well known projection methods for analysis of multivariate data. They result in scores and loadings that may be visualized in a scoreloading plot (biplot) and used for process monitoring. The difficulty with this is that often more than two principal or PLS components have to be used, resulting in a need to monitor more than one such plot. However, it has recently been shown that for a scalar response variable all PLSR/PCR models can be compressed into equivalent PLSR models with two components only. After a summary of the underlying theory, the present paper shows how such two-component PLS (2PLS) models can be utilized in informative score-loading biplots for process understanding and monitoring. The possible utilization of known projection model monitoring statistics and variable contribution plots is also discussed, and a new method for visualization of contributions directly in the biplot is presented. An industrial data example is included.


Introduction
Partial least squares regression (PLSR) and principal component regression (PCR) are well known methods for prediction of e.g. a scalar response variable y from multivariate regressor variables z T according to ŷnew = z T new b [1].In these methods the estimator b is found from modeling data collected in a regressor matrix , where the basic idea in PLSR is to maximize the covariance between X and y, while PCR is based on principal component analysis (PCA) of X [1].The reason for use of PLSR, PCR or some other regularization method is that an ordinary least squares (LS) problem becomes ill-posed due to a large p (many variables) relative to N (few samples in the modeling set) or/and strongly collinear variables in X.An important result of PLSR/PCR is also a compression of X into a few components, as in the PLS factorization where A << p, and the interpretation of these components is a central part of PLSR/PCR modeling.The underlying assumption is here that the X and y data are generated by a few latent variables (plus noise), which has been shown to be true in a vast number of practical applications.In a process monitoring application the response y is typically a product quality that cannot be measured on-line, while z T is made up by easily accessible process measurements [2], and possibly also by manipulated inputs.An important part of the PLSR/PCR modeling is visualization by use of score and loading plots, and the interpretation of these plots for e.g.product quality estimation, and fault detection and diagnosis [3,4].Most multivariate monitoring techniques also use squared prediction error (SPE) plots (also known as Q plots) and Hotelling's T 2 plots that summarize the information, give a statistical basis for the interpretation and preserve the time sequence of the data [3,4].As part of trend and fault diagnoses, specific process samples may also be analyzed by use of variable contribution plots [3,4].The overall monitoring problem is, however, increasingly difficult with more that two PLSR/PCR components, as is often necessary also in the scalar response case.For example will the use of four components make it necessary to monitor either a T 2 plot (and lose some details) or three score plots (e.g.t2 , t3 and t4 vs. t1 ).And as shown in an industrial data example in Section 6, some important process understanding that can be gained from a two-component model, is very much lost for models with more than two components.
In a recent work [5] it has been shown that for a scalar response variable, all PLSR/PCR models can be reduced to an equivalent model with two components only (2PLS), while at the same time the estimator b is unchanged.The 2PLS factorization is i.e. all y-relevant variation in X is captured in one score plot ( t2 vs. t1 ) and one loading plot ( e w 2 vs. ŵ1 ).An attractive feature with this factorization is also that t2 is orthogonal to both y and ŷ = X b.Note that the first component in ( 2) is the same as in (1).Also note that the residual Ẽ in (2) generally is different from Ê in (1), because more y-orthogonal variation in X may be captured in Ẽ.
The main focus of the present paper is on the utilization of 2PLS realizations in process monitoring applications based on correspondence between scores and loadings, and use of combined score-loading plots (biplots) [6,7].Such plots will make it possible for a process operator monitoring a single plot to detect a deviation from normal plant operation, to evaluate the importance of this with regard to the response y, and to get some indications on which process variables that may cause the deviation.The standard techniques for use of SPE, T 2 and contribution plots are applicable also for a 2PLS model, although they may to some extent be less necessary.With two components only, a statistical T 2 upper limit may be shown as a confidence ellipse in the score plot, while the contributions from the different variables may be shown directly in the score-loading plot (see Section 5 for theory and Section 6 for an industrial data example).Some background in PCR and PLSR modeling is given in Section 2, followed by a short presentation of the 2PLS algorithm in Section 3. The score-loading correspondence is discussed in Section 4, followed by a summary of multivariate monitoring statistics and variable contribution analyses in Section 5, the industrial data example in Section 6, and conclusions in Section 7.

PCR and PLSR modeling
Multivariate calibration using PCR/PLSR directly or implicitly assumes a latent variables (LV) model which assuming L T L = I and using the LS solutions QT = ¡ T T T ¢ −1 T T y and T = XL results in ŷ = T ³ TT T´− 1 TT y and thus the estimator The PCR and PLSR algorithms use different factorizations of X as summarized below, and thus also different factorizations of y.PCR is based on the PCA factorization or singular value decomposition , and where both TPCA and PPCA are orthogonal, with PPCA also orthonormal.The estimator is determined by (5), with L replaced by PPCA .The number of components to include is based on either cross-validation or independent test set validation [1].In the established terminology the columns of TPCA are called score vectors, where the elements are scores, while the columns of PPCA are called loading vectors, where the elements are loadings.The basic idea in PLSR is that the covariance between X and y should be maximized.The original so-called orthogonalized PLSR algorithm of Wold is based on a factorization with orthogonal score vectors and non-orthogonal loading vectors [1,5].In the present context it is, however, more relevant to refer to the Martens factorization [1] where the loading weight matrix Ŵ = £ ŵ1 ŵ2 • • • ŵA ¤ is orthonormal, while the score matrix T = X Ŵ is non-orthogonal.The common estimator in the Wold and Martens PLSR algorithms is [8] b 3 Compression into two PLS components

Basic insight
The basic insight behind the 2PLS algorithm is illustrated in Fig. 1.The estimator b is found in the space spanned by the loading weight vectors in it is a linear combination of these vectors.It is, however, also found in the plane defined by ŵ1 and a vector e w 2 orthogonal to ŵ1 , which is a linear combination of the vectors ŵ2 , ŵ3 , . . ., ŵA .The matrix is thus the loading weight matrix in a two-component PLS solution (2PLS) giving exactly the same estimator b as the original solution using any number A components [5].What matters in the original PLS model is not the matrix Ŵ as such, but the space spanned by ŵ1 , ŵ2 , • • • ŵA [8], and in the 2PLS model it is the plane spanned by ŵ1 and e w 2 that is essential.
The reason for keeping ŵ1 is discussed in [5].Note that all samples in X (row vectors) in the original PLS model are projected onto the space spanned by ŵ2 , ŵ3 , . . ., ŵA .They may thus be further projected onto the plane spanned by ŵ1 and e w 2 , and form a single score plot containing all y-relevant information.If for some reason e.g.ŵ2 is more informative than ŵ1 , a plane through ŵ2 and b may be a better alternative.It will in any case result in a 2PLS model that gives the estimator b, as will in fact all planes through b that are at the same time subspaces of the column space of Ŵ.

Algorithm
As illustrated in Fig. 1, the central problem is to find a second vector that together with ŵ1 spans the shadowed plane that includes b.One way of doing this follows from the estimator formulation (8), in that We summarize the 2PLS compression and its properties in Theorem 1 below (see [5] for proof and Matlab code).The second vector spanning the shadowed plane in Fig. 1 is not necessarily e w 2 , but it is convenient to make use of the orthogonal vectors ŵ1 and e w 2 in the theorem and its proof.
Theorem 1 The original PLSR estimator (8) can be written as where is the new orthonormal loading (weight) matrix.The corresponding factorization of X is and e w 2 is Here while t1 = X ŵ1 is the same as in the factorization (7), and t2 = X e w 2 .Furthermore, t2 is orthogonal to both y and ŷ = X b, i.e. y T t2 = 0 and ŷT t2 = 0.
Note that this theorem is valid for any number A of original PLS components.Also note that the residual Ẽ may be different from the original residual Ê, i.e. some extra y-orthogonal structured variation in X may be captured in Ẽ (see theoretical example in Subsection 3.3 below).
Remark 1 Since Ŵ in the estimator (8) may be replaced by ŴM, where M is any invertible transformation matrix, it follows from Theorem 1 that any plane containing b that is also a subspace of the column space of Ŵ may be used instead of the shadowed plane in Fig. 1.The theorem may thus be given a more general formulation.

Theoretical example
Assume an ordinary non-orthogonalized PLS factorization according to (7) with three components, i.e.
and the corresponding score matrix From this follows that it is possible to have scores tnew,2 6 = 0 and tnew,3 6 = 0, and at the same time have ŷ = 0.
After the compression into two components according to Theorem 1, the factorization is and the corresponding score matrix , while the predicted response with z T new ŵ1 = 0 as above is If now ŷ = 0 we must also have tnew,2 = 0, i.e. we will not see any deviation from the origin in the t2 vs. t1 score plot.This also illustrates how some y-orthogonal variation in X may be captured in Ẽ. Note, however, that it is generally possible to have deviations from the origin in the t2 vs. t1 score plot also when ŷ = 0.The general expression for ŷ is which may give ŷ = 0 also when tnew,1 6 = 0 and tnew,2 6 = 0.In such cases, however, it will be possible to see that ŷ = 0 directly in the score plot, as demonstrated in the industrial data example in Section 6.

Score-loading correspondence
Score-loading correspondence for process monitoring is discussed in [7], and a short summary only is given here.A convenient starting point is then the PCA/SVD (6), where the number of components to include is both data and application dependent.The essential fact in the present context is that both TPCA and PPCA are orthogonal, with PPCA also orthonormal, resulting in the LS estimate Assuming the same covariance matrix as for the row vectors in the modeling data matrix X, a new row vector z T new thus results in This means that the scores ¤ T of the variable j in X that causes the score to deviate from zero, with x new,j as a scaling coefficient.When several variables in combination causes τ PCA new , the scores correspond to weighted sums of loadings.
For the 2PLS factorization (11) it follows that T = X f W, where f W just as P in PCA is orthonormal, and the score-loading correspondence requirements are thus fulfilled.When one variable only in z T new is different from zero, the new score in the single 2PLS score plot will thus be found in exactly the same direction as the corresponding loading in the single 2PLS loading plot (see industrial data example in Section 6).Note that b = f W £ â1 e a 2 ¤ T just as can be plotted in the loading plot.From this follows that b and ŷ (based on the modeling data) in a score-loading biplot will have identical positions.Finally note that the PLS factorization according to Wold, with orthogonal score vectors, does not fulfill the score-loading correspondence requirements (although the deviation may in some cases be small) [7].
5 Monitoring statistics and contribution plots for projection methods

Squared prediction error
Process and product monitoring based on projection methods like PLS and 2PLS, rely on an incontrol model developed by use of a reference or modeling data set, which defines an A-dimensional projection space.It is therefore first of all necessary to check if the perpendicular distance of a new sample z T new from this projection space is within acceptable limits.As a measure of this distance it is common to use the squared prediction error The SPE will detect the occurrence of a totally new type of process development.This is also applicable when using the 2PLS data compression, the only special thing being that the projection space is the plane illustrated in Fig. 1.If statistical limits for such a detection of special events are of interest, xnew,j in (21) may be computed by use of a PCA model.Upper control limits can then be computed using approximate results for the distribution of quadratic forms, often referred to as Q-statistic [2,3,4].

Hotelling's T 2 statistic
T 2 plots for many PLS components Assuming that the SPE for a new sample z T new is acceptable, a meaningful comparison with the reference data set is possible.Further assuming normal distributions, this may be done using the Hotelling's T 2 statistic [9] based on the estimated score covariance matrix, which with centered data and use of the PLS factorization ( 7) is With centered data the T 2 statistic for a new observation z T new is where The upper control limit for T 2 based on N past multivariate data and A PLS components is where F α (A, N − A) is the upper 100α % critical point of the F distribution with (A, N − A) degrees of freedom [3,4].For a sequence of new samples, the T 2 value may thus be plotted and compared with T 2 UCL , and a fault alarm signal given according to some more or less conservative rule.
Confidence ellipse for two PLS components Since the 2PLS algorithm results in two components only, the T 2 UCL limit may be replaced by a confidence ellipse in a score plot, based on the eigenvalues (direction of axes) and eigenvectors (length of axes) of S and using the same F distribution limit as in (24) [9].Such a confidence ellipse may also be included in the score-loading biplot discussed in Section 4, as shown in the industrial data example in Section 6.In addition to the information of a violated upper limit, such a biplot also gives information on which variables or group of variables that are involved, as discussed below.

Contribution plots
SPE contribution plots When the SPE or T 2 plots indicate that the process is operating outside the normal region, it is of interest to see which variables that contribute the most to this.For a specific sample of interest this can be done by plotting variable contribution plots.The SPE contribution plot is simply a plot (often a bar plot) showing how the different variables contribute to the sum (21) for a specific observation.

Contribution plots on the scores
The contribution plots on a score tnew,a could in the same way show the variable contributions to tnew,a = p X j=1 x new,j ŵj,a . (25) Since the different scores have different influence on T 2 according to (23), and in order to minimize ambiguity, it is common practice [3,4] to use the following procedure: where s 2 a is the score vector variance based on the modeling data.
2. Set cont new j,a = 0 if it is negative, i.e. if the sign of the contribution x new,j ŵj,a is opposite to the sign of the score tnew,a .
Alternative 2PLS contribution plot on the scores With one score plot only, an alternative approach is to plot the weighted loadings in directly in the score-loading biplot.The original loadings £ ŵj,1 e w j,2 ¤ will after multiplication by x new,j be moved radially, indicating the strength by which they attract (positive x new,j ) or repel (negative x new,j ) the scores.These weighted loadings will thus indicate how the contributions from different variables vary with time, as illustrated in the industrial data example in Section 6.

Industrial data example
The following example uses multivariate regression data from a mineral processing plant [10] (the 'cleaner' data, originally published in [11]).The problem considered here is to predict a given quality y new,4 from twelve known process variables . For the purpose of finding an initial PLS factorization, samples 1 to 120 in the data sets xce and yce [10] were used for modeling, while samples 151 to 240 in the same data sets were used for validation.The data were centered and scaled to unit variance (autoscaled), and the result was a PLSR model with A = 6 components, resulting in a validation root mean square error RM SEP = 0.15 (as compared with RM SEP = 1 for A = 0).The PLS factorization of the autoscaled data matrix X modeling was finally compressed into a 2PLS factorization (11), resulting in the loading matrix f W.

Score-loading plot for process understanding
To illustrate the use of 2PLS for process understanding, new X data were introduced as X test = I 12 , i.e.
The 2PLS loadings in f W, £ ŵj,1 e w j,2 ¤ j=1,2,...,12 , and scores £ ti,1 ti,2 ¤ test i=1,2,...,12 = (z test i ) T f W were then plotted together with the estimator b according to (10) (see Fig. 2).Fig. 2 also shows lines for constant ŷ based on (using (10) with and an axis for ŷ and b perpendicular to those lines.Note that ŷ (based on the modeling data) in the loading plot and b in the score plot have the same position in the biplot.For an operator support application, the entire plot can easily be rotated such that the ŷ axis becomes horizontal pointing to the right.
- The plot in Fig. 2 can be used to gain process understanding.As can be seen the estimator b and thus the predicted response ŷ is strongly correlated with variable 3 and to some extent also with variables 1 and 2, while the other variables have little to do with ŷ (as indicated by the o-marked loadings close to the line for ŷ = 0).This can also be seen by inspection of but Fig. 2 also gives information on y-orthogonal properties (variable 4 gives larger y-orthogonal score movements than variable 5 etc.).Given sufficient process knowledge also the y-orthogonal scores may be informative.For a comparison score plots for the ordinary PLS factorization are shown in Fig. 3, where projections of b are also included.Note that e.g.score no. 5 shows considerable deviations from the origin, although the deviation for this score in Fig. 2 is very small.This illustrates that some y-orthogonal variations in X that in the ordinary PLS factorization are part of X, in the 2PLS factorization are captured in Ẽ.Also note that it cannot be seen in Fig. 3 that the scores no. 4, 5 and 6 are nearly y-orthogonal, i.e. that the estimator b is perpendicular to a plane through these scores.Finally note that the ti,1 scores are the same in both Fig. 2 and Fig. 3, as follows from the 2PLS factorization (11) as compared with the PLS factorization (7). .Score plots for test matrix X test and projected estimator b using ordinary PLS factorization.Note that although the scores 4, 5 and 6 are nearly y-orthogonal, i.e. located in a plane perpendicular to b, this cannot be seen from these projections.

Plots for process monitoring
As part of a process monitoring system, it is necessary to evaluate the SPE value according to (21), based on X = Tf W T according to (11).As a basis for comparison, we may first analyze the modeling data by plotting SPE (Fig. 4, top) and both y and ŷ (Fig. 4, middle).The predictions are quite good also for samples with an SPE peak, i.e. the peaks do not indicate very special events.We then plot SPE for the new validation data (Fig. 4, bottom), and conclude that these values are in no way extreme, i.e. not located further away from the 2PLS projection plane than for the modeling data.We can thus draw meaningful conclusions regarding the new samples from the score-loading plot.After appropriate scaling and definition of the normal operating region based on historical data, the plot in Fig. 2 may also be used for process monitoring purposes.The last few of the new scores at each time instant may then be shown, indicating how the process conditions are developing, while older scores must fade away in order to give room for new information.A confidence ellipse based on the modeling data and corresponding to the T 2 UCL limit (24) may also be included, as shown in Fig. 5.
In order to give as clear a picture as possible the o-marked loadings based on the modeling data are here plotted using a common scaling constant 7, as compared to Fig. 2. The loadings will in any case indicate the direction and relative strength of the variable attractions on the scores.A large deviation from the origin thus not only signals a special plant operation situation, but the direction also indicates which regressor variable or variables that cause the deviation.It is for example rather obvious that observation 209 outside of the normal operating region is very much influenced by an especially high positive value of variable 3. The deviations of samples 191, 201 and 210 may more clearly be caused by several variables.All of this will be clarified by use of a weighted loading plot, as shown in Fig. 6 below.Note that sample 191 has the maximal SPE value in Fig. 4, but that the 2PLS projection is well within the T 2 UCL ellipse.In order to clarify the information contained in the score-loading plot in Fig. 5, the alternative 2PLS contribution plot according to (28) utilizing weighted loadings may be used, as shown in Fig. 6 for four samples.This shows that • the deviation of sample 201 perpendicular to the ŷ axis (i.e.ŷ ≈ 0) is mainly caused by a high positive value of variable 6, and not a high positive value of variable 7 or a high negative value of variable 4, as may be concluded from Fig. 5 • for sample 202 all variables have small values, keeping the score close to the origin • the deviation of sample 209 in the direction of the ŷ axis (i.e. a high positive value of ŷ) is mainly caused by a high positive value of variable 3, although variable 4 also has a high negative value, and some other variables are also involved • the deviation of sample 210 is clearly caused by a high positive value of variable 3 and a high negative value of variable 4. Figure 6 Score-loading plots with scores (×-marked) and weighted loadings (solid vector lines) according to (28).Note that the score at each sample is the exact sum of the weighted loadings (the most influential ones are numbered).
From a single plot with process scores and weighted loadings it will thus be possible • to see a deviation from normal process operation, and the eventual violation of the upper control limit T 2

UCL
• to see whether and to which degree the deviation gives a change in predicted response ŷ • to get some information on which variables are causing the deviation, as well as sign and magnitude.
The corresponding traditional contribution plots according to (27) are shown in Fig. 7.These plots give no new information, at least not when the number of variables is as low as in this example, and it is in fact obscured that variable 4 influences the samples 209 and 210 by having a high negative value.This is caused by both the scaling factor tn e w ,a s 2 a in (26), and that negative contributions are set equal to zero.

Conclusions
Background theory on PLS modeling, compression into two-component PLS (2PLS) realizations, score-loading correspondence, monitoring statistics and contribution plots is presented.The 2PLS data compression makes it possible to construct a single dynamic and informative score-loading biplot, utilizing score-loading correspondence, as illustrated in an example using mineral processing plant data (Fig. 6).
One potential use of this is to gain understanding of how and to which extent different process variables affect a specific response variable.When used in process monitoring the biplot will make it possible to see a deviation from normal process operation and when the T 2 UCL upper control limit is violated, to see whether and to which degree such a deviation gives a change in the predicted response, and to judge which process variables are contributing to the deviation.In the same biplot it is also possible to see the sign and magnitude of the variables that contribute to a given deviation.Another result of the 2PLS data compression is that more of the response-orthogonal variations in the process data are captured in the residuals, and thus a more response-relevant model is obtained.
Further research will investigate fault diagnosis based on fault signatures in the score-loading biplot.It might be that some specific process faults will give scores that result in a specific trace or pattern, such that the underlying problem can be revealed at a glance.

Figure 1 .
Figure 1.Illustration of basic insight behind the 2PLS factorization, assuming A = 3 original components.The PLSR estimator b is found in the space spanned by ŵ1 , ŵ2 and ŵ3 , but also in the shadowed plane spanned by ŵ1 and e w 2 .

Figure 2 .
Figure 2. Score-loading correspondence plot with loadings from loading matrix f W (o-marked) and scores from test matrix X test according to (29) (×-marked), together with the estimator b (marked with arrow).Parallel dashed lines indicate constant predictions ŷ.Note the total score-loading correspondence, and the nearly y-orthogonal deviations for scores no. 4, 5 and 6 (see comparison with ordinary PLS score plots below).

Figure 3
Figure 3. Score plots for test matrix X test and projected estimator b using ordinary PLS factorization.Note that although the scores 4, 5 and 6 are nearly y-orthogonal, i.e. located in a plane perpendicular to b, this cannot be seen from these projections.

Figure 4
Figure 4 Squared prediction error (SPE) for modeling data (top), the corresponding values for ŷ and y (middle), and SPE for the new validation data (bottom).

Figure 5
Figure 5  Score-loading plot with new scores (×-marked) and fixed loadings from the modeling data with scaling constant 7 (o-marked).The ellipse shows T 2 UCL according to (24) for α = 0.01.
1 e t new,2 ¤ T can be plotted in the score plot, while the loadings