Closed and Open Loop Subspace System Identiﬁcation of the Kalman Filter

Some methods for consistent closed loop subspace system identiﬁcation presented in the literature are analyzed and compared to a recently published subspace algorithm for both open as well as for closed loop data, the DSR e algorithm. Some new variants of this algorithm are presented and discussed. Simulation experiments are included in order to illustrate if the algorithms are variance eﬃcient or not.


Introduction
A landmark for the development of so called subspace system identification algorithms can be said to be the algorithm for obtaining a minimal state space model realization from a series of known Markov parameters or impulse responses, i.e., as presented by Ho and Kalman (1966).Subspace system identification have been an important research field during the last 2.5 decades.Larimore (1983Larimore ( , 1990) ) presented the Canonical Variate Analysis (CVA) algorithm.The N4SID algorithm presented by Overschee and de Moor (1994) and furthermore, subspace identification of deterministic systems, stochastic systems as well as combined deterministic linear dynamic systems are developed and discussed in detail in Overschee and de Moor (1996) and where also the Robust N4SID was presented.In the standard N4SID an oblique projection is used and this method was biased for colored inputs and an extra orthogonal projection, in order to remove the effect of future inputs on future outputs, was included in the Robust N4SID.See also Katayama (2005) and the references therein for further contributions to the field.These subspace system identification algorithms may in some circumstances lead to biased estimates in case of closed loop input and output data.
A subspace system identification method, based on observed input and output data, which estimates the system order as well as the entire matrices in the Kalman filter including the Kalman filter gain matrix, K, and the square root of the innovations covariance matrix, F , was presented in Di Ruscio (1994,1996).This algorithm is implemented in the DSR function in the D-SR Toolbox for MATLAB.The DSR estimate of the innovation square root covariance matrix F is consistent both for open loop as well as for closed loop data.The DSR method was compared with other algorithms and found to give the best model in comparison to the other methods, based on validation, and on a real world waste and water example in Sotomayor et al. (2003a,b).
The DSR e method presented in Di Ruscio (2008) and used in the thesis by Nilsen (2005) is a simple subspace system identification method for closed and open loop systems.DSR e is a two stage method where the innovations process is identified consistently in the first step.The second step is simply a deterministic system identification problem.
In Qin and Ljung (2003), Weilu et al. (2004) a subspace system identification algorithm denoted PAR-SIM E for closed loop data is presented.Repetitive and successive computations of the innovations as well as the Markov parameters are incorporated in the algorithm in order to overcome the correlation problem in traditional subspace algorithms in case of feedback in the data.An extended version is presented in Qin et al. (2005).
The main contributions of this paper can be itemized as follows: • An extended presentation, further discussions and details of the DSR e algorithm in Di Ruscio (2008) are given in this paper.In particular the second stage of the algorithm, the deterministic identification problem, is discussed in detail and some solution variants are discussed and presented in Sections 6.1, 6.2 and 6.3 of this paper.
• The PARSIM algorithm by Qin et al. (2005) is considered, and in particular the PARSIM-E algorithm in Qin and Ljung (2003) is discussed and presented in detail in Section 5.1.In the PARSIM-E algorithm some Markov parameters are reestimated.A modified version of the PARSIM-E algorithm which does not re-estimate the Markov parameters is presented in Section 5.2.The modified version follows the lines in the PARSIM-S algorithm (Algorithm 2) in Qin et al. (2005).The PARSIM-S method is biased for direct closed loop data, as pointed out in Qin et al. (2005), while the modified algorithm in Section 5.2 is consistent for closed loop subspace identification.
• Due to the lack of software implementation of the PARSIM E algorithm from the authors this algorithm is discussed and implemented along the lines in Section 5.1 and the "past-future" data matrix Eq. presented in Di Ruscio (1997) and implemented as a MATLAB m-file function.The Matlab m-file code of the PARSIM-E algorithm is enclosed in the Appendix.
This implementation of the PARSIM E algorithm is compared with the DSR e function in the D-SR Toolbox for MATLAB, and a new variant of the dsr e method is discussed.This new variant consists of an Ordinary Least Squares (OLS) step for solving a deterministic identification problem for computing the entire Kalman filter model matrices.
• The DSR e algorithm is a subspace system identification algorithm which may be used to estimate the entire Kalman filter model matrices, including the covariance matrix of the innovations process and the system order, directly from known input and output data from linear MIMO systems.Simulation experiments on a linear MIMO closed loop system is given in this paper, and the algorithm is shown to give promising estimation results which are comparable to the corresponding estimates from the Prediction Error Method (PEM).
• Monte Carlo simulation experiments, on two examples (a SISO and a MIMO system), are presented which show that the DSR e algorithm outperforms the PARSIM-E method for closed loop subspace system identification.Interestingly, the DSR e algorithm is on these two examples shown to give as efficient parameter estimates as the Prediction Error Method (PEM).
The contributions in this paper are believed to be of interest for the search for an optimal subspace identification algorithm for open and closed loop systems.The PARSIM-E and DSR methods may be viewed as subspace methods which are based on matrix Eqs.where the states are eliminated from the problem.
Noticing that if the states of a linear dynamic system are known then the problem of finding the model matrices is a linear regression problem, see e.g., Overschee and de Moor (1996).Some methods for closed loop subspace identification with state estimation along the lines of the higher order ARX model approach presented in Ljung and McKelvey (1995) are presented in Jansson (2003Jansson ( , 2005) ) and Chiuso (2007).In this class of algorithms the states are estimated in a preliminary step and then the model matrices are estimated as a linear regression problem.In Ljung and McKelvey (1995) a method denoted, SSNEW, with MATLAB mfile code is presented.In Jansson (2003) a method denoted SSARX was presented, and in Jansson (2005) a three step method denoted, NEW, is presented.Chiuso (2007) presents the methods denoted, PBSID and PB-SID opt, and shows some asymptotic similarities with the SSARX and the PBSID algorithms.It is also shown that the PBSID opt algorithm is a row weighted version of the SSNEW algorithm in Ljung and McKelvey (1995).Furthermore, Chiuso (2007) stated that the PBSID opt algorithm is found to be asymptotically variance efficient on many examples but not in general.
It would be of interest to compare these methods with the DSR e algorithm, in particular for systems with a low signal to noise ratio.The SSNEW m-file in the original paper by Ljung and McKelvey (1995) would be a natural starting point in this work, but this is outside the scope of this paper and is a topic for further research, where the results are believed to be of interests.
The rest of the paper is organized as follows.In Section 2 the problem definition and some preliminary notations used in the paper are given.In Section 3 some preliminaries and matrix notations are defined and used throughout the paper.In Section 4 the subspace identification problem with closed loop data is discussed in connection with the "past-future" data matrix presented in Di Ruscio (1997).In Section 5 the PARSIM-E method, Weilu et al. (2004), Qin et al. (2005), for closed loop system identification is discussed as well as a variant of this method.In Section 6 we discuss the dsr e method with a new variant of implementation in case of a known system order.In Section 7 some examples which illustrate the behavior of the algorithms are presented.Finally some concluding remarks follow in Section 8.

Problem formulation
We will restrict ourselves to linearized or linear state space dynamic models of the form with x 0 as the initial predicted state and where a series of N input and output data vectors u k and y k ∀ k = 0, 1, . . ., N − 1 are known, and where there is possible feedback in the input data.In case of output feedback the feed through matrix is zero, i.e.E = 0. Also for open loop systems the feed through matrix may be zero.We will include a structure parameter g = 0 in case of feedback data or for open loop systems in which E = 0, and g = 1 for open loop systems when E is to be estimated.Furthermore, for the innovations model ( 1) and (2) e k is white with unit covariance matrix, i.e.E(e k e T k ) = I .Note that corresponding to the model ( 1) and ( 2) on innovations form we may, if the system is not pure deterministic, define the more common Kalman filter on innovations form by defining the innovations as ε k = F e k and then K = CF −1 is the Kalman filter gain.Hence, the Kalman filter on innovations form is defined as where the innovations process A, B, C, D, E, F ) and the Kalman filter gain K are of appropriate dimensions.The problem addressed in this paper is to determine these matrices from the known data.Both closed and open loop systems are addressed.

Notations and definitions
Hankel matrices are frequently used in realization theory and subspace system identification.The special structure of a Hankel matrix as well as some matching notations, which are frequently used throughout the paper, are defined in the following.Definition 3.1 (Hankel matrix) Given a vector sequence where nr is the number of rows in s k .Define integer numbers j, i and nc and define the matrix S j|i ∈ R inr×nc as follows , which is defined as a Hankel matrix because of the special structure.A Hankel matrix is symmetric and the elements are constant across the anti-diagonals.The integer numbers j, i and nc have the following interpretations: • j start index or initial time in the sequence used to form S j|i , i.e., s j , is the upper left vector element in the Hankel matrix.
• i is the number of nr-block rows in S j|i .
• nc is the number of columns in the Hankel matrix S j|i .
One should note that this definition of a Hankel matrix S j|i is slightly different to the notation used in Overschee and de Moor (1996) (pp. 34-35) where the subscripts denote the first and last element of the first column in the block Hankel matrix, i.e. using the notation in Overschee and de Moor (1996) a Hankel matrix U 0|i is defined by putting u 0 in the upper left corner and u i in the lower left corner and hence U 0|i would have i + 1 rows.
Examples of such vector processes, s k , to be used in the above Hankel-matrix definition, are the measured process outputs, y k ∈ R m , and possibly known inputs, u k ∈ R r .

Definition 3.2 (Basic matrix definitions)
The extended observability matrix, O i , for the pair (D, A) is defined as where the subscript i denotes the number of block rows.
The reversed extended controllability matrix, C d i , for the pair (A, B) is defined as where the subscript i denotes the number of block columns.A reversed extended controllability matrix, C s i , for the pair (A, C) is defined similar to (7), i.e., i.e., with B substituted with C in (7).The lower block triangular Toeplitz matrix, H d i ∈ R im×(i+g−1)r , for the quadruple matrices (D, A, B, E).
where the subscript i denotes the number of block rows and i + g − 1 is the number of block columns, and where 0 m×r denotes the m × r matrix with zeroes.A lower block triangular Toeplitz matrix H s i ∈ R im×im for the quadruple (D, A, C, F ) is defined as Given two matrices A ∈ R i×k and B ∈ R j×k , the orthogonal projection of the row space of A onto the row space of B is defined as The following property is used Proof of Eq. ( 10) can be found in e.g., Di Ruscio (1997).
We have where Cs J = C s J (A − KD, K) is the reversed extended controllability matrix of the pair (A − KD, K), and is the reversed extended controllability matrix of the pair (A − KD, B − KE), and where C d J and C s J are defined in Eqs. ( 7) and ( 8), respectively.
One should here note that the term O L (A − KD) J X 0|1 → 0 exponentially as J approaches infinity.Also note Eq. ( 43) in Di Ruscio ( 2003) with proof, i.e., which relates the "past" data matrices, U 0|J and Y 0|J to the "future" states Note that Eq. ( 11) is obtained by putting (12) into the following common matrix equation in the theory of subspace system identification, i.e.
Eq. ( 11) is of fundamental importance in connection with subspace system identification algorithms.One problem in case of closed loop subspace identification is that the future inputs, U J|L+g−1 in the regressor matrix are correlated with the future innovations, E J|L .
Proof 4.1 (Proof of Eq. ( 11)) From the innovations model, Eqs. ( 1) and (2), we find Eq.( 13), where definitions for the Hankel matrices Y J|L , U J|L+g−1 and E J|L as well as the matrices O L , H d L and H s L are given in Section 3. From the Kalman filter on innovations form, Eqs.
(3) and (4), we obtain the optimal Kalman filter prediction, ȳk , of the output, y k , as ȳk = Dx k + Eu k , and the Kalman filter state Eq., From this we find Eq.( 12) for the states.
The matrices H d L and H s L with Markov parameters are block lower triangular Toeplitz matrices.This lower triangular structure may be utilized in order to overcome the correlation problem, and recursively computing the innovations sequences, ε J|1 , . .., ε J|L in the innovations matrix E J|L .This was the method for closed loop subspace identification proposed by Qin et al. (2005) and Weilu et al. (2004).This method denoted PARSIM is discussed in the next Section 5 and compared to the DSR e method, Di Ruscio (2008), in an example in Section 7.1.
Another closely related subspace system identification method, i.e. the DSR e method, Di Ruscio (2008), which works for both open as well as for closed loop identification is built as a two stage method where the innovations sequences, ε J|1 , are identified from Eq. ( 11) in the first step.Noticing that putting L = 1 and g = 0 (no direct feed through term in the output equation) then we see that Eq. ( 11) also holds for closed loop systems, i.e., we have Hence, the DSR F matrix estimates in Di Ruscio (1996) is consistent also for closed loop data.The second step is an optimal OLS step and hence solved as a deterministic problem.The DSR e method is further discussed in Section 6.
5 Recursive computation of innovations

The PARSIM-E Algorithm
We will in this section work through the PARSIM-E method presented by Qin and Ljung (2003), and further discussed in Weilu et al. (2004) and in the same instant indicate a drawback with this method (i.e.Markov parameters are recomputed and large matrices pseudo inverted when J large) and show how we may overcome this.This was also noticed in Qin and Ljung (2003).In order to give a simple view and understanding of the PARSIM method we present a few of the first iterations in the algorithm.
Step 1: i = 0 Let L = 1, J → ∞ and g = 0 in (11) and we have From Eq. ( 15) we find estimate of the innovations ε J|1 = F E J|1 as well as the indicated matrix in the regression problem, i.e., D Cd J D Cs J .Hence, an ordinary Least Squares (OLS) problem is solved.This step is identical to the first step in the DSR e method.
Step 2: i = 1 We may now express from Eq. ( 11) From Eq. ( 16) we find estimate of the innovations Step 3: i = 2 We may now express from Eq. ( 11) where and From Eq. ( 17) we find estimate of the innovations ε J+2|1 = F E J+2|1 as well as an estimate of the matrix DAB DB DA 2 Cd J DA 2 Cs J DAC DC .Hence, at this stage the Markov parameters DB and DK are recomputed.Note that we have used the future outputs U J|2 in the regressor matrix to be inverted and that the regressor matrix have increased in size.Note that at iteration i = 2 the Markov parameters DB and DK are known from the previous iteration i = 1.An algorithm which does not recompute DB and DK is presented in the next section.
Step 4: i := 3 We may now express from Eq. ( 11) where and From Eq. ( 20) we find estimate of the innovations ε J+3|1 = F E J+3|1 as well as the indicated matrix with model parameters.
We may now express from Eq. ( 11) where and From Eq. ( 23) we find estimate of the innovations ε J+i|1 = F E J+i|1 .The process given by Eq. ( 23) is performed for i = 0, 1, 2, . . ., L in order to find the extended observability matrix O L+1 and to use the shift invariance technique in order to find A.
This means that at each new iteration (repetition) i one new last row ε J+i−1|1 is included in the regression matrix on the right hand side of Eq. ( 23) and one new innovations sequence ε J+i|1 = F E J+i|1 is identified.This procedure is performed for i = 0, 1, . . .L in order to identify O L+1 and using the shift invariance method for calculating O L and O L A. Here L is a user specified horizon into the future for predicting the number of states.Note also that the regression matrix to be inverted at each step increases in size for each new iteration.
We can see that at each iteration, i.e. in particular for i ≥ 2 unnecessary Markov parameters are estimated.However, it is possible to exploit this by subtracting known terms from the left hand side of Eq. ( 23).Note that at iteration i = 1 the Markov parameters DB and DK are computed and these may be used in the next step i = 2.This modified strategy is discussed in the next Section 5.2.
Software implementation of the PARSIM-E algorithm has not been available so an own Matlab m-file function has been implemented.This function is available upon request.

A modified recursive algorithm
We will in this section present a modified version of the PARSIM-E method presented by Qin and Ljung (2003), and utilize the block lower triangular structure of the Toeplitz matrices H d L and H s L .In the PARSIM-E algorithm some of the Markov parameters are recomputed in the iterations for i = 2, 3, . . ., L even if they are known from the previous iterations.The algorithm will in this section be modified.
A modified version of PARSIM was also commented upon in Qin et al. (2005), i.e., in Algorithm 2 of the paper a sequential PARSIM (PARSIM-S) algorithm is presented.This algorithm utilizes the fact that some Markov parameters in the term H d i|1 U J|i are known at each iteration i ≥ 2. However, the term H s i|1 ε J|i is omitted and the innovations, ε J|1 , . .., ε J+L|1 are not computed in this algorithm, and the PARSIM-S method is biased for direct closed loop data, also commented upon in Remark 2 of that paper.To make the PARSIM-S applicable to closed loop data, they refer to the innovation estimation approach as proposed in Qin and Ljung (2003).
The iterations for i = 0 and i = 1 are the same as in the previous PARSIM-E method in Eqs. ( 15) and ( 16).Hence, at this stage the Markov parameters DB, DK and the innovations sequences ε J|1 = F E J|1 and ε J+1|1 = F E J+1|1 are known.We will in this section modify the PARSIM-E method to exploit this and modify the left hand side regressed variable term at each new iteration i = 2, 3, . . ., L.
Step 3: i = 2 We may now express from Eq. ( 11) From Eq. ( 26) we find estimate of the innovations ε J+2|1 = F E J+2|1 and new Markov parameters in the indicated matrix on the right hand side are computed.
Step 4: i := 3 We may now express from Eq. ( 11) From Eq. ( 27) we find estimate of the innovations ε J+3|1 = F E J+3|1 as well as the indicated matrix with model parameters.As we see the matrix of regression variables on the right hand side is unchanged during the iterations and one matrix inverse (projection) may be computed once.
We may now express from Eq. ( 11) where for i ≥ 2 and From this general case we find estimate of the innovations ε J+i|1 = F E J+i|1 .The process given are performed for i = 2, . . ., L in order to find the extended observability matrix O L+1 and to use the shift invariance technique in order to find A. Furthermore the Markov parameters in matrices H d L|1 and H s L|1 are identified.See next section for further details.

Computing the model parameters
In both the above algorithms the model matrices (A, B, K, D) may be determined as follows.The following matrices are computed, i.e., and the matrices with Markov parameters From this we find the system order, n, as the number of non-zero singular values of Z J|L+1 , i.e. from a Singular Value decomposition (SVD) of the Z J|L+1 matrix.From this SVD we also find the extended observability matrices O L+1 and O L .Furthermore A may be found from the shift invariance principle, i.

Discussion
The two methods investigated in this section do not seem to give very promising results compared to the dsr e method.It seems that the variance of the parameter estimates in general is much larger than the dsr e estimates.
The reason for the high variance in the PARSIM-E method is probably that the algorithm is sensitive to data with low signal to noise ratios, and the reason for this is that "large" matrices are pseudo inverted when J is "large".Recall that in Eq. ( 11) the therm O L (A − KD) J X 0|J → 0 only when J → ∞, because the states X 0|J are correlated with the data matrix with regression variables.In practice the past horizon parameter J is finite and the term is negligible.However, a matrix of larger size than it could have been is inverted and this gives rise to an increased variance, compared to the DSR e method in the next Section 6.It is also important to recognize that only the first innovations sequence, ε J|1 , really is needed, as pointed out in Di Ruscio (2008) and that it seems unnecessary to compute the innovation sequences, ε J+i|1 , ∀ i = 1, . . .L, and then forming the innovations matrix E J|L+1 , i.e., Only the first row, ε J|1 , is actually needed in order to find the model matrices as we will show in the next Section 6.Since unnecessary future innovations rows in E J|L+1 matrix are computed, this probably also gives rise to large variances in estimates from the PARSIM algorithm.
In the PARSIM strategy, Section 5, a matrix with increasing size, ir+Jr+Jm+mi = (J +i)r+(J +i)m = (J + i)(m + r), is pseudo inverted at each new iteration i = 0, 1, 2, . . ., L. The matrix to be inverted increases in size during the iterative process when 0 ≤ i ≤ L. Both the pseudo inverse problem and the iterations give rise to this large variance.Updating techniques for this matrix inverse may be exploited, however.
The modified PARSIM method in Section 5.2 was also implemented with a QR factorization and the recursive algorithm implemented in terms of the R factors only.This strategy seems not to improve the variance considerably and the variances are approximately the same.The reason is probably that J should be "large" for the method to work.It is interesting to note that in this modified algorithm only a matrix of row size r + rJ + mJ + m = (J + 1)(r + m) is pseudo inverted and that this regressor matrix is held unchanged during the iterations and is inverted once.
We believe that the dsr e method, with a consistent estimate of the innovations process in a first step, followed with an optimal estimation of a noise free deterministic subspace identification problem, or if the order is known solved by an optimal OLS (ARX) step leads us to the lower bound on the variance of the parameter estimates.Note that the first step in the dsr e method is a consistent filtering of the output signal, y J|1 , into a signal part, y d J|1 , and a noise innovations part, ε J|1 .We will in the next Section 6 discuss the dsr e method. 6 The DSR e algorithm In Di Ruscio (2008) a very simple, efficient subspace system identification algorithm which works for both open as well as for closed loop data was presented.This algorithm was developed earlier and presented in an internal report in ( 2004) and used in Nilsen (2005).
In this section an improved and extended presentation of the algorithm is presented.
In the DSR e algorithm the signal content, y d J|1 , of the future data, y J|1 = y J y J+1 . . .y N −1 ∈ R m×(N −J) , is estimated by projecting the "past" onto the "future", and in case of closed loop data when the direct feed-through term is zero (E = 0), i.e. estimated by the following projection where the projection operator "/" is defined in Eq. ( 9), and where we have used that for large J or as J → ∞ we have that which may be proved from Eq. ( 12) by using Eq. ( 10).The past data matrices, U 0|J and Y 0|J , are uncorrelated with the future innovations sequence, E J|1 .In the same stage the innovations sequence ε J|1 = F E J|1 in Eq. ( 15) is then consistently estimated as Note that both the signal part, y d J|1 , and the innovation part, ε J|1 , are used in the dsr e algorithm.
For open loop systems we may have a direct feed through term in the output equation, i.e., E = 0 and with an output Eq. y k = Dx k + Eu k + F e k we obtain and the signal parts may in this case be computed as, where we have used Eq. ( 10).Furthermore, the innovations are determined by The algorithm is a two step algorithm where in the first step the output data are split into a signal part, y d J|1 , and a noise (innovations) part, ε J|1 = F E J|1 , i.e., as This step of splitting the future outputs, y J|1 , into a signal part, DX J|1 , and an innovations part, ε J|1 , is of particular importance in the algorithm.We propose the following choices for solving this first step in the algorithm: • Using a QR (LQ) decomposition.Interestingly, the square root of the innovations covariance matrix, F , is also obtained in this first QR step, as in Di Ruscio (1996).Using the definitions or the QR decomposition gives approximately the same results in our simulation examples.One should here note that when the QR decomposition is used also the Q factors as well as the R factors are used.We have the from the QR decomposition where g = 0 for closed loop systems and systems with no direct feed through term in the output Eq., i.e., when E = 0.For open loop systems and when we want to estimate the direct feed through matrix E we put g = 1.From, the decomposition (41) we have and we notice that also the Q sub matrices are used.Notice also that we may take F = R 22 or compute a new QR decomposition of ε J|1 in order to estimate F .
• Another interesting solution to this step is to use a truncated Conjugate Gradient (CG) algorithm, Hestenes and Stiefel (1952), in order to compute the projection.The CG algorithm is shown to be equivalent with the Partial Least Squares algorithm in Di Ruscio (2000) for univariate (single output) systems.This will include a small bias but the variance may be small.This choice may be considered for noisy systems in which PEM and the DSR e method have unsatisfactory variances or for problems where one has to chose a "large" past horizon parameter J.However, one have to consider a multivariate version, e.g. as the one proposed in Di Ruscio (2000).
This step of splitting the future outputs, y J|1 , into a signal part, DX J|1 , and an innovations part, ε J|1 , is consistent and believed to be close to optimal when J is "large".
The second step in the dsr e algorithm is a deterministic subspace system identification step if the system order has to be found, or an optimal deterministic Ordinary Least Squares (OLS) step if the system order is known, for finding the Kalman filter model.Using PEM for solving the second deterministic identification problem may also be an option.
Hence, the future innovations ε J|1 = F E J|1 (noise part) as well as the given input and output data are given and we simply have to solve a deterministic identification problem.Note also that if the system order, n, is known this also is equivalent to a deterministic OLS or ARX problem for finding the model.This method is effectively implemented through the use of QR (LQ) factorization, see the D-SR Toolbox for Matlab function dsr e.m (unknown order) and dsr e ols or PEM if the order is known.
At this stage the innovations sequence, ε J|1 , and the noise free part, y d J|1 , of the output y J|1 are known from the first step in the DSR e algorithm.Hence we have to solve the following deterministic identification problem where not only the input and output data, u k , and, y d k , respectively, are known, but also the innovations, ε k , are known.Hence, the data are known, and the model matrices (A, B, K, D) may be computed in an optimal OLS problem.The second step in the DSR e method is best discussed as a deterministic identification problem where we will define new input and output data satisfying a deterministic system defined as follows where N := N − J is the number of samples in the time series to be used in the deterministic identification problem.Hence we have here defined new outputs, y k , from all corresponding samples in the noise free part y d J|1 ∈ R m×K where the number of columns is of the original input data, and from the the computed innovations in In the examples of Section 7 we illustrate the statistical properties of the method on closed loop data, and we show that DSR e is as optimal as PEM, and far more efficient than the PARSIM method.
We believe that the first step in the DSR e algorithm is close to optimal.Since we have some possibilities for implementing the second step in the algorithm, i.e., the deterministic identification problem, at least in the multiple output case.We discuss some possibilities for the second step separately in the following subsections.This second deterministic identification step is believed to be of interest in itself.

Deterministic subspace identification problem
Step 2 in the dsr e.m implementation in the D-SR Toolbox for MATLAB is basically as presented in this subsection.
Consider an integer parameter L such that the system order, n, is bounded by 1 ≤ n ≤ mL.As an example for a system with m = 2 outputs and n = 3 states it is sufficient with L = 2.
From the known deterministic input and output data u k y k ∀ k = 0, 1, . . ., N define the data matrix equation where the matrices are given by The same data as used in Eq. ( 48), i.e.
and U 0|L+g are used to form the matrix Eq.
There are some possibilities to proceed but we suggest estimating the extended observability matrix from Eq. ( 51) and the B, E matrices as an optimal OLS problem from Eq. ( 48), using the corresponding R sub matrices from the following LQ (transpose of QR) factorisation, i.e., Due to the orthogonal properties of the QR factorisation we have and the system order, n, the extended observability matrix O L+1 , and thereafter A and D, may be estimated from an SVD of R 22 , and using the shift invariance technique.An alternative to this is to form the matrix Eq.
and estimate the system order as the n largest nonzero singular values of the mL singular values of the matrix R 22 .Here R22 is obtained as R 22 with the first m block row deleted and R 22 as R 22 with the last m block row deleted.Using only the n largest singular values we have from the SVD that R 22 = U 1 S 1 V T 1 and we may chose O L = U 1 and find A from Eq. ( 54), i.e., as Note that there are common block rows in R 22 and R22 .This may be utilized and we may use which is used in the DSR algorithms.This means that we, for the sake of effectiveness, only use the truncated SVD of R 22 and the last block row in R 22 in order to compute an estimate of the A matrix.The D matrix is taken as the first m block row in O L = U 1 .This way of computing the A and D matrices is a result from Di Ruscio (1994).
Finally, the parameters in the B and E matrices are estimated as an optimal OLS step, using the structure of matrix BL and the equation where R21 is obtained as R 21 with the first m block row deleted and R 21 as R 21 with the last m block row deleted.Since ÃL now is known the problem BL R 11 = R21 − ÃL R 21 may be solved for the parameters in B (and E for open loop systems) as an optimal OLS problem in the unknown (n+m)r parameters in B and E, Di Ruscio (2003).We also mention in this connection that it is possible to include constraints in this OLS step, in order to e.g.solve structural problems, e.g. with K = 0 as in output error models.

Deterministic OLS/ARX step 2
The content in this section is meant as an option for solving the second (deterministic identification) step in the DSR e algorithm.However, the theory may also be used to identify a higher order ARX model followed by model reduction to a model with the adequate system order.
Specify a prediction horizon parameter, L, as specified above.We have the input and output ARX model where A i ∈ R m×m and B i ∈ R m×r ∀ i = 1, . . ., L are matrices with system parameters to be estimated.We only consider the case with a delay of one sample, i.e. without direct feed through term (E = 0) in Eq. ( 2).
From the observed data form the OLS/ARX regression problem where the parameter matrix, θ, ∈ R m×L(m+r) , is given by This regression problem is effectively solved through a LQ (transpose of QR problem) factorization.
Note that for single output systems then we simply may chose L = n and the state space model matrices A, B, K and D are determined simply from the correspondence with θ and the observer canonical form.
For multiple output systems we may chose L smaller than the system order according to the inequality 1 ≤ n ≤ mL.Hence, a non-minimal state space model of order mL on block observer form is constructed directly from the OLS estimate, θ, i.e., Hence, the A matrix in the non-minimal state space model has the A i ∈ R m×m matrices in the left block column.The B matrix in the non-minimal state space model of order n ≤ mL is constructed from the B i ∈ R m×r matrices.Noticing that the model has the same impulse response matrices as the underlying system, a state space realization with order n is constructed by Hankel matrix realization theory through a Singular Value Decomposition (SVD).
Using the realization algorithm in Ho and Kalman (1966), Hankel matrices H 1|L = O L C J and H 2|L = O L AC J are constructed from impulse response matrices h i = CA i−1 B ∀ i = 1, . . ., L + J, where L + J is the number of impulse response matrices.Using the SVD realization algorithm in Zeiger and McEwen (1974) gives the model matrices A, B and D and the system order, n.
The above method with outline of implementation is included in the D-SR Toolbox for MATLAB function dsr e ols.m.

Discussion
We have tried with different implementations of the deterministic identification problem in the second step of the DSR e algorithms.In Section 6.1 we presented a deterministic subspace identification problem for identifying both the system order, n, as well as the model matrices A, B, K, D, E. Interestingly, this choice was found to be the best one for all simulation experiments.To our knowledge this subspace system identification method is both consistent and close to efficient for closed as well as for open loop data.The efficiency of the algorithm is illustrated by simulation experiments in the section of examples.
In Section 6.2 we implemented the deterministic identification problem as an OLS/ARX step.However, due to the identification of the block observer form non minimal state space model of order mL and the need for model reduction, this did not match the low variance estimates from the dsr e algorithm with the deterministic subspace solution as presented in Section 6.1.
An optimal PEM step for direct identification of the parameters of an n-th order canonical form state space model of the deterministic identification second step in the DSR e algorithm is also considered.This option does not in general improve the estimates and in some examples gave larger variance than the dsr e implementation.However, this option should be investigated further.
Finally one should note that there are two horizon integer parameters involved in the DSR e algorithm.The first parameter J defines "past" input and output data which is used to remove noise from the "future" outputs.The parameter J should be chosen "large" such that the error term D(A − KF ) J X 0|J in the first projection is negligible.Note that the size of this term may be analyzed after one run of the algorithm.However, on our simulation experiments J in the range 6 ≤ J ≤ 10 gave parameter estimates with similar variances as the PEM estimates, see the examples in Section 7. The horizon J is only used in the first step so that the second step of the algorithm is independent of this parameter.
In the second step a prediction horizon parameter L for the system order is specified such that the system order is bounded by, 1 ≤ n ≤ mL.In the single output case low variance estimates are obtained with L = n and in the multiple output case L should be chosen as small as possible, as a rule of thumb.Note however that we have observed that it often is better to chose L = 2 than L = 1 even for a first order system in which n = 1.The horizon parameters may also with advantage be chosen based on model validation.
7 Numerical examples 7.1 SISO one state closed loop system We have found that the PARSIM-E method works well on the system in Weilu et al. (2004), Case 2 below.However, we will here study a slightly modified system in which the PARSIM-E method fails to give comparable results as the PEM and DSR e methods, Case 1 below.Hence, this example is a counter example which shows that a method based on recursive innovations computations may give poor estimates.
Consider the following system with proper order of appearance We consider the following two cases: Case 1 A = 0.9, B = 0.5, K = 0.6 and an initial value x 1 = 0. e k is white noise with standard deviation 0.1.For the controller we use K p = 0.6 and r k as a white noise signal with E(r 2 k ) = 1 generated in Matlab as R = randn(N, 1) and r k = R(k).Simulation horizon N = 1000.
Case 2 Example in Weilu et al. (2004).The same parameters as in Case 1 above but with B = 1, K = 0.5, e k is white noise with standard deviation 1 and r k as a white noise signal with E(r 2 k ) = 2. Simulation horizon N = 4000.
The system was simulated with discrete time instants 1 ≤ k ≤ N .This was done M = 100 times with different noise realizations on the innovations e k but with the same reference signal, r k , i.e. a Monte Carlo simulation is performed for both the above cases.For the DSR e, DSR e ols and the PARSIM-E methods we used the same algorithm horizon parameters, J = 6 and L = 3.The system identification Toolbox Ident function pem was used with the calls dat = iddata(Y, U, 1) and m = pem(dat, n) with system order n = 1.
This example shows clearly that the variance of the PARSIM-E method is larger than the corresponding estimates from the DSR e algorithm which are comparable and as efficient as the optimal PEM estimates, in   particular for the A and B estimates as illustrated in Figures 1 and 2. The variance of the K estimates from PARSIM are comparable with the DSR e and PEM estimates, see Figure 3.To our belief we have coded the PARSIM-E method in an efficient manner.An m-file implementation of the basic PARSIM-E algorithm is enclosed this paper.We also see that the DSR e estimates are as optimal as the corresponding PEM estimates.We have here used the standard dsr e method as presented in Sections 6 and 6.1, as well as a version of DSR e with an OLS/ARX second step as presented in Section 6.2, denoted DSR e ols.
For this example the PEM, DSR e and DSR e ols algorithms give approximately the same parameter estimates.Results from DSR e and PEM are illustrated in the lower part of Figures 1, 2 and 3.
In order to better compare the results we measure the size of the covariance matrix of the error between the estimates and the true parameters, i.e., as where subscript "alg" means the different algorithms, i.e.PEM, DSR e, DSR e ols and PARSIM-E.Here the true parameter vector is θ 0 = 0.9 0.5 0.6 T and the corresponding estimates θi = Â B K T i for each estimate i = 1, . . ., M in the Monte carlo simulation.From the simulation results of Case 1 we obtain the results V DSR e = 0.8712, V PEM = 0.9926, V DSR e ols = 1.0565, (69) It is interesting to note that the DSR e algorithm for this example gives a lower value than PEM for the V alg measure of the parameter error covariance matrix in Eq. ( 65).The reason for this is believed to be due to some default settings of the accuracy in PEM and this is not considered further.Finally we illustrate the estimated A parameters for the same system as in Weilu et al. (2004), i.e., with parameters in Case 2 above, see Figure 4.For this example the PARSIM-E method gives parameter estimates with approximately the same variance as DSR e and PEM.However, this is generally not the case as we have demonstrated in Figures 1, 2 and 3.  method.The correct eigenvalues are λ 1 = 0.85 and a complex conjugate pair, λ 2,3 = 0.75 ± j0.3708.Complex eigenvalues are in complex conjugate pairs.Hence, the negative imaginary part is not presented.
7.2 Closed loop MIMO 2 × 2 system with n = 3 states Consider the MIMO output feedback system The feedback is obtained with and r k with a binary sequence as a reference for each of the two outputs in y k .The process noise v k and measurements noise, w k are white with covariance matrices E(v k v T k ) = 0.05 2 I 3 and E(w k w T k ) = 0.1 2 I 2 , respectively.For comparison purpose we present the deterministic gain from u k to y k , i.e., eigenvalues are λ 1 = 0.85 and a complex conjugate pair, λ 2,3 = 0.75 ± j0.3708.Complex eigenvalues are in complex conjugate pairs.Hence, the negative imaginary part is not presented.
A Monte Carlo simulation with M = 100 different experiments, 1 ≤ i ≤ M , is performed.The simulation results for the deterministic gain matrix is illustrated in Figure 5 and for the stochastic gain matrix, h s , is illustrated in Figure 6.Furthermore we have presented eigenvalue estimates for both DSR e and the PEM method in Figures 7 and 8, respectively.For the DSR e method we used the algorithm horizon parameters, J = 6 and L = 3.The PARSIM-E estimates are out of range and are not presented.

Conclusion
We have in this paper investigated the subspace based method for closed loop identification proposed by Weilu et al. (2004).Alternative versions of this method are discussed and implemented in parallel and compared with the subspace identification method DSR e which works for both open and closed loop system.We have found by simulation experiments that the variance of the PARSIM-E method is much larger and in general not comparable with the DSR e and PEM algorithms.
The first step in the PARSIM-E and the DSR e algorithms is identical.However, the methods differ in the second step.In the PARSIM-E algorithm iterations are included in order to estimate the future innovations over the "future" horizon 0 ≤ i ≤ L. In the DSR e second step a simple deterministic subspace system identification problem is solved.
The difference in the second step/stage of the PARSIM-E and the DSR e methods is believed to be the reason for the high variance in the PARSIM-E estimates.In the PARSIM-E algorithm a matrix, r+m) , is computed in order to compute the extended observability matrix O L+1 .The matrix P will be of much larger size than necessary since J is "large" in order for the term O L+1 (A − KD) J X 0|J to be negligible.In the DSR e algorithm the horizon J is skipped from the second step of the algorithm and the extended observability matrix O L+1 is identified from a matrix of size (L + 1)m × (L + 1)(m + r) and where L may be chosen according to 1 ≤ n ≤ mL.
We have tested the DSR e algorithm against some implementation variants using Monte Carlo simulations.The DSR e algorithm implemented in the D-SR Toolbox for MATLAB is found to be close to optimal by simulation experiments.
e.A = O † L O L+1 (m+1 : (L+1)m, :) and D = O L (1 : m, :).The B and K matrices may simply be computed by forming the matrices O L B and O L K from the estimated matrices H d L+1|1 and H s L+1|1 and then premultiply O L B and O L K with the pseudo inverse O

Figure 1 :Figure 2 :
Figure 1: Pole estimates of the closed loop system in Example 7.1 with parameters as in Case 1.

Figure 3 :
Figure 3: K parameter estimates of the closed loop system in Example 7.1 with parameters as in Case 1.Here the PARSIM-E algorithm gives acceptable estimates compared to the corresponding DSR e and PEM estimates.

Figure 4 :
Figure 4: Pole estimates of the closed loop system in Example 7.1 with parameters as in Case 2, and using PARSIM e, PEM and DSR e with, B = 1.0,K = 0.5 and E(e 2 k ) = 1 and r k white noise with variance E(r 2 k ) = 2. See Figure 1 for a counter example.

Figure 5 :Figure 6 :Figure 7 :
Figure 5: Elements in the deterministic gain matrix, h d = D(I 3 − A)B, estimates of the closed loop system in Example 7.2.Both PEM and DSR e ols estimates are illustrated and the correct gains indicated with solid lines.
The regression problem increases in size and the matrix DB DA Cd DK is estimated as an OLS problem.Hence, at this stage the non-zero part of the first block rows in the Toeplitz matrices H d L and H s L , i.e. the Markov parameters DB and DK, respectively, are computed.Proceeding in this way each non-zero block row, H d i|1 and H s i|1 , in the Toeplitz matrices is computed as follows.
JDA Cs J