Parameter and State Estimation of Large-Scale Complex Systems Using Python Tools

This paper discusses topics related to automating parameter, disturbance, and state estimation analysis of large-scale complex nonlinear dynamic systems using free programming tools. For large-scale complex systems, before implementing any state estimator, the system should be analyzed for structural observability, and this analysis can be automated using Modelica and Python. As a result of structural observability analysis, the system may be decomposed into subsystems, some of which may be observable with respect to parameters, disturbances, and states, while others may not. The state estimation process is carried out for the observable subsystems, and the optimum number of additional measurements is prescribed for unobservable subsystems to make them observable. In this paper, an industrial case study is considered: the copper production process at Glencore Nikkelverk, Kristiansand, Norway. The copper production process is a large-scale complex system. It is shown how to implement various state estimators in Python to estimate parameters and disturbances, in addition to states, based on available measurements.


Introduction
Consider a class of nonlinear deterministic systems given by eq.(1), where x ∈ R^nx, u ∈ R^nu, w ∈ R^nw, p ∈ R^np, and y ∈ R^ny are respectively the state, input, process noise, parameter, and output vectors, C ∈ R^(ny×nx) is a constant matrix, and f(·) and g(·) are known vector functions:

ẋ = f(x, u, w, p),   y = Cx.   (1)

u and y are known, as well as their time derivatives (here x ≡ x(t), u ≡ u(t), etc.). x, w, and p are unknowns, and the objective is to estimate them based on u-y information. The state space is augmented with p via ṗ = 0. However, it is not obvious how to augment w, since w is completely unknown. One possibility is to assume that w is a random variable with a given probability characteristic. An example Gelb (2001) is given in eq.(2), where each component w_i of w = [w_1, w_2, ..., w_nw]^T is a stationary random process with the autocorrelation function γ_i(τ) = (σ_i² δ_i / 2) e^(−δ_i |τ|), ε_i is a given white Gaussian process, and δ_i, σ_i > 0:

ẇ_i = −δ_i w_i + ε_i.   (2)
There are other alternatives for disturbance augmentation Bona and Smay (1966), and the choice may depend on whether the augmented system is observable or not: there is no point in augmenting parameters and disturbances if the augmented system becomes unobservable. The complete augmented system is given in eq.(3), and the corresponding state space representation is in eq.(4), where g(·) is chosen appropriately by augmenting disturbances:

ẋ = f(x, u, w, p),   ẇ = g(w) + ε,   ṗ = 0.   (3)

The noise model related to eq.(1) is given by the stochastic nonlinear system described by eq.(5), where ε and v (measurement noise) are vectors of white Gaussian processes with cross-covariance matrix S ∈ R^(nw×ny) (S is a zero matrix if w and v are independent), Γ ∈ R^(nx×nw) is a constant matrix, and x_0 is independent of w and v. The associated discrete-time version of eq.(5) is presented in eq.(6).
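As a quick numerical check of the exponentially correlated disturbance model of eq.(2), the sketch below (an illustration, not taken from the paper) simulates the exactly discretized process w_{k+1} = a·w_k + e_k with a = e^(−δΔt) and Var(e_k) = (σ²δ/2)(1 − a²); its stationary variance should match γ(0) = σ²δ/2.

```python
import numpy as np

def simulate_ecn(delta, sigma, dt, n, seed=0):
    """Simulate exponentially correlated noise with
    autocorrelation gamma(tau) = sigma^2*delta/2 * exp(-delta*|tau|)."""
    rng = np.random.default_rng(seed)
    a = np.exp(-delta * dt)                    # discrete-time pole
    var_stat = sigma**2 * delta / 2.0          # stationary variance gamma(0)
    q = var_stat * (1.0 - a**2)                # driving-noise variance (exact discretization)
    w = np.empty(n)
    w[0] = rng.normal(0.0, np.sqrt(var_stat))  # start in stationarity
    for k in range(n - 1):
        w[k + 1] = a * w[k] + rng.normal(0.0, np.sqrt(q))
    return w

w = simulate_ecn(delta=0.5, sigma=1.0, dt=0.1, n=200_000)
print(np.var(w))   # close to sigma^2*delta/2 = 0.25
```

The empirical variance of a long realization agrees with the stationary variance implied by the autocorrelation function, confirming the discretization.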
β is a Brownian motion process such that ε dt = dβ. Now, the system in eq.(1) is simulated for given p, {w_k} (w_k with some added noise), {u_k}, and x_0. {x_k} and {y_k} (a fictitious noise is added to y_k) are stored for k = 1, 2, 3, ..., n. Then, based on the simulated data {u_k} and {y_k}, {x_k} is estimated, for k = 1, 2, ..., n, to verify the stability properties of the state estimators.
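The simulate-then-estimate workflow above can be sketched on a toy scalar model (the model, noise levels, and function names here are hypothetical stand-ins, not the paper's process model): simulate with known p and noise, store {y_k}, and hand only {u_k}, {y_k} to the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, dt, u, w, p):
    """Hypothetical discrete system: Euler step of xdot = -p*x + u plus process noise."""
    return x + dt * (-p * x + u) + w

n, dt, p_true = 500, 0.1, 0.8
C = np.array([[1.0]])                          # linear measurement y = C x
x = np.zeros((n, 1)); y = np.zeros((n, 1))
x[0] = 2.0
for k in range(n - 1):
    u_k = 1.0
    w_k = rng.normal(0.0, 0.05)                # process noise sample
    x[k + 1] = f(x[k], dt, u_k, w_k, p_true)
    y[k + 1] = C @ x[k + 1] + rng.normal(0.0, 0.1)  # fictitious measurement noise

# {u_k}, {y_k} go to the estimator; {x_k} is kept only to score the estimates
```

The stored true trajectory {x_k} is never given to the filter; it is used afterwards to verify stability and accuracy of the estimates.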
The structure of the paper is as follows. Section 2: a brief discussion on nonlinear observability, with particular attention to structural observability. Section 3: the process model. Section 4: observability analysis. Section 5: a brief introduction to filtering theory. Section 6: the structure of the Python code and results. Section 7: conclusions and future work.

Nonlinear observability
In order to achieve state observability of the stochastic system given in eq.(5), the corresponding noise-free system must be observable Margaria et al. (2004). Hence, consider the noise-free version of eq.(5). For nonlinear systems, local observability is the relevant notion, and it is often tedious to handle for large-scale complex systems. Let x^i_0 be an initial state; for a given bounded input trajectory u(t), assume there exists a solution trajectory with output ŷ^i(t) = h(x^i(t)) for t ∈ [0, T], T < ∞, satisfying eq.(7). If every point x^j_0 ≠ x^i_0 in a neighborhood of x^i_0 yields ŷ^j(t) ≠ ŷ^i(t), then x^i_0 and x^j_0 are said to be (locally) distinguishable. The local distinguishability property has a one-to-one correspondence with local observability: local observability ⇔ local distinguishability. Loosely speaking, state trajectories starting from distinct initial states x^i_0 and x^j_0 always generate distinct output trajectories. If not locally distinguishable, then it is not possible to uniquely (locally) deduce x^i(0) from y^i(t)-u^i(t) information. A criterion for local observability/distinguishability, or algebraic observability, is given using Lie derivatives and brackets Isidori (1995); however, algebraic observability analysis is not easy to use for large-scale complex systems, because the test involves the rank of a high-dimensional matrix containing algebraic variables. A better alternative is structural observability, preferably its graph-theoretic form, where structural dependencies among output and state variables are used to define a necessary condition for observability. Observability is a structural property, and by exploiting the system structure it is possible to extract much more information than from the rank test of the algebraic observability check. The structural dependencies of the system given in eq.(7) are mapped into a directed graph, the so-called system digraph. The system digraph G is created as follows Reinschke (1988): 1. define nodes (or vertices) x_1, x_2, ..., x_nx, y_1, y_2, ..., y_ny; 2. there is a directed edge from x_i to x_j if x_j appears in the state equation of x_i (i.e., ∂f_i/∂x_j ≠ 0); and 3. there is a directed edge from y_i to x_j if x_j appears in the output equation of y_i (i.e., ∂h_i/∂x_j ≠ 0). The system digraph may be decomposed into strongly connected components (SCCs): an SCC is a largest subgraph in which there exists a directed path from each node to every other node. A root SCC is an SCC with no incoming edges to its nodes. In order to achieve structural observability, at least one node of each root SCC should be measured. Hence, the number of sensor nodes must not be less than the number of root SCCs Liu et al. (2013). On the other hand, suppose that the sensor nodes are given. Then the system is structurally observable if the system digraph is spanned by cacti covering all nodes Lin (1974). Figure 1 shows a cactus. A cactus consists of a stem and possibly one or more buds attached to the stem. A stem is a directed path whose starting node is the root and whose end node is the top; a root node is always a measurement. A bud is an elementary closed path, which connects to the stem via a distinguished edge. A bud should not be connected to the root or to the top node, and no two distinguished edges share the same node.

Process model
See Figure 2 for the process flow sheet. The process consists of four sections: (i) the slurrification section, where powdered raw material containing mostly copper oxide (CuO) is slurrified using a recycled anolyte flow, containing sulfuric acid (H2SO4), taken from the electrowinning section; (ii) the leaching section, where sulfuric acid is added to the slurry in order to leach more copper (Cu) into the solution; (iii) the purification section, where the slurry is first filtered to extract the solution, which contains copper sulphate (CuSO4), followed by the cementation and fine filtering processes; and (iv) the electrowinning section, where the solution containing Cu2+ is electrolyzed to release solid copper at the cathode. In the initial model Lie and Hauge (2008), tank-level dynamics are neglected (i.e., static mass balances). However, level dynamics of several tanks are included in the modified model. It is assumed that the tanks in the slurrification and leaching sections, as well as the electrowinning tank, have no level variations. Level dynamics in the following tanks are included: buffer tank 1, buffer tank 2, buffer tank 3, the dilution tank, and the mixing tank. The system digraph is given in Figure 3, and it is seen that it is not possible to isolate a spanning cacti covering all nodes; hence the system is not structurally observable and thereby not observable Perera et al. (2015). Since it is already proved there that the system is not structurally observable, only a (structurally) observable subsystem, namely the electrowinning section, is considered in the following discussion.
where β is a positive constant which should be specified. The last 4 equations are due to parameter-disturbance augmentation. There are 4 measurements, among them ρ_ew,CuSO4.

Observability analysis
Including the augmented parameter η and the disturbances V̇_ed2w, V̇_ew2m, and ρ^(3)_pb,H2SO4, there are altogether 12 states (n_x = 12). This makes the system hard to analyze for algebraic observability. See Figure 4 for the system digraph. According to the digraph, it is always possible to estimate V̇_ed2w and V̇_ew2m using y_1 and y_2: the reason is that y_1 → V_ed → V̇_ed2w and y_2 → V_em → V̇_ew2m are two cacti (just two stems without buds). There exists a spanning cacti covering all nodes, hence the system is structurally observable. Consider eq.(19): if β = 0, then the self-loop at ρ^(3)_pb,H2SO4 is lost.

Filtering theory for discrete systems
The objective of filtering/estimation is to estimate system state from available noisy data Jazwinski (2007).
Consider the discrete stochastic dynamical system given in eq.(6). Assume that the system is observable. {w_k, k = 1, 2, 3, ...} is a random sequence with given probability distributions, and the probability density function of x_0 is also given. The standard assumptions are that {w_k} is a white Gaussian sequence, w_k ∼ N(0, Q_k), independent of x_0; also, {v_k, k = 1, 2, 3, ...} is a random sequence with given probability distributions, v_k ∼ N(0, R_k). A solution to eq.(6) is the probability density function of x_k. {w_k} and {v_k} may be dependent; we may write E[w_k v_l^T] = S_k δ_kl, where δ_kl = 1 when k = l and δ_kl = 0 otherwise, and S_k = 0 if {w_k} and {v_k} are independent. The two sequences Y_N = {y_1, y_2, ..., y_N} and U_N = {u_0, u_1, ..., u_{N−1}} contain the input-output data, and {x̂_1, x̂_2, ..., x̂_N} is to be estimated for given Y_N and U_N. There are several ways to handle nonlinear filtering problems: extended Kalman filtering (EKF) Strang and Borre (1997), unscented Kalman filtering (UKF) Simon (2006), particle filtering (PF) Doucet et al. (2000), etc. Figure 5 gives a comparison of different state estimation techniques with respect to accuracy and computational effort.
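When S_k ≠ 0, correlated samples of w_k and v_k can be drawn by factoring the joint covariance of the stacked vector [w_k; v_k]. The sketch below (with hypothetical covariance values for n_w = 2, n_y = 1) uses a Cholesky factor and checks the empirical cross-covariance against S.

```python
import numpy as np

# Hypothetical covariances
Q = np.array([[0.04, 0.0], [0.0, 0.09]])   # E[w_k w_k^T]
R = np.array([[0.01]])                     # E[v_k v_k^T]
S = np.array([[0.01], [0.0]])              # E[w_k v_k^T]: first noise correlated with measurement

joint = np.block([[Q, S], [S.T, R]])       # covariance of stacked [w_k; v_k]
L = np.linalg.cholesky(joint)

rng = np.random.default_rng(2)
z = rng.standard_normal((3, 100_000))      # i.i.d. N(0, I) samples
wv = L @ z                                 # samples with covariance `joint`
w, v = wv[:2], wv[2:]

emp = (w @ v.T) / wv.shape[1]              # empirical E[w v^T]
print(np.round(emp, 3))
```

The empirical cross-covariance matches S to within sampling error, so the generated sequences satisfy E[w_k v_k^T] = S_k as required.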
Often, the EKF is the starting point for a nonlinear estimation problem: linear Kalman filtering theory is adapted based on first-order linearization Song and Grizzle (1992). With F_{k−1} = ∂f/∂x evaluated at x̂_{k−1}^+, and the linear output y = Cx of eq.(1), the extended Kalman filter is as follows for k = 1, ..., n:

P_k^− = F_{k−1} P_{k−1}^+ F_{k−1}^T + Q_{k−1}
x̂_k^− = f(x̂_{k−1}^+, u_{k−1}, 0)
K_k = P_k^− C^T (C P_k^− C^T + R_k)^{−1}
x̂_k^+ = x̂_k^− + K_k (y_k − C x̂_k^−)
P_k^+ = (I − K_k C) P_k^−

where x̂_k^+ is the best estimate for x_k. The main reason for the divergence of the EKF is model fidelity. This can be demonstrated easily even with a simple scalar system Fitzgerald (1971); Simon (2006). In eq.(5), Γ ∈ R^(nx×nw) and n_w ≤ n_x. If n_w < n_x, then there is at least one differential equation in which no process noise term appears. However, by including a fictitious process noise in such equations, it may be possible to compensate for model inaccuracies to some extent. For a constant parameter, ideally p_{k+1} = p_k; however, a small fictitious noise is added to get p_{k+1} = p_k + ε_p. Now we have the stochastic system given in eq.(24), where Γ_x, Γ_w, Γ_p ≠ 0, [Γ_x, Γ_w, Γ_p]^T ∈ R^((nx+nw+np)×(nx+nw+np)), and w_k ∈ R^(nx+nw+np). Also, in order to improve the stability-convergence characteristics, fading-memory filters may be used Simon (2006). Actually, it is not necessary to include fictitious noise in all differential equations; it is enough to include it in some of them. Fictitious noises are inserted such that the stochastic system becomes state stabilizable with respect to the process noise vector Potter (1965).
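The fictitious-noise idea can be sketched on a toy scalar model (all numbers and the model itself are hypothetical, not the electrowinning model): the unknown constant parameter p is appended to the state, z = [x, p], and a small diagonal Q entry on p keeps the EKF adapting until p̂ converges.

```python
import numpy as np

dt = 0.1
rng = np.random.default_rng(3)

def f(z, u):
    """Augmented dynamics: xdot = -p*x + u, pdot = 0 (Euler-discretized)."""
    x, p = z
    return np.array([x + dt * (-p * x + u), p])

def F_jac(z, u):
    """Jacobian of f with respect to the augmented state z = [x, p]."""
    x, p = z
    return np.array([[1.0 - dt * p, -dt * x],
                     [0.0,           1.0  ]])

C = np.array([[1.0, 0.0]])                   # only x is measured
Q = np.diag([1e-4, 1e-5])                    # second entry: fictitious noise on "constant" p
R = np.array([[0.01]])

p_true, n = 0.8, 2000
x = 2.0
z_hat = np.array([2.0, 0.2])                 # poor initial guess of p
P = np.diag([0.1, 1.0])
for k in range(n):
    u = 1.0
    x = x + dt * (-p_true * x + u) + rng.normal(0, 0.01)   # simulate truth
    y = x + rng.normal(0, 0.1)                             # noisy measurement
    # EKF time update
    F = F_jac(z_hat, u)
    z_hat = f(z_hat, u)
    P = F @ P @ F.T + Q
    # EKF measurement update
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    z_hat = z_hat + K @ (y - C @ z_hat)
    P = (np.eye(2) - K @ C) @ P

print(round(z_hat[1], 2))   # estimate of p, near p_true = 0.8
```

Without the fictitious Q entry on p, the corresponding covariance would collapse to zero and the filter would stop correcting a wrong parameter estimate; the small nonzero entry keeps the gain on p alive.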
EKF is susceptible to linearization errors, and if the model is highly nonlinear, the EKF may not give acceptable results. The unscented Kalman filter Julier et al. (1995); Doucet et al. (2000) is a possible candidate to try if the EKF fails. Importantly, the UKF does not require calculating Jacobian matrices: the UKF algorithm propagates a set of deterministically chosen sigma points through the nonlinear model and reconstructs the predicted mean and covariance from them.

Structure of the Python code and results
Perera et al. (2015) discuss a procedure for automating structural observability analysis in Python using the JModelica.org-CasADi interface. Figures 3 and 4 are generated based on the above-mentioned article. There are several Python packages for state estimation: pyda, filterpy, pykalman, KF, etc. First, simulated noisy data is created; see the Python script given below. model(x,t,u,w,p) represents eqs.(8) to (16) and f(x,dt,u,w,p) is the discrete system. Simulated data is obtained with known variations, and the process noise in the simulations is Gaussian, which is often not the case in reality. Therefore, in order to test the capabilities of the various estimators, they should be applied to real process data, where process noise may not be Gaussian; parameter-disturbance-state estimation with real process data is set as future work. In particular, it is expected to implement particle filter algorithms with real process data, since particle filters can handle non-Gaussian process noise and work better than EKF/UKF algorithms when the system model is highly nonlinear.
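The sigma-point propagation at the heart of the UKF mentioned above can be sketched in a few lines (a minimal Julier-style unscented transform; the nonlinearity and numbers are hypothetical illustrations):

```python
import numpy as np

def unscented_transform(mean, cov, fx, kappa=0.0):
    """Propagate (mean, cov) through nonlinearity fx via sigma points, no Jacobians."""
    n = mean.size
    L = np.linalg.cholesky((n + kappa) * cov)
    sigmas = np.vstack([mean, mean + L.T, mean - L.T])   # 2n + 1 sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    Wm[0] = kappa / (n + kappa)
    ys = np.array([fx(s) for s in sigmas])               # push points through fx
    y_mean = Wm @ ys
    diff = ys - y_mean
    y_cov = (Wm * diff.T) @ diff                         # weighted outer products
    return y_mean, y_cov

# hypothetical nonlinearity: polar-to-Cartesian-like map
fx = lambda s: np.array([s[0] * np.cos(s[1]), s[0] * np.sin(s[1])])
m = np.array([1.0, 0.0])
P = np.diag([0.01, 0.01])
ym, yc = unscented_transform(m, P, fx, kappa=1.0)
print(np.round(ym, 3))
```

Note that the transformed mean is pulled slightly below 1 in the first coordinate, capturing the curvature of the nonlinearity, which a first-order linearization (as in the EKF) would miss entirely.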
Filtering parameters, such as Q, R, P, etc., are tuning parameters, but filter tuning is not considered in this paper. Those parameters directly link with filter response time and filter sensitivity (propagation of the error covariance matrix P); therefore, sensitivity analysis and filter tuning should be done as future work. As the full state of the electrowinning system can be reconstructed from measurement data, many possibilities open up, among others implementing an optimal control strategy. Also, estimating disturbances alongside the states will help to improve control performance through disturbance compensation. Finally, comparing the performance of PID and optimal control strategies would be interesting future work.

Figure 2: Process flow sheet for the Copper electrowinning process (electrowinning section is highlighted in red).

Figure 6: Estimation of V ed and V em .