Structural Observability Analysis of Large Scale Systems Using Modelica and Python

State observability of dynamic systems is a notion which determines how well the states can be inferred from input-output data. For small-scale systems, observability analysis can be done manually, while for large-scale systems an automated systematic approach is advantageous. Here we present an approach based on the concept of structural observability analysis, using graph theory. This approach can be automated and applied to large-scale, complex dynamic systems modeled using Modelica. Modelica models are imported into Python via the JModelica.org-CasADi interface, and the Python packages NetworkX (for graph-theoretic analysis) and PyGraphviz (for graph layout and visualization) are used to analyze the structural observability of the systems. The method is demonstrated with a Modelica model created for the Copper production plant at Glencore Nikkelverk, Kristiansand, Norway. The Copper plant model has 39 states, 11 disturbances and 5 uncertain parameters. The possibility of estimating disturbances and parameters in addition to estimating the states are also discussed from the graph-theory point of view. All the software tools used on the analysis are freely available.


Introduction
Knowing the internal state of a dynamic system is important in many applications such as state feedback.However, measuring all state variables is usually impossible or impractical.What is more realistic is to estimate the state variables based on a finite set of measurements.The notion of observability characterizes whether a given set of measurements is adequate to estimate the state of the system.For linear time invariant systems, if the rank of the observability matrix is equal to the dimension of the state space, then the system is observable Simon (2006).For nonlinear dynamic systems diverse local observability definitions can be considered, for example using Lie derivatives Liu et al. (2012).In addition to analyzing observability for a given set of measurements, it would also be useful (especially for large-scale complex systems) to systematically find the minimum set of measurements which makes the system observable.By exploiting the model structure (algebraic dependencies among state and output variables), we can infer the minimum number of measurements and the possible choices to select from.Structural (or algebraic) observability is a fundamental property that provides a necessary condition for observability, and often it may also be sufficient for many systems Reinschke (1988), Liu et al. (2012).Structural observability analysis can be done using graph-theoretic techniques.Under some assumptions, unknown disturbances/parameters can be estimated (for example using Kalman filtering techniques Simon (2006)) by augmenting the system with them as state variables, making it necessary to check the observability of the augmented system.Struc-tural observability analysis via graph theory offers a visual means to pinpoint measurements needed to estimate states/disturbances/parameters, or to detect which cannot be estimated at all in the augmented system.JModelica.org is a Modelica-based simulation tool that makes possible to make Modelica models available as symbolic model objects in Python with the help of the JModelica.org-CasADiinterface.The symbolic models can then be used in structural analysis using Networkx, Pygraphviz, and Python packages. 1 This paper demonstrates the usefulness of using Python and relevant Python packages in analyzing structural observability for large-scale systems via graph theory.As a case study, the Copper production plant at Glencore Nikkelverk AS, Kristiansand, Norway is considered.The Copper plant model contains 39 states, 11 disturbances, and 5 uncertain parameters.The possibility of estimating disturbances and parameters additionally to the states will be discussed in a graph-theocratic point of view.Section 2 gives a basic description about graph-theoretic concepts which are needed in the subsequent sections.Section 3 discusses structural observability in graph-theoretic point of view.A way of automating structural observability analysis in Python is given in Section 4. A demonstration of our development is done based on a real process and it is given in Section 5.

Graph-theoretic concepts
A graph G is denoted by G = (V, E) where V is a set of nodes (or vertices) and E is a set of edges. 2 An edge connects two nodes v i and v j (where v i , v j ∈ V ) and denoted by (v i , v j ).The node v i and v j are incident to (v i , v j ) and v i and v j are said to be adjacent nodes.A graph may be directed or undirected.In undirected graphs, edges are marked with directed lines while in undirected graphs it is not.For undirected graphs, (v i , v j ) and (v j , v i ) are identical.A directed/undirected graph may allow multiple edges among nodes.In short, digraph stands for directed graph.Let an edge e i = (v i , v j ) ∈ E, then v i is the initial-vertex and v j is the final-vertex.As a shorthand notation for a directed edge, let A path is a sequence of edges connected one after another.A path has an initial node and a final node.
1 Alternatively, it is possible to create symbolic mathematical models using the Python package SymPy which is a CAS -CAS stands for Computer Algebra System -tool and then use Networkx and PyGrapViz.However, this method is more limited since it does not support the modeling power available in Modelica. 2Refer Bondy and Murty (2008) for graph theory.
Instead of using the term "simple path", we use "ele-mentary path" for digraphs.Similarly, instead of "simple cycle", "elementary cycle" is used.A stem is an elementary path.The initial and final nodes are respectively called the "root" and the "top" of the stem.
A root is also called a driver node.A bud is an elementary cycle with an additional directed edge where its final node is one of the nodes in the cycle.This additional edge is called the distinguished edge of the bud.A directed cactus is made out of a stem and buds connected in a special way.The initial node of the distinguished edge of a bud is connected to any node in the stem except the top or it may be connected to a node of another bud.See figure 2. A cactus has a driver node which is the root of the stem in the cactus.
If there are vertex disjoint cacti covering all nodes in a given digraph, then they are called spanning cacti and cacti have more than one driver nodes.appropriately, a maximum matching for digraphs can be efficiently found.In a bipartite, there are two disjoint sets of nodes V A and V B such that edges only exist between V A and V B .See figure 5.

Structural observability
Consider the linear time invariant (LTI) state space model and y = y 1 , y 2 , . . ., y ny T .Once the output vector y ∈ R ny and the input vector u ∈ R nu are known, then the state vector x ∈ R nx can be estimated if the system is observable.Analyzing observability based on the system structure is called structural (algebraic) observability analysis.Note that structural observability gives a necessary condition for observability, which means that if a system is not structural observable then it is not observable.A detailed discussion on structural observability analysis of linear systems is given in Reinschke (1988).The system structure can be represented graphically (using diagraph) and hence, graph-theoretic techniques can be used to analyze structural observability.
The system diagraph G is created in the following way.
There are n x + n y number of nodes representing state and output variables.If < i, j >-th element of A is not zero then there exists a directed edge from x i to x j .Similarly, the edges (x i , y j ) are created by considering non-zero elements in C. The definition 1 gives the condition for output connectivity and the definition 2 defines the conditions to be satisfied for structural observability.
Definition 1 A class of systems is said to be outputconnectable6 (or reachable) if in the digraph G there is a path from at least one of the output-vertices to every state-vertex.Reinschke (1988) Definition 2 A class of systems is s-observable if and only if the digraph G meets the following condition: Violation in the definition 2 will make the system not sobservable, hence not observable.To find the minimum number of driver nodes to achieve s-observability, the minimum input theorem is used Liu et al. (2011).The first step is to create the corresponding bipartite graph of the digraph.Let G(V, E) be a digraph where V = {v 1 , v 2 , . . ., v nv } and E = {e 1 , e 2 , . . ., e ne }.Define two disjoint sets of nodes such that is an edge of the bigraph.A maximum (or perfect) matching in the bigraph also gives a maximum (or perfect) matching in the digraph.The minimum number of driver nodes needed to achieve the s-observability is equal to the number of unmatched v + i 's in the bigraph and the driver nodes are corresponding v i 's.Note that if there is a perfect matching then there is a single driver node.Matching algorithms for bipartite graphs are already implemented in NetworkX so that the minimum input theorem can be easily implemented in Python.Also refer the supplementary section to Liu et al. (2011) for more details.Practical systems are often nonlinear, but interestingly it is possibly to apply the graph-theoretic approach discussed above for LTI systems directly to nonlinear systems Reinschke (1988), Daoutidis and Kravaris (1992), Liu et al. (2012), Boukhobza and Hemlin (2007) and Liu et al. (2011).Consider the nonlinear state space model T , and g = g 1 , g 2 , . . ., g ny T .
The digraph (G) containing n x + n y number of nodes.If ∂fi ∂xj = 0 then x i → x j exists and similarly, if ∂gi ∂xj = 0 then y i → x j exists.Note that partial derivatives should be found symbolically Perera (2014), Perera et al. (2014), not numerically.The reason is that, for example even though x j occurs in f i , still it is possible to have numerically ∂fi ∂xj = 0. State estimation techniques -for example extended Kalman filtering -may be used to estimate unknown disturbances and unknown/uncertain parameters Simon (2006), Jazwinski (2007), Åström (2006).Let the nonlinear state space model ẋ = f (t, x, u, w, p) , (3) where w = [w 1 , w 2 , . . ., w nw ] T is the disturbance vector and p = p 1 , p 2 , . . ., p np T is the parameter vector.Assume w and p are unknown disturbances and uncertain parameters to estimated.One possibility is to write ẇ = 0, (4) ṗ = 0, and then to augment w and p to the current state x.I.e.x = [x, w, p] T .Now the augmented state space model is ẋ = f (t, x, u) , (5) y = g (t, x, u) .
f and g are then used in s-observability analysis as already explained using graph-theoretic techniques.The compiled model object is a ".fmux" file with the name "mymodel" and the model object is imported as a CasadiModel object.The code is given below: Additionally, it is useful to do some formatting on nodes/edges.For example, states, input and output nodes are in different colors and shapes.NetworkX graph object can be converted to PyGraphViz AGraph objects using Gp = nx.toagraph(G).See the code below:

Python implementation
In order to have a better structured code, several new functions may be defined within the CasadiModel class: symbolicLinearization(), symbolicDAEs(), in-dexReduction(), createNodes(), createEdges(), gener-ateGraph(), decomposeGraph(), Y Topped(), Max -Matching(), etc.Now these functions can be called as for instance casadiModelObject.createNodes().symbolicDAEs() creates symbolic functions for ODEs and algebraic equations.indexReduction() is used for index reduction.Symbolic Jacobian matrices are found by symbolicLinearization(). Based on Jacobian matrices the nodes and the edges of the digraph are generated using createNodes() and createEdges().generate-Graph() creates a NetworkX and a PyGraphViz graph objects as well as it creates a 'dot'13 file.decomposeGraph() decomposes the digraph into strongly connected components.To check the conditions given in the definition 2 Y Topped() and Max Matching() are used.Let us define some terms (based on Anh (2012), Liu et al. (2012), andLiu et al. (2011)).State nodes which are directly connected to output nodes are called controlled nodes.A controlled node with just a single connection to an output is called a driver node.As a summary to this section, the following points are made: (1) a Modelica model is created, (2) import the dynamic model as a CasadiModel object model and use casadi to find symbolic Jacobian matrices of symbolic DAEs (after reducing the index if needed), (3) generate a digraph using networkx and pygrapviz, (4) use graph theories to analyze the digraph.

Industrial Application Case
The Copper electro-winning process at Glencore Nikkelverk, Kristiansand, Norway is considered.The process consists of four sections: (i) the slurrification where the calcine containing mostly copper oxide is slurrified using recycled anolyte flow, which containing sulfuric acid, taken from the electrowinning section, (ii) the leaching section where sulfuric acid is added to the slurry in order to leach more copper into the solution, (iii) the purification section where the slurry is first filtered to extract the solution, which contains copper sulphate (CuSO 4 ), followed by the cementation and fine filtering processes, and (iv) the electrowinning section where the solution containing Cu 2+ is electrolyzed to release solid copper at the cathode.For a detailed discussion and a mechanistic model for the Copper electro-winning process given in Lie and Hauge (2008).Figure 7 in appendix C gives the flow sheet 14 .The system model is in the form of equation 3 while the augmented model -by taking ṗ = 0 and w as slowly varying (i.e.ẇ ≈ 0) -is in the form of equation 5.The nodes corresponding to the parameters and disturbances have directed edges always directing towards them starting from either output/state nodes.I.e.possible edges ending at parameter/disturbance nodes are x i → p j , y i → p j , x i → w j and y i → w j .In the following discussion, it is shown that how to implement the procedure given in figure 6 in relation to the dynamic model given in Lie and Hauge (2008).The tank-volume (or level) dynamics are neglected in the original model.But, the tank-volume dynamics of the electro-winning section is included and the new model is considered in this paper.See equations 6, 7 and 8.It is assumed that the liquid level of the electro-winning tank is a constant.Except Ved2w , Vew2m and Vvap , the rest of the volumetric flow rates are known.Ved = Vem2d + Vp2e − Ved2w − Ved2m (6) 14 Taken from Lie and Hauge (2008).
Equations 9, 10, 11, and 12 are resulted by applying (static) mass balances to the slurrification, leaching and purification sections.
• parameters (all are unknown): k, η, (3) ps = 0. Note that, it may not be possible to augment all the unmeasured disturbances, but a subset of them.The reason is possible algebraic relationships among unmeasured disturbances, states, inputs and measured disturbances.15Vw2l , Vs2l , V Consider figure 9.The two elementary paths y 1 → V ed → Ved2w and y 2 → V em → Vew2m are two stems.Since neither V ed nor V em has edges going out from them, both are cacti without any buds.The SCC with plain-green colored nodes has no incoming edges from the other SCC which is in red, hence there must be at least one measurement node which is connected to a node in the plain-green colored SCC and y 4 and y 5 satisfy this condition. 16Since y 4 has only one edge, y 4 should be used to create the cacti, y 4 → ρ ed,CuSO 4 → ρ are not yet used.Since there are many nodes with only incoming edges -yellow color edges η, 1 , • • • , x cCu -, it is impossible to find cacti starting from y 3 and y 5 .Therefore, no spanning cacti is found for G. Thereby, the augmented system model is not structurally observable and hence not observable.In other words, it is impossible to completely estimate the augmented system state using given measurements.

Conclusion
We have demonstrated how to implement structural observability analysis, in the view of graph-theoretic approach, for large scale complex dynamic system in Python by using NetworkX, PyGraphViz Python packages as well as CasADi's Python front-end.The main result is how to find the spanning cacti for a given digraph in order to find the minimum number of driver nodes.Modelica is used for modeling and it is a standard tool for modeling large scale complex dynamic system.CasADi supports to import Modelica models into Python as flattened symbolic DAEs making it possible to use Modelica models in general use.Importantly, all the software tools which are used in our development are free.

Figure 1 :
Figure 1: (a) Undirected and without self-cycles/loops and multiple edges.(b) Undirected and with self-cylcles/loops and multiple edges.(c) Directed and without self-cycles/loops and multiple edges.(d) A directed and with self-cycles/loops and multiple edges.

3
See supplementary information toLiu et al. (2011)) for further details.Consider a subset of edges M in an undirected graph where no two edges share common nodes.Nodes in M are said to be matched.M is a maximum matching if there exists no edge setM such that |M | > |M |. 4 M is perfect if each node in the graph is in M .See figure3and note that thick color lines are matched edges.A path with edges alternating between E \ M 5 and M is an M -alternating path.M -alternating path is Malternating path with odd number of edges where the starting and the final edges are not in M .According to figure 3, {(v 4 , v 3 ), (v 3 , v 8 ), (v 8 , v 1 ), (v 1 , v 7 ), (v 7 , v 6 )} is an M -augmenting path.For digraphs, a matching is a subset of edges where no two edges share common nodes and a node is said to be matched if that node is the ending node of a matched edgeLiu et al. (2011).See figure4where M = {(v 6 , v 7 ), (v 1 , v 8 ), (v 3 , v 4 )} and v 7 , v 8 , and v 4 are matched nodes.

Figure 4 :
Figure 4: A matching for a digraph.

Figure 5 :
Figure 5: A bigraph.There are two sets of disjoint vertices (white and black colored).No edges among white nodes as well as black nodes. 8

4. 2 .
Structure of the Python scriptThe skeleton of the Python script is depicted in figure 6.First, the system model is encoded as a Modelica model.The Modelica model is then compiled and imported back to Python as a CasadiModel model object.The imported model is a symbolic flat representation of the Modelica model.Now using the CasADi Python package, necessary symbolic Jacobian matrices -which appeared in section 3 -are found.Once the Jacobian matrices are available, corresponding digraphs can be easily generated by means of NetworkX and PyGraphViz Python packages. 9NetworkX supports to create four types of graph objects: Graph, DiGraph, MultiGraph, and MultiDiGraph. 10In Net-workX, Graph and DiGraph graph objects are used only for graphs without multiple edges.To create graphs with multiple edges MultiGraph/MultiDiGraph graph objects should be used.It is clear that for s-observability analysis MultiDiGraph graph objects must be used.NetworkX provides many network algorithms related to: matching, bipartite graph related, strongly connectivity, cycles, tree, etc.The PyGraphviz Python package can be used as the layout tool. 11The Matplotpib Python package may also be used for network drawing.The NetworkX and PyGraphviz network objects are convertible to each other.

Figure 6 :
Figure 6: Structure of the Python script.
model is created for the (updated) dynamic model explained above and also augment unknown independent parameters as states: d dt k

Figure 7 :
Figure 7: The process flow sheet for the Copper electro-winning process.

Figure 8 :
Figure 8: The digraph (G) for the Copper plant model.
Perera (2014) (2014)ica.organdCasADioptionsModelica is becoming a standard tool for modeling large-scale complex physical systems.CasADi is a symbolic framework -a CAS tool -for numerical optimization and it is available to use it within Python.Modelica models -which result in differential algebraic equations, DAEs -can be imported to Python via CasADi and make symbolic DAEs available for general use in Python.SeePerera et al. (2014)andPerera (2014).CasADi comes with JModelica.organd it may be the easiest way of accessing CasADi in Python.
For example ocp.x, ocp.z, ocp.pi,  ocp.pd, ocp.pf, ocp.t, ocp.u, ocp.ode, and ocp.alg give respectively dynamic vector, algebraic state, independent parameter vector, dependent parameter vector, free parameter vector, time, input vector, vector of ODEs and vector of algebraic equations.Also casadiModelObject.dxgivesdynamic state derivative vector.The main ingredient for generating a MultiDi-Graph object is to have functions f and g which are given in equation 3 or f and g in equation 5. f is defined as an SXFunction class instance, say ffun.See below for the Python code:In order to add edges we can use the following code: 1) l,o and Vl2p can be expressed in terms of inputs and measured disturbances while x c,Cu and ρ a,H 2 SO 4 cannot, therefore x c,Cu and ρ a,H 2 SO 4 should be augmented.Vew2m , Ved2w and Vvap are dependent each other, therefore 2 out of 3 should be augmented.Let us choose to augment Vew2m and Ved2w , thereby Vvap becomes a function of the augmented states.See below: The structure of the Modelica code is given in appendix A. The structure of the Python script which is used to generate the digraph for the structural observability analysis is given in appendix A. The Python script creates the digraph G, which is given in figure8in appendix C. G can be decomposed into SCCs using the function decomposeGraph().See figure 9 in appendix C.There are two SCCs with more than one node.Each SCC is colored with different colors.It is possible to check whether G is output connected using Y Topped() (see definition 1).Here we use networkx.allsimplepaths() and this function gives all possible paths starting from a given node and ending at a given node if any.See the script given below for the definition of Y Topped():