Nilssen, Preconditioning of fully implicit Runge-Kutta schemes for parabolic PDEs

Recently, the authors introduced a preconditioner for the linear systems that arise from fully implicit Runge-Kutta time stepping schemes applied to parabolic PDEs; see Mardal (2005). The preconditioner was a block Jacobi preconditioner, where each of the blocks was based on standard preconditioners for low-order time discretizations like implicit Euler or Crank-Nicolson. It was proven that the preconditioner is optimal with respect to the timestep and the discretization parameter in space. In this paper we improve the convergence by considering other preconditioners, namely the upper and the lower block Gauss-Seidel preconditioners, both in a left and a right preconditioning setting. Finally, we improve the condition number by using a generalized Gauss-Seidel preconditioner.


Introduction
Since their introduction in 1895, Runge-Kutta schemes have proven to be efficient for a variety of problems. For the solution of parabolic PDEs, however, the fully implicit schemes are not that widespread. The majority of these problems are computed using either implicit Euler, Crank-Nicolson or some higher-order Backward Differentiation Formula (BDF) scheme in time. They all result in a Helmholtz type of problem to be solved at each timestep. Because of the quadrature of the fully implicit Runge-Kutta schemes, the system to be solved increases in dimension with an increasing number of quadrature nodes. In general, for a problem where space is discretized using m degrees of freedom, an s-stage scheme will result in a system of dimension sm × sm. In addition, although the ODE system matrix is symmetric and positive definite, the matrix arising when a fully implicit Runge-Kutta scheme is employed is in general nonsymmetric and indefinite, and requires different linear solvers. Diagonally implicit Runge-Kutta (DIRK) schemes have been introduced to circumvent this problem. They lead to s systems of dimension m × m to be solved, where the symmetric positive definite property is preserved. Unfortunately, this type of scheme suffers from a severe kind of order reduction for stiff problems [4, Chapter IV.15], leading to first order convergence. Fully implicit Runge-Kutta schemes have several desirable properties, such as high order of accuracy and strong stability properties, for linear and, for some schemes, also for nonlinear differential equations. They also lead to a simple implementation of adaptivity, both with respect to the timestep and the order [5]. It is therefore desirable to reduce the computational cost of solving the linear systems arising when using Runge-Kutta methods on PDEs, in order to make them competitive with multistep methods like BDF. We will do this by reusing efficient preconditioners for the Helmholtz problem in a block preconditioner for the fully implicit Runge-Kutta scheme. We will discuss both block Jacobi and block Gauss-Seidel preconditioners, and we will also discuss both left and right preconditioning. To the authors' knowledge, the only previous work on preconditioners for the fully implicit Runge-Kutta schemes applied to parabolic equations is that of Van lent and Vandewalle [8]. In [8] the time stepping system (see (7)) is preconditioned with a multigrid approximation of the fully coupled system, using a block smoother. The benefit of the preconditioner presented in this paper is the reuse of standard preconditioners, both when it comes to code and theory. Only an efficient preconditioner for the Helmholtz problem is needed to implement our block preconditioner. We do, however, need a linear solver for a general nonsymmetric, indefinite matrix, such as GMRES.
The remainder of this paper is organized as follows: Section 2 explains the discretization of the problem, with emphasis on the block structure resulting from the Runge-Kutta discretization in time. The preconditioner is presented in Section 3, and some important properties are discussed. Section 4 presents a variety of numerical experiments, demonstrating the effectiveness of the preconditioner. In Appendix A, some optimal coefficients for the generalized lower block Gauss-Seidel preconditioner are presented.

Discretization of the problem
Let Ω be a bounded polygonal domain in R^d, with d = 1, 2 or 3, and boundary ∂Ω. We will consider a parabolic PDE of the form (1)-(3). The equations (1)-(3) are discretized in space by a finite element method, which gives an R^{m×m} system of ODEs to be solved, where I_h is the mass matrix and ∆_h is the stiffness matrix.
When the equations (4)-(5) are discretized by single stage discretization schemes such as implicit Euler, Crank-Nicolson or higher-order BDF schemes, we arrive at the following sequence of linear systems to be solved at each time step, where δt is the time stepping parameter and α_0 is a coefficient specific to the chosen scheme. All higher-order single stage schemes result in more terms on the right-hand side, leaving the left-hand side unchanged (except for the α_0 parameter). This means that a Helmholtz solver has to be used to compute each timestep.
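As a minimal sketch of one such time step, the following uses a 1D finite-difference Laplacian as a stand-in for the finite element matrices; the mesh size, timestep and initial data are hypothetical choices for illustration only, not taken from the paper's experiments.

```python
import numpy as np

# Hypothetical 1D setup: m interior nodes on (0, 1), mesh size h = 1/(m+1).
m = 50
h = 1.0 / (m + 1)

# Stand-ins for the mass matrix I_h (identity here) and the
# stiffness matrix Delta_h (1D finite-difference Laplacian).
I_h = np.eye(m)
Delta_h = (np.diag(-2.0 * np.ones(m))
           + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / h**2

dt = 0.01
alpha0 = 1.0  # implicit Euler

# One time step: solve the Helmholtz-type system
# (I_h - dt * alpha0 * Delta_h) u_new = I_h u_old.
u_old = np.sin(np.pi * np.linspace(h, 1 - h, m))
H = I_h - dt * alpha0 * Delta_h
u_new = np.linalg.solve(H, I_h @ u_old)
```

Since Delta_h is negative definite, H is symmetric positive definite, and the heat-equation solution decays in time, as expected.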
A Runge-Kutta scheme, applied to our model problem (1)-(2), can be written in stage form, where A_ij are the Runge-Kutta coefficients, b_i are the quadrature weights and c_i are the quadrature nodes of the Runge-Kutta scheme, organized in the Butcher tableau (c, A, b^T).
To better understand the block structure arising from the Runge-Kutta system, we write the scheme in matrix form. We then have to solve Ag = b, where b is the right-hand side given by (7). We notice the block system, with Helmholtz problems on the diagonal and Poisson problems on the off-diagonals. This motivates us to reuse the preconditioner for the Helmholtz problem.
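The block structure can be sketched with a Kronecker-product assembly; this is an illustration under stated assumptions, using the 2-stage RadauIIA Butcher matrix and a small 1D finite-difference Laplacian in place of the FEM matrices.

```python
import numpy as np

# Sketch: assemble the s*m x s*m stage system
#   (I_s (x) I_h - dt * A (x) Delta_h) g = b
# for an s-stage scheme, with a 1D Laplacian standing in for Delta_h.
m, h, dt = 20, 1.0 / 21, 0.05
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0,   1.0 / 4.0]])   # 2-stage RadauIIA Butcher matrix
s = A.shape[0]

Delta_h = (np.diag(-2.0 * np.ones(m))
           + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / h**2

# Kronecker assembly: Helmholtz-type blocks I - dt*A_ii*Delta_h on the
# diagonal, scaled Laplacians -dt*A_ij*Delta_h off the diagonal.
K = np.kron(np.eye(s), np.eye(m)) - dt * np.kron(A, Delta_h)
```

Because the Butcher matrix A is nonsymmetric, the assembled system K is nonsymmetric even though Delta_h itself is symmetric, matching the remark in the introduction.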
We will here give a brief introduction to the different families of fully implicit Runge-Kutta schemes. For a more thorough description, the reader is referred to [4]. The main difference between the families of fully implicit Runge-Kutta schemes is the choice of quadrature nodes: either the Gauss, the Radau or the Lobatto quadrature. The methods based on the Gauss quadrature are stable for both linear and nonlinear problems, and the order is 2s. In addition, the schemes are symplectic (see [3, Chapter II.16]). These methods will be denoted Gs, where s is the number of nodes. Note that G1 is the famous implicit midpoint scheme. When choosing the Radau quadrature, we have to decide whether we want the start or the end point to be one of the quadrature nodes. We choose the endpoint, leading to very attractive schemes for parabolic PDEs. The schemes are stable for both linear and nonlinear problems, and the order is 2s − 1. In addition, the schemes are stiffly accurate [4, Chapter IV.15], which is a very attractive property when solving PDEs. These methods will be denoted RIIs, where II indicates that the endpoint is one of the quadrature nodes. Note that RII1 is the famous implicit Euler scheme. By choosing the Lobatto quadrature nodes, we derive three subfamilies of schemes. Two of the subfamilies have an explicit step, at either the first or the last quadrature node. We will not discuss these schemes here. Instead we will focus on the subfamily with only implicit quadrature points. These schemes are also stable for linear and nonlinear problems, and the order is 2s − 2. As for the RII schemes, these schemes are also stiffly accurate. We will denote them LCs, where C refers to the subfamily C. Note that both the Gauss methods and the RadauII methods are collocation methods. As a consequence, methods of any given order are easily constructed.

The Preconditioner
A general description of a preconditioned problem is B_L A B_R y = B_L b, with x = B_R y, where B_L and B_R are the left and the right preconditioner, respectively. It will be made clear from the context whether B is a left or a right preconditioner, so we will from now on drop the subscripts. For s = 1, we end up with preconditioning and solving a Helmholtz problem. Order optimal solution algorithms for this system are well known for most spatial discretization methods. The goal of this paper is to reuse such preconditioners for fully implicit Runge-Kutta schemes. We will investigate both a block Jacobi and a block Gauss-Seidel preconditioner of the form where Ã = A for now. Ã will be referred to as the preconditioner coefficients. The upper triangular Gauss-Seidel (GSU) preconditioner B_GSU has the structure of the transpose of the lower triangular (GSL) preconditioner B_GSL. When we write B_GS, it can be either the lower or the upper block Gauss-Seidel preconditioner. For lower-order discretization methods in space, multigrid and domain decomposition methods are often used as preconditioners. Such methods have been extensively studied both in theory and in practice, and it has been shown that they are order optimal with respect to the discretization parameters h and δt. When we write B, we mean exact preconditioning, i.e. each block is inverted exactly. B̃ means that we compute a cheap approximation of B, e.g. by multigrid.
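The two exact block preconditioners can be sketched by replacing the Butcher matrix A with its diagonal (block Jacobi) or its lower triangle (lower block Gauss-Seidel) in the stage system; the sizes, timestep and the 1D Laplacian stand-in below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch: exact block Jacobi and lower block Gauss-Seidel preconditioners
# for the stage system K = I - dt * kron(A, Delta_h).
m, h, dt = 20, 1.0 / 21, 0.05
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0,   1.0 / 4.0]])   # 2-stage RadauIIA
s = A.shape[0]
Delta_h = (np.diag(-2.0 * np.ones(m))
           + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1)) / h**2

def stage_matrix(Atil):
    # Replace A by Atil in the Kronecker assembly.
    return np.kron(np.eye(s), np.eye(m)) - dt * np.kron(Atil, Delta_h)

K = stage_matrix(A)                                   # full system
B_J_inv = np.linalg.inv(stage_matrix(np.diag(np.diag(A))))   # block Jacobi
B_GSL_inv = np.linalg.inv(stage_matrix(np.tril(A)))          # lower block GS

cond_J = np.linalg.cond(B_J_inv @ K)
cond_GSL = np.linalg.cond(B_GSL_inv @ K)
```

On this toy problem the lower block Gauss-Seidel variant yields a markedly smaller condition number than block Jacobi, consistent with the experiments reported below.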
Property 1 By using an order optimal preconditioner for the Helmholtz problem for each diagonal block, the block Jacobi preconditioner B_J will also be order optimal. This is proven in [9]. It can be proven in a similar way for the block Gauss-Seidel preconditioner, but we will instead investigate this numerically.
Property 2 Assume that B̃ is an approximation of the exact preconditioner B, e.g. by multigrid, and that B is either a block Jacobi or a block Gauss-Seidel preconditioner. Then the condition number can be bounded as stated below. Proof: use a Cauchy-Schwarz-like inequality valid for condition numbers. It is clear from Property 2 that the condition number using an inexact preconditioner can be bounded by the condition number using exact preconditioning multiplied by an amplification factor. This amplification factor is the condition number of the inexact preconditioner applied to the inverse of the exact preconditioner.
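The omitted bound and its one-line proof can plausibly be reconstructed as follows (a sketch; B̃ denotes the inexact and B the exact preconditioner, as in the text):

```latex
\kappa(\tilde{B}A) = \kappa\big(\tilde{B}B^{-1}\cdot BA\big)
\le \kappa\big(\tilde{B}B^{-1}\big)\,\kappa(BA)
```

The inequality is the submultiplicativity of the condition number, which is presumably the "Cauchy-Schwarz like inequality" referred to in the proof.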
Assume that we use a block Jacobi preconditioner, and that ∆_h is symmetric positive definite. Then B̃B⁻¹ is also symmetric positive definite. It can be proven that the resulting amplification factor is close to the condition number of the preconditioned Helmholtz problem. Assume instead that we use a lower block Gauss-Seidel preconditioner. The inverse of a triangular matrix is still a triangular matrix, and hence the analysis is more complicated. The same argument holds for upper block Gauss-Seidel. Obviously the block Gauss-Seidel preconditioner is less robust to a poor approximation of the preconditioner than the block Jacobi preconditioner in the case of a large number of quadrature points. We are therefore interested in investigating these amplification factors numerically when using a multigrid approximation.
Property 3 Assume that the complexity of the matrix-vector product ∆_h x scales as O(m), for ∆_h ∈ R^{m×m} and x ∈ R^m. Then one iteration of our preconditioned algorithm, for both the block Jacobi and the block Gauss-Seidel preconditioner, scales as O(s^2 m), where s is the number of quadrature nodes in the chosen Runge-Kutta scheme.
This means that the fully implicit scheme scales like the DIRK methods, but worse than the single stage schemes, which scale as O(pm), where p is the number of steps. However, s is usually small, leading to a relatively small difference. The question is whether the increase in computational cost for one timestep is larger than the decrease in the required number of timesteps. This will be investigated numerically.
One benefit of the presented preconditioner is that the spatial discretization technique can easily be changed. In practice, one would probably be interested in using higher-order methods in space as well as in time. As long as there exists a preconditioner for the implicit Euler method, it can be reused with our methodology. Note, however, that the proof of Property 1 is based on a conforming finite element or spectral element discretization.

Results
We will use multigrid to approximate the preconditioner. All computations are done on the domain Ω = (0, 1)^d, where d is the number of spatial dimensions. A sequence of meshes is constructed by uniform refinement of a 2, 2 × 2 or 2 × 2 × 2 partition of the domain Ω. The preconditioner B̃ is computed using a standard V-cycle with a symmetric Gauss-Seidel smoother. Gaussian elimination is used as the coarse grid solver. Note that we do not reach the asymptotic region for 1D multigrid preconditioning in our experiments, and the condition number may be higher in this case. We want to find the condition number κ(B̃A) for left preconditioning, and κ(AB̃) for right preconditioning. For large problems, this cannot be found exactly. It is therefore approximated by solving the linear system using Conjugate Gradient on the Normal equations (CGN). More precisely, we solve (B̃A)^T B̃Ax = (B̃A)^T B̃b and approximate κ(B̃A) via κ((B̃A)^T B̃A) = κ(B̃A)^2. A description of how to approximate the condition number from a Conjugate Gradient method can be found in [10].
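The relation between the direct condition number and the normal-equations estimate can be checked on a small example; the matrix M below is a random symmetric positive definite stand-in for the preconditioned operator, not the actual system.

```python
import numpy as np

# Sketch: for the normal equations, kappa(M^T M) = kappa(M)^2, so the
# condition number recovered from CGN is the square root of the
# condition number of the normal-equations matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 30))
M = X @ X.T + 30.0 * np.eye(30)   # stand-in for the preconditioned operator

kappa_direct = np.linalg.cond(M)
kappa_normal = np.sqrt(np.linalg.cond(M.T @ M))
```

This also shows the price of CGN: the normal-equations system itself has the squared condition number, so its CG iteration converges more slowly than an iteration on the original operator would.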

Verification of the optimality of the preconditioner using multigrid
In this experiment we verify numerically the order optimality of the block preconditioner with respect to the spatial discretization parameter h and the timestep δt, using multigrid to approximate the blocks. This is done for the 2D problem (1)-(3) using bilinear finite elements in space and the three-node RadauII scheme in time. First, B̃_J is a block Jacobi preconditioner approximated by one multigrid V-cycle. The results can be found in Table 1.
The order optimal behavior is confirmed with an asymptotic value of roughly 17.
In the second experiment, B̃_GSL is a lower block Gauss-Seidel preconditioner, again approximated by one multigrid V-cycle. The results can be found in Table 2. Gauss-Seidel is apparently much better than Jacobi, and the asymptotic value of the condition number is roughly 3. Again the order optimal behavior is confirmed.

Numerical investigation of the condition number when using multigrid
Property 2 states that the condition number using inexact preconditioning is bounded by the condition number of the exact preconditioner multiplied by the condition number of the inexact preconditioner applied to the inverse of the exact preconditioner. We will investigate this numerically, by computing the condition numbers κ(BA) and κ(B̃A), where B̃ is computed using one multigrid V-cycle. We do this for d = 1, 2, 3, with h = 2^{-9}, 2^{-9}, 2^{-6}, respectively. The exact preconditioner is only computed in the 1D case. The results using block Jacobi preconditioning can be found in Table 3, while those for the lower block Gauss-Seidel preconditioner can be found in Table 4. The one-node Gauss scheme is included for reference. We notice that the increase in condition number due to the inexact preconditioning is approximately 2 in 1D, and 1.1 in 2D. For the 3D case, we are not in the asymptotic region, and the condition number is therefore in some places slightly smaller than the one using exact preconditioning. We notice that the block Gauss-Seidel preconditioner is in general much better than the block Jacobi preconditioner.

Comparison of left and right preconditioner for Jacobi, lower and upper Gauss-Seidel
In our fifth experiment, the difference between left and right preconditioning, for the block Jacobi, the lower block and the upper block Gauss-Seidel preconditioner, is investigated. This is done for a 1D problem of the form (1)-(3). In space we use linear finite elements with h = 2^{-8}. The preconditioner is computed exactly. This is done for the Radau schemes with two to six nodes. The results are shown in Table 5. From the results, we conclude that right preconditioning is generally better than left preconditioning. The difference may be more than a factor of two. We also conclude that lower block Gauss-Seidel gives the lowest condition number. Upper block Gauss-Seidel gives by far the largest condition number, which is not intuitive. The explanation is postponed to the next section, due to the need for some simplifying assumptions.

Table 5: The condition number for the left preconditioned system κ(BA), and the right preconditioned system κ(AB), for the 1D problem (1)-(3) using linear finite elements in space with h = 2^{-8}. B is the block Jacobi, the lower block or the upper block Gauss-Seidel preconditioner, computed exactly.

Finding optimal coefficients for the preconditioner
In the previous experiments we used Ã = R(A), where R represents the restriction to the diagonal elements, the lower or the upper triangular part. A relevant question is whether it is possible to reduce the condition number of the preconditioned system by changing Ã. From the previous example, we noticed that upper block Gauss-Seidel gives a larger condition number than the block Jacobi preconditioner. By choosing all the off-diagonal coefficients infinitely small, we would be close to a block Jacobi preconditioner. This clearly indicates that it should be possible to find a more optimal Ã. In order to find these optimal coefficients, we need to understand what governs the condition number of the preconditioned system. If we, instead of solving a PDE, discretize a scalar ODE u' = λu, we get a system where B is identical, only exchanging A with Ã and restricting it to the diagonal or lower triangular part. If δtλ ≫ 1, it is obvious that (12) holds. For a PDE, λ will be an n × n matrix, containing n eigenvalues. If we assume that all the blocks in A are well preconditioned by B, all the eigenvalues will be clustered and (12) is still a good approximation. This is tested for the 1D problem (1)-(3), using linear finite elements with h = 2^{-8}, and the results can be seen in Table 6.

Table 6: Comparison of the condition number of the preconditioned system BA and the condition number of the product of the preconditioner coefficient matrix and the Runge-Kutta coefficient matrix.

The numbers are in good agreement, motivating us to use (12) as a cheap cost function for the optimization process.
We will now indicate why upper block Gauss-Seidel performs so poorly compared to block Jacobi and lower block Gauss-Seidel. Writing (Ã^{-1})_ij = Â_ij, the entries of Ã^{-1}A follow directly. Most fully implicit Runge-Kutta schemes have large values in the lower triangular part of the coefficient matrix A, and small values in the upper triangular part.
For lower block Gauss-Seidel, Ã^{-1}_GSL A, this leads to a small number divided by a larger number in the upper right part of the matrix, while the lower part is well preconditioned. In general this leads to a small condition number. For the upper block Gauss-Seidel, Ã^{-1}_GSU A, this does however lead to a relatively large number divided by a smaller number in the lower left part of the matrix, while the upper part is well preconditioned. In general this leads to a large condition number. Because of its poor preconditioning properties, we will discuss upper block Gauss-Seidel no further. Note that this is not a proof, but only a plausible explanation. The same type of argument can be used to explain why right preconditioning is generally better than left. We will now see if it is possible to improve the condition number by optimizing the preconditioner coefficient matrix Ã. Obviously (12) is a good approximation, at least as long as the preconditioner is computed exactly. We will therefore optimize the coefficients in Ã, given the structure from the choice of a block Jacobi scheme or a lower block Gauss-Seidel scheme.
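The cheap cost function can be evaluated directly on the Butcher coefficients; as an illustrative check (using the 2-stage RadauIIA matrix, which is assumed here purely as a concrete example), the three natural restrictions of A order exactly as the text argues.

```python
import numpy as np

# Sketch: compare kappa(Atil^{-1} A) for the three restrictions of the
# Butcher matrix: diagonal (Jacobi), lower triangle (GSL), upper (GSU).
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0,   1.0 / 4.0]])   # 2-stage RadauIIA

def kappa(Atil):
    # Condition number of Atil^{-1} A, the stiff-limit cost function.
    return np.linalg.cond(np.linalg.solve(Atil, A))

kappa_J = kappa(np.diag(np.diag(A)))   # block Jacobi
kappa_GSL = kappa(np.tril(A))          # lower block Gauss-Seidel
kappa_GSU = kappa(np.triu(A))          # upper block Gauss-Seidel
```

For this matrix the large entry A_21 = 3/4 sits in the lower triangle, so the lower restriction captures the dominant part of A and GSL wins, while GSU is even worse than Jacobi.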
Note that we now use a generalized block Jacobi or block Gauss-Seidel preconditioner, since B is no longer built from the block diagonal or block triangular part of A. We use a Nelder-Mead algorithm [6] and initialize with the values from A. Note that we might not find the globally optimal value using this optimization process.
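A minimal sketch of this optimization, assuming scipy's Nelder-Mead implementation and the 2-stage RadauIIA coefficients as a concrete example; the paper's actual cost function (15) and coefficient tables are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: minimize kappa(Atil^{-1} A) over the free entries of a lower
# triangular Atil, starting from the lower triangle of A itself.
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0,   1.0 / 4.0]])   # 2-stage RadauIIA
s = A.shape[0]
idx = np.tril_indices(s)

def cost(x):
    Atil = np.zeros((s, s))
    Atil[idx] = x
    try:
        return np.linalg.cond(np.linalg.solve(Atil, A))
    except np.linalg.LinAlgError:
        return np.inf   # penalize singular candidates during the search

x0 = A[idx]             # initialize with the values from A
initial = cost(x0)
res = minimize(cost, x0, method='Nelder-Mead')
```

Nelder-Mead is derivative-free, which suits this nonsmooth cost, but as noted above it offers no guarantee of finding the global optimum.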
In Table 7 we see the condition numbers based on the optimized preconditioner coefficient matrix Ã. The difference between the optimization cost function (15) and κ(BA) is minimal. A is constructed for the 1D heat equation (1)-(3) using linear elements with h = 2^{-8}. Since the difference between left and right lower block Gauss-Seidel preconditioning is relatively small, and left preconditioning is the most commonly used preconditioning technique, we will from now on only discuss left preconditioning.
It is now important to determine how much the condition number grows when the exact preconditioner B is replaced with the inexact preconditioner B̃ based on multigrid. The results can be seen in Table 8 for the block Jacobi and the lower block Gauss-Seidel preconditioner. For block Jacobi preconditioning, we notice that the reduction in the condition number is much smaller than expected from the exactly preconditioned problem. For lower block Gauss-Seidel the condition number is in some cases larger than in the non-optimized case. For LC3, we do not even have convergence after 3000 CGN iterations for the lower block Gauss-Seidel preconditioner. To understand this we study the blocks in the preconditioner.

Table 7: Comparison of the condition number of the preconditioned system BA and the condition number of the product of the preconditioner coefficient matrix and the Runge-Kutta coefficient matrix, where the preconditioner coefficient matrix is the result of the optimization process (15).

The condition number of (16) will not change when c changes, though the impact on the condition number of the full block matrix is more complicated. For (17), however, we cannot say that the condition number is unchanged when c changes, since the approximation of the preconditioner is done by multigrid.
To avoid this, we try another approach, adding the constraint (18) to the minimization problem (15). This results in an optimization only valid for block Gauss-Seidel preconditioners.
The results can be seen in Table 9. As expected, the condition number using exact preconditioning is larger for the optimization with the constraint (18) than without it. But the condition number using inexact preconditioning based on multigrid is in much better agreement with the optimization results. By choosing a six-node scheme, the lower block Gauss-Seidel preconditioner using one multigrid V-cycle results in a condition number of less than 2.5 for two- and three-dimensional problems.

Iteration count and timing results
Finally, we compare the wall clock time (wct) for a given test problem. We solve (1)-(3), with a source term f such that the exact solution is u(x, y, t) = sin(ω_x x) sin(ω_y y) sin(ω_t t). The high number of oscillations in time is used to generate a certain degree of complexity in time.
In space we discretize using linear finite elements. Both the element size h and the time step δt are chosen such that the error is of order 10^{-5}, measured in the L^2 norm in both space and time. The preconditioner is a lower block Gauss-Seidel preconditioner approximated using one multigrid V-cycle, and the linear system is solved using GMRES with restart and 5 search vectors (for the one-node RadauII scheme we used conjugate gradients), with a stopping criterion of absolute residual equal to 10^{-7}. Notice that for GMRES the residual is only evaluated before the restart. This means that the system is possibly over-iterated, but the computational time is in general smaller due to the high cost of evaluating the residual in every iteration. We also solve the linear system using CGN.
The results are computed on a Linux machine with an Intel P4 2.8GHz processor and 1GB RAM.The result is displayed in Table 10.
The number of iterations for GMRES and CGN is comparable, but the difference in wall clock time is approximately a factor of 2. In our experiment, the five-node RadauII scheme is by far the fastest. This is because the decrease in the number of required steps outweighs the increase in the number of iterations for the linear solver. Implicit Euler (RII1) is very slow due to the large number of required timesteps. Note, however, that no general conclusions can be drawn from this small experiment. Which scheme is the fastest depends on several factors, such as the regularity of the solution, the required accuracy, the implementation etc.

Final remarks
In this paper we have shown that the systems arising from fully implicit Runge-Kutta schemes applied to parabolic equations can be preconditioned with block diagonal and block triangular preconditioners, where the diagonal blocks are standard preconditioners developed for the backward Euler scheme. Such preconditioners are well known to be order optimal when constructed by, e.g., multigrid or domain decomposition methods.
In several numerical experiments we have demonstrated that the condition number of the preconditioned systems is bounded. We have also seen that higher-order methods are beneficial, when using efficient preconditioners, even for problems with relatively fast dynamics and modest accuracy requirements. For the six-node RadauII scheme, the new preconditioning approach, lower block Gauss-Seidel with optimal coefficients, results in a 30-fold reduction in the condition number compared to the block Jacobi preconditioner presented in the previous paper [9].

A Optimal coefficients
We present the optimal coefficients for the preconditioner matrix Ã. The coefficients are found by solving the optimization problem (15) with the constraint (18). Due to space limitations, only 6 decimals are presented. Only the values for the left preconditioned lower block Gauss-Seidel preconditioner are presented, since this has been the most effective preconditioner in our experiments.

Table 1 :
The condition number κ(B̃_J A) for the 2D problem (1)-(3) using bilinear finite elements in space and the three-node RadauII scheme in time. B̃_J is the block Jacobi preconditioner, approximated using one multigrid V-cycle.

Table 2 :
The condition number κ(B̃_GSL A) for the 2D problem (1)-(3) using bilinear finite elements in space and the three-node RadauII scheme in time. B̃_GSL is the lower block Gauss-Seidel preconditioner, approximated using one multigrid V-cycle.

Table 3 :
Block Jacobi preconditioner applied to the one-, two- and three-dimensional problem (1)-(3) with δt = 0.1 and h = 2^{-9}, 2^{-9}, 2^{-6}, respectively. B̃ is computed using one multigrid V-cycle.

Table 9 :
Condition number of the preconditioned system where Ã contains the optimal coefficients computed from the optimization problem (15), with the constraint (18).

Table 15 :
Optimal coefficients Ã for the six nodes Gauss scheme

Table 16 :
Optimal coefficients Ã for the two nodes RadauII scheme

Table 18 :
Optimal coefficients Ã for the four nodes RadauII scheme

Table 19 :
Optimal coefficients Ã for the five nodes RadauII scheme

Table 20 :
Optimal coefficients Ã for the six nodes RadauII scheme

Table 21 :
Optimal coefficients Ã for the two nodes LobattoC scheme

Table 22 :
Optimal coefficients Ã for the three nodes LobattoC scheme

Table 23 :
Optimal coefficients Ã for the four nodes LobattoC scheme