A BLUP derivation of the multivariate breeder’s equation, with an elucidation of errors in BLUP variance estimates, and a prediction method for inbred populations

The multivariate breeder’s equation Lande (1979) was derived from the Price equation Price (1970, 1972). Here, I present a derivation based on the BLUP (best linear unbiased predictions) equations in matrix form, ﬁrst given in summation form by Henderson (1950). The derivation makes use of a comparison with the known form of the multivariate breeder’s equation, and it is therefore not an independent derivation. The alternative derivation does, however, clarify why and to which extent the variances of BLUP predictions of random eﬀects are underestimated. The BLUP random eﬀects can in fact be used for prediction of phenotypic responses by use of the Robertson-Price identity, Robertson (1966), Price (1970, 1972). The BLUP derivation also leads to a prediction method for populations with inbreeding, i.e., where the additive genetic relationship matrix departs from a unity matrix, and where the multivariate breeder’s equation therefore will fail.


Introduction
The multivariate breeder's equation for correlated phenotypic traits, was originally developed by Lande (1979) and Lande and Arnold (1983) using the Price equation, Price (1970Price ( , 1972, as a starting point. Here, ∆ȳ t is the incremental change in the vector of mean trait values from one generation to the next, while y i,t is a vector of individual traits at generation t. The G matrix is the covariance matrix of the additive genetic effects, while P is the phenotypic covariance matrix, and G and P are here assumed to be constant. W i,t andW t are the individual and population mean fitness, respectively. Lande's equation has since it first appeared been an important tool in quantitative genetics. Evolution of mean traits according to Eq. (1) seeks to maximize the expected geometric mean fitness, Ergon (2022a), and in a theoretical study this requires an individual fitness function. Such a function will include a vector of phenotypic values θ t that maximize the individual fitness, as exemplified in Ergon (2019). Changes of the elements in θ t as functions of environmental factors are thus the driving force of the evolution. Environmental factors can in addition have a direct influence on the evolution Ergon (2019).
In this article, I present a novel derivation of the multivariate breeder's equation, based on the linear mixed model BLUP (best linear unbiased predictions) equa-tion in matrix form, Ch. 26, Walsh and Lynch (2018). I will assume non-overlapping generations, and in cases with sexual reproduction I will assume a hypothetical single parent (mid-parent) occupying an intermediate phenotypic position between the two parents. Note that the BLUP equations are considerably older than the Price equation, although then in summation form, Henderson (1950). My derivation makes use of a comparison with Eq. (1), and it is therefore not an independent derivation. It does, however, clarify why and to which extent the variances of BLUP predictions of random effects are underestimated. The BLUP random effects can in fact be used for prediction of ∆ȳ t by use of the Robertson-Price identity, Robertson (1966), Price (1970Price ( , 1972. A main advantage with the BLUP derivation is also that it easily can be modified to include a genetic relationship matrix, Ch. 26, Walsh and Lynch (2018), resulting in a prediction method for inbred populations. For this type of inbred populations, the genetic relationship information is simply ignored when Eq. (1) is used. An additional possibility is that the BLUP solution may incorporate additional fixed effects, which in many cases is advantageous, Ch. 19, Walsh and Lynch (2018).
Accepting the BLUP equations in matrix form as given, the BLUP derivation of Eq. (1) is in some ways simpler than the original derivation, in that only basic matrix algebra (including Kronecker products) but no multivariate statistical theory is needed. This is so because no assumptions of normally distributed additive genetic and non-additive effects are needed in order to find the mixed model equations, Robinson (1991). This does not, however, imply that Eq. (1) and its BLUP counterpart give correct results for non-normal data. The reason for this is that Eq. (1) makes use of only mean values and variances of additive genetic and environmental effects, which means that influences from third and higher order statistical moments are ignored.
The BLUP derivation is given in Section 2, including the necessary assumptions. Simulations that verify the theoretical results are presented in Section 3, and a short summary and discussion is finally given in Section 4. MATLAB simulation code is given in the Appendix.

Assumptions
In Lande (1979) the multivariate breeder's equation (1) was developed with the Price equation for selection in a population, Price (1970Price ( , 1972, as starting point. According to the Price equation in its multivariate form, the evolution of a mean phenotypic trait values in a p × 1 vectorȳ t is described bȳ (2) where ∆ȳ t =ȳ t+1 −ȳ t are the incremental changes from generation to generation, while y i,t is a p × 1 vector of individual trait values. Here, W i,t andW t are the individual and mean fitness, respectively, whilē y of f spring i,t is the p × 1 vector of mean trait value of the offspring (and the parent if it survives) of individual i in generation t.
In Ergon (2019) it is shown how Eq.
(2) leads to the multivariate breeder's equation (1) by means of five assumptions: 1. The vector y i,t of individual traits is the sum of independent additive genetic effects x i,t and environmental effects e i,t (including non-additive genetic effects), i.e., y i,t = x i,t + e i,t .
2. The non-additive effects e i,t are zero mean, independent and identically distributed (iid) random variables.
3. There are no expected fitness weighted changes in the mean additive genetic effectsx t from one generation to the next besides selection, i.e., and W i,t , are conditionally independent given the phenotypic trait vector y i,t . This implies that x i,t ,ē i,t andē of f spring i,t influence fitness only through the values of y i,t .

A BLUP derivation of the multivariate breeder's equation
Initially assume the simple case where an individual phenotypic trait y ij,t in a vector y j,t at a time t (a generation) has an additive genetic effect x ij,t =ȳ j,t + a ij,t , and an independent non-additive (environmental) effect e ij,t (Assumption 1). Also assume that the mean values of a ij,t and e ij,t are zero (Assumption 2). We thus have the individual traits whereȳ j,t is the mean phenotypic value in the population. For a population with N individuals this leads to y j,t = 1 Nȳj,t + a j,t + e j,t where y j,t , 1 N , a j,t and e j,t are N × 1 column vectors.
In the multivariate case with p phenotypic traits, Eq. (4) gives or in a standard linear mixed model form, Ch. 26, Lynch M (1998), We must in general assume that the random effects a j,t may be correlated, with a covariance matrix (here assumed to be constant) Also assume that the residuals e j,t are independent and identically distributed (iid), withē j,t = 0 and constant variances r j , resulting in a covariance matrix For use in a general multivariate BLUP equation we also need to introduce the N × N genetic relationship matrix A t , Ch. 26, Lynch M (1998), and the Kronecker covariance matricesG = G ⊗ A t andR = R ⊗ I N , where I N is the N × N identity matrix. Here, ⊗ is the Kronecker product operator, which means that all elements in G and R should be multiplied by A t and I N , respectively. From the mixed model, Eq. (6), follows the multivariate BLUP equation in matrix form, Henderson (1950), Ch. 26, Lynch M (1998, In order to find the multivariate breeder's equation, we must here assume A t = I N , such thatG = G ⊗ I N . For the further development we need the following lemma: Lemma 1 Assuming a genetic relationship matrix A t = I N , the mean values ofâ j,t andŷ j,t in Eq. (8) are zero andȳ j,t , respectively, i.e.,ā j,t =â j,t = 0 and y j,t =ȳ j,t .
Proof: Because Z = I pN , Eq. (8) after elimination of X TR−1 givesâ t = y t − Xŷ t , from which follows that for all values of j from 1 to p From Eq. (8) with Z = I pN and elimination ofR −1 , also follows that Since the block matrices Q jk in general are non-zero, we will by taking mean values of all terms in Eqs. (9)-(11) find two equations forȳ j,t − 1 Nŷj,t that are equal only whenā j,t =â j,t = 0 for all j from 1 to p. Since 1 Nŷj,t =ŷ j,t , Eq. (11) thus givesŷ j,t =ȳ j,t . Next, we must find how fitness affects the mean trait predictions for the offspring of any given parent generation. In order to do that we must assume that there are no expected changes in the mean phenotypic valuesȳ t from one generation to the next, beside those caused by selection (Assumption 3). We must also assume that the individual fitness W i,t is determined by the individual phenotypic traits y ij,t for all values of j (Assumption 5). As a measure of individual fitness, we may here use the number of offspring.
We may now introduce the N × N diagonal relative fitness matrix F t = diag(W i,t )/W t , with W i,t /W t for i from 1 to N along the main diagonal, and the block diagonal pN × pN matrixF = diag(F t ). After multiplication from the left byF t , Eq. (10) can be written asF Using G + R = P, and the mixed product property Taking mean values on both sides of Eq. (14), and utilizing thatŷ j,t =ȳ j,t (Lemma 1), we find the key equation We now find that the elements in the column vector on the righthand side of Eq. (15) are where cov(W i,t , y ij,t ) is applied on the parent generation. Eqs. (15) and (16) thus give Here, we recognize the righthand side of Eq. (1), which necessarily means that the elements on the lefthand side are F tâj,t = ∆ȳ j,t . The BLUP derivation of Eq. (1), leading to Eq. (17), also leads to the following theorem: Theorem 1 Assuming a genetic relationship matrix A t = I N , the incremental change ∆ȳ j,t in a mean phenotypic trait from one generation to the next is given by the mean value of the product F tâj,t , whereâ j,t is found from the Eq. (8), i.e., An alternative expression for the incremental phenotypic changes is given by the following theorem: Theorem 2 Assuming a genetic relationship matrix A t = I N , the incremental change ∆ȳ j,t in a phenotypic trait from one generation to the next can be found from the Robertson-Price identity applied on the estimated random effect, Proof: From Lemma 1 follows thatā j,t =â j,t = 0, and Eq. (18) can thus be developed as In summary we thus have three methods which for a genetic relationship matrix A t = I N give identical results for the incremental changes in phenotypic traits, i) the multivariate breeder's equation (1), ii) the BLUP updating equation (18), and iii) the . Note, however, that the two BLUP methods can be used also for inbred populations with A t = I N , where the multivariate breeder's equation will fail.
Finally note that the derivation of Equation (17) made use of four of the five assumptions used in the derivation based on the Price equation, as presented in Ergon (2019). The assumption of jointly multivariate normal random effects (Assumption 4) was used only indirectly, in that we relied on a recognition of the righthand side of Eq. (1).

Errors in BLUP variance estimation
It is well known that BLUP consistently underestimates the variances of the random effects, and this has been seen as a serious problem for applications on wild populations, Hadfield (2010). However, it follows from Theorem 2 that the underestimation of var(â ij,t ) for all j is just what is needed in order to obtain the same results from the Robertson-Price identity (Theorem 2) and the multivariate breeder's equation. This is most easily seen in the univariate case, with G = g and R = r, where Eq. (1) and the Robertson-Price identity (19) give Here, we haveâ i,t = g g+r (a i,t + e i,t ), and thus var(â i,t ) = g g+r 2 (g + r) = g 2 g+r , while var(a i,t ) = g.
This can be generalized into where a ij,t and e ij,t are p×1 vectors of additive genetic and environmental effects for a given individual. We thus find the variance of a given vectorâ j,t as the variance of the mean centered row vector j in the matrix a 11,t + e 11,t a 21,t + e 21,t · · · a N 1,t + e N 1,t a 12,t + e 12,t a 22,t + e 22,t · · · a N 2,t + e N 2,t . . .
Note that we for limited population sizes must use mean centered row vectors in order to avoid conflicts with Lemma 1.

An alternative proof of Theorem 2
According to Lemma 1 we have thatā j,t =â j,t = 0 andŷ j,t =ȳ j,t . From Eq. (8), with Z = I p N , thus follows whereâ j,t = [â 1j,tâ2j,t · · ·â N j,t ] T and y j,t = [y 1j,t y 2j,t · · · y N j,t ] T . Using the mixed product prop where G + R = P. We thus find (26) which proves that the multivariate Roberson-Price identity applied onâ i,t gives the same result as the multivariate breeder's equation applied on y i,t .

Simulation results
In order to verify Theorems 1 and 2, a population as described in Section 2 with three phenotypic traits was used, with parameter values as given in Fig. 1, including G 12 = 0. The individual fitness function was rounded values of i.e., number of offspring as integer numbers from 0 to 10. The value of ω 2 is given in Fig. 1. Eq. (28) assumes that the individual fitness is maximized for y i1,t = y i2,t = y i2,t = 0. The evolution ofȳ 1,t ,ȳ 2,t andȳ 3,t from initial values different from zero towards zero, was simulated in MATLAB by use of three methods. New values in the random vectors a 1,t , a 2,t , a 3,t , e 1,t , e 2,t and e 3,t were at each new generation drawn from normal distributions defined by the values of G 11 , G 22 , G 33 , G 12 , G 13 , G 23 , r 1 , r 2 and r 3 , and these random inputs were used in all of the three simulations. First, the responses were found by use of the multivariate breeder's equation (1)  Wt cov(W i,t ,â i,t ) (squares). Panels a), b) and c) show results with a relationship matrix A t = I N , while panels d), e) and f ) show results for A t according to Eq. (29). Parameter values were G 11 = G 22 = G 33 = 0.5, G 12 = G 21 = 0.25, G 13 = G 31 = 0, G 23 = G 32 = 0, and r 1 = r 2 = r 3 = 0.25. The population size was N = 100, and the width of the fitness function (28) was ω 2 = 10.
Second, the simulations were repeated by use of the BLUP equation (8), with a genetic relationship matrix A t = I N , and the prediction equation (18) (Theorem 1), i.e.,ȳ 1,t+1 =ȳ 1,t + F tâ1,t ,ȳ 2,t+1 =ȳ 2,t + F tâ2,t andȳ 3,t+1 =ȳ 3,t + F tâ3,t , with fitness values as found in the first simulation. Third, the responses were also simulated by use of the Roberson-Price identity ∆ȳ t = 1 Wt cov(W i,t ,â i,t ) (Theorem 2), again with A t = I N and fitness values as found in the first simulation. The covariances in the first and third methods were computed as (N − 1)cov(W i,t , z i,t )/N , where cov(W i,t , z i,t ) is the appropriate MATLAB function. As shown in Ergon (2019), this follows from the defi-nition of the covariance function in the Price equation (2). As shown in Fig. 1, the results found by the three methods are identical with a population size N = 100, and this is the case for population sizes down to N = 2.
As a test, the simulations were repeated with a relationship matrix for a population with a high degree of inbreeding, as shown for N = 6 in Eq. (29), As shown in Fig. 1, this gave different results for the three methods. In this case new random effects at each new generation were found as a 1,t = M √ A t a 10,t , a 2,t = M √ A t a 20,t and a 3,t = M √ A t a 30,t , with a 10,t , a 20,t and a 30,t drawn from normal distributions with the given values of G 11 , G 22 , G 33 , G 12 , G 13 and G 23 .
Here, M √ A t is the matrix square root, i.e., A t = A t , and the new random effects were mean centered. This typically resulted in var(a 1,t ) = var(a 2,t ) = var(a 3,t ) = 0.35, and cov(a 1,t , a 2,t ) = 0.22, i.e., reduced as compared with the nominal values, which were 0.5 and 0.25, respectively. The main point is that the results from the three methods are not identical when A t = I N . As shown in Fig. 1, there are minor differences also between the two BLUP methods, but these differences are smaller when the population size is larger (see Section 4 for discussion).
The results in Fig. 1 were obtained with random inputs generated by the MATLAB function randn, which generates normally distributed data. If instead the function rand was used to generate random inputs that were uniformly distributed in the interval [−0.5, 0.5], the responses found by the three methods described were still identical when A t = I N . However, because the variances of the random inputs with use of the rand function were reduced by approximately a factor ten, the responses were approximately ten times slower. As a final test, uniformly distributed samples in the interval [0, 1] with variance 1 were added to a 1,t , resulting in clearly skewed data. With A t = I N , the responses found by the three methods described were still identical, although they were all wrong, because the effects of third and higher order statistical moments are ignored for non-normal data.
Sampling variances of the random effects were computed in two different ways, with results as shown in Table 1. Note that the results for the two methods are identical, and this is true for population sizes down to N = 2.

Summary and discussion
The multivariate breeder's equation of Lande (1979) and Lande and Arnold (1983), Eq. (1), was originally derived from the Price equation, Price (1970), Price (1972), assuming normally distributed additive genetic and environmental effects. Here, I show that the same equation can be derived from the BLUP equations in matrix form, originally developed by Henderson (1950) (although then in summation form), and use of a genetic relationship matrix A t = I N , where N is the var(â i1,t ) var(â i2,t ) var(â i3,t ) Nominal 0.5 0.5 0.  (2019), except for the normality assumption. Note, however, that the derivation in Section 2 relies on a development where the righthand side of Eq. (17) is recognized as the righthand side of Eq.
(1), and that the normality assumption therefore is used indirectly. It is interesting to note that Henderson's equations are considerably older than the Price equation, and that Eq. (17) thus conceivably could have been developed already in the 1950s. It would then have been natural to compare the univariate version of Eq. (17) with the univariate breeder's equation, Lush (1937), ∆ȳ t = h 2 S, where h 2 is the heritability, while S is the selection differential, and such a comparison would have shown that h 2 = g/(g + r), and S = cov(W i,t , y i,t )/W t . With knowledge of Eq. (17) and Theorem 2, it might then have been conjectured that Eq. (1) is the multivariate breeder's equation.
It should be noted that since applications of Eq. (1) on non-normal data may produce incorrect microevolutionary results, this must be the case also for the BLUP prediction methods in Eqs. (18) and (19). This problem was acknowledged by Lande and Arnold (1983), who in their Appendix proposed a possible correction for errors caused by skewness. The fundamental reason for such errors is that the multivariate breeder's equation involves only mean and variance values of the additive genetic and environmental effects, while influences from third and higher order statistical moments are ignored. See for example discussions in Bonamour et al. (2017) and Pick et al. (2022).
Other sources of errors in applications of the multivariate breeder's equation, are unknown or at least unmeasured traits that are correlated with the traits used in the analysis, Morrissey et al. (2010). Such errors will be the same with use of the two BLUP methods.
Note that the new derivation of the multivariate breeder's equation makes use of the mean values of the products F tâj,t , where F t is a diagonal matrix of normalized individual fitness values, whileâ j,t are predicted random effects according to Eq. (8). Although the variances ofâ j,t as such are not used, they have been computed in the simulations, and assuming a genetic relationship matrix A t = I N , they are exactly what is needed for the Robertson-Price identity to produce the incremental changes ∆ȳ j,t . The BLUP derivation of the multivariate breeder's equation thus clarifies why and to which extent BLUP underestimates the variances of the random effects. The theory is confirmed by the simulation results in Fig. 1, and the estimated variance results in Table 1.
The theory in Section 2 also shows how BLUP can be used to compute the incremental values ∆ȳ j,t in cases where A t = I N , and were thus the multivariate breeder's equation will fail. As shown in Fig. 1, the identical results with use of A t = I N , are no longer identical when A t = I N . Note that there are minor differences also between the two BLUP responses in panels d), e) and f). This is so because with A t = I N only the expectations E ā j,t = 0, such that we may haveā j,t =â j,t = 0. This may lead to extra terms in the ∆ȳ j,t = F tâj,t results, while it has nothing to say for ∆ȳ t = cov(W i,t ,ā i,t )/W t . These differences tend to be smaller for larger population sizes.
The theory in Section 2 assumes a population with several correlated phenotypic traits. The multivariate breeder's equation may, however, also be used in cases where parameters in reaction norm models are treated as traits in their own rights, Lande (2009), Ergon and Ergon (2017). Identical results can also in such cases be found from a BLUP model with a relationship matrix A t = I N , but the derivation is then somewhat more involved. This will be reported separately, as a BLUP extension to the microevolutionary parameter estimation algorithms in Ergon (2022a) and Ergon (2022b).