in-Integration of Interbull’s multiple across-country evaluation approach breeding values into the multiple-trait single-step random regression test-day genetic evaluation for yield traits of Australian red breeds

Interbull’s multiple across-country evaluaftion provides national breeding organizations with breeding values for internationally used bulls, which integrate performance data obtained in different breeding populations, environments, and production systems. However, breeding value-based selection decisions on domestic individuals born to foreign sires can only benefit from Interbull breeding values if they are integrated such that their information can contribute to the breeding values of all related domestic animals. For that purpose, several methods have been proposed which either model Interbull breeding values as prior information in a Bayesian approach, as additional pseudo data points, or as correlated traits, where these methods also differ in their software and parameterization requirements. Further, the complexity of integration also depends on the traits and genetic evaluation models. Especially random regression models require attention because of the dimensionality discrepancy between the number of Interbull breeding values and the number of modeled genetic effects. This paper presents the results from integrating 16,063 Interbull breeding values into the domestic single-step random regression test-day model for milk, fat, and protein yield for Australian Red dairy cattle breeds. Interbull breeding values were modeled as pseudo data points with data point-specific residual variances derived within animal across traits, ignoring relationships between integrated animals. Re-sults suggest that the integration was successful with regard to alignment of Interbull breeding values with their domestic equivalent as well as with regard to the individual and population-wide increase in reliabilities. Depending on the relationship structure between integration candidates, further work is required to account for those relationships in a computationally feasible manner. Other traits with separate parity effects nationally could use a similar approach, even if not modeled with a test-day model.


INTRODUCTION
The worldwide dairy cattle population is dominated by only a few breeds, and the regular global exchange of genetic material between local populations has been common for decades. Consequentially, a bilateral and multi-lateral evaluation system has been developed which combine raw records across countries (Edel et al., 2011;Lidauer et al., 2015). However, combined across-country genetic evaluation is still an exception. Consequentially, Interbull (Upsalla, Sweden) is conducting international genetic evaluation for 7 dairy cattle breeds using de-regressed domestic breeding values as pseudo phenotypes in a multiple across-country evaluation approach (MACE;Schaeffer 2001). This allows the integration of all available information about an animal across countries, environments, and production systems, and to compare breeding values at the international level. However, genomic information is not used in MACE, but it is used in genomic MACE for young Holstein bulls and intergenomics for Brown Swiss bulls.
Breeding value-based selection decision on domestic animals born to foreign sires can only benefit from Interbull's evaluation if the breeding values reported by Interbull are integrated into the national evaluation system. This is even more relevant for single-step genetic evaluation as the number of breeding values affected by the integration is expected to increase compared with genetic evaluation models based on pedigree information due to more nonzero relationships between animals. Several methods have been suggested for in-Integration of Interbull's multiple across-country evaluation approach breeding values into the multiple-trait single-step random regression test-day genetic evaluation for yield traits of Australian red breeds tegrating external breeding values into national genetic evaluation systems, which differ in their approach and consequentially in their requirement with regard to the used evaluation software and parameterization. Integrating external breeding values as prior information in a Bayesian approach Zhang 2001, 2006;Legarra et al., 2007;Vandenplas and Gengler 2012;Vandenplas et al., 2014;Guarini et al., 2019) requires manipulation of the right-hand side and a special covariance matrix between animals of the mixed model equation system. Integration as additional pseudo data (Bonaiti and Boichard 1995;Pedersen et al., 1999;VanRaden and Tooker 2012;VanRaden et al., 2014;Pitkänen et al., 2020) requires derivation of 1 or several additional data points and manipulation of the residual co-variance matrix. Treating external breeding values as extra traits (Vandenplas et al., 2015) requires additional co-variances and increases the dimension of the evaluation equations system. Further, the integration of production traits like milk, fat, and protein yield is complicated due to dimensionality discrepancies between the Interbull model and the domestic multiple-trait random regression test-day models. To our knowledge, no entire description of a method for integrating Interbull breeding values into a single-step random regression test-day model of a domestic evaluation is available in the scientific literature. Therefore, in this paper, we present results from the integration of Interbull breeding values for milk, fat, and protein yield into the domestic single-step random regression test-day model genetic evaluation for Australian Red Dairy breeds.

METHODS
No live human or animal subjects were used, so this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

Data
The data set for genetic evaluation of production traits of Australian Red dairy breeds consisted of testday observation on milk yield in liters, fat yield in kilograms, and protein yield in kilograms in the first 3 lactations with a total of 10,000,614 observations across all 9 traits (i.e., each lactation is a separate trait). The pedigree consisted of 73 phantom parents and 975,532 individuals, of which 8,191 were genotyped on various platforms. Genotypes underwent internal quality control and imputation procedures and, after being imputed to a common set of about 73,000 markers, were used to construct a genomic relationship matrix G using the observed allele frequencies for centering and the average observed allele frequency variance for scaling (VanRaden 2008).
The total number of bulls to be integrated was 16,063, where 466 bulls had Australian information provided to Interbull (A-bulls) and 15,597 bulls had no Australian information provided toInterbull (Bbulls), either because no information was available or the available information was not eligible due to Interbull reporting guidelines. Only 212 A-bulls and 116 B-bulls had genomic information in the Australian population.
Breeding values of A-bulls for integration were identified by a positive difference between the Interbull reliability and the local reliability of at least 0.01. If the difference was only achieved for a subset of A-bulls's breeding values (e.g., for milk yield, but not for fat and protein yield), for that bull only, the subset was integrated.
All breeding values of B-bulls were integrated regardless of the Interbull reliabilities.

Evaluation Model
The multiple-trait random regression test-day evaluation model, used by DataGene, into which the Interbull breeding values were integrated, was  ,., , , ,., ( ( ) . ,., where subscript t = [1,2,3] indexes traits milk (1), fat (2), and protein (3) within lactation, indexed by subscript l = [1,2,3], f i is a matrix valued function applying a normalized Legendre polynomial (Kirkpatrick et al. 1990) order i to the argument, y l,t is a vector of phenotypic observations of length n l ; X l,t , l l , Z l , and W l are within-lactation trait (l,t) or within-lactation acrosstrait (l) design matrices linking respective effects to observations, where l l is a vector of ones, K l is a diagonal matrix within lactation l, I k k , , , ∈ ( ) 3 9 15 is an identity matrix of subscripted dimension, b l,t is a vector of fixed classification effects, c l t f i , , is a scalar fixed lactation curve effect of normalized Legendre polynomial order i on trait t in lactation l; u f l t i , , and q f l t i , , are vectors of random direct genetic and random individual permanent environmental effects of normalized Legendre polynomial order i on trait t in lactation l, and e l,t is the respective residual. For the sake of a simplified description, the above notation assumes that all traits within lactation are equally recorded.
Columns in X l were constructed from herd test day, year-season, and age class at calving, and diagonal elements in K l were DIM.
The model also had vector q where Σ q is co-variance matrix of the permanent environmental effects of dimension 27 × 27 and Σ a is the genetic covariance matrix of dimension 27 × 27, and of which the inverse is where Θ is a diagonal matrix with elements equal to 0.62, implying that genetic groups are fitted as random with a variance of 0.62Σ a , Q is the pedigree-derived genetic group regression matrix, and H is the singlestep relationship matrix constructed using a weighted genomic relationship matrix (G w = 0.8 × G + 0.2 × A 22 ; Christensen and Lund 2010).

Integration Methodology
Interbull breeding values were modeled as additional data points with the data point specific residual variance as the single tuning parameter.
The integration approach must account for the fact that that the model provides 27 factor level effects (random regression coefficients) at the polynomial multi-trait level, whereas the information sent to and received from Interbull provides cumulative breeding values within trait and subsequently averaged across lactations. That is, although the domestic evaluation model provides 27 factor level effects (random regression coefficients) and 9 reliabilities (1 reliability per trait) per animal, only 3 breeding values and reliabilities per animal (lactation averages within milk, fat and protein) are sent to and received from Interbull. Note that the number of reliabilities results from the used software and approximation algorithm.
For aligning within-animal dimensions, Interbull breeding values u MACE and reliabilities r MACE and sent breeding values u SENT and reliabilities r SENT were mapped from R 3 to R 9 using a block-diagonal matrix K of dimension 9 × 3 containing 3 × 1 column blocks of ones. Further, Σ a was mapped from R 27 to R 9 using a 9 × 27 block-diagonal matrix L t I = ⊗ 9 with , where I 9 is an identity matrix of dimensions 9 × 9, t is a vector of length 3, d is a vector containing a consecutive sequence from 6 to 305, f i is a vector-valued function applying the Legendre polynomial order i to the argument, and l is a vector of ones. Therefore, u :m = Ku : and r :,m = Kr : , where ":" represents either MACE or SENT and Σ a,m = LΣ a L′, where Σ a,m is of dimension 9 × 9. In case only a subset of breeding values within animal met the integration requirement, the above dimensions were adjusted accordingly.
Within animal, a vector of prediction error variances was calculated by p diag a,m : : , 1 r m where diag() denotes a vector of diagonal elements of a squared matrix. The system for deriving a diagonal matrix D : of inverse residual variances across traits within animal was diag D p a,m : where D : was obtained directly only if the dimension of the system was 1. Otherwise, D : was established iteratively (Vandenplas and Gengler 2012;Barwick et al. 2013). Within-animal vectors of pseudo-phenotypes ("deregressed proofs") y : * were calculated for A-bulls as [5] (Pitkänen et al., 2020), and for B-bulls as A maximum of 9 pseudo-phenotypes y * of animal i were added to the evaluation using the following model: , , with e where d is a vector containing a sequence from 6 to 305, D i −1 for animal i is either D r −1 for A-bulls or D MACE −1 for B-bulls. In other words, D i −1 is the residual co-variance matrix of the respective pseudo-phenotypes derived as a function of the prediction error variance of the respective domestic and Interbull breeding values. Note that [b l,t ,.,b 3,3 ] are general means modeled within trait t in lactation l and bull group b.

Evaluating Results
For evaluating results, genetic effects were mapped from R 27 to R 9 and subsequently averaged across lactations within trait. The reliability approximation software provided reliabilities only in R 9 , which were averaged across lactations within trait. Single-step reliabilities were approximated with a combination of the method of Jamrozik et al. (2000) and Misztal et al. [2013; I. Strandén, Natural Resources Institute (LUKE), Jokioinen, Finland; personal communication].
For A-bulls, results were evaluated by regressing breeding values and reliabilities received from Interbull on breeding values and reliabilities from pre-and post-integration evaluation, and by Pearson correlations between sets. Because only breeding values with a positive reliability difference were integrated, it was expected that a successful integration produces breeding values and reliabilities which are highly correlated with those supplied by Interbull.
For B-bulls, the same evaluation parameters were calculated, but post-integration breeding values were expected to be a weighted average of domestic and Interbull information and post-integration reliabilities to be greater than the domestic or Interbull reliabilities. Therefore, Pearson correlations between post-integra-tion breeding values and reliabilities and their Interbull equivalents were expected to be lower than that of Abulls.
For the entire evaluation population, it was expected that post-integration reliabilities are either equal to their pre-integration values or higher, where the magnitude is a function of the relation of the respective animals to the integrated bulls.

Software
Data preparation and derivation of D : was done using the R computing environment (version 4; R Development Core Team, 2011). The genetic evaluation model was solved and reliabilities were deterministically calculated using MiX99 and Apax, respectively (Lidauer et al. 2017). Figures  1, 2, 3, and 4. Result trends for fat and milk yield were very similar and are, therefore, not separately shown. For A-bulls, correlations between Interbull breeding values and domestic evaluation breeding values increased from 0.94, 0.93, and 0.94 pre-integration to 0.98, 0.98, and 0.99 post-integration for milk, fat, and protein yield, respectively. The regression intercept changed from −328, −9.16, and −8.94 to −334, −9.72, and −9.08, and the regression slope from 1.03, 1.08, and 1.1 to 0.974, 1.01, and 0.998 for milk, fat, and protein yield, respectively. For B-bulls, correlations between Interbull breeding values and domestic evaluation breeding values increased from ≤0.01 pre-integration to ≥0.98 for all the 3 traits.

Results for protein yield are summarized in
Results were similar for the reliabilities, but correlations and regression parameters, due to the nature of reliabilities, did not provide the same goodness-offit information as for the breeding values. However, post-integration reliabilities of all bulls were at least as high as their Interbull reliabilities. For A-bulls, the post-integration reliabilities aligned very well with the Interbull reliabilities. Further, for B-bulls with breeding values with pre-integration reliabilities of zero, the procedure returned post-integration reliabilities similar to the Interbull reliabilities.
For both A-bulls and B-bulls, the results were not affected by the genotyping status of the integration animals.
For the entire evaluation population, excluding the integrated bulls, about 40% of the reliabilities of all 3 traits were not affected by the integration, and only 5% of the reliabilities increased by more than 0.1 (see Fig-ure 3). Contrarily, every single breeding value changed as a result of the integration with 27, 29, and 29% of the breeding values for milk, fat, and protein yield, respectively, having changed by more than 10% when compared with the pre-integration breeding values.
However, for all 3 traits, only about 15% of the breeding values changed by more than 25% when compared with their pre-integration value. The relatively small number of animals with substantial changes in breeding values is also reflected by the re-ranking of animals (see Figure 4). Although the pre-and post-integration rank was equal for only about 100 animals, only for 4% of all animals the difference between the pre-and post-integration rank was greater than 100,000, resulting in a Spearman's rank correlation between pre-and post-integration ranks of 0.99.

DISCUSSION
Results shown in this study align very well with the formulated expectations and, therefore, the integration of Interbull breeding values into the domestic singlestep random regression test-day evaluation is regarded as successful. The nonzero intercept when regressing Interbull breeding values on domestic breeding values, pre-as well as post-integration, is a result of the average superiority of the integrated bulls within the Australian population as all base population differences were accounted for by fitting bull group specific fixed effects. The finding that all breeding values were affected by the integration, but more than 50% of all reliabilities remained unaffected, is a feature of the reliability approximation (Jamrozik et al., 2000). Although the added data points changed the information content for all modeled factor levels, the accuracy approximation can only account for information changes within groups of closely related animals (e.g., full-sib and halfsib families.) Integration of Interbull breeding values as pseudo data points was considered the only options as, to our knowledge, the used software did not allow for equation system manipulation required for a Bayesian approach. Further, integration of Interbull breeding values as additional traits was not considered given the anticipated effects on solver runtime and equation system dimensions, and requirements for pipeline manipulation and variance component estimation. Boerner et al.: INTERBULL BREEDING VALUES IN RANDOM REGRESSION TEST MODEL Figure 3. Histograms of differences between pre-and post-integration reliabilities (left) and breeding values (right) for protein yield in kilograms of the entire evaluation population without the integrated bulls. Min = minimum change in reliability or breeding value; max = maximum change in reliability or breeding value; no change = proportion of animals without any change in reliabilities; >1% change, >5% change, >10% change, and >25% change = proportion of animals with more than 1, 5, 10, and 25% change in reliability or breeding value, respectively. Proportional changes were calculated using the pre-integration results as a reference. When deriving matrix D : , the fixed effect structure associated with the pseudo-data points was not accounted for. This is not problematic as long as the number of bulls within a fixed effect level is sufficiently large to ensure that the detrimental effect of fixed effects on reliabilities is limited. The current integration approach ignores the relationships between integrated bulls when deriving individual-specific residual variances. Therefore, the methodology used is the same regardless of the target equation system using genomic information or not. Empirically, ignoring the relationships among integration animals does not pose a problem if the reliabilities are high and integration candidates are not closely related, where the latter condition may be unrealistic given the level of global genetic exchange between local dairy populations.
Accounting for relationships among integration animals poses a computational problem. For the approach presented, the procedure for deriving D : of system diag D p a : : ,: : fore, any results regarding convergence are data set and implementation specific. For our data set and approach, where Σ a = Σ a,m was a derivative of the genetic co-variance matrix of a random regression testday model and Γ :,: −1 was 1, we found that updating of D : , as proposed by Vandenplas and Gengler (2012) and Calus et al. (2016), was too coarse and the algorithm never converged. Instead, we updated D : with empirically determined small values (Barwick et al. 2013), which required up to tens of thousands of inversions to achieve convergence. Therefore, extending the approach to a system across traits and animals will certainly affect inversion time and, for a given data set, the required number of iterations as the spectral radius of Σ Γ a − − ⊗ 1 1 :,: will increase if Γ :,: −1 deviates from I.
As an example, for the data set presented deriving D : jointly for the entire integration population would have required inverting a system of rank 144,000 probably several hundred thousand times. An intermediate approach is to approximate some matrix J C C C C a a a a a a a a = − ( ) where C is the mixed model coefficient matrix and a i is a vector indexing the block in C related to animal i, which is similar to approximating prediction error variances when calculating reliabilities. Subsequently, J is used instead of Σ Depending on the actual implementation, the approach either ignores that all other bulls have fractional observations when deriving J for bull i, or it must be run iteratively until convergence. Further, it might be computationally prohibitive to apply such procedure to a single-step system. In any case, computation time may be crucial as imputing data points is done within the genetic evaluation pipeline before the actual solving step, and the frequency of evaluation and available hardware resources may set a tight time frame. The presented approach assumes that genetic correlations between Interbull and domestic breeding values is 1.0, where for B-bulls a deviation from 1.0 can be accounted for by using conditional expectations and associated reliabilities instead of Interbull breeding values and reliabilities. However, for A-bulls, the approach may not be straight forward as an imperfect correlation is a function of perfect and imperfect correlations between the domestic and international parts of the Interbull breeding value and its domestic equivalent, respectively.

CONCLUSIONS
The outlined methodology and presented results demonstrate a possible approach for integrating Interbull breeding values into the domestic single-step random regression test-day model where the pseudo data approach provides good alignment between Interbull and post-integration breeding values and is applicable to software which does not allow for manipulating the right-hand side or a special co-variance structure between individuals of the mixed model equation system. Further work may be required to allow for co-variance between integration candidates.