The objectives of this study were to evaluate the feasibility of use of the test-day (TD) single-step genomic BLUP (ssGBLUP) using phenotypic records of Nordic Red Dairy cows. The critical point in ssGBLUP is how genomically derived relationships (G) are integrated with population-based pedigree relationships (A) into a combined relationship matrix (H). Therefore, we also tested how different weights for genomic and pedigree relationships affect ssGBLUP, validation reliability, and validation regression coefficients. Deregressed proofs for 305-d milk, protein, and fat yields were used for a posteriori validation. The results showed that the use of phenotypic TD records in ssGBLUP is feasible. Moreover, the TD ssGBLUP model gave considerably higher validation reliabilities and validation regression coefficients than the TD model without genomic information. No significant differences were found in validation reliability between the different TD ssGBLUP models according to bootstrap confidence intervals. However, the degree of inflation in genomic enhanced breeding values is affected by the method used in construction of the H matrix. The results showed that ssGBLUP provides a good alternative to the currently used multi-step approach but there is a great need to find the best option to combine pedigree and genomic information in the genomic matrix.
Most genomic evaluations are based on multi-step approach that requires (1) calculation of traditional EBV without genomic information; (2) extraction of pseudo-observations, typically either daughter yield deviations (DYD) or deregressed EBV (deregressed proofs; DRP); and (3) genomic model for prediction of direct genomic values (DGV;
) to yield genomic enhanced breeding values (GEBV).
The multi-step approach to calculate GEBV has inherent problems. First, the parent averages (PA) of progeny of genomically selected animals do not automatically include genomic information. Second, when animals are selected by their GEBV, the future estimation of unbiased EBV becomes difficult because genomic information is not taken into account in the traditionally calculated EBV. Moreover, genomic selection using the multi-step approach is complex and includes several approximations, all of which reduce accuracy and can inflate the resultant GEBV. None of these issues applies to the single-step approach.
Single-step evaluation (single-step genomic BLUP; ssGBLUP) is a unified approach to calculate GEBV. The ssGBLUP combines phenotypic records, pedigree information, and genomic information optimally in calculation of GEBV (
). The approach integrates the pedigree relationship matrix A and genomic relationship matrix G into a single H matrix, which replaces the traditional relationship matrix A in the mixed-model equations (
). However, ssGBLUP has been successfully applied, for example, for final scores of over 6 million Holsteins with greater accuracy than that of a multi-step procedure (
used ssGBLUP to analyze litter size in pigs. Thus, despite its high computational requirements, the single-step method is suitable for large multi-trait analyses. The critical issue with ssGBLUP is compatibility between the marker-based relationship matrix and the pedigree-based relationship matrix for genotyped animals. Applications of the first unified approaches for merging information from animals with or without genotypes by combining the A matrix with the G matrix resulted in biased GEBV (e.g.,
). Since then, it has been demonstrated that accuracy of prediction can be improved and bias reduced by adjusting the G matrix toward their expected values in the A matrix to decrease the scaling problem (e.g.,
) in Nordic Red Dairy Cattle (RDC). As more selection decisions are made using genomic information, it is becoming essential that all genomic information is included in national evaluations. The objectives of this study were to evaluate the feasibility of the large random regression TD ssGBLUP, and to estimate the accuracy of GEBV when using this model. We also tested how different combinations of the A and G matrices affect the bias and accuracy of GEBV in the TD ssGBLUP.
Materials and Methods
All analyses used the data used in the official Nordic RDC milk production evaluations. The multiple-trait milk production evaluation includes TD records for milk, fat, and protein production. Production records from the first 3 lactations are in the same multiple-trait model. Each trait has random regression function for random genetic and permanent environmental effects. For more information, see
The routine full evaluation data from May 2014 for the RDC were obtained from the Nordic Cattle Genetic Evaluation (NAV; Aarhus, Denmark). For production traits, the TD data included 3.8 million cows with a total of 85 million records and 5.1 million animals in the Nordic RDC pedigree. To be able to validate the model, a reduced data set was extracted from the full data set, as follows: the last 4 yr of observations were removed and the reduced data included 2.7 million cows with 72 million records. The reduced data set was used to solve GEBV and EBV for all animals in the pedigree, and the full data set was used to solve current EBV for testing purpose. The initial EBV from the reduced data set were denoted EBVr. For the females without observations and bulls without daughters in reduced data, EBVr are hereafter referred to as parent average (PA). Comparing initial predictions from the reduced data set with those from the full data set allowed estimation of validation accuracy (
). The total number of equations in the reduced run was 217,370,251, and in the full run 238,041,030.
The unified relationship matrix H in single-step evaluations defines the relationships among genotyped and nongenotyped animals. Although H can be expensive to compute, its inverse has a simple structure (
where A22 is the sub-matrix of the pedigree-based numerator relationship matrix A for the genotyped animals, and G is the relationship matrix constructed using genomic information. The G matrix had 15,148 genotyped RDC animals, of which 5,534 were bulls and 9,529 cows. The G matrix also included genotypes of animals without offspring or records. Genotypes were obtained from the Illumina Bovine SNP50 Bead Chip (Illumina, San Diego, CA). After application of exclusion criteria, 46,914 SNP markers on the 29 bovine autosomes were available for further analysis. The genotype file was the same as was used in official genomic evaluation of Nordic Cattle Genetic Evaluation in June 2014. Genotypes were used to form the raw G matrix with method 1 in
). Before the matrices G and A22 were combined, the raw G matrix was scaled by the scalar $\mathrm{tr}(A_{22})/\mathrm{tr}(G)$, where tr is the trace of a matrix. Thus, G has, on average, the same diagonals as the A22 matrix.
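The construction and scaling of G can be illustrated with a minimal NumPy sketch. It assumes that "method 1" is the usual allele-frequency-centred form ZZ′/(2Σp(1 − p)) and that allele frequencies are taken from the observed genotypes; the function names, the toy genotypes, and the toy A22 block are hypothetical and are not from the study.

```python
import numpy as np

def vanraden_g(genotypes):
    """Raw genomic relationship matrix from 0/1/2 allele counts (animals x SNP).

    Assumption: allele frequencies are estimated from the observed genotypes.
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                  # allele frequency per SNP
    z = m - 2.0 * p                           # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def scale_to_pedigree(g_raw, a22):
    """Scale G by tr(A22)/tr(G) so both matrices have the same average diagonal."""
    return g_raw * (np.trace(a22) / np.trace(g_raw))

# Toy example: 5 animals, 200 SNP, and a hypothetical pedigree block A22.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(5, 200))
a22 = np.eye(5) * 1.02
g = scale_to_pedigree(vanraden_g(genotypes), a22)
```

After scaling, the traces of G and A22 agree, which is what equal average diagonals means here.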
When the mixed-model equations for the single-step method are considered, the difference from the normal animal model is the matrix block $H^{22} = A^{22} + G^{-1} - A_{22}^{-1}$ between genotyped animals, where superscripts denote blocks of the inverse matrices. To improve the properties of the ssGBLUP, different weights in building the $H^{22}$ matrix were tested.
noted that if not all genetic variance is accounted for by the SNP effects, the residual polygenic effect can be included in the model by changing the genomic matrix G and using $G_w$, where $G_w = (1 - w)G + wA_{22}$, and the constant w represents the proportion of polygenic variance not described by markers. Thus, the smaller w is, the more genetic variance is attributed to genomic markers. We used 3 different proportions w (w = 0.10, w = 0.15, or w = 0.20) in Gw. In
suggested that optimal weights τ for $G^{-1}$ and ω for $A_{22}^{-1}$ decrease the possible inflation of GEBV estimated by ssGBLUP. The parameters τ and ω scale the size of the genomic and pedigree relationships, respectively. The larger τ is, the less weight is given to G, whereas larger values of ω decrease the importance of pedigree relationships and increase the importance of genomic relationships. We tested 4 combinations of these parameters. The first was the combination found best in
: τ = 1.5 and ω = 0.6. In the other setups, the weights were selected to be τ = 1.6 and ω = 0.5, τ = 1.6 and ω = 1.0, or τ = 1.0 and ω = 0.5. In the tests with different τ and ω, the proportion of polygenic variance was fixed to w = 0.10 in Gw. In the following, these different methods are referred to as follows: w20, w15, w10, τ1.6ω1.0, τ1.0ω0.5, τ1.6ω0.5, and τ1.5ω0.6. Note that the method using w = 0.10 corresponds to the situation where τ = 1.0 and ω = 1.0; however, we refer to this method as w10, instead of τ1.0ω1.0.
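The seven weighting schemes can be sketched as follows. The sketch assumes that the block added to the pedigree part of the inverse is τGw⁻¹ − ωA22⁻¹, which matches the description of τ and ω as weights for the two inverses; the function name, dictionary keys, and the 2 × 2 toy matrices are hypothetical.

```python
import numpy as np

def h22_correction(g, a22, w=0.10, tau=1.0, omega=1.0):
    """Return tau * inv(Gw) - omega * inv(A22), with Gw = (1 - w) G + w A22."""
    g_w = (1.0 - w) * g + w * a22             # blend in the polygenic part
    return tau * np.linalg.inv(g_w) - omega * np.linalg.inv(a22)

# The seven methods named in the text (tau and omega default to 1.0).
METHODS = {
    "w20": dict(w=0.20),
    "w15": dict(w=0.15),
    "w10": dict(w=0.10),
    "tau1.6_omega1.0": dict(w=0.10, tau=1.6, omega=1.0),
    "tau1.0_omega0.5": dict(w=0.10, tau=1.0, omega=0.5),
    "tau1.6_omega0.5": dict(w=0.10, tau=1.6, omega=0.5),
    "tau1.5_omega0.6": dict(w=0.10, tau=1.5, omega=0.6),
}

# Hypothetical relationships among two genotyped animals.
a22 = np.array([[1.0, 0.5], [0.5, 1.0]])
g = np.array([[1.0, 0.6], [0.6, 1.0]])
blocks = {name: h22_correction(g, a22, **kw) for name, kw in METHODS.items()}
```

When G equals A22 and τ = ω = 1, the block is zero and the model reduces to the pedigree-only animal model.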
The analyses used the NAV routine EBV milk production evaluation model, in which a multiple-trait TD model is used to estimate EBV of milk, fat, and protein simultaneously. The GEBV were obtained from the TD ssGBLUP. Unknown parents by genetic groups and inbreeding coefficients were taken into account in the model. The mixed model equations were solved by MiX99 software using iteration on data and preconditioned conjugate gradients (PCG) iteration (
). Computations for EBV and GEBV were very similar. Equations to solve EBV used A−1, the inverse of the pedigree relationship matrix, which was replaced by alternative H−1 matrices in the single-step method for GEBV. In the PCG algorithm, the iteration involves multiplication of the search direction vector v by the MME coefficient matrix. The implementation of the single-step method in MiX99 splits the required matrix multiplications into several steps. The first step is the multiplication of the least squares part of the coefficient matrix and the second is the product H−1v. This is further divided into 2 steps. First, the product A−1v is computed directly by reading the pedigree file, as is done in the traditional EBV calculation. Second, the product $H^{22}v$ is calculated by reading $[H^{22} - A^{22}]$ from a separate file during each PCG iteration cycle. Thus, the only additional work for the single-step method in solving the MME is the matrix times vector product $H^{22}v$ in each PCG iteration. However, because convergence differs among models, extra iterations may be required for the single-step method.
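The split matrix-vector product can be illustrated with a small dense example. This is only a sketch of the idea and not the MiX99 implementation: A−1 is formed explicitly instead of being read from a pedigree file, the least-squares part is a simple diagonal, τ = ω = 1, and the names, the variance ratio, and the toy data are all hypothetical.

```python
import numpy as np

# Toy pedigree: animals 0 and 1 are founders; 2 and 3 are their full-sib offspring.
a = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
a_inv = np.linalg.inv(a)
geno = np.array([2, 3])                       # indices of genotyped animals
a22 = a[np.ix_(geno, geno)]
g = np.array([[1.0, 0.6], [0.6, 1.0]])        # hypothetical genomic relationships
d = np.linalg.inv(g) - np.linalg.inv(a22)     # extra block, built once beforehand

lam = 1.0                                     # hypothetical variance ratio
ls_diag = np.ones(4)                          # least-squares part (one record each)
b = np.array([1.0, 2.0, 3.0, 4.0])            # right-hand side

def matvec(v):
    """Coefficient matrix times v, computed in the three steps from the text."""
    out = ls_diag * v                         # step 1: least-squares part
    out += lam * (a_inv @ v)                  # step 2a: pedigree part, A^-1 v
    out[geno] += lam * (d @ v[geno])          # step 2b: genotyped-animal block
    return out

def pcg(matvec, b, diag, tol=1e-12, max_iter=200):
    """Preconditioned conjugate gradients with a diagonal preconditioner."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = r / diag
    v = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        cv = matvec(v)
        alpha = rz / (v @ cv)
        x += alpha * v
        r -= alpha * cv
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = r / diag
        rz_new = r @ z
        v = z + (rz_new / rz) * v
        rz = rz_new
    return x, it

diag = ls_diag + lam * np.diag(a_inv)
diag[geno] += lam * np.diag(d)
solution, n_iter = pcg(matvec, b, diag)
```

Only the last line of `matvec` is specific to the single-step model, which mirrors the statement that the extra work per iteration is one additional matrix-vector product.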
All the analyses of the results used the official 305-d lactation total yields of milk, protein, and fat (Lidauer et al., 2014). For validation of GEBV, the EBV for the lactation totals were obtained from the full data analysis. These were then deregressed for bull GEBV validation and cow GEBV validation using the Secant method in option DeRegress (
) for all animals in the pedigree when full data was used. Variance parameters in ERC approximation were from the average daily TD model, and the same values (h2milk = 0.48, h2protein = 0.48, and h2fat = 0.49) were used throughout the study. Deregressions used the full pedigree in NAV evaluation and EBV for the bulls and cows from the full evaluation model. The 3 traits were deregressed simultaneously but assuming genetic and residual correlations to be zero. We chose to use DRP also for the bulls in the validation calculations instead of DYD to have directly comparable results with sire model GBLUP studies, including those in
Bulls born between years 2006 and 2009 and having their EBV based on ERC ≥3.0 in the full data but having PA with ERC = 0.0 in the reduced data were defined as candidate bulls. For a bull, the phenotypic information obtained with ERC = 3.0 corresponds roughly to 20 daughters. For the cow validation, genotyped cows with no TD records in the reduced data and a minimum of 5 TD records in the full data were considered candidate cows. Finally, in the validation test, we had 707 candidate bulls and 7,113 candidate cows. Validation reliability of predictions was assessed using the Interbull validation protocol (
$$y = b_0 + b_1\hat{a} + e,$$
where y has the DRP of the candidate bulls or cows in the full data, b0 and b1 are unknown regression coefficients, â has the genomic prediction for bulls or cows based on the reduced data analysis (GEBV), and e is the residual error. The validation reliability of the model was obtained from the coefficient of determination (R2) of the model (R2model), after correcting it by the average reliability of DRP of the candidate bulls or cows; that is, $R^2_{validation} = R^2_{model}/\bar{r}^2_{DRP}$. The reliabilities of DRP were calculated as $r^2_{DRP} = \mathrm{ERC}/(\mathrm{ERC} + \lambda)$, where λ = (1 − h2)/h2. To estimate the further gain from the genomic information over the traditional PA (
), the same validation tests were also applied to PA. Confidence intervals (CI) were estimated for the regression coefficients (b1), the validation reliabilities, and the differences between the validation reliabilities among the model alternatives (e.g., R2validation,w20 − R2validation,w15) using nonparametric bootstrap. The boot and boot.ci functions of the R package (
R Core Development Team, 2012
) were used to calculate the bootstrap CI separately for candidate bulls and cows. The number of bootstrap samples was 10,000. Bootstrap confidence intervals were calculated using 3 methods: basic, norm, and perc. Confidence intervals by the basic method are given, because all methods gave approximately the same values.
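The validation statistics and their basic bootstrap intervals can be sketched in NumPy. This is not the R boot code used in the study: it assumes an unweighted regression, assumes that the correction of R2model is a division by the average DRP reliability (passed in as a number), and uses simulated data and 1,000 resamples to keep the example quick. All names and values are hypothetical.

```python
import numpy as np

def validation_stats(drp, pred, mean_rel_drp):
    """Slope b1 of DRP on prediction and R2_model divided by mean DRP reliability."""
    b1, _b0 = np.polyfit(pred, drp, 1)        # polyfit returns slope first
    r2_model = np.corrcoef(pred, drp)[0, 1] ** 2
    return b1, r2_model / mean_rel_drp

def basic_bootstrap_ci(drp, pred, mean_rel_drp, n_boot=1000, level=0.95, seed=0):
    """Nonparametric bootstrap over animals with 'basic' confidence limits."""
    rng = np.random.default_rng(seed)
    n = len(drp)
    est = np.array(validation_stats(drp, pred, mean_rel_drp))
    boots = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample animals with replacement
        boots[i] = validation_stats(drp[idx], pred[idx], mean_rel_drp)
    q_lo, q_hi = np.quantile(boots, [(1 - level) / 2, (1 + level) / 2], axis=0)
    # Basic interval: reflect the bootstrap quantiles around the estimate.
    return est, 2 * est - q_hi, 2 * est - q_lo

# Simulated candidates (707, as for the bulls) with a true slope of 0.9.
rng = np.random.default_rng(42)
pred = rng.normal(size=707)
drp = 0.9 * pred + rng.normal(size=707)
est, lower, upper = basic_bootstrap_ci(drp, pred, mean_rel_drp=0.8)
```

The first element of each returned array refers to b1 and the second to the validation reliability.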
Results and Discussion
The number of iterations for EBV estimation was 3,818, that for EBVr was 3,705, and that for the ssGBLUP varied from 3,545 to 4,990 depending on the method used. The models were run to the same level of convergence, measured by Ca, the relative difference between the left- and right-hand sides of the part of the MME that includes the equations of the additive genetic animal effects. All models took about 35 to 54 h to run with 4 Intel Xeon 3.6 GHz processors. There was an approximately 15% difference in iteration time: 33 s and, on average, 38 s per iteration round for EBV and ssGBLUP, respectively. Although the increase in computing time was mainly due to extra iterations, some differences were apparent between methods. According to the time used per iteration round, w20 and w15 had the best convergence among ssGBLUP methods. Computationally, the ssGBLUP added very little extra computing time to solving the mixed model equations by the PCG method. The only significant extra computations in the single-step method were due to the construction of the H22 matrix, which was done once before applying the PCG method. Presumably, the inclusion of genomic data affects convergence because the variance structure of genotyped animals in the G matrix is less diagonally dominant than with the pedigree-based relationship matrix A only. It has been shown that slight changes in scaling of G and A22 can affect convergence without a negative effect on validation accuracy (
). Generally, ssGBLUP with w15 or τ1.0ω0.5 needed the fewest iteration rounds to achieve convergence. Also, ω = 0.5 seemed to give better convergence than ω = 0.6 or ω = 1.0. This is in agreement with
, who noted that larger ω causes H to be less positive definite, leading to slower convergence or even divergence when mixed model equations are solved by the iterative method, whereas smaller ω tends to give better convergence.
We also observed poor convergence with some Gw matrices.
presented a method to scale and adjust the genomic relationship matrix. The proposed method calculates the adjusted genomic relationship matrix G* = aGw + b11′, where 1 is a vector of ones and the constants a and b are such that in G*, the average of diagonals and the average of off-diagonals equal those in the pedigree-based relationship matrix. We tested G* in ssGBLUP using our data, but did not get solutions for further examination because the PCG algorithm showed poor convergence and failed to converge within 5,000 iterations. The adjustment constants for our data set were a = 0.968 and b = 0.0323; that is, the Gw matrix was scaled down by about 3%, after which all values were increased by a constant b. When the diagonal values of Gw are <1, the G* matrix is diagonally less dominant than Gw, which means increased correlations between animals and possibly poorer convergence of the PCG algorithm. There are other ways to correct genetic differences among genotyped and nongenotyped individuals (
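The constants a and b in G* = aGw + b11′ follow from two linear conditions, one for the average diagonal and one for the average off-diagonal. A minimal sketch with hypothetical function names and toy 3 × 3 matrices (the values a = 0.968 and b = 0.0323 reported above are not reproduced here):

```python
import numpy as np

def mean_diag_offdiag(m):
    """Average diagonal and average off-diagonal element of a square matrix."""
    n = m.shape[0]
    diag_mean = np.trace(m) / n
    off_mean = (m.sum() - np.trace(m)) / (n * (n - 1))
    return diag_mean, off_mean

def adjust_g(g_w, a22):
    """Return G* = a * Gw + b * 11' matching the two averages of A22."""
    dg, og = mean_diag_offdiag(g_w)
    da, oa = mean_diag_offdiag(a22)
    a = (da - oa) / (dg - og)                 # matches the diagonal/off-diagonal gap
    b = da - a * dg                           # then shifts every element equally
    return a * g_w + b, a, b                  # adding scalar b equals adding b * 11'

# Hypothetical matrices for three genotyped animals.
a22 = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.25],
                [0.25, 0.25, 1.00]])
g_w = np.array([[1.05, 0.45, 0.20],
                [0.45, 0.98, 0.30],
                [0.20, 0.30, 1.02]])
g_star, a_const, b_const = adjust_g(g_w, a22)
```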
Table 1 presents standard deviations of EBV and GEBV for reference and candidate bulls and cows in the full and reduced data sets. Standard deviations were derived within birth year and pooled across years to account for selection. For the reference animals, standard deviations of the GEBV closely followed standard deviations of the standardized EBV, although overall, some variation existed in the standard deviations of bulls and cows among different methods in building the H22 matrix. By using different w, standard deviations of GEBV were close to those of EBV, whereas when using different τ and ω, standard deviations of GEBV were somewhat lower than those of EBV. The reference bulls have a lot of progeny information; for them, changes in w did not affect GEBV much. In contrast, changes in τ and ω tended to have larger effects on reference bull standard deviations. In candidate bulls, standard deviations of GEBV increased with decreasing w; that is, increased genomic information increased standard deviations of GEBV. Overall, in the candidate animals, the standard deviations of GEBV were higher than those of PA due to added information from genotypes. Standard deviations of the GEBV varied also with different estimation methods. In contrast to reference animals, in candidate bulls and cows, the use of different w gave higher standard deviations than the use of τ and ω in building the H22 matrix. In conclusion, changing τ and ω affected the standard deviations of both candidate and reference animals, whereas changes in w affected mainly candidate animals.
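Pooling standard deviations within birth year can be sketched as below. The sketch assumes the usual pooled estimator, in which within-year sums of squares are added and divided by the summed degrees of freedom; the exact estimator used in the study is not stated, and the function name and toy values are hypothetical.

```python
import numpy as np

def pooled_sd(values, birth_years):
    """Within-birth-year standard deviation pooled across years."""
    values = np.asarray(values, dtype=float)
    birth_years = np.asarray(birth_years)
    ss, df = 0.0, 0
    for year in np.unique(birth_years):
        v = values[birth_years == year]
        if v.size > 1:                        # a single animal carries no spread
            ss += np.sum((v - v.mean()) ** 2)
            df += v.size - 1
    return float(np.sqrt(ss / df))

# Two birth years with different means: the trend does not inflate the pooled SD.
example = pooled_sd([1.0, 3.0, 10.0, 14.0], [2006, 2006, 2007, 2007])
```

Because each year is centred on its own mean, a genetic trend across birth years does not add to the pooled value.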
Table 1. Standard deviations of EBV in the full data set and the reduced data (EBVr; for candidates, this is parent average, PA), and genomic enhanced breeding values with different single-step genomic BLUP (ssGBLUP) methods for the reference and candidate bulls and cows

| Item | Milk, bulls | Milk, cows | Protein, bulls | Protein, cows | Fat, bulls | Fat, cows |
|---|---|---|---|---|---|---|
| Reference | | | | | | |
| No. of animals | 4,442 | 270 | | | | |
| EBV | 11.11 | 8.83 | 10.51 | 7.97 | 11.52 | 10.27 |
| EBVr | 10.93 | 9.42 | 10.39 | 9.21 | 11.40 | 11.71 |
| ssGBLUP | | | | | | |
| w20 | 11.08 | 8.89 | 10.55 | 8.71 | 11.63 | 10.90 |
| w15 | 11.08 | 8.81 | 10.54 | 8.62 | 11.63 | 10.80 |
| w10 | 11.08 | 8.74 | 10.54 | 8.52 | 11.62 | 10.70 |
| τ1.0ω0.5 | 10.57 | 7.79 | 9.99 | 7.62 | 11.08 | 9.61 |
| τ1.6ω0.5 | 10.34 | 7.40 | 9.76 | 7.16 | 10.85 | 8.95 |
| τ1.5ω0.6 | 10.44 | 7.56 | 9.87 | 7.31 | 10.96 | 8.17 |
| τ1.6ω1.0 | 10.79 | 8.12 | 10.24 | 7.77 | 11.34 | 9.72 |
| Candidate | | | | | | |
| No. of animals | 707 | 7,113 | | | | |
| EBV | 10.85 | 9.19 | 10.12 | 9.02 | 10.07 | 9.87 |
| PA | 7.01 | 6.51 | 6.90 | 6.49 | 7.76 | 7.83 |
| ssGBLUP | | | | | | |
| w20 | 8.93 | 8.92 | 8.90 | 8.86 | 9.50 | 10.52 |
| w15 | 9.06 | 9.13 | 9.03 | 9.05 | 9.61 | 10.69 |
| w10 | 9.20 | 9.37 | 9.17 | 9.26 | 9.73 | 10.87 |
| τ1.0ω0.5 | 7.25 | 7.19 | 7.11 | 6.83 | 7.79 | 8.38 |
| τ1.6ω0.5 | 7.26 | 7.26 | 7.07 | 6.81 | 7.66 | 8.24 |
| τ1.5ω0.6 | 7.44 | 7.42 | 7.27 | 7.01 | 7.86 | 8.48 |
| τ1.6ω1.0 | 8.73 | 8.87 | 8.62 | 8.60 | 9.10 | 10.09 |

1 EBV are expressed as standardized EBV with SD of 10 units for all bulls and cows born between the years 1990 and 2012. Standard deviations were calculated within birth year and pooled across years.
2 w = proportion of polygenic variance; τ = weight for G−1; ωxx = weight for A22−1 matrix.
The correlations between GEBV from different methods for candidate animals are presented in Table 2. The correlations for reference bulls were all near to 1 (varying from 0.998 to 0.999). Thus, for proven bulls, different methods seem not to affect the GEBV greatly. For candidate bulls and cows, the correlations among different methods differed considerably, varying from 0.952 to 0.999 for bulls and from 0.933 to 0.999 for cows. There, the differences appeared between methods using w and methods using τ and ω. The correlations between the different combinations of τ and ω increased as the amount of the additive genetic variance attributed to the genomic markers increased.
Table 2. Correlations (above diagonal = candidate bulls; below diagonal = candidate cows) among genomic enhanced breeding values from different methods in building genomic relationship matrices
The model validation results for bulls are in Table 3 and for cows in Table 4. The tables present regression coefficients (b1) and validation reliabilities (R2) with 95% bootstrap CI with 10,000 bootstrap resampling. For the bulls, validation reliabilities from the ssGBLUP with different H22 matrix block varied between 0.48 and 0.49 for milk, between 0.39 and 0.40 for protein, and between 0.41 and 0.44 for fat. The PA based on the same data but without genomic information gave, on average, 14-percentage-units lower reliability for milk, protein, and fat. For bulls, the best choice of parameters for milk was τ1.5ω0.6, and for protein and fat was τ1.6ω0.5. The lower ω for protein and fat means that more weight should be placed on genomic relationships for protein and fat relative to milk. For cows, validation reliabilities with different genomic matrix varied between 0.41 and 0.42 for milk, 0.31 and 0.33 for protein, and 0.32 and 0.34 for fat. Cow PA gave on average 12.6-percentage-unit lower reliabilities than GEBV. For cows, the best choice of parameters for milk and protein was τ1.5ω0.6, and for fat, the best choice was τ1.6ω0.5. Thus, protein has different results in cows compared with bulls. However, for cows, validation reliabilities can also differ because they have fewer close relatives in the reference population
Table 3. Bull validation results from different single-step genomic BLUP (ssGBLUP) methods, showing regression coefficients (b1), validation reliabilities (R2), and their 95% bootstrap confidence intervals (CI) from the parent average (PA) and genomic enhanced breeding values with different methods in ssGBLUP
Table 4. Cow validation results from different single-step genomic BLUP (ssGBLUP) methods, showing regression coefficients (b1), validation reliabilities (R2), and their 95% bootstrap confidence intervals (CI) from the parent average (PA), and genomic enhanced breeding values with different methods in ssGBLUP
Table 5 presents the root mean square errors (MSE) for the regressions. The MSE combine all criteria (inflation, trend, and accuracy), and the model with the lowest MSE should be the best one. Only small differences existed among MSE and they were mainly in line with the validation reliability results. However, according to MSE for bulls, the best parameter option for protein and fat would be τ1.6ω1.0, which contradicts the results from Table 3.
Table 5. Root mean square errors (MSE) from different validation regression models

| Model | Milk | Protein | Fat |
|---|---|---|---|
| Bull | | | |
| PA | 644.81 | 20.94 | 25.77 |
| w20 | 589.46 | 19.26 | 23.09 |
| w15 | 589.46 | 19.26 | 23.00 |
| w10 | 589.45 | 19.26 | 22.99 |
| τ1.0ω0.5 | 584.50 | 19.26 | 23.26 |
| τ1.6ω0.5 | 584.56 | 19.19 | 23.03 |
| τ1.5ω0.6 | 583.02 | 19.15 | 22.97 |
| τ1.6ω1.0 | 585.83 | 19.13 | 22.74 |
| Cow | | | |
| PA | 866.90 | 30.85 | 35.23 |
| w20 | 840.04 | 30.19 | 34.17 |
| w15 | 839.94 | 30.10 | 34.16 |
| w10 | 840.24 | 30.20 | 34.17 |
| τ1.0ω0.5 | 836.96 | 31.00 | 34.09 |
| τ1.6ω0.5 | 835.20 | 30.05 | 34.02 |
| τ1.5ω0.6 | 834.56 | 30.04 | 34.01 |
| τ1.6ω1.0 | 836.28 | 30.08 | 34.17 |

1 PA = parent average; w = proportion of polygenic variance; τ = weight for G−1; ωxx = weight for A22−1 matrix.
In general, our results indicate that differences in validation reliabilities among different methods are small, especially for bulls. This is demonstrated in more detail with the pairwise bootstrap comparisons, where significant differences appear only between validation reliabilities from PA and different GEBV, and between some ssGBLUP methods in fat (Table 6). For cows, significant differences seem to arise in validation reliabilities between methods using different values of w, τ, and ω. Narrower CI could be achieved by increasing the number of validation animals. However, this would reduce the reference population size and could lead to lower validation reliability. Thus, significant differences in genomic prediction methods may be difficult to find when the number of genotyped animals is small. In addition, a small validation population has a large CI in validation reliability and this (random noise) may be one reason for different methods giving highest validation reliability, particularly in other studies where no confidence intervals have been given.
Table 6. Pairwise comparisons of validation reliabilities (R
Letters indicate significant differences between compared methods. Differences in methods for bulls are shown above the diagonal and those for cows below the diagonal; M, P, F indicate milk, protein, and fat.
Method
PA
w20
w15
w10
τ1.0ω0.5
τ1.6ω0.5
τ1.5ω0.6
τ1.6ω1.0
PA
M P F
M P F
M P F
M P F
M P F
M P F
M P F
w20
M P F
F
w15
M P F
F
w10
M P F
F
τ1.0ω0.5
M P F
F
F
F
τ1.6ω0.5
M P F
M P F
M P F
M P F
M P F
F
τ1.5ω0.6
M P F
M P F
M P F
M P F
M P F
M
τ1.6ω1.0
M P F
M P F
M P F
M P F
M
2 w = proportion of polygenic variance; τ = weight for G−1; ωxx = weight for A22−1 matrix.
The degree of inflation is indicated by the coefficient of regression (b1) of true genetic values on (G)EBV. Optimal prediction of genetic merit of young individuals should have a regression coefficient of 1. With b1 <1, the predictions are inflated, and the differences in estimated genetic merit of young individuals are exaggerated compared with their future performance. For bulls, the b1 values were almost always lower than the expected value, indicating that GEBV over-evaluate differences between bulls (Table 3). Changes in τ seemed to have only small effect on the b1 value, unlike changes in ω. Decreasing ω increases weight of pedigree relationship matrix such that GEBV will be more influenced by pedigree information. Decrease in ω increased b1 value. Only for milk and with τ1.6ω0.5, τ1.5ω0.5, or τ1.0ω0.5, the b1 values were >1. However, depending on the method used to build the H22 matrix for ssGBLUP, the over-dispersion was very similar to or even lower than that with PA. This suggests that GEBV by the single-step method are less biased than PA, but it is still essential to determine the best method to build the H22 matrix. The results were similar with cows, although in general, b1 values were higher for cows than for bulls, indicating a smaller bias (Table 4). Several studies have found better accuracies and lower biases of GEBV by fine-tuning w, τ, and ω when constructing the H matrix (
The genetic trends in milk (G)EBV for genotyped bulls are shown in Figure 1. For reference bulls, the genetic trends with different GEBV followed close to the EBV and EBVr trends, although there was a tendency for methods using w to give genetic trends that were closer to EBV compared with methods using a wider range of τ and ω. For the candidate bulls, the ssGBLUP with different w as well as PA seem to overestimate the trend, whereas methods using τ and ω seemed to underestimate the trend. Figure 1 indicates that putting more weight on genomic information is desirable. However, this must be balanced with the proper parameter choice for τ and ω to have the correct combination of genomic and pedigree relationships.
Figure 1. Genetic trends for milk genomic enhanced breeding values (GEBV) and EBV of reference and candidate bulls from reduced data (i.e., EBVr) with different methods used in single-step genomic BLUP (ssGBLUP). For the candidate bulls, the EBVr is parent average (PA); EBV are from the full test-day model. Solid lines indicate reference bulls and dashed lines indicate candidate bulls. w = proportion of polygenic variance; τ = weight for G−1; ωxx = weight for A22−1 matrix.
Comparison of different methods to build the H22 matrix for the ssGBLUP showed that alternative parameters could be chosen depending on whether the goal is to achieve higher validation reliabilities or smaller bias. In general, differences in validation reliabilities and their CI (Table 3 and Table 4) with the studied methods were smaller than the effects on the regression coefficient b1, which indicates bias. Therefore, it is essential to consider the whole picture before choosing which method to use. Moreover, different methods can be optimal for different traits. Differences in optimal weighting factors may be due to differences in genetic architecture. Use of a genomic relationship matrix that weights markers according to the analyzed trait (e.g.,
) may better account for differences in genetic architecture.
Validation reliabilities for bulls from the current study were generally higher than the validation reliabilities obtained for RDC with genomic evaluation by multi-step approach and by sire model single-step method (
). This is partly due to the larger reference population. It also appears that we achieved better validation reliabilities by using phenotypic data in the ssGBLUP than by using deregressed breeding values (DRP) from either sire or animal model deregression (
Koivula et al., 2014
). Moreover, based on the regression coefficients b1, the GEBV from the phenotypic records seemed to be less inflated than the DGV from sire model or GEBV from animal model deregressions. Still, the models in the current study might fail the Interbull GEBV validation test. From bootstrapping with 10,000 samples, CI for the regression coefficients of the GEBV reached 1.0 with only some methods in milk and protein; for fat, the upper limits were always <1.0. The bootstrap CI are consistent with results in
, in which we suggested that reaching a standard error of 0.049 for the estimate of b1 requires 443 bulls. However, the Interbull requirement for unbiasedness will accept b1 estimates between 0.90 and 1.20, even if the CI of b1 does not include 1. Thus, a b1 estimate of 0.9 is not considered biologically significantly biased. Therefore, a standard error of 0.05 is an important limit for the power of the test, because with it the statistical and biological significance agree. Reversing the power consideration, we could suggest that the number of validation bulls should always be at least 500.
The ssGBLUP has been used for several large-scale analyses, including in dairy cattle (
). Experiences from these studies have indicated that ssGBLUP gives as high or higher validation reliability than multi-step methods, and the inflation of GEBV is generally smaller. Another reason for using ssGBLUP is the ability to account for selection bias when selection is based on genotypes only (
The current study shows that ssGBLUP is easy to implement in existing national evaluation models. In this way, phenotypic records are combined directly with genomic information and, thus, resulting GEBV directly combine both sources of information. Moreover, in TD ssGBLUP, genomic information can be accounted for in estimation of environmental effects. Additional computational costs in the single-step approach may be lower than in multiple-step genomic evaluations. In our study, the number of genotyped bulls was relatively low, and the G matrix was easy to invert. For a large number of genotyped animals, algorithms have been proposed that overcome the need of G inverse, for example, by
Liu et al. (2013) and Strandén and Mäntysaari (2014).
Our results show that the use of phenotypic test-day records in single-step analysis is feasible. The ssGBLUP provides a good alternative to the current multi-step approach used for Nordic RDC. The ssGBLUP is easy to implement and it gives results comparable to those of the original models. Moreover, the use of phenotypic records gave higher validation reliabilities compared with earlier validations that used sire model or animal model deregressions. However, it is essential to find the optimal way to combine the G and A matrices to minimize bias and maximize reliability of GEBV.
Acknowledgments
This work was a part of the Genomic Selection project originally established by Aarhus University and the Nordic cattle breeding organizations Viking Genetics (Randers, Denmark), Nordic Cattle Genetic Evaluation (Aarhus, Denmark), and Faba (Hollola, Finland). They are acknowledged for providing the genotype and test-day data.
References
Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
Koivula, M., I. Strandén, G. P. Aamand, and E. A. Mäntysaari. 2014. Effect of cow reference group on validation accuracy of genomic evaluation. Proc. 10th World Congr. Genet. Appl. Livest. Prod., Vancouver, Canada, Aug. 17–22, Comm. 083.
Liu, Z., M. Goddard, F. Reinhardt, and R. Reents. 2013. Computing strategies for a single step SNP model with an across country reference population. No. 19:452 in Book of Abstracts: 64th Annu. Mtg. EAAP, Nantes, France. EAAP, Rome, Italy.
R Core Development Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Strandén, I., and E. A. Mäntysaari. 2014. Comparison of some equivalent equations to solve single-step GBLUP. Proc. 10th World Congr. Genet. Appl. Livest. Prod., Vancouver, Canada, Aug. 17–22, Comm. 069.