## Abstract

**G**) are integrated with population-based pedigree relationships (**A**) into a combined relationship matrix (**H**). Therefore, we also tested how different weights for genomic and pedigree relationships affect ssGBLUP, validation reliability, and validation regression coefficients. Deregressed proofs for 305-d milk, protein, and fat yields were used for a posteriori validation. The results showed that the use of phenotypic TD records in ssGBLUP is feasible. Moreover, the TD ssGBLUP model gave considerably higher validation reliabilities and validation regression coefficients than the TD model without genomic information. No significant differences were found in validation reliability between the different TD ssGBLUP models according to bootstrap confidence intervals. However, the degree of inflation in genomic enhanced breeding values is affected by the method used in construction of the **H** matrix. The results showed that ssGBLUP provides a good alternative to the currently used multi-step approach, but there is a great need to find the best option to combine pedigree and genomic information in the genomic matrix.

## Key words

## Introduction

**DYD**) or deregressed EBV (deregressed proofs; **DRP**); and (3) genomic model for prediction of direct genomic values (**DGV**;

**GEBV**).

**PA**) of progeny of genomically selected animals do not automatically include genomic information. Second, when animals are selected by their GEBV, the future estimation of unbiased EBV becomes difficult because genomic information is not taken into account in the traditionally calculated EBV. Moreover, genomic selection using the multi-step approach is complex and includes several approximations, all of which reduce accuracy and can inflate the resultant GEBV. None of these issues applies to the single-step approach.

**ssGBLUP**) is a unified approach to calculate GEBV. The ssGBLUP combines phenotypic records, pedigree information, and genomic information optimally in calculation of GEBV (

**A** and genomic relationship matrix **G** into a single **H** matrix, which replaces the traditional relationship matrix **A** in the mixed-model equations (

**A** matrix with the **G** matrix resulted in biased GEBV (e.g.,

**G** matrix toward their expected values in the **A** matrix to decrease the scaling problem (e.g.,

**TD**) model is currently used for the official Nordic genetic evaluation of production (

**RDC**). As more selection decisions are made using genomic information, it is becoming essential that all genomic information is included in national evaluations. The objectives of this study were to evaluate the feasibility of the large random regression TD ssGBLUP, and to estimate the accuracy of GEBV when using this model. We also tested how different combinations of the **A** and **G** matrices affect the bias and accuracy of GEBV in the TD ssGBLUP.

## Materials and Methods

**NAV**; Aarhus, Denmark). For production traits, the TD data included 3.8 million cows with a total of 85 million records and 5.1 million animals in the Nordic RDC pedigree. To be able to validate the model, a reduced data set was extracted from the full data set, as follows: the last 4 yr of observations were removed and the reduced data included 2.7 million cows with 72 million records. The reduced data set was used to solve GEBV and EBV for all animals in the pedigree, and the full data set was used to solve current EBV for testing purposes. The initial EBV from the reduced data set were denoted **EBV**_{r}. For the females without observations and bulls without daughters in reduced data, EBV_{r} are hereafter referred to as parent average (**PA**). Comparing initial predictions from the reduced data set with those from the full data set allowed estimation of validation accuracy (

**H** in single-step evaluations defines the relationships among genotyped and nongenotyped animals. Although **H** can be expensive to compute, its inverse has a simple structure (

where **A**_{22} is the sub-matrix of the pedigree-based numerator relationship matrix **A** for the genotyped animals, and **G** is the relationship matrix constructed using genomic information. The **G** matrix had 15,148 genotyped RDC animals, of which 5,534 were bulls and 9,529 cows. The **G** matrix also included genotypes of animals without offspring or records. Genotypes were obtained from the Illumina Bovine SNP50 Bead Chip (Illumina, San Diego, CA). After application of exclusion criteria, 46,914 SNP markers on the 29 bovine autosomes were available for further analysis. The genotype file was the same as was used in official genomic evaluation of Nordic Cattle Genetic Evaluation in June 2014. Genotypes were used to form the raw **G** matrix with method 1 in

**G** and **A**_{22} were combined, the raw **G** matrix was scaled by scalar $t=\frac{tr\left({\text{A}}_{22}\right)}{tr\left(\text{G}\right)}$, where *tr* is the trace of the matrix. Thus, **G** has, on average, the same diagonals as the **A**_{22} matrix.
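The trace scaling described above can be illustrated with a minimal NumPy sketch. The function name `scale_g_to_a22` and the 3-animal toy matrices are assumptions made for illustration only; they are not data or code from the study.

```python
import numpy as np

def scale_g_to_a22(G, A22):
    """Scale the raw G so that its average diagonal equals that of A22."""
    t = np.trace(A22) / np.trace(G)  # t = tr(A22) / tr(G)
    return t * G

# Toy matrices for 3 genotyped animals (made-up numbers)
A22 = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.25],
                [0.25, 0.25, 1.00]])
G_raw = np.array([[0.90, 0.42, 0.20],
                  [0.42, 0.88, 0.22],
                  [0.20, 0.22, 0.92]])
G_scaled = scale_g_to_a22(G_raw, A22)
```

Because every element is multiplied by the same scalar, the off-diagonal elements change in the same proportion as the diagonals.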

${\text{H}}^{22}={\text{A}}^{22}+{\text{G}}^{-1}-{\text{A}}_{22}^{-1}$ between genotyped animals. To improve the properties of the ssGBLUP, different weights in building the **H**^{22} matrix were tested.
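For orientation, the block form in which the inverse of **H** is commonly written in the single-step literature is sketched below. The ordering of animals (nongenotyped first, genotyped second) is an assumption made here so that the lower-right block matches the **H**^{22} expression in the text.

$$
\mathbf{H}^{-1}=\mathbf{A}^{-1}+\begin{bmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}^{-1}-\mathbf{A}_{22}^{-1}\end{bmatrix},
\qquad
\mathbf{H}^{22}=\mathbf{A}^{22}+\mathbf{G}^{-1}-\mathbf{A}_{22}^{-1}.
$$

Only the block for genotyped animals differs from the pedigree-based inverse, which is why the genomic contribution can be stored and applied separately.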

**G** and using ${\text{H}}^{22}={\text{A}}^{22}+{\text{G}}_{\text{w}}^{-1}-{\text{A}}_{22}^{-1},$ where **G**_{w} = (1 − w)**G** + w**A**_{22}, and the constant w represents the proportion of polygenic variance not described by markers. So, the smaller w, the more genetic variance that is attributed to genomic markers. We used 3 different proportions w (w = 0.10, w = 0.15, or w = 0.20) in **G**_{w}. In

**H**^{22} matrix was further scaled to be ${\text{H}}^{22}={\text{A}}^{22}+\tau {\text{G}}_{\text{w}}^{-1}-\omega {\text{A}}_{22}^{-1}.$
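A small dense-matrix sketch of the weighted construction may help to fix ideas. The function name `h22_increment`, the default arguments, and the toy matrices are assumptions for illustration; a production implementation for 15,148 genotyped animals would handle storage and inversion differently.

```python
import numpy as np

def h22_increment(G, A22, w=0.10, tau=1.0, omega=1.0):
    """Return tau * inv(G_w) - omega * inv(A22), the part added to A^22."""
    G_w = (1.0 - w) * G + w * A22  # blend genomic and pedigree relationships
    return tau * np.linalg.inv(G_w) - omega * np.linalg.inv(A22)

# Toy matrices for 3 genotyped animals (made-up numbers)
A22 = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.25],
                [0.25, 0.25, 1.00]])
G = np.array([[1.02, 0.47, 0.22],
              [0.47, 0.98, 0.27],
              [0.22, 0.27, 1.00]])

inc_w10 = h22_increment(G, A22, w=0.10)                      # tau = omega = 1
inc_tau_omega = h22_increment(G, A22, w=0.10, tau=1.6, omega=0.5)
```

With τ = 1.0 and ω = 1.0 the function reduces to the unweighted expression, and with w = 1 the increment vanishes because **G**_{w} then equals **A**_{22}.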

**G**, whereas larger values of ω decrease the importance of pedigree relationships and increase the importance of genomic relationships. We tested 4 combinations of these parameters. The first was the combination found best in

**G**_{w}. Hereafter, these methods are referred to as w_{20}, w_{15}, w_{10}, τ_{1.6}ω_{1.0}, τ_{1.0}ω_{0.5}, τ_{1.6}ω_{0.5}, and τ_{1.5}ω_{0.6}. Note that the method using w = 0.10 corresponds to the situation where τ = 1.0 and ω = 1.0; however, we refer to this method as w_{10}, instead of τ_{1.0}ω_{1.0}.

**PCG**) iteration (

**A**^{−1}, which was replaced by alternative **H**^{−1} matrices in the single-step method for GEBV. In the PCG algorithm, the iteration involves multiplication of search direction vector **v** by the MME coefficient matrix. The implementation of the single-step method in MiX99 splits the required matrix multiplications into several steps. The first step is the multiplication of the least squares part of the coefficient matrix and the second is the product **H**^{−1}**v**. This is further divided into 2 steps. First, the product **A**^{−1}**v** is computed directly by reading the pedigree file as is done in the traditional EBV calculation. Second, the product **H**^{22}**v** is calculated by reading [**H**^{22} − **A**^{22}] from a separate file during each PCG iteration cycle. Thus, the only additional work for the single step in solving MME is the matrix times vector product **H**^{22}**v** in each PCG iteration where $\left[{\text{H}}^{22}-{\text{A}}^{22}\right]={\text{G}}_{\text{w}}^{-1}-{\text{A}}_{22}^{-1}.$ However, due to different convergence in the iteration with different models, extra work may be required for the single-step method.
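The two-step product can be sketched as follows. This is not the MiX99 implementation: here **A**^{−1} is held as a dense array rather than being built from the pedigree file, and the function name `h_inv_times_v`, the 4-animal pedigree, and the toy **G** are assumptions for illustration.

```python
import numpy as np

def h_inv_times_v(A_inv, increment, genotyped_idx, v):
    """Compute H^{-1} v as A^{-1} v plus a correction for genotyped animals."""
    out = A_inv @ v                                      # step 1: pedigree part
    out[genotyped_idx] += increment @ v[genotyped_idx]   # step 2: [H^22 - A^22] v_2
    return out

# Toy pedigree: animals 0 and 1 are unrelated parents of full sibs 2 and 3
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
A_inv = np.linalg.inv(A)
genotyped_idx = np.array([2, 3])            # only the two sibs are genotyped
A22 = A[np.ix_(genotyped_idx, genotyped_idx)]
G = np.array([[1.02, 0.45],
              [0.45, 0.98]])                # made-up genomic relationships
increment = np.linalg.inv(G) - np.linalg.inv(A22)

v = np.array([0.3, -1.2, 0.8, 0.5])         # stand-in for a PCG search direction
hv = h_inv_times_v(A_inv, increment, genotyped_idx, v)
```

The increment matrix is fixed across iterations, so it can be computed once and reused in every PCG round, which matches the description of reading it from a separate file.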

**ERC**) was used as a weighting factor in the deregressions. The ERC (

^{2}_{milk} = 0.48, h^{2}_{protein} = 0.48, and h^{2}_{fat} = 0.49) were used throughout the study. Deregressions used the full pedigree in NAV evaluation and EBV for the bulls and cows from the full evaluation model. The 3 traits were deregressed simultaneously but assuming genetic and residual correlations to be zero. We chose to use DRP also for the bulls in the validation calculations instead of DYD to have directly comparable results with sire model GBLUP studies, including those in

where **y** has the DRP of the candidate bulls or cows in the full data, b_{0} and b_{1} are unknown regression coefficients, **â** has the genomic prediction for bulls or cows based on the reduced data analysis (GEBV), and **e** is the residual error. The validation reliability of the model was obtained from the coefficient of determination (R^{2}) of the model (R^{2}_{model}), after correcting it by the average reliability of DRP $\left({\overline{r}}_{DRP}^{2}\right)$ of the candidate bulls or cows; that is, ${{\text{R}}^{\text{2}}}_{\text{validation}}={{\text{R}}^{\text{2}}}_{\text{model}}/{\overline{r}}_{DRP}^{2}.$ The reliabilities of DRP were calculated as ${r}_{DRP}^{2}=ER{C}_{i}/\left(ER{C}_{i}+\lambda \right),$ where λ = (1 − h^{2})/h^{2}. To estimate the further gain from the genomic information over the traditional PA (

**CI**) were estimated for the regression coefficients (**b**_{1}), the validation reliabilities, and the differences between the validation reliabilities among the model alternatives (e.g., R^{2}_{validation, w20} − R^{2}_{validation, w15}) using nonparametric bootstrap. The boot and boot.ci functions of the R package (

R Core Development Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
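The validation statistics described above can be sketched in a few lines. The study used the boot and boot.ci functions in R; the version below swaps in a plain NumPy percentile bootstrap and an unweighted regression, so it is an illustration under those assumptions rather than the authors' procedure. All function names and the simulated values are hypothetical.

```python
import numpy as np

def drp_reliability(erc, h2):
    """r2_DRP = ERC / (ERC + lambda), with lambda = (1 - h2) / h2."""
    lam = (1.0 - h2) / h2
    return erc / (erc + lam)

def validation_stats(drp, gebv, rel_drp):
    """Fit DRP = b0 + b1 * GEBV + e; return (b1, R2_model / mean DRP reliability)."""
    b1, _b0 = np.polyfit(gebv, drp, 1)             # polyfit returns the slope first
    r2_model = np.corrcoef(gebv, drp)[0, 1] ** 2   # R2 of a simple linear regression
    return b1, r2_model / np.mean(rel_drp)

def bootstrap_ci(drp, gebv, rel_drp, n_boot=1000, seed=0):
    """95% percentile intervals for (b1, R2_validation), resampling animals."""
    rng = np.random.default_rng(seed)
    n = len(drp)
    draws = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)           # sample animals with replacement
        draws[i] = validation_stats(drp[idx], gebv[idx], rel_drp[idx])
    return np.percentile(draws, [2.5, 97.5], axis=0)  # row 0: lower, row 1: upper

# Simulated candidates (hypothetical numbers, not the study data)
rng = np.random.default_rng(42)
gebv = rng.normal(0.0, 9.0, size=500)
erc = rng.uniform(20.0, 80.0, size=500)
rel = drp_reliability(erc, h2=0.48)
drp = 0.9 * gebv + rng.normal(0.0, 10.0, size=500)

b1, r2_val = validation_stats(drp, gebv, rel)
ci = bootstrap_ci(drp, gebv, rel, n_boot=500)
```

Differences between methods (for example, the reliability under w_{20} minus that under w_{15}) could be bootstrapped in the same way by computing both statistics on each resample and storing their difference.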

## Results and Discussion

_{r} was 3,705, and that for the ssGBLUP varied from 3,545 to 4,990 depending on the method used. The models were run to the same level of convergence: ${\text{C}}_{\text{a}}=\left(\sqrt{10.5}\right),$ where C_{a} = relative difference between left- and right-hand side of the part of the MME that includes the equations of the additive genetic animal effects. All models took about 35 to 54 h to run with 4 Intel Xeon 3.6 GHz processors. There was an approximately 15% difference in iteration time: 33 s and, on average, 38 s per iteration round for EBV and ssGBLUP, respectively. Although the increase in computing time was mainly due to extra iterations, some differences were apparent between methods. According to the time used per iteration round, w_{20} and w_{15} had the best convergence among ssGBLUP methods. Computationally, the ssGBLUP added very little extra computing time to solving the mixed model equations by the PCG method. The only significant extra computations in the single-step method were due to the construction of the **H**^{22} matrix, which was done once before applying the PCG method. Presumably, the inclusion of genomic data affects convergence because the variance structure of genotyped animals in the **G** matrix is less diagonally dominant than with the pedigree-based relationship matrix **A** only. It has been shown that slight changes in scaling of **G** and **A**_{22} can affect convergence without a negative effect on validation accuracy (

_{15} or τ_{1.0}ω_{0.5} needed the fewest iteration rounds to achieve convergence. Also, ω = 0.5 seemed to give better convergence than ω = 0.6 or ω = 1.0. This is in agreement with

**H** to be less positive definite, leading to slower convergence or even divergence when mixed model equations are solved by the iterative method, whereas smaller ω tend to give better convergence.

**G**_{w} matrices.

**G**\* = *a* **G**_{w} + *b* **11**′, where **1** is a vector of ones and the constants *a* and *b* are such that in **G**\*, the average of diagonals and average of off-diagonals equal those in the pedigree-based relationship matrix. We tested **G**\* in ssGBLUP using our data, but did not get solutions for further examination because the PCG algorithm showed poor convergence and failed to converge within 5,000 iterations. The adjustment constants for our data set were *a* = 0.968 and *b* = 0.0323; that is, the **G**_{w} matrix was inflated by 3% after which all values were increased by a constant *b*. When the diagonal values of **G**_{w} are <1, the **G**\* matrix is diagonally less dominant than **G**_{w}, which means increased correlations between animals and possibly poorer convergence of the PCG algorithm. There are other ways to correct genetic differences among genotyped and nongenotyped individuals (
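The two conditions on the adjusted matrix (equal average diagonal and equal average off-diagonal) give two linear equations in *a* and *b*, which can be solved directly. The sketch below assumes that the pedigree-based target is **A**_{22}; the function names and toy matrices are hypothetical and the numbers are not from the study.

```python
import numpy as np

def mean_diag_offdiag(M):
    """Average diagonal and average off-diagonal element of a square matrix."""
    n = M.shape[0]
    diag_mean = np.trace(M) / n
    off_mean = (M.sum() - np.trace(M)) / (n * (n - 1))
    return diag_mean, off_mean

def adjust_g(G_w, A22):
    """Solve a*dG + b = dA and a*oG + b = oA, then return a, b and a*G_w + b*11'."""
    dG, oG = mean_diag_offdiag(G_w)
    dA, oA = mean_diag_offdiag(A22)
    a = (dA - oA) / (dG - oG)
    b = dA - a * dG
    return a, b, a * G_w + b  # adding the scalar b adds it to every element

# Toy matrices for 3 genotyped animals (made-up numbers)
A22 = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.25],
                [0.25, 0.25, 1.00]])
G_w = np.array([[0.97, 0.40, 0.18],
                [0.40, 0.95, 0.21],
                [0.18, 0.21, 0.99]])
a, b, G_star = adjust_g(G_w, A22)
```

Adding the same constant to every element raises the off-diagonals relative to the diagonals, which is the loss of diagonal dominance mentioned in the text.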

**H**^{22} matrix. By using different w, standard deviations of GEBV were close to that of EBV, whereas when using different τ and ω, standard deviations of GEBV were somewhat lower than that of EBV. The reference bulls have a lot of progeny information and, therefore, changes in w did not affect reference animal GEBV much. In contrast, changes in τ and ω tended to have larger effects on reference bull standard deviations. In candidate bulls, standard deviations of GEBV increased with decreasing w; that is, increased genomic information increased standard deviations of GEBV. Overall, in the candidate animals, the standard deviations of GEBV were higher than that of PA due to added information from genotypes. Standard deviations of the GEBV varied also with different estimation methods. In contrast to reference animals, in candidate bulls and cows, the use of different w gave higher standard deviations than the use of τ and ω in building the **H**^{22} matrix. In conclusion, changing τ and ω affected the standard deviations of both candidate and reference animals, whereas changes in w affected mainly candidate animals.

_{r}; for candidates, this is parent average, PA), and genomic enhanced breeding values with different single-step genomic BLUP (ssGBLUP) methods for the reference and candidate bulls and cows

Animal group and method | Milk, bulls | Milk, cows | Protein, bulls | Protein, cows | Fat, bulls | Fat, cows
---|---|---|---|---|---|---
Reference | | | | | |
No. of animals | 4,442 | 270 | | | |
EBV | 11.11 | 8.83 | 10.51 | 7.97 | 11.52 | 10.27
EBV_{r} | 10.93 | 9.42 | 10.39 | 9.21 | 11.40 | 11.71
ssGBLUP | | | | | |
w_{20} | 11.08 | 8.89 | 10.55 | 8.71 | 11.63 | 10.90
w_{15} | 11.08 | 8.81 | 10.54 | 8.62 | 11.63 | 10.80
w_{10} | 11.08 | 8.74 | 10.54 | 8.52 | 11.62 | 10.70
τ_{1.0}ω_{0.5} | 10.57 | 7.79 | 9.99 | 7.62 | 11.08 | 9.61
τ_{1.6}ω_{0.5} | 10.34 | 7.40 | 9.76 | 7.16 | 10.85 | 8.95
τ_{1.5}ω_{0.6} | 10.44 | 7.56 | 9.87 | 7.31 | 10.96 | 8.17
τ_{1.6}ω_{1.0} | 10.79 | 8.12 | 10.24 | 7.77 | 11.34 | 9.72
Candidate | | | | | |
No. of animals | 707 | 7,113 | | | |
EBV | 10.85 | 9.19 | 10.12 | 9.02 | 10.07 | 9.87
PA | 7.01 | 6.51 | 6.90 | 6.49 | 7.76 | 7.83
ssGBLUP | | | | | |
w_{20} | 8.93 | 8.92 | 8.90 | 8.86 | 9.50 | 10.52
w_{15} | 9.06 | 9.13 | 9.03 | 9.05 | 9.61 | 10.69
w_{10} | 9.20 | 9.37 | 9.17 | 9.26 | 9.73 | 10.87
τ_{1.0}ω_{0.5} | 7.25 | 7.19 | 7.11 | 6.83 | 7.79 | 8.38
τ_{1.6}ω_{0.5} | 7.26 | 7.26 | 7.07 | 6.81 | 7.66 | 8.24
τ_{1.5}ω_{0.6} | 7.44 | 7.42 | 7.27 | 7.01 | 7.86 | 8.48
τ_{1.6}ω_{1.0} | 8.73 | 8.87 | 8.62 | 8.60 | 9.10 | 10.09

**G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

Method (single-step genomic BLUP) | w_{20} | w_{15} | w_{10} | τ_{1.0}ω_{0.5} | τ_{1.6}ω_{0.5} | τ_{1.5}ω_{0.6} | τ_{1.6}ω_{1.0}
---|---|---|---|---|---|---|---
w_{20} | | 0.999 | 0.996 | 0.957 | 0.952 | 0.961 | 0.985
w_{15} | 0.998 | | 0.999 | 0.957 | 0.954 | 0.963 | 0.989
w_{10} | 0.992 | 0.997 | | 0.957 | 0.956 | 0.965 | 0.992
τ_{1.0}ω_{0.5} | 0.933 | 0.940 | 0.947 | | 0.996 | 0.999 | 0.966
τ_{1.6}ω_{0.5} | 0.933 | 0.942 | 0.950 | 0.996 | | 0.999 | 0.974
τ_{1.5}ω_{0.6} | 0.944 | 0.952 | 0.960 | 0.996 | 0.999 | | 0.980
τ_{1.6}ω_{1.0} | 0.982 | 0.989 | 0.993 | 0.957 | 0.967 | 0.975 |

**G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

_{1}) and validation reliabilities (R^{2}) with 95% bootstrap CI based on 10,000 bootstrap resamples. For the bulls, validation reliabilities from the ssGBLUP with different **H**^{22} matrix blocks varied between 0.48 and 0.49 for milk, between 0.39 and 0.40 for protein, and between 0.41 and 0.44 for fat. The PA based on the same data but without genomic information gave, on average, 14-percentage-unit lower reliability for milk, protein, and fat. For bulls, the best choice of parameters for milk was τ_{1.5}ω_{0.6}, and for protein and fat was τ_{1.6}ω_{0.5}. The lower ω for protein and fat means that more weight should be placed on genomic relationships for protein and fat relative to milk. For cows, validation reliabilities with different genomic matrices varied between 0.41 and 0.42 for milk, 0.31 and 0.33 for protein, and 0.32 and 0.34 for fat. Cow PA gave, on average, 12.6-percentage-unit lower reliabilities than GEBV. For cows, the best choice of parameters for milk and protein was τ_{1.5}ω_{0.6}, and for fat, the best choice was τ_{1.6}ω_{0.5}. Thus, protein has different results in cows compared with bulls. However, for cows, validation reliabilities can also differ because they have fewer close relatives in the reference population

_{1}), validation reliabilities (R^{2}), and their 95% bootstrap confidence intervals (CI) from the parent average (PA) and genomic enhanced breeding values with different methods in ssGBLUP

Method | Milk b_{1} (CI) | Milk R^{2} (CI) | Protein b_{1} (CI) | Protein R^{2} (CI) | Fat b_{1} (CI) | Fat R^{2} (CI)
---|---|---|---|---|---|---
PA | 0.97 (0.87−1.07) | 0.36 (0.30−0.42) | 0.79 (0.69−0.89) | 0.27 (0.20−0.34) | 0.68 (0.60−0.77) | 0.26 (0.20−0.33)
ssGBLUP | | | | | |
w_{20} | 0.87 (0.80−0.94) | 0.48 (0.42−0.54) | 0.73 (0.66−0.81) | 0.39 (0.33−0.46) | 0.72 (0.65−0.79) | 0.43 (0.36−0.50)
w_{15} | 0.86 (0.79−0.93) | 0.48 (0.42−0.54) | 0.72 (0.65−0.80) | 0.39 (0.33−0.46) | 0.72 (0.65−0.79) | 0.43 (0.36−0.50)
w_{10} | 0.84 (0.78−0.91) | 0.48 (0.42−0.54) | 0.71 (0.64−0.79) | 0.39 (0.33−0.46) | 0.71 (0.64−0.78) | 0.43 (0.36−0.50)
τ_{1.0}ω_{0.5} | 1.09 (1.00−1.18) | 0.49 (0.43−0.55) | 0.92 (0.83−1.01) | 0.39 (0.33−0.46) | 0.87 (0.79−0.95) | 0.41 (0.35−0.48)
τ_{1.6}ω_{0.5} | 1.09 (1.00−1.18) | 0.49 (0.43−0.55) | 0.93 (0.84−1.02) | 0.40 (0.33−0.47) | 0.90 (0.82−0.98) | 0.43 (0.36−0.49)
τ_{1.5}ω_{0.6} | 1.06 (0.98−1.15) | 0.49 (0.43−0.55) | 0.91 (0.82−1.00) | 0.40 (0.34−0.47) | 0.88 (0.80−0.96) | 0.43 (0.37−0.50)
τ_{1.6}ω_{1.0} | 0.90 (0.83−0.97) | 0.49 (0.42−0.55) | 0.77 (0.69−0.84) | 0.40 (0.34−0.47) | 0.77 (0.70−0.84) | 0.44 (0.38−0.51)

**G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

_{1}), validation reliabilities (R^{2}), and their 95% bootstrap confidence intervals (CI) from the parent average (PA), and genomic enhanced breeding values with different methods in ssGBLUP

Method | Milk b_{1} (CI) | Milk R^{2} (CI) | Protein b_{1} (CI) | Protein R^{2} (CI) | Fat b_{1} (CI) | Fat R^{2} (CI)
---|---|---|---|---|---|---
PA | 1.10 (1.03−1.17) | 0.28 (0.24−0.31) | 0.93 (0.86−1.01) | 0.20 (0.17−0.24) | 0.82 (0.75−0.88) | 0.19 (0.16−0.21)
ssGBLUP | | | | | |
w_{20} | 0.89 (0.85−0.94) | 0.40 (0.36−0.44) | 0.78 (0.73−0.83) | 0.30 (0.26−0.33) | 0.75 (0.70−0.79) | 0.32 (0.28−0.35)
w_{15} | 0.87 (0.82−0.91) | 0.40 (0.36−0.44) | 0.77 (0.72−0.81) | 0.30 (0.26−0.33) | 0.73 (0.69−0.77) | 0.32 (0.28−0.35)
w_{10} | 0.84 (0.80−0.88) | 0.40 (0.36−0.44) | 0.74 (0.70−0.79) | 0.30 (0.26−0.33) | 0.71 (0.67−0.75) | 0.32 (0.27−0.35)
τ_{1.0}ω_{0.5} | 1.13 (1.07−1.18) | 0.42 (0.37−0.45) | 1.02 (0.96−1.08) | 0.31 (0.27−0.35) | 0.97 (0.91−1.03) | 0.32 (0.28−0.36)
τ_{1.6}ω_{0.5} | 1.14 (1.09−1.20) | 0.43 (0.38−0.46) | 1.05 (0.99−1.12) | 0.32 (0.28−0.35) | 0.99 (0.93−1.05) | 0.34 (0.29−0.37)
τ_{1.5}ω_{0.6} | 1.12 (1.06−1.17) | 0.43 (0.38−0.46) | 1.02 (0.96−1.08) | 0.32 (0.28−0.36) | 0.96 (0.91−1.02) | 0.34 (0.29−0.37)
τ_{1.6}ω_{1.0} | 0.93 (0.88−0.97) | 0.42 (0.38−0.45) | 0.83 (0.78−0.88) | 0.32 (0.27−0.35) | 0.79 (0.74−0.83) | 0.33 (0.29−0.36)

**G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

**MSE**) for the regressions. The MSE combine all criteria (inflation, trend, and accuracy), and the model with the lowest MSE should be the best one. Only small differences existed among MSE and they were mainly in line with the validation reliability results. However, according to MSE for bulls, the best parameter option for protein and fat would be τ_{1.6}ω_{1.0}, which contradicts the results from Table 3.

Model | Milk | Protein | Fat
---|---|---|---
Bull | | |
PA | 644.81 | 20.94 | 25.77
w_{20} | 589.46 | 19.26 | 23.09
w_{15} | 589.46 | 19.26 | 23.00
w_{10} | 589.45 | 19.26 | 22.99
τ_{1.0}ω_{0.5} | 584.50 | 19.26 | 23.26
τ_{1.6}ω_{0.5} | 584.56 | 19.19 | 23.03
τ_{1.5}ω_{0.6} | 583.02 | 19.15 | 22.97
τ_{1.6}ω_{1.0} | 585.83 | 19.13 | 22.74
Cow | | |
PA | 866.90 | 30.85 | 35.23
w_{20} | 840.04 | 30.19 | 34.17
w_{15} | 839.94 | 30.10 | 34.16
w_{10} | 840.24 | 30.20 | 34.17
τ_{1.0}ω_{0.5} | 836.96 | 31.00 | 34.09
τ_{1.6}ω_{0.5} | 835.20 | 30.05 | 34.02
τ_{1.5}ω_{0.6} | 834.56 | 30.04 | 34.01
τ_{1.6}ω_{1.0} | 836.28 | 30.08 | 34.17

^{1}PA = parent average; w = proportion of polygenic variance; τ = weight for **G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

Method | PA | w_{20} | w_{15} | w_{10} | τ_{1.0}ω_{0.5} | τ_{1.6}ω_{0.5} | τ_{1.5}ω_{0.6} | τ_{1.6}ω_{1.0} |
---|---|---|---|---|---|---|---|---|

PA | M P F | M P F | M P F | M P F | M P F | M P F | M P F | |

w_{20} | M P F | F | ||||||

w_{15} | M P F | F | ||||||

w_{10} | M P F | F | ||||||

τ_{1.0}ω_{0.5} | M P F | F | F | F | ||||

τ_{1.6}ω_{0.5} | M P F | M P F | M P F | M P F | M P F | F | ||

τ_{1.5}ω_{0.6} | M P F | M P F | M P F | M P F | M P F | M | ||

τ_{1.6}ω_{1.0} | M P F | M P F | M P F | M P F | M |

**G**^{−1}; ω_{xx} = weight for **A**_{22}^{−1} matrix.

_{1}) of true genetic values on (G)EBV. Optimal prediction of genetic merit of young individuals should have a regression coefficient of 1. With b_{1} < 1, the predictions are inflated, and the differences in estimated genetic merit of young individuals are exaggerated compared with their future performance. For bulls, the b_{1} values were almost always lower than the expected value, indicating that GEBV over-evaluate differences between bulls (Table 3). Changes in τ seemed to have only a small effect on the b_{1} value, unlike changes in ω. Decreasing ω increases weight of pedigree relationship matrix such that GEBV will be more influenced by pedigree information. A decrease in ω increased the b_{1} value. Only for milk and with τ_{1.6}ω_{0.5}, τ_{1.5}ω_{0.6}, or τ_{1.0}ω_{0.5}, the b_{1} values were >1. However, depending on the method used to build the **H**^{22} matrix for ssGBLUP, the over-dispersion was very similar to or even lower than that with PA. This suggests that GEBV by the single-step method are less biased than PA, but it is still essential to determine the best method to build the **H**^{22} matrix. The results were similar with cows, although in general, b_{1} values were higher for cows than for bulls, indicating a smaller bias (Table 4). Several studies have found better accuracies and lower biases of GEBV by fine-tuning w, τ, and ω when constructing the **H** matrix (

_{r} trends, although there was a tendency for methods using w to give genetic trends that were closer to EBV compared with methods using a wider range of τ and ω. For the candidate bulls, the ssGBLUP with different w as well as PA seem to overestimate the trend, whereas methods using τ and ω seemed to underestimate the trend. Figure 1 indicates that putting more weight on genomic information is desirable. However, this must be balanced with the proper parameter choice for τ and ω to have the correct combination of genomic and pedigree relationships.

**H**^{22} matrix for the ssGBLUP showed that alternative parameters could be chosen depending on whether the goal is to achieve higher validation reliabilities or smaller bias. In general, differences in validation reliabilities and their CI (Table 3 and Table 4) with the studied methods were smaller than effects on the regression coefficient b_{1}, which indicates bias. Therefore, it is essential to consider the whole picture before choosing which method to use. Moreover, different methods can be optimal for different traits. Differences in optimal weighting factors may be due to differences in genetic architecture. Use of a genomic relationship matrix that weights markers according to analyzed trait (e.g.,

_{1}, the GEBV from the phenotypic records seemed to be less inflated than the DGV from sire model or GEBV from animal model deregressions. Still, the models in the current study might fail the Interbull GEBV validation test. From bootstrapping with 10,000 samples, CI for the regression coefficients of the GEBV reached 1.0 with only some methods in milk and protein; for fat, the upper limits were always <1.0. The bootstrap CI are consistent with results in

_{1} requires 443 bulls. However, the Interbull requirement for unbiasedness will accept b_{1} estimates between 0.90 and 1.20, even if the CI of b_{1} does not include 1. Thus, a b_{1} estimate of 0.9 is not considered biologically significantly biased. Therefore, a standard error of 0.05 is an important limit for the power of the test, because with it the statistical and biological significance agree. Reversing the power consideration, we could suggest that the number of validation bulls should always be at least 500.

**G** matrix was easy to invert. For a large number of genotyped animals, algorithms have been proposed that overcome the need for the **G** inverse, for example, by

## Conclusions

**G** and **A** matrices to minimize bias and maximize reliability of GEBV.

## Acknowledgments

## References

- Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. *J. Dairy Sci.* 2010; 93: 743-752
- Multiple trait genomic evaluation of conception rate in Holsteins. *J. Dairy Sci.* 2011; 94: 2621-2624
- Genome-wide marker-assisted selection combining all pedigree phenotypic information and genotypic data in one step: An example using broiler chickens. *J. Anim. Sci.* 2011; 89: 23-28
- Genomic prediction when some animals are not genotyped. *Genet. Sel. Evol.* 2010; 42: 2
- Single-step methods for genomic evaluation in pigs. *Animal.* 2012; 6: 1565-1571
- An iterative implementation of the single step approach for genomic evaluation which preserves existing genetic evaluation models and software. *Interbull Bull.* 2011; 44: 138-142
- Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. *Genet. Sel. Evol.* 2011; 43: 1
- Invited review: Genomic selection in dairy cattle: Progress and challenges. *J. Dairy Sci.* 2009; 92: 433-443
- Koivula, M., I. Strandén, G. P. Aamand, and E. A. Mäntysaari. 2014. Effect of cow reference group on validation accuracy of genomic evaluation. Proc. 10th World Congr. Genet. Appl. Livest. Prod., Vancouver, Canada, Aug. 17–22, Comm. 083.
- Single step genomic evaluations for the Nordic Red Dairy cattle test day data. *Interbull Bull.* 2012; 46 (b): 115-120
- Different methods to calculate genomic predictions—Comparisons of SNP-BLUP, G-BLUP and H-BLUP. *J. Dairy Sci.* 2012; 95 (a): 4065-4073
- A relationship matrix including full pedigree and genomic information. *J. Dairy Sci.* 2009; 92: 4656-4663
- Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction. *J. Dairy Sci.* 2012; 95: 4629-4645
- Across-country test-day model evaluations for Nordic Holstein, Red Cattle and Jersey. *J. Dairy Sci.* 2015; 98 (10.3168/jds.2014-8307): 1296-1309
- Liu, Z., M. Goddard, F. Reinhardt, and R. Reents. 2013. Computing strategies for a single step SNP model with an across country reference population. No. 19:452 in Book of Abstracts: 64th Annu. Mtg. EAAP, Nantes, France. EAAP, Rome, Italy.
- Are evaluations on young genotyped animals benefiting from the past generations? *J. Dairy Sci.* 2014; 97: 3930-3942
- Single step evaluations using haplotype segments. *Interbull Bull.* 2013; 47: 217-221
- Interbull validation test for genomic evaluations. *Interbull Bull.* 2010; 41: 17-24
- GEBV validation test revisited. *Interbull Bull.* 2012; 45: 11-16
- Estimation of GEBVs using deregressed individual cow breeding values. *Interbull Bull.* 2011; 44: 19-24
- The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. *J. Anim. Breed. Genet.* 2011; 128: 429-439
- Choice of parameters for single-step genomic evaluation for type. *J. Dairy Sci.* 2010; 93 (Abstr.): 533
- Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. *J. Dairy Sci.* 2009; 92: 4648-4655
- Methods to approximate reliabilities in single-step genomic evaluation. *J. Dairy Sci.* 2013; 96: 647-654
- R Core Development Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
- Solving large mixed models using preconditioned conjugate gradient iteration. *J. Dairy Sci.* 1999; 82: 2779-2787
- Calculation of Interbull weighting factors for the Finnish test day model. *Interbull Bull.* 2001; 26: 78-81
- A recipe for multiple trait deregression. *Interbull Bull.* 2010; 42: 21-24
- Strandén, I., and E. A. Mäntysaari. 2014. Comparison of some equivalent equations to solve single-step GBLUP. Proc. 10th World Congr. Genet. Appl. Livest. Prod., Vancouver, Canada, Aug. 17–22, Comm. 069.
- Genomic prediction for the Nordic Red Cattle using one-step and selection index blending approaches. *J. Dairy Sci.* 2012; 95: 909-917
- Comparison of model reliabilities from single-step and bivariate blending methods. *Interbull Bull.* 2013; 47: 246-251
- Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins. *J. Dairy Sci.* 2011; 94: 4198-4204
- Efficient methods to compute genomic predictions. *J. Dairy Sci.* 2008; 91: 4414-4423
- Avoiding bias from genomic pre-selection in converting daughter information across countries. *Interbull Bull.* 2012; 45: 1-5
- Invited review: Reliability of genomic predictions for North American Holstein bulls. *J. Dairy Sci.* 2009; 92: 16-24
- Bias in genomic predictions for populations under selection. *Genet. Res. (Camb.).* 2011; 93: 357-366
