## Abstract

A bias in the trend of genomic estimated breeding values (GEBV) was observed in the Danish Jersey population, where the trend of GEBV was smaller than that of the deregressed proofs for individuals in the validation population. This study attempted to improve the prediction reliability and reduce the bias of predicted genetic trend in Danish Jersey. The data consisted of 1,238 Danish Jersey bulls and 611,695 cows. All bulls were genotyped with the 54K chip, and 1,744 cows were genotyped with either 7K chips (1,157 individuals) or 54K chips (587 individuals). The trait used in the analysis was protein yield. All cows with EBV were used in a single-step approach. Deregressed proofs were used as the response variable. Four alternative approaches were compared with a genomic best linear unbiased prediction (GBLUP) model with bulls in the reference data (GBLUP_{Bull}): (1) GBLUP with both bulls and genotyped cows in the reference data; (2) GBLUP including a year of birth effect; (3) GEBV from a GBLUP model that accounted for the difference of EBV between dams and maternal grandsires; and (4) a single-step approach. The results indicated that all 4 alternatives could reduce the bias of predicted genetic trend and that the single-step approach performed best. However, not all these approaches improved reliability or reduced inflation of GEBV. The reliability was 0.30 and the regression coefficient of deregressed proofs on GEBV was 0.69 in the scenario GBLUP_{Bull}. When genotyped cows were included in the reference population, the regression coefficient decreased to 0.59 but the reliability increased to 0.35. If a year effect was included in the model, the prediction reliability decreased to 0.29 and the regression coefficient improved to 0.75. The method in which GEBV were adjusted for the difference between dam EBV and maternal grandsire EBV led to a much lower regression coefficient, though the reliability increased to 0.40. The single-step approach improved both the reliability (to 0.38) and the regression coefficient (to 0.78). Therefore, the bias in genetic trend was reduced. The results suggest that implementing the single-step approach is an effective way to improve genomic prediction in Danish Jersey cattle.

## Introduction

Genomic prediction has been widely used in dairy cattle since genome-wide dense marker chips became available. To obtain accurate prediction, a large reference population is needed (Goddard and Hayes, 2009; Hayes et al., 2009a). In dairy cattle, progeny-tested bulls are usually used to form the reference population. In some large populations, such as Holsteins, accurate prediction using genomic information has been obtained (VanRaden et al., 2009; Lund et al., 2011). For Danish Jerseys, obtaining a large reference population is challenging because only a limited number of progeny-tested bulls are available (Thomasen et al., 2012). One way to overcome this limitation is to add genotyped cows to the reference population. However, previous studies have reported an inflation of the genomic estimated breeding values (**GEBV**) when cows were included in the training set (Wiggans et al., 2011; Calus et al., 2013), because the genotyped cows are usually elite and possibly receive preferential treatment. Another strategy is to make use of the phenotypic information from nongenotyped animals. A popular approach is to apply a single-step model, which estimates genomic breeding values using the information of genotyped and nongenotyped individuals simultaneously by integrating the marker- and pedigree-based relationship matrices into a joint relationship matrix (Misztal et al., 2009; Christensen and Lund, 2010; Aguilar et al., 2010).

A bias of predicted genetic trends has been observed in the Nordic routine genomic evaluation of Danish Jerseys. Bias of predicted genetic trends was defined as the annual deviation of GEBV from the deregressed proofs (**DRP**) of the animals in the test population. Bias of predicted genetic trends may lead to an unfair comparison of animals across birth years. The bias could be caused by a discrepancy between the assumptions of the genomic prediction models and the selection histories of the practical populations (Vitezica et al., 2011). The genomic prediction models assume that the population used for genomic prediction has not been subject to selection (Hayes et al., 2009b). However, in practice, the genotyped populations usually consist of selected animals such as progeny-tested bulls and elite cows. The single-step approach accounts for the selection by including all records in the model. Therefore, this approach is expected to minimize the bias. Another possible solution to reduce the bias is to add a year of birth effect in the model, which may lead to a robust estimation of genetic trend (Ducrocq, 2010). In this way, the genetic progress on the maternal side could be taken into account by the year trend. Similarly, adjusting GEBV for the difference between EBV of dam and maternal grandsire (**MGS**) may reduce bias of predicted genetic trend.

The first objective of our study was to investigate the prediction reliability and bias of predicted genetic trend in Danish Jersey. The second objective was to increase prediction reliability and reduce bias of predicted genetic trend using various strategies: adding genotyped cows to the reference population, including a year effect in the prediction model, accounting for the difference of EBV between dam and MGS, and applying a single-step approach.

## Materials and Methods

### Data

Danish Jersey data were used in our study. There were 2,982 genotyped individuals comprising 1,238 bulls born between 1981 and 2009 and 1,744 cows born between 2000 and 2011, with most of the cows (1,733) born after 2004. Most cows (1,157) were randomly selected from a few herds, whereas the others (587) were selected as potential bull dams by individual farms according to their own breeding schemes. The DRP of protein used in different scenarios were calculated from EBV of the genetic evaluation in November 2013. When using the single-step approach, all cows with EBV for protein were used in the analysis. After tracing the pedigree to as many generations as possible for the cows with EBV and bulls with genotypes, the pedigree used for single-step prediction included 819,988 individuals. The DRP for all cows were calculated using Mix99 (Lidauer and Strandén, 1999; Strandén and Mäntysaari, 2010); it required that the cows had an effective record contribution (**ERC**) larger than 0.1. This reduced the number of cows with DRP to 611,695. Cows that were daughters of the test bulls (described later) were excluded. After filtering, the number of cows with DRP used in the single-step approach was 577,405.

The bulls were genotyped with the Illumina BovineSNP50 BeadChip (**54K**; Illumina, San Diego, CA), which includes 54,001 SNP. Bull dams (587) were genotyped with 54K chips. Randomly selected cows (1,157) were genotyped with the Illumina BovineLD BeadChip (**LD**), which includes 6,909 SNP. The LD data were imputed to 54K with Beagle (Browning and Browning, 2009) using the 54K genotyped animals as the imputation reference population. The markers used for prediction were from 29 autosomes. The genotypes for genomic prediction were edited by deleting the markers with minor allele frequency less than 0.01 and the markers in complete linkage disequilibrium ($r^{2}=1$) with the previous marker. After editing, 38,967 markers were used for genomic prediction.

### Methods

To validate the prediction accuracy and unbiasedness, the Jersey bulls were divided into reference and test sets using a cut-off date of birth of January 1, 2005. The bulls born after this date were used as validation animals (208 bulls). Thus, in the scenario using only bull reference data, 1,030 bulls were used as reference population.
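The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout, the animal IDs, and the function name `split_by_birth_date` are hypothetical, and the handling of a bull born exactly on the cut-off date (assigned to the reference set here) is an assumption, because the text only states that bulls born after the date were validation animals.

```python
from datetime import date

CUTOFF = date(2005, 1, 1)  # cut-off birth date stated in the text


def split_by_birth_date(bulls, cutoff=CUTOFF):
    """Return (reference, test); bulls born after the cut-off form the test set.

    Assumption: a bull born exactly on the cut-off date goes to the reference set.
    """
    reference = [b for b in bulls if b["birth_date"] <= cutoff]
    test = [b for b in bulls if b["birth_date"] > cutoff]
    return reference, test


# Toy records (hypothetical IDs and dates, not data from the study)
bulls = [
    {"id": "bull_A", "birth_date": date(1998, 6, 12)},
    {"id": "bull_B", "birth_date": date(2004, 12, 31)},
    {"id": "bull_C", "birth_date": date(2005, 1, 2)},
    {"id": "bull_D", "birth_date": date(2008, 3, 30)},
]
reference, test = split_by_birth_date(bulls)
```

Splitting by birth date rather than at random mimics the practical situation in which young candidates are predicted from older, progeny-tested animals.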

Besides the genomic BLUP model (**GBLUP**) with bulls in the reference data (**GBLUP**_{Bull}), 5 alternative approaches were used in our study. Approach 1 used pedigree relationships to weight the genomic relationship (**GBLUPW**_{Bull}). Approach 2 was the GBLUP model with both bulls and genotyped cows in the reference set (**GBLUP**_{Cow}), in which 25 cows were dams of test bulls. Approach 3 included a year of birth effect in the GBLUP model (**GBLUP**_{Year}) to account for the part of genetic trend that is not accounted for by SNP markers. Approach 4 was to adjust GEBV using the difference of EBV between dams and maternal grandsires (**GBLUP**_{Dam_mgs}). Approach 5 was a single-step method to integrate the information of genotyped and nongenotyped animals for genomic prediction. Two scenarios of this approach were investigated: prediction using cow genotypes (**SS**_{PG}) or without using cow genotypes (**SS**_{P}).

The numbers of individuals used in the reference population and test population in different scenarios are shown in Table 1.

Table 1. The number of individuals in each scenario^{1}

Item | GBLUP_{Bull}/GBLUPW_{Bull}/GBLUP_{Year}/GBLUP_{Dam_mgs} | GBLUP_{Cow} | SS_{P}: no. of genotyped animals | SS_{P}: no. of phenotyped animals | SS_{PG}: no. of genotyped animals | SS_{PG}: no. of phenotyped animals |
---|---|---|---|---|---|---|
Reference set | 1,030 | 2,774 | 1,030 | 577,405 | 2,774 | 577,405 |
Test set | 208 | 208 | 208 | 208 | 208 | 208 |

^{1}GBLUP_{Bull} = genomic BLUP model with bulls as reference population; GBLUPW_{Bull} = same as GBLUP_{Bull} but with a genomic relationship matrix **G**_{ω} = 0.8**G** + 0.2**A**, where **G** is a genomic relationship matrix and **A** is the pedigree relationship matrix; GBLUP_{Cow} = GBLUP model with both genotyped bulls and cows as reference population; GBLUP_{Year} = year effects were included in the model as genetic trend; GBLUP_{Dam_mgs} = genomic EBV from the GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire (mgs) EBV; SS_{P} = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SS_{PG} = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows.

### Statistical Models

The statistical models used in different scenarios are described below.

#### GBLUP

The GBLUP model was

$\mathbf{y}=\mathbf{1}\mu +\mathbf{Z}g+\mathbf{e},$

where **y** is a vector of DRP of animals in the reference population; μ is the overall mean; *g* is the vector of direct genomic values; **Z** is the design matrix linking *g* to **y**; and **e** is a vector of random residuals. Random effects were assumed to be distributed as $g\sim N\left(0,\mathbf{G}_{\omega}\sigma_{g}^{2}\right)$ and $\mathbf{e}\sim N\left(0,\mathbf{D}\sigma_{e}^{2}\right),$ where $\sigma_{g}^{2}$ is the additive genetic variance, **G**_{ω} is the genomic relationship matrix, $\sigma_{e}^{2}$ is the residual variance, and **D** is a diagonal matrix with elements $d_{ii}=\left(1-r_{\text{DRP}}^{2}\right)/r_{\text{DRP}}^{2},$ in which $r_{\text{DRP}}^{2}$ is the reliability of DRP. The GEBV was calculated as $\text{GEBV}=\hat{\mu}+\hat{g}.$ The genomic relationship matrix, **G**_{ω}, is defined as

$\mathbf{G}_{\omega}=\omega \mathbf{A}+\left(1-\omega \right)\mathbf{G},$

where **G** is the genomic relationship matrix described in VanRaden (2008), and **A** is the pedigree relationship matrix. In scenarios GBLUP_{Bull} and GBLUP_{Cow}, ω = 0. In scenario GBLUPW_{Bull}, ω = 0.2.

#### GBLUP with Year Effect

When the year effect was included in the GBLUP model, the model was

$\mathbf{y}=\mathbf{1}\mu +b\mathbf{X}+\mathbf{Z}g+\mathbf{e},$

where *b* is the regression coefficient of **y** on birth year, and **X** is a vector of birth years, treated as a continuous covariate in this model. The GEBV from GBLUP_{Year} was calculated as $\text{GEBV}=\hat{\mu}+\hat{b}\times \text{year}+\hat{g}.$

#### Adjusting GEBV for the Difference of EBV Between Dams and MGS

In traditional genetic evaluations for an individual without own or offspring records, when both sire EBV and dam EBV are available, the EBV for the individual is

$\text{EBV}_{o}=\frac{1}{2}\text{EBV}_{\text{sire}}+\frac{1}{2}\text{EBV}_{\text{dam}}.$

When only the bulls’ EBV (sire EBV and maternal grandsire EBV) are available, the EBV for the individual is

$\text{EBV}_{o}=\frac{1}{2}\text{EBV}_{\text{sire}}+\frac{1}{4}\text{EBV}_{\text{mgs}},$

where the dam EBV is assumed to equal the average EBV of all daughters of the maternal grandsire, that is, half of the maternal grandsire EBV. This assumption does not hold because bull dams have high EBV as a result of selection. The difference between EBV_{mgs} and EBV_{dam} may cause an underestimation of GEBV of candidates when dams are absent from the reference population. To reduce the influence of this difference, the GEBV for the validation animals were corrected by adding a value of $\frac{1}{2}\left(\text{EBV}_{\text{dam}}-\frac{1}{2}\text{EBV}_{\text{mgs}}\right).$

#### Single-Step Model

The single-step model was as follows:

$\mathbf{y}=\mathbf{1}\mu +\mathbf{Za}+\mathbf{e},$

where **y** is the vector of DRP of all the cows with EBV in the whole population, **a** is a vector of additive genetic effects, and **Z** is the design matrix for additive genetic effects. Random effects were assumed to be normally distributed, $\mathbf{a}\sim N\left(0,\mathbf{H}\sigma_{a}^{2}\right)$ and $\mathbf{e}\sim N\left(0,\mathbf{D}\sigma_{e}^{2}\right),$ where $\sigma_{a}^{2}$ is the additive genetic variance and **H** is the relationship matrix of all the individuals as defined below. Here the reliability of DRP $\left(r_{\text{DRP}}^{2}\right)$ was ERC/(ERC + λ), where λ = (1 − h^{2})/h^{2}.

Following Legarra et al. (2009), Aguilar et al. (2010), and Christensen and Lund (2010),

$\mathbf{H}=\left[\begin{array}{cc}\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}_{\omega}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}+\mathbf{A}_{11}-\mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}_{\omega} \\ \mathbf{G}_{\omega}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G}_{\omega}\end{array}\right],$

where **A** is the pedigree relationship matrix, which can be partitioned as $\mathbf{A}=\left[\begin{array}{cc}\mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{array}\right]$ with subscript 1 for nongenotyped individuals and 2 for genotyped individuals, and **G**_{ω} = (1 − ω)**G** + ω**A**_{22}. In our study, the **G** matrix was adjusted for the differences in location and scale relative to the pedigree-based relationship matrix (**A**_{22}) using the method proposed by Christensen et al. (2012). Furthermore, ω was set to 0.2 according to the study by Gao et al. (2012).

The inverse of **H** (Aguilar et al., 2010; Christensen and Lund, 2010) was

$\mathbf{H}^{-1}=\mathbf{A}^{-1}+\left[\begin{array}{cc}0 & 0 \\ 0 & \mathbf{G}_{\omega}^{-1}-\mathbf{A}_{22}^{-1}\end{array}\right].$

GEBV was calculated as $\text{GEBV}=\hat{\mu}+\hat{a}.$ In all the models, the DMU package (Madsen et al., 2010) was used to estimate variance components and predict breeding values.

### Validation of Predictions

The reliability of predictions was calculated as the squared correlation between GEBV and DRP divided by the average reliability of the DRP in the test set. The bias was investigated by the regression coefficient and intercept of DRP corrected with model mean on estimated genetic effects (the year trend was added to the direct genomic values in scenario GBLUP_{Year}) and predicted genetic trend. Bias of predicted genetic trends was assessed by comparing the year mean of GEBV with the year mean of DRP for test individuals.

## Results

Descriptive statistics of DRP in different data sets are shown in Table 2. The mean DRP differed because individuals in different data sets were born in different periods. The average DRP of genotyped bulls was lower than the average DRP of the genotyped cows, whereas it was higher than the average of all the cows used in the single-step approach. This pattern reflects genetic progress over years due to selection, together with the fact that the genotyped cows were born in recent years.

Table 2. Mean and SD of deregressed proofs (DRP) and reliability (R^{2}_{DRP}) of DRP for protein in different data sets

Trait | Genotyped bulls: mean | Genotyped bulls: SD | Genotyped cows: mean | Genotyped cows: SD | Cows used in single-step approach: mean | Cows used in single-step approach: SD |
---|---|---|---|---|---|---|
Protein | 89.29 | 13.95 | 106.62 | 18.70 | 78.89 | 29.09 |
R^{2}_{DRP} | 0.92 | 0.04 | 0.44 | 0.07 | 0.36 | 0.06 |

The number of test individuals in each year varied from 43 to 55, except for year 2009, in which there were only 16 test individuals (Table 3). The mean of the DRP in each year varied from 103.06 to 109.08, and the standard deviation varied from 6.56 to 8.88.

Table 3. The number of individuals and mean and SD of deregressed proofs in each year in the test set

Year | 2005 | 2006 | 2007 | 2008 | 2009 |
---|---|---|---|---|---|
No. | 46 | 48 | 55 | 43 | 16 |
Mean | 103.06 | 103.06 | 104.15 | 105.73 | 109.08 |
SD | 7.09 | 7.28 | 8.42 | 8.88 | 6.56 |

The reliabilities of GEBV, as well as the regression coefficients and intercepts of DRP on GEBV for different scenarios, are shown in Table 4. The reliabilities ranged from 0.29 to 0.40 in different scenarios. The reliability of GEBV from the basic model (GBLUP_{Bull}) was 0.30. In scenario GBLUP_{Cow}, the reliability of GEBV increased. The single-step scenarios gained a large increase in reliability regardless of whether cow genotypes were included. The reliability was 0.38 for scenario SS_{PG}, whereas it was 0.36 for scenario SS_{P}. The highest reliability (0.40) was achieved in the scenario GBLUP_{Dam_mgs}. The reliability increased by 1 percentage point in the scenario GBLUP_{Year}. However, in scenario GBLUPW_{Bull}, the reliability decreased by 1 percentage point. The regression coefficients varied from 0.58 to 0.78 in different scenarios. The regression coefficient from the basic model (GBLUP_{Bull}) was 0.69. The regression coefficients increased to 0.72, 0.75, 0.74, and 0.78 in scenarios GBLUPW_{Bull}, GBLUP_{Year}, SS_{PG}, and SS_{P}, respectively. The regression coefficient in scenario GBLUP_{Cow} was 0.09 lower than in the scenario GBLUP_{Bull}. The regression coefficient was the lowest in the scenario GBLUP_{Dam_mgs}. The intercepts in all scenarios were much larger than 0, which indicated that the mean of GEBV was lower than the mean of DRP.

Table 4. Reliabilities (R^{2}_{GEBV}; Rel.) of genomic EBV (GEBV), regression coefficient (Reg. coef.), and intercept (Int.) of deregressed proofs on GEBV of test individuals in different scenarios^{1}

Item | GBLUP_{Bull} | GBLUPW_{Bull} | GBLUP_{Cow} | GBLUP_{Year} | GBLUP_{Dam_mgs} | SS_{P} | SS_{PG} |
---|---|---|---|---|---|---|---|
Rel. | 0.30 | 0.29 | 0.35 | 0.31 | 0.40 | 0.36 | 0.38 |
Reg. coef. | 0.69 | 0.72 | 0.60 | 0.75 | 0.58 | 0.78 | 0.74 |
Int. | 6.00 | 6.55 | 7.97 | 5.90 | 3.79 | 5.88 | 6.92 |

^{1}GBLUP_{Bull} = genomic BLUP model with bulls as reference population; GBLUPW_{Bull} = same as GBLUP_{Bull} but with a genomic relationship matrix **G**_{ω} = 0.8**G** + 0.2**A**, where **G** is the genomic relationship matrix and **A** is the pedigree relationship matrix; GBLUP_{Cow} = GBLUP model with both genotyped bulls and cows as reference population; GBLUP_{Year} = year effects were included in the model as genetic trend; GBLUP_{Dam_mgs} = GEBV from the GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire EBV; SS_{P} = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SS_{PG} = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows.

The trends of GEBV and DRP for genotyped bulls are shown in Figure 1. Bias of predicted genetic trend was observed in the scenario GBLUP_{Bull}. The difference between DRP and GEBV from GBLUP_{Bull} was around 5, which was statistically significant. Compared with scenario GBLUP_{Bull}, all the alternative approaches except GBLUPW_{Bull} reduced bias of predicted genetic trend to some extent. Bias of predicted genetic trend was partly corrected in the scenario GBLUP_{Cow}. The scenarios SS_{P} and SS_{PG} greatly reduced bias of predicted genetic trend. Scenario GBLUP_{Dam_mgs} also reduced bias of predicted genetic trend. Bias of predicted genetic trend was reduced slightly in scenario GBLUP_{Year}. Figure 2 shows boxplots of GEBV − DRP for each scenario in each birth year.

## Discussion

Our study investigated strategies to improve the prediction reliability and reduce bias of predicted genetic trend observed in the Danish Jersey population. Several strategies were tested; that is, including cows in the reference, including a year of birth effect in the prediction model, adjusting GEBV with the difference between dam EBV and MGS EBV, and using a single-step approach. The results showed that these strategies could reduce bias of predicted genetic trend to some extent. However, the prediction reliability and regression coefficients did not consistently improve in parallel with the reduction in the bias of predicted genetic trend.

The regression coefficients in different scenarios were smaller than 1 in our study. One possible reason could be that markers were not in complete linkage disequilibrium with causal genes, and thus could not fully account for the total genetic variance. Another reason could be that the data used in the analysis were not a random sample, but selected data.

The reliability of prediction for protein improved when the genotyped cows were included in the reference population, because this clearly enlarged the reference population, the size of which is the most important factor affecting prediction reliability (Goddard and Hayes, 2009). However, we observed that GEBV were more inflated when both genotyped cows and genotyped bulls were used as the reference population. Inflation may be caused by preferential treatment of cows included in the reference population (Kuhn et al., 1994; Wiggans et al., 2011). The results from our study were consistent with those reported by Wiggans et al. (2011); in their study, the regression coefficient of DRP on GEBV for protein decreased from 0.86 to 0.83 when cows were added to the reference population. On the other hand, bias of predicted genetic trend was reduced when genotyped cows were included in the reference population. The reason could be that the cows that were bull dams and sibs of test bulls may account for the contribution of the test bulls’ dams to the bulls.

Previous studies reported that including a polygenic effect in a SNP-BLUP model or Bayesian model led to less inflation of GEBV (Solberg et al., 2009; Liu et al., 2011; Su et al., 2014). The regression coefficient improved from 0.69 to 0.72 in the current study when the polygenic effect was included in the model (GBLUPW_{Bull}). However, the bias of predicted genetic trend was not reduced compared with the model without a polygenic effect.

The single-step model used cows’ deregressed EBV as the response variable rather than the raw phenotypic data. However, the effect of genomic preselection is minor in the current Jersey data, and the deregressed cow EBV should therefore not be biased. Consequently, the results could be considered equivalent to single-step prediction using raw data. If genomic preselection is used in breeding schemes, however, the EBV estimated using pedigree will be biased. In that case, it is better to use raw data as the response variable in the single-step approach. The single-step prediction, which used all the females’ DRP and pedigree as well as genotypes from genotyped bulls and cows, increased the reliability and reduced inflation of GEBV and bias of predicted genetic trend. These results were consistent with previous reports (Vitezica et al., 2011; Koivula et al., 2012; Su et al., 2012b). Because DRP of nongenotyped animals also contribute to the prediction through the combined relationship matrix, the prediction reliability was improved. Moreover, single-step models could reduce bias of predicted genetic trend by including all the records to trace selection (Vitezica et al., 2011). Similar to the GBLUP model including genotyped cows in the reference data, the regression coefficient decreased when the cow genotypes were included in the single-step approach. Selection index blending (VanRaden et al., 2009; Su et al., 2012b), with the same information as used in the single-step approach without cow genotype data, was compared with the single-step approach in our study (data not shown). The prediction reliability was 0.32, which was higher than the reliability of GEBV directly from GBLUP_{Bull} but lower than that of scenario SS_{P}, even though the information used in these 2 methods was the same. When the blending index was used, the bias of predicted genetic trend was corrected for individuals born in 2005 and 2006, but not for individuals born after 2006. The genomic relationship matrix was modified with the pedigree relationship matrix in the single-step approach. Because the pedigree relationship influences the regression coefficient, the scenario GBLUPW_{Bull} was investigated for consistency. The results from GBLUPW_{Bull} showed that a GBLUP model with 20% weight on the pedigree relationship matrix neither increased the prediction reliability nor reduced bias of predicted genetic trend. However, the regression coefficients were improved by the weighted **G** matrix (from 0.69 for GBLUP_{Bull} to 0.72 for GBLUPW_{Bull}) and by the single-step approach (from 0.72 for GBLUPW_{Bull} to 0.78 for SS_{P}). These results suggest that using a single-step method is an effective approach to increase the prediction reliability and reduce the bias of predicted genetic trend.

Including the year of birth effect reduced the bias of predicted genetic trend and improved the regression coefficient. The reason could be that the year effect partly accounted for the trend of selection among the dams. The GEBV together with the year effect captured the genetic progress across years, which led to a robust estimation of genetic trend (Ducrocq, 2010).

The mean of GEBV adjusted for the difference between dam EBV and MGS EBV was much closer to the mean of DRP in the test population than the mean of GEBV without adjustment. The reliability was improved greatly, which may have been caused by a possible autocorrelation between dam EBV and the progeny DRP. However, the regression coefficient deviated more from unity, which may have been caused by the preferential treatment of selected cows. Bias of predicted genetic trend was thus corrected, but in the form of a large inflation of the GEBV. Therefore, this adjustment is not a good approach to correct for bias of predicted genetic trend.
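The adjustment discussed in the preceding paragraph adds ½(EBV_{dam} − ½EBV_{mgs}) to each candidate's GEBV. A minimal Python sketch of that arithmetic follows; the function name `adjust_gebv`, the candidate labels, and all numeric values are hypothetical, and the values are assumed to be expressed on one common scale.

```python
def adjust_gebv(gebv, ebv_dam, ebv_mgs):
    """Add 1/2 * (EBV_dam - 1/2 * EBV_mgs) to a candidate's GEBV."""
    return gebv + 0.5 * (ebv_dam - 0.5 * ebv_mgs)


# Hypothetical candidates: (GEBV, dam EBV, MGS EBV), all on one common scale
candidates = {
    "cand_1": (10.0, 12.0, 8.0),  # dam EBV well above half the MGS EBV
    "cand_2": (10.0, 4.0, 8.0),   # dam EBV exactly half the MGS EBV
}
adjusted = {name: adjust_gebv(*values) for name, values in candidates.items()}
```

In this toy case the first candidate's GEBV is shifted upward, whereas the second is unchanged because its dam EBV equals the value implied by the maternal grandsire. The sketch shows only the correction itself; it does not reproduce the inflation observed in the study.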

The results from the current study indicate that the regression coefficient, which has mainly been used in previous studies (Verbyla et al., 2009; Su et al., 2012a), should not be the only criterion to measure the unbiasedness of predictions. The regression coefficient is not always consistent with the bias of predicted genetic trends. As the prediction trend is important when individuals across generations are compared, it should also be included in the evaluation criteria. The year mean of DRP could be expressed as the year mean of GEBV times the regression coefficient plus the intercept. Therefore, the bias of predicted genetic trend could be predicted using the regression coefficient and intercept, and the intercept together with the regression coefficient should be given attention in genomic prediction.
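The validation criteria discussed here can be illustrated with a short Python sketch, offered as an illustration under stated assumptions rather than as the procedure used in the study. It computes the reliability (squared correlation divided by the mean DRP reliability), the slope and intercept of DRP on GEBV, and the per-year trend bias, and then compares the observed bias with the bias implied by the regression line, a + (b − 1) × year mean of GEBV. All function names and numbers are hypothetical. As a simplification, DRP is regressed on GEBV directly, without the model-mean correction described in Materials and Methods. The toy residuals were chosen to average zero within each year, so the two bias measures agree exactly here; with real data the agreement is only approximate.

```python
from statistics import mean


def ols(y, x):
    """Least-squares regression of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def reliability(drp, gebv, rel_drp):
    """Squared correlation of GEBV and DRP divided by the mean DRP reliability."""
    mg, md = mean(gebv), mean(drp)
    sgg = sum((g - mg) ** 2 for g in gebv)
    sdd = sum((d - md) ** 2 for d in drp)
    sgd = sum((g - mg) * (d - md) for g, d in zip(gebv, drp))
    return (sgd ** 2 / (sgg * sdd)) / mean(rel_drp)


def year_means(values, years):
    """Mean of `values` within each birth year."""
    return {yr: mean(v for v, y in zip(values, years) if y == yr)
            for yr in sorted(set(years))}


# Toy validation set (hypothetical numbers, not data from the study)
years = [2005, 2005, 2006, 2006, 2007, 2007]
gebv = [94.0, 102.0, 97.0, 105.0, 100.0, 108.0]
drp = [102.5, 104.5, 104.75, 106.75, 101.0, 115.0]
rel_drp = [0.90, 0.92, 0.91, 0.93, 0.92, 0.94]

slope, intercept = ols(drp, gebv)
rel = reliability(drp, gebv, rel_drp)
gebv_by_year = year_means(gebv, years)
drp_by_year = year_means(drp, years)

# Observed trend bias: year mean of DRP minus year mean of GEBV
observed_bias = {yr: drp_by_year[yr] - gebv_by_year[yr] for yr in gebv_by_year}
# Bias implied by the regression line: a + (b - 1) * year mean of GEBV
implied_bias = {yr: intercept + (slope - 1.0) * m for yr, m in gebv_by_year.items()}

for yr in observed_bias:
    print(yr, round(observed_bias[yr], 2), round(implied_bias[yr], 2))
```

With a slope below 1 and a positive intercept, the implied bias shrinks as the year mean of GEBV rises, which is why the slope and the intercept need to be read together when judging trend bias.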

## Conclusions

The main reason for the bias of predicted genetic trend could be that the reference animals did not have all the information required to trace selection, especially the information of dams. Consequently, methods using more information related to selection can reduce the bias. The most efficient way is to implement a single-step approach for genomic prediction, as the single-step approach increased the prediction reliability, improved the regression coefficients, and led to an unbiased prediction trend. As bias of predicted genetic trends can be measured by the intercept and regression coefficient of observations on GEBV, both intercept and regression coefficients should be taken into consideration in validation of genomic predictions.

## Acknowledgments

This work was performed within the project “Genomics in herds,” funded by VikingGenetics (Randers, Denmark) and Nordic Cattle Genetic Evaluation (Aarhus, Denmark).

## References

- Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. *J. Dairy Sci.* 2010; 93 (http://dx.doi.org/10.3168/jds.2009-2730): 743-752
- A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. *Am. J. Hum. Genet.* 2009; 84 (http://dx.doi.org/10.1016/j.ajhg.2009.01.005): 210-223
- Combining cow and bull reference populations to increase accuracy of genomic prediction and genome-wide association studies. *J. Dairy Sci.* 2013; 96 (http://dx.doi.org/10.3168/jds.2012-6013): 6703-6715
- Genomic prediction when some animals are not genotyped. *Genet. Sel. Evol.* 2010; 42 (http://dx.doi.org/10.1186/1297-9686-42-2): 2
- Single-step methods for genomic evaluation in pigs. *Animal.* 2012; 6 (http://dx.doi.org/10.1017/S1751731112000742): 1565-1571
- Ducrocq, V. 2010. Sustainable dairy cattle breeding: Illusion or reality. In Proc. 9th World Congr. Genet. Appl. Livest. Prod.
- Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population. *Genet. Sel. Evol.* 2012; 44 (http://dx.doi.org/10.1186/1297-9686-44-8): 8
- Mapping genes for complex traits in domestic animals and their use in breeding programmes. *Nat. Rev. Genet.* 2009; 10 (http://dx.doi.org/10.1038/nrg2575): 381-391
- Invited review: Genomic selection in dairy cattle: progress and challenges. *J. Dairy Sci.* 2009a; 92 (http://dx.doi.org/10.3168/jds.2008-1646): 433-443
- Increased accuracy of artificial selection by using the realized relationship matrix. *Genet. Res. (Camb.).* 2009b; 91 (http://dx.doi.org/10.1017/S0016672308009981): 47-60
- Single step genomic evaluations for the Nordic Red Dairy cattle test day data. *Interbull Bull.* 2012; 46: 115-120
- Potential biases in predicted transmitting abilities of females from preferential treatment. *J. Dairy Sci.* 1994; 77 (http://dx.doi.org/10.3168/jds.S0022-0302(94)77185-X): 2428-2437
- A relationship matrix including full pedigree and genomic information. *J. Dairy Sci.* 2009; 92 (http://dx.doi.org/10.3168/jds.2009-2061): 4656-4663
- Fast and flexible program for genetic evaluation in dairy cattle. *Interbull Bull.* 1999; 20: 19-24
- Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. *Genet. Sel. Evol.* 2011; 43 (http://dx.doi.org/10.1186/1297-9686-43-19): 19
- A common reference population from four European Holstein populations increases reliability of genomic predictions. *Genet. Sel. Evol.* 2011; 43 (http://dx.doi.org/10.1186/1297-9686-43-43): 43
- DMU-A package for analyzing multivariate mixed models. In: Proc. 9th World Congr. Genet. Appl. Livest. Prod., Leipzig, Germany. Gesellschaft für Tierzuchtwissenschaft e.V., Bonn, Germany. 2010: 732
- Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. *J. Dairy Sci.* 2009; 92 (http://dx.doi.org/10.3168/jds.2009-2064): 4648-4655
- Persistence of accuracy of genome-wide breeding values over generations when including a polygenic effect. *Genet. Sel. Evol.* 2009; 41 (http://dx.doi.org/10.1186/1297-9686-41-53): 53
- A recipe for multiple trait deregression. *Interbull Bull.* 2010; 42: 21-24
- Comparison of genomic predictions using medium-density (~54,000) and high-density (~777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations. *J. Dairy Sci.* 2012a; 95 (http://dx.doi.org/10.3168/jds.2012-5379): 4657-4665
- Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances. *J. Dairy Sci.* 2014; 97 (http://dx.doi.org/10.3168/jds.2014-8210): 6547-6559
- Genomic prediction for Nordic Red Cattle using one-step and selection index blending. *J. Dairy Sci.* 2012b; 95 (http://dx.doi.org/10.3168/jds.2011-4804): 909-917
- Reliabilities of genomic estimated breeding values in Danish Jersey. *Animal.* 2012; 6 (http://dx.doi.org/10.1017/S1751731111002035): 789-796
- Efficient methods to compute genomic predictions. *J. Dairy Sci.* 2008; 91 (http://dx.doi.org/10.3168/jds.2007-0980): 4414-4423
- Invited review: Reliability of genomic predictions for North American Holstein bulls. *J. Dairy Sci.* 2009; 92 (http://dx.doi.org/10.3168/jds.2008-1514): 16-24
- Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. *Genet. Res. (Camb.).* 2009; 91 (http://dx.doi.org/10.1017/S0016672309990243): 307-311
- Bias in genomic predictions for populations under selection. *Genet. Res. (Camb.).* 2011; 93 (http://dx.doi.org/10.1017/S001667231100022X): 357-366
- Technical note: Adjustment of traditional cow evaluations to improve accuracy of genomic predictions. *J. Dairy Sci.* 2011; 94 (http://dx.doi.org/10.3168/jds.2011-4481): 6188-6193

## Article info

### Publication history

Published online: September 30, 2015

Accepted: August 14, 2015

Received: April 13, 2015

### Copyright

© 2015 American Dairy Science Association®.

### User license

Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)
