A bias in the trend of genomic estimated breeding values (GEBV) was observed in the Danish Jersey population, where the trend of GEBV was smaller than that of the deregressed proofs for individuals in the validation population. This study attempted to improve the prediction reliability and reduce the bias of predicted genetic trend in Danish Jersey. The data consisted of 1,238 Danish Jersey bulls and 611,695 cows. All bulls were genotyped with the 54K chip, and 1,744 cows were genotyped with either 7K chips (1,157 individuals) or 54K chips (587 individuals). The trait used in the analysis was protein yield. All cows with EBV were used in a single-step approach. Deregressed proofs were used as the response variable. Four alternative approaches were compared with a genomic best linear unbiased prediction (GBLUP) model with bulls in the reference data (GBLUPBull): (1) GBLUP with both bulls and genotyped cows in the reference data; (2) GBLUP including a year of birth effect; (3) GEBV from a GBLUP model that accounted for the difference of EBV between dams and maternal grandsires; and (4) a single-step approach. The results indicated that all 4 alternatives could reduce the bias of predicted genetic trend and that the single-step approach performed best. However, not all these approaches improved reliability or reduced inflation of GEBV. The reliability was 0.30 and the regression coefficient of deregressed proofs on GEBV was 0.69 in the scenario GBLUPBull. When genotyped cows were included in the reference population, the regression coefficient decreased to 0.60 but the reliability increased to 0.35. If a year effect was included in the model, the prediction reliability was 0.31 and the regression coefficient improved to 0.75. The method in which GEBV were adjusted for the difference between dam EBV and maternal grandsire EBV led to a much lower regression coefficient, although the reliability increased to 0.40. The single-step approach improved both the reliability (up to 0.38) and the regression coefficient (up to 0.78). Therefore, the bias in genetic trend was reduced. The results suggest that implementing the single-step approach is an effective way to improve genomic prediction in Danish Jersey cattle.
Genomic prediction has been widely used in dairy cattle since genome-wide dense marker chips became available. To obtain accurate prediction, a large reference population is needed. In dairy cattle, progeny-tested bulls are usually used to form the reference population. In some large populations, such as Holsteins, accurate prediction using genomic information has been obtained. One way to overcome the limitation of a small reference population is to add genotyped cows to it. However, previous studies have reported an inflation of the genomic estimated breeding values (GEBV) when cows were included in the training set, because the genotyped cows are usually elite and possibly receive preferential treatment. Another strategy is to make use of the phenotypic information from nongenotyped animals. A popular approach is to apply a single-step model, which estimates genomic breeding values using the information of genotyped and nongenotyped individuals simultaneously by integrating the marker- and pedigree-based relationship matrices into a joint relationship matrix.
Nordic routine genomic genetic evaluation has observed a bias of predicted genetic trends in Danish Jerseys. Bias of predicted genetic trends was defined as the annual deviation of GEBV from the deregressed proofs (DRP) of the animals in the test population. Bias of predicted genetic trends may lead to an unfair comparison of animals across birth years. The bias could be caused by a discrepancy between assumptions of the genomic prediction models and the selection histories of the practical populations. However, in practice, the genotyped populations usually consist of selected animals such as progeny-tested bulls and elite cows. The single-step approach accounts for the selection by including all records in the model. Therefore, this approach is expected to minimize the bias. Another possible solution to reduce the bias is to add a year of birth effect in the model, which may lead to a robust estimation of genetic trend. Therefore, the genetic progress on the maternal side could be taken into account by the year trend. Similarly, adjusting GEBV for the difference between EBV of dam and maternal grandsire (MGS) may reduce bias of predicted genetic trend.
The first objective of our study was to investigate the prediction reliability and bias of predicted genetic trend in Danish Jersey. The second objective was to increase prediction reliability and reduce bias of predicted genetic trend using various strategies: adding genotyped cows to the reference population, including a year effect in the prediction model, accounting for the difference of EBV between dam and MGS, and applying a single-step approach.
Materials and Methods
Data
Danish Jersey data were used in our study. There were 2,982 genotyped individuals comprising 1,238 bulls born between 1981 and 2009 and 1,744 cows born between 2000 and 2011, with most of them (1,733) born after 2004. Most cows (1,157) were randomly selected from a few herds, whereas the others (587) were selected as potential bull dams by individual farms according to their own breeding schemes. The DRP of protein used in different scenarios were calculated from EBV of the genetic evaluation in November 2013. When using the single-step approach, all cows with EBV for protein were used in the analysis. After tracing the pedigree back as many generations as possible for the cows with EBV and bulls with genotypes, the pedigree used for single-step prediction included 819,988 individuals. The DRP for all cows were calculated using Mix99; it required that the cows had an effective record contribution (ERC) larger than 0.1. This reduced the number of cows with DRP to 611,695. Cows that were daughters of the test bulls (described later) were excluded. After filtering, the number of cows with DRP used in the single-step approach was 577,405.
The bulls were genotyped with the Illumina BovineSNP50 BeadChip (54K; Illumina, San Diego, CA), which includes 54,001 SNP. Bull dams (587) were genotyped with 54K chips. Randomly selected cows (1,157) were genotyped with the Illumina BovineLD BeadChip (LD), which includes 6,909 SNP. The LD data were imputed to 54K with Beagle using the 54K genotyped animals as the imputation reference population. The markers used for prediction were from the 29 autosomes. The genotypes for genomic prediction were edited by deleting markers with minor allele frequency less than 0.01 and markers in complete linkage disequilibrium (r² = 1) with the previous marker. After editing, 38,967 markers were used for genomic prediction.
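The marker editing described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the function name `edit_markers`, the 0/1/2 genotype coding, and the choice to compare each marker with the previous retained marker (rather than the previous marker on the map) are assumptions.

```python
import numpy as np

def edit_markers(genotypes, maf_min=0.01):
    """Return indices of markers kept after MAF and complete-LD editing.

    genotypes: (n_animals, n_markers) array coded 0/1/2, markers in map order
    (hypothetical input layout, not taken from the paper).
    """
    geno = np.asarray(genotypes, dtype=float)
    p = geno.mean(axis=0) / 2.0            # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)           # minor allele frequency
    passed = np.flatnonzero(maf >= maf_min)

    kept = []
    for j in passed:
        if kept:
            # Assumption: "previous marker" means the previous retained marker.
            r = np.corrcoef(geno[:, kept[-1]], geno[:, j])[0, 1]
            if np.isclose(r * r, 1.0):     # complete LD (r^2 = 1): drop marker j
                continue
        kept.append(int(j))
    return np.array(kept, dtype=int)
```

Markers that fail the frequency threshold are removed first, so every correlation is computed between polymorphic markers.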
Methods
To validate the prediction accuracy and unbiasedness, the Jersey bulls were divided into reference and test sets using a cut-off date of birth of January 1, 2005. The bulls born after this date were used as validation animals (208 bulls). Thus, in the scenario using only bull reference data, 1,030 bulls were used as reference population.
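As a small illustration of this split, the sketch below assigns bulls to the reference or test set by date of birth. The bull identifiers are hypothetical, and treating a bull born exactly on the cut-off date as a test animal is an assumption, because the text only says "born after this date".

```python
from datetime import date

CUTOFF = date(2005, 1, 1)  # cut-off date of birth stated in the text

def split_reference_test(birth_dates, cutoff=CUTOFF):
    """Split animals into (reference, test) ID lists by date of birth."""
    # Assumption: animals born on the cut-off date itself go to the test set.
    reference = sorted(a for a, d in birth_dates.items() if d < cutoff)
    test = sorted(a for a, d in birth_dates.items() if d >= cutoff)
    return reference, test

# Hypothetical bulls, for illustration only.
bulls = {
    "bull_A": date(1998, 6, 3),
    "bull_B": date(2004, 12, 31),
    "bull_C": date(2006, 2, 14),
}
reference_set, test_set = split_reference_test(bulls)
```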
Besides the genomic BLUP model (GBLUP) with bulls in the reference data (GBLUPBull), 5 alternative approaches were used in our study. Approach 1 included pedigree relationships to weight the genomic relationship matrix (GBLUPWBull). Approach 2 was the GBLUP model with both bulls and genotyped cows in the reference set (GBLUPCow), in which 25 cows were dams of test bulls. Approach 3 included a year of birth effect in the GBLUP model (GBLUPYear) to account for the part of genetic trend that is not accounted for by SNP markers. Approach 4 adjusted GEBV using the difference of EBV between dams and maternal grandsires (GBLUPDam_mgs). Approach 5 was a single-step method that integrates the information of genotyped and nongenotyped animals for genomic prediction. Two scenarios of this approach were investigated: prediction using cow genotypes (SSPG) and prediction without cow genotypes (SSP). The numbers of individuals used in the reference population and test population in different scenarios are shown in Table 1.
Table 1. Numbers of individuals used in the reference and test populations in different scenarios1

| Item | GBLUPBull/GBLUPWBull/GBLUPYear/GBLUPDam_mgs | GBLUPCow | SSP: no. of genotyped animals | SSP: no. of phenotyped animals | SSPG: no. of genotyped animals | SSPG: no. of phenotyped animals |
|---|---|---|---|---|---|---|
| Reference set | 1,030 | 2,774 | 1,030 | 577,405 | 2,774 | 577,405 |
| Test set | 208 | 208 | 208 | 208 | 208 | 208 |

1 GBLUPBull = genomic BLUP model with bulls as reference population; GBLUPWBull = same as GBLUPBull but with a genomic relationship matrix Gω = 0.8G + 0.2A, where G is the genomic relationship matrix and A is the pedigree relationship matrix; GBLUPCow = GBLUP model with both genotyped bulls and cows as reference population; GBLUPYear = year effects were included in the model as genetic trend; GBLUPDam_mgs = genomic EBV from GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire (mgs) EBV; SSP = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SSPG = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows.
The statistical models used in different scenarios are described below.
GBLUP
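The GBLUP model was

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e},$$

where $\mathbf{y}$ is a vector of DRP of animals in the reference population; $\mu$ is the overall mean; $\mathbf{g}$ is the vector of direct genomic values; $\mathbf{Z}$ is the design matrix linking $\mathbf{g}$ to $\mathbf{y}$; and $\mathbf{e}$ is a vector of random residuals. Random effects were assumed to be distributed as $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}_\omega\sigma^2_g)$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma^2_e)$, where $\sigma^2_g$ is the additive genetic variance, $\mathbf{G}_\omega$ is the genomic relationship matrix, $\sigma^2_e$ is the residual variance, and $\mathbf{D}$ is a diagonal matrix with elements $d_{ii} = (1 - r^2_{DRP})/r^2_{DRP}$, in which $r^2_{DRP}$ is the reliability of DRP. The GEBV was calculated as $\mathrm{GEBV} = \mathbf{1}\hat{\mu} + \hat{\mathbf{g}}$. The genomic relationship matrix, $\mathbf{G}_\omega$, is defined as

$$\mathbf{G}_\omega = (1 - \omega)\mathbf{G} + \omega\mathbf{A},$$

where $\mathbf{G}$ is the genomic relationship matrix and $\mathbf{A}$ is the pedigree relationship matrix. In scenarios GBLUPBull and GBLUPCow, ω = 0. In scenario GBLUPWBull, ω = 0.2.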
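A minimal numerical sketch of a GBLUP model of this kind is given below, assuming a model with an overall mean, direct genomic values with covariance proportional to Gω, and residual weights derived from DRP reliabilities. It is not the software used in the study. The function names, the use of known variance components `var_g` and `var_e`, and the residual weight (1 − r²)/r² are assumptions; G and A are taken as given inputs.

```python
import numpy as np

def blend_relationship(G, A, omega):
    """Weighted relationship matrix G_omega = (1 - omega) * G + omega * A."""
    return (1.0 - omega) * np.asarray(G, float) + omega * np.asarray(A, float)

def gblup(y, Z, G_omega, r2_drp, var_g, var_e):
    """BLUP of genomic values under y = 1*mu + Z g + e (illustrative sketch).

    y: DRP of reference animals; Z: (n_records, n_animals) design matrix;
    r2_drp: reliabilities of the DRP (must be > 0);
    var_g, var_e: variance components, assumed known here.
    """
    y = np.asarray(y, float)
    Z = np.asarray(Z, float)
    r2 = np.asarray(r2_drp, float)
    D = np.diag((1.0 - r2) / r2)           # assumed form of the residual weights
    V = var_g * Z @ G_omega @ Z.T + var_e * D
    ones = np.ones_like(y)
    # Generalized least squares estimate of the overall mean
    mu_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    # BLUP of direct genomic values for all animals in G_omega,
    # including candidates without records
    g_hat = var_g * G_omega @ Z.T @ np.linalg.solve(V, y - mu_hat * ones)
    return mu_hat, g_hat
```

Animals without records (for example, test bulls) receive predictions through their genomic relationships with reference animals, because `g_hat` is computed for every row of `G_omega`.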
GBLUP with Year Effect
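When the year effect was included in the GBLUP model, the model was

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}b + \mathbf{Z}\mathbf{g} + \mathbf{e},$$

where $b$ is the regression coefficient of $\mathbf{y}$ on birth year, and $\mathbf{X}$ is a vector of birth years, treated as a continuous covariate in this model. The GEBV from GBLUPYear was calculated as

$$\mathrm{GEBV} = \mathbf{1}\hat{\mu} + \mathbf{X}\hat{b} + \hat{\mathbf{g}}.$$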
Adjusting GEBV for the Difference of EBV Between Dams and MGS
In traditional genetic evaluations for an individual without own or offspring records, when both sire EBV and dam EBV are available, the EBV for the individual is

$$\mathrm{EBV} = \tfrac{1}{2}\mathrm{EBV}_{sire} + \tfrac{1}{2}\mathrm{EBV}_{dam}.$$

When only the bulls’ EBV (sire EBV and maternal grandsire EBV) are available, the EBV for the individual is

$$\mathrm{EBV} = \tfrac{1}{2}\mathrm{EBV}_{sire} + \tfrac{1}{4}\mathrm{EBV}_{mgs},$$

where the dam EBV is assumed to equal the average EBV of all daughters of the maternal grandsire, which is not the case because bull dams have high EBV due to selection. The difference between $\mathrm{EBV}_{mgs}$ and $\mathrm{EBV}_{dam}$ may cause an underestimation of GEBV of candidates when dams are absent from the reference population. To reduce the influence of this difference, the GEBV for the validation animals were corrected by adding a value of

$$\tfrac{1}{2}\mathrm{EBV}_{dam} - \tfrac{1}{4}\mathrm{EBV}_{mgs}.$$
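The arithmetic behind this adjustment can be illustrated as follows. The sketch assumes the usual pedigree-index weights (1/2 for sire and dam, 1/4 for maternal grandsire) and takes the correction to be the gap between the two indices; the function names and the EBV values, expressed as deviations from a base, are hypothetical.

```python
def parent_average(ebv_sire, ebv_dam):
    """Pedigree index when both sire and dam EBV are available."""
    return 0.5 * ebv_sire + 0.5 * ebv_dam

def sire_mgs_index(ebv_sire, ebv_mgs):
    """Pedigree index when only sire and maternal grandsire EBV are available."""
    return 0.5 * ebv_sire + 0.25 * ebv_mgs

def dam_mgs_adjustment(ebv_dam, ebv_mgs):
    """Assumed correction: the gap between the two pedigree indices."""
    return 0.5 * ebv_dam - 0.25 * ebv_mgs

# Hypothetical EBV (deviations from a base) for a selected bull dam.
ebv_sire, ebv_dam, ebv_mgs = 10.0, 8.0, 4.0
adjustment = dam_mgs_adjustment(ebv_dam, ebv_mgs)
corrected_index = sire_mgs_index(ebv_sire, ebv_mgs) + adjustment
```

With a selected dam whose EBV exceeds half of the maternal grandsire EBV, the adjustment is positive, which is the direction needed to offset the underestimation described above.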
Single-Step Model
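The single-step model was as follows:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{a} + \mathbf{e},$$

where $\mathbf{y}$ is the vector of DRP of all the cows with EBV in the whole population, $\mathbf{a}$ is a vector of additive genetic effects, and $\mathbf{Z}$ is the design matrix for additive genetic effects. Random effects were assumed to be normally distributed, $\mathbf{a} \sim N(\mathbf{0}, \mathbf{H}\sigma^2_a)$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma^2_e)$, where $\sigma^2_a$ is the additive genetic variance and $\mathbf{H}$ is the relationship matrix of all the individuals as defined below. Here the reliability of DRP was ERC/(ERC + λ), where λ = (1 − h²)/h².

$$\mathbf{H} = \begin{bmatrix} \mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}(\mathbf{G}_\omega - \mathbf{A}_{22})\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G}_\omega \\ \mathbf{G}_\omega\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G}_\omega \end{bmatrix},$$

where $\mathbf{A}$ is the pedigree relationship matrix and can be partitioned as $\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}$, with subscript 1 for nongenotyped individuals and 2 for genotyped individuals, and $\mathbf{G}_\omega = (1 - \omega)\mathbf{G} + \omega\mathbf{A}_{22}$. In our study the G matrix was adjusted for the differences in location and scale of the pedigree-based relationship matrix ($\mathbf{A}_{22}$) using the method proposed by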
) was used to estimate variance components and predict breeding values.
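The joint relationship matrix of a single-step model can be assembled from the pedigree matrix and the weighted genomic matrix. The sketch below uses the standard block form of H for genotyped (subscript 2) and nongenotyped (subscript 1) animals, together with the DRP reliability formula given above. It is a small dense illustration under assumed inputs, not the implementation used in the study: the function names are hypothetical, the location and scale adjustment of G is not included, and real analyses of this size work with the inverse of H and sparse matrices.

```python
import numpy as np

def single_step_H(A, genotyped, G_omega):
    """Joint relationship matrix H from pedigree matrix A and G_omega.

    A: (n, n) pedigree relationship matrix;
    genotyped: indices of genotyped animals, in the same order as G_omega.
    """
    A = np.asarray(A, float)
    G_omega = np.asarray(G_omega, float)
    idx2 = np.asarray(genotyped, dtype=int)
    idx1 = np.setdiff1d(np.arange(A.shape[0]), idx2)   # nongenotyped animals
    A11 = A[np.ix_(idx1, idx1)]
    A12 = A[np.ix_(idx1, idx2)]
    A22 = A[np.ix_(idx2, idx2)]
    B = A12 @ np.linalg.inv(A22)                       # A12 * inverse(A22)
    H = np.empty_like(A)
    H[np.ix_(idx1, idx1)] = A11 + B @ (G_omega - A22) @ B.T
    H[np.ix_(idx1, idx2)] = B @ G_omega
    H[np.ix_(idx2, idx1)] = G_omega @ B.T
    H[np.ix_(idx2, idx2)] = G_omega
    return H

def drp_reliability(erc, h2):
    """Reliability of DRP as ERC / (ERC + lambda), lambda = (1 - h2) / h2."""
    lam = (1.0 - h2) / h2
    return erc / (erc + lam)
```

A useful sanity check is that H reduces to A when the genomic matrix equals the pedigree relationships among genotyped animals, so genomic information only changes relationships where G and A22 disagree.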
Validation of Predictions
The reliability of predictions was calculated as the squared correlation between GEBV and DRP divided by the average reliability of the DRP in the test set. Bias was investigated through the regression coefficient and intercept of the regression of DRP (corrected for the model mean) on estimated genetic effects (the year trend was added to the direct genomic values in scenario GBLUPYear), and through the predicted genetic trend. Bias of predicted genetic trends was assessed by comparing the year mean of GEBV with the year mean of DRP for test individuals.
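These validation statistics can be sketched as follows. The function names and input layout are assumptions, and the regression here is fitted on DRP and GEBV as supplied; any correction of DRP for the model mean is assumed to have been applied beforehand.

```python
import numpy as np

def validate(gebv, drp, r2_drp):
    """Return (reliability, regression coefficient, intercept) for test animals."""
    gebv = np.asarray(gebv, float)
    drp = np.asarray(drp, float)
    r = np.corrcoef(gebv, drp)[0, 1]
    # Squared correlation scaled by the average reliability of the DRP
    reliability = r ** 2 / np.mean(r2_drp)
    # Linear regression of DRP on GEBV: polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(gebv, drp, 1)
    return reliability, slope, intercept

def trend_bias(gebv, drp, birth_year):
    """Year mean of GEBV minus year mean of DRP, per birth year."""
    gebv = np.asarray(gebv, float)
    drp = np.asarray(drp, float)
    birth_year = np.asarray(birth_year)
    return {
        int(year): float(gebv[birth_year == year].mean()
                         - drp[birth_year == year].mean())
        for year in np.unique(birth_year)
    }
```

A regression coefficient below 1 indicates inflated GEBV, and a negative value from `trend_bias` means that the year mean of GEBV lies below the year mean of DRP.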
Results
Descriptive statistics of DRP in different data sets are shown in Table 2. The mean DRP differed because individuals in different data sets were born in different periods. The average DRP of genotyped bulls was lower than the average DRP of the genotyped cows, whereas it was higher than the average of all the cows used in the single-step approach. This is caused by genetic progress over years due to selection and by the fact that genotyped cows were born in recent years.
Table 2. Mean and SD of deregressed proofs (DRP) and reliability ($r^2_{DRP}$) of DRP for protein in different data sets
The number of test individuals in each year varied from 43 to 55, except for year 2009, in which there were only 16 test individuals (Table 3). The mean of the DRP in each year varied from 103.06 to 109.08, and the standard deviation varied from 6.56 to 8.88.
Table 3. The number of individuals and mean and SD of deregressed proofs in each year in the test set
The reliabilities of GEBV, as well as the regression coefficients and intercepts of DRP on GEBV for the different scenarios, are shown in Table 4. The reliabilities ranged from 0.29 to 0.40. The reliability of GEBV from the basic model (GBLUPBull) was 0.30. In scenario GBLUPCow, the reliability increased to 0.35. The single-step scenarios gained a large increase in reliability regardless of whether cow genotypes were included: the reliability was 0.38 for scenario SSPG and 0.36 for scenario SSP. The highest reliability (0.40) was achieved in scenario GBLUPDam_mgs. The reliability increased by 1 percentage point in scenario GBLUPYear, whereas it decreased by 1 percentage point in scenario GBLUPWBull. The regression coefficients varied from 0.58 to 0.78. The regression coefficient from the basic model (GBLUPBull) was 0.69. It increased to 0.72, 0.75, 0.74, and 0.78 in scenarios GBLUPWBull, GBLUPYear, SSPG, and SSP, respectively. The regression coefficient in scenario GBLUPCow was 0.09 lower than in scenario GBLUPBull, and it was lowest (0.58) in scenario GBLUPDam_mgs. The intercepts for all scenarios were much larger than 0, which indicated that the mean of GEBV was lower than the mean of DRP.
Table 4. Reliabilities ($R^2_{GEBV}$; Rel.) of genomic EBV (GEBV), regression coefficient (Reg. coef.), and intercept (Int.) of deregressed proofs on GEBV of test individuals in different scenarios1

| Item | GBLUPBull | GBLUPWBull | GBLUPCow | GBLUPYear | GBLUPDam_mgs | SSP | SSPG |
|---|---|---|---|---|---|---|---|
| Rel. | 0.30 | 0.29 | 0.35 | 0.31 | 0.40 | 0.36 | 0.38 |
| Reg. coef. | 0.69 | 0.72 | 0.60 | 0.75 | 0.58 | 0.78 | 0.74 |
| Int. | 6.00 | 6.55 | 7.97 | 5.90 | 3.79 | 5.88 | 6.92 |

1 GBLUPBull = genomic BLUP model with bulls as reference population; GBLUPWBull = same as GBLUPBull but with a genomic relationship matrix Gω = 0.8G + 0.2A, where G is the genomic relationship matrix and A is the pedigree relationship matrix; GBLUPCow = GBLUP model with both genotyped bulls and cows as reference population; GBLUPYear = year effects were included in the model as genetic trend; GBLUPDam_mgs = GEBV from GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire EBV; SSP = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SSPG = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows.
The trends of GEBV and DRP for genotyped bulls are shown in Figure 1. Bias of predicted genetic trend was observed in scenario GBLUPBull. The difference between DRP and GEBV from GBLUPBull was around 5, which was statistically significant. Compared with scenario GBLUPBull, all the alternative approaches except GBLUPWBull reduced bias of predicted genetic trend to some extent. The bias was partly corrected in scenario GBLUPCow, greatly reduced in scenarios SSP and SSPG, reduced in scenario GBLUPDam_mgs, and slightly reduced in scenario GBLUPYear. Figure 2 shows boxplots of the difference between GEBV and DRP for each scenario in each birth year.
Figure 1. The deregressed proofs (DRP) and genomic EBV (GEBV) trends of protein in different scenarios. GBLUPBull = genomic BLUP (GBLUP) model with bulls as reference population; GBLUPWBull = same as GBLUPBull but with a genomic relationship matrix Gω = 0.8G + 0.2A, where G is the genomic relationship matrix and A is the pedigree relationship matrix; GBLUPCow = GBLUP model with both genotyped bulls and cows as reference population; GBLUPYear = year of birth effects were included in the model as genetic trend; GBLUPDam_mgs = GEBV from GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire EBV; SSP = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SSPG = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows. Color version available online.
Figure 2. Boxplots for the difference between genomic EBV (GEBV) and deregressed proofs (DRP; i.e., GEBV − DRP). The box shows the first to third quartile of GEBV − DRP. The horizontal bar inside the box shows the median value. The upper and lower bars show the most extreme values within 1.5 times the interquartile range (IQR) from the upper and lower quartiles. The IQR is defined as the distance between the first and third quartiles. Values beyond the bars but less than 3 IQR from either end of the box are labeled as outliers (o). GBLUPBull = genomic BLUP (GBLUP) model with bulls as reference population; GBLUPWBull = same as GBLUPBull but with a genomic relationship matrix Gω = 0.8G + 0.2A, where G is the genomic relationship matrix and A is the pedigree relationship matrix; GBLUPCow = GBLUP model with both genotyped bulls and cows as reference population; GBLUPYear = year effects were included in the model to account for part of genetic trend; GBLUPDam_mgs = GEBV from GBLUP model using bull reference data were adjusted for the difference between dam EBV and maternal grandsire EBV; SSP = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls; SSPG = the single-step approach using phenotypes of all cows and genotypes of genotyped bulls and cows.
Our study investigated strategies to improve the prediction reliability and reduce bias of predicted genetic trend observed in the Danish Jersey population. Several strategies were tested; that is, including cows in the reference, including a year of birth effect in the prediction model, adjusting GEBV with the difference between dam EBV and MGS EBV, and using a single-step approach. The results showed that these strategies could reduce bias of predicted genetic trend to some extent. However, the prediction reliability and regression coefficients did not consistently improve in parallel with the reduction in the bias of predicted genetic trend.
The regression coefficients in different scenarios were smaller than 1 in our study. One possible reason could be that markers were not in complete linkage disequilibrium with causal genes, and thus could not fully account for the total genetic variance. Another reason could be that the data used in the analysis were not a random sample, but selected data.
The reliability of prediction for protein was improved when the genotyped cows were included in the reference population, as this clearly enlarged the reference population, the size of which is the most important factor affecting prediction reliability. However, we observed that GEBV were more inflated when both genotyped cows and genotyped bulls were used as the reference population. Inflation may be caused by preferential treatment of cows included in the reference population; in a previous study, the regression coefficient of DRP on GEBV of protein decreased from 0.86 to 0.83 when cows were added to the reference population. On the other hand, bias of predicted genetic trend was reduced when genotyped cows were included in the reference population. The reason could be that the cows that were dams or sibs of test bulls may account for the contribution of the test bulls’ dams to the bulls.
Previous studies reported that including a polygenic effect in a SNP-BLUP model or Bayesian model led to less inflation of GEBV. The regression coefficient improved from 0.69 to 0.72 in the current study when the polygenic effect was included in the model (GBLUPWBull). However, the bias of predicted genetic trend was not reduced compared with the model without a polygenic effect.
The single-step model used cows’ deregressed EBV as a response variable rather than the raw phenotypic data. However, the effect of genomic preselection is minor in the current Jersey data, and the deregressed cow EBV should not be biased. Therefore, the results could be considered equivalent to single-step prediction using raw data. However, if genomic preselection is used in breeding schemes, the EBV estimated using pedigree will be biased. In this case, it is better to use raw data as the response variable in the single-step approach. The single-step prediction, which used all the females’ DRP and pedigree as well as genotypes from genotyped bulls and cows, increased the reliability and reduced inflation of GEBV and bias of predicted genetic trend. These results were consistent with previous reports. As DRP of nongenotyped animals also contribute to the prediction through a combined relationship matrix, the prediction reliability was improved. Moreover, single-step models could reduce bias of predicted genetic trend by including all the records to trace selection. Similar to a GBLUP model including genotyped cows in the reference data, the regression coefficient decreased when the cow genotypes were included in the single-step approach. Selection index blending with the same information used in the single-step approach without cow genotype data was compared with the single-step approach in our study (data not shown). The prediction reliability was 0.32, which was higher than the reliability of GEBV directly from GBLUPBull but lower than that of scenario SSP, even though the information used in these 2 methods was the same. When the blending index was used, the bias of predicted genetic trend was corrected for individuals born in 2005 and 2006, but not for individuals born after 2006. The genomic relationship matrix was modified with the pedigree relationship matrix in the single-step approach. As the pedigree relationship has influence on the regression coefficient, the scenario GBLUPWBull was investigated for consistency. The results from GBLUPWBull showed that a GBLUP model with 20% of the pedigree relationship matrix neither increased the prediction reliability nor reduced bias of predicted genetic trend. However, the regression coefficients were improved by the weighted G matrix (from 0.69 for GBLUPBull to 0.72 for GBLUPWBull) and by the single-step approach (from 0.72 for GBLUPWBull to 0.78 for SSP). These results suggest that using a single-step method is an effective approach to increase the prediction reliability and reduce the bias of predicted genetic trend.
Including the year of birth effect reduced the bias of predicted genetic trend and improved the regression coefficients. The reason could be that the year effect partly accounted for the trend of selection among the dams. The GEBV together with the year effect captured the genetic progress across years, which led to a robust estimation of genetic trend.
The mean of GEBV adjusted for the difference between dam EBV and MGS EBV was much closer to the mean of DRP in the test population than the mean of GEBV without adjustment. The reliability was improved greatly, which may have been caused by a possible autocorrelation between dam EBV and the progeny DRP. However, the regression coefficient deviated more from unity, which may have been caused by the preferential treatment of selected cows. Bias of predicted genetic trend was thus corrected at the cost of a large inflation of the GEBV. Therefore, this adjustment is not a good approach to correct for bias of predicted genetic trend.
The results from the current study indicate that the regression coefficient, which has mainly been used in previous studies (e.g., the comparison of medium- and high-density marker panels in Nordic Holstein and Red Dairy Cattle populations; J. Dairy Sci. 2012; 95: 4657-4665), should not be the only criterion to measure the unbiasedness of predictions. The regression coefficient is not always consistent with the bias of predicted genetic trends. As the predicted trend is important when individuals across generations are compared, it should also be included in the evaluation criteria. The year mean of DRP could be expressed as the year mean of GEBV times the regression coefficient plus the intercept. Therefore, the bias of predicted genetic trend could be predicted using the regression coefficient and intercept. Consequently, the intercept together with the regression coefficient should be given attention in genomic prediction.
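The relationship between the regression line and the trend bias can be made concrete with a short calculation. The sketch below uses the regression coefficient and intercept reported for GBLUPBull in Table 4; the year mean of GEBV (10.0, on the scale of the regression) is a hypothetical value chosen only for illustration, and the function names are assumptions. Because a fitted line passes through the overall means rather than each year's means, the result for a single year is only approximate.

```python
def implied_year_mean_drp(gebv_year_mean, slope, intercept):
    """Year mean of DRP implied by the regression of DRP on GEBV."""
    return slope * gebv_year_mean + intercept

def implied_trend_bias(gebv_year_mean, slope, intercept):
    """Implied bias: year mean of GEBV minus implied year mean of DRP."""
    return gebv_year_mean - implied_year_mean_drp(gebv_year_mean, slope, intercept)

# Regression coefficient and intercept for GBLUPBull (Table 4);
# the year mean of GEBV is hypothetical.
slope, intercept = 0.69, 6.00
bias = implied_trend_bias(10.0, slope, intercept)
```

The implied bias equals (1 − slope) × mean GEBV − intercept, so a large intercept can produce a negative bias even when the regression coefficient is close to 1, which is why both quantities need to be inspected.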
Conclusions
The main reason for the bias of predicted genetic trend could be that the reference animals did not have all the information required to trace selection, especially the information of dams. Consequently, methods using more information related to selection can reduce the bias. The most efficient way is to implement a single-step approach for genomic prediction, as the single-step approach increased the prediction reliability, improved the regression coefficients, and led to an unbiased prediction trend. As bias of predicted genetic trends can be measured by the intercept and regression coefficient of observations on GEBV, both intercept and regression coefficients should be taken into consideration in validation of genomic predictions.
Acknowledgments
This work was performed within the project “Genomics in herds,” funded by VikingGenetics (Randers, Denmark) and Nordic Cattle Genetic Evaluation (Aarhus, Denmark).
References
Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 2010; 93: 743-752. http://dx.doi.org/10.3168/jds.2009-2730
Comparison of genomic predictions using medium-density (~54,000) and high-density (~777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations. J. Dairy Sci. 2012; 95: 4657-4665. http://dx.doi.org/10.3168/jds.2012-5379