Invited Review: Reliability of genomic predictions for North American Holstein bulls
Abstract
Genetic progress will increase when breeders examine genotypes in addition to pedigrees and phenotypes. Genotypes for 38,416 markers and August 2003 genetic evaluations for 3,576 Holstein bulls born before 1999 were used to predict January 2008 daughter deviations for 1,759 bulls born from 1999 through 2002. Genotypes were generated using the Illumina BovineSNP50 BeadChip and DNA from semen contributed by US and Canadian artificial-insemination organizations to the Cooperative Dairy DNA Repository. Genomic predictions for 5 yield traits, 5 fitness traits, 16 conformation traits, and net merit were computed using a linear model with an assumed normal distribution for marker effects and also using a nonlinear model with a heavier tailed prior distribution to account for major genes. The official parent average from 2003 and a 2003 parent average computed from only the subset of genotyped ancestors were combined with genomic predictions using a selection index. Combined predictions were more accurate than official parent averages for all 27 traits. The coefficients of determination (R2) were 0.05 to 0.38 greater with nonlinear genomic predictions included compared with those from parent average alone. Linear genomic predictions had R2 values similar to those from nonlinear predictions but averaged just 0.01 lower. The greatest benefits of genomic prediction were for fat percentage because of a known gene with a large effect. The R2 values were converted to realized reliabilities by dividing by mean reliability of 2008 daughter deviations and then adding the difference between published and observed reliabilities of 2003 parent averages. When averaged across all traits, combined genomic predictions had realized reliabilities that were 23% greater than reliabilities of parent averages (50 vs. 27%), and gains in information were equivalent to 11 additional daughter records. Reliability increased more by doubling the number of bulls genotyped than the number of markers genotyped. Genomic prediction improves reliability by tracing the inheritance of genes even with small effects.
Key words: genomic selection, genomic prediction, reliability, evaluation accuracy
Introduction
Genomic predictions combine genotypic, phenotypic, and pedigree data to increase the accuracy of estimates of genetic merit and to decrease generation interval. Traditional genetic evaluations combine only phenotypic data and probabilities that genes are identical by descent from pedigree data instead of tracing the inheritance of individual genes. Widely spaced markers could indicate the sharing of long chromosome segments within closely related family members, but could not detect the many minor genetic effects shared by distant relatives. Marker genotypes for thousands of loci across the genome can measure genetic similarity more precisely (Meuwissen, 2007). Markers that are identical in state may be shared through common ancestors earlier than those in the known pedigree.
Genetic effects must exist somewhere on the chromosomes for any trait with a nonzero heritability. Previously, marker-assisted selection was used to trace the inheritance of only a few major genes. Success was limited, and few studies reported substantial gains with real populations (Dekkers, 2004). More recently, selection in France resulted in R2 with eventual daughter evaluations that were 5 to 19% greater for genomic predictions than for parent averages (PA; Boichard et al., 2006). Expected gains in reliability from simulation were slightly lower (Guillaume et al., 2008).
A recently developed high-density assay of SNP can now be used to trace even small genetic effects (Van Tassell et al., 2008). “Fairly soon, as information continues to accumulate, a point will be reached where there is sufficient information on marker/QTL coupling in the ancestors of the candidate bulls, to eliminate the progeny testing step altogether, and shift completely to MAS as the primary means of selection among young sires” (Soller, 1994). Although that prediction did not come true in the decade after it was made, large gains are theoretically possible and could come true given sufficient numbers of genotyped animals and markers.
Objectives of this research were 1) to apply genomic prediction methods to genotypes for a large population of Holstein bulls, 2) to estimate gains in reliability from using genomic evaluations instead of traditional evaluations and PA, and 3) to document how the density of markers and numbers of genotyped bulls affect predictive ability.
Materials and Methods
Bull Population
The Holstein bulls to be genotyped were categorized as predictor bulls (3,576 bulls born from 1952 through 1998) or predicted bulls (1,759 bulls born from 1999 through 2002). Information from the predictor bulls was used to compute predictions, which were tested with the predicted bulls. These 2 groups can also be described as training data and test data, respectively (Meuwissen, 2007). The predictor bulls included mainly bulls born from 1995 through 1998 and ancestors of those bulls. The exact age distribution of the 5,335 genotyped bulls is in Table 1.
Table 1. Distribution of bulls selected for genotyping (n = 5,335) by birth year of bull
| Bull category | Birth year | Bulls (n) |
|---|---|---|
| Predictor (n = 3,576) | 1952 to 1959 | 10 |
| | 1960 to 1969 | 17 |
| | 1970 to 1979 | 41 |
| | 1980 to 1989 | 545 |
| | 1990 | 73 |
| | 1991 | 57 |
| | 1992 | 50 |
| | 1993 | 56 |
| | 1994 | 148 |
| | 1995 | 686 |
| | 1996 | 666 |
| | 1997 | 840 |
| | 1998 | 387 |
| Predicted (n = 1,759) | 1999 | 394 |
| | 2000 | 384 |
| | 2001 | 605 |
| | 2002 | 376 |
Predictor bulls were required to have a reliability of at least 75% for net merit in August 2003, and predicted bulls were required to have information from ≥10 daughters in their evaluations by April 2008. An initial proposal was to select predictor bulls with the most extreme evaluations to help identify major genes, but selective genotyping was not used so that the selected bulls would be more representative of the general population and give more realistic estimates of achievable prediction accuracy.
Genomic Data
The main source of extracted DNA was from semen held in the Cooperative Dairy DNA Repository maintained by the Bovine Functional Genomics Laboratory, ARS, USDA (Beltsville, MD). All major AI organizations routinely contributed semen samples to the repository when young bulls were enrolled in progeny testing in the United States (Ashwell and Van Tassell, 1999). Also, Semex Alliance (Guelph, Ontario, Canada) routinely contributed semen and DNA from young bulls tested exclusively in Canada. Semen from significant ancestor bulls was purchased independently or was provided by the National Center for Genetic Resources Preservation, ARS, USDA (Fort Collins, CO), and genotyped to help trace genetic inheritance.
Marker genotypes were obtained using the BovineSNP50 BeadChip (Illumina, San Diego, CA). Markers on the chip were selected to be evenly distributed across chromosomes and polymorphic across a variety of breeds included in the International Bovine HapMap Project (International Bovine HapMap Consortium, 2006). Extraction of DNA and genotyping were conducted by the Bovine Functional Genomics Laboratory, ARS, USDA (Beltsville, MD); Division of Animal Sciences, University of Missouri (Columbia, MO); Department of Agricultural, Food and Nutritional Science, University of Alberta (Edmonton, Canada); GeneSeek (Lincoln, NE); Genetics & IVF Institute (Fairfax, VA); and Illumina Inc. (San Diego, CA). Scoring of marker genotypes was done using Illumina's BeadStudio software (v3.2.23).
For most SNP, genotypes were read for >99% of bulls with <0.1% error rates among those read. Success rate was quantified by agreement of son with sire genotypes and by reading the same DNA more than once for 9 individual bulls, for 2 pairs of identical twins, and for a trio of clones. Some SNP had minor allele frequencies of <0.05 in Holsteins and were excluded. That edit reduced the number of loci to 40,426 from the original 51,386 SNP that could be reliably read. Single nucleotide polymorphisms with lower minor allele frequencies may be included in future analyses because their effects are estimated more accurately as sample size increases.
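The minor allele frequency edit described above can be sketched as follows. This is only an illustration: it assumes genotypes are stored as counts (0, 1, or 2) of one allele with `None` for unread calls, and the function names and toy genotypes are hypothetical rather than taken from the study's software.

```python
def minor_allele_frequency(counts):
    """Return the minor allele frequency for one SNP.

    counts: allele counts (0, 1, or 2) per animal; None marks an unread genotype.
    """
    known = [c for c in counts if c is not None]
    p = sum(known) / (2 * len(known))  # frequency of the counted allele
    return min(p, 1 - p)


def keep_snp(counts, threshold=0.05):
    """Keep a SNP unless its minor allele frequency is below the threshold."""
    return minor_allele_frequency(counts) >= threshold


# Toy data (hypothetical): SNP name -> genotypes of 20 animals.
genotypes = {
    "snp_a": [0, 1, 2, 1, None] * 4,  # intermediate frequency, retained
    "snp_b": [0] * 19 + [1],          # rare allele (0.025), excluded
}
retained = [name for name, counts in genotypes.items() if keep_snp(counts)]
```

With the toy data, only `snp_a` survives the edit.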
Each SNP was compared with all others to eliminate those that were redundant (correlation of 1) because of complete linkage disequilibrium. Of the selected SNP, 2,010 had an inheritance pattern identical to another SNP for all 5,335 bulls or had <10 differences. Those “duplicate” SNP, which had physical distances between loci only half as large as the mean distance between adjacent loci, were removed leaving 38,416 loci for genomic predictions. Allele frequencies in the base (founder) population were estimated using the algorithm of Gengler et al. (2007) that solves for gene content of nongenotyped ancestors and descendants using pedigrees. The pedigree file with all known ancestors of the 5,335 bulls included 41,414 cows and bulls. The genotype file included 205 million known and 2.0 million (1%) unknown genotypes. For genotyped animals, missing genotypes were set to 0, 1, or 2 (number of the counted allele present) if the allele count estimated from relatives for that SNP was different by ≤0.20 from 0, 1, or 2, respectively. Using this process, 974,961 (49%) of the 2 million missing genotypes were imputed. Equations of VanRaden (2008) allowed distinction between known and missing genotypes, but the alternative of regressing on probabilities for all genotypes could increase accuracy and should be examined.
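The rule for filling missing genotypes can be illustrated with a short sketch. It assumes the allele count estimated from relatives is already available as a number between 0 and 2; the function name and the `tolerance` parameter are hypothetical.

```python
def impute_missing(estimated_count, tolerance=0.20):
    """Return 0, 1, or 2 if the estimate lies within `tolerance` of that value.

    Returns None when the estimate is too far from every whole genotype,
    in which case the genotype stays missing.
    """
    for genotype in (0, 1, 2):
        if abs(estimated_count - genotype) <= tolerance:
            return genotype
    return None


# Hypothetical estimates for four missing calls.
examples = [impute_missing(x) for x in (0.15, 1.1, 1.9, 0.5)]
```

An estimate of 0.5 is left missing because it is more than 0.20 away from both 0 and 1.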
Official genetic evaluations were combined with genomic data secondarily instead of analyzing phenotypic records directly. All results were expressed on the US scale and included multitrait across-country evaluations from the Interbull Centre (Uppsala, Sweden) for bulls that had been progeny tested in Canada. Official evaluations for predictor bulls were obtained from August 2003 when the predicted bulls were 1 to 4 yr old. The dependent variable for analysis was daughter deviation weighted by reliability from daughters, which was computed from total daughter equivalents minus daughter equivalents from PA.
Genomic Predictions
Predictions were computed using linear and nonlinear genomic models (VanRaden, 2007, 2008). For linear predictions, the traditional additive genetic relationship matrix is replaced by a genomic relationship matrix, which is equivalent to assigning equal genetic variance to all markers. For nonlinear predictions, markers with smaller effects are regressed further toward zero, and markers with larger effects are regressed less, to account for a nonnormal prior distribution of marker effects. Differing assumptions about numbers and sizes of QTL effects could result in better predictions than those of this initial test.
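As an illustration of what a genomic relationship matrix looks like, the sketch below uses one widely used construction, G = ZZ'/(2Σp(1 − p)), where Z holds allele counts centered by twice the base allele frequency. The text here does not give the formula, so treating this as the form used is an assumption; the function name and toy data are hypothetical.

```python
def genomic_relationship_matrix(genotypes, base_freqs):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from allele counts.

    genotypes: one list of allele counts (0, 1, 2) per animal.
    base_freqs: assumed base-population frequency of the counted allele per SNP.
    """
    # Center each genotype by its expected value, 2p.
    z = [[g - 2 * p for g, p in zip(animal, base_freqs)] for animal in genotypes]
    # Scale so that G is comparable to a pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in base_freqs)
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]


# Toy data (hypothetical): 3 animals, 4 SNP, all base frequencies 0.5.
toy_genotypes = [[2, 0, 1, 2], [0, 2, 1, 0], [2, 0, 1, 2]]
toy_freqs = [0.5, 0.5, 0.5, 0.5]
G = genomic_relationship_matrix(toy_genotypes, toy_freqs)
```

Animals 1 and 3 carry identical genotypes, so their off-diagonal element equals their diagonal elements, whereas animal 2 carries the opposite homozygotes and has a negative relationship with both.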
Genomic predictions and PA calculated from August 2003 data of older animals were compared for ability to predict April 2008 evaluations for younger bulls for 27 traits: milk, fat, and protein yields; fat and protein percentages; productive life; SCS; daughter pregnancy rate; sire and daughter calving ease; final score; stature; strength; body depth; dairy form; foot angle; rear legs (side and rear views); rump angle and width; fore udder; rear udder height; udder depth and cleft; front teat placement; teat length; and net merit. The experimental design provided an independent, realistic test by separating early daughter information of ancestors used to compute predictions from later daughter information of descendants used to assess prediction accuracy.
Because 2003 PA had not been stored for type traits or for calving ease, 2003 pedigree indexes (PI) constructed as 0.5(sire PTA) + 0.25(maternal grandsire PTA) + 0.25(birth year mean PTA) were substituted for PA for those traits. Reliability of PI is lower than that of PA, especially for highly heritable traits, because records for the dam are excluded. The 2008 PA was not substituted for the 2003 PA because then the son's information would have added to his dam's reliability.
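The pedigree index is a fixed weighting of 3 PTA and can be written as a one-line function. The argument names and the example values are hypothetical.

```python
def pedigree_index(sire_pta, mgs_pta, birth_year_mean_pta):
    """PI = 0.5(sire PTA) + 0.25(maternal grandsire PTA) + 0.25(birth year mean PTA)."""
    return 0.5 * sire_pta + 0.25 * mgs_pta + 0.25 * birth_year_mean_pta


# Hypothetical PTA values on an arbitrary scale.
example_pi = pedigree_index(sire_pta=400, mgs_pta=200, birth_year_mean_pta=100)
```

The birth year mean stands in for the unknown contribution of the maternal granddam, which is why PI carries less information than a PA that includes the dam's own records.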
Direct genomic predictions included less phenotypic information than the official PA because genotypes were available and evaluations were included for only a subset of the total population. Some sires and grandsires of the predicted bulls were not genotyped, and none of their dams were genotyped. For comparison with genomic predictions, a second set of PA for predicted bulls was computed using traditional relationships with only the subset of genotyped ancestors (evaluations of nongenotyped ancestors were excluded from PA). Information from the other relatives was included after all other processing.
Final genomic predictions for predicted bulls combined 3 terms by selection index: 1) direct genomic prediction; 2) PA computed from the subset of genotyped ancestors using traditional relationships; and 3) published PA or PI. The selection index for the predictor bulls included: 1) direct genomic prediction; 2) subset PTA; and 3) published PTA. Some of the predicted bulls already had PTA for service sire calving ease by August 2003, and in that case, their published PTA from 2003 was used to compute the combined PTA. To avoid a part-whole correlation between 2003 and 2008 data, only the 552 bulls with no progeny by 2003 were used to test predictions for sire calving ease.
For each bull, a 3 × 3 symmetric matrix V was set up with reliabilities for the 3 terms on the diagonals and the following functions of those 3 reliabilities on the off-diagonals:

Regressions and correlations were used to test predictions. A bull's published PTA is a weighted mean of his daughter deviation and his PA, and the use of deregressed evaluations or daughter deviations as dependent variables helps to avoid part-whole correlations with PA. Because daughter deviations as defined by VanRaden and Wiggans (1991) were not available for all traits, daughter deviations were computed as deregressed evaluations:

Genomic reliabilities were calculated in 2 ways. Expected genomic reliabilities were obtained by inverting mixed model equations that included genomic instead of traditional relationships. Realized genomic reliabilities were calculated from R2 of 2003 predictions with 2008 daughter deviations after adjusting for error variance in the daughter deviations and for prior selection on pedigree. The R2 from PA and from the nonlinear model were divided by mean reliability of daughter deviations (Rdau), and then the difference between the published and observed PA reliability was added to the adjusted genomic R2 to obtain the realized genomic reliability. Mathematically,

$$\text{REL}_{\text{genomic}} = \frac{R^2_{\text{genomic}}}{R_{\text{dau}}} + \left(\text{REL}_{\text{PA, published}} - \frac{R^2_{\text{PA}}}{R_{\text{dau}}}\right)$$
Sex Chromosomes
The X chromosome of a bull is inherited by all of his daughters but by none of his sons. Thus, 2 estimates of his genetic merit can be provided: PTA for his daughters is the sum of all marker effects, whereas PTA for his sons excludes effects of 605 markers on the X chromosome. Another 44 markers were located on the pseudo-autosomal region of X and included in the autosomal sum rather than the X chromosome sum. Fewer SNP have been identified on the X chromosome, and the spacing between markers is about 3 times greater than on the autosomes.
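The 2 estimates for a bull can be sketched as sums over marker contributions with and without the X-linked markers. The data layout (a list of per-marker contributions with a parallel list of flags) and the numbers are hypothetical.

```python
def bull_pta(marker_contributions, x_linked):
    """Return (PTA for daughters, PTA for sons) for one bull.

    marker_contributions: estimated contribution of each marker to the bull's PTA.
    x_linked: parallel list of booleans, True for markers on the X chromosome.
              Pseudo-autosomal markers are flagged False, i.e. treated as autosomal.
    """
    daughters = sum(marker_contributions)
    sons = sum(c for c, on_x in zip(marker_contributions, x_linked) if not on_x)
    return daughters, sons


# Hypothetical contributions for 4 markers, the third of which is on X.
pta_daughters, pta_sons = bull_pta([10.0, -4.0, 2.5, 1.5], [False, False, True, False])
```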
Cows also can have different PTA for daughters than for sons. For cows, effects on the X chromosome are doubled for producing sons because the X chromosome transmitted to sons will be transmitted to 50% of granddaughters instead of the 25% expected for autosomes.
Son merit for bulls was constructed as twice the mean of his sons’ daughter deviations adjusted for PTA of the sons’ dams. For 796 genotyped bulls that had ≥10 evaluated sons, differences between PTA from daughters and mean of sons’ PTA were used to test if estimated effects for net merit on the X chromosome were statistically significant. Another test included only the autosomal and pseudo-autosomal markers in the genotype file and compared predictions computed with and without the 605 markers on X.
Numbers of Bulls and SNP
More predictor bulls can increase reliability by providing more data to estimate each SNP effect. Large numbers of records are required to estimate the small effects of individual genes accurately. Numbers of bulls were compared using subsets of the bull genotypes as they became available. Net merit R2 values for younger bulls were compared using 3 progressively larger subsets that included 1,402, 2,391, and 3,119 bulls. Methods used were the same as for the full set of 5,335 bulls.
More markers can increase the accuracy of genomic selection by providing SNP located closer to the causative genes. Three SNP densities were compared using the same methods and genotypes for the full set of predictor and predicted bulls. The edited set of 38,416 SNP with >5% minor allele frequency in Holsteins (designated as 40K) was compared with subsets of exactly 50 or 25% of those SNP: 19,208 (20K) or 9,604 (10K). The 20K and 10K subsets were obtained by keeping every other or every fourth SNP sequentially across each chromosome, respectively. Results for 5 yield traits (milk, fat, and protein yields and fat and protein percentages), 3 fitness traits (productive life, SCS, and daughter pregnancy rate), and net merit were obtained using the nonlinear genomic model.
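The thinning step can be sketched with list slicing. The sketch assumes the SNP are held in one list ordered by chromosome and position, that the first SNP is always kept, and that counting carries over chromosome boundaries; those details are assumptions chosen because they reproduce the exact subset sizes reported. Integer placeholders stand in for real SNP identifiers.

```python
def thin_snp(ordered_snp, step):
    """Keep every `step`-th SNP from a list ordered by chromosome and position."""
    return ordered_snp[::step]


# Placeholder indices standing in for the 38,416 edited SNP.
snp_40k = list(range(38416))
snp_20k = thin_snp(snp_40k, 2)  # every other SNP
snp_10k = thin_snp(snp_40k, 4)  # every fourth SNP
```

Because every fourth SNP is also an "every other" SNP under this scheme, the 10K subset is nested within the 20K subset.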
Results and Discussion
Genomic predictions increased model R2 (P < 0.0001) compared with use of PA alone for all 26 traits and for net merit. The R2 of daughter deviations with PA and with linear and nonlinear genomic predictions are reported in Table 2 for each trait. The greatest gains in R2 from using nonlinear genomic predictions rather than PA were for fat and protein percentages, fat yield, and udder depth.
Table 2. Coefficients of determination (R2 × 100) for 2008 daughter deviations with 2003 predictions
| Trait | Traditional parent average | Linear genomic prediction | Nonlinear genomic prediction | Difference1 | Gain from nonlinear genomic prediction compared with parent average |
|---|---|---|---|---|---|
| Net merit | 11 | 28 | 28 | 0 | 17 |
| Milk yield | 28 | 47 | 49 | 2 | 21 |
| Fat yield | 15 | 42 | 44 | 2 | 29 |
| Protein yield | 27 | 47 | 47 | 0 | 20 |
| Fat percentage | 25 | 55 | 63 | 8 | 38 |
| Protein percentage | 28 | 51 | 58 | 7 | 30 |
| Productive life | 17 | 26 | 27 | 1 | 10 |
| SCS | 23 | 37 | 38 | 1 | 15 |
| Daughter pregnancy rate | 20 | 30 | 29 | −1 | 9 |
| Sire calving ease | 17 | 21 | 22 | 1 | 5 |
| Daughter calving ease | 14 | 22 | 22 | 0 | 8 |
| Final score | 23 | 35 | 36 | 1 | 13 |
| Stature | 27 | 49 | 50 | 1 | 23 |
| Strength | 16 | 33 | 34 | 1 | 18 |
| Body depth | 17 | 36 | 37 | 1 | 20 |
| Dairy form | 9 | 29 | 28 | −1 | 19 |
| Foot angle | 13 | 23 | 21 | −2 | 8 |
| Rear legs (side view) | 10 | 27 | 27 | 0 | 17 |
| Rear legs (rear view) | 11 | 21 | 19 | −2 | 8 |
| Rump angle | 20 | 44 | 43 | −1 | 23 |
| Rump width | 19 | 38 | 36 | −2 | 17 |
| Fore udder | 17 | 39 | 40 | 1 | 23 |
| Rear udder height | 20 | 35 | 36 | 1 | 16 |
| Udder depth | 18 | 47 | 46 | −1 | 28 |
| Udder cleft | 18 | 30 | 30 | 0 | 12 |
| Front teat placement | 22 | 41 | 42 | 1 | 20 |
| Teat length | 12 | 35 | 34 | −1 | 22 |
| All | 19 | 36 | 37 | 1 | 18 |
1Nonlinear minus linear genomic prediction.
The greatest marker effects were for fat percentage on Bos taurus autosome (BTA) 14 flanking the acyl-CoA:diacylglycerol acyltransferase 1 gene (Grisart et al., 2004), with lesser effects for milk and fat yields. Large marker effects for protein percentage were also present on BTA 6 flanking the ATP-binding cassette, subfamily G, member 2 gene (Cohen-Zinder et al., 2005). Detection of those effects demonstrates that genomic predictions work by tracking the inheritance of causal mutations. A previous analysis of markers on BTA 14 (de Roos et al., 2007) obtained similar results. Markers on BTA 18 centered on marker ARS-BFGL-NGS-109285 had the greatest effects for several traits: productive life, sire calving ease, daughter calving ease, rump width, stature, strength, and body depth. Another marker on BTA 18 had the largest effect on net merit in the region previously identified by Ashwell et al. (2004) as having a large effect on daughter pregnancy rate.
Marker effects for most other traits were evenly distributed across all chromosomes with only a few regions having larger effects, which may explain why the infinitesimal model and standard quantitative genetic theories have worked well. The distribution of marker effects indicates primarily polygenic rather than simple inheritance and suggests that the favorable alleles will not become homozygous quickly, and genetic variation will remain even after intense selection. Thus, dairy cattle breeders may expect genetic progress to continue for many generations.
Nonlinear and linear predictions were correlated by >0.99 for most traits. The nonlinear genomic model had little advantage in R2 over the linear model except for fat and protein percentages with increases of 8 and 7%, respectively (Table 2). Gains in R2 averaged 3% with simulated data (VanRaden, 2008) but generally were smaller with real data, which indicated that most traits are influenced by more loci than the 100 QTL used in simulation. The R2 improved when the prior assumption was that all markers have some effect rather than that most have no effect. Results comparing differing priors and a detailed summary of the locations of markers with largest effects for each trait were reported by Cole et al. (2008). Further nonlinear optimization procedures should be investigated and could result in larger advantages than those tested here.
Actual R2 may differ from expected reliability for 5 main reasons: 1) daughter deviations contain error, especially for lowly heritable traits, resulting in lower R2 than reliability; 2) selection of elite parents decreases R2 for directly selected traits, such as net merit, whereas published reliabilities assume no selection; 3) genetic effects may reside between the markers but are assumed to be located only at the markers; 4) gains in R2 may have large standard errors because of limited numbers of predicted bulls; and 5) a few genotypes are missing or read incorrectly. Observed gains in R2 were adjusted for effects of 1) and 2) to compute observed reliability, but no theoretical adjustments were available to correct expected gains in reliability for effects of 3), 4), and 5).
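The adjustment for reasons 1) and 2), as described in Materials and Methods, amounts to dividing both R2 values by the mean reliability of daughter deviations and then adding back the shortfall of observed PA reliability relative to published PA reliability. The sketch below works on fractions rather than percentages; the function name and the input values are hypothetical and are not taken from the tables.

```python
def realized_reliability(r2_genomic, r2_pa, rel_daughter_dev, rel_pa_published):
    """Convert R2 values to a realized genomic reliability (all inputs as fractions).

    r2_genomic: R2 of genomic predictions with daughter deviations.
    r2_pa: R2 of parent average with daughter deviations.
    rel_daughter_dev: mean reliability of the daughter deviations (Rdau).
    rel_pa_published: published reliability of parent average.
    """
    observed_pa = r2_pa / rel_daughter_dev            # adjusts for error in deviations
    adjusted_genomic = r2_genomic / rel_daughter_dev
    # Adding the published-minus-observed PA gap adjusts for prior selection.
    return adjusted_genomic + (rel_pa_published - observed_pa)


# Hypothetical inputs: 0.40 / 0.80 + (0.30 - 0.20 / 0.80) = 0.55
example_rel = realized_reliability(0.40, 0.20, 0.80, 0.30)
```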
Gains in reliability from genotyping the predicted bulls are shown in Table 3 and averaged 23% across traits with a range from 8 to 43%. Gains were also converted to daughter equivalents or the number of phenotyped daughters that would provide the same increase in reliability. Daughter equivalents were calculated from the published heritability of each trait and averaged 11 for predicted bulls (Table 4). Gains in reliability were uniform across traits. Gains in daughter equivalents were smaller for traits with greater heritability than for traits with lesser heritability because each daughter equivalent is worth more for more heritable traits. For net merit, observed reliability of PA was less than theoretical reliability because of intense selection and because the net merit index was changed in 2006 to include stillbirths. Fat yield and some conformation traits also had lower observed than published reliability of PA.
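The conversion from reliability to daughter equivalents is not written out here. A minimal sketch, assuming the relationship REL = DE/(DE + k) with k = (4 − 2h²)/h², is given below; that form is an assumption, although it reproduces the net merit row of Table 4 after rounding when the published PA reliability (30%) and the nonlinear genomic reliability (53%) from Table 3 are used.

```python
def daughter_equivalents(reliability, heritability):
    """Daughter equivalents implied by a reliability (fraction) and heritability.

    Assumes REL = DE / (DE + k) with k = (4 - 2 * h2) / h2, so that
    DE = k * REL / (1 - REL). The form of k is an assumption, not stated in the text.
    """
    k = (4 - 2 * heritability) / heritability
    return k * reliability / (1 - reliability)


# Net merit: heritability 0.20, reliabilities 0.30 (PA) and 0.53 (genomic).
de_pa = daughter_equivalents(0.30, 0.20)
de_genomic = daughter_equivalents(0.53, 0.20)
```

Because k shrinks as heritability rises, the same gain in reliability corresponds to fewer daughter equivalents for highly heritable traits.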
Table 3. Expected and observed reliabilities (%) from genomic predictions and from parent average
| Trait | Published parent average | Observed parent average | Expected genomic prediction | Linear genomic prediction | Nonlinear genomic prediction | Gain from nonlinear genomic prediction compared with published parent average |
|---|---|---|---|---|---|---|
| Net merit | 30 | 14 | 67 | 53 | 53 | 23 |
| Milk yield | 35 | 32 | 69 | 56 | 58 | 23 |
| Fat yield | 35 | 17 | 69 | 65 | 68 | 33 |
| Protein yield | 35 | 31 | 69 | 58 | 57 | 22 |
| Fat percentage | 35 | 29 | 69 | 69 | 78 | 43 |
| Protein percentage | 35 | 32 | 69 | 62 | 69 | 34 |
| Productive life | 27 | 28 | 55 | 42 | 45 | 18 |
| SCS | 30 | 29 | 62 | 49 | 51 | 21 |
| Daughter pregnancy rate | 25 | 33 | 52 | 41 | 41 | 16 |
| Sire calving ease | 27 | 26 | 60 | 33 | 35 | 8 |
| Daughter calving ease | 25 | 24 | 54 | 39 | 40 | 15 |
| Final score | 24 | 31 | 63 | 40 | 42 | 18 |
| Stature | 25 | 32 | 67 | 51 | 51 | 26 |
| Strength | 24 | 21 | 63 | 48 | 49 | 25 |
| Body depth | 24 | 23 | 63 | 50 | 51 | 27 |
| Dairy form | 24 | 12 | 62 | 52 | 49 | 25 |
| Foot angle | 23 | 20 | 58 | 40 | 37 | 14 |
| Rear legs (side view) | 24 | 14 | 62 | 47 | 46 | 22 |
| Rear legs (rear view) | 23 | 18 | 57 | 38 | 35 | 12 |
| Rump angle | 25 | 24 | 66 | 53 | 52 | 27 |
| Rump width | 24 | 25 | 62 | 49 | 47 | 23 |
| Fore udder | 24 | 22 | 63 | 53 | 54 | 30 |
| Rear udder height | 24 | 27 | 63 | 44 | 45 | 21 |
| Udder depth | 25 | 22 | 64 | 61 | 60 | 35 |
| Udder cleft | 24 | 26 | 61 | 41 | 41 | 17 |
| Front teat placement | 24 | 28 | 63 | 49 | 50 | 26 |
| Teat length | 25 | 15 | 65 | 52 | 51 | 26 |
| All | 27 | 25 | 63 | 49 | 50 | 23 |
Table 4. Heritabilities and daughter equivalents from genomic prediction and from parent average
| Trait | Heritability | Daughter equivalents, parent average | Daughter equivalents, genomic prediction | Gain in daughter equivalents from genomic prediction compared with parent average |
|---|---|---|---|---|
| Net merit | 0.20 | 8 | 20 | 12 |
| Milk yield | 0.30 | 6 | 16 | 10 |
| Fat yield | 0.30 | 6 | 24 | 18 |
| Protein yield | 0.30 | 6 | 15 | 9 |
| Fat percentage | 0.50 | 3 | 22 | 19 |
| Protein percentage | 0.50 | 3 | 13 | 10 |
| Productive life | 0.08 | 18 | 39 | 21 |
| SCS | 0.12 | 14 | 32 | 18 |
| Daughter pregnancy rate | 0.04 | 32 | 67 | 35 |
| Sire calving ease | 0.09 | 16 | 24 | 8 |
| Daughter calving ease | 0.06 | 21 | 41 | 20 |
| Final score | 0.29 | 4 | 8 | 5 |
| Stature | 0.42 | 3 | 8 | 5 |
| Strength | 0.31 | 3 | 10 | 7 |
| Body depth | 0.37 | 3 | 9 | 6 |
| Dairy form | 0.29 | 4 | 12 | 8 |
| Foot angle | 0.15 | 7 | 14 | 7 |
| Rear legs (side view) | 0.21 | 5 | 14 | 9 |
| Rear legs (rear view) | 0.11 | 10 | 19 | 9 |
| Rump angle | 0.33 | 3 | 11 | 8 |
| Rump width | 0.26 | 4 | 12 | 8 |
| Fore udder | 0.29 | 4 | 14 | 10 |
| Rear udder height | 0.28 | 4 | 10 | 6 |
| Udder depth | 0.28 | 4 | 18 | 14 |
| Udder cleft | 0.24 | 5 | 10 | 5 |
| Front teat placement | 0.26 | 4 | 13 | 9 |
| Teat length | 0.26 | 4 | 14 | 10 |
| All | 0.25 | 8 | 19 | 11 |
Reliability of predictor bulls also increased slightly when genomic predictions replaced traditional PTA. Predictions were better (P < 0.001) for 26 of 27 traits for bulls that added daughters and had an increase in reliability of ≥10% from 2003 to 2008. Only the gain for service-sire calving ease was nonsignificant. Gains in reliability for cows with records should be intermediate between those for young bulls and proven bulls because traditional reliabilities for most cows are only somewhat greater than their PA reliabilities.
Selection index regressions were fairly uniform for all predicted bulls even though separate 3 × 3 matrices were used for each. Mean regression coefficients for the direct prediction, subset PA, and published PA were 0.99, −0.52, and 0.53, respectively. The selection index regressions are a function of the mean reliabilities for the predicted bulls. Inclusion of the subset PA allows the difference between genomic and traditional predictions (for the same subset of data) to be added to the published PA, which included all national and international data. As genotypes and phenotypes are included for more parents, regressions should approach 1 for the direct genomic prediction and approach 0 for the 2 PA terms.
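To show how the 3 terms combine, the sketch below applies the mean regression coefficients reported above to one bull. In the study each bull had its own coefficients, so using the means for an individual is a simplification; the absence of an intercept, the function name, and the input values are also assumptions.

```python
def combined_prediction(direct_genomic, subset_pa, published_pa,
                        weights=(0.99, -0.52, 0.53)):
    """Weighted sum of direct genomic prediction, subset PA, and published PA.

    weights: mean selection index coefficients (direct, subset PA, published PA).
    """
    b_direct, b_subset, b_published = weights
    return (b_direct * direct_genomic
            + b_subset * subset_pa
            + b_published * published_pa)


# Hypothetical bull: 0.99 * 300 - 0.52 * 200 + 0.53 * 220 = 309.6
example_combined = combined_prediction(300.0, 200.0, 220.0)
```

The negative weight on the subset PA and the nearly equal positive weight on the published PA show that the index mostly adds the gap between the 2 PA to the direct genomic prediction.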
Genomic predictions were expected to have the same mean as traditional evaluations, but their standard deviation (SD) was expected to increase in proportion to the increased accuracy. Thus, the SD of change from PA to genomic prediction should equal the SD of true transmitting ability multiplied by the square root of the gain in reliability for each trait, where reliability is expressed as a fraction (divided by 100) rather than a percentage. That formula can be applied to gains in reliability from any source of information (daughters, animal's own records, and so on). Genomic predictions follow most of the same normal distribution formulas that animal breeders are already using.
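That relationship can be written as a short function. The SD of true transmitting ability used in the example is a hypothetical value on an arbitrary scale.

```python
import math


def sd_of_change(sd_true_ta, reliability_gain_percent):
    """SD of the change from PA to genomic prediction.

    sd_true_ta: SD of true transmitting ability for the trait.
    reliability_gain_percent: gain in reliability in percentage units.
    """
    return sd_true_ta * math.sqrt(reliability_gain_percent / 100)


# Hypothetical trait with SD of true transmitting ability of 100 units.
change_sd_23 = sd_of_change(100.0, 23)  # mean gain across traits
change_sd_25 = sd_of_change(100.0, 25)
```

With a 23% gain in reliability, the SD of change is about 48% of the SD of true transmitting ability.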
Most animal breeders will conclude that these gains in reliability are sufficient to make genotyping profitable before breeders invest in progeny testing or embryo transfer. Rates of genetic progress should increase substantially as breeders take advantage of these new tools for improving animals (Schaeffer, 2008). Further increases in number of genotyped bulls, revisions to the statistical methods, and additional edits should increase the precision of future genomic predictions.
Sex Chromosomes
Effects on the X chromosome were smaller than expected; SD was about 0.1 genetic SD and accounted for only about 1% of genetic variance for most traits. However, those effects were associated (P < 0.0001) with differences between genetic merit of bull sons compared with bull daughters. Official PTA measures daughter genetic merit almost entirely because most sires have many more daughters than sons with data. For net merit, the regression on X effect was −1.3 with an SD of 0.3, which was close to the theoretical value of −1.0. Predictions computed without the markers on X had slightly lower R2 for 8 of 9 traits than for the full set (Table 5).
Table 5. Coefficients of determination (R2 × 100) for parent average and for genomic predictions with differing numbers of markers
| Trait | Parent average | 9,604 markers (10K) | 19,208 markers (20K) | 38,416 markers (40K) | 37,811 markers (without X) |
|---|---|---|---|---|---|
| Net merit | 11 | 25 | 26 | 28 | 27 |
| Milk yield | 28 | 45 | 47 | 49 | 47 |
| Fat yield | 15 | 41 | 43 | 44 | 43 |
| Protein yield | 27 | 45 | 46 | 47 | 46 |
| Fat percentage | 25 | 59 | 61 | 63 | 62 |
| Protein percentage | 28 | 48 | 53 | 58 | 53 |
| Productive life | 17 | 24 | 25 | 27 | 26 |
| SCS | 23 | 34 | 36 | 38 | 36 |
| Daughter pregnancy rate | 20 | 27 | 28 | 29 | 29 |
Previous research with North American evaluations (Boettcher et al., 2001) indicated little genetic variation on the X chromosome, but many markers and many bulls now allow tracking even those small amounts of variation. Significant marker effects were detected on the X chromosome for several traits in the Netherlands (Sandor et al., 2006), and they also recommended using sex-linked markers in genomic evaluation.
Numbers of Bulls and SNP
For bull subsets (Table 6), gains in R2 for net merit were nearly linear with increasing numbers of predictor bulls. Gains for most other individual traits (not shown) followed that same pattern. Although linear increases cannot continue indefinitely, the results suggest that genotyping additional predictor bulls will be profitable and that genomic selection within small populations will not achieve the large gains obtained for the North American Holstein population.
Table 6. Coefficients of determination (R2 × 100) for parent average and for genomic prediction of net merit for bull subsets
| Predictor bulls, n | Predicted bulls, n | Parent average | Genomic prediction | Gain from genomic prediction compared with parent average |
|---|---|---|---|---|
| 1,151 | 251 | 8 | 12 | 4 |
| 2,130 | 261 | 8 | 17 | 9 |
| 2,609 | 510 | 8 | 21 | 13 |
| 3,576 | 1,759 | 11 | 28 | 17 |
Greater SNP densities gave more accurate predictions for all 9 traits (Table 5). The R2 values were greater for each trait for the 40K SNP set compared with the 20K SNP subset and for the 20K SNP subset compared with the 10K SNP subset. Compared with the gain in R2 from PA to 40K SNP density, 10K SNP density provided about 80% of the gain, and 20K SNP density provided about 90%.
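The approximate shares of 80 and 90% can be checked from the R2 values in Table 5. The sketch sums the gains over PA across the 9 traits and expresses the 10K and 20K gains as fractions of the 40K gain; summing across traits before dividing is an assumption about how the shares were obtained.

```python
# R2 x 100 from Table 5: (parent average, 10K, 20K, 40K).
table5 = {
    "Net merit": (11, 25, 26, 28),
    "Milk yield": (28, 45, 47, 49),
    "Fat yield": (15, 41, 43, 44),
    "Protein yield": (27, 45, 46, 47),
    "Fat percentage": (25, 59, 61, 63),
    "Protein percentage": (28, 48, 53, 58),
    "Productive life": (17, 24, 25, 27),
    "SCS": (23, 34, 36, 38),
    "Daughter pregnancy rate": (20, 27, 28, 29),
}

# Total gain over parent average at each density, summed across traits.
gain_10k = sum(r10 - pa for pa, r10, r20, r40 in table5.values())
gain_20k = sum(r20 - pa for pa, r10, r20, r40 in table5.values())
gain_40k = sum(r40 - pa for pa, r10, r20, r40 in table5.values())

share_10k = gain_10k / gain_40k
share_20k = gain_20k / gain_40k
```

Under that assumption the shares are about 0.81 and 0.90, in line with the rounded figures in the text.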
In a preliminary study with fewer bulls, differences in R2 between 20K and 40K SNP densities were not consistent or significant. Gains in reliability were expected from estimates of linkage disequilibrium for North American Holsteins (Sargolzaei et al., 2008) and from simulation studies (Calus et al., 2008). Although SNP density is already high, actual QTL are between the SNP, which may explain why most realized reliabilities were less than expected reliability. In the future, affordable SNP chips with greater density will likely become available and lead to further small increases in reliability.
The genetic history of the Holstein population may help to explain the results. Many animals share common DNA segments from Round Oak Rag Apple Elevation, Pawnee Farm Arlinda Chief, To-Mar Blackstar, and other popular ancestors occurring 4 to 10 generations back in current pedigrees. Few common ancestors occur >10 generations back because individual bulls had limited influence before AI with frozen semen began (Young and Seykora, 1996). Lengths of the shared chromosome segments are thus 0.10 to 0.25 of the mean chromosome length, and a few hundred markers per chromosome are adequate to trace those segments shared within families.
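The step from "4 to 10 generations back" to segment lengths of 0.10 to 0.25 of a chromosome can be illustrated with a simple approximation. The sketch below assumes, as a simplification not stated in the text, that a segment passed through g meioses has an expected length of about 1/g Morgans and that the mean chromosome length is about 1 Morgan. The function names and the figure of 300 markers per chromosome are illustrative only.

```python
def expected_segment_fraction(generations, chrom_length_morgans=1.0):
    """Approximate shared-segment length as a fraction of chromosome length.

    Assumption: about 1 crossover per Morgan per meiosis, so a segment
    inherited through `generations` meioses averages 1/generations Morgans.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    fraction = (1.0 / generations) / chrom_length_morgans
    return min(1.0, fraction)


def markers_per_segment(markers_per_chromosome, generations):
    """Expected marker count on one shared segment, assuming even spacing."""
    return markers_per_chromosome * expected_segment_fraction(generations)


for g in (4, 10):
    frac = expected_segment_fraction(g)
    count = markers_per_segment(300, g)  # 300 per chromosome is hypothetical
    print(f"{g} generations: fraction {frac:.2f}, about {count:.0f} markers")
```

Under these assumptions, even the shortest segments (ancestors 10 generations back) would carry tens of markers when a few hundred markers are spread evenly along a chromosome, which is consistent with the statement that such densities are adequate to trace segments shared within families.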
In the next generation, the common ancestors will be 1 generation further back, and more crossovers will occur between their adjacent alleles. If the allele effects estimated from families in this study were applied to less-related animals from other populations, predictions could be much less reliable. Divergent populations may require greater SNP densities. As more bulls are genotyped, more phenotypes will be available to estimate each effect. This will increase the value of having more SNP, but will also require the expense of genotyping the predictor bulls again using a denser chip.
Conclusions
Genomic methods let breeders determine which genes animals share. Genotypes for 3,576 predictor bulls and 1,759 predicted bulls were used to test predictive ability for genetic merit of 26 traits and net merit. Reliability for predicted bulls was 50% for genomic predictions versus 27% for traditional PA, a mean increase of 23% across traits. Gains from genomic data increased almost linearly with number of genotyped predictor bulls and also increased substantially as more SNP were included. Gains for proven bulls were also highly significant (P < 0.001) but smaller because of greater initial reliabilities for proven bulls. Gains for young heifers should be nearly identical to those for young bulls, with the exception of some small effects on the X chromosome. Genomic predictions using all 5,335 proven bulls to predict the current generation of young bulls were distributed unofficially to animal owners in April 2008. Genomic predictions will be officially implemented in 2009 and will replace traditional PTA and PA from the animal model.
Acknowledgments
This project was supported by National Research Initiative Grants 2006-35205-16888 and 2006-35205-16701 from the USDA Cooperative State Research, Education, and Extension Service and by the National Association of Animal Breeders. The authors thank the following Animal Improvement Programs Laboratory (Beltsville, MD) staff: M. E. Tooker, L. M. Walton, and J. H. Megonigal Jr. for performing many of the computations, and J. B. Cole for providing suggestions on manuscript improvement. Two anonymous reviewers also provided many helpful suggestions.
Supplementary data
Interpretive summary.
References
- . Detection of quantitative trait loci affecting milk production, health, and reproductive traits in Holstein cattle. J. Dairy Sci. 2004;87:468–475
- . The Cooperative Dairy DNA Repository-A new resource for quantitative trait loci detection and verification. J. Dairy Sci. 1999;82(Suppl. 1):54;(Abstr.)
- . Evaluation of sire predicted transmitting abilities for evidence of X-chromosomal inheritance in North American sire families. J. Dairy Sci. 2001;84:256–265
- . Implementation of marker-assisted selection: Practical lessons from dairy cattle. In: Proc. 8th World Congr. Genet. Appl. Livest. Prod. Commun. 22-11. Instituto Prociencia, Belo Horizonte, Brazil. 2006;
- . Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008;178:553–561
- . Identification of a missense mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6 affecting milk yield and composition in Holstein cattle. Genome Res. 2005;15:936–944
- . Distribution and location of genetic effects for dairy traits. In: Proc. Interbull Mtg. Session: Use of Molecular Genomic Tools in Animal Breeding (accepted). ICAR, Rome, Italy. 2008;
- . Breeding value estimation for fat percentage using dense markers on Bos taurus autosome 14. J. Dairy Sci. 2007;90:4821–4829
- . Commercial application of marker- and gene-assisted selection in livestock: Strategies and lessons. J. Anim. Sci. 2004;82(E. Suppl.):E313–E328
- . A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal. 2007;1:21–28
- . Genetic and functional confirmation of the causality of the DGAT1 K232A quantitative trait nucleotide in affecting milk yield and composition. Proc. Natl. Acad. Sci. USA. 2004;101:2398–2403
- . Estimation by simulation of the efficiency of the French marker-assisted selection program in dairy cattle. Genet. Sel. Evol. 2008;40:91–102
- . An overview of the Bovine HapMap Project. In: Page 60 in Proc. 30th Int. Conf. Anim. Genet. ISAG 2006. Colégio Brasileiro de Reprodução Animal, Belo Horizonte, Brazil. 2006;
- . Genomic selection: Marker-assisted selection on a genome wide scale. J. Anim. Breed. Genet. 2007;124:321–322
- . Linkage disequilibrium on the bovine X chromosome: Characterization and use in quantitative trait locus mapping. Genetics. 2006;173:1777–1786
- . Extent of linkage disequilibrium in Holstein cattle in North America. J. Dairy Sci. 2008;91:2106–2117
- . Bull selection strategies using genomic estimated breeding values. In: Pages 35-41 in Tech. Presentations, 36th ICAR session, Niagara Falls, NY. Natl. DHIA, Verona, WI. 2008;
- . Marker-assisted selection—An overview. Anim. Biotechnol. 1994;5:193–207
- . SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat. Methods. 2008;5:247–252
- . Genomic measures of relationship and inbreeding. Interbull Bull. 2007;37:33–36
- . Efficient methods to compute genomic predictions. J. Dairy Sci. 2008;91:4414–4423
- . Derivation, calculation, and use of national animal model information. J. Dairy Sci. 1991;74:2737–2746
- . Estimates of inbreeding and relationship among registered Holstein females in the United States. J. Dairy Sci. 1996;79:502–505
PII: S0022-0302(09)70305-4
doi:10.3168/jds.2008-1514
© 2009 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

