Research Article| Volume 95, ISSUE 8, P4657-4665, August 2012

Comparison of genomic predictions using medium-density (∼54,000) and high-density (∼777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations

      Abstract

      This study investigated genomic prediction using medium-density (∼54,000; 54 K) and high-density (∼777,000; 777 K) marker panels, based on data from Nordic Holstein and Red Dairy Cattle (RDC). The Holstein data comprised 4,539 progeny-tested bulls, and the RDC data 4,403 progeny-tested bulls. The data were divided into reference data and test data using October 1, 2001, as a cut-off date (birth date of the bulls). This placed about 25% of the genotyped bulls in the Holstein test data and 20% in the RDC test data. For each breed, 3 marker data sets were used to predict breeding values: (1) a 54 K data set with missing genotypes, (2) a 54 K data set in which missing genotypes were imputed, and (3) an imputed high-density (HD) marker data set, created by imputing the 54 K data to HD based on 557 bulls genotyped using a 777 K single nucleotide polymorphism chip in Holstein and 706 bulls in RDC. Based on the 3 marker data sets, direct genomic breeding values (DGV) for protein, fertility, and udder health were predicted using a genomic BLUP model (GBLUP) and a Bayesian mixture model with 2 normal distributions. Reliability of DGV was measured as the squared correlation between deregressed proofs (DRP) and DGV, corrected for the reliability of DRP. Unbiasedness was assessed by regressing DRP on DGV for the bulls in the test data sets. Averaged over the 3 traits, reliability of DGV based on the HD markers was 0.5% higher than that based on the 54 K data in Holstein, and 1.0% higher in RDC. In addition, the HD markers reduced the bias of DGV. The Bayesian mixture model gave 0.5% higher reliability than the GBLUP model in Holstein, but not in RDC. Imputing missing genotypes in the 54 K marker data did not improve genomic predictions for most of the traits.


      Introduction

      One of the important factors affecting accuracy of genomic prediction is marker density (Solberg et al., "Genomic selection using different marker types and densities"; Habier et al., "Genomic selection using low-density marker panels"; Meuwissen, "Accuracy of breeding values of ‘unrelated’ individuals predicted by dense SNP genotyping"; Weigel et al., "Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers"). Higher marker density means that, on average, the markers are in stronger linkage disequilibrium (LD) with genes affecting the trait of interest, which should lead to better genomic predictions.
      Currently, a medium-density SNP chip with ∼54,000 markers (54 K; Matukumalli et al., "Development and characterization of a high density SNP genotyping assay for cattle") is widely used for genomic prediction in dairy cattle (Su et al., "Preliminary investigation on reliability of genomic estimated breeding values in the Danish Holstein population"; VanRaden and Sullivan, "International genomic evaluation methods for dairy cattle"; Lund et al., "A common reference population from four European Holstein populations increases reliability of genomic predictions"). In 2010, a high-density (HD) SNP chip with ∼777,000 markers (777 K) was released (Matukumalli et al., "Analyzing LD blocks and CNV segments in cattle: Novel genomic features identified using the BovineHD BeadChip," Pub. No. 370-2011-002). Using the HD markers is expected to give more accurate genomic predictions than using the 54 K chip. However, simulation studies show that the advantage of HD markers in genomic prediction is large when few genes affect the trait (Meuwissen and Goddard, "Accurate prediction of genetic values for complex traits by whole-genome resequencing") but very small when a large number of genes affect the trait (VanRaden et al., "Genomic evaluations with many more genotypes").
      Marker–QTL associations differ among populations, and the differences depend on the genetic distances between populations (Gautier et al., "Genetic and haplotypic structure in 14 European and African cattle breeds"; de Roos et al., "Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle"; de Roos et al., "Reliability of genomic predictions across multiple populations"). The more closely related 2 populations are, the more their LD patterns are expected to be shared. Between Bos taurus cattle breeds, the LD phase has been reported to be persistent only for marker pairs less than 10 kb apart (Gautier et al., "Genetic and haplotypic structure in 14 European and African cattle breeds"; de Roos et al., "Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle"). For the cattle genome, this requires a density of at least 300,000 markers. Thus, the benefit of changing from 54 K to HD markers should be more pronounced for genomic prediction across populations than within populations. In the Nordic dairy cattle joint genetic evaluation, the Red Dairy Cattle (RDC) population consists of Finnish Ayrshire, Swedish Red, and Danish Red, whereas the Holstein population is mainly Danish Holstein. Therefore, the RDC population can be considered a mixture of 3 populations, and the Holstein population can be treated as a single population. This leads to the hypothesis that the benefit of using HD rather than 54 K markers for genomic prediction would be larger in the RDC population than in the Holstein population.
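The 300,000-marker requirement is simple arithmetic: genome length divided by the largest spacing at which LD phase persists. A minimal Python sketch, assuming a bovine genome length of about 3.0 × 10⁹ bp (an assumed round figure, not one stated in this article) and evenly spaced markers; the function names are hypothetical:

```python
import math

GENOME_BP = 3.0e9        # assumption: approximate bovine genome length in bp
MAX_SPACING_BP = 10_000  # LD phase persists across Bos taurus breeds only below ~10 kb


def markers_needed(genome_bp: float, max_spacing_bp: float) -> int:
    """Minimum number of evenly spaced markers so adjacent markers are <= max_spacing_bp apart."""
    return math.ceil(genome_bp / max_spacing_bp)


def mean_spacing_kb(genome_bp: float, n_markers: int) -> float:
    """Average distance between adjacent markers, in kb, for evenly spaced markers."""
    return genome_bp / n_markers / 1_000


print(markers_needed(GENOME_BP, MAX_SPACING_BP))      # minimum density for < 10 kb spacing
print(round(mean_spacing_kb(GENOME_BP, 54_000), 1))   # nominal 54 K chip
print(round(mean_spacing_kb(GENOME_BP, 777_000), 1))  # nominal 777 K chip
```

Real markers are not evenly spaced, so these are only averages; with the assumed genome length, a nominal 54 K chip averages roughly 56 kb between markers and a 777 K chip under 4 kb, so only the HD panel meets the 10 kb requirement on average.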
      The BLUP model (to estimate either SNP effects or individual additive genetic effects) is a popular approach in practical genomic evaluations using 54 K markers (VanRaden et al., "Invited review: Reliability of genomic predictions for North American Holstein bulls"; Harris and Johnson, "Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation"; Liu et al., "Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction"; Su et al., "Genomic prediction for Nordic Red Cattle using one-step and selection index blending"), because it is simple, has relatively low computational requirements, and performs as well as variable selection models for most traits (Hayes et al., "Invited review: Genomic selection in dairy cattle: Progress and challenges"; VanRaden et al., "Invited review: Reliability of genomic predictions for North American Holstein bulls"). With HD markers, the number of unknowns in a prediction model increases dramatically. Variable selection models are therefore expected to predict genomic breeding values better than linear BLUP models, because they can better attribute genetic variance to SNP in close LD with the QTL.
      The objective of this study was to compare genomic predictions using either imputed HD markers or current 54 K markers, applying either a linear BLUP model with genomic relationship matrix (genomic BLUP, GBLUP) or a Bayesian mixture model, based on the data from Nordic Holstein and RDC populations.

      Materials and Methods

      Data

      The data used in this analysis were genotypes and deregressed proofs (DRP) from Nordic Holstein and RDC populations. The DRP were derived from genetic evaluations in November 2010. The traits under analysis were protein yield, fertility, and udder health, which were the economically most important traits in the Nordic total merit index, and varied widely in heritability (from 0.04 for fertility and udder health to 0.39 for protein yield). The Holstein data comprised 4,539 progeny-tested bulls (mainly Danish Holstein), and the RDC data comprised 4,403 bulls (49.5% Finnish Ayrshire, 30.4% Swedish Red, 19.3% Danish Red, and 0.8% imported Red).
      The bulls were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). Among the RDC bulls, 706 (about one-third from each of the 3 RDC populations) were re-genotyped using the Illumina BovineHD BeadChip (777 K). For Holstein, 557 bulls in the EuroGenomics project (Lund et al., "A common reference population from four European Holstein populations increases reliability of genomic predictions") were re-genotyped using the HD chip. The 54 K genotypes were imputed to HD genotypes using the Beagle package (Browning and Browning, "A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals"), based on the marker data of the HD-genotyped bulls. Because the aim of this study was to compare the 54 K and HD markers for genomic predictions, the imputation was based on the HD map, and markers on the 54 K chip but not on the HD chip were excluded from the imputation process. To investigate the effect of inferring missing genotypes on genomic predictions, the missing genotypes in the 54 K data (arising from the use of different versions of the Illumina 54 K chip and from failed or poor-quality genotype calls) were also imputed using the Beagle package. All imputed genotypes were accepted, so there were no missing genotypes in the imputed 54 K and HD data. The unimputed and imputed 54 K data were edited using thresholds of 0.01 for minor allele frequency (MAF) and 0.60 for locus average GenCall score. The imputed HD data were edited by deleting markers in complete LD with an adjacent marker and markers with MAF < 0.01. To delete the markers in complete LD with adjacent markers, LD between a marker and the next marker was inspected, starting from the first marker on each chromosome. If a marker ($\mathrm{SNP}_i$) and the next marker ($\mathrm{SNP}_{i+1}$) were in complete LD, $\mathrm{SNP}_{i+1}$ was deleted and $\mathrm{SNP}_i$ was then compared with $\mathrm{SNP}_{i+2}$; otherwise, $\mathrm{SNP}_{i+1}$ was compared with $\mathrm{SNP}_{i+2}$. After this procedure, the LD ($r^2$) between any pair of adjacent markers was < 1.
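The sequential adjacent-marker pruning described above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes genotypes coded as A2-allele counts (0/1/2) in map order for one chromosome, and it approximates $r^2$ by the squared Pearson correlation of unphased genotype counts, which is an assumption because the text does not state how $r^2$ was computed. Function names and the tolerance are hypothetical.

```python
import numpy as np


def dosage_r2(a, b):
    """Squared Pearson correlation between two genotype-count vectors (0/1/2)."""
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic marker: r^2 undefined, treat as not in complete LD
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)


def prune_complete_ld(genotypes, tol=1e-12):
    """Indices of markers kept after sequential adjacent-marker pruning.

    genotypes: (n_animals, n_markers) array for ONE chromosome, in map order.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    n_markers = genotypes.shape[1]
    if n_markers == 0:
        return []
    kept = [0]
    current = 0  # SNP_i: the marker that the following markers are compared with
    for j in range(1, n_markers):
        if dosage_r2(genotypes[:, current], genotypes[:, j]) >= 1.0 - tol:
            continue  # complete LD: delete SNP_j and keep comparing SNP_i
        kept.append(j)
        current = j  # otherwise SNP_j becomes the reference for the next marker
    return kept


# Toy chromosome: markers 0 and 1 carry identical genotypes
g = np.array([[0, 0, 1, 2],
              [1, 1, 1, 0],
              [2, 2, 0, 1],
              [0, 0, 2, 2]])
print(prune_complete_ld(g))
```

Because each marker is compared only with the last retained marker, a run of markers in complete LD collapses to its first member, and every pair of adjacent retained markers has $r^2 < 1$ under the chosen $r^2$ measure.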
      For each breed, 3 marker data sets were used to predict breeding values: (1) unimputed 54 K data, in which missing marker genotypes (3.9% in Holstein and 4.4% in RDC) were replaced with the population expectation calculated from allele frequencies at the corresponding locus; (2) imputed 54 K data, in which missing genotypes in the 54 K data were imputed; and (3) imputed HD data. In RDC, markers on all 30 chromosomes were used. In Holstein, the X chromosome was excluded because it was not exchanged as part of the EuroGenomics project. Because allele frequencies differed slightly between the original and imputed 54 K data sets, the numbers of markers retained after deleting markers with MAF < 0.01 were not the same in the 2 data sets (Table 1).
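For the unimputed 54 K data, the MAF edit and the replacement of missing genotypes by their population expectation can be illustrated with a short Python sketch. It assumes genotypes stored as A2-allele counts with `np.nan` marking missing calls and allele frequencies computed from the non-missing genotypes; the function name is hypothetical.

```python
import numpy as np


def fill_missing_with_expectation(genotypes, maf_threshold=0.01):
    """Drop markers with MAF < maf_threshold, then replace missing genotypes by 2*p_j.

    genotypes: (n_animals, n_markers) A2-allele counts, np.nan for missing calls.
    Returns the filled matrix and the indices of the retained markers.
    """
    x = np.asarray(genotypes, dtype=float)
    p = np.nanmean(x, axis=0) / 2.0             # observed A2 allele frequency per locus
    maf = np.minimum(p, 1.0 - p)
    keep = np.flatnonzero(maf >= maf_threshold)  # delete markers with MAF < threshold
    x, p = x[:, keep], p[keep]
    # Expected genotype under HWE: 0*(1-p)^2 + 1*2p(1-p) + 2*p^2 = 2p
    filled = np.where(np.isnan(x), 2.0 * p, x)
    return filled, keep


x = np.array([[0.0, 2.0, np.nan],
              [1.0, 2.0, 2.0],
              [2.0, 2.0, 0.0]])
filled, keep = fill_missing_with_expectation(x)
print(keep.tolist())
print(filled)
```

Because the MAF edit depends on estimated frequencies, data sets whose frequencies differ slightly (such as original vs. imputed genotypes) can retain different numbers of markers. Setting a missing genotype to $2p_j$ also means it becomes 0 once genotypes are centered by $2p_j$.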
      Table 1. Number of SNP markers before editing ($n_{\mathrm{raw}}$) and after editing ($n_{\mathrm{ed}}$), and average pair-wise linkage disequilibrium (LD) between adjacent markers.

      | SNP panel¹ | Breed | $n_{\mathrm{raw}}$² | $n_{\mathrm{ed}}$² | LD³ |
      |---|---|---|---|---|
      | 54 K | Holstein | 46,973 | 43,413 / 43,922 (imp) | 0.209 |
      | 54 K | Red Dairy Cattle | 49,657 | 45,168 / 46,847 (imp) | 0.180 |
      | 777 K | Holstein | 648,219 | 492,057 | 0.557 |
      | 777 K | Red Dairy Cattle | 673,295 | 528,595 | 0.533 |

      ¹ Medium-density (∼54,000 markers; 54 K) and high-density (∼777,000 markers; 777 K) SNP panels.
      ² Number of markers including X chromosome in Red Dairy Cattle, excluding X chromosome in Holstein. Because of small differences in allele frequencies between original and imputed (imp) 54 K data sets, the numbers of markers in original and imputed 54 K data sets were not the same after editing.
      ³ Measured as $r^2$, calculated based on markers in autosomes, using the SNP marker data before editing.

      Statistical Model

      Direct genomic breeding values (DGV) were predicted using 2 models. One was a GBLUP model and the other was a Bayesian mixture model.

      GBLUP

      The GBLUP model (VanRaden, "Efficient methods to compute genomic predictions"; Hayes et al., "Increased accuracy of artificial selection by using the realized relationship matrix") is
      $$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e},$$


      where y is the vector of DRP, μ is the overall mean, 1 is a vector of 1s, g is the vector of DGV, Z is the incidence matrix for g, and e is the vector of residuals.
      It was assumed that $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2)$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma_e^2)$, where $\mathbf{G}$ is a genomic relationship matrix, $\sigma_g^2$ is the genomic additive genetic variance, $\mathbf{D}$ is a diagonal matrix, and $\sigma_e^2$ is the residual variance. The genomic relationship matrix was $\mathbf{G} = \mathbf{M}\mathbf{M}'/(2\sum_i p_i q_i)$, where the elements in column $i$ of $\mathbf{M}$ are $0 - 2p_i$, $1 - 2p_i$, and $2 - 2p_i$ for genotypes A1A1, A1A2, and A2A2, respectively; $q_i$ is the allele frequency of A1 and $p_i$ is the allele frequency of A2. In theory, base population allele frequencies should be used to construct $\mathbf{G}$ (Gengler et al., "A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian Blue cattle"; VanRaden, "Efficient methods to compute genomic predictions"). However, many studies have shown that allele frequencies observed from current marker data perform as well as estimated base population allele frequencies with regard to the accuracy of predicted genomic breeding values (Aguilar et al., "Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score"; Forni et al., "Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information"). In this study, allele frequencies observed from the current marker data were used to construct $\mathbf{G}$. When using the unimputed 54 K data, a missing marker genotype was replaced with the population expectation at the corresponding locus; that is, a missing genotype at locus $j$ was set to $0(1-p_j)^2 + 1[2p_j(1-p_j)] + 2p_j^2 = 2p_j$, which is equivalent to setting the corresponding element of $\mathbf{M}$ to zero ($2p_j - 2p_j = 0$). In other words, it was equivalent to assuming that missing genotypes had null effect. Matrix $\mathbf{D}$ has diagonal elements $d_{ii} = (1 - r^2_{\mathrm{DRP}})/r^2_{\mathrm{DRP}}$ to account for heterogeneous residual variances due to the different reliabilities of DRP ($r^2_{\mathrm{DRP}}$). The variances $\sigma_g^2$ and $\sigma_e^2$ used for predictions were those estimated from the reference data with the corresponding marker data.
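The construction of $\mathbf{G}$ and the GBLUP prediction with the heterogeneous residual weights $d_{ii}$ can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the software used in the study: the variances are taken as known, missing genotypes are assumed already replaced by $2p_i$, the prediction uses the equivalent V-matrix (GLS plus BLUP) form instead of solving mixed model equations, and the function names are hypothetical.

```python
import numpy as np


def vanraden_g(genotypes):
    """G = M M' / (2 * sum_i p_i q_i), with M centered by 2*p_i.

    genotypes: (n_animals, n_markers) A2-allele counts without missing values.
    p_i is the frequency observed in these data, as in the study.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    m = x - 2.0 * p  # elements 0-2p, 1-2p, 2-2p
    return m @ m.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(drp, G, ref_idx, rel_drp, var_g, var_e):
    """BLUP of g for every animal in G from the DRP of the reference bulls.

    Model y = 1*mu + Z g + e, Var(g) = G*var_g, Var(e) = D*var_e,
    d_ii = (1 - r2_DRP) / r2_DRP. G is never inverted.
    """
    y = np.asarray(drp, dtype=float)
    rel = np.asarray(rel_drp, dtype=float)
    n_ref, n_all = len(ref_idx), G.shape[0]
    Z = np.zeros((n_ref, n_all))
    Z[np.arange(n_ref), ref_idx] = 1.0      # links each DRP to its bull
    D = np.diag((1.0 - rel) / rel)          # heterogeneous residual weights
    V = var_g * Z @ G @ Z.T + var_e * D     # Var(y)
    ones = np.ones(n_ref)
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    g_hat = var_g * G @ Z.T @ np.linalg.solve(V, y - mu)
    return mu, g_hat


# Toy example: bull 2 (test) is related to reference bull 0 only
G_toy = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.0, 1.0]])
mu, dgv = gblup([1.0, -1.0], G_toy, [0, 1], [0.5, 0.5], var_g=1.0, var_e=1.0)
print(mu, dgv)
```

Test bulls are included in $\mathbf{G}$ but have no DRP, so their DGV come entirely from their genomic relationships with the reference bulls; bulls with less reliable DRP get larger $d_{ii}$ and therefore less weight.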

      Bayesian Mixture

      The Bayesian mixture model (Meuwissen, "Accuracy of breeding values of ‘unrelated’ individuals predicted by dense SNP genotyping") is
      $$\mathbf{y} = \mathbf{1}\mu + \mathbf{M}\mathbf{q} + \mathbf{e},$$


      where $\mathbf{y}$ is the vector of DRP, $\mathbf{q}$ is the vector of SNP genotype effects ($q_i$), and $\mathbf{M}$ is as defined above. The model assumes that a small proportion ($\pi$) of SNP have large effects and the remainder have small effects. This is achieved by assuming that the prior distribution of $q_i$ is either a normal distribution with a large variance $\sigma_{v1}^2$ or a normal distribution with a small variance $\sigma_{v0}^2$; that is, $q_i \sim (1-\pi)N(0, \sigma_{v0}^2) + \pi N(0, \sigma_{v1}^2)$.
      In the present study, $\pi$ was set to 0.05, 0.10, 0.20, or 0.50 when using the 54 K markers, and to 0.005, 0.01, 0.02, or 0.05 when using the HD markers. These settings were chosen so that the expected number of markers in the large-variance distribution of the mixture was almost the same for the 54 K and the HD markers. The Bayesian mixture model was fitted by Gibbs sampling, run as a single chain of 50,000 samples. The first 20,000 samples were discarded as burn-in, and every 10th sample of the remaining 30,000 was saved to calculate posterior statistics. In general, the largest $\pi$ led to slightly lower prediction accuracy than the other 3 priors in Holstein, and the smallest and the largest $\pi$ yielded slightly lower prediction accuracy than the other 2 priors in RDC, for both the 54 K and the HD data. Hence, the results presented below are those for $\pi = 0.20$ with the 54 K markers and $\pi = 0.02$ with the HD markers, which were generally appropriate for the traits in the current study.
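A minimal single-chain Gibbs sampler for the 2-component mixture prior can be sketched in Python as follows. It is a simplified illustration, not the study's implementation: the variances $\sigma_{v0}^2$, $\sigma_{v1}^2$, and $\sigma_e^2$ are held fixed (the study estimated them), residuals are homoscedastic (no $\mathbf{D}$ weighting), and the function name and defaults are hypothetical. For each SNP, the mixture component is drawn after integrating out $q_i$, and $q_i$ is then drawn from its normal full conditional; burn-in and thinning follow the scheme above at a much smaller scale.

```python
import numpy as np


def bayes_mixture_gibbs(y, M, pi, var0, var1, var_e,
                        n_iter=2000, burn_in=500, thin=10, seed=1):
    """Posterior means of SNP effects under q_i ~ (1-pi) N(0, var0) + pi N(0, var1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    q = np.zeros(m)
    mu = y.mean()
    resid = y - mu                     # y - 1*mu - M q, with q = 0 at the start
    xtx = np.sum(M * M, axis=0)        # M_i' M_i for every marker
    q_sum, n_saved = np.zeros(m), 0
    for it in range(n_iter):
        resid += mu                    # sample the mean (flat prior)
        mu = rng.normal(resid.mean(), np.sqrt(var_e / n))
        resid -= mu
        for i in range(m):
            resid += M[:, i] * q[i]    # remove marker i from the fit
            rhs = M[:, i] @ resid
            log_w = []
            for v, prior in ((var0, 1.0 - pi), (var1, pi)):
                c = xtx[i] + var_e / v
                # log prior + log marginal likelihood with q_i integrated out
                log_w.append(np.log(prior) - 0.5 * np.log(v * c / var_e)
                             + 0.5 * rhs * rhs / (var_e * c))
            p_large = 1.0 / (1.0 + np.exp(log_w[0] - log_w[1]))
            v = var1 if rng.random() < p_large else var0
            c = xtx[i] + var_e / v
            q[i] = rng.normal(rhs / c, np.sqrt(var_e / c))  # full conditional of q_i
            resid -= M[:, i] * q[i]    # put marker i back with its new effect
        if it >= burn_in and (it - burn_in) % thin == 0:
            q_sum += q
            n_saved += 1
    return q_sum / n_saved
```

With the study's settings (50,000 samples, 20,000 burn-in, every 10th kept), 3,000 samples would be retained; the defaults here are much smaller only to keep the illustration fast.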

      Validation

      The error rate of imputation from the 54 K to the HD markers was assessed in a validation in which the HD-genotyped bulls were divided into reference and test data. The test data contained 150 bulls for RDC and 100 bulls for Holstein. The test bulls were randomly chosen from those HD-genotyped bulls that did not have HD-genotyped sons. In the test data, the genotypes of HD markers not on the 54 K map were deleted and then imputed. The error rate was calculated as the number of wrongly imputed alleles divided by the total number of imputed alleles.
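The allele error rate can be computed as below. A minimal Python sketch, assuming unphased genotypes coded as A2-allele counts; under that coding, |true − imputed| equals the number of wrongly imputed alleles in a genotype (an assumption about how alleles were counted). The function name is hypothetical.

```python
import numpy as np


def imputation_allele_error_rate(true_genotypes, imputed_genotypes, masked):
    """Share of wrongly imputed alleles among all imputed alleles.

    Rows = test bulls, columns = HD markers, values = A2-allele counts (0/1/2).
    masked: boolean flag per HD marker, True if it was deleted (not on 54 K) and imputed.
    """
    masked = np.asarray(masked, dtype=bool)
    t = np.asarray(true_genotypes)[:, masked]
    i = np.asarray(imputed_genotypes)[:, masked]
    wrong_alleles = np.abs(t - i).sum()  # e.g. true 0 imputed as 2 -> 2 wrong alleles
    total_alleles = 2 * t.size           # 2 alleles per imputed genotype
    return wrong_alleles / total_alleles


true_g = np.array([[0, 1, 2], [2, 2, 1]])
imputed_g = np.array([[0, 2, 2], [0, 2, 1]])
print(imputation_allele_error_rate(true_g, imputed_g, [True, True, False]))
```

Only the masked (imputed) loci enter the denominator, so loci shared by both panels, which are observed rather than imputed, do not dilute the error rate.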
      In the validation of genomic predictions, the whole data set in each breed was divided into reference (training) data and test data by the cut-off date (birth date of bulls) on October 1, 2001. The number of bulls in the reference and test data and the average reliability of DRP for each trait are shown in Table 2. The numbers of bulls were somewhat different among the traits. The main reason was that some bulls did not have EBV for one or more traits due to the restriction that the published EBV (from which DRP were derived) for protein should have a reliability of at least 0.60, and for fertility and udder health of at least 0.35.
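      Table 2. Heritability ($h^2$) of the traits, number of bulls ($n$), and reliability of deregressed proofs ($r^2_{\mathrm{DRP}}$) in reference and test data sets.

      | Breed | Trait | $h^2$ | Reference $n$ | Reference $r^2_{\mathrm{DRP}}$ | Test $n$ | Test $r^2_{\mathrm{DRP}}$ |
      |---|---|---|---|---|---|---|
      | Holstein | Protein | 0.39 | 3,003 | 0.940 | 1,395 | 0.924 |
      | Holstein | Fertility | 0.04 | 3,037 | 0.682 | 1,378 | 0.607 |
      | Holstein | Udder health | 0.04 | 3,005 | 0.823 | 1,461 | 0.749 |
      | Red Dairy Cattle | Protein | 0.39 | 3,421 | 0.947 | 924 | 0.917 |
      | Red Dairy Cattle | Fertility | 0.04 | 3,377 | 0.786 | 941 | 0.671 |
      | Red Dairy Cattle | Udder health | 0.04 | 3,421 | 0.905 | 979 | 0.797 |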
      Genomic predictions using different marker data sets and different models were evaluated by comparing DGV and DRP for animals in the test data. Reliability of DGV was measured as the squared correlation between DGV and DRP divided by the reliability of DRP (Lund et al., "A common reference population from four European Holstein populations increases reliability of genomic predictions"; Su et al., "Genomic prediction for Nordic Red Cattle using one-step and selection index blending"). Unbiasedness of genomic prediction was assessed by regression of DRP on DGV. Given unbiased predictions, it is expected that the covariance
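      $$\mathrm{Cov}(\mathrm{DGV}, \mathrm{DRP}) = \mathrm{Cov}(\mathrm{DGV}, \mathrm{DGV} + \varepsilon + e) = \sigma^2_{\mathrm{DGV}},$$

      where $\varepsilon$ is the prediction error of DGV and $e$ is the residual of DRP; thus, the regression coefficient
      $$b_{\mathrm{DRP}/\mathrm{DGV}} = \mathrm{Cov}(\mathrm{DGV}, \mathrm{DRP})/\sigma^2_{\mathrm{DGV}} = 1.$$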
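Both validation statistics can be computed from test-bull vectors as follows. A minimal Python sketch, assuming the squared correlation is divided by the mean DRP reliability of the test bulls (the per-trait averages reported in Table 2); the function name is hypothetical.

```python
import numpy as np


def validation_stats(dgv, drp, rel_drp):
    """Reliability of DGV and regression coefficient of DRP on DGV for test bulls."""
    dgv = np.asarray(dgv, dtype=float)
    drp = np.asarray(drp, dtype=float)
    r = np.corrcoef(dgv, drp)[0, 1]
    reliability = r * r / np.mean(rel_drp)  # squared correlation corrected for DRP reliability
    # b = Cov(DGV, DRP) / Var(DGV); both with ddof=1
    b = np.cov(dgv, drp)[0, 1] / np.var(dgv, ddof=1)
    return reliability, b


rel, b = validation_stats([0.0, 1.0, 2.0, 3.0], [0.5, 2.5, 4.5, 6.5], [0.8] * 4)
print(rel, b)
```

A coefficient below 1, as found for most traits in this study, means that DRP vary less than DGV across test bulls, that is, the DGV are somewhat inflated.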


      Results

      LD Between Markers and Imputation Error Rate

      Based on the SNP marker data before editing, the ratio of the number of markers in the HD marker data to the number in the 54 K marker data was about 13.5:1 (Table 1). Correspondingly, the average distance between adjacent markers was about 4.5 kb in the HD data and 60 kb in the 54 K data. The density of the HD panel therefore exceeds the requirement (marker pairs < 10 kb apart) for a persistent LD phase between Bos taurus breeds (Gautier et al., "Genetic and haplotypic structure in 14 European and African cattle breeds"; de Roos et al., "Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle"). Average pair-wise LD ($r^2$) between adjacent markers in the HD marker data was 2.7 times as high as in the 54 K data for Holstein and 3.0 times for RDC. Linkage disequilibrium was higher for Holstein than for RDC, regardless of marker data set. After editing, the ratio of the number of markers in the HD marker data to the number in the 54 K marker data decreased to 11.3:1, because many markers in complete LD with adjacent markers in the HD data were deleted.
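The LD ratios quoted above can be recomputed from Table 1. This short Python sketch uses the unedited marker counts and LD values exactly as tabulated; it reproduces the LD ratios of 2.7 (Holstein) and 3.0 (RDC) and gives per-breed marker-count ratios of about 13.8 and 13.6. The dictionary and function names are hypothetical.

```python
# Values copied from Table 1 (markers before editing, average adjacent-marker r^2)
table1 = {
    "Holstein": {"n_raw_54k": 46_973, "n_raw_777k": 648_219,
                 "ld_54k": 0.209, "ld_777k": 0.557},
    "Red Dairy Cattle": {"n_raw_54k": 49_657, "n_raw_777k": 673_295,
                         "ld_54k": 0.180, "ld_777k": 0.533},
}


def panel_ratios(row):
    """(marker-count ratio, LD ratio) of the 777 K panel relative to the 54 K panel."""
    return row["n_raw_777k"] / row["n_raw_54k"], row["ld_777k"] / row["ld_54k"]


for breed, row in table1.items():
    count_ratio, ld_ratio = panel_ratios(row)
    print(f"{breed}: {count_ratio:.1f} times as many markers, {ld_ratio:.1f} times the r^2")
```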
      As shown in Table 3, the allele error rate of imputation from the 54 K to the HD markers was 0.77% for Holstein, and 0.96% for RDC. In addition, we observed variation in error rates among the 3 RDC populations: Danish Red had a higher error rate (1.75%) than Finnish Ayrshire (0.54%) and Swedish Red (0.59%), although the number of reference bulls was almost the same in each of the 3 RDC populations. The results indicated that imputation from the 54 K to the HD markers was quite accurate.
      Table 3. Number of bulls in the imputation reference ($n_{\mathrm{ref}}$) and test data ($n_{\mathrm{test}}$) and allele error rate of imputation from 54 K (∼54,000 markers) to 777 K (∼777,000 markers) data.

      | Breed | $n_{\mathrm{ref}}$ | $n_{\mathrm{test}}$ | Error rate (%) |
      |---|---|---|---|
      | Holstein | 457 | 100 | 0.77 |
      | Red Dairy Cattle | 556 | 150 | 0.96 |

      Estimates of Additive Genetic Variances and SNP-Effect Variances

      Table 4 presents the estimated additive genetic variances using the GBLUP model and SNP-effect variances from the Bayesian mixture model. These variances were estimated based on the DRP derived from the EBV for which a Nordic standardization procedure (http://www.landbrugsinfo.dk/Kvaeg/Avl/Sider/principles.pdf) was applied. Therefore, the scales of these variances were different from the original scales of the traits. The additive genetic variances estimated using 54 K and 777 K marker data were similar in both breeds.
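      Table 4. Estimates of additive genetic variances ($\sigma_g^2$) from the genomic BLUP (GBLUP) model and SNP variances ($\sigma_{v0}^2$ and $\sigma_{v1}^2$; ×10,000) from the Bayesian mixture model¹

      | Breed | Trait | GBLUP 54 K $\sigma_g^2$ | GBLUP 54 Kimp $\sigma_g^2$ | GBLUP 777 K $\sigma_g^2$ | Mixture 54 K $\sigma_{v0}^2$ | Mixture 54 K $\sigma_{v1}^2$ | Mixture 54 Kimp $\sigma_{v0}^2$ | Mixture 54 Kimp $\sigma_{v1}^2$ | Mixture 777 K $\sigma_{v0}^2$ | Mixture 777 K $\sigma_{v1}^2$ |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Holstein | Protein | 129.9 | 129.4 | 131.0 | 2.634 | 140.4 | 1.977 | 138.0 | 0.159 | 123.7 |
      | Holstein | Fertility | 142.2 | 138.8 | 140.8 | 5.768 | 143.8 | 3.607 | 143.5 | 0.252 | 129.2 |
      | Holstein | Udder | 93.2 | 93.2 | 93.4 | 1.505 | 103.2 | 0.962 | 102.8 | 0.109 | 88.4 |
      | Red Dairy Cattle | Protein | 99.7 | 95.9 | 97.9 | 3.600 | 94.5 | 3.181 | 85.8 | 0.149 | 81.8 |
      | Red Dairy Cattle | Fertility | 132.8 | 131.3 | 132.0 | 4.383 | 127.9 | 3.808 | 119.6 | 0.216 | 110.3 |
      | Red Dairy Cattle | Udder | 105.2 | 104.2 | 106.8 | 3.658 | 99.6 | 2.625 | 96.5 | 0.149 | 90.3 |

      ¹ 54 K = ∼54,000 markers; 777 K = ∼777,000 markers; imp = imputed.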
      The SNP-effect variances $\sigma_{v0}^2$ and $\sigma_{v1}^2$ depended on the number of markers ($m$): the larger the number of markers, the smaller the variance. The posterior proportions of SNP in the 2 distributions were similar to the priors. Given the estimated variances in Table 4 and the corresponding priors ($\pi = 0.20$ for 54 K and $\pi = 0.02$ for 777 K), the value of $m[\pi\sigma_{v1}^2 + (1-\pi)\sigma_{v0}^2]$ was similar to the additive genetic variance estimated from the GBLUP model. Across the traits, 89 to 97% of the additive genetic variance was accounted for by 20% of the markers in the 54 K data or by 2% of the markers in the 777 K data.
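The check that $m[\pi\sigma_{v1}^2 + (1-\pi)\sigma_{v0}^2]$ matches the GBLUP variance, and the share carried by the large-effect distribution, can be reproduced for one trait. A minimal Python sketch using the Holstein protein values from Table 1 (unimputed 54 K and 777 K marker counts after editing) and Table 4; the Table 4 SNP variances are divided by 10,000 to undo the table scaling. The function name is hypothetical.

```python
def mixture_variance_split(m, pi, var0_x1e4, var1_x1e4):
    """Implied genetic variance m*[pi*var1 + (1-pi)*var0], large-effect share,
    and expected number of large-effect SNP (m*pi)."""
    var0, var1 = var0_x1e4 / 1e4, var1_x1e4 / 1e4  # undo the x10,000 scaling of Table 4
    large = m * pi * var1        # variance carried by the large-effect distribution
    small = m * (1 - pi) * var0  # variance carried by the small-effect distribution
    total = large + small
    return total, large / total, m * pi


# Holstein protein: (panel, m after editing, pi, sigma_v0^2, sigma_v1^2, GBLUP sigma_g^2)
for label, m, pi, v0, v1, gblup_var in [
    ("54 K", 43_413, 0.20, 2.634, 140.4, 129.9),
    ("777 K", 492_057, 0.02, 0.159, 123.7, 131.0),
]:
    total, share, n_large = mixture_variance_split(m, pi, v0, v1)
    print(f"{label}: implied {total:.1f} vs GBLUP {gblup_var}, "
          f"large-effect share {share:.0%}, ~{n_large:.0f} large-effect SNP")
```

For Holstein protein this gives about 131.1 (54 K) and 129.4 (777 K), close to the GBLUP estimates of 129.9 and 131.0, with about 93 and 94% of the variance in the large-effect distribution. The expected numbers of large-effect SNP (about 8,700 and 9,800) also show why $\pi = 0.20$ and $\pi = 0.02$ are comparable priors.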

      Genomic Prediction in Nordic Holstein

      Reliabilities of genomic predictions for Holstein based on the 54 K and HD markers using the 2 alternative models are shown in Table 5. The use of HD markers led to a small increase in reliability of DGV for protein and fertility, but not for udder health. On average, reliability of DGV based on the HD markers was 0.5% higher than that based on the 54 K markers. We observed that the Bayesian mixture model was superior to the GBLUP model, regardless of which marker data set was used. On average, the increase of reliability using the Bayesian mixture model was 0.5%. On the other hand, imputation of missing genotypes in the 54 K data did not yield any improvement of reliability of DGV.
      Table 5. Reliability of direct genomic values using genomic BLUP (GBLUP) and Bayesian mixture based on 54 K (∼54,000 markers) and 777 K (∼777,000 markers) data, for Holstein bulls in test data¹

      | Trait | GBLUP 54 K | GBLUP 54 Kimp | GBLUP 777 K | Bayesian mixture 54 K (π = 0.2) | Bayesian mixture 54 Kimp (π = 0.2) | Bayesian mixture 777 K (π = 0.02) |
      |---|---|---|---|---|---|---|
      | Protein | 0.425 | 0.426 | 0.429 | 0.435 | 0.434 | 0.440 |
      | Fertility | 0.404 | 0.403 | 0.413 | 0.406 | 0.406 | 0.416 |
      | Udder health | 0.370 | 0.372 | 0.370 | 0.375 | 0.376 | 0.376 |
      | Average | 0.400 | 0.400 | 0.404 | 0.405 | 0.405 | 0.410 |

      ¹ Imp = imputed; π = proportion of SNP having large effects.
      A necessary condition for unbiased genomic prediction is that the regression coefficient of DRP on genomic prediction is 1. As shown in Table 6, using HD markers led to less biased DGV for protein and fertility but not for udder health. Compared with the GBLUP model, the Bayesian model did not reduce bias of DGV. Imputing missing genotypes in the 54 K data slightly increased bias compared with the unimputed 54 K data.
      Table 6. Regression of deregressed proofs on direct genomic values using genomic BLUP (GBLUP) and Bayesian mixture based on 54 K (∼54,000 markers) and 777 K (∼777,000 markers) data, for Holstein bulls in test data¹

      | Trait | GBLUP 54 K | GBLUP 54 Kimp | GBLUP 777 K | Bayesian mixture 54 K (π = 0.2) | Bayesian mixture 54 Kimp (π = 0.2) | Bayesian mixture 777 K (π = 0.02) |
      |---|---|---|---|---|---|---|
      | Protein | 0.853 | 0.847 | 0.863 | 0.855 | 0.845 | 0.862 |
      | Fertility | 0.972 | 0.963 | 0.994 | 0.968 | 0.958 | 0.996 |
      | Udder health | 0.952 | 0.933 | 0.946 | 0.948 | 0.927 | 0.946 |
      | Average | 0.926 | 0.914 | 0.934 | 0.924 | 0.910 | 0.935 |

      ¹ Imp = imputed; π = proportion of SNP having large effects.

      Genomic Prediction in Nordic RDC

      The influences of models and marker data sets on reliability of DGV in RDC (Table 7) were somewhat different from those in Holstein. Imputing missing genotypes in the 54 K data improved reliability of DGV for protein, but not for the other 2 traits. The Bayesian mixture model gave very similar reliability as GBLUP, based on the 54 K markers, and was slightly better than GBLUP based on the HD markers. Applying the GBLUP model, reliability of DGV using the HD markers was on average 1.0% higher than using the unimputed 54 K markers, and 0.7% higher than using the imputed 54 K markers. When applying the Bayesian mixture model, the increase in reliability using the HD markers was 1.20 and 0.80%, respectively, compared with the unimputed 54 K and the imputed 54 K markers.
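The percentage-point gains quoted above follow from the trait averages in Table 7. In this Python sketch, averages are rounded to 3 decimals, as in the table's Average row, before differencing; the dictionary and function names are hypothetical.

```python
# Reliabilities for Red Dairy Cattle test bulls, copied from Table 7
# (order: protein, fertility, udder health)
table7 = {
    "GBLUP 54 K": [0.346, 0.297, 0.244],
    "GBLUP 54 K imp": [0.358, 0.293, 0.246],
    "GBLUP 777 K": [0.358, 0.304, 0.257],
    "Bayes 54 K": [0.346, 0.299, 0.243],
    "Bayes 54 K imp": [0.357, 0.296, 0.248],
    "Bayes 777 K": [0.359, 0.307, 0.259],
}


def average(values):
    """Trait average rounded to 3 decimals, matching the table's Average row."""
    return round(sum(values) / len(values), 3)


def gain_points(table, better, baseline):
    """Difference in trait-averaged reliability, in percentage points."""
    return 100 * (average(table[better]) - average(table[baseline]))


for model in ("GBLUP", "Bayes"):
    for base in ("54 K", "54 K imp"):
        gain = gain_points(table7, f"{model} 777 K", f"{model} {base}")
        print(f"{model}: 777 K vs {base}: {gain:+.1f} points")
```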
      Table 7. Reliability of direct genomic values using genomic BLUP (GBLUP) and Bayesian mixture based on 54 K (∼54,000 markers) and 777 K (∼777,000 markers) marker data, for Red Dairy Cattle bulls in test data¹

      | Trait | GBLUP 54 K | GBLUP 54 Kimp | GBLUP 777 K | Bayesian mixture 54 K (π = 0.2) | Bayesian mixture 54 Kimp (π = 0.2) | Bayesian mixture 777 K (π = 0.02) |
      |---|---|---|---|---|---|---|
      | Protein | 0.346 | 0.358 | 0.358 | 0.346 | 0.357 | 0.359 |
      | Fertility | 0.297 | 0.293 | 0.304 | 0.299 | 0.296 | 0.307 |
      | Udder health | 0.244 | 0.246 | 0.257 | 0.243 | 0.248 | 0.259 |
      | Average | 0.296 | 0.299 | 0.306 | 0.296 | 0.300 | 0.308 |

      ¹ Imp = imputed; π = proportion of SNP having large effects.
      The regression coefficients of DRP on DGV (Table 8) were closer to 1 when DGV were predicted based on the HD markers, indicating a reduction of bias using HD markers. As in Holstein, in RDC the Bayesian mixture model did not reduce bias of DGV, regardless of the marker data set used. In contrast to Holstein, imputing missing genotypes in the 54 K data reduced bias of DGV, mainly for protein.
      Table 8. Regression of deregressed proofs on direct genomic values using genomic BLUP (GBLUP) and Bayesian mixture based on 54 K (∼54,000 markers) and 777 K (∼777,000 markers) marker data, for Red Dairy Cattle bulls in test data¹
      Trait | GBLUP 54 K | GBLUP 54 Kimp | GBLUP 777 K | Bayesian mixture 54 K (π = 0.2) | Bayesian mixture 54 Kimp (π = 0.2) | Bayesian mixture 777 K (π = 0.02)
      Protein | 0.849 | 0.875 | 0.877 | 0.835 | 0.864 | 0.877
      Fertility | 0.934 | 0.939 | 0.980 | 0.933 | 0.940 | 0.980
      Udder health | 0.851 | 0.854 | 0.872 | 0.839 | 0.846 | 0.870
      Average | 0.878 | 0.889 | 0.910 | 0.869 | 0.883 | 0.909
      ¹Imp = imputed; π = proportion of SNP having large effects.
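Both validation statistics used for the test bulls can be computed from DRP and DGV alone. The sketch below assumes that "corrected for reliability of DRP" means dividing the squared correlation by the mean reliability of the DRP, which is one common form of this correction and not necessarily the exact one used in the study; the simulated data are invented for illustration.

```python
import numpy as np


def validation_stats(drp, dgv, rel_drp):
    """Validation statistics for test bulls.

    Returns (reliability, slope):
      reliability: squared correlation between DRP and DGV divided by the
        mean reliability of DRP (assumed form of the correction);
      slope: coefficient b from regressing DRP on DGV; b = 1 means no
        inflation, b < 1 means DGV are too widely dispersed.
    """
    drp = np.asarray(drp, dtype=float)
    dgv = np.asarray(dgv, dtype=float)
    rel_drp = np.asarray(rel_drp, dtype=float)
    r = np.corrcoef(drp, dgv)[0, 1]
    reliability = r ** 2 / rel_drp.mean()
    # Slope of DRP on DGV: cov(DRP, DGV) / var(DGV), both with ddof = 1.
    slope = np.cov(drp, dgv)[0, 1] / np.var(dgv, ddof=1)
    return float(reliability), float(slope)


# Toy usage with simulated values (not the study's data).
rng = np.random.default_rng(42)
dgv = rng.normal(size=1000)
drp = 0.9 * dgv + rng.normal(scale=1.0, size=1000)
rel = np.full(1000, 0.8)
print(validation_stats(drp, dgv, rel))
```

A slope closer to 1, as seen for the HD markers in Table 8, means the spread of DGV matches the spread of DRP more closely.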

      Discussion

      This study investigated the advantage of using HD markers for genomic prediction. Based on the present data and models, going from 54 K to HD markers increased reliability of DGV on average by 0.5% for Holstein and 1.0% for RDC. In addition, genomic predictions were less biased when based on HD markers. These results are consistent with simulation studies assuming a large number of genes affecting the trait. VanRaden et al. (2011) reported that increasing the number of markers from 54,000 to 500,000 yielded a gain in reliability of 1.6% in their simulation study, and gains of 0.9 and 1.2% using 2 sets of imputed HD markers. Harris and Johnson (2010), studying the impact of high-density SNP chips on genomic evaluation in dairy cattle, showed very little gain in a simulation study when the number of markers was increased from 20,000 to 1,000,000.
      The Nordic RDC in this study included the Finnish Ayrshire, Swedish Red, and Danish Red populations. The gain in reliability of genomic prediction from HD markers was larger in RDC than in Holstein. This supports the hypothesis that HD markers are more beneficial for genomic prediction across populations than within populations (Toosi et al., 2010). Previous studies on LD and persistence of LD phase (Gautier et al., 2007; de Roos et al., 2008; Villa-Angulo et al., 2009) suggested that genomic selection across populations and breeds requires a higher marker density than genomic selection within a population or breed. With marker density increasing from 54 K to 777 K, the relative increase in LD (calculated as LD777K/LD54K) was larger for RDC than for Holstein (Table 1). This may explain why RDC obtained a relatively larger gain from HD markers than Holstein.
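The relative LD increase can be expressed as the ratio of mean r² between adjacent markers in the dense and the sparse panel. The sketch below assumes markers sorted by map position on one chromosome and uses the squared correlation of genotype counts, an approximation to haplotype-based r² (the study's own LD measure is defined in its Methods). The simulated data and the helper `simulate_genotypes` are hypothetical toy constructs and do not reproduce Table 1.

```python
import numpy as np


def mean_adjacent_r2(genotypes):
    """Mean r^2 between adjacent markers.

    genotypes: (n_animals, n_markers) allele counts 0/1/2, columns sorted by
    map position. r^2 is the squared correlation of genotype counts, an
    approximation to haplotype r^2 when phase is unknown.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                 # center each marker
    a, b = x[:, :-1], x[:, 1:]             # marker j paired with marker j + 1
    num = (a * b).sum(axis=0)
    den = np.sqrt((a * a).sum(axis=0) * (b * b).sum(axis=0))
    ok = den > 0                           # skip pairs involving monomorphic markers
    return float(np.mean((num[ok] / den[ok]) ** 2))


def simulate_genotypes(n_animals, n_markers, p_switch, rng):
    """Hypothetical toy data: each haplotype is a 0/1 chain whose allele flips
    between neighbouring markers with probability p_switch, so LD decays
    with distance."""
    n_hap = 2 * n_animals
    hap = np.empty((n_hap, n_markers), dtype=int)
    hap[:, 0] = rng.integers(0, 2, size=n_hap)
    flips = (rng.random((n_hap, n_markers - 1)) < p_switch).astype(int)
    for j in range(1, n_markers):
        hap[:, j] = hap[:, j - 1] ^ flips[:, j - 1]
    return hap[0::2] + hap[1::2]           # genotype = sum of two haplotypes


rng = np.random.default_rng(7)
dense = simulate_genotypes(1000, 2000, 0.02, rng)   # stands in for an HD panel
sparse = dense[:, ::10]                              # every 10th marker
ratio = mean_adjacent_r2(dense) / mean_adjacent_r2(sparse)
print(f"LD_dense / LD_sparse = {ratio:.2f}")
```

In this toy chain the adjacent-marker correlation is 1 − 2·p_switch, so thinning to every 10th marker lowers r² sharply and the ratio is well above 1.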
      The number of markers in the HD data set after editing was 11 times the number in the 54 K data set. Average pair-wise LD between adjacent markers in the HD data set was 3 times as high as in the 54 K data set for RDC and 2.7 times as high for Holstein. Assuming that the same pattern applies to LD between markers and QTL, this suggests much stronger LD between HD markers and the genes affecting the trait of interest. Therefore, HD markers were expected to lead to much better genomic predictions. However, the current study shows that the gain from the increased marker density was small. There are several possible reasons for this. First, the advantage of higher LD with HD markers might be counteracted by the larger number of unknown parameters to be estimated. In the present study, to reduce the number of unknown parameters, markers in complete LD with other markers in the data were considered noninformative and were deleted. It may be necessary to reduce the number of markers further by also deleting markers that are nearly noninformative. Second, the models used in this study may not be optimal. The current results show that the Bayesian mixture model with 2 normal distributions had a small advantage over the GBLUP model in the Holstein data. More sophisticated variable selection methods and models would be beneficial for exploiting the potential advantage of HD markers for genomic prediction: for example, mixture models with more than 2 distributions, models using preselected and well-constructed haplotypes or SNP blocks, and models with appropriate weights for different haplotypes or SNP blocks. Third, for most of the bulls, the HD genotypes were not obtained with the HD chip but were imputed. Previous studies on imputation from 3,000 to 54,000 markers have reported that a small imputation allele error rate leads to a substantial loss of prediction reliability, even when only validation animals are imputed and reference animals have real 54 K genotypes. Averaged over the results from French, Nordic, and German validations (Chen et al., 2011; Dassonneville et al., 2011), each 1% of imputation allele error rate resulted in a loss of reliability of 1.3 percentage points. It should also be noted that this study analyzed only 3 traits. The benefits from HD markers may be larger for some traits, such as traits affected by fewer genes.
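Deleting markers in complete LD can be sketched as a scan along the map that drops a marker whose genotype vector is perfectly correlated (|r| = 1) with an already kept nearby marker. This is a minimal sketch, not the study's editing procedure: the genotype-based check, the `window` size, and the choice to drop monomorphic markers (where r is undefined) are assumptions made here.

```python
import numpy as np


def drop_complete_ld(genotypes, window=50, tol=1e-9):
    """Indices of markers kept after deleting markers in complete LD.

    A marker is deleted when its genotype counts are perfectly correlated
    (|r| = 1) with an already kept marker at most `window` positions earlier.
    Assumptions: columns sorted by map position; `window` is a hypothetical
    choice; monomorphic markers (r undefined) are also dropped.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)
    norms = np.sqrt((x * x).sum(axis=0))
    keep = []
    for j in range(x.shape[1]):
        if norms[j] == 0.0:
            continue
        redundant = False
        for k in reversed(keep):           # nearest kept markers first
            if j - k > window:
                break
            r = float(x[:, j] @ x[:, k]) / (norms[j] * norms[k])
            if abs(abs(r) - 1.0) < tol:
                redundant = True
                break
        if not redundant:
            keep.append(j)
    return keep


g = np.array([[0, 2, 1, 0, 0],
              [1, 1, 1, 1, 2],
              [2, 0, 1, 2, 2]])
print(drop_complete_ld(g))  # → [0, 4]
```

Markers 1 and 3 duplicate marker 0 (with reversed and identical coding), and marker 2 is monomorphic. Deleting "nearly noninformative" markers would correspond to loosening the test, for example to r² above a chosen threshold, which is again a hypothetical choice.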
      Although the reference populations in RDC and Holstein were of similar size, RDC had lower reliabilities of DGV than Holstein. Average pair-wise LD between adjacent markers was higher in Holstein than in RDC. This indicates that genetic similarity between individuals is higher in the Holstein population than in the RDC population, which in turn leads to higher reliability of genomic predictions in Holstein. A previous study (Goddard, 2009) has shown that reliability of genomic prediction depends on the effective population size. Further study is needed on the effective population sizes of the current Nordic Holstein and RDC populations.
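The direction of the dependence on effective population size (Ne) can be illustrated with a deterministic approximation. The sketch assumes the number of independent chromosome segments Me ≈ 2NeL/ln(4NeL), with L the genome length in Morgans (the form usually attributed to Goddard, 2009), combined with the widely used formula r² = Nh²/(Nh² + Me). The combination and all inputs are assumptions for illustration and are not estimates for the Nordic populations.

```python
import math


def n_independent_segments(ne: float, genome_morgans: float) -> float:
    """Me = 2*Ne*L / ln(4*Ne*L), with L = genome length in Morgans."""
    x = ne * genome_morgans
    return 2.0 * x / math.log(4.0 * x)


def expected_reliability(n_ref: int, h2: float, ne: float, genome_morgans: float) -> float:
    """Assumed deterministic approximation r^2 = N*h2 / (N*h2 + Me)."""
    me = n_independent_segments(ne, genome_morgans)
    return n_ref * h2 / (n_ref * h2 + me)


# Hypothetical inputs: 3,500 reference bulls, h2 of the response variable 0.5,
# genome length 30 Morgans; only Ne differs between the two scenarios.
for ne in (100, 200):
    print(ne, round(expected_reliability(3500, 0.5, ne, 30.0), 3))
```

With the reference size fixed, a larger Ne yields more independent segments to estimate and therefore lower expected reliability, consistent with the reasoning in the paragraph above.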
      Several previous studies based on 54 K marker data have reported that linear mixed models assuming that effects of all SNP are normally distributed with equal variances perform as well as variable selection models for most traits in dairy cattle (Hayes et al., 2009; VanRaden et al., 2009). However, for traits with known major genes, such as fat percentage, variable selection models are superior to linear mixed models (Cole et al., 2009; Legarra et al., 2011). In the present study, the Bayesian mixture model yielded 0.5% higher reliability than GBLUP in Holstein, but this advantage was not observed in RDC, regardless of the marker data used. This contradicts the expectation that a variable selection model would have a greater advantage over a GBLUP model with HD marker data than with 54 K marker data. At least 3 reasons could explain this. First, a mixture model with 2 distributions may not adequately describe the actual distribution of SNP effects. Second, the mixture model may be more sensitive to imputation errors than the GBLUP model. Third, the data may not contain enough information to distinguish SNP with large effects efficiently from those with small effects.
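The third reason can be made concrete with the step a Gibbs sampler typically uses to assign each SNP to a mixture component. Under a prior β ~ π·N(0, σ²_large) + (1 − π)·N(0, σ²_small), the conditional probability that an effect belongs to the large component is proportional to π times the large-component density. This is a minimal sketch assuming zero-mean components with fixed, known standard deviations (hypothetical values below); it does not reproduce the study's sampler or how its variances were estimated.

```python
import math
import random


def prob_large(beta, pi, sd_large, sd_small):
    """P(indicator = large | beta) for the prior
    beta ~ pi*N(0, sd_large^2) + (1 - pi)*N(0, sd_small^2).
    Computed on the log scale to avoid underflow."""
    log_l = math.log(pi) - math.log(sd_large) - 0.5 * (beta / sd_large) ** 2
    log_s = math.log(1.0 - pi) - math.log(sd_small) - 0.5 * (beta / sd_small) ** 2
    d = log_s - log_l
    if d > 0:                              # numerically stable logistic
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def draw_indicator(beta, pi, sd_large, sd_small, rng):
    """One Gibbs-style draw: 1 = large-effect component, 0 = small."""
    return 1 if rng.random() < prob_large(beta, pi, sd_large, sd_small) else 0


rng = random.Random(2012)
# Hypothetical component SDs; pi = 0.02 is the value used for the 777 K data.
for beta in (0.0, 0.05, 0.2, 0.5):
    print(beta, round(prob_large(beta, 0.02, 0.3, 0.05), 3),
          draw_indicator(beta, 0.02, 0.3, 0.05, rng))
```

For effects of intermediate size the probability stays between 0 and 1, so assignments change from iteration to iteration; when the data are not informative enough, SNP are not cleanly separated into large- and small-effect groups.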
      With the GBLUP model, the number of mixed model equations is determined by the number of individuals, not by the number of markers. Therefore, the computational demand is almost the same for the 54 K and HD data. With the Bayesian mixture model, the number of equations is determined by the number of markers, so computing time increases with the number of markers. For the analysis of Holstein data on our computing system (Intel Xeon 2.93 GHz processor), the GBLUP model took less than 10 min per trait, given the inverted G matrix. Building and inverting the G matrix took about 6 min with the 54 K marker data and about 50 min with the HD data. The Bayesian mixture model with Gibbs sampling (50,000 samples in total) took about 10 h with the 54 K data and about 120 h with the HD data.
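Why GBLUP cost hardly depends on marker number can be seen in a small sketch: markers enter only when G is built, and the equations then have one row per animal plus one for the mean. The sketch assumes VanRaden's (2008) first method for G, an unweighted model y = 1μ + g + e with a known variance ratio λ = σ²e/σ²g, and a small ridge on the diagonal of G (a hypothetical value) so that G can be inverted. The study's own construction of G and its treatment of DRP are not reproduced here.

```python
import numpy as np


def vanraden_g(m):
    """Genomic relationship matrix G = ZZ' / (2 * sum p(1 - p)).

    m: (n_animals, n_markers) allele counts 0/1/2 without missing values.
    Cost grows with the number of markers, but G is always n x n.
    """
    m = np.asarray(m, dtype=float)
    p = m.mean(axis=0) / 2.0              # allele frequencies
    z = m - 2.0 * p                       # centered genotypes
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(y, g, lam, ridge=1e-3):
    """Solve the mixed model equations for y = 1*mu + g + e.

    lam = sigma2_e / sigma2_g. There is one equation per animal (plus the
    mean), independent of the number of markers. The ridge added to the
    diagonal of G is a hypothetical value that makes G invertible.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    g_inv = np.linalg.inv(g + ridge * np.eye(n))
    ones = np.ones((n, 1))
    lhs = np.block([[ones.T @ ones, ones.T],
                    [ones, np.eye(n) + lam * g_inv]])
    rhs = np.concatenate(([y.sum()], y))
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]                # (mu, DGV)


rng = np.random.default_rng(3)
m = rng.integers(0, 3, size=(50, 2000))   # toy genotypes, 50 animals
y = rng.normal(size=50)                   # toy response (e.g. DRP)
g = vanraden_g(m)
mu, dgv = gblup(y, g, lam=1.0)
print(g.shape, dgv.shape)  # → (50, 50) (50,)
```

Using 500 instead of 2,000 toy markers changes only the cost of `vanraden_g`; the system solved in `gblup` remains 51 × 51.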
      Imputation of missing genotypes in 54 K marker data is expected to improve genomic predictions. However, the imputation procedure used in this study to infer missing genotypes in the 54 K data did not improve genomic predictions, except for protein in RDC. In the analysis based on the 54 K data with missing genotypes, missing individual genotypes were replaced with population expectations. Thus, individuals with a missing genotype for a particular marker did not contribute to the estimated effect of that marker, and their DGV did not include the effect of that marker. Replacing missing genotypes with population expectations is itself a simple form of imputation. In the current data, only about 4% of genotypes were missing in the 54 K marker data. With such a small proportion of missing genotypes, the advantage of a sophisticated imputation procedure over simple imputation is likely to be small. This might partly explain why inferring missing marker genotypes in the 54 K data with a sophisticated imputation procedure did not clearly improve genomic prediction compared with replacing them with population expectations.
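Replacing a missing genotype with its population expectation 2p makes the centered genotype exactly 0. That animal then adds nothing to that marker's column in the equations, and the marker's effect drops out of its DGV. A minimal sketch, assuming genotypes coded 0/1/2 with NaN for missing values and centering by 2p (a common convention, assumed here); the marker effects are hypothetical.

```python
import numpy as np


def center_with_expectation(m):
    """Center 0/1/2 genotypes by 2p, replacing missing values (NaN) by the
    population expectation 2p, so that missing entries become 0 in Z."""
    m = np.asarray(m, dtype=float)
    p = np.nanmean(m, axis=0) / 2.0       # allele frequency from observed genotypes
    z = m - 2.0 * p
    z[np.isnan(z)] = 0.0                  # missing genotype = expectation -> 0
    return z, p


m = np.array([[0.0, 2.0],
              [1.0, np.nan],              # animal 1 is missing marker 1
              [2.0, 0.0]])
z, p = center_with_expectation(m)
beta = np.array([0.3, -0.5])              # hypothetical marker effects
dgv = z @ beta
print(z[1], dgv[1])
```

Changing the effect of marker 1 leaves the DGV of animal 1 unchanged, which is the behaviour the paragraph above describes for simple imputation.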
      In conclusion, HD marker data have the potential to increase reliability of genomic predictions. However, based on the current data and models, the gain from HD markers is small. Further studies are needed to exploit the potential advantage of HD markers for genomic prediction.

      Acknowledgments

      We thank the Danish Cattle Federation (Aarhus, Denmark), Faba Co-op (Helsinki, Finland), Swedish Dairy Association (Stockholm, Sweden), and Nordic Cattle Genetic Evaluation (Aarhus, Denmark) for providing data. This work was performed in the project “Genomic Selection—From function to efficient utilization in cattle breeding (grant no. 3405-10-0137),” funded under the Green Development and Demonstration Programme by the Danish Directorate for Food, Fisheries and Agri Business (Copenhagen, Denmark), the Milk Levy Fund (Aarhus, Denmark), VikingGenetics (Randers, Denmark), Nordic Cattle Genetic Evaluation (Aarhus, Denmark), and Aarhus University (Aarhus, Denmark).

      References

        • Aguilar I.
        • Misztal I.
        • Johnson D.L.
        • Legarra A.
        • Tsuruta S.
        • Lawlor T.J.
        Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
        J. Dairy Sci. 2010; 93: 743-752
        • Browning B.L.
        • Browning S.R.
        A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals.
        Am. J. Hum. Genet. 2009; 84: 210-223
        • Chen J.
        • Liu Z.
        • Reinhardt F.
        • Reents R.
        Reliability of genomic prediction using imputed genotypes for German Holsteins: Illumina 3 K to 54 K bovine chip.
        in: The 2011 Interbull Open Meeting, Stavanger, Norway. Interbull, Uppsala, Sweden, 2011
        • Cole J.B.
        • VanRaden P.M.
        • O’Connell J.R.
        • Van Tassell C.P.
        • Sonstegard T.S.
        • Schnabel R.D.
        • Taylor J.F.
        • Wiggans G.R.
        Distribution and location of genetic effects for dairy traits.
        J. Dairy Sci. 2009; 92: 2931-2946
        • Dassonneville R.
        • Brondum R.F.
        • Druet T.
        • Fritz S.
        • Guillaume F.
        • Guldbrandtsen B.
        • Lund M.S.
        • Ducrocq V.
        • Su G.
        Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations.
        J. Dairy Sci. 2011; 94: 3679-3686
        • de Roos A.P.W.
        • Hayes B.J.
        • Goddard M.E.
        Reliability of genomic predictions across multiple populations.
        Genetics. 2009; 183: 1545-1553
        • de Roos A.P.W.
        • Hayes B.J.
        • Spelman R.J.
        • Goddard M.E.
        Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle.
        Genetics. 2008; 179: 1503-1512
        • Forni S.
        • Aguilar I.
        • Misztal I.
        Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
        Genet. Sel. Evol. 2011; 43: 1
        • Gautier M.
        • Faraut T.
        • Moazami-Goudarzi K.
        • Navratil V.
        • Foglio M.
        • Grohs C.
        • Boland A.
        • Garnier J.G.
        • Boichard D.
        • Lathrop G.M.
        • Gut I.G.
        • Eggen A.
        Genetic and haplotypic structure in 14 European and African cattle breeds.
        Genetics. 2007; 177: 1059-1070
        • Gengler N.
        • Mayeres P.
        • Szydlowski M.
        A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian Blue cattle.
        Animal. 2007; 1: 21-28
        • Goddard M.
        Genomic selection: Prediction of accuracy and maximisation of long term response.
        Genetica. 2009; 136: 245-257
        • Habier D.
        • Fernando R.L.
        • Dekkers J.C.M.
        Genomic selection using low-density marker panels.
        Genetics. 2009; 182: 343-353
        • Harris B.L.
        • Johnson D.L.
        Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
        J. Dairy Sci. 2010; 93: 1243-1252
        • Harris B.L.
        • Johnson D.L.
        The impact of high density SNP chips on genomic evaluation in dairy cattle.
        Interbull Bull. 2010; 42: 40-43
        • Hayes B.J.
        • Bowman P.J.
        • Chamberlain A.J.
        • Goddard M.E.
        Invited review: Genomic selection in dairy cattle: Progress and challenges.
        J. Dairy Sci. 2009; 92: 433-443
        • Hayes B.J.
        • Visscher P.M.
        • Goddard M.E.
        Increased accuracy of artificial selection by using the realized relationship matrix.
        Genet. Res. (Camb.). 2009; 91: 47-60
        • Legarra A.
        • Robert-Granie C.
        • Croiseau P.
        • Guillaume F.
        • Fritz S.
        Improved Lasso for genomic selection.
        Genet. Res. (Camb.). 2011; 93: 77-87
        • Liu Z.T.
        • Seefried F.R.
        • Reinhardt F.
        • Rensing S.
        • Thaller G.
        • Reents R.
        Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction.
        Genet. Sel. Evol. 2011; 43: 19
        • Lund M.S.
        • de Roos A.P.W.
        • de Vries A.G.
        • Druet T.
        • Ducrocq V.
        • Fritz S.
        • Guillaume F.
        • Guldbrandtsen B.
        • Liu Z.
        • Reents R.
        • Schrooten C.
        • Seefried F.
        • Su G.
        A common reference population from four European Holstein populations increases reliability of genomic predictions.
        Genet. Sel. Evol. 2011; 43: 43
        • Matukumalli L.K.
        • Lawley C.T.
        • Schnabel R.D.
        • Taylor J.F.
        • Allan M.F.
        • Heaton M.P.
        • O’Connell J.
        • Moore S.S.
        • Smith T.P.L.
        • Sonstegard T.S.
        • Van Tassell C.P.
        Development and characterization of a high density SNP genotyping assay for cattle.
        PLoS ONE. 2009; 4: e5350
        • Matukumalli L.K.
        • Schroeder S.
        • DeNise S.K.
        • Sonstegard T.
        • Lawley C.T.
        • Georges N.
        • Coppieters W.
        • Gietzen K.
        • Medrano J.F.
        • Rincon G.
        • Lince D.
        • Eggen A.
        • Glaser L.
        • Cam G.
        • Van Tassell C.
        Analyzing LD blocks and CNV segments in cattle: Novel genomic features identified using the BovineHD BeadChip. Pub. No. 370-2011-002.
        Illumina Inc., San Diego, CA, 2011
        • Meuwissen T.
        • Goddard M.
        Accurate prediction of genetic values for complex traits by whole-genome resequencing.
        Genetics. 2010; 185: 623-631
        • Meuwissen T.H.E.
        Accuracy of breeding values of ‘unrelated’ individuals predicted by dense SNP genotyping.
        Genet. Sel. Evol. 2009; 41: 35
        • Solberg T.R.
        • Sonesson A.K.
        • Woolliams J.A.
        • Meuwissen T.H.E.
        Genomic selection using different marker types and densities.
        J. Anim. Sci. 2008; 86: 2447-2454
        • Su G.
        • Guldbrandtsen B.
        • Gregersen V.R.
        • Lund M.S.
        Preliminary investigation on reliability of genomic estimated breeding values in the Danish Holstein population.
        J. Dairy Sci. 2010; 93: 1175-1183
        • Su G.
        • Madsen P.
        • Nielsen U.S.
        • Mäntysaari E.A.
        • Aamand G.P.
        • Christensen O.F.
        • Lund M.S.
        Genomic prediction for Nordic Red Cattle using one-step and selection index blending.
        J. Dairy Sci. 2012; 95: 909-917
        • Toosi A.
        • Fernando R.L.
        • Dekkers J.C.M.
        Genomic selection in admixed and crossbred populations.
        J. Anim. Sci. 2010; 88: 32-46
        • VanRaden P.M.
        Efficient methods to compute genomic predictions.
        J. Dairy Sci. 2008; 91: 4414-4423
        • VanRaden P.M.
        • O’Connell J.R.
        • Wiggans G.R.
        • Weigel K.A.
        Genomic evaluations with many more genotypes.
        Genet. Sel. Evol. 2011; 43: 10
        • VanRaden P.M.
        • Sullivan P.G.
        International genomic evaluation methods for dairy cattle.
        Genet. Sel. Evol. 2010; 42: 7
        • VanRaden P.M.
        • Van Tassell C.P.
        • Wiggans G.R.
        • Sonstegard T.S.
        • Schnabel R.D.
        • Taylor J.F.
        • Schenkel F.S.
        Invited review: Reliability of genomic predictions for North American Holstein bulls.
        J. Dairy Sci. 2009; 92: 16-24
        • Villa-Angulo R.
        • Matukumalli L.K.
        • Gill C.A.
        • Choi J.
        • Van Tassell C.P.
        • Grefenstette J.J.
        High-resolution haplotype block structure in the cattle genome.
        BMC Genet. 2009; 10: 19
        • Weigel K.A.
        • de los Campos G.
        • Gonzalez-Recio O.
        • Naya H.
        • Wu X.L.
        • Long N.
        • Rosa G.J.M.
        • Gianola D.
        Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers.
        J. Dairy Sci. 2009; 92: 5248-5257