Research Article| Volume 96, ISSUE 8, P5364-5375, August 2013

The estimation of genomic relationships using breedwise allele frequencies among animals in multibreed populations

Open Archive. Published: June 17, 2013. DOI: https://doi.org/10.3168/jds.2012-6523

      Abstract

      Different approaches to calculating genomic measures of relationship were explored and compared with pedigree relationships (A) within and across base breeds in a crossbred population, using genotypes for 38,194 loci of 4,106 Nordic Red dairy cattle. Four genomic relationship matrices (G) were calculated using either observed allele frequencies (AF) across breeds or within-breed AF. The G matrices were compared separately when the AF were estimated in the observed and in the base population. Breedwise AF in the current and base population were estimated using linear regression models of individual genotypes on breed composition. Different G matrices were further used to predict direct estimated genomic values using a genomic BLUP model. Higher variability existed in the diagonal elements of G across breeds (standard deviation = 0.06, on average) compared with A (0.01). The use of simple observed AF across base breeds to compute G increased coefficients for individuals in distantly related populations. Estimated breedwise AF reduced differences in coefficients similarly within and across populations. The variability of the current adjusted G matrix decreased from 0.055 to 0.035 when breedwise AF were estimated from the base breed population. The direct estimated genomic values and their validation reliabilities were, however, unaffected by AF used to compute G when estimated with a genomic BLUP model, due to inclusion of breed means in the model. In multibreed populations, G adjusted with breedwise AF from the founder population may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.

      Introduction

      The use of marker genotypes to estimate relationships among individuals in a population has become increasingly important in many fields of genetics. In livestock breeding, knowledge of relationships is used routinely to estimate genetic variation and animal breeding values (EBV;
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ;
      • Hayes B.J.
      • Visscher P.M.
      • Goddard M.E.
      Increased accuracy of artificial selection by using the realized relationship matrix.
      ;
      • Su G.
      • Madsen P.
      • Nielsen U.S.
      • Mäntysaari E.A.
      • Aamand G.P.
      • Christensen O.F.
      • Lund M.S.
      Genomic prediction for Nordic Red cattle using one-step and selection index blending.
      ), monitor inbreeding (
      • Fernández J.
      • Meuwissen T.H.E.
      • Toro M.A.
      • Mäki-Tanila A.
      Management of genetic diversity in small farm animal populations.
      ;
      • Toro M.A.
      • Meuwissen T.H.E.
      • Fernández J.
      • Shaat I.
      • Mäki-Tanila A.
      Assessing the genetic diversity in small farm animal populations.
      ), and for conservation of animal genetic resources (
      • Eding H.
      • Meuwissen T.H.E.
      Marker-based estimates of between and within population kinships for the conservation of genetic diversity.
      ). Traditionally, relationship coefficients are calculated from the pedigree data. Pedigree relationships are obtained as 2 times the expected average identity by descent (IBD) sharing between 2 relatives (
      • Malécot G.
      Les Mathématiques de L’hérédité.
      ) and have been applied successfully within the framework of mixed-model equations for best linear unbiased prediction of EBV. Presently, with the increasing availability of genetic markers covering the whole genome, pedigree-based relationships can be replaced or combined with realized relationships calculated from marker data in the prediction of genomic breeding values (genomic EBV;
      • Habier D.
      • Fernando R.L.
      • Dekkers J.C.M.
      The impact of genetic relationship information on genome-assisted breeding values.
      ;
      • Hayes B.J.
      • Visscher P.M.
      • Goddard M.E.
      Increased accuracy of artificial selection by using the realized relationship matrix.
      ).
      Realized relationships derived from molecular markers are based on the actual IBD sharing or identity by state for genomic regions and, therefore, have more variation between closely related animals than pedigree relationships (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ;
      • Hayes B.J.
      • Visscher P.M.
      • Goddard M.E.
      Increased accuracy of artificial selection by using the realized relationship matrix.
      ). Moreover, realized relationships capture unrecorded pedigrees. Several different methods of calculating genomic relationship matrices (G) have been developed, for genotyped animals only (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ;
      • Yang J.
      • Benyamin B.
      • McEvoy B.P.
      • Gordon S.
      • Henders A.K.
      • Nyholt D.R.
      • Madden P.A.
      • Heath A.C.
      • Martin N.G.
      • Montgomery G.W.
      • Goddard M.E.
      • Visscher P.M.
      Common SNPs explain a large proportion of the heritability for human height.
      ) and when genotyped and ungenotyped individuals are combined (
      • Legarra A.
      • Aguilar I.
      • Misztal I.
      A relationship matrix including full pedigree and genomic information.
      ;
      • Misztal I.
      • Legarra A.
      • Aguilar I.
      Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information.
      ;
      • Christensen O.F.
      • Lund M.S.
      Genomic prediction when some animals are not genotyped.
      ). In the latter approach, an arbitrary weight on pedigree relationships (A) is often used to measure the amount of variation not explained by markers. Although variability exists in the accuracy of predictions among the above methods (
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      ), generally these accuracies are at least twice those estimated from pedigree data only (see, for example,
      • Su G.
      • Madsen P.
      • Nielsen U.S.
      • Mäntysaari E.A.
      • Aamand G.P.
      • Christensen O.F.
      • Lund M.S.
      Genomic prediction for Nordic Red cattle using one-step and selection index blending.
      ).
      The genomic relationship matrix G can be constructed by using a matrix having genotype information for each individual and marker (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ). Each genotype is a deviation from the marker-specific population mean, which is calculated using population allele frequencies. The estimation of G has been shown to be closer to A when inferences use allele frequencies (AF) in the distant ancestral population (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ;
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ) instead of AF in the currently genotyped population. This is because the expected G coefficients would be expressed relative to the same base population as A. The limitation is that base population AF are generally not available with field data and their estimation can be challenging.
      • Gengler N.
      • Mayeres P.
      • Szydlowski M.
      A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
      showed an efficient way of calculating gene content and base population AF within a breed. However, in practice, currently genotyped populations are assumed to be the base population. This results from centering the genotype matrix used to build G with current-data AF so that the average genomic relationship between animals within the current population becomes 0 and scaling G such that the additive genetic variance would be comparable to that obtained through conventional methods (
      • Powell J.E.
      • Visscher P.M.
      • Goddard M.E.
      Reconciling the analysis of IBD and IBS in complex trait studies.
      ;
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      ).
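      The centering argument above can be checked numerically. The following is a minimal sketch, assuming simulated genotypes coded 0/1/2 and the ZZ′/k scaling attributed to VanRaden's method 1; the function name `g_from_current_af` and the simulated data are illustrative assumptions and are not part of the study.

```python
import numpy as np

def g_from_current_af(U):
    """G = ZZ'/k with Z centered by AF observed in the genotyped sample itself."""
    p = U.mean(axis=0) / 2.0            # observed frequency of the counted allele
    Z = U - 2.0 * p                     # center every marker column
    k = 2.0 * np.sum(p * (1.0 - p))     # scaling constant
    return Z @ Z.T / k

# Simulated (hypothetical) genotypes: 50 animals, 500 markers
rng = np.random.default_rng(1)
p_sim = rng.uniform(0.1, 0.9, size=500)
U = rng.binomial(2, p_sim, size=(50, 500)).astype(float)

G = g_from_current_af(U)
print(G.mean())  # mean over all elements of G, zero up to rounding error
```

      Because every column of Z sums to zero, the sum of all elements of G is zero. With diagonals near 1, the average off-diagonal coefficient is therefore slightly negative rather than exactly 0.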
      The use of observed AF within a breed may not have major practical implications in genomic BLUP (GBLUP) models. In the context of structured populations, the effect of using across-breed AF to make G may have consequences for the estimation of relationships, mainly attributable to varying sources of AF between breeds.
      • Eding H.
      • Meuwissen T.H.E.
      Marker-based estimates of between and within population kinships for the conservation of genetic diversity.
      demonstrated that average relatedness between 2 populations could be expressed in terms of population-specific AF.
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      used the average of 3 breedwise AF for estimation in the combined 3-breed population. Although these approaches would be beneficial for multiple populations with distinctive subpopulations, a need still exists for approaches in populations that consist mainly of crossbred animals. The Nordic Red dairy cattle (RDC) comprise 3 subpopulations by country of birth [i.e., Denmark (DNK), Sweden (SWE), and Finland (FIN)]. Over the years of crossbreeding, the majority of animals (~98%) in the Nordic RDC are composites of base breeds. The absence of pure base-breed animals remains a major challenge for the estimation of breedwise AF. The objective of this study was to investigate whether the use of estimated breedwise AF in the calculation of genomic relationships would provide a more accurate estimate of G than using AF across breeds, and to determine the effect on G when AF are estimated in the base population versus the currently genotyped population.

      Materials and Methods

      Data

      This study was carried out in a structured population with 60 pure base breed bulls and 4,046 bulls of combinations of base breeds in the Nordic RDC. Genotypes for all 4,106 bulls were obtained using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). For quality purposes, markers from the X chromosome, without map position in the UMD3.0 genome assembly (
      • Zimin A.V.
      • Delcher A.L.
      • Florea L.
      • Kelley D.R.
      • Schatz M.C.
      • Puiu D.
      • Hanrahan F.
      • Pertea G.
      • van Tassell C.P.
      • Sonstegard T.S.
      • Marçais G.
      • Roberts M.
      • Subramanian P.
      • Yorke J.A.
      • Salzberg S.L.
      A whole-genome assembly of the domestic cow, Bos taurus.
      ) and with minor allele frequency (MAF) <5% were discarded. In addition, animal genotypes with a GenCall score (
      Illumina Inc
      Illumina GenCall Data Analysis Software-Gen-Call software algorithms for clustering, calling, and scoring genotypes.
      ) <60% and marker loci with call rates <5% in a large reference sample from the same genotyping laboratory, consisting of Danish Holstein bulls, were discarded. Finally, missing genotypes were imputed using fastPHASE software chromosome by chromosome (
      • Scheet P.
      • Stephens M.
      A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase.
      ). Due to unavailability of pure base breeds, informative SNP above were selected based on across-breed AF. After quality control, a total of 38,194 SNP markers were available for analyses. The entire RDC pedigree, containing over 4 million records, was used to calculate breed proportions (BP) for individual bulls (
      • Lidauer M.
      • Mäntysaari E.A.
      • Strandén I.
      • Pösö J.
      • Pedersen J.
      • Nielsen U.S.
      • Johansson K.
      • Eriksson J.-Å.
      • Madsen P.
      • Aamand G.P.
      Random heterosis and recombination loss effects in a multibreed evaluation for Nordic Red dairy cattle.
      ). A breed was defined only if the average BP in the data was greater than 10%. Breeds used in this study were the Swedish Red (SRB), Finnish Ayrshire (FAY), and Norwegian Red (NRF), and the remaining breeds with BP less than 10% were combined into the breed “other.” A more detailed description about the population structure, breeds contained, and definitions of the final 4 breeds and their trends is provided by
      • Makgahlela M.L.
      • Mäntysaari E.A.
      • Strandén I.
      • Koivula M.
      • Nielsen U.S.
      • Sillanpää M.J.
      • Juga J.
      Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
      . The pedigree for genotyped bulls contained 22,300 animals.
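      As an illustration of how breed proportions can be derived from a pedigree, the sketch below assumes that each founder belongs entirely to one base breed and that each descendant receives the average of its parents' proportions. The pedigree layout, the function name `breed_proportions`, and the toy animals are hypothetical; the procedure used in the cited work may differ in detail (for example, in how unknown parents are handled).

```python
import numpy as np

BREEDS = ["SRB", "FAY", "NRF", "other"]

def breed_proportions(pedigree, founder_breed):
    """pedigree: list of (animal, sire, dam) with parents listed before offspring;
    founders have sire and dam set to None. founder_breed maps founder -> breed."""
    bp = {}
    for animal, sire, dam in pedigree:
        if sire is None and dam is None:
            vec = np.zeros(len(BREEDS))
            vec[BREEDS.index(founder_breed[animal])] = 1.0   # pure base-breed founder
        else:
            vec = 0.5 * (bp[sire] + bp[dam])                 # average of both parents
        bp[animal] = vec
    return bp

# Hypothetical three-generation pedigree
ped = [("s1", None, None), ("d1", None, None), ("d2", None, None),
       ("x1", "s1", "d1"), ("x2", "x1", "d2")]
founders = {"s1": "SRB", "d1": "FAY", "d2": "NRF"}

bp = breed_proportions(ped, founders)
for breed, value in zip(BREEDS, bp["x2"]):
    print(breed, value)
```

      In this toy pedigree, animal x2 is one-quarter SRB, one-quarter FAY, and one-half NRF, and the proportions of every animal sum to 1.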
      Phenotypes were individual daughter deviations (IDD) of 1,995,606 RDC cows for milk, protein, and fat yields, obtained from March 2010 official evaluations of the Nordic cattle genetic evaluations. By definition, the IDD are cow performances adjusted for fixed effects, nongenetic random effects, and genetic effects of the cow's dam (
      • Mrode R.A.
      • Swanson G.J.T.
      Calculating cow and daughter yield deviations and partitioning of genetic evaluations under a random regression model.
      ). Here, however, IDD were computed by using animal model deregression from 305-d combined EBV (
      • Mäntysaari E.A.
      • Koivula M.
      • Strandén I.
      • Pösö J.
      • Aamand G.P.
      Estimation of GEBVs using deregressed individual cow breeding values.
      ). For validation of predictions using different G matrices, the data were split into sets of 3,300 training bulls born between 1980 and 1999, and 806 validation bulls born between 1998 and 2005. The training data had older bulls, which were evaluated for the first time during the 2005 Nordic cattle routine evaluation.

      Estimation of Pedigree and Genomic Relationships

      Pedigree relationships of genotyped bulls were estimated using the RelaX2 computer program (
      • Strandén I.
      • Vuori K.
      RelaX2: Pedigree analysis programme.
      ). Genomic relationships were computed following methods 1 and 2 as demonstrated by
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      and modifications of the 2 methods to adapt to the admixed structure of the current population.
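      For readers unfamiliar with pedigree relationships, the sketch below builds A for a toy pedigree with the standard tabular method. It is not the RelaX2 implementation; the function name `additive_relationship` and the five-animal pedigree are assumptions made for illustration.

```python
import numpy as np

def additive_relationship(pedigree):
    """Tabular method. pedigree[i] = (sire, dam) as integer indices, with parents
    appearing before their offspring; None marks an unknown parent."""
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            a_js = A[j, s] if s is not None else 0.0
            a_jd = A[j, d] if d is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)   # average relationship to the parents
        inbreeding = 0.5 * A[s, d] if (s is not None and d is not None) else 0.0
        A[i, i] = 1.0 + inbreeding                     # diagonal = 1 + inbreeding coefficient
    return A

# Animals 0 and 1 are unrelated founders, 2 and 3 are full sibs, 4 is their offspring
ped = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = additive_relationship(ped)
print(A[2, 3])  # 0.5, full sibs
print(A[4, 4])  # 1.25, offspring of full sibs has an inbreeding coefficient of 0.25
```

      Diagonal elements of A equal 1 plus the inbreeding coefficient, so they can never fall below 1, whereas diagonals of a marker-based G are not bounded in this way.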
      Let there be n animals that have been genotyped for m markers. Let $u_{ij}$ be the genotype of animal i at marker j, where $u_{ij}$ is the number of copies of the second allele (i.e., $u_{ij}$ has value 0, 1, or 2). Following
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      method 1 and using observed AF across breeds, the original genomic relationship matrix G (denoted Gorg) was computed as Gorg = ZZ′/k, where Z is an n by m matrix of centered genotypes. Here, the element of animal i for marker j in Z is $u_{ij} - 2p_j$, where $p_j$ is the frequency of the second allele at SNP marker j and $k = 2\sum_{j} p_j(1 - p_j)$. In method 2, also shown by
      • Yang J.
      • Benyamin B.
      • McEvoy B.P.
      • Gordon S.
      • Henders A.K.
      • Nyholt D.R.
      • Madden P.A.
      • Heath A.C.
      • Martin N.G.
      • Montgomery G.W.
      • Goddard M.E.
      • Visscher P.M.
      Common SNPs explain a large proportion of the heritability for human height.
      , standardized genotypes were used to calculate the G matrix as Gorg2 = Z*Z*′/m, where each column j in Z* is $Z_j^* = Z_j / \sqrt{2p_j(1 - p_j)}$ and $Z_j$ is column j in Z.
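      The two original estimators can be sketched in a few lines of NumPy. This is a minimal illustration on simulated genotypes, assuming AF are taken as observed in the sample; the function names `g_method1` and `g_method2` and the data are hypothetical.

```python
import numpy as np

def g_method1(U, p):
    """Gorg = ZZ'/k with Z = U - 2p and k = 2 * sum_j p_j (1 - p_j)."""
    Z = U - 2.0 * p
    k = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / k

def g_method2(U, p):
    """Gorg2 = Z*Z*'/m, each column of Z scaled by sqrt(2 p_j (1 - p_j))."""
    Z_star = (U - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return Z_star @ Z_star.T / U.shape[1]

# Simulated (hypothetical) genotypes coded 0/1/2: 30 animals, 2,000 markers
rng = np.random.default_rng(7)
p_sim = rng.uniform(0.05, 0.95, size=2000)
U = rng.binomial(2, p_sim, size=(30, 2000)).astype(float)

p_obs = U.mean(axis=0) / 2.0
keep = (p_obs > 0.0) & (p_obs < 1.0)        # drop monomorphic markers before scaling
G1 = g_method1(U[:, keep], p_obs[keep])
G2 = g_method2(U[:, keep], p_obs[keep])
print(np.diag(G1).mean(), np.diag(G2).mean())
```

      Method 1 divides once by a constant summed over markers, so common markers dominate. Method 2 standardizes each marker separately, which gives rare alleles more weight and makes a lower bound on AF important.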
      Two adjusted genomic matrices were computed using breedwise AF: Gadj and Gadj2. These matrices were obtained by modifying methods 1 and 2 by
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      . Matrix Gadj was calculated as Gadj = MM′/k, where, with the same notation as in Z, elements of M are $u_{ij} - 2p_{ij}$ and k is assumed the same as above. Here, $p_{ij}$ is the allele frequency for the jth SNP marker, expected for genotype $M_{ij}$ and taking into account the breed background of animal i. For the matrix Gadj2, the columns of M were further scaled by the standard deviation of the expected marker effects to obtain the M* matrix. For animal i, the genotype element j of M* was $(u_{ij} - 2p_{ij}) / \sqrt{2p_{ij}(1 - p_{ij})}$. To improve numerical stability and avoid division by 0, estimated breedwise AF below 0.05 or above 0.95 were set to 0.05 or 0.95, respectively. This threshold corresponded to our prior removal of SNP with MAF <5%, which was based on across-breed AF. Finally, the relationship matrix was obtained as Gadj2 = M*M*′/m. The individual AF $p_{ij}$ were calculated using either the currently genotyped animals or the base population as the reference population.
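      A minimal sketch of the adjusted matrices follows, assuming that the individual AF are obtained as breed proportions times breedwise AF and that the 0.05/0.95 bounds are applied to the breedwise AF before either matrix is built. The function names, the simulated breed proportions, and the simulated breedwise AF are hypothetical.

```python
import numpy as np

def individual_af(X, B, lower=0.05, upper=0.95):
    """X: n x b breed proportions, B: b x m breedwise AF. Returns the n x m matrix p_ij."""
    B_bounded = np.clip(B, lower, upper)    # bound breedwise AF, as described in the text
    return X @ B_bounded

def g_adjusted(U, P, k):
    """Gadj = MM'/k with elements of M equal to u_ij - 2 p_ij."""
    M = U - 2.0 * P
    return M @ M.T / k

def g_adjusted2(U, P):
    """Gadj2 = M*M*'/m with elements (u_ij - 2 p_ij) / sqrt(2 p_ij (1 - p_ij))."""
    M_star = (U - 2.0 * P) / np.sqrt(2.0 * P * (1.0 - P))
    return M_star @ M_star.T / U.shape[1]

# Simulated (hypothetical) admixed population: 40 animals, 1,500 markers, 4 breeds
rng = np.random.default_rng(3)
n, m, b = 40, 1500, 4
B = rng.uniform(0.0, 1.0, size=(b, m))          # breedwise AF
X = rng.dirichlet(np.ones(b), size=n)           # breed proportions, rows sum to 1
U = rng.binomial(2, X @ B).astype(float)        # genotypes coded 0/1/2

p_across = U.mean(axis=0) / 2.0                 # across-breed AF, used only for k
k = 2.0 * np.sum(p_across * (1.0 - p_across))
P = individual_af(X, B)
G_adj = g_adjusted(U, P, k)
G_adj2 = g_adjusted2(U, P)
print(np.diag(G_adj).mean(), np.diag(G_adj2).mean())
```

      Because each row of breed proportions is a set of nonnegative weights summing to 1, bounding the breedwise AF also keeps every individual AF inside the same bounds, so the square root in Gadj2 never divides by 0.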
      Breedwise AF at the level of the currently genotyped population were computed using linear multiple regression and binomial models. For each marker, the vector of regression coefficients (β) of genotypes (y) on breed proportions (X) was solved from the model
      $y = X\beta + e,$


      where e is the independently and normally distributed residual error. Alternatively, genotype $y_i$ for individual i was considered as an observation from binomial($p_{ij}$, 2). The binomial likelihood $\prod_{i=1}^{n} p_{ij}^{y_{i1}} (1 - p_{ij})^{y_{i2}}$ was handled by having a logistic regression for parameters $p_{ij}$, defined as follows:
      $\mathrm{logit}(p_{ij}) = X\beta,$


      where $\mathrm{logit}(p_{ij}) = \ln[p_{ij}/(1 - p_{ij})]$. The expected AF of the marker for each individual from the linear and logistic models become $\hat{p}_{ij} = X\hat{\beta}$ and $\hat{p}_{ij} = \exp(X\hat{\beta})/[1 + \exp(X\hat{\beta})]$, respectively, where the AF in $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_4)$ are for SRB, FAY, NRF, and the combined breeds “other.”
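      Both estimators of breedwise AF can be sketched as follows. This is a minimal illustration on simulated data, assuming genotypes are halved so that the linear coefficients are directly on the AF scale, and using a plain Newton-Raphson iteration for the binomial model. The function names, the three simulated breeds, and their AF are hypothetical, and the software used in the study may have solved the models differently.

```python
import numpy as np

def breedwise_af_linear(U, X):
    """Least-squares regression of halved genotypes on breed proportions.
    Returns a b x m matrix; estimates are not constrained to [0, 1]."""
    B, *_ = np.linalg.lstsq(X, U / 2.0, rcond=None)
    return B

def breedwise_af_logistic(u, X, n_iter=50, tol=1e-8):
    """One marker, u_i ~ Binomial(2, p_i) with logit(p_i) = x_i' beta.
    Returns breedwise AF on the probability scale, always inside (0, 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (u - 2.0 * p)
        information = X.T @ (X * (2.0 * p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-beta))

# Simulated (hypothetical) admixed animals for a single marker
rng = np.random.default_rng(2)
n_animals = 5000
true_af = np.array([0.2, 0.5, 0.8])
X = rng.dirichlet(np.ones(3), size=n_animals)          # breed proportions
u = rng.binomial(2, X @ true_af).astype(float)         # genotypes coded 0/1/2

af_linear = breedwise_af_linear(u[:, None], X)[:, 0]
af_logistic = breedwise_af_logistic(u, X)
print(af_linear)
print(af_logistic)
```

      The linear solution can leave the 0 to 1 range when a breed is represented only through small proportions in admixed animals, whereas the logistic transformation keeps every estimate inside the range by construction.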
      The AF across breeds and breedwise AF were estimated in the base (founding) population using the gene content approximation algorithm of
      • Gengler N.
      • Mayeres P.
      • Szydlowski M.
      A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
      . The setup in this algorithm follows the logic of genetic covariance among relatives, where the covariance between gene contents, which is the number of copies of one allele in a genotype, is proportional to the additive relationship between animals. Large pedigrees are used to compute A and linear mixed-model equations are used to account for selection and drift in AF across time, thus occurring during pedigree generations (
      • Gengler N.
      • Mayeres P.
      • Szydlowski M.
      A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
      ). As was done for the simple linear model above, the expected SNP genotype for ungenotyped base population animals was estimated from their genotyped relatives for every marker using the following model:
      $y = X\gamma + Qg + e,$


      where γ is the vector of breed effects or intercept and Q is a design matrix allocating records to animal effects g, which is the estimated gene content for all animals including ancestors. The regression effect solutions to γ were used to calculate individual AF values pij similarly as using the current genotyped population AF.
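      The gene content model can be illustrated with standard mixed-model equations. The sketch below is a toy example under stated assumptions: a four-animal pedigree, gene content coded 0/1/2, and a variance ratio `lam` (residual over additive variance of gene content) that is simply chosen to be small here. The function name `gene_content_mme` and all numbers are hypothetical and do not reproduce the cited algorithm's implementation.

```python
import numpy as np

def gene_content_mme(y, X, Q, A_inv, lam):
    """Solve [X'X  X'Q; Q'X  Q'Q + lam * A^-1] [gamma; g] = [X'y; Q'y]."""
    lhs = np.block([[X.T @ X, X.T @ Q],
                    [Q.T @ X, Q.T @ Q + lam * A_inv]])
    rhs = np.concatenate([X.T @ y, Q.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_breeds = X.shape[1]
    return solution[:n_breeds], solution[n_breeds:]

# Toy pedigree: animals 0 and 1 are unrelated founders from two different breeds,
# animals 2 and 3 are their full-sib offspring.
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
genotyped = [0, 2, 3]                        # founder 1 has no genotype
y = np.array([2.0, 1.0, 1.0])                # gene content of the genotyped animals
X = np.array([[1.0, 0.0],                    # breed proportions of the genotyped animals
              [0.5, 0.5],
              [0.5, 0.5]])
Q = np.eye(4)[genotyped]                     # links records to animal effects

gamma, g = gene_content_mme(y, X, Q, np.linalg.inv(A), lam=0.01)
breed_af = gamma / 2.0                       # assumption: AF = breed mean gene content / 2
print(breed_af)
print(g)
```

      The vector g has one element per animal in the pedigree, including the ungenotyped founder, whose expected gene content is inferred through its relationship to the genotyped offspring.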
      All 4 G matrices above (i.e., Gorg, Gorg2, Gadj, and Gadj2) were computed using the corresponding AF estimated in the currently genotyped population. Additionally, the Gorg, Gadj, and Gadj2 relationship matrices were recalculated using the corresponding AF estimated from the base population. For comparison in the calculation of within-breed AF, genotypes of the oldest 842 bulls in the data, born between 1971 and 1990, were also used to estimate base population breedwise AF. Here, we used a simple multiple regression of genotypes on breed proportions, as explained for the currently genotyped population.

      Effect of Alternative G Estimates on Genomic Predictions

      The estimation of variance components and the prediction of direct estimated genomic values (DGV) were carried out separately for each of the 3 matrices (i.e., Gorg, Gadj, and Gadj2) and separately when they were calculated using either observed or base population AF. The analyses were conducted using ASReml 3.0 (
      • Gilmour A.R.
      • Gogel B.J.
      • Cullis B.R.
      • Thompson R.
      ASREML User Guide Release 3.0.
      ) and MiX99 (
      • Lidauer M.
      • Strandén I.
      Fast and flexible program for genetic evaluation in dairy cattle.
      ) software under the following GBLUP model:
      $\mathrm{IDD} = Xb + Sa + e,$


      where IDD is a vector of IDD for daughters of bulls in the reference data set, b is a vector of breed effects or intercept, S is the design matrix that relates observations to the DGV of the daughters' sires (a), and e is a vector of random normal deviates. It is assumed that $e \sim N(0, R\sigma_e^2)$, where the diagonal element in R is $r_{ii} = 1/w_i$, and $w_i$ is the number of effective record contributions of the cow and is a weighting factor for the ith IDD. It is assumed that $a \sim N(0, G\sigma_a^2)$, where G is the marker-based relationship matrix and $\sigma_a^2$ is the additive genetic variance. Here, the fixed-breed regression effects were fitted only for Gadj and Gadj2. The predicted values for all animals in this case were obtained as the sum of the animals’ DGV and fixed-breed regression solutions.
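      The GBLUP model can be sketched with weighted mixed-model equations. This is a toy illustration rather than the ASReml or MiX99 implementation: the variance ratio is supplied instead of estimated, only an intercept is fitted, and a small ridge is added to G so that it can be inverted. The function name `gblup`, the simulated data, and these choices are all assumptions.

```python
import numpy as np

def gblup(idd, X, S, G, w, var_ratio):
    """Weighted mixed-model equations with R^-1 = diag(w) and
    var_ratio = sigma_e^2 / sigma_a^2. Returns (b_hat, a_hat)."""
    XtR = X.T * w                              # X' R^-1
    StR = S.T * w                              # S' R^-1
    lhs = np.block([[XtR @ X, XtR @ S],
                    [StR @ X, StR @ S + var_ratio * np.linalg.inv(G)]])
    rhs = np.concatenate([XtR @ idd, StR @ idd])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]

# Simulated (hypothetical) data: 6 sires, 300 markers, 40 daughter records
rng = np.random.default_rng(11)
n_sires, n_markers, n_records = 6, 300, 40
U = rng.binomial(2, 0.5, size=(n_sires, n_markers)).astype(float)
p = np.full(n_markers, 0.5)                    # AF assumed known for the toy data
Z = U - 2.0 * p
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
G += 0.01 * np.eye(n_sires)                    # small ridge (assumption) keeps G invertible

sire_of_record = rng.integers(0, n_sires, size=n_records)
S = np.eye(n_sires)[sire_of_record]            # relates each record to the sire's DGV
X = np.ones((n_records, 1))                    # intercept only
w = rng.uniform(0.5, 1.5, size=n_records)      # effective record contributions
true_a = rng.normal(0.0, 1.0, n_sires)
idd = 10.0 + S @ true_a + rng.normal(0.0, 1.0, n_records) / np.sqrt(w)

b_hat, a_hat = gblup(idd, X, S, G, w, var_ratio=1.0)
print(b_hat, a_hat)
```

      When breed regressions are fitted as fixed effects, X would hold breed proportions instead of a single column of ones, and the prediction for an animal would be its DGV plus its breed proportions multiplied by the fixed-effect solutions.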

      Validation Analyses

      The validation reliability of DGV was assessed following the Interbull genomic EBV validation test (
      • Mäntysaari E.A.
      • Liu Z.
      • VanRaden P.
      Interbull validation test for genomic evaluations.
      ). A weighted regression model of deregressed breeding value (DRP) for bulls in the validation data on predicted DGV was fitted to obtain the regression b1 coefficient:
      $\mathrm{DRP} = \mathbf{1}b_0 + b_1\hat{a} + e,$


      where DRP is the vector of DRP for the candidate bulls and $\hat{a}$ is the vector of estimated DGV for these bulls. The linear model was weighted by individual $w_k$, defined as the effective daughter contribution of the bull. The validation reliability of the model was calculated as $R^2_{DGV} = r^2_{DRP,DGV}/\bar{w}$, where $r^2_{DRP,DGV}$ is the squared correlation between DRP and DGV and $\bar{w}$ is the average of $w_k$, which accounts for the inaccuracy in the estimation of DRP.
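      The validation step can be sketched as a weighted regression followed by a weighted squared correlation. The example below is illustrative only: the data are simulated, the weights are hypothetical values between 0.5 and 1, and the final division by the average weight is an assumption about how the reliability is scaled. The function name `validation_regression` is also hypothetical.

```python
import numpy as np

def validation_regression(drp, dgv, w):
    """Weighted regression of DRP on DGV. Returns (b0, b1, weighted r^2)."""
    root_w = np.sqrt(w)
    design = np.column_stack([np.ones_like(dgv), dgv])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], drp * root_w, rcond=None)
    mean_dgv = np.average(dgv, weights=w)
    mean_drp = np.average(drp, weights=w)
    cov = np.average((dgv - mean_dgv) * (drp - mean_drp), weights=w)
    var_dgv = np.average((dgv - mean_dgv) ** 2, weights=w)
    var_drp = np.average((drp - mean_drp) ** 2, weights=w)
    return coef[0], coef[1], cov ** 2 / (var_dgv * var_drp)

# Simulated (hypothetical) validation bulls
rng = np.random.default_rng(5)
dgv = rng.normal(0.0, 1.0, 200)
w = rng.uniform(0.5, 1.0, 200)
drp = 0.2 + 0.9 * dgv + rng.normal(0.0, 1.0, 200)

b0, b1, r2 = validation_regression(drp, dgv, w)
reliability = r2 / w.mean()        # assumption: squared correlation divided by the mean weight
print(b0, b1, r2, reliability)
```

      A slope b1 close to 1 indicates that the DGV are neither inflated nor deflated relative to the DRP. The division corrects the squared correlation upward because the DRP are themselves measured with error.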

      Results

      Estimated Breedwise AF

      Breedwise AF were calculated for the defined breeds SRB, FAY, NRF, and the breed “other,” which combined small breeds. Only 60 bulls were pure base breed, with 59 having 100% BP for FAY. Few individuals had a BP of at least 50% for SRB (647) and NRF (40) in the data. In the genotyped population, the linear regression and binomial models gave equivalent estimates of AF for the breeds, with correlations over all SNP between models close to 1. Whereas the linear model resulted in AF for a few markers outside the expected range of 0 to 1, the binomial model constrained the estimates to fall within this range. The distributions of breedwise AF were also generally similar under the linear and binomial models, except that NRF had the most markers with estimated AF outside the expected range under the linear model. A binomial model was, however, challenging to implement using Gengler's method due to software limitations. Because the estimated AF were similar, for consistency we present results from the linear regression model. Table 1 shows the numbers of markers for each breed with AF estimated to be either less than 0 or greater than 1 when using the linear model. The fewest markers outside the parameter space were found when AF were estimated in the base population, with 313 below 0 and 1,972 above 1. The most markers with AF outside the parameter space were found when the 842 old bulls were used in the AF estimation: 2,460 AF were below 0 and 8,387 were greater than 1. The breeds NRF and SRB had the most markers with estimated AF outside the parameter space, which could partly be due to having no 100% NRF animals and only 1 pure base-breed SRB animal in the genotyped population.
      Table 1. The numbers of markers with estimated allele frequencies that fell outside the expected range of 0 and 1 for each breed when estimated with breed proportions as genetic group in the gene content model (base population) or in the linear regression model (current population).
      Breeds: SRB = Swedish Red; FAY = Finnish Ayrshire; NRF = Norwegian Red; other = combined breeds.
      Item                                SRB      FAY      NRF    Other
      Base population
       Less than 0                         26       19      265        3
       Greater than 1                     276      193    1,449       54
      Currently genotyped population
       Less than 0                        152       94      960        7
       Greater than 1                   1,615      902    3,304       81
      Genotypes from 842 old bulls
       Less than 0                        275       78    2,089       18
       Greater than 1                   1,848      668    5,708      163
      The between-breed correlations of breedwise AF estimated in the base population, currently genotyped population, and the oldest genotyped bulls are shown in Table 2. Generally, breedwise AF had the highest correlations when AF were estimated in the base population (ranging from 0.678 to 0.817) and were least correlated when estimated using genotypes from older bulls (0.421 to 0.651). The highest correlations in the base population AF estimates were between SRB and NRF (0.817), and the lowest correlations were between NRF and the breed “other” (0.678). However, observed AF estimated using all genotypes and old genotyped bulls were highly correlated between SRB and FAY at 0.643 and 0.630, respectively, and less correlated between SRB and NRF (0.545 and 0.421, respectively).
      Table 2. The correlations between breedwise allele frequencies estimated in the base population (
      • Gengler N.
      • Mayeres P.
      • Szydlowski M.
      A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
      ), within the oldest 842 bulls, and in the currently genotyped population.
      Breeds: SRB = Swedish Red; FAY = Finnish Ayrshire; NRF = Norwegian Red; other = combined breeds.
      Item                                SRB      FAY      NRF    Other
      Base population
       SRB                              1.000    0.737    0.817    0.741
       FAY                                       1.000    0.688    0.771
       NRF                                                1.000    0.678
       Other                                                       1.000
      Currently genotyped population
       SRB                              1.000    0.644    0.602    0.626
       FAY                                       1.000    0.545    0.672
       NRF                                                1.000    0.570
       Other                                                       1.000
      Genotypes from old bulls
       SRB                              1.000    0.630    0.468    0.594
       FAY                                       1.000    0.421    0.651
       NRF                                                1.000    0.446

      Estimated Relationship Coefficients from Pedigree and Genomic Data

      Descriptive statistics of diagonal elements from the pedigree (Aii) and different genomic estimators (Gii) are presented in Table 3, across populations and within bulls registered in DNK, SWE, and FIN. The number of bulls was 800, 1,240, and 2,040 in the DNK, SWE, and FIN populations, respectively. Results are presented by country because, although these 3 populations are generally considered a single population, the genetic relationship between the SWE and FIN populations is stronger than that between these 2 and the DNK population. The ranges of maximum elements from A within (1.081 to 1.135) and across populations (1.135) were smaller than those from G estimators within (1.233 to 1.450) and across populations (1.310 to 1.450). This difference in scales is because A is not an absolute measurement but an expected relatedness given the pedigree, whereas G measures the actual relatedness at marker loci. The variability of G across populations was greater in the original approaches (i.e., Gorg and Gorg2) and smaller for adjusted matrices, especially for Gadj2. Similar tendencies were found within the DNK animals but not for SWE and FIN bulls. Coefficients from Gorg and Gorg2 were generally similar, but we present estimates from both methods 1 and 2 for comparison with their adjusted alternatives proposed in the current study. The average Aii was greatest in the FIN bulls (1.016) and smallest in the DNK bulls (1.007). However, this ranking was reversed for Gii from Gorg using observed AF. The means of diagonals from A and Gorg were close to 1 across breeds and in SWE and FIN, but the mean from Gorg was 1.136 for DNK.
      Table 3. Descriptive statistics of diagonal elements from the pedigree (A), original, and adjusted genomic relationship matrices (G) estimated using allele frequencies in the genotyped population.
      Summaries are across populations and within bulls born in Denmark, Sweden, and Finland.
      Gorg and Gorg2 = original methods (original G) 1 and 2 of VanRaden (2008) calculated using allele frequencies observed across breeds.
      Gadj and Gadj2 = adjusted methods (adjusted G) 1 and 2 of VanRaden (2008) computed using breedwise observed allele frequencies.
      Item                     Mean   Minimum   Maximum       SD
      Across populations
       A                      1.012     1.000     1.135    0.014
       Gorg                   1.019     0.871     1.379    0.074
       Gorg2                  1.019     0.871     1.379    0.074
       Gadj                   0.949     0.747     1.310    0.045
       Gadj2                  0.965     0.773     1.450    0.055
      Danish bulls
       A                      1.007     1.000     1.109    0.013
       Gorg                   1.136     0.973     1.328    0.072
       Gorg2                  1.136     0.973     1.328    0.072
       Gadj                   0.960     0.828     1.310    0.053
       Gadj2                  0.950     0.805     1.450    0.069
      Swedish bulls
       A                      1.008     1.000     1.081    0.011
       Gorg                   1.006     0.871     1.184    0.037
       Gorg2                  1.006     0.871     1.184    0.037
       Gadj                   0.957     0.775     1.233    0.044
       Gadj2                  0.977     0.773     1.364    0.051
      Finnish bulls
       A                      1.016     1.000     1.135    0.015
       Gorg                   0.979     0.877     1.157    0.028
       Gorg2                  0.979     0.877     1.157    0.027
       Gadj                   0.939     0.784     1.283    0.036
       Gadj2                  0.961     0.776     1.401    0.043
      Table 4 shows summaries of diagonal elements from the G matrices calculated using estimated base population AF. Variability in the diagonals of G matrices calculated using base population AF was lower than when using AF from the currently genotyped population. Measured as the standard deviation of diagonal elements, variability was 22 and 35% lower in the adjusted matrices Gadj and Gadj2, respectively, across breeds, and between 17 and 37% lower within breeds. The averages of diagonal elements were less than 1 in all populations for the proposed adjusted matrices, irrespective of the AF used to calculate G. For Gorg, a larger mean may have resulted from having, in general, higher diagonal elements. In all cases, the tendencies observed for diagonal elements were also clear for pairwise relationships (results not shown).
      Table 4. Descriptive statistics of diagonal elements of the original and adjusted genomic relationship matrices (G) estimated using allele frequencies in the base population, across populations, and within bulls born in Denmark, Sweden, and Finland.
      Item          Mean   Minimum  Maximum  SD
      Across populations
      Gorg(1)       1.023  0.906    1.329    0.049
      Gadj(2)       0.981  0.819    1.224    0.035
      Gadj2(2)      0.986  0.824    1.262    0.036
      Danish bulls
      Gorg(1)       1.088  0.951    1.275    0.057
      Gadj(2)       0.994  0.868    1.204    0.037
      Gadj2(2)      0.976  0.833    1.226    0.044
      Swedish bulls
      Gorg(1)       1.008  0.906    1.152    0.029
      Gadj(2)       0.980  0.869    1.137    0.032
      Gadj2(2)      0.986  0.887    1.166    0.032
      Finnish bulls
      Gorg(1)       1.006  0.935    1.147    0.026
      Gadj(2)       0.975  0.871    1.184    0.030
      Gadj2(2)      0.989  0.868    1.199    0.032
      (1) Original method (original G) 1 of VanRaden (2008) calculated using base population allele frequencies across breeds.
      (2) Adjusted method (adjusted G) 1 and 2 of VanRaden (2008) computed using breedwise allele frequencies from the base population.
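As a minimal sketch of how summary columns like those in Table 4 could be obtained from any relationship matrix, the Python snippet below computes the mean, minimum, maximum, and SD of the diagonal. The function name `diagonal_summary`, the toy matrix, and the use of the sample SD (`ddof=1`) are assumptions made for illustration; the text does not state which SD estimator was used.

```python
import numpy as np

def diagonal_summary(G):
    """Mean, minimum, maximum, and sample SD of the diagonal of a relationship matrix."""
    d = np.diag(G)
    return {"mean": d.mean(), "min": d.min(), "max": d.max(), "sd": d.std(ddof=1)}

# Toy 3 x 3 matrix with hypothetical values (not taken from the study)
G = np.array([[1.02, 0.10, 0.05],
              [0.10, 0.95, 0.20],
              [0.05, 0.20, 1.09]])

summary = diagonal_summary(G)
print(summary)
```

Applying the same function to subsets of animals (for example, bulls born in one country) would give the within-population rows.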
      Figure 1 shows the distributions of Gii from different G matrices calculated using observed AF in the combined population. The distributions were examined to ensure consistency with the statistics presented above, as measures such as minimum and maximum tend to be sensitive to extreme values. The shape of the density plot generally followed a normal distribution as suggested with a theoretical example by
      • Simeone R.
      • Misztal I.
      • Aguilar I.
      • Legarra A.
      Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
      . However, all distributions had slightly longer right tails, which could be partly associated with the distribution of Aii. The distributions from adjusted matrices (i.e., Gadj and Gadj2) had only 1 peak, whereas that of Gorg appeared to be bimodal. This agrees with the suggestion that multiple peaks would not occur in the distribution of Gii within a population, but may be expected in multiple populations if G is scaled with AF across breeds (
      • Simeone R.
      • Misztal I.
      • Aguilar I.
      • Legarra A.
      Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
      ). The density plots for diagonal elements from G were also examined within populations (plots not shown). The Gii from all matrices were approximately normally distributed in SWE and FIN bulls; however, Gorg had a bimodal distribution in the DNK bulls.
      Figure 1. The distributions of diagonal elements from genomic matrices (G) calculated using estimated allele frequencies (AF) in the currently genotyped population. Gorg = original method 1 (original G) of VanRaden (2008), calculated using AF across breeds. Gadj and Gadj2 = new adjusted methods (adjusted G) 1 and 2 of VanRaden (2008), respectively, computed using breedwise observed AF.
      Figure 2 shows the distributions of Gii from different G matrices calculated using estimated base population AF in the combined population. As with observed AF, all plots were approximately normally distributed, with slight tails to the right. The bimodal density plot of Gorg diagonals was slightly smoothed when AF were estimated from the base population. Within populations, the diagonals were also approximately normally distributed in the SWE and FIN bulls, whereas Gorg appeared bimodal for the DNK animals. The variability of Gii, especially for adjusted matrices, was much less when AF were estimated from the base population (Figure 2) than from the current population (Figure 1). In this admixed population, the correlations between Aii and Gii by any of the methods were always close to zero when using population-level AF but increased to 0.16, 0.28, and 0.38, respectively, for Gorg, Gadj, and Gadj2 when AF were estimated from the base population. The correlations within populations were also higher and ranged from 0.26 for Gadj to 0.53 for Gorg when AF were estimated from the base populations.
      Figure 2. The distributions of diagonal elements from genomic matrices (G) calculated using estimated allele frequencies (AF) in the base population. Gorg = original method 1 (original G) of VanRaden (2008), calculated using AF across breeds. Gadj and Gadj2 = new adjusted methods (adjusted G) 1 and 2 of VanRaden (2008), respectively, computed using breedwise base population AF.

      Validation Reliabilities of DGV

      Table 5 shows the regression coefficients and validation reliabilities for milk, protein, and fat obtained using alternative G matrices. The DGV were from G matrices calculated using AF in the current and the base population only because AF estimated from genotypes of old individuals appeared to be less usable. The regression coefficients and validation reliabilities were similar for all matrices irrespective of whether breedwise or across-breed AF were used and whether AF were estimated in the current or base population. Thus, predictions of genomic values converged to similar solutions regardless of AF used to compute G. For all matrices, regression coefficients were less than the expected value of 1 for milk (0.71), protein (0.75), and fat (0.81). The validation reliabilities were low for milk (0.33) and protein (0.33) and slightly higher for fat (0.43).
      Table 5. Regression coefficients (b1) and validation reliabilities (R2DGV) of direct estimated genomic values from genomic relationship matrices (G)(1) calculated using currently genotyped and base population allele frequencies (AF)
                 Observed AF                                  Base population AF
                 Gorg          Gadj          Gadj2            Gorg          Gadj          Gadj2
      Trait      b1    R2DGV   b1    R2DGV   b1    R2DGV      b1    R2DGV   b1    R2DGV   b1    R2DGV
      Milk       0.71  0.32    0.71  0.32    0.72  0.33       0.71  0.32    0.71  0.32    0.72  0.33
      Protein    0.75  0.33    0.75  0.33    0.76  0.33       0.75  0.33    0.75  0.33    0.76  0.33
      Fat        0.81  0.43    0.80  0.42    0.82  0.43       0.81  0.43    0.80  0.42    0.82  0.43
      (1) Gorg = original method (original G) 1 of VanRaden (2008) calculated using AF across breeds; Gadj and Gadj2 = new adjusted methods (adjusted G) 1 and 2 of VanRaden (2008), respectively, computed using breedwise AF.
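The two statistics reported in Table 5 can be illustrated with a simple regression of a validation response on DGV. The sketch below is hedged: the function name `validation_stats` and the toy vectors are hypothetical, and the response variable actually used, as well as any scaling of R2 by the reliability of that response, are not described in this section and are therefore not reproduced.

```python
import numpy as np

def validation_stats(y, dgv):
    """Slope (b1) and squared correlation (R2) from regressing y on dgv with an intercept."""
    X = np.column_stack([np.ones_like(dgv), dgv])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = np.corrcoef(y, dgv)[0, 1] ** 2
    return b1, r2

# Toy data (hypothetical): the response is an exact linear function of DGV
dgv = np.array([1.0, 2.0, 3.0, 4.0])
y = 0.5 + 0.8 * dgv

b1, r2 = validation_stats(y, dgv)
print(b1, r2)
```

A slope below 1, as in Table 5, means that differences among DGV overstate the differences seen in the validation response.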

      Discussion

      An important step when defining the model relates to the genetic covariance between relatives, which reflects shared genes that arise through common ancestry. High-density panels of SNP markers have recently been used to estimate genomic relationships in addition to the traditional pedigree-based relationships. Several methods for the estimation of G within a breed have been proposed in the literature (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ;
      • Misztal I.
      • Legarra A.
      • Aguilar I.
      Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information.
      ;
      • Christensen O.F.
      • Lund M.S.
      Genomic prediction when some animals are not genotyped.
      ;
      • Yang J.
      • Benyamin B.
      • McEvoy B.P.
      • Gordon S.
      • Henders A.K.
      • Nyholt D.R.
      • Madden P.A.
      • Heath A.C.
      • Martin N.G.
      • Montgomery G.W.
      • Goddard M.E.
      • Visscher P.M.
      Common SNPs explain a large proportion of the heritability for human height.
      ). Studies of genomic selection generally agree that incorporating genomic information improves prediction accuracies for young unproven bulls compared with traditional evaluations based on pedigree information only, due to improved prediction of Mendelian sampling deviations between close relatives in G. Methods for calculating G are straightforward in a single-breed population; however, similar approaches tend to result in distorted coefficients in multibreed populations. Thus, relationships should account for the different expectations for mean and variance, depending on the breed composition of the individuals in multibreed populations (
      • Harris B.L.
      • Johnson D.L.
      Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
      ). Only a few studies have evaluated the prospect of accounting for multibreeds in G with real data (
      • Harris B.L.
      • Johnson D.L.
      Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
      ;
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ). The resulting G has been tested in genomic predictions for crossbred animals in New Zealand (
      • Harris B.L.
      • Johnson D.L.
      Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
      ). The objectives of this study were to examine the prospects of accounting for breed composition in the calculation of G and assess coefficients of different G matrices compared with the pedigree-based relationship matrix within and across breeds in a multibreed population.

      Breedwise AF

      Allele frequencies play a crucial role in the calculation of G and, hence, erroneous estimation of AF may result in biased G coefficients. Our rationale behind the estimation of breedwise AF was that individuals from breeds that developed independently would likely have different AF. We found that the approach proposed by
      • Gengler N.
      • Mayeres P.
      • Szydlowski M.
      A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
      for estimating gene content of ungenotyped individuals given pedigree was useful in the estimation of breedwise AF in the base population. The smaller number of markers with AF outside the expected range of 0 to 1 found when AF were estimated in the base population indicates that the pedigree was better able to differentiate base breeds compared with AF estimated in the currently genotyped animals. Estimates of AF outside the expected range result from using a simplified model without restrictions on the parameter space. Restricting the parameter space to fall between 0 and 1 using a binomial model resulted in coefficients close to these values. The AF out of bounds did not create a great problem in the current population, as no pure base breed animals were included and, therefore, an individual's expectation of AF was generally correct. However, if purebreds are included in the data, their expected AF may be imprecise; it may be useful in this case to use a binomial model. An alternative approach to Gengler's fixed breed effects would be to include unknown parent groups for each base breed in the matrix A−1 (
      • Quaas R.L.
      Additive genetic model with groups and relationships.
      ). This model would yield genetic group effects equivalent to our AF within breed in the base population.
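To illustrate why unrestricted estimates can fall outside the range of 0 to 1, the following Python sketch regresses gene content (genotype divided by 2) on breed proportions by ordinary least squares. It covers only the plain regression on breed composition, not the pedigree-based gene content model; the function name `breedwise_af` and all numbers are hypothetical, and the toy genotypes are noise-free expected values rather than observed 0/1/2 codes.

```python
import numpy as np

def breedwise_af(M, Q):
    """Least-squares breedwise allele frequencies.

    M: (n_animals, n_markers) genotypes coded as allele counts (0 to 2)
    Q: (n_animals, n_breeds) breed proportions, each row summing to 1
    Returns an (n_breeds, n_markers) array; values are NOT constrained to [0, 1].
    """
    P, *_ = np.linalg.lstsq(Q, M / 2.0, rcond=None)
    return P

# Hypothetical breed proportions for 4 crossbred animals and 2 base breeds
Q = np.array([[0.75, 0.25],
              [0.50, 0.50],
              [0.25, 0.75],
              [0.60, 0.40]])
# Hypothetical true breedwise AF for 3 markers
P_true = np.array([[0.2, 0.9, 0.5],
                   [0.6, 0.1, 0.5]])
# Noise-free expected allele counts, used only to check that P_true is recovered
M = 2.0 * Q @ P_true

P_hat = breedwise_af(M, Q)
out_of_bounds = (P_hat < 0) | (P_hat > 1)
print(P_hat)
print(out_of_bounds.sum())
```

With real genotypes, sampling noise can push some entries of `P_hat` below 0 or above 1, which is the situation the text addresses with a binomial model.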
      The correlations between estimated breedwise AF were also higher using the gene content approach. When considering only the minor AF, correlations dropped but were still higher in the base population. High correlations between breedwise AF estimated from the base population, compared with low correlations between these frequencies in the currently genotyped population, have been reported by
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      . The authors pointed out that this indicates that recent drift within populations was removed during the estimation process and also revealed that breeds had been more similar over 10 generations in the past than they are at present, as expected from genetic drift of frequencies across pedigree generations. This progressive differentiation in AF between breeds, however, may not have been the case for the current population, as base breeds were combined and subjected to the same breeding goal for over 2 decades, which is expected to make them similar genetically. The higher correlations found between SRB, FAY, and the breed “other” were unusual. However, many animals with SRB and FAY fractions also have a breed proportion for Canadian Ayrshires that is now in the breed “other.” Thus, the estimation of AF might have detected relations to such animals. The expected pedigree-based breed proportions were used in this study to define the 4 breeds. Alternatively, accurate prediction of breed composition based on SNP genotype data has been demonstrated for multibreed populations (
      • Kuehn L.A.
      • Keele J.W.
      • Bennett G.L.
      • McDaneld T.G.
      • Smith T.P.L.
      • Snelling W.M.
      • Sonstegard T.S.
      • Thallman R.M.
      Predicting breed composition using breed frequencies of 50,000 markers from the US meat animal research center 2,000 bull project.
      ;
      • Frkonja A.
      • Gredler B.
      • Schnyder U.
      • Curik I.
      • Sölkner J.
      Prediction of breed composition in an admixed cattle population.
      ). However, these algorithms initially estimate breed-specific frequencies using purebred individuals, which was a limitation in this population due to unavailability of such individuals.

      Properties of A and G Within and Across Populations

      The diagonal elements of all G matrices that used observed AF were not directly comparable with the diagonal elements of A. Moreover, the variability in A was much smaller than that observed in the G matrices. A comparison of A and G computed using observed AF is difficult to interpret, as the coefficients in A are expressed relative to the base population in the distant past and the additive genetic variance is defined for that generation. In contrast, when the base population for G is set by scaling IBD coefficients with observed AF, the additive genetic variance among animals refers to the average variation in the currently genotyped animals (
      • Powell J.E.
      • Visscher P.M.
      • Goddard M.E.
      Reconciling the analysis of IBD and IBS in complex trait studies.
      ;
      • Yang J.
      • Benyamin B.
      • McEvoy B.P.
      • Gordon S.
      • Henders A.K.
      • Nyholt D.R.
      • Madden P.A.
      • Heath A.C.
      • Martin N.G.
      • Montgomery G.W.
      • Goddard M.E.
      • Visscher P.M.
      Common SNPs explain a large proportion of the heritability for human height.
      ). The estimate of additive genetic variance may be smaller in the current population than it was in the distant past because we expect the current population to be more inbred (
      • Powell J.E.
      • Visscher P.M.
      • Goddard M.E.
      Reconciling the analysis of IBD and IBS in complex trait studies.
      ). The highest variability in diagonal elements from Gorg and Gorg2 indicates that the use of across-breed AF increased coefficients in G for this admixed population. Variability was reduced by using breedwise AF in Gadj. However, in Gadj and Gorg, the overall scaling was based on the same marker variance across breeds, which was larger than the expected variance within breeds. Consequently, the variance of the diagonal elements appeared much smaller in Gadj than in Gorg. The simplified scaling factor was corrected in Gadj2, where elements were scaled by the mean marker variance of an individual's base breeds. The variance from the resulting Gadj2 was generally still smaller than observed from the original approaches across breeds in Gorg and Gorg2.
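The contrast between across-breed and breedwise centering can be sketched as follows. The scaling is that of method 1 of VanRaden (2008), G = ZZ'/(2Σp(1−p)). Centering each animal on the breed-proportion-weighted AF is an assumption about how the breedwise adjustment works, and the function names and all numbers are hypothetical. The individual-specific scaling of Gadj2 is not attempted because its exact form is not given in this section.

```python
import numpy as np

def g_across_breed(M, p):
    """Method 1 style G: center on across-breed AF p, scale by 2*sum(p*(1-p))."""
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def g_breedwise_centered(M, Q, P_breed, p):
    """Center each animal on its expected allele count given breed proportions Q and
    breedwise AF P_breed, but keep the across-breed scaling (assumed reading of Gadj)."""
    Z = M - 2.0 * Q @ P_breed
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Hypothetical genotypes (4 animals x 3 markers) and breed proportions (2 breeds)
M = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1],
              [1, 2, 1]], dtype=float)
Q = np.array([[0.75, 0.25],
              [0.50, 0.50],
              [0.25, 0.75],
              [0.60, 0.40]])
p = M.mean(axis=0) / 2.0                      # across-breed observed AF
P_breed = np.array([[0.2, 0.7, 0.5],          # hypothetical breedwise AF
                    [0.8, 0.3, 0.5]])

G_org = g_across_breed(M, p)
G_adj = g_breedwise_centered(M, Q, P_breed, p)

# Sanity check: if every breed has the across-breed AF, both matrices coincide
P_same = np.tile(p, (2, 1))
G_check = g_breedwise_centered(M, Q, P_same, p)
```

The sanity check reflects the point made in the text: the two approaches differ only to the extent that AF differ between base breeds.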
      Diagonal elements of G built using any of the approaches were more accurately estimated when G used AF estimated from the base population. The variability of diagonal elements, particularly for Gadj2, was reduced to a greater extent, which suggests that breedwise AF in this case were less biased and highlights the need to account for the pedigree structure in G for multibreed populations. Thus, when pedigree information is used, the base (founder) population is generally consistent. The calculated diagonals of G in this case were moderately correlated with A across breeds and in agreement with that reported by
      • Aguilar I.
      • Misztal I.
      • Johnson D.L.
      • Legarra A.
      • Tsuruta S.
      • Lawlor T.J.
      Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
      in the US Holstein population. However, the moderate correlation of 0.38 between diagonals of A and Gadj2 calculated with base population AF in our study was much lower than correlation of 0.68 reported with simulated data (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ) and than the correlations of 0.50 to 0.56 reported for the US Holstein, Jersey, and Brown Swiss populations (
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ). The highest correlations were generally found with true AF in simulated data (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ) and when all AF were assumed to be 0.5 for all markers (
      • Hayes B.J.
      • Goddard M.E.
      Technical note: Prediction of breeding values using marker-derived relationship matrices.
      ;
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ). Assuming AF equal to 0.5 for all markers equalizes the relative contribution of markers to G instead of having rare alleles contributing more than common alleles (
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      ). In addition to AF, differences in population structure might explain why these correlations differed across studies. The advantage of base breedwise AF in Gadj2 could mean that the distant past founder population corresponds well with the current expected homozygosity. Moreover, our results suggest that calculation of G with respect to an individual's base breed corrected the heterogeneity better than simple animal deviations from across-population mean AF.
      When G coefficients were assessed within populations, we found that using across-breed AF tended to increase Gii for animals from populations that had fewer animals or that were distantly related to dominating breeds in the combined population.
      • Simeone R.
      • Misztal I.
      • Aguilar I.
      • Legarra A.
      Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
      indicated that with an equal number of animals contributing to the AF across populations, there would be fewer differences in scaling G between populations. The mean AF across breeds was strongly influenced by the Swedish and Finnish populations, as these breeds are more related genetically (
      • Brøndum R.F.
      • Rius-Vilarrasa E.
      • Strandén I.
      • Su G.
      • Guldbrandtsen B.
      • Fikse W.F.
      • Lund M.S.
      Reliabilities of genomic prediction using combined reference data of the Nordic red dairy cattle populations.
      ;
      • Makgahlela M.L.
      • Mäntysaari E.A.
      • Strandén I.
      • Koivula M.
      • Nielsen U.S.
      • Sillanpää M.J.
      • Juga J.
      Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
      ) and had more animals in the combined data. As a result, and in contrast to Aii, the general level of homozygosity in the Danish population appeared to be higher than in other populations. This is unexpected because the Danish population has been found to be more admixed than the other 2, due to years of crossbreeding (
      • Brøndum R.F.
      • Rius-Vilarrasa E.
      • Strandén I.
      • Su G.
      • Guldbrandtsen B.
      • Fikse W.F.
      • Lund M.S.
      Reliabilities of genomic prediction using combined reference data of the Nordic red dairy cattle populations.
      ;
      • Makgahlela M.L.
      • Mäntysaari E.A.
      • Strandén I.
      • Koivula M.
      • Nielsen U.S.
      • Sillanpää M.J.
      • Juga J.
      Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
      ). This means that as diagonal elements from Gorg have been increased in the Danish bulls, they were decreased for animals in the other populations. In addition, individuals with the highest diagonal elements in Gorg were found to be registered elsewhere but not in the Nordic RDC. Because these animals come from populations with AF deviating even further from the population mean AF, their genotypes make them appear more homozygous than the average homozygosity in this population. Apart from a great reduction in the variability of diagonal elements between AF estimated from the current and base population, the behaviors of different estimators of G were similar within and across populations.
      • Harris B.L.
      • Johnson D.L.
      Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
      indicated that multibreed reference populations will lead to biased coefficients in G if breed is not taken into account. In this study, the use of breedwise AF to calculate Gadj and Gadj2 reduced country differences in coefficients similarly within and across populations. Regarding distributions, the observation that diagonal elements of Gadj and Gadj2 generally followed a normal distribution, but diagonal elements of Gorg appeared bimodal, indicates a distortion in the elements of Gorg and suggests clusters that may be due to the population structure. This observation was in agreement with previous findings (
      • Simeone R.
      • Misztal I.
      • Aguilar I.
      • Legarra A.
      Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
      ). In their study, the authors used simulated data on multiple populations and 60,000 SNP markers with varying AF at each locus to compute G using observed AF across populations. They observed a bimodal distribution of the diagonal elements of G. In our study, multiple peaks were avoided by using breedwise AF.
      • Simeone R.
      • Misztal I.
      • Aguilar I.
      • Legarra A.
      Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
      pointed out a general lack of theoretical knowledge about the distribution of the diagonal elements of G both within and across breeds.
      The estimation of relationships and their use in predictions is widely carried out within a breed or treating multibreed data as a homogeneous breed (
      • Hayes B.J.
      • Visscher P.M.
      • Goddard M.E.
      Increased accuracy of artificial selection by using the realized relationship matrix.
      ;
      • Pryce J.E.
      • Gredler B.
      • Bolormaa S.
      • Bowman P.J.
      • Egger-Danner C.
      • Fuerst C.
      • Emmerling R.
      • Sölkner J.
      • Goddard M.E.
      • Hayes B.J.
      Short communication: Genomic selection using a multi-breed, across-country reference population.
      ;
      • Su G.
      • Madsen P.
      • Nielsen U.S.
      • Mäntysaari E.A.
      • Aamand G.P.
      • Christensen O.F.
      • Lund M.S.
      Genomic prediction for Nordic Red cattle using one-step and selection index blending.
      ), except in New Zealand where predictions account for breed effects for crossbred animals (
      • Harris B.L.
      • Johnson D.L.
      Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
      ). The observed correlations between diagonal elements from A and G within breed were higher when estimated base population AF were used instead of current population AF (VanRaden 2008;
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ) or with AF equal to 0.5 (
      • VanRaden P.M.
      • Olson K.M.
      • Wiggans G.R.
      • Cole J.B.
      • Tooker M.E.
      Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
      ). Our use of estimated breedwise AF also greatly improved diagonal elements, indicating that AF have a large effect on relationship coefficients and would also have an effect on the estimation of population additive genetic variation. Using pig data,
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      explored different methods of scaling G with observed AF. They estimated genetic variances using different G for genotyped animals only, and estimates ranged from 2.25 when Gii were normalized to average 1 to an inflated 4.46 when G was scaled with the expectations of AF following a β distribution (
      • Gianola D.
      • de los Campos G.
      • Hill W.G.
      • Manfredi E.
      • Fernando R.
      Additive genetic variability and the Bayesian alphabet.
      ). The estimated additive genetic variances were more sensitive when a selected subset of genotyped animals was used for variance components estimation. Similar additive genetic variances were found when complete data of genotyped and ungenotyped animals were used (
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      ).
      In the absence of selective genotyping, the expected regression coefficient (b1) from the validation model is close to 1 (
      • Mäntysaari E.A.
      • Liu Z.
      • VanRaden P.
      Interbull validation test for genomic evaluations.
      ). The observed b1 values (range = 0.71–0.82) were less than the expected value of 1, which indicates bias in the estimated genomic values. The validation reliabilities of DGV from all G matrices were similar. In all cases, solutions indicated that DGV were unaffected by the AF used to calculate G. This observation agrees with previous reports where the gain in accuracy of DGV was small (0.01) when base population AF were used to compute G instead of observed AF in simulated data (
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      ), and where DGV were insensitive to AF with real data (
      • Forni S.
      • Aguilar I.
      • Misztal I.
      Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
      ). According to
      • Strandén I.
      • Christensen O.F.
      Allele coding in genomic evaluation.
      , DGV are sensitive neither to marker allele coding nor to AF, and the same DGV solutions would be calculated in GBLUP, provided that the model has a common fixed general mean effect. Thus, the absolute levels of values (i.e., animal effects) are only affected when the mean is unaccounted for in the model. Similarly here, inclusion of fixed breed regressions for Gadj2 brought breed means back into the DGV. Different AF clearly affected the calculation of G. Although G was sensitive to AF and was more accurately computed using breedwise AF in the base population, the DGV validation reliabilities were unaffected. In multibreed populations, the use of Gadj2 may be more beneficial in single-step evaluations where most animals are evaluated by matrix A and through their relationships to genotyped animals. However, it should be emphasized that selecting SNP on an across-breed MAF of at least 5% tends to remove markers that are informative within a breed and could improve prediction accuracy. In the presence of purebred animals, it may be useful to select SNP based on MAF of 1 breed, even if monomorphic in all other breeds (
      • Olson K.M.
      • VanRaden P.M.
      • Tooker M.E.
      Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
      ). An alternative to adjusting breeds in G would be to estimate SNP effects in different breeds simultaneously in a multitrait regression model (
      • Olson K.M.
      • VanRaden P.M.
      • Tooker M.E.
      Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
      ;
      • Makgahlela M.L.
      • Mäntysaari E.A.
      • Strandén I.
      • Koivula M.
      • Nielsen U.S.
      • Sillanpää M.J.
      • Juga J.
      Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
      ). The prediction accuracies were found to be even higher when fitting correlated SNP effects between breeds (
      • Olson K.M.
      • VanRaden P.M.
      • Tooker M.E.
      Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
      ). The unavailability of purebred animals in the data also limited comparison of our expected breedwise AF with the actual estimates within breed.
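
The insensitivity of DGV to the AF used in G, discussed above, can be illustrated numerically. The following is a minimal sketch on simulated data, not the analysis performed in this study. It assumes a VanRaden-type G = ZZ'/k, a single fixed general mean, known variance ratio `lam`, and the same scaling constant `k` for both allele codings (in practice k depends on AF and the variance components would be re-estimated). The function names `g_matrix` and `gblup` and all simulation settings are hypothetical.

```python
import numpy as np

def g_matrix(M, p, k):
    """VanRaden-type G = ZZ'/k, with genotypes (0/1/2) centered by 2p."""
    Z = M - 2.0 * p
    return Z @ Z.T / k

def gblup(y, G, lam, fit_mean=True):
    """BLUP of g in y = 1*mu + g + e, var(g) = G*s2g, lam = s2e/s2g."""
    n = len(y)
    V = G + lam * np.eye(n)                        # phenotypic covariance / s2g
    ones = np.ones(n)
    mu = 0.0
    if fit_mean:
        Vinv_ones = np.linalg.solve(V, ones)
        mu = (Vinv_ones @ y) / (Vinv_ones @ ones)  # GLS estimate of the mean
    g_hat = G @ np.linalg.solve(V, y - mu)
    return mu, g_hat

# Simulated genotypes and phenotypes (illustrative values only)
rng = np.random.default_rng(1)
n, m = 60, 300
p_true = rng.uniform(0.1, 0.9, m)
M = rng.binomial(2, p_true, size=(n, m)).astype(float)
y = 10.0 + M @ rng.normal(0.0, 0.1, m) + rng.normal(0.0, 1.0, n)

p_obs = M.mean(axis=0) / 2.0               # observed AF
p_half = np.full(m, 0.5)                   # a deliberately different AF choice
k = 2.0 * np.sum(p_obs * (1.0 - p_obs))    # same scaling for both (assumption)
lam = 1.0

G_obs = g_matrix(M, p_obs, k)
G_half = g_matrix(M, p_half, k)

# With a fitted general mean, mu + g is the same for both G matrices
mu_a, g_a = gblup(y, G_obs, lam)
mu_b, g_b = gblup(y, G_half, lam)

# Without the mean, the animal effects depend on the AF used
_, g_a0 = gblup(y, G_obs, lam, fit_mean=False)
_, g_b0 = gblup(y, G_half, lam, fit_mean=False)

print("max |diff| with mean:   ", np.max(np.abs((mu_a + g_a) - (mu_b + g_b))))
print("max |diff| without mean:", np.max(np.abs(g_a0 - g_b0)))
```

Under these assumptions, changing the AF shifts all animal effects by the same constant, which the fitted mean absorbs; differences among animals are therefore unchanged.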

      Conclusions

      Current methods used for computing genomic relationships in multibreed populations need to be extended to allow for differential AF between breeds. This study showed that errors in the estimation of AF may have substantial consequences for the calculation of relationships. Across-breed observed AF increased diagonal elements of G for animals from breeds that are distantly related to the combined population and have fewer animals in the combined population. Breedwise AF reduced country differences in G similarly within and across populations, resulting in a normal distribution of diagonal elements. Breedwise AF were more accurately estimated when accounting for the pedigree structure or estimated from the base population, thereby reducing the variability of diagonal elements of Gadj2. The DGV and their validation reliabilities were unaffected by AF used to compute G when estimated using a GBLUP model. The method for Gadj2 may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.
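
As an illustration of the breedwise AF approach summarized above, the sketch below regresses simulated genotypes on breed composition and uses the fitted values to center G. It is a simplified example under stated assumptions and does not reproduce the models of this study (for instance, it ignores pedigree structure and base-population estimation). The breed proportions `Q`, the simulated frequencies `P_true`, and the common scaling constant `k` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, n_breeds = 400, 100, 3

# Hypothetical breed proportions per animal (rows sum to 1)
Q = rng.dirichlet([0.5] * n_breeds, size=n)
# Hypothetical breed-specific allele frequencies (breeds x markers)
P_true = rng.uniform(0.05, 0.95, size=(n_breeds, m))
# Genotypes (0/1/2) drawn from each animal's expected allele frequency
M = rng.binomial(2, Q @ P_true).astype(float)

# Least squares regression of genotypes on breed composition, no intercept:
# E[M] = 2 * Q @ P, so the coefficients estimate 2 * P.
# Estimates are not constrained to lie in [0, 1].
coef, *_ = np.linalg.lstsq(Q, M, rcond=None)
P_hat = coef / 2.0

# Center each animal's genotypes by its own expected value 2 * Q @ P_hat
Z_breed = M - 2.0 * (Q @ P_hat)

# For comparison: center by a single across-breed observed AF
p_bar = M.mean(axis=0) / 2.0
Z_across = M - 2.0 * p_bar

# Same scaling for both matrices (an assumption made for comparability)
k = 2.0 * np.sum(p_bar * (1.0 - p_bar))
G_breed = Z_breed @ Z_breed.T / k
G_across = Z_across @ Z_across.T / k

print("mean abs error of breedwise AF:", np.mean(np.abs(P_hat - P_true)))
print("SD of diag(G), across-breed AF:", np.diag(G_across).std())
print("SD of diag(G), breedwise AF:   ", np.diag(G_breed).std())
```

Because the rows of `Q` sum to 1, the centered genotypes are orthogonal to breed composition, so breed-level differences in AF no longer contribute to the relationship coefficients.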

      Acknowledgements

      The authors acknowledge the Nordic Cattle Genetic Evaluation Ltd. (Aarhus, Denmark) and the Nordic Genomic Selection project for providing the genotype and phenotype data. M. L. Makgahlela acknowledges financial support from the Finnish Ministry of Agriculture and Forestry (Helsinki, Finland) and the University of Helsinki in Finland.

      References

        • Aguilar I.
        • Misztal I.
        • Johnson D.L.
        • Legarra A.
        • Tsuruta S.
        • Lawlor T.J.
        Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
        J. Dairy Sci. 2010; 93: 743-752
        • Brøndum R.F.
        • Rius-Vilarrasa E.
        • Strandén I.
        • Su G.
        • Guldbrandtsen B.
        • Fikse W.F.
        • Lund M.S.
        Reliabilities of genomic prediction using combined reference data of the Nordic red dairy cattle populations.
        J. Dairy Sci. 2011; 94: 4700-4707
        • Christensen O.F.
        • Lund M.S.
        Genomic prediction when some animals are not genotyped.
        Genet. Sel. Evol. 2010; 42: 2
        • Eding H.
        • Meuwissen T.H.E.
        Marker-based estimates of between and within population kinships for the conservation of genetic diversity.
        J. Anim. Breed. Genet. 2001; 118: 141-159
        • Fernández J.
        • Meuwissen T.H.E.
        • Toro M.A.
        • Mäki-Tanila A.
        Management of genetic diversity in small farm animal populations.
        Animal. 2011; 5: 1684-1698
        • Forni S.
        • Aguilar I.
        • Misztal I.
        Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information.
        Genet. Sel. Evol. 2011; 43: 1
        • Frkonja A.
        • Gredler B.
        • Schnyder U.
        • Curik I.
        • Sölkner J.
        Prediction of breed composition in an admixed cattle population.
        Anim. Genet. 2012; 43: 696-703
        • Gengler N.
        • Mayeres P.
        • Szydlowski M.
        A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle.
        Animal. 2007; 1: 21-28
        • Gianola D.
        • de los Campos G.
        • Hill W.G.
        • Manfredi E.
        • Fernando R.
        Additive genetic variability and the Bayesian alphabet.
        Genetics. 2009; 183: 347-363
        • Gilmour A.R.
        • Gogel B.J.
        • Cullis B.R.
        • Thompson R.
        ASREML User Guide Release 3.0.
        VSN International Ltd., Hemel Hempstead, UK, 2009
        • Habier D.
        • Fernando R.L.
        • Dekkers J.C.M.
        The impact of genetic relationship information on genome-assisted breeding values.
        Genetics. 2007; 177: 2389-2397
        • Harris B.L.
        • Johnson D.L.
        Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation.
        J. Dairy Sci. 2010; 93: 1243-1252
        • Hayes B.J.
        • Goddard M.E.
        Technical note: Prediction of breeding values using marker-derived relationship matrices.
        J. Anim. Sci. 2008; 86: 2089-2092
        • Hayes B.J.
        • Visscher P.M.
        • Goddard M.E.
        Increased accuracy of artificial selection by using the realized relationship matrix.
        Genet. Res. (Camb.). 2009; 91: 47-60
        • Illumina Inc
        Illumina GenCall Data Analysis Software-Gen-Call software algorithms for clustering, calling, and scoring genotypes.
        Illumina. Pub. No. 370-2004-009. Illumina Inc., San Diego, CA, 2005
        • Kuehn L.A.
        • Keele J.W.
        • Bennett G.L.
        • McDaneld T.G.
        • Smith T.P.L.
        • Snelling W.M.
        • Sonstegard T.S.
        • Thallman R.M.
        Predicting breed composition using breed frequencies of 50,000 markers from the US meat animal research center 2,000 bull project.
        J. Anim. Sci. 2011; 89: 1742-1750
        • Legarra A.
        • Aguilar I.
        • Misztal I.
        A relationship matrix including full pedigree and genomic information.
        J. Dairy Sci. 2009; 92: 4656-4663
        • Lidauer M.
        • Mäntysaari E.A.
        • Strandén I.
        • Pösö J.
        • Pedersen J.
        • Nielsen U.S.
        • Johansson K.
        • Eriksson J.-Å.
        • Madsen P.
        • Aamand G.P.
        Random heterosis and recombination loss effects in a multibreed evaluation for Nordic Red dairy cattle.
        in: Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil, 2006
        • Lidauer M.
        • Strandén I.
        Fast and flexible program for genetic evaluation in dairy cattle.
        Interbull Bull. 1999; 20: 20
        • Makgahlela M.L.
        • Mäntysaari E.A.
        • Strandén I.
        • Koivula M.
        • Nielsen U.S.
        • Sillanpää M.J.
        • Juga J.
        Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
        J. Anim. Breed. Genet. 2013; 130: 10-19
        • Malécot G.
        Les Mathématiques de L’hérédité.
        Masson et Cie, Paris, France, 1948
        • Mäntysaari E.A.
        • Koivula M.
        • Strandén I.
        • Pösö J.
        • Aamand G.P.
        Estimation of GEBVs using deregressed individual cow breeding values.
        Interbull Bull. 2011; 44: 26-29
        • Mäntysaari E.A.
        • Liu Z.
        • VanRaden P.
        Interbull validation test for genomic evaluations.
        Interbull Bull. 2010; 41: 17-22
        • Misztal I.
        • Legarra A.
        • Aguilar I.
        Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information.
        J. Dairy Sci. 2009; 92: 4648-4655
        • Mrode R.A.
        • Swanson G.J.T.
        Calculating cow and daughter yield deviations and partitioning of genetic evaluations under a random regression model.
        Livest. Prod. Sci. 2004; 86: 253-260
        • Olson K.M.
        • VanRaden P.M.
        • Tooker M.E.
        Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
        J. Dairy Sci. 2012; 95: 5378-5383
        • Powell J.E.
        • Visscher P.M.
        • Goddard M.E.
        Reconciling the analysis of IBD and IBS in complex trait studies.
        Nat. Rev. Genet. 2010; 11: 800-805
        • Pryce J.E.
        • Gredler B.
        • Bolormaa S.
        • Bowman P.J.
        • Egger-Danner C.
        • Fuerst C.
        • Emmerling R.
        • Sölkner J.
        • Goddard M.E.
        • Hayes B.J.
        Short communication: Genomic selection using a multi-breed, across-country reference population.
        J. Dairy Sci. 2011; 94: 2625-2630
        • Quaas R.L.
        Additive genetic model with groups and relationships.
        J. Dairy Sci. 1988; 71: 1338-1345
        • Scheet P.
        • Stephens M.
        A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase.
        Am. J. Hum. Genet. 2006; 78: 629-644
        • Simeone R.
        • Misztal I.
        • Aguilar I.
        • Legarra A.
        Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
        J. Anim. Breed. Genet. 2011; 128: 386-393
        • Strandén I.
        • Christensen O.F.
        Allele coding in genomic evaluation.
        Genet. Sel. Evol. 2011; 43: 25
        • Strandén I.
        • Vuori K.
        RelaX2: Pedigree analysis programme.
        in: Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil, 2006
        • Su G.
        • Madsen P.
        • Nielsen U.S.
        • Mäntysaari E.A.
        • Aamand G.P.
        • Christensen O.F.
        • Lund M.S.
        Genomic prediction for Nordic Red cattle using one-step and selection index blending.
        J. Dairy Sci. 2012; 95: 909-917
        • Toro M.A.
        • Meuwissen T.H.E.
        • Fernández J.
        • Shaat I.
        • Mäki-Tanila A.
        Assessing the genetic diversity in small farm animal populations.
        Animal. 2011; 5: 1669-1683
        • VanRaden P.M.
        Efficient methods to compute genomic predictions.
        J. Dairy Sci. 2008; 91: 4414-4423
        • VanRaden P.M.
        • Olson K.M.
        • Wiggans G.R.
        • Cole J.B.
        • Tooker M.E.
        Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss.
        J. Dairy Sci. 2011; 94: 5673-5682
        • Yang J.
        • Benyamin B.
        • McEvoy B.P.
        • Gordon S.
        • Henders A.K.
        • Nyholt D.R.
        • Madden P.A.
        • Heath A.C.
        • Martin N.G.
        • Montgomery G.W.
        • Goddard M.E.
        • Visscher P.M.
        Common SNPs explain a large proportion of the heritability for human height.
        Nat. Genet. 2010; 42: 565-569
        • Zimin A.V.
        • Delcher A.L.
        • Florea L.
        • Kelley D.R.
        • Schatz M.C.
        • Puiu D.
        • Hanrahan F.
        • Pertea G.
        • van Tassell C.P.
        • Sonstegard T.S.
        • Marçais G.
        • Roberts M.
        • Subramanian P.
        • Yorke J.A.
        • Salzberg S.L.
        A whole-genome assembly of the domestic cow, Bos taurus.
        Genome Biol. 2009; 10: R42