Department of Agricultural Sciences, PO Box 27, FIN-00014, University of Helsinki, Helsinki, Finland
MTT Agrifood Research Finland, Biotechnology and Food Research, Biometrical Genetics, FIN-31600 Jokioinen, Finland
Different approaches of calculating genomic measures of relationship were explored and compared with pedigree relationships (A) within and across base breeds in a crossbreed population, using genotypes for 38,194 loci of 4,106 Nordic Red dairy cattle. Four genomic relationship matrices (G) were calculated using either observed allele frequencies (AF) across breeds or within-breed AF. The G matrices were compared separately when the AF were estimated in the observed and in the base population. Breedwise AF in the current and base population were estimated using linear regression models of individual genotypes on breed composition. Different G matrices were further used to predict direct estimated genomic values using a genomic BLUP model. Higher variability existed in the diagonal elements of G across breeds (standard deviation = 0.06, on average) compared with A (0.01). The use of simple observed AF across base breeds to compute G increased coefficients for individuals in distantly related populations. Estimated breedwise AF reduced differences in coefficients similarly within and across populations. The variability of the current adjusted G matrix decreased from 0.055 to 0.035 when breedwise AF were estimated from the base breed population. The direct estimated genomic values and their validation reliabilities were, however, unaffected by AF used to compute G when estimated with a genomic BLUP model, due to inclusion of breed means in the model. In multibreed populations, G adjusted with breedwise AF from the founder population may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.
The use of marker genotypes to estimate relationships among individuals in a population has become increasingly important in many fields of genetics. In livestock breeding, knowledge of relationships is used routinely to estimate genetic variation and animal breeding values (EBV;
). Traditionally, relationship coefficients are calculated from the pedigree data. Pedigree relationships are obtained as 2 times the expected average identity by descent (IBD) sharing between 2 relatives (
) and have been applied successfully within the framework of mixed-model equations for best linear unbiased prediction of EBV. Presently, with the increasing availability of genetic markers covering the whole genome, pedigree-based relationships can be replaced or combined with realized relationships calculated from marker data in the prediction of genomic breeding values (genomic EBV;
Realized relationships derived from molecular markers are based on the actual IBD sharing or identity by state for genomic regions and, therefore, have more variation between closely related animals than pedigree relationships (
). Moreover, realized relationships capture unrecorded pedigrees. Several different methods of calculating genomic relationship matrices (G) have been developed, for genotyped animals only (
). In the latter approach, an arbitrary weight on pedigree relationships (A) is often used to measure the amount of variation not explained by markers. Although variability exists in the accuracy of predictions among the above methods (
). Each genotype is a deviation from the marker-specific population mean, which is calculated using population allele frequencies. The estimation of G has been shown to be closer to A when inferences use allele frequencies (AF) in the distant ancestral population (
) instead of AF in the currently genotyped population. This is because the expected G coefficients would be expressed relative to the same base population as A. The limitation is that base population AF are generally not available with field data and their estimation can be challenging.
showed an efficient way of calculating gene content and base population AF within a breed. However, in practice, the currently genotyped population is assumed to be the base population. This results from centering the genotype matrix used to build G with current-data AF, so that the average genomic relationship between animals within the current population becomes 0, and from scaling G such that the additive genetic variance is comparable to that obtained through conventional methods (
The use of observed AF within a breed may not have major practical implications in genomic BLUP (GBLUP) models. In the context of structured populations, the effect of using across-breed AF to make G may have consequences for the estimation of relationships, mainly attributable to varying source of AF between breeds.
used the average of 3 breedwise AF for estimation in the combined 3-breed population. Although these approaches would be beneficial for multiple populations with distinctive subpopulations, a need still exists for approaches in populations that constitute mainly crossbred animals. The Nordic Red dairy cattle (RDC) comprise 3 subpopulations by country of birth [i.e., Denmark (DNK), Sweden (SWE), and Finland (FIN)]. Over the years of crossbreeding, the majority of animals (~98%) in the Nordic RDC are composites of base breeds. The absence of pure base-breed animals remains a major challenge for the estimation of breedwise AF. The objective of this study was to investigate whether the use of estimated breedwise AF in the calculation of genomic relationships would provide a more accurate estimate of G than using AF across breeds, and to determine the effect on G when AF are estimated in the base population versus the currently genotyped population.
Materials and Methods
Data
This study was carried out in a structured population with 60 pure base breed bulls and 4,046 bulls of combinations of base breeds in the Nordic RDC. Genotypes for all 4,106 bulls were attained using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). For quality purposes, markers from the X chromosome, without map position in the UMD3.0 genome assembly (
) <60% and marker loci with call rates <5% in a large reference sample from the same genotyping laboratory, consisting of Danish Holstein bulls, were discarded. Finally, missing genotypes were imputed using fastPHASE software chromosome by chromosome (
). Due to unavailability of pure base breeds, informative SNP above were selected based on across-breed AF. After quality control, a total of 38,194 SNP markers were available for analyses. The entire RDC pedigree, containing over 4 million records, was used to calculate breed proportions (BP) for individual bulls (
). A breed was defined only if the average BP in the data was greater than 10%. Breeds used in this study were the Swedish Red (SRB), Finnish Ayrshire (FAY), and Norwegian Red (NRF), and the remaining breeds with BP less than 10% were combined into the breed “other.” A more detailed description about the population structure, breeds contained, and definitions of the final 4 breeds and their trends is provided by
. The pedigree for genotyped bulls contained 22,300 animals.
Phenotypes were individual daughter deviations (IDD) of 1,995,606 RDC cows for milk, protein, and fat yields, obtained from March 2010 official evaluations of the Nordic cattle genetic evaluations. By definition, the IDD are cow performances adjusted for fixed effects, nongenetic random effects, and genetic effects of the cow's dam (
). For validation of predictions using different G matrices, the data were split into sets of 3,300 training bulls born between 1980 and 1999, and 806 validation bulls born between 1998 and 2005. The training data had older bulls, which were evaluated for the first time during the 2005 Nordic cattle routine evaluation.
Estimation of Pedigree and Genomic Relationships
Pedigree relationships of genotyped bulls were estimated using the RelaX2 computer program (
and modifications of the 2 methods to adapt to the admixed structure of the current population.
Let there be n animals that have been genotyped for m markers. Let uij be genotype j of animal i, where genotype uij is the number of copies for the second allele (i.e., uij has value 0, 1, or 2). Following
method 1 and using observed AF across breeds, the original genomic relationship matrix G (denoted Gorg) was computed as Gorg = ZZ′/k, where Z is an n by m matrix of centered genotypes. Here, the element of animal i for marker j in Z is uij − 2pj, where pj is the frequency of the second allele at SNP marker j and $k = 2\sum_{j=1}^{m} p_j(1 - p_j)$. In method 2, also shown by
. Matrix Gadj was calculated as Gadj = MM′/k, where, with the same notation as in Z, the elements of M are uij − 2pij and k is assumed to be the same as above. Here, pij is the allele frequency for the jth SNP marker expected for genotype Mij, taking into account the breed background of animal i. For the matrix Gadj2, the columns of M were further scaled by the standard deviation of the expected marker effects to obtain the M* matrix. For animal i, the genotype element j of M* was

$$M^*_{ij} = \frac{u_{ij} - 2p_{ij}}{\sqrt{2p_{ij}(1 - p_{ij})}}.$$

To improve numerical stability and avoid division by 0, estimated breedwise AF below 0.05 or above 0.95 were set to 0.05 or 0.95, respectively. This threshold corresponded to our prior removal of SNP with MAF <5%, which was based on across-breed AF. Finally, the relationship matrix was obtained as Gadj2 = M*M*′/m. Individual AF pij was calculated using either the currently genotyped animals or the base population as the reference population.
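The construction of the three matrices can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in the study: the function names, the choice of `p_scale` for the common factor k, and the use of the bounded AF for both centering and scaling in M* are assumptions made for the example. The scaling k = 2Σpj(1 − pj) is the usual one for VanRaden's method 1.

```python
import numpy as np

def g_across_breed(U, p):
    """Gorg-style matrix: genotypes centered with one AF per marker.

    U : (n, m) array of genotypes coded 0, 1, 2
    p : (m,) array of across-breed allele frequencies
    """
    Z = U - 2.0 * p                        # element u_ij - 2 p_j
    k = 2.0 * np.sum(p * (1.0 - p))        # assumed scaling (method 1)
    return Z @ Z.T / k

def g_breedwise(U, P, p_scale, lo=0.05, hi=0.95):
    """Gadj- and Gadj2-style matrices from individual-specific AF.

    P       : (n, m) array of expected AF p_ij given each animal's breed mix
    p_scale : (m,) AF used for the common factor k (assumption: same k as Gorg)
    """
    M = U - 2.0 * P                        # element u_ij - 2 p_ij
    k = 2.0 * np.sum(p_scale * (1.0 - p_scale))
    g_adj = M @ M.T / k

    # Assumption: bounded AF are used for both centering and scaling in M*.
    Pc = np.clip(P, lo, hi)                # avoids division by 0
    M_star = (U - 2.0 * Pc) / np.sqrt(2.0 * Pc * (1.0 - Pc))
    g_adj2 = M_star @ M_star.T / U.shape[1]
    return g_adj, g_adj2

# Toy data: 6 animals, 20 markers (hypothetical, for illustration only).
rng = np.random.default_rng(0)
U_demo = rng.integers(0, 3, size=(6, 20)).astype(float)
p_demo = U_demo.mean(axis=0) / 2.0
G_org = g_across_breed(U_demo, p_demo)
```

When every animal is given the same AF (all rows of `P` equal to `p`), `g_adj` reduces to the across-breed matrix, which is a convenient sanity check.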
Current genotyped population-level breedwise AF were computed using linear multiple regression and binomial models. For each marker, a vector of regression coefficients (β) of genotypes (y) on breed proportions (X) was solved:

$$y = X\beta + e,$$

where e is the independently and normally distributed residual error.
Alternatively, genotype yi for individual i was considered an observation from binomial (pij, 2). The binomial likelihood was handled with a logistic regression for the parameters pij, defined as follows:

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = x_i'\beta,$$

where $x_i'$ is the row of X holding the breed proportions of individual i. The expected AF of the marker for each individual from the linear and logistic models become $\hat{p}_{ij} = x_i'\hat{\beta}/2$ and $\hat{p}_{ij} = \exp(x_i'\hat{\beta})/[1 + \exp(x_i'\hat{\beta})]$, respectively, where $\hat{\beta}$ contains the breed effects for SRB, FAY, NRF, and the combined breed “other.”
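The linear-regression step can be illustrated as follows. This is a sketch under stated assumptions: genotypes are coded 0, 1, 2, breed proportions in each row of `X` sum to 1 (so no separate intercept is fitted), and the breed effects on the genotype scale are halved to give allele frequencies. The function name is hypothetical.

```python
import numpy as np

def breedwise_af_linear(U, X):
    """Least-squares breed effects per marker and expected individual AF.

    U : (n, m) genotypes coded 0, 1, 2
    X : (n, b) breed proportions, rows summing to 1
    Returns (breed_af, P): breed_af is (b, m), P is (n, m).
    """
    # One regression per marker, solved jointly: beta has shape (b, m).
    beta, *_ = np.linalg.lstsq(X, U, rcond=None)
    breed_af = beta / 2.0      # genotype scale (0..2) to frequency scale (0..1)
    P = X @ breed_af           # expected AF p_ij for each animal and marker
    return breed_af, P
```

Nothing constrains the least-squares estimates to lie between 0 and 1, which is why a linear model can return breedwise AF outside the parameter space.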
The AF across breeds and breedwise AF were estimated in the base (founding) population using the gene content approximation algorithm of
. The setup in this algorithm follows the logic of genetic covariance among relatives, where the covariance between gene contents, which is the number of copies of one allele in a genotype, is proportional to the additive relationship between animals. Large pedigrees are used to compute A and linear mixed-model equations are used to account for selection and drift in AF across time, thus occurring during pedigree generations (
). As was done for the simple linear model above, the expected SNP genotype for ungenotyped base population animals was estimated from their genotyped relatives for every marker using the following model:

$$y = X\gamma + Qg + e,$$

where y is the vector of observed genotypes, X contains the breed proportions (or a column of ones when only an intercept is fitted), γ is the vector of breed effects or intercept, Q is a design matrix allocating records to animal effects g, which is the estimated gene content for all animals including ancestors, and e is the residual. The regression effect solutions for γ were used to calculate individual AF values pij in the same way as with the current genotyped population AF.
All 4 G matrices above (i.e., Gorg, Gorg2, Gadj, and Gadj2) were computed using the corresponding AF estimated in the currently genotyped population. Additionally, the Gorg, Gadj, and Gadj2 relationship matrices were recalculated using the corresponding AF estimated from the base population. For comparison in the calculation of within-breed AF, genotypes of the oldest 842 bulls in the data, born between 1971 and 1990, were also used to estimate base population breedwise AF. Here, we used a simple multiple regression of genotypes on breed proportions, as explained for the currently genotyped population.
Effect of Alternative G Estimates on Genomic Predictions
The estimation of variance components and the prediction of direct estimated genomic values (DGV) were carried out separately for each of the 3 matrices (i.e., Gorg, Gadj, and Gadj2) and separately when they were calculated using either observed or base population AF. The analyses were conducted using ASReml 3.0 (
). The model was

$$\mathrm{IDD} = Xb + Sa + e,$$

where IDD is a vector of IDD for daughters of bulls in the reference data set, b is a vector of breed effects or intercept with design matrix X, S is the design matrix that relates observations to DGV for sires of daughters a, and e is a vector of random normal deviates. It is assumed that $e \sim N(0, R\sigma^2_e)$, where the diagonal element in R is rii = 1/wi, and wi is the number of effective record contributions of the cow and is a weighting factor for the ith IDD. It is assumed that $a \sim N(0, G\sigma^2_a)$, where G is the marker-based relationship matrix and $\sigma^2_a$ is the additive genetic variance. Here, the fixed breed regression effects were fitted only for Gadj and Gadj2. The predicted values for all animals in this case were obtained as the sum of the animals’ DGV and fixed breed regression solutions.
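For readers who want to see how such a model is solved, the following is a minimal mixed-model-equation sketch with a known variance ratio. It is not the ASReml analysis of the study: variance components are not estimated here, the names `gblup`, `lam` (the ratio of residual to additive genetic variance), and `ridge` are assumptions, and the small ridge added to G is only a device to make a centered G invertible.

```python
import numpy as np

def gblup(y, X, S, G, w, lam, ridge=1e-3):
    """Solve weighted mixed-model equations for fixed effects b and DGV a.

    y   : (N,) observations (e.g., IDD)
    X   : (N, nb) design matrix for breed effects or intercept
    S   : (N, n) design matrix linking observations to sires
    G   : (n, n) genomic relationship matrix
    w   : (N,) weights; R has diagonal 1/w_i, so R^-1 has diagonal w_i
    lam : assumed known ratio sigma2_e / sigma2_a
    """
    XtR = X.T * w                          # X' R^-1
    StR = S.T * w                          # S' R^-1
    Ginv = np.linalg.inv(G + ridge * np.eye(G.shape[0]))
    lhs = np.block([[XtR @ X, XtR @ S],
                    [StR @ X, StR @ S + lam * Ginv]])
    rhs = np.concatenate([XtR @ y, StR @ y])
    sol = np.linalg.solve(lhs, rhs)
    nb = X.shape[1]
    return sol[:nb], sol[nb:]              # (b_hat, a_hat)
```

When breed regressions are fitted, a bull's predicted value would be the sum of its element of `a_hat` and its breed proportions multiplied by `b_hat`.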
Validation Analyses
The validation reliability of DGV was assessed following the Interbull genomic EBV validation test (
). A weighted regression model of deregressed breeding value (DRP) for bulls in the validation data on predicted DGV was fitted to obtain the regression coefficient b1:

$$\mathrm{DRP} = b_0 + b_1\hat{a} + e,$$

where DRP is the vector of DRP for the candidate bulls, b0 is the intercept, â is the vector of estimated DGV for these bulls, and e is the residual. The linear model was weighted by individual wk, defined as the effective daughter contribution of the bull. The validation reliability of the model, R2DGV, was calculated from the squared correlation between DRP and DGV and the average of wk, which accounts for the inaccuracy in the estimation of DRP.
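The weighted regression itself is straightforward to reproduce. The sketch below fits an intercept and slope by weighted least squares and returns the weighted coefficient of determination; the function name is hypothetical, and the final adjustment for the inaccuracy of DRP is deliberately left out because its exact form is not recoverable from the text here.

```python
import numpy as np

def weighted_validation(drp, dgv, w):
    """Weighted regression of DRP on DGV.

    drp : (n,) deregressed values of validation bulls
    dgv : (n,) predicted direct genomic values
    w   : (n,) weights (effective daughter contributions)
    Returns (b0, b1, r2), with r2 the weighted R-squared of the fit.
    """
    sw = np.sqrt(w)
    A = np.column_stack([np.ones_like(dgv), dgv])
    # Scaling rows by sqrt(w) turns ordinary into weighted least squares.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], drp * sw, rcond=None)
    b0, b1 = coef
    fitted = b0 + b1 * dgv
    mean_w = np.average(drp, weights=w)
    ss_res = np.sum(w * (drp - fitted) ** 2)
    ss_tot = np.sum(w * (drp - mean_w) ** 2)
    return b0, b1, 1.0 - ss_res / ss_tot
```

A slope b1 below 1 indicates that the spread of the DGV overstates the spread of the later DRP.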
Results
Estimated Breedwise AF
Breedwise AF were calculated for the defined breeds SRB, FAY, NRF, and the breed “other,” which combined small breeds. Only 60 bulls were pure base breed, with 59 having 100% BP for FAY. Few individuals had a BP of at least 50% for SRB (647) and NRF (40) in the data. In the genotyped population, the linear regression and binomial models gave equivalent estimates of AF for the breeds, with correlations between models over all SNP close to 1. Whereas the linear model resulted in AF outside the expected range of 0 to 1 for a few markers, the binomial model constrained the estimates to fall within this range. The distributions of breedwise AF were also generally similar under the linear and binomial models, except that NRF had the most markers with estimated AF outside the expected range under the linear model. A binomial model was, however, challenging to implement using Gengler's method due to software limitations. Because the estimated AF were similar, for consistency we present results from the linear regression model. Table 1 shows the numbers of markers for each breed with AF estimated to be either less than 0 or greater than 1 when using the linear model. The fewest markers outside the parameter space were found when AF were estimated in the base population, with the range of 313 below 0 and 1,972 above 1. The most markers with AF outside the parameter space were found when the 842 old bulls were used in the AF estimation: 2,460 AF were below 0 and 8,387 were greater than 1. The breeds NRF and SRB appeared to have the most markers with estimated AF outside the parameter space, which could partly be due to having no 100% NRF animals and only 1 pure base breed SRB animal in the genotyped population.
Table 1. The numbers of markers with estimated allele frequencies that fell outside the expected range of 0 and 1 for each breed when estimated with breed proportions as genetic group in the gene content model (base population) or in the linear regression model (current population).
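Counts of the kind summarized in Table 1 can be obtained directly from a matrix of estimated breedwise AF. The sketch assumes a hypothetical array `breed_af` with one row per breed and one column per marker, as would come out of a linear regression without bounds on the estimates.

```python
import numpy as np

def out_of_range_counts(breed_af):
    """Count, per breed, the markers with estimated AF below 0 or above 1.

    breed_af : (b, m) array of estimated breedwise allele frequencies
    Returns (below, above), each of length b.
    """
    below = np.sum(breed_af < 0.0, axis=1)
    above = np.sum(breed_af > 1.0, axis=1)
    return below, above
```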
The between-breed correlations of breedwise AF estimated in the base population, currently genotyped population, and the oldest genotyped bulls are shown in Table 2. Generally, breedwise AF had the highest correlations when AF were estimated in the base population (ranging from 0.678 to 0.817) and least correlated when estimated using genotypes from older bulls (0.421 to 0.651). The highest correlations in the base population AF estimates were between SRB and NRF (0.817), and lowest correlations were between NRF and the breed “other” (0.678). However, observed AF estimated using all genotypes and old genotyped bulls were highly correlated between SRB and FAY at 0.643 and 0.630, respectively, and less correlated between SRB and NRF (0.545 and 0.421, respectively).
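The between-breed correlations are ordinary Pearson correlations taken over markers. A minimal sketch, again assuming a hypothetical `breed_af` array with breeds in rows and markers in columns:

```python
import numpy as np

def af_correlations(breed_af):
    """Pearson correlations between breeds, computed across markers.

    breed_af : (b, m) array of breedwise allele frequencies
    Returns a (b, b) symmetric matrix with ones on the diagonal.
    """
    return np.corrcoef(breed_af)
```

Entry (r, s) of the result is the correlation between the AF of breed r and breed s, which is the quantity reported for each pair of breeds.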
Table 2. The correlations between breedwise allele frequencies estimated in the base population (
Estimated Relationship Coefficients from Pedigree and Genomic Data
Descriptive statistics of diagonal elements from the pedigree (Aii) and different genomic estimators (Gii) are presented in Table 3, across populations and within bulls registered in DNK, SWE, and FIN. The number of bulls was 800, 1,240, and 2,040 in the DNK, SWE, and FIN populations, respectively. Results are presented by country because these 3 populations are generally considered a single population; however, the genetic relationship between the SWE and FIN populations is stronger than that between either of these and the DNK population. The ranges of maximum elements from A within (1.081 to 1.135) and across populations (1.135) were smaller than those from G estimators within (1.233 to 1.450) and across populations (1.310 to 1.450). This difference in scales arises because A is not an absolute measurement but an expected relatedness given the pedigree, whereas G measures the actual relatedness at marker loci. The variability of G across populations was greater in the original approaches (i.e., Gorg and Gorg2) and smaller for the adjusted matrices, especially for Gadj2. Similar tendencies were found within the DNK animals but not for SWE and FIN bulls. Coefficients from Gorg and Gorg2 were generally similar, but we present estimates from both methods 1 and 2 for comparison with their adjusted alternatives proposed in the current study. The average Aii was greatest in the FIN bulls (1.016) and smallest in the DNK bulls (1.007). However, this order was reversed for Gii from Gorg using observed AF. The means of diagonals from A and Gorg were close to 1 for across breeds in SWE and FIN, but the mean from Gorg was 1.136 for DNK.
Table 3. Descriptive statistics of diagonal elements from the pedigree (A), original, and adjusted genomic relationship matrices (G) estimated using allele frequencies in the genotyped population
Table 4 shows summaries of diagonal elements from the G matrices calculated using estimated base population AF. Variability in the diagonals of G matrices calculated using base population AF was lower than when using the currently genotyped population AF. The variability in terms of standard deviation of diagonal elements was 22 and 35% less in the adjusted matrices Gadj and Gadj2, respectively, for across breeds, and in the range between 17 and 37% for within breeds. The averages of diagonal elements were less than 1 in all populations for the proposed adjusted matrices, irrespective of AF used to calculate G. For Gorg, a larger mean may have resulted from having, in general, higher diagonal elements. In all cases, the tendencies observed for diagonal elements were also clear for pairwise relationships (results not shown).
Table 4. Descriptive statistics of diagonal elements of the original and adjusted genomic relationship matrices (G) estimated using allele frequencies in the base population, across populations, and within bulls born in Denmark, Sweden, and Finland.
Figure 1 shows the distributions of Gii from different G matrices calculated using observed AF in the combined population. The distributions were examined to ensure consistency with the statistics presented above, as measures such as minimum and maximum tend to be sensitive to extreme values. The shape of the density plot generally followed a normal distribution as suggested with a theoretical example by
Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
. However, tails for all distributions were slightly longer to the right of the distributions, which could be partly associated with the distribution of Aii. The distributions from adjusted matrices (i.e., Gadj and Gadj2) had only 1 peak, whereas that of Gorg appeared to be bimodal. This agrees with the suggestion that multiple peaks would not occur in the distribution of Gii within a population, but may be expected in multiple populations if G is scaled with AF across breeds (
Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.
). The density plots for diagonal elements from G were also examined within populations (plots not shown). The distributions for all Gii were generally distributed normally in SWE and FIN bulls; however, Gorg had a bimodal distribution in the DNK bulls.
Figure 1. The distributions of diagonal elements from genomic matrices (G) calculated using estimated allele frequencies (AF) in the currently genotyped population. Gorg = original method 1 (original G) of
Figure 2 shows the distributions of Gii from different G matrices calculated using estimated base population AF in the combined population. As for observed AF, all plots were approximately normally distributed, with slight tails to the right. The bimodal density plot of Gorg diagonals was slightly smoothed when AF were estimated from the base population. Plots within populations were also normally distributed in the SWE and FIN bulls, whereas Gorg appeared bimodal for the DNK animals. The variability of Gii, especially for adjusted matrices, was much less when AF were estimated from the base population (Figure 2) than from the current population (Figure 1). In this admixed population, the correlations between Aii and Gii by any of the methods were always close to zero when using population-level AF but increased to 0.16, 0.28, and 0.38 for Gorg, Gadj, and Gadj2, respectively, when AF were estimated from the base population. The correlations within populations were also higher and ranged from 0.26 for Gadj to 0.53 for Gorg when AF were estimated from the base populations.
Figure 2. The distributions of diagonal elements from genomic matrices (G) calculated using estimated allele frequencies (AF) in the base population. Gorg = original method 1 (original G) of
Table 5 shows the regression coefficients and validation reliabilities for milk, protein, and fat obtained using alternative G matrices. The DGV were from G matrices calculated using AF in the current and the base population only because AF estimated from genotypes of old individuals appeared to be less usable. The regression coefficients and validation reliabilities were similar for all matrices irrespective of whether breedwise or across-breed AF were used and whether AF were estimated in the current or base population. Thus, predictions of genomic values converged to similar solutions regardless of AF used to compute G. For all matrices, regression coefficients were less than the expected value of 1 for milk (0.71), protein (0.75), and fat (0.81). The validation reliabilities were low for milk (0.33) and protein (0.33) and slightly higher for fat (0.43).
Table 5. Regression coefficients (b1) and validation reliabilities (R2DGV) of direct estimated genomic values from genomic relationship matrices (G) calculated using currently genotyped and base population allele frequencies (AF)
Gorg = original method 1 (original G) of VanRaden (2008) calculated using AF across breeds; Gadj and Gadj2 = new adjusted methods (adjusted G) 1 and 2 of VanRaden (2008), respectively, computed using breedwise AF.
Discussion
An important step when defining the model relates to the genetic covariance between relatives, which reflects shared genes that arise through common ancestry. High-density panels of SNP markers have recently been used to estimate genomic relationships in addition to the traditional pedigree-based relationships. Several methods for the estimation of G within a breed have been proposed in the literature (
). A general agreement across studies about genomic selection is that there is a gain in prediction accuracies for young unproven bulls when genomic information is incorporated compared with traditional evaluations with pedigree information only, due to improved prediction of Mendelian sampling deviations between close relatives in G. Methods for calculating G are straightforward in a single-breed population; however, similar approaches tend to result in distorted coefficients in multibreed populations. Thus, relationships should account for the different expectations for mean and variance, depending on breed composition of the individuals in multibreeds (
). The objectives of this study were to examine the prospects of accounting for breed composition in the calculation of G and assess coefficients of different G matrices compared with the pedigree-based relationship matrix within and across breeds in a multibreed population.
Breedwise AF
Allele frequencies play a crucial role in the calculation of G and, hence, erroneous estimation of AF may result in biased G coefficients. Our rationale behind the estimation of breedwise AF was that individuals from breeds that developed independently would likely have different AF. We found that the approach proposed by
for estimating gene content of ungenotyped individuals given the pedigree was useful in the estimation of breedwise AF in the base population. The smaller number of markers with AF outside the expected range of 0 to 1 when AF were estimated in the base population indicates that the pedigree was better able to differentiate base breeds compared with AF estimated in the currently genotyped animals. Estimates of AF outside the expected range result from using a simplified model without restrictions on the parameter space. Restricting the parameter space to fall between 0 and 1 using a binomial model resulted in coefficients close to these values. The AF out of bounds did not create a great problem in the current population, as no pure base breed animals were included and, therefore, an individual's expectation of AF was generally correct. However, if purebreds are included in the data, their expected AF may be imprecise; it may be useful in this case to use a binomial model. An alternative approach to Gengler's fixed breed effects would be to include unknown parent groups for each base breed in the matrix A−1 (
). This model would yield genetic group effects equivalent to our AF within breed in the base population.
The correlations between estimated breedwise AF were also higher using the gene content approach. When considering only the minor AF, correlations dropped but were still higher in the base population. The observed high correlations between breedwise AF estimated from the base population compared with low correlations between these frequencies in the currently genotyped population have been reported in
. The authors pointed out that this indicates that recent drift within populations was removed during the estimation process and also revealed that breeds had been more similar over 10 generations in the past than they are at present, as expected from genetic drift of frequencies across pedigree generations. This progressive differentiation in AF between breeds, however, may not have been the case for the current population, as base breeds were combined and subjected to the same breeding goal for over 2 decades, which is expected to make them similar genetically. The higher correlations found between SRB, FAY, and the breed “other” were unusual. However, many animals with SRB and FAY fractions also have a breed proportion for Canadian Ayrshires that is now in the breed “other.” Thus, the estimation of AF might have detected relations to such animals. The expected pedigree-based breed proportions were used in this study to define the 4 breeds. Alternatively, accurate prediction of breed composition based on SNP genotype data has been demonstrated for multibreed populations (
). However, these algorithms initially estimate breed-specific frequencies using purebred individuals, which was a limitation in this population due to unavailability of such individuals.
Properties of A and G Within and Across Populations
The diagonal elements of all G matrices that used observed AF were not comparable with the diagonal elements of A. Moreover, the variability in A was much smaller than that observed in G matrices. The comparison of A and G computed using observed AF is generally imprecise, as the coefficients in A are expressed relative to the base population in the distant past and the additive genetic variance is defined for that generation. In contrast, when the base population for G is set by scaling IBD coefficients with observed AF, the additive genetic variance among animals reflects the average variation in the current genotyped animals (
). The estimate of additive genetic variance may be smaller in the current population than it was in the distant past because we expect the current population to be more inbred (
). The highest variability in diagonal elements from Gorg and Gorg2 indicates that the use of across-breed AF increased coefficients in G for this admixed population. Variability was reduced by using breedwise AF in Gadj. However, in Gadj and Gorg, the overall scaling was based on the same marker variance across breeds, which was larger than the expected variance within breeds. Consequently, the variance of the diagonal elements appeared much smaller in Gadj than in Gorg. The simplified scaling factor was corrected in Gadj2, where elements were scaled by the mean marker variance of an individual's base breeds. The variance from the resulting Gadj2 was generally still smaller than observed from the original approaches across breeds in Gorg and Gorg2.
Diagonal elements of G built using any of the approaches were more accurately estimated when G used AF estimated from the base population. The variability of diagonal elements, particularly for Gadj2, was reduced to a greater extent, which suggests that breedwise AF in this case were less biased and highlights a great need to account for the pedigree structure in G for multibreed populations. Thus, with pedigree information, the base (founder) population is generally consistent. The calculated diagonals of G in this case were moderately correlated with A across breeds, in agreement with the results reported by
in the US Holstein population. However, the moderate correlation of 0.38 between diagonals of A and Gadj2 calculated with base population AF in our study was much lower than correlation of 0.68 reported with simulated data (
). Assuming AF equal to 0.5 for all markers equalizes the relative contribution of markers to G instead of having rare alleles contributing more than common alleles (
). In addition to AF, one reason why these correlations differed across studies might be the different population structures. The advantage of base breedwise AF in Gadj2 could mean that the distant past founder population corresponds well with the current expected homozygosity. Moreover, our results suggest that calculation of G with respect to an individual's base breed corrected the heterogeneity better than simple animal deviations from across-population mean AF.
When G coefficients were assessed within populations, we found that using across-breed AF tended to increase Gii for animals from populations that had fewer animals or that were distantly related to dominating breeds in the combined population.
indicated that with an equal number of animals contributing to the AF across populations, there would be fewer differences in scaling G between populations. The mean AF across breeds was strongly influenced by the Swedish and Finnish populations, as these breeds are more related genetically (
) and had more animals in the combined data. As a result, and in contrast to Aii, the general level of homozygosity in the Danish population appeared to be higher than in other populations. This is unexpected because the Danish population has been found to be more admixed than the other 2, due to years of crossbreeding (
). This means that as diagonal elements from Gorg have been increased in the Danish bulls, they were decreased for animals in the other populations. In addition, individuals with the highest diagonal elements in Gorg were found to be registered elsewhere but not in the Nordic RDC. Because these animals come from populations with AF deviating even further from the population mean AF, their genotypes make them appear more homozygous than the average homozygosity in this population. Apart from a great reduction in the variability of diagonal elements between AF estimated from the current and base population, the behaviors of different estimators of G were similar within and across populations.
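The inflation of diagonal elements for small or distantly related populations can be reproduced in a toy simulation. The sketch below assumes two hypothetical populations of unequal size with independently drawn AF and a VanRaden-type G centred on the observed across-population AF; the population sizes, marker number, and variable names are illustrative assumptions and do not correspond to the Nordic RDC data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_markers = 500
n_major, n_minor = 90, 10  # hypothetical population sizes

# Independent allele frequencies mimic two distantly related populations.
p_major = rng.uniform(0.1, 0.9, n_markers)
p_minor = rng.uniform(0.1, 0.9, n_markers)

# Genotypes coded as allele counts (0/1/2); majority animals come first.
M = np.vstack([
    rng.binomial(2, p_major, size=(n_major, n_markers)),
    rng.binomial(2, p_minor, size=(n_minor, n_markers)),
]).astype(float)

# The observed across-population AF are dominated by the larger population.
p_obs = M.mean(axis=0) / 2.0
Z = M - 2.0 * p_obs
G = Z @ Z.T / (2.0 * np.sum(p_obs * (1.0 - p_obs)))

diag = np.diag(G)
mean_major = diag[:n_major].mean()
mean_minor = diag[n_major:].mean()
print("mean Gii, majority population:", round(float(mean_major), 2))
print("mean Gii, minority population:", round(float(mean_minor), 2))
```

Because the pooled AF sit close to those of the larger population, genotypes of the smaller population deviate more from the centring values, and its animals appear more homozygous than expected even though both populations were simulated without inbreeding.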
indicated that multibreed reference populations will lead to biased coefficients in G if breed is not taken into account. In this study, the use of breedwise AF to calculate Gadj and Gadj2 reduced country differences in coefficients similarly within and across populations. Regarding distributions, the observation that diagonal elements of Gadj and Gadj2 generally followed a normal distribution, but diagonal elements of Gorg appeared bimodal, indicates a distortion in the elements of Gorg and suggests clusters that may be due to the population structure. This observation was in agreement with previous findings (
). In their study, the authors used simulated data on multiple populations and 60,000 SNP markers with varying AF at each locus to compute G using observed AF across populations. They observed a bimodal distribution of the diagonal elements of G. Multiple peaks were correctly avoided by using breed-wise AF in our study.
). The observed correlations between diagonal elements from A and G within breed were higher when estimated base population AF were used instead of current population AF (VanRaden 2008;
). Our use of estimated breedwise AF also greatly improved diagonal elements, indicating that AF have a large effect on relationship coefficients and would also have an effect on the estimation of population additive genetic variation. Using pig data,
explored different methods of scaling G with observed AF. They estimated genetic variances using different G for genotyped animals only; estimates ranged from 2.25 when Gii were normalized to average 1 to an inflated 4.46 when G was scaled with the expectations of AF following a β distribution (
). The estimated additive genetic variances were more sensitive when a selected subset of genotyped animals was used for variance components estimation. Similar additive genetic variances were found when complete data of genotyped and ungenotyped animals were used (
). The observed b1 values (range = 0.71–0.82) were less than the expected value of 1, which indicates bias in the estimated genomic values. The validation reliabilities of DGV from all G matrices were similar. In all cases, solutions indicated that DGV were unaffected by the AF used to calculate G. This observation agrees with previous reports where the gain in accuracy of DGV was small (0.01) when base population AF were used to compute G instead of observed AF in simulated data (
, DGV are sensitive to neither marker allele coding nor AF, and the same DGV solutions would be calculated in GBLUP, provided that the model has a common fixed general mean effect. Thus, the absolute levels of values (i.e., animal effects) are only affected when the mean is unaccounted for in the model. Similarly here, inclusion of fixed-breed regressions for Gadj2 brought breed means back into the DGV. It was clearly shown that different AF affected the calculation of G. Although G was sensitive to AF and was accurately computed using breedwise AF in the base population, the DGV validation reliabilities were unaffected. In multibreed populations, the use of Gadj2 may be more beneficial in single-step evaluations where most animals are evaluated by matrix A and through their relationships to genotyped animals. However, it should be emphasized that using an across-breed MAF of at least 5% to select SNP tends to remove markers within breed that may be informative for improved prediction accuracy. In the presence of purebred animals, it may be useful to select SNP based on the MAF of 1 breed, even if they are monomorphic in all other breeds (
). The unavailability of purebred animals in the data also limited comparisons of our expected breedwise AF with the actual estimates within breed.
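The insensitivity of DGV to the AF used for centering, when the model contains a fixed general mean, can be checked numerically. The sketch below is a simplified illustration under stated assumptions: a single fixed mean rather than breed regressions, a known variance ratio `lam`, simulated genotypes and phenotypes, and the same scaling constant for both G matrices so that only the centering differs. The function `gblup_dgv` and all data are hypothetical.

```python
import numpy as np

def gblup_dgv(G, y, lam):
    """GBLUP with a fixed general mean.

    lam is the ratio of residual to genomic variance (assumed known).
    Working with V = G + lam * I avoids inverting a possibly singular G.
    """
    n = len(y)
    ones = np.ones(n)
    V = G + lam * np.eye(n)
    Vinv_y = np.linalg.solve(V, y)
    Vinv_1 = np.linalg.solve(V, ones)
    mu = (ones @ Vinv_y) / (ones @ Vinv_1)  # GLS estimate of the mean
    g = G @ (Vinv_y - mu * Vinv_1)          # BLUP of genomic values
    return mu, g

# Hypothetical genotypes (0/1/2) and phenotypes.
rng = np.random.default_rng(3)
n, m = 50, 200
p_true = rng.uniform(0.1, 0.9, m)
M = rng.binomial(2, p_true, size=(n, m)).astype(float)
y = M @ rng.normal(0.0, 0.1, m) + rng.normal(0.0, 1.0, n)

# Same scaling constant for both matrices; only the centring differs.
p_obs = M.mean(axis=0) / 2.0
k = 2.0 * np.sum(p_obs * (1.0 - p_obs))
Z_obs = M - 2.0 * p_obs   # centred with observed AF
Z_half = M - 1.0          # centred with AF = 0.5 for all markers
G_obs = Z_obs @ Z_obs.T / k
G_half = Z_half @ Z_half.T / k

mu_obs, g_obs = gblup_dgv(G_obs, y, lam=1.0)
mu_half, g_half = gblup_dgv(G_half, y, lam=1.0)

shift = g_half - g_obs
print("range of DGV differences:", float(shift.max() - shift.min()))
```

Under these assumptions the two sets of DGV differ only by a constant that is absorbed by the estimated mean, so animal rankings and the sums of mean and DGV are identical. When the scaling of G also differs between matrices, the variance ratio would have to be adjusted accordingly for the same equivalence to hold.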
Conclusions
Current methods used for computing genomic relationships in multibreed populations need to be extended to allow for differential AF between breeds. This study showed that errors in the estimation of AF may have great consequences in the calculation of relationships. Across-breed observed AF increased diagonal elements of G for animals from breeds that are distantly related to the combined population and have fewer animals in the combined population. Breedwise AF reduced country differences in G similarly within and across populations, resulting in a normal distribution of diagonal elements. Breedwise AF were more accurately estimated when accounting for the pedigree structure or estimated from the base population, thereby reducing the variability of diagonal elements of Gadj2. The DGV and their validation reliabilities were unaffected by AF used to compute G when estimated using a GBLUP model. The method for Gadj2 may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.
Acknowledgements
The authors acknowledge the Nordic Cattle Genetic Evaluation Ltd. (Aarhus, Denmark) and the Nordic Genomic Selection project for providing the genotype and phenotype data. M. L. Makgahlela acknowledges financial support from the Finnish Ministry of Agriculture and Forestry (Helsinki, Finland) and the University of Helsinki in Finland.
References
Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population.