## Abstract

Different approaches of calculating genomic measures of relationship were explored and compared with pedigree relationships (**A**) within and across base breeds in a crossbred population, using genotypes for 38,194 loci of 4,106 Nordic Red dairy cattle. Four genomic relationship matrices (**G**) were calculated using either observed allele frequencies (AF) across breeds or within-breed AF. The **G** matrices were compared separately when the AF were estimated in the observed and in the base population. Breedwise AF in the current and base population were estimated using linear regression models of individual genotypes on breed composition. Different **G** matrices were further used to predict direct estimated genomic values using a genomic BLUP model. Higher variability existed in the diagonal elements of **G** across breeds (standard deviation = 0.06, on average) compared with **A** (0.01). The use of simple observed AF across base breeds to compute **G** increased coefficients for individuals in distantly related populations. Estimated breedwise AF reduced differences in coefficients similarly within and across populations. The variability of the current adjusted **G** matrix decreased from 0.055 to 0.035 when breedwise AF were estimated from the base breed population. The direct estimated genomic values and their validation reliabilities were, however, unaffected by AF used to compute **G** when estimated with a genomic BLUP model, due to inclusion of breed means in the model. In multibreed populations, **G** adjusted with breedwise AF from the founder population may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.

## Key words

## Introduction

The use of marker genotypes to estimate relationships among individuals in a population has become increasingly important in many fields of genetics. In livestock breeding, knowledge of relationships is used routinely to estimate genetic variation and animal breeding values (EBV; VanRaden, 2008; Hayes et al., 2009; Su et al., 2012), to monitor inbreeding (Fernández et al., 2011; Toro et al., 2011), and to conserve animal genetic resources (Eding and Meuwissen, 2001). Traditionally, relationship coefficients are calculated from pedigree data. Pedigree relationships are obtained as 2 times the expected average identity by descent (**IBD**) sharing between 2 relatives and have been applied successfully within the framework of mixed-model equations for best linear unbiased prediction of EBV. Presently, with the increasing availability of genetic markers covering the whole genome, pedigree-based relationships can be replaced or combined with realized relationships calculated from marker data in the prediction of genomic breeding values (genomic EBV; Habier et al., 2007; Hayes et al., 2009).

Realized relationships derived from molecular markers are based on the actual IBD sharing or identity by state for genomic regions and, therefore, have more variation between closely related animals than pedigree relationships (VanRaden, 2008; Hayes et al., 2009). Moreover, realized relationships capture unrecorded pedigrees. Several different methods of calculating genomic relationship matrices (**G**) have been developed, for genotyped animals only (VanRaden, 2008; Yang et al., 2010) and when genotyped and ungenotyped individuals are combined (Legarra et al., 2009; Misztal et al., 2009; Christensen and Lund, 2010). In the latter approach, an arbitrary weight on pedigree relationships (**A**) is often used to measure the amount of variation not explained by markers. Although variability exists in the accuracy of predictions among the above methods (Forni et al., 2011), generally these accuracies are at least twice those estimated from pedigree data only (see, for example, Su et al., 2012).

Genomic relationship matrix **G** can be constructed by using a matrix having genotype information for each individual and marker (VanRaden, 2008). Each genotype is a deviation from the marker-specific population mean, which is calculated using population allele frequencies. The estimation of **G** has been shown to be closer to **A** when inferences use allele frequencies (**AF**) in the distant ancestral population (VanRaden, 2008; VanRaden et al., 2011) instead of AF in the currently genotyped population. This is because the expected **G** coefficients would be expressed relative to the same base population as **A**. The limitation is that base population AF are generally not available with field data and their estimation can be challenging. Gengler et al. (2007) showed an efficient way of calculating gene content and base population AF within a breed. However, in practice, currently genotyped populations are assumed to be the base population. This results from centering the genotype matrix used to build **G** with current-data AF so that the average genomic relationship between animals within the current population becomes 0 and scaling **G** such that the additive genetic variance would be comparable to that obtained through conventional methods (Powell et al., 2010; Forni et al., 2011).

The use of observed AF within a breed may not have major practical implications in genomic BLUP (**GBLUP**) models. In structured populations, however, using across-breed AF to build **G** may have consequences for the estimation of relationships, mainly attributable to AF differing between breeds. Eding and Meuwissen (2001) demonstrated that average relatedness between 2 populations could be expressed in terms of population-specific AF. VanRaden et al. (2011) used the average of 3 breedwise AF for estimation in the combined 3-breed population. Although these approaches would be beneficial for multiple populations with distinctive subpopulations, a need still exists for approaches in populations that consist mainly of crossbred animals. The Nordic Red dairy cattle (**RDC**) comprise 3 subpopulations by country of birth [i.e., Denmark (**DNK**), Sweden (**SWE**), and Finland (**FIN**)]. Over the years of crossbreeding, the majority of animals (~98%) in the Nordic RDC have become composites of base breeds. The absence of pure base-breed animals remains a major challenge for the estimation of breedwise AF. The objective of this study was to investigate whether the use of estimated breedwise AF in the calculation of genomic relationships would provide a more accurate estimate of **G** than using AF across breeds, and to determine the effect on **G** when AF are estimated in the base population versus the currently genotyped population.

## Materials and Methods

### Data

This study was carried out in a structured population with 60 pure base-breed bulls and 4,046 bulls of combinations of base breeds in the Nordic RDC. Genotypes for all 4,106 bulls were obtained using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). For quality purposes, markers on the X chromosome, markers without a map position in the UMD3.0 genome assembly (Zimin et al., 2009), and markers with minor allele frequency (**MAF**) <5% were discarded. In addition, animal genotypes with a GenCall score (Illumina Inc., 2005) <60% and marker loci with call rates <5% in a large reference sample from the same genotyping laboratory, consisting of Danish Holstein bulls, were discarded. Finally, missing genotypes were imputed chromosome by chromosome using fastPHASE software (Scheet and Stephens, 2006). Due to unavailability of pure base breeds, the informative SNP above were selected based on across-breed AF. After quality control, a total of 38,194 SNP markers were available for analyses. The entire RDC pedigree, containing over 4 million records, was used to calculate breed proportions (**BP**) for individual bulls (Lidauer et al., 2006). A breed was defined only if the average BP in the data was greater than 10%. Breeds used in this study were the Swedish Red (**SRB**), Finnish Ayrshire (**FAY**), and Norwegian Red (**NRF**), and the remaining breeds with BP less than 10% were combined into the breed “other.” A more detailed description of the population structure, breeds contained, and definitions of the final 4 breeds and their trends is provided by Makgahlela et al. (2013). The pedigree for genotyped bulls contained 22,300 animals.

Phenotypes were individual daughter deviations (**IDD**) of 1,995,606 RDC cows for milk, protein, and fat yields, obtained from the March 2010 official evaluations of the Nordic cattle genetic evaluations. By definition, the IDD are cow performances adjusted for fixed effects, nongenetic random effects, and genetic effects of the cow's dam (Mrode and Swanson, 2004). Here, however, IDD were computed by using animal model deregression from 305-d combined EBV (Mäntysaari et al., 2011). For validation of predictions using different **G** matrices, the data were split into sets of 3,300 training bulls born between 1980 and 1999, and 806 validation bulls born between 1998 and 2005. The training data had older bulls, which were evaluated for the first time during the 2005 Nordic cattle routine evaluation.

### Estimation of Pedigree and Genomic Relationships

Pedigree relationships of genotyped bulls were estimated using the RelaX2 computer program (Strandén and Vuori, 2006). Genomic relationships were computed following methods 1 and 2 as demonstrated by VanRaden (2008) and modifications of the 2 methods to adapt to the admixed structure of the current population.

Let there be *n* animals that have been genotyped for *m* markers. Let $u_{ij}$ be genotype *j* of animal *i*, where $u_{ij}$ is the number of copies of the second allele (i.e., $u_{ij}$ has value 0, 1, or 2). Following VanRaden (2008) method 1 and using observed AF across breeds, the original genomic relationship matrix **G** (denoted **Gorg**) was computed as **Gorg** = **ZZ**′/*k*, where **Z** is an *n* by *m* matrix of centered genotypes. Here, the element of animal *i* for marker *j* in **Z** is $u_{ij} - 2p_{j}$, where $p_{j}$ is the frequency of the second allele at SNP marker *j* and $k = 2\sum_{j} p_{j}(1 - p_{j})$. In method 2, also shown by Yang et al. (2010), standardized genotypes were used to calculate the **G** matrix as **Gorg**2 = $\mathbf{Z}^{*}\mathbf{Z}^{*\prime}/m$, where each column *j* in $\mathbf{Z}^{*}$ is $\mathbf{Z}^{*}_{j} = \mathbf{Z}_{j}/\sqrt{2p_{j}(1 - p_{j})}$ and $\mathbf{Z}_{j}$ is column *j* in **Z**.

Two adjusted genomic matrices were computed using breedwise AF: **Gadj** and **Gadj**2. These matrices were obtained by modifying methods 1 and 2 of VanRaden (2008). Matrix **Gadj** was calculated as **Gadj** = **MM**′/*k*, where, with the same notation as in **Z**, elements of **M** are $u_{ij} - 2p_{ij}$ and *k* is assumed the same as above. Here, $p_{ij}$ is the allele frequency for the *j*th SNP marker, expected for genotype $M_{ij}$ and taking into account the breed background of animal *i*. For the matrix **Gadj**2, the columns of **M** were further scaled by the standard deviation of the expected marker effects to obtain the $\mathbf{M}^{*}$ matrix. For animal *i*, the genotype element *j* of $\mathbf{M}^{*}$ was $\frac{u_{ij} - 2p_{ij}}{\sqrt{2p_{ij}(1 - p_{ij})}}$. To improve numerical stability and avoid division by 0, estimated breedwise AF below 0.05 or above 0.95 were set to 0.05 or 0.95, respectively. This threshold corresponded to our prior removal of SNP with MAF <5%, which was based on across-breed AF. Finally, the relationship matrix was obtained as **Gadj**2 = $\mathbf{M}^{*}\mathbf{M}^{*\prime}/m$. Individual AF $p_{ij}$ were calculated using either the currently genotyped animals or the base population as the reference population.

Breedwise AF at the level of the currently genotyped population were computed using linear multiple regression and binomial models. For each marker, a simple multiple regression of genotypes (**y**) on breed proportions (**X**) was solved for the vector of regression coefficients (**β**), with **e** the independently normally distributed residual error:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$$
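As a minimal sketch of the linear model above, the following Python (NumPy) snippet regresses genotypes on breed proportions for all markers at once. It is an illustration under stated assumptions rather than the authors' implementation: the function name `breedwise_af_linear`, the simulated data, and the choice to halve the 0/1/2 genotype codes so that the coefficients fall on the AF scale are all assumptions made here. No intercept is fitted because breed proportions are assumed to sum to 1 for each animal.

```python
import numpy as np

def breedwise_af_linear(genotypes, breed_props):
    """Least-squares regression of genotypes on breed proportions.

    genotypes   : (n, m) array coded 0/1/2 (copies of the second allele)
    breed_props : (n, b) array of breed proportions, rows summing to 1
    Returns (beta, p): beta is (b, m) breedwise AF estimates and
    p is (n, m) expected AF for each animal and marker.
    Assumption: genotypes are halved so coefficients are on the AF scale.
    """
    y = np.asarray(genotypes, dtype=float) / 2.0
    X = np.asarray(breed_props, dtype=float)
    # One least-squares solve handles every marker (column of y).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Unconstrained estimates: values may fall below 0 or above 1.
    p = X @ beta
    return beta, p

# Hypothetical toy data: 500 animals, 20 markers, 4 breeds.
rng = np.random.default_rng(1)
n, m = 500, 20
true_af = rng.uniform(0.1, 0.9, size=(4, m))   # simulated breedwise AF
X = rng.dirichlet(np.ones(4), size=n)          # simulated breed proportions
U = rng.binomial(2, X @ true_af)               # simulated 0/1/2 genotypes
beta_hat, p_hat = breedwise_af_linear(U, X)
```

Because the least-squares solution is unconstrained, some entries of `beta_hat` can land outside the interval from 0 to 1, which is the behavior the binomial alternative is meant to prevent.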

Alternatively, genotype $y_{i}$ for individual *i* was considered as an observation from a binomial distribution, binomial($p_{ij}$, 2). The binomial likelihood $\prod_{i=1}^{n} p_{ij}^{y_{i1}}(1 - p_{ij})^{y_{i2}}$ was handled by having a logistic regression for parameters $p_{ij}$, defined as follows:

$$\text{logit}([p_{ij}]) = \mathbf{X}\boldsymbol{\beta},$$

where $\text{logit}(p_{ij}) = \ln\left(\frac{p_{ij}}{1 - p_{ij}}\right)$. The expected AF of the marker for each individual from the linear and logistic models become $[\hat{p}_{ij}] = \mathbf{X}\hat{\boldsymbol{\beta}}$ and $[\hat{p}_{ij}] = \exp(\mathbf{X}\hat{\boldsymbol{\beta}})/[1 + \exp(\mathbf{X}\hat{\boldsymbol{\beta}})]$, respectively, where $\hat{\boldsymbol{\beta}} = (\hat{\beta}_{1}, \ldots, \hat{\beta}_{4})$ holds the breedwise estimates for SRB, FAY, NRF, and the combined breed “other.”
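Given expected AF for each animal and marker from either model, the matrices defined earlier in this section can be sketched in Python (NumPy) as follows. This is an illustrative sketch, not the software used in the study: the function names `g_method1` and `g_method2` and the toy data are hypothetical, and the 0.05/0.95 bounds are applied here to whatever AF array is passed in (an assumption; the text applies them to the estimated breedwise AF). Passing a vector of across-breed AF gives **Gorg** and **Gorg**2, whereas passing an *n* by *m* array of individual expected AF gives **Gadj** and **Gadj**2.

```python
import numpy as np

def g_method1(U, P, p_across):
    """Method 1: G = M M' / k, with k = 2 * sum_j p_j (1 - p_j).

    U        : (n, m) genotypes coded 0/1/2
    P        : AF used for centering; shape (m,) for across-breed AF
               or (n, m) for individual expected AF
    p_across : (m,) across-breed AF used to compute k
    """
    U = np.asarray(U, dtype=float)
    P = np.broadcast_to(np.asarray(P, dtype=float), U.shape)
    p_across = np.asarray(p_across, dtype=float)
    M = U - 2.0 * P                                # centered genotypes
    k = 2.0 * np.sum(p_across * (1.0 - p_across))  # common scaling factor
    return M @ M.T / k

def g_method2(U, P, lo=0.05, hi=0.95):
    """Method 2: standardize each centered genotype, then divide by m.

    Assumption: AF are bounded to [lo, hi] before centering and scaling
    so that the denominator is never zero.
    """
    U = np.asarray(U, dtype=float)
    P = np.clip(np.broadcast_to(np.asarray(P, dtype=float), U.shape), lo, hi)
    M_star = (U - 2.0 * P) / np.sqrt(2.0 * P * (1.0 - P))
    return M_star @ M_star.T / U.shape[1]

# Hypothetical toy data: 6 animals, 200 markers, one homogeneous population.
rng = np.random.default_rng(0)
n, m = 6, 200
p_across = rng.uniform(0.1, 0.9, size=m)
U = rng.binomial(2, p_across, size=(n, m))

G_org = g_method1(U, p_across, p_across)
G_org2 = g_method2(U, p_across)

# With identical expected AF for every animal, the adjusted matrices
# reduce to the original ones.
P_ind = np.tile(p_across, (n, 1))
G_adj = g_method1(U, P_ind, p_across)
G_adj2 = g_method2(U, P_ind)
```

The adjusted and original matrices differ only when the expected AF vary between animals, which is the case when animals differ in breed composition.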

The AF across breeds and breedwise AF were estimated in the base (founding) population using the gene content approximation algorithm of Gengler et al. (2007). The setup in this algorithm follows the logic of genetic covariance among relatives, where the covariance between gene contents (the number of copies of one allele in a genotype) is proportional to the additive relationship between animals. Large pedigrees are used to compute **A**, and linear mixed-model equations are used to account for selection and drift in AF over the generations covered by the pedigree (Gengler et al., 2007). As was done for the simple linear model above, the expected SNP genotype for ungenotyped base population animals was estimated from their genotyped relatives for every marker using the following model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\gamma} + \mathbf{Q}\mathbf{g} + \mathbf{e},$$

where **γ** is the vector of breed effects or intercept and **Q** is a design matrix allocating records to animal effects **g**, which is the estimated gene content for all animals including ancestors. The regression effect solutions for **γ** were used to calculate individual AF values $p_{ij}$ in the same way as with the current genotyped population AF.

All 4 **G** matrices above (i.e., **Gorg**, **Gorg**2, **Gadj**, and **Gadj**2) were computed using the corresponding AF estimated in the currently genotyped population. Additionally, the **Gorg**, **Gadj**, and **Gadj**2 relationship matrices were recalculated using the corresponding AF estimated from the base population. For comparison in the calculation of within-breed AF, genotypes of the oldest 842 bulls in the data, born between 1971 and 1990, were used to estimate base population breedwise AF. Here, we used a simple multiple regression of genotypes on breed proportions as explained for the currently genotyped population.

### Effect of Alternative G Estimates on Genomic Predictions

The estimation of variance components and the prediction of direct estimated genomic values (**DGV**) were carried out separately for each of the 3 matrices (i.e., **Gorg**, **Gadj**, and **Gadj**2) and separately when they were calculated using either observed or base population AF. The analyses were conducted using ASReml 3.0 (Gilmour et al., 2009) and MiX99 (Lidauer and Strandén, 1999) software under the following GBLUP model:

$$\mathbf{IDD} = \mathbf{X}\mathbf{b} + \mathbf{S}\mathbf{a} + \mathbf{e},$$

where **IDD** is a vector of IDD for daughters of bulls in the reference data set, **b** is a vector of breed effects or intercept, **S** is the design matrix that relates observations to the DGV of the daughters' sires (**a**), and **e** is a vector of random normal deviates. It is assumed that $\mathbf{e} \sim N(\mathbf{0}, \mathbf{R}\sigma_{e}^{2})$, where the diagonal element in **R** is $r_{ii} = 1/w_{i}$, and $w_{i}$ is the number of effective record contributions of the cow and is a weighting factor for the *i*th IDD. It is assumed that $\mathbf{a} \sim N(\mathbf{0}, \mathbf{G}\sigma_{a}^{2})$, where **G** is the marker-based relationship matrix and $\sigma_{a}^{2}$ is the additive genetic variance. Here the fixed-breed regression effects were only fitted for **Gadj** and **Gadj**2. The predicted values for all animals in this case were obtained as the sum of the animals’ DGV and fixed-breed regression solutions.

### Validation Analyses

The validation reliability of DGV was assessed following the Interbull genomic EBV validation test (Mäntysaari et al., 2010). A weighted regression model of deregressed breeding value (**DRP**) for bulls in the validation data on predicted DGV was fitted to obtain the regression coefficient $b_{1}$:

$$\mathbf{DRP} = \mathbf{1}b_{0} + b_{1}\hat{\mathbf{a}} + \mathbf{e},$$

where **DRP** is the vector of DRP for the candidate bulls and **â** is the vector of estimated DGV for these bulls. The linear model was weighted by individual $w_{k}$, defined as the effective daughter contribution of the bull. The validation reliability of the model was calculated as $R^{2}_{DGV} = \frac{r^{2}_{(DRP,DGV)}}{\overline{w}}$, where $r^{2}_{(DRP,DGV)}$ is the squared correlation between DRP and DGV and $\overline{w}$ is the average of $w_{k}$, which accounts for the inaccuracy in the estimation of DRP.

## Results

### Estimated Breedwise AF

Breedwise AF were calculated for the defined breeds SRB, FAY, NRF, and the breed “other,” which combined small breeds. Only 60 bulls were pure base breed, with 59 having 100% BP for FAY. Few individuals had a BP of at least 50% for SRB (647) and NRF (40) in the data. In the genotyped population, the linear regression and binomial models gave equivalent estimates of AF for the breeds, with correlations between models over all SNP close to 1. Whereas the linear model resulted in AF outside the expected range of 0 to 1 for some markers, the binomial model constrained estimates to fall within this range. The distributions of breedwise AF were also generally similar under the linear and binomial models, except that NRF had the most markers with estimated AF outside the expected range under the linear model. A binomial model was, however, challenging to implement using Gengler's method due to software limitations. Because the estimated AF were similar, for consistency we present results from the linear regression model. Table 1 shows the numbers of markers for each breed with AF estimated to be either less than 0 or greater than 1 when using the linear model. The fewest markers outside the parameter space were found when AF were estimated in the base population, with totals over breeds of 313 below 0 and 1,972 above 1. The most markers with AF outside the parameter space were found when the 842 old bulls were used in the AF estimation; then, 2,460 AF were below 0 and 8,387 were greater than 1. The breeds NRF and SRB appeared to have the most markers with estimated AF outside the parameter space, which could partly be due to having no 100% NRF cattle and only 1 pure base-breed SRB animal in the genotyped population.

Table 1. The numbers of markers with estimated allele frequencies that fell outside the expected range of 0 and 1 for each breed when estimated with breed proportions as genetic group in the gene content model (base population) or in the linear regression model (current population).

| Item | SRB | FAY | NRF | Other |
|---|---|---|---|---|
| Base population | | | | |
| Less than 0 | 26 | 19 | 265 | 3 |
| Greater than 1 | 276 | 193 | 1,449 | 54 |
| Currently genotyped population | | | | |
| Less than 0 | 152 | 94 | 960 | 7 |
| Greater than 1 | 1,615 | 902 | 3,304 | 81 |
| Genotypes from 842 old bulls | | | | |
| Less than 0 | 275 | 78 | 2,089 | 18 |
| Greater than 1 | 1,848 | 668 | 5,708 | 163 |

1 Breeds: SRB = Swedish Red; FAY = Finnish Ayrshire; NRF = Norwegian Red; other = combined breeds.

The between-breed correlations of breedwise AF estimated in the base population, currently genotyped population, and the oldest genotyped bulls are shown in Table 2. Generally, breedwise AF had the highest correlations when AF were estimated in the base population (ranging from 0.678 to 0.817) and were least correlated when estimated using genotypes from older bulls (0.421 to 0.651). The highest correlations in the base population AF estimates were between SRB and NRF (0.817), and the lowest correlations were between NRF and the breed “other” (0.678). However, observed AF estimated using all genotypes and old genotyped bulls were highly correlated between SRB and FAY at 0.644 and 0.630, respectively, and least correlated between FAY and NRF (0.545 and 0.421, respectively).

Table 2. The correlations between breedwise allele frequencies estimated in the base population (Gengler et al., 2007), within the oldest 842 bulls, and in the currently genotyped population.

| Item | SRB | FAY | NRF | Other |
|---|---|---|---|---|
| Base population | | | | |
| SRB | 1.000 | 0.737 | 0.817 | 0.741 |
| FAY | | 1.000 | 0.688 | 0.771 |
| NRF | | | 1.000 | 0.678 |
| Other | | | | 1.000 |
| Currently genotyped population | | | | |
| SRB | 1.000 | 0.644 | 0.602 | 0.626 |
| FAY | | 1.000 | 0.545 | 0.672 |
| NRF | | | 1.000 | 0.570 |
| Other | | | | 1.000 |
| Genotypes from old bulls | | | | |
| SRB | 1.000 | 0.630 | 0.468 | 0.594 |
| FAY | | 1.000 | 0.421 | 0.651 |
| NRF | | | 1.000 | 0.446 |

1 Breeds: SRB = Swedish Red; FAY = Finnish Ayrshire; NRF = Norwegian Red; other = combined breeds.

### Estimated Relationship Coefficients from Pedigree and Genomic Data

Descriptive statistics of diagonal elements from the pedigree ($\mathbf{A}_{ii}$) and different genomic estimators ($\mathbf{G}_{ii}$) are presented in Table 3, across populations and within bulls registered in DNK, SWE, and FIN. The number of bulls was 800, 1,240, and 2,040 in the DNK, SWE, and FIN populations, respectively. Results are presented by country because, although these 3 populations are generally considered a single population, the genetic relationship between the SWE and FIN populations is stronger than that between either of these 2 and the DNK population. The ranges of maximum elements from **A** within (1.081 to 1.135) and across populations (1.135) were smaller than those from **G** estimators within (1.233 to 1.450) and across populations (1.310 to 1.450). This difference in scales is because **A** is not an absolute measurement but an expected relatedness given the pedigree, whereas **G** measures the actual relatedness at marker loci. The variability of **G** across populations was greater in the original approaches (i.e., **Gorg** and **Gorg**2) and smaller for adjusted matrices, especially for **Gadj**2. Similar tendencies were found within the DNK animals but not for SWE and FIN bulls. Coefficients from **Gorg** and **Gorg**2 were generally similar, but we present estimates from both methods 1 and 2 for comparison with their adjusted alternatives proposed in the current study. The average $\mathbf{A}_{ii}$ was greatest in the FIN bulls (1.016) and smallest in the DNK bulls (1.007). However, this order was reversed for $\mathbf{G}_{ii}$ from **Gorg** using observed AF. The means of diagonals from **A** and **Gorg** were close to 1 across breeds and in SWE and FIN, but the mean from **Gorg** was 1.136 for DNK.

Table 3. Descriptive statistics of diagonal elements from the pedigree (**A**), original, and adjusted genomic relationship matrices (**G**) estimated using allele frequencies in the genotyped population

| Item | Mean | Minimum | Maximum | SD |
|---|---|---|---|---|
| Across populations | | | | |
| A | 1.012 | 1.000 | 1.135 | 0.014 |
| Gorg | 1.019 | 0.871 | 1.379 | 0.074 |
| Gorg2 | 1.019 | 0.871 | 1.379 | 0.074 |
| Gadj | 0.949 | 0.747 | 1.310 | 0.045 |
| Gadj2 | 0.965 | 0.773 | 1.450 | 0.055 |
| Danish bulls | | | | |
| A | 1.007 | 1.000 | 1.109 | 0.013 |
| Gorg | 1.136 | 0.973 | 1.328 | 0.072 |
| Gorg2 | 1.136 | 0.973 | 1.328 | 0.072 |
| Gadj | 0.960 | 0.828 | 1.310 | 0.053 |
| Gadj2 | 0.950 | 0.805 | 1.450 | 0.069 |
| Swedish bulls | | | | |
| A | 1.008 | 1.000 | 1.081 | 0.011 |
| Gorg | 1.006 | 0.871 | 1.184 | 0.037 |
| Gorg2 | 1.006 | 0.871 | 1.184 | 0.037 |
| Gadj | 0.957 | 0.775 | 1.233 | 0.044 |
| Gadj2 | 0.977 | 0.773 | 1.364 | 0.051 |
| Finnish bulls | | | | |
| A | 1.016 | 1.000 | 1.135 | 0.015 |
| Gorg | 0.979 | 0.877 | 1.157 | 0.028 |
| Gorg2 | 0.979 | 0.877 | 1.157 | 0.027 |
| Gadj | 0.939 | 0.784 | 1.283 | 0.036 |
| Gadj2 | 0.961 | 0.776 | 1.401 | 0.043 |

1 Summaries are across populations and within bulls born in Denmark, Sweden, and Finland.

2 Original method (original **G**) 1 and 2 of VanRaden (2008) calculated using allele frequencies observed across breeds.

3 Adjusted method (adjusted **G**) 1 and 2 of VanRaden (2008) computed using breedwise observed allele frequencies.

Table 4 shows summaries of diagonal elements from the **G** matrices calculated using estimated base population AF. Variability in the diagonals of **G** matrices calculated using base population AF was lower than when using the currently genotyped population AF. The variability in terms of standard deviation of diagonal elements was 22 and 35% less in the adjusted matrices **Gadj** and **Gadj**2, respectively, across breeds, and between 17 and 37% less within breeds. The averages of diagonal elements were less than 1 in all populations for the proposed adjusted matrices, irrespective of AF used to calculate **G**. For **Gorg**, a larger mean may have resulted from having, in general, higher diagonal elements. In all cases, the tendencies observed for diagonal elements were also clear for pairwise relationships (results not shown).

Table 4. Descriptive statistics of diagonal elements of the original and adjusted genomic relationship matrices (**G**) estimated using allele frequencies in the base population, across populations, and within bulls born in Denmark, Sweden, and Finland.

| Item | Mean | Minimum | Maximum | SD |
|---|---|---|---|---|
| Across populations | | | | |
| Gorg | 1.023 | 0.906 | 1.329 | 0.049 |
| Gadj | 0.981 | 0.819 | 1.224 | 0.035 |
| Gadj2 | 0.986 | 0.824 | 1.262 | 0.036 |
| Danish bulls | | | | |
| Gorg | 1.088 | 0.951 | 1.275 | 0.057 |
| Gadj | 0.994 | 0.868 | 1.204 | 0.037 |
| Gadj2 | 0.976 | 0.833 | 1.226 | 0.044 |
| Swedish bulls | | | | |
| Gorg | 1.008 | 0.906 | 1.152 | 0.029 |
| Gadj | 0.980 | 0.869 | 1.137 | 0.032 |
| Gadj2 | 0.986 | 0.887 | 1.166 | 0.032 |
| Finnish bulls | | | | |
| Gorg | 1.006 | 0.935 | 1.147 | 0.026 |
| Gadj | 0.975 | 0.871 | 1.184 | 0.030 |
| Gadj2 | 0.989 | 0.868 | 1.199 | 0.032 |

1 Original method (original **G**) 1 of VanRaden (2008) calculated using base population allele frequencies across breeds.

2 Adjusted method (adjusted **G**) 1 and 2 of VanRaden (2008) computed using breedwise allele frequencies from the base population.

Figure 1 shows the distributions of $\mathbf{G}_{ii}$ from different **G** matrices calculated using observed AF in the combined population. The distributions were examined to ensure consistency with the statistics presented above, as measures such as minimum and maximum tend to be sensitive to extreme values. The shape of the density plot generally followed a normal distribution, as suggested with a theoretical example by Simeone et al. (2011). However, the right tails of all distributions were slightly longer, which could be partly associated with the distribution of $\mathbf{A}_{ii}$. The distributions from adjusted matrices (i.e., **Gadj** and **Gadj**2) had only 1 peak, whereas that of **Gorg** appeared to be bimodal. This agrees with the suggestion that multiple peaks would not occur in the distribution of $\mathbf{G}_{ii}$ within a population, but may be expected in multiple populations if **G** is scaled with AF across breeds (Simeone et al., 2011). The density plots for diagonal elements from **G** were also examined within populations (plots not shown). All $\mathbf{G}_{ii}$ were generally distributed normally in SWE and FIN bulls; however, **Gorg** had a bimodal distribution in the DNK bulls.

Figure 2 shows the distributions of $\mathbf{G}_{ii}$ from different **G** matrices calculated using estimated base population AF in the combined population. As for observed AF, all plots were approximately normally distributed, with slight tails to the right. The bimodal density plot of **Gorg** diagonals was slightly smoothed when AF were estimated from the base population. Plots within populations were also normally distributed in the SWE and FIN bulls, whereas **Gorg** appeared bimodal for the DNK animals. The variability of $\mathbf{G}_{ii}$, especially for adjusted matrices, was much less when AF were estimated from the base population (Figure 2) than from the current population (Figure 1). In this admixed population, the correlations between $\mathbf{A}_{ii}$ and $\mathbf{G}_{ii}$ by any of the methods were always close to zero when using population-level AF but increased to 0.16, 0.28, and 0.38, respectively, for **Gorg**, **Gadj**, and **Gadj**2 when AF were estimated from the base population. The correlations within populations were also higher and ranged from 0.26 for **Gadj** to 0.53 for **Gorg** when AF were estimated from the base populations.

### Validation Reliabilities of DGV

Table 5 shows the regression coefficients and validation reliabilities for milk, protein, and fat obtained using alternative **G** matrices. The DGV were from **G** matrices calculated using AF in the current and the base population only because AF estimated from genotypes of old individuals appeared to be less usable. The regression coefficients and validation reliabilities were similar for all matrices irrespective of whether breedwise or across-breed AF were used and whether AF were estimated in the current or base population. Thus, predictions of genomic values converged to similar solutions regardless of AF used to compute **G**. For all matrices, regression coefficients were less than the expected value of 1 for milk (0.71), protein (0.75), and fat (0.81). The validation reliabilities were low for milk (0.33) and protein (0.33) and slightly higher for fat (0.43).

Table 5. Regression coefficients ($b_{1}$) and validation reliabilities of direct estimated genomic values ($R^{2}_{DGV}$) from genomic relationship matrices (**G**) calculated using currently genotyped and base population allele frequencies (AF)

| Trait | Observed AF | | | | | | Base population AF | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Gorg | | Gadj | | Gadj2 | | Gorg | | Gadj | | Gadj2 | |
| | $b_{1}$ | $R^{2}_{DGV}$ | $b_{1}$ | $R^{2}_{DGV}$ | $b_{1}$ | $R^{2}_{DGV}$ | $b_{1}$ | $R^{2}_{DGV}$ | $b_{1}$ | $R^{2}_{DGV}$ | $b_{1}$ | $R^{2}_{DGV}$ |
| Milk | 0.71 | 0.32 | 0.71 | 0.32 | 0.72 | 0.33 | 0.71 | 0.32 | 0.71 | 0.32 | 0.72 | 0.33 |
| Protein | 0.75 | 0.33 | 0.75 | 0.33 | 0.76 | 0.33 | 0.75 | 0.33 | 0.75 | 0.33 | 0.76 | 0.33 |
| Fat | 0.81 | 0.43 | 0.80 | 0.42 | 0.82 | 0.43 | 0.81 | 0.43 | 0.80 | 0.42 | 0.82 | 0.43 |

1 **Gorg** = original method (original **G**) 1 of VanRaden (2008) calculated using AF across breeds; **Gadj** and **Gadj**2 = new adjusted methods (adjusted **G**) 1 and 2 of VanRaden (2008), respectively, computed using breedwise AF.

## Discussion

An important step when defining the model relates to the genetic covariance between relatives, which reflects shared genes that arise through common ancestry. High-density panels of SNP markers have recently been used to estimate genomic relationships in addition to the traditional pedigree-based relationships. Several methods for the estimation of **G** within a breed have been proposed in the literature (VanRaden, 2008; Misztal et al., 2009; Christensen and Lund, 2010; Yang et al., 2010). Studies of genomic selection generally agree that incorporating genomic information increases prediction accuracies for young unproven bulls compared with traditional evaluations based on pedigree information only, due to improved prediction of Mendelian sampling deviations between close relatives in **G**. Methods for calculating **G** are straightforward in a single-breed population; however, similar approaches tend to result in distorted coefficients in multibreed populations. Thus, relationships should account for the different expectations for mean and variance, depending on the breed composition of the individuals in multibreed populations (Harris and Johnson, 2010). Only a few studies have evaluated the prospect of accounting for multiple breeds in **G** with real data (Harris and Johnson, 2010; VanRaden et al., 2011). The resulting **G** has been tested in genomic predictions for crossbred animals in New Zealand (Harris and Johnson, 2010). The objectives of this study were to examine the prospects of accounting for breed composition in the calculation of **G** and to assess coefficients of different **G** matrices compared with the pedigree-based relationship matrix within and across breeds in a multibreed population.

### Breedwise AF
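For orientation, the breedwise AF discussed in this subsection were estimated, as stated in the Abstract, by linear regression of individual genotypes on breed composition. The following is only a minimal sketch of that idea on simulated data, not the authors' implementation: the breed proportions `Q`, the simulated frequencies `p_true`, the animal and marker counts, and the final clipping step are all assumptions made for illustration, and the pedigree-based base-population variant (the gene content approach of Gengler et al., 2007) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_breeds, n_markers = 500, 3, 200  # hypothetical sizes

# Hypothetical breed proportions per animal (each row sums to 1).
# In the study these come from the pedigree; here they are simulated.
Q = rng.dirichlet(alpha=[2.0, 1.0, 0.5], size=n_animals)

# Hypothetical "true" breedwise allele frequencies, used only to simulate data.
p_true = rng.uniform(0.02, 0.98, size=(n_breeds, n_markers))

# The expected AF of an animal is the breed-proportion-weighted mean.
p_animal = Q @ p_true

# Genotypes coded as allele counts (0, 1, 2).
M = rng.binomial(2, p_animal)

# Unrestricted least squares of (allele count / 2) on breed proportions,
# one regression per marker. No intercept: the rows of Q already sum to 1.
p_hat, *_ = np.linalg.lstsq(Q, M / 2.0, rcond=None)

# Nothing constrains the estimates to the interval [0, 1].
out_of_bounds = (p_hat < 0.0) | (p_hat > 1.0)
print("out-of-bounds estimates per breed:", out_of_bounds.sum(axis=1))

# Crude illustrative fix only; this is NOT the binomial model from the text.
p_clipped = np.clip(p_hat, 0.0, 1.0)
```

Because ordinary least squares places no restriction on the parameter space, some entries of `p_hat` can fall below 0 or above 1, typically for breeds that contribute only small fractions to most animals. Clipping is shown purely as a stand-in for a properly restricted model.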

Allele frequencies play a crucial role in the calculation of **G** and, hence, erroneous estimation of AF may result in biased **G** coefficients. Our rationale behind the estimation of breedwise AF was that individuals from breeds that developed independently would likely have different AF. We found the approach proposed by Gengler et al. (2007) for estimating gene content of ungenotyped individuals given the pedigree to be useful for estimating breedwise AF in the base population. The smaller number of markers with AF outside the expected range of 0 to 1 when AF were estimated in the base population indicates that the pedigree-based approach differentiated the base breeds better than estimation from the currently genotyped animals. Estimates of AF outside the expected range result from using a simplified model without restrictions on the parameter space. Restricting the parameter space to fall between 0 and 1 using a binomial model resulted in coefficients close to these values. The AF out of bounds did not create a great problem in the current population, as no pure base breed animals were included and, therefore, an individual's expected AF was generally correct. However, if purebreds are included in the data, their expected AF may be imprecise; it may be useful in this case to use a binomial model. An alternative approach to Gengler's fixed breed effects would be to include unknown parent groups for each base breed in the matrix **A**^{−1} (Quaas, 1988). This model would yield genetic group effects equivalent to our AF within breed in the base population.

The correlations between estimated breedwise AF were also higher using the gene content approach. When considering only the minor AF, correlations dropped but were still higher in the base population. High correlations between breedwise AF estimated from the base population, compared with low correlations between these frequencies in the currently genotyped population, have also been reported by VanRaden et al. (2011). The authors pointed out that this indicates that recent drift within populations was removed during the estimation process and also revealed that breeds had been more similar over 10 generations in the past than they are at present, as expected from genetic drift of frequencies across pedigree generations. This progressive differentiation in AF between breeds, however, may not have been the case for the current population, as base breeds were combined and subjected to the same breeding goal for over 2 decades, which is expected to make them similar genetically. The higher correlations found between SRB, FAY, and the breed “other” were unusual. However, many animals with SRB and FAY fractions also have a breed proportion for Canadian Ayrshires that is now in the breed “other.” Thus, the estimation of AF might have detected relations to such animals. The expected pedigree-based breed proportions were used in this study to define the 4 breeds. Alternatively, accurate prediction of breed composition based on SNP genotype data has been demonstrated for multibreed populations (Kuehn et al., 2011; Frkonja et al., 2012). However, these algorithms initially estimate breed-specific frequencies using purebred individuals, which was a limitation in this population due to the unavailability of such individuals.

### Properties of A and G Within and Across Populations

The diagonal elements of all **G** matrices that used observed AF were not comparable to the diagonal elements of **A**. Moreover, the variability in **A** was much smaller than that observed in the **G** matrices. The comparison of **A** and **G** computed using observed AF is generally not straightforward, as the coefficients in **A** are expressed relative to the base population in the distant past and the additive genetic variance is defined for that generation. In contrast, when the base population for **G** is achieved by scaling IBD coefficients with observed AF, the additive genetic variance among animals considers average variation in the current genotyped animals (Powell et al., 2010; Yang et al., 2010). The estimate of additive genetic variance may be smaller in the current population than it was in the distant past because we expect the current population to be more inbred (Powell et al., 2010). The highest variability in diagonal elements from **Gorg** and **Gorg**2 indicates that the use of across-breed AF increased coefficients in **G** for this admixed population. Variability was reduced by using breedwise AF in **Gadj**. However, in **Gadj** and **Gorg**, the overall scaling was based on the same marker variance across breeds, which was larger than the expected variance within breeds. Consequently, the variance of the diagonal elements appeared much smaller in **Gadj** than in **Gorg**. The simplified scaling factor was corrected in **Gadj**2, where elements were scaled by the mean marker variance of an individual's base breeds. The variance from the resulting **Gadj**2 was generally still smaller than that observed from the original approaches across breeds in **Gorg** and **Gorg**2.

Diagonal elements of **G** built using any of the approaches were more accurately estimated when **G** used AF estimated from the base population. The variability of diagonal elements, particularly for **Gadj**2, was reduced to a greater extent, suggesting that breedwise AF in this case were less biased and highlighting the need to account for the pedigree structure in **G** for multibreed populations. Thus, with pedigree information, the base (founder) population is generally consistent. The calculated diagonals of **G** in this case were moderately correlated with **A** across breeds, in agreement with results reported by Aguilar et al. (2010) for the US Holstein population. However, the moderate correlation of 0.38 between diagonals of **A** and **Gadj**2 calculated with base population AF in our study was much lower than the correlation of 0.68 reported with simulated data (VanRaden, 2008) and the correlations of 0.50 to 0.56 reported for the US Holstein, Jersey, and Brown Swiss populations (VanRaden et al., 2011). The highest correlations were generally found with true AF in simulated data (VanRaden, 2008) and when AF were assumed to be 0.5 for all markers (Hayes and Goddard, 2008; VanRaden et al., 2011). Assuming AF equal to 0.5 for all markers equalizes the relative contribution of markers to **G** instead of having rare alleles contributing more than common alleles (Forni et al., 2011). In addition to AF, one reason why these correlations differed across studies might be differences in population structure. The advantage of base breedwise AF in **Gadj**2 could mean that the distant past founder population corresponds well with the current expected homozygosity. Moreover, our results suggest that calculation of **G** with respect to an individual's base breed corrected the heterogeneity better than simple animal deviations from across-population mean AF.

When **G** coefficients were assessed within populations, we found that using across-breed AF tended to increase **G**_{ii} for animals from populations that had fewer animals or that were distantly related to dominating breeds in the combined population. Simeone et al. (2011) indicated that with an equal number of animals contributing to the AF across populations, there would be fewer differences in scaling **G** between populations. The mean AF across breeds was strongly influenced by the Swedish and Finnish populations, as these breeds are more related genetically (Brøndum et al., 2011; Makgahlela et al., 2013) and had more animals in the combined data. As a result, and in contrast to **A**_{ii}, the general level of homozygosity in the Danish population appeared to be higher than in other populations. This is unexpected because the Danish population has been found to be more admixed than the other 2, due to years of crossbreeding (Brøndum et al., 2011; Makgahlela et al., 2013). This means that as diagonal elements from **Gorg** were increased for the Danish bulls, they were decreased for animals in the other populations. In addition, individuals with the highest diagonal elements in **Gorg** were found to be registered in populations other than the Nordic RDC. Because these animals come from populations with AF deviating even further from the population mean AF, their genotypes make them appear more homozygous than the average homozygosity in this population. Apart from a great reduction in the variability of diagonal elements between AF estimated from the current and base population, the behaviors of different estimators of **G** were similar within and across populations.

Harris and Johnson (2010) indicated that multibreed reference populations will lead to biased coefficients in **G** if breed is not taken into account. In this study, the use of breedwise AF to calculate **Gadj** and **Gadj**2 reduced country differences in coefficients similarly within and across populations. Regarding distributions, the observation that diagonal elements of **Gadj** and **Gadj**2 generally followed a normal distribution, but diagonal elements of **Gorg** appeared bimodal, indicates a distortion in the elements of **Gorg** and suggests clusters that may be due to the population structure. This observation was in agreement with previous findings (Simeone et al., 2011). In their study, the authors used simulated data on multiple populations and 60,000 SNP markers with varying AF at each locus to compute **G** using observed AF across populations. They observed a bimodal distribution of the diagonal elements of **G**. Multiple peaks were avoided by using breedwise AF in our study. Simeone et al. (2011) pointed out a general lack of theoretical knowledge about the distribution of the diagonal elements of **G** both within and across breeds.
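The effect of the AF used for centering on the diagonal elements can be illustrated with a small simulation. This is a hedged sketch, not the authors' computation: it assumes two purebred populations of unequal size (the study animals were admixed), a VanRaden (2008)-style diagonal of ZZ'/scale with Z = M - 2p, and hypothetical sizes and drift parameters. The three variants only mirror the spirit of across-breed AF, breedwise AF with a common scaling, and breedwise AF with breed-specific scaling; they are not the exact **Gorg**, **Gadj**, and **Gadj**2 formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 2000
n_a, n_b = 300, 50  # hypothetical sizes: a dominant and a small population

# Hypothetical ancestral AF with independent drift in each population;
# population B is assumed to have drifted further from the ancestor.
p_anc = rng.uniform(0.1, 0.9, n_markers)
p_a = np.clip(p_anc + rng.normal(0.0, 0.05, n_markers), 0.02, 0.98)
p_b = np.clip(p_anc + rng.normal(0.0, 0.15, n_markers), 0.02, 0.98)

# Genotypes coded 0/1/2; rows are animals, columns are markers.
M = np.vstack([
    rng.binomial(2, p_a, size=(n_a, n_markers)),
    rng.binomial(2, p_b, size=(n_b, n_markers)),
]).astype(float)
pop = np.repeat([0, 1], [n_a, n_b])


def marker_variance(p):
    """Sum over markers of 2p(1 - p); accepts a vector or a matrix of AF."""
    return 2.0 * np.sum(p * (1.0 - p), axis=-1)


def g_diagonal(M, p, scale):
    """Diagonal of ZZ'/scale with Z = M - 2p."""
    Z = M - 2.0 * p
    return np.sum(Z * Z, axis=1) / scale


# (1) Across-population observed AF for both centering and scaling.
p_obs = M.mean(axis=0) / 2.0
diag_across = g_diagonal(M, p_obs, marker_variance(p_obs))

# (2) Within-population AF for centering, common across-population scaling.
p_within = np.vstack([M[pop == k].mean(axis=0) / 2.0 for k in (0, 1)])
diag_within = g_diagonal(M, p_within[pop], marker_variance(p_obs))

# (3) Within-population AF for both centering and scaling.
diag_within_scaled = g_diagonal(M, p_within[pop], marker_variance(p_within)[pop])

for name, d in [("across", diag_across), ("within", diag_within),
                ("within, own scale", diag_within_scaled)]:
    print(f"{name}: mean A = {d[pop == 0].mean():.3f}, "
          f"mean B = {d[pop == 1].mean():.3f}, SD = {d.std():.3f}")
```

Under these assumptions, centering on across-population AF tends to raise the diagonals of the smaller, more divergent population relative to the larger one, so the diagonals cluster by population. With within-population centering but a common scaling, the population with lower within-population marker variance tends to fall below 1, and centering and scaling within population brings both groups close to 1.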

The estimation of relationships and their use in predictions is widely carried out within a breed or by treating multibreed data as a homogeneous breed (Hayes et al., 2009; Pryce et al., 2011; Su et al., 2012), except in New Zealand, where predictions account for breed effects for crossbred animals (Harris and Johnson, 2010). The observed correlations between diagonal elements from **A** and **G** within breed were higher when estimated base population AF were used instead of current population AF (VanRaden, 2008; VanRaden et al., 2011) or with AF equal to 0.5 (VanRaden et al., 2011). Our use of estimated breedwise AF also greatly improved diagonal elements, indicating that AF have a large effect on relationship coefficients and would also have an effect on the estimation of population additive genetic variation. Using pig data, Forni et al. (2011) explored different methods of scaling **G** with observed AF. They estimated genetic variances using different **G** for genotyped animals only, and estimates ranged from 2.25, when **G**_{ii} were normalized to average 1, to an inflated 4.46, when **G** was scaled with the expectations of AF following a β distribution (Gianola et al., 2009). The estimated additive genetic variances were more sensitive when a selected subset of genotyped animals was used for variance component estimation. Similar additive genetic variances were found when complete data of genotyped and ungenotyped animals were used (Forni et al., 2011).

In the absence of selective genotyping, the expected regression coefficient (b_{1}) from the validation model is close to 1 (Mäntysaari et al., 2010). The observed b_{1} values (range = 0.71–0.82) were less than the expected value of 1, which indicates bias in the estimated genomic values. The validation reliabilities of DGV from all **G** matrices were similar. In all cases, solutions indicated that DGV were unaffected by the AF used to calculate **G**. This observation agrees with previous reports in which the gain in accuracy of DGV was small (0.01) when base population AF were used to compute **G** instead of observed AF in simulated data (VanRaden, 2008), and DGV were insensitive to AF with real data (Forni et al., 2011). According to Strandén and Christensen (2011), DGV are sensitive neither to marker allele coding nor to AF, and the same DGV solutions would be calculated in GBLUP, provided that the model has a common fixed general mean effect. Thus, the absolute levels of values (i.e., animal effects) are only affected when the mean is unaccounted for in the model. Similarly here, inclusion of fixed-breed regressions for **Gadj**2 brought breed means back into the DGV. It was clearly shown that different AF affected the calculation of **G**. Although **G** was sensitive to AF and was accurately computed using breedwise AF in the base population, the DGV validation reliabilities were unaffected. In multibreed populations, the use of **Gadj**2 may be more beneficial in single-step evaluations where most animals are evaluated by matrix **A** and through their relationships to genotyped animals. However, it should be emphasized that using an across-breed MAF threshold of at least 5% to select SNP tends to remove markers within breed that may be informative for improved prediction accuracy. In the presence of purebred animals, it may be useful to select SNP based on MAF of 1 breed, even if monomorphic in all other breeds (Olson et al., 2012). An alternative to adjusting breeds in **G** would be to estimate SNP effects in different breeds simultaneously in a multitrait regression model (Olson et al., 2012; Makgahlela et al., 2013). The prediction accuracies were found to be even higher when fitting correlated SNP effects between breeds (Olson et al., 2012). The unavailability of purebred animals in the data also limited comparisons of our expected breedwise AF with the actual estimates within breed.

## Conclusions

Current methods used for computing genomic relationships in multibreed populations need to be extended to allow for differential AF between breeds. This study showed that errors in the estimation of AF may have great consequences in the calculation of relationships. Across-breed observed AF increased diagonal elements of **G** for animals from breeds that are distantly related to the combined population and have fewer animals in the combined population. Breedwise AF reduced country differences in **G** similarly within and across populations, resulting in a normal distribution of diagonal elements. Breedwise AF were more accurately estimated when accounting for the pedigree structure or estimated from the base population, thereby reducing the variability of diagonal elements of **Gadj**2. The DGV and their validation reliabilities were unaffected by AF used to compute **G** when estimated using a GBLUP model. The method for **Gadj**2 may provide more consistency among relationship coefficients between genotyped and ungenotyped individuals in an across-breed single-step evaluation.

## Acknowledgements

The authors acknowledge the Nordic Cattle Genetic Evaluation Ltd. (Aarhus, Denmark) and the Nordic Genomic Selection project for providing the genotype and phenotype data. M. L. Makgahlela acknowledges financial support from the Finnish Ministry of Agriculture and Forestry (Helsinki, Finland) and the University of Helsinki in Finland.

## References

- Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. *J. Dairy Sci.* 2010; 93: 743-752
- Reliabilities of genomic prediction using combined reference data of the Nordic red dairy cattle populations. *J. Dairy Sci.* 2011; 94: 4700-4707
- Genomic prediction when some animals are not genotyped. *Genet. Sel. Evol.* 2010; 42: 2
- Marker-based estimates of between and within population kinships for the conservation of genetic diversity. *J. Anim. Breed. Genet.* 2001; 118: 141-159
- Management of genetic diversity in small farm animal populations. *Animal.* 2011; 5: 1684-1698
- Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. *Genet. Sel. Evol.* 2011; 43: 1
- Prediction of breed composition in an admixed cattle population. *Anim. Genet.* 2012; 43: 696-703
- A simple method to approximate gene content in large pedigree populations: Application to the myostatin gene in dual-purpose Belgian blue cattle. *Animal.* 2007; 1: 21-28
- Additive genetic variability and the Bayesian alphabet. *Genetics.* 2009; 183: 347-363
- ASREML User Guide Release 3.0. VSN International Ltd., Hemel Hempstead, UK, 2009
- The impact of genetic relationship information on genome-assisted breeding values. *Genetics.* 2007; 177: 2389-2397
- Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation. *J. Dairy Sci.* 2010; 93: 1243-1252
- Technical note: Prediction of breeding values using marker-derived relationship matrices. *J. Anim. Sci.* 2008; 86: 2089-2092
- Increased accuracy of artificial selection by using the realized relationship matrix. *Genet. Res. (Camb.).* 2009; 91: 47-60
- Illumina GenCall Data Analysis Software-Gen-Call software algorithms for clustering, calling, and scoring genotypes. Illumina. Pub. No. 370-2004-009. Illumina Inc., San Diego, CA, 2005
- Predicting breed composition using breed frequencies of 50,000 markers from the US meat animal research center 2,000 bull project. *J. Anim. Sci.* 2011; 89: 1742-1750
- A relationship matrix including full pedigree and genomic information. *J. Dairy Sci.* 2009; 92: 4656-4663
- Random heterosis and recombination loss effects in a multibreed evaluation for Nordic Red dairy cattle. in: Proc. 8th World Congr. Genet. Appl. Livest. Prod, Belo Horizonte, Brazil, 2006
- Fast and flexible program for genetic evaluation in dairy cattle. *Interbull Bull.* 1999; 20: 20
- Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. *J. Anim. Breed. Genet.* 2013; 130: 10-19
- Les Mathématiques de L’hérédité. Masson et Cie, Paris, France, 1948
- Estimation of GEBVs using deregressed individual cow breeding values. *Interbull Bull.* 2011; 44: 26-29
- Interbull validation test for genomic evaluations. *Interbull Bull.* 2010; 41: 17-22
- Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. *J. Dairy Sci.* 2009; 92: 4648-4655
- Calculating cow and daughter yield deviations and partitioning of genetic evaluations under a random regression model. *Livest. Prod. Sci.* 2004; 86: 253-260
- Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. *J. Dairy Sci.* 2012; 95: 5378-5383
- Reconciling the analysis of IBD and IBS in complex trait studies. *Nat. Rev. Genet.* 2010; 11: 800-805
- Short communication: Genomic selection using a multi-breed, across-country reference population. *J. Dairy Sci.* 2011; 94: 2625-2630
- Additive genetic model with groups and relationships. *J. Dairy Sci.* 1988; 71: 1338-1345
- A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. *Am. J. Hum. Genet.* 2006; 78: 629-644
- Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population. *J. Anim. Breed. Genet.* 2011; 128: 386-393
- Allele coding in genomic evaluation. *Genet. Sel. Evol.* 2011; 43: 25
- RelaX2: Pedigree analysis programme. in: Proc. 8th World Congress Genetics Applied Livest. Prod, Belo Horizonte, Brazil, 2006
- Genomic prediction for Nordic Red cattle using one-step and selection index blending. *J. Dairy Sci.* 2012; 95: 909-917
- Assessing the genetic diversity in small farm animal populations. *Animal.* 2011; 5: 1669-1683
- Efficient methods to compute genomic predictions. *J. Dairy Sci.* 2008; 91: 4414-4423
- Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss. *J. Dairy Sci.* 2011; 94: 5673-5682
- Common SNPs explain a large proportion of the heritability for human height. *Nat. Genet.* 2010; 42: 565-569
- A whole-genome assembly of the domestic cow, *Bos taurus*. *Genome Biol.* 2009; 10: R42

## Article info

### Publication history

Published online: June 17, 2013

Accepted: April 24, 2013

Received: December 24, 2012

### Copyright

© 2013 American Dairy Science Association. Published by Elsevier Inc.
