Advertisement

Local breed proportions and local breed heterozygosity in genomic predictions for crossbred dairy cows

Open AccessPublished:October 25, 2022DOI:https://doi.org/10.3168/jds.2022-22225

      ABSTRACT

      For genomic prediction of crossbred animals, models that account for the breed origin of alleles (BOA) in marker genotypes can allow the effects of marker alleles to differ depending on their ancestral breed. Previous studies have shown that genomic estimated breeding values for crossbred cows can be calculated using the marker effects that are estimated in the contributing pure breeds and combined based on estimated BOA in the genotypes of the crossbred cows. In the presented study, we further exploit the BOA information for improving the prediction of genomic breeding values of crossbred dairy cows. We investigated 2 types of BOA-derived breed proportions: global breed proportions, defined as the proportion of marker alleles assigned to each breed across the whole genome; and local breed proportions (LBP), defined as the proportions of alleles on chromosome segments which were assigned to each breed. Further, we investigated 2 BOA-derived measures of heterozygosity for the prediction of total genetic value. First, global breed heterozygosity, defined as the proportion of marker loci that have alleles originating in 2 different breeds over the whole genome. Second, local breed heterozygosity (LBH), defined as proportions of marker loci on chromosome segments that had alleles originating in 2 different breeds. We estimated variance related to LBP and LBH on the remaining variation after accounting for prediction with solutions from the genomic evaluations of the pure breeds and validated alternative models for production traits in 5,214 Danish crossbred dairy cows. The estimated LBP variances were 0.9, 1.2, and 1.0% of phenotypic variance for milk, fat, and protein yield, respectively. We observed no clear LBH effect. Cross-validation showed that models with LBP effects had a numerically small but statistically significantly higher predictive ability than models only including global breed proportions. We observed similar improvement in accuracy by the model having an across crossbred residual additive genetic effect, accounting for the additive genetic variation that was not accounted for by the solutions from purebred. For genomic predictions of crossbred animals, estimated BOA can give useful information on breed proportions, both globally in the genome and locally in genome regions, and on breed heterozygosity.

      Key words

      INTRODUCTION

      Crossbreeding (i.e., the mating of animals from different breeds) is a common practice in many livestock production programs. Crossbred animals show superior performance for many important traits compared with purebred animals, so-called heterosis (
      • Falconer D.S.
      • Mackay T.F.C.
      Introduction to Quantitative Genetics.
      ). Traditionally, crossbreeding has been more common in meat production systems, such as for beef cattle, pigs, and poultry, than for dairy cattle. However, the benefit of crossbreeding for dairy cattle has gained interest in recent decades (
      • Sørensen M.K.
      • Norberg E.
      • Pedersen J.
      • Christensen L.G.
      Invited review: Crossbreeding in dairy cattle: A Danish perspective.
      ;

      Kargo, M., J. F. Ettema, M. Fjordside, and L. Hjortø. 2014. Combi-Cross–The use of new technologies for improving dairy crossbreeding programs. Proceedings of the 10th World Congr. Genet. Appl. Livest. Prod. Vancouver, Canada.

      ;
      • Clasen J.B.
      • Fikse W.F.
      • Kargo M.
      • Rydhmer L.
      • Strandberg E.
      • Østergaard S.
      Economic consequences of dairy crossbreeding in conventional and organic herds in Sweden.
      ).
      Crossbreeding gives challenges to genomic prediction. Among those is the difference in the linkage disequilibrium (LD) between markers and QTL between breeds, and thus higher LD within breeds than between breeds (
      • Ibánez-Escriche N.
      • Fernando R.L.
      • Toosi A.
      • Dekkers J.C.
      Genomic selection of purebreds for crossbred performance.
      ;
      • Lund M.S.
      • Su G.
      • Janss L.
      • Guldbrandtsen B.
      • Brøndum R.F.
      Genomic evaluation of cattle in a multi-breed context.
      ). To accommodate for this difference, models that account for the breed origin of alleles (BOA) in genotypes of crossbred animals were proposed (
      • Ibánez-Escriche N.
      • Fernando R.L.
      • Toosi A.
      • Dekkers J.C.
      Genomic selection of purebreds for crossbred performance.
      ;
      • Christensen O.F.
      • Madsen P.
      • Nielsen B.
      • Su G.
      Genomic evaluation of both purebred and crossbred performances.
      ). However, these models have not always resulted in more accurate predictions than the models that assume the same marker allele effects across breeds (
      • Ibánez-Escriche N.
      • Fernando R.L.
      • Toosi A.
      • Dekkers J.C.
      Genomic selection of purebreds for crossbred performance.
      ;
      • Sevillano C.A.
      • Vandenplas J.
      • Bastiaansen J.W.M.
      • Bergsma R.
      • Calus M.P.L.
      Genomic evaluation for a three-way crossbreeding system considering breed-of-origin of alleles.
      ;
      • Guillenea A.
      • Su G.
      • Lund M.S.
      • Karaman E.
      Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles.
      ).
      For dairy cattle,
      • VanRaden P.M.
      • Tooker M.E.
      • Chud T.C.S.
      • Norman H.D.
      • Megonigal Jr., J.H.
      • Haagen I.W.
      • Wiggans G.R.
      Genomic predictions for crossbred dairy cattle.
      calculated genomic EBV (GEBV) for crossbred animals using solutions from purebred evaluations weighted by breed proportions.
      • Eiríksson J.H.
      • Karaman E.
      • Su G.
      • Christensen O.F.
      Breed of origin of alleles and genomic predictions for crossbred dairy cows.
      ,
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      ) presented a method based on BOA where the GEBV of crossbred were also based on solutions from purebred evaluations. Such methods require breed level estimates, weighted by the breed proportions, to account for the different genetic levels of the breeds (
      • VanRaden P.M.
      • Tooker M.E.
      • Chud T.C.S.
      • Norman H.D.
      • Megonigal Jr., J.H.
      • Haagen I.W.
      • Wiggans G.R.
      Genomic predictions for crossbred dairy cattle.
      ;
      • Eiríksson J.H.
      • Karaman E.
      • Su G.
      • Christensen O.F.
      Breed of origin of alleles and genomic predictions for crossbred dairy cows.
      ).
      • VanRaden P.M.
      • Tooker M.E.
      • Chud T.C.S.
      • Norman H.D.
      • Megonigal Jr., J.H.
      • Haagen I.W.
      • Wiggans G.R.
      Genomic predictions for crossbred dairy cattle.
      used multiple breed genetic evaluation models, which estimated the breed levels. However, it is practical to use solutions from routine genetic evaluations. The routine evaluations are often performed for each of the pure breeds separately, and these do not provide breed level estimates.
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      suggested estimating the breed levels from purebred phenotypes. Their approach, however, requires assumptions that are unlikely to hold in all instances. In particular, they assumed that the level differences between breeds were only genetic and not related to the different breeds being raised in different environments. An alternative is to estimate the breed levels from phenotypes of crossbred animals.
      Estimated BOA (
      • Vandenplas J.
      • Calus M.P.L.
      • Sevillano C.A.
      • Windig J.J.
      • Bastiaansen J.W.M.
      Assigning breed origin to alleles in crossbred animals.
      ) of crossbred animals can be used to calculate 2 types of estimates for each breed contribution: global breed level and local breed level. The global breed proportion (GBP) is the proportion of alleles throughout the genome that originate from the breed in question, and the local breed proportion (LBP) is the proportion of alleles locally for segments of the genome that originate from the breed. Thus, a GBP effect attempts to account for the different overall levels of the breeds contributing to crossbred animals, and an LBP effect attempts to indicate segment-specific breed levels contributing to crossbred animals.
      In addition to the differences in LD and breed levels between breeds, the mixture of genetic backgrounds complicates predictions for crossbred animals. In particular, the increased heterozygosity in the genomes of crossbred animals, which contributes to heterosis, is of interest to study. Genomic predictions for heifers and cows can be used for 2 types of selection. First, which type of semen should be used in their insemination, and second, whether the animal should enter the herd or be sold (
      • Calus M.P.L.
      • Bijma P.
      • Veerkamp R.F.
      Evaluation of genomic selection for replacement strategies using selection index theory.
      ;
      • Hjortø L.
      • Ettema J.F.
      • Kargo M.
      • Sørensen A.C.
      Genomic testing interacts with reproductive surplus in reducing genetic lag and increasing economic net return.
      ;
      • Clasen J.B.
      • Kargo M.
      • Østergaard S.
      • Fikse W.F.
      • Rydhmer L.
      • Strandberg E.
      Genetic consequences of terminal crossbreeding, genomic test, sexed semen, and beef semen in dairy herds.
      ). In the latter case, it is not only the breeding value that matters, but the level of heterosis also contributes to the genetic potential for production. Therefore, estimated total genetic value (ETGV), including heterosis, for crossbred heifers can be useful. Expected heterosis based on pedigree information can be fitted in genetic evaluations (e.g.,
      • Lidauer M.H.
      • Mäntysaari E.A.
      • Strandén I.
      • Pösö J.
      • Pedersen J.
      • Nielsen U.S.
      • Johansson K.
      • Eriksson J.-A. A.
      • Madsen P.
      • Aamand G.P.
      Random heterosis and recombination loss effects in a multibreed evaluation for Nordic Red Dairy Cattle. Abstract c24–02 in Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil.
      ). This kind of heterozygosity estimate accounts for the difference in the average performance of crossbred animals due to the expected increase in heterozygosity when alleles come from different breeds. However, for animals with at least 1 crossbred parent (e.g., 3-way crosses and backcrosses), the true number of loci that carry alleles from different breeds may differ from the expectation. Thus, more accurate estimates of heterozygosity, based on genomic information could be useful for predicting heterosis for crossbred dairy heifers.
      Chromosome segments that have different BOA of the 2 haplotypes are more likely to have heterozygous QTL, than when both haplotypes have the same BOA. Estimated BOA can therefore give important information about increased heterozygosity of QTL in crossbred animals. First, estimated BOA gives the proportion of marker loci over the whole genome that have alleles originating in different breeds [i.e., global breed heterozygosity (GBH)]. Second, BOA could give information on whether the 2 alleles in marker loci or chromosome segments originated from the same breed or from 2 different breeds. We call this local breed heterozygosity (LBH). The LBH allows estimation of the effect of genome region-specific breed heterozygosity. An alternative could be to include genotype heterozygosity with dominance model (
      • Xiang T.
      • Christensen O.F.
      • Vitezica Z.G.
      • Legarra A.
      Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs.
      ;
      • Doekes H.P.
      • Bijma P.
      • Veerkamp R.F.
      • de Jong G.
      • Wientjes Y.C.J.
      • Windig J.J.
      Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle.
      ).
      The aims of this study were, first, to present how estimated BOA can be used to model the effects of GBP, LBP, GBH, and LBH in the genome of crossbred animals. Second, to compare model fit of genomic models with or without LBP or LBH effects. Third, to compare cross-validation predictive ability for GEBV and ETGV calculation using the same models. The investigation was made for milk production traits in Danish crossbred dairy cows. All models used solutions from the genomic evaluations of the contributing pure breeds as a starting point, but our aim was to investigate whether additional effects (e.g., effects of LBP and LBH) estimated from phenotypes and genotypes of the crossbred cows could improve the model fit and predictive ability.

      MATERIALS AND METHODS

      No animals were used in this study, and ethical approval for the use of animals was thus deemed unnecessary.

      Data

      Data had 5,214 Danish crossbred dairy cows born in the years 2012 to 2018 from 74 herds. The study focused on crosses of the Holstein (H), Jersey (J), and Nordic Red dairy cattle (R) breeds. Therefore, we excluded crossbred cows with contribution of more than 1/16 from other breeds according to registered pedigree. The cows were genotyped using EuroG MD chip (
      • EuroGenomics
      EuroGenomics genotyping microarray.
      ). In addition, genotypes of 7,500 purebred animals, 2,500 from each of the 3 breeds, H, J, and R, genotyped on various 50k SNP chips, were included for the imputing, phasing, and assigning BOA steps. Figure 1 has a principal component plot, showing the first 2 principal components of the genomic relationship matrix between the crossbred cows, as well as 3,000 purebred animals, 1,000 from each of the 3 breeds. We phased and imputed the genotypes to a set of 50,684 markers using FImpute (
      • Sargolzaei M.
      • Chesnais J.P.
      • Schenkel F.S.
      A new approach for efficient genotype imputation using information from relatives.
      ). We estimated BOA in the genotypes of the crossbred cows using the AllOr method (
      • Eiríksson J.H.
      • Karaman E.
      • Su G.
      • Christensen O.F.
      Breed of origin of alleles and genomic predictions for crossbred dairy cows.
      ). Most of the cows included in this study were also included in
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      , which contains further details on genotypes, imputation, and the BOA assignment. The phenotypes were 305-d lactation milk (MY), fat (FY), and protein (PY) yields for the first 3 parities of the cows, a total of 11,001 records.
      Figure thumbnail gr1
      Figure 1First (PC1) and second (PC2) principle components of genomic relationships for 1,000 Holstein (black), 1,000 Jersey (yellow), 1,000 Red dairy cattle (red) and 5,214 crossbred (dark gray) animals. Percentages are proportion of variance explained by the respective principle components.
      We calculated GEBV for the crossbred cows based on BOA as described in (
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      ). In short, the GEBV were calculated using solutions from the separate genomic evaluations of the pure breeds. Using estimated BOA, we split the gene content of the crossbred cows into components, each component counting the reference alleles with assigned origin in one of the pure breeds. We multiplied each component with estimated marker effects from the respective breed. We call these GEBV primary genomic estimated breeding values (pGEBV). The separate purebred evaluations came from the milk production breeding value estimations computed by Nordic Cattle Genetic Evaluation in December 2021 (
      • NAV
      NAV routine genetic evaluation of dairy cattle—Data and genetic models.
      ) for each of the breeds (H, J, and R). Because we wanted to investigate if useful breed level correction factors could be estimated from the phenotypes of crossbred, we did not correct pGEBV for differences in breed levels.
      We constructed 2 sets of corrected phenotypes, y1 and y2, with and without correction for breed heterozygosity, respectively. In both cases, we corrected the original phenotypes for fixed effects and pGEBV. The phenotype corrected for heterozygosity was
      y1=yX1β^1Za^pGEBV,


      where the y vector contains the original phenotypes, the X1 matrix connects the fixed effects to the phenotypes, the β^1 vector contains the fixed effect solutions of parity, herd-year, calving age, and breed heterozygosity from an animal model, Z is an incidence matrix connecting phenotypes to cows, and a^pGEBV is the vector of pGEBV for the cows. The β^1 solutions were estimated with a pedigree-based animal model using a larger dataset, which had 207,116 lactation yields from 96,798 crossbred and purebred cows. The cows were in the same herds as the genotyped crossbred cows with phenotypes in y. In that way, information from the purebred cows, and the crossbred cows without genotypes, strengthened the estimation of the fixed effects. Similarly, for the phenotypes not corrected for heterozygosity, we had
      y2=yX2β^2Za^pGEBV,


      where X2β^2 includes the same effects as X1β^1, except the breed heterozygosity effect was left out. The β^1 and β^2 fixed effects were estimates from the same model.

      Global and Local Effects

      Here, we explain the construction of global (across all loci) and local (for individual marker loci or chromosome segments) indicators of breed proportions and breed heterozygosity. The BOA output from AllOr contains, separate for the 2 gametes of the crossbred cows, an estimate of which breed the allele originated. From that, we built vectors of length equal to the number of markers (m = 50,684) for each animal i, gamete j (j = 1, 2) and breed b (b = H, J, R), denoted si,jb (i.e., 6 vectors for each animal). A marker in vector si,jb had the value 1 for allele j assigned to breed b but 0 for allele j assigned to another breed. If the allele could not be assigned to a definite BOA, the corresponding value in si,jb was a number between 0 and 1, according to the rules described for the output of AllOr (
      • Eiríksson J.H.
      • Karaman E.
      • Su G.
      • Christensen O.F.
      Breed of origin of alleles and genomic predictions for crossbred dairy cows.
      ). For each marker and gamete, the sum across breeds was 1; that is, b{H,J,R}si,jb=1.
      Based on vectors si,jb, we built 2 types of estimates of the breed composition for each animal: GBP and LBP. We defined the LBP vector for animal i as pib=(si,1b+si,2b)/2. The GBP of animal i, denoted as pib¯, was the average of the values in pib. Figure 2 presents the distribution of GBP (pib¯) in the crossbred animals. Additionally, to show the deviation in GBP from the expectation based on pedigree, we looked separately at the 695 3-way crossbred with H maternal grandsire or maternal granddam in the data. Figure 3 presents the distribution of GBP of H in this subset.
      Figure thumbnail gr2
      Figure 2The distribution of breed proportions of Holstein, Jersey, and Red dairy cattle (RDC) in the 5,214 crossbred cows based on assigned breed of origin of alleles.
      Figure thumbnail gr3
      Figure 3Distribution of global breed proportions of Holstein in 695 3-way crossbred cows with Holstein maternal grandsire or maternal granddam.
      In addition to the single marker-based LBP vector, we defined LBP considering segments of 100 adjacent markers. For each animal i and breed b, the segment-based LBP vectors were constructed as an average of nonoverlapping chromosome segments of 100 adjacent values from pib, resulting in vector p(100),ib of length 506 marker segments. The last segment of each chromosome was combined with the second-to-last segment if its length was less than half of the defined segment length.
      Similarly to the LBP vector, we constructed LBH for animal i between breeds b and b' as tib,b=si,1bsi,2b+si,1bsi,2b, where ○ is the element-wise multiplication and bb. We estimated GBH as the average of the values in tib,b, denoted tib,b¯, which represents the proportion of loci that are assigned to 2 different breeds over the whole genome. Further, we constructed LBH of chromosome segments of length 100, denoted t(100),ib,b, which were based on the average of values from tib,b as described for LBP. Figure 4 presents the distribution of GBH values for the crossbred animals.
      Figure thumbnail gr4
      Figure 4The distribution of global breed heterozygosity for Holstein/Red dairy cattle pairs (H×R), Holstein/Jersey pairs (H×J), and Jersey/Red dairy cattle pairs (J×R), for the 5,214 crossbred cows based on assigned breed of origin of alleles.

      Models for GEBV

      Four types of models for predicting the part of the breeding values of crossbred cows that was not captured by pGEBV are presented. First, a simple model having GBP, second, models having GBP and LBP effects, third, a model having GBP and a residual additive genetic (RA) effect by integrating an across crossbred genomic relationship matrix, and fourth, a model having GBP, LBP, and RA effects.
      The simplest model for predicting GEBV had only regression on GBP and was named base model (BaseM). The model was
      y1=1μ+ZfHηH+ZfJηJ+Zepe+e,
      [1]


      where μ is the intercept, 1 is a vector of ones, vector fb contains breed proportions pib¯ for breed b{H,J,R} for all animals, ηb is the fixed effect of GBP of breed b, epe is vector of random permanent environment effects and e is the vector of random residuals. It was assumed that epeN(0,Iσpe2) and eN(0,Iσe2), where σpe2 is the permanent environment variance and σe2 is the residual variance. Note that fb were only included for 2 breeds (H and J), because when an intercept is included in the model and fH + fJ + fR = 1, there is no need to include fixed regression on all 3 breed proportions.
      The local breed proportion model (LBPM) extended BaseM by including regression on LBP as random effects in a manner similar to having marker effects in a SNP-BLUP model. The LBP effects were included either for individual markers [LBPM(1)] or segments of 100 [LBPM(100)] markers. The model had the same general structure whether LBP was modeled for individual markers or for marker segments:
      y1=1μ+ZfHηH+ZfJηJ+ZPHγH+ZPJγJ+ZPRγR+Zepe+e,
      [2]


      where P*b = Pbfb1', and γb is a vector with the regression coefficients on LBP for breed b. Here, matrix Pb contains the pib(p(100),ib) for each animal as a row. Consequently, P*b contains the deviations from the breed proportion for each marker (segment). It was assumed that γbN(0,Iσγ,b2), where σγ,b2 is LBP marker (segment) variance for breed b.
      When there are more marker segments than animals, a computationally more efficient equivalent LBPM can be constructed using LBP similarity matrices. The size of the resulting system of equations becomes proportional to the number of animals, rather than the number of markers, which is computationally advantageous, and the solutions are invariant to the transformation. Then, the LBP effect of breed b for the crossbred cows is vb = P*b γb, and the LBP similarity matrix Qb for breed b was constructed as Qb=PbPbλv, where λv is a normalizing constant defined as λv=tr(PbPb)n, where n is the number of animals. The related LBP variance for breed b is σv,b2=σγ,b2λv.
      An alternative to using LBP to model the remaining genetic variation in y1, unexplained by pGEBV, is to use the genotypes directly. Thus, we investigated whether there was an additive genetic effect left in the crossbred data, here named RA genetic effect. The theoretical background for coding and scaling based on allele frequencies of base population for genetic relationship matrices, such as
      • VanRaden P.M.
      Efficient methods to compute genomic predictions.
      , are not applicable here, because, first, most of the additive genetic variance has already been accounted for, and second, the cows are crossbred rather than from a uniform population. Therefore, we constructed the RA relationship matrix as the normalized genomic additive relationship matrix (
      • Vitezica Z.G.
      • Varona L.
      • Elsen J.-M.
      • Misztal I.
      • Herring W.
      • Legarra A.
      Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs.
      ): G=MMtr(MM)/n, where the M matrix contains the marker genotypes of the crossbred cows (irrespective of BOA) coded as −1, 0, and 1 for genotypes aa, Aa, and AA, respectively. In that way, we make no assumptions on the allele frequency of base population animals. The residual additive effects model (RAM) is
      y1=1μ+ZfHηH+ZfJηJ+Za+Zepe+e,
      [3]


      where aN(0,Gσa2), and σa2 is the RA variance. Note that σa2 cannot be interpreted as traditional additive genetic variance.
      Further, we investigated a model that combined the LBP effects and the RA effects (RA+LBPM). Thus, the model included the Za term from Equation 3 in addition to all the terms in Equation 2. We only tested RA+LBPM for segment length of 100 for LBP, denoted RA+LBPM(100).

      Models for ETGV

      We calculated ETGV as the estimated breeding value plus effects of heterozygosity, based on 2 main models with heterozygosity from genomic information. First, we tested models based on BOA (i.e., with GBH, and with and without accounting for LBH). Second, we modeled heterozygosity based on the genotype directly, either by considering genome-wide genotype heterozygosity, or by additionally accounting for local genotype heterozygosity using a dominance relationship matrix.
      The global BOA heterozygosity model (BOA-HM) included both GBH from BOA information and the terms in BaseM in Equation 1:
      y2=1μ+ZfHηH+ZfJηJ+ZhH,JτH,J+ZhH,RτH,R+ZhJ,RτJ,R+Zepe+e,
      [4]


      where hb,b' is a vector with tib,b¯ for each animal i, and τb,b is the fixed effect of proportions of loci with 1 allele assigned to breeds b and 1 allele assigned to breed b'.
      For comparison, we tested the pedigree heterozygosity model (Ped-HM), where we replaced hb,b in Equation 4 with a pedigree-based estimate of breed heterozygosity, hpedb,b. We calculated the elements of hpedb,b as hped,ib,b=fped,sb×fped,db+fped,sb×fped,db, where fped,sb and fped,db are pedigree-based breed proportions of sire and dam of animal i, respectively.
      The local breed heterozygosity model (LB-HM) extends BOA-HM by including random LBH effects. The LB-HM has the following form, regardless of whether LBH is modeled by single markers [LB-HM(1)] or marker segments [LB-HM(100)]:
      y2=1μ+ZfHηH+ZfJηJ+ZhH,JτH,J+ZhH,RτH,R+ZhJ,RτJ,R+ZTH,JδH,J+ZTH,RδH,R+ZTJ,RδJ,R+Zepe+e,
      [5]


      where Tb,b=Tb,bhb,b1. The rows of the Tb,b matrix contains the LBH vectors, tib,b(t(100),ib,b) for each animal, and therefore Tb,b contains the local deviations from the GBH of each cow. Further, δb,bN(0,Iσδ,b,b2) are the effects of LBH between breeds b and b', where σδ,b,b2 is the marker (segment) LBH variance. The LB-HM has an equivalent model with LBH similarity matrices instead of the Tb,bδb,b terms. In that case, the LBH effect of breed pair b and b' for the crossbred cows is ub,b=Tb,bδb,b and the LBP similarity matrices were constructed as Kb,b=Tb,bTb,bλu, where λu is a normalizing constant defined as λu=tr(Tb,bTb,b)n. The related LBH variance is σu,b,b2=σδ,b,b2λu.
      In addition to considering breed heterozygosity based on the estimated BOA, we tested models that included the genotype heterozygosity instead of breed heterozygosity. Thus, the model had the marker genotype heterozygosity, irrespective of BOA (i.e., assuming heterozygosity of a locus was the same for all pairs of breeds). First, we considered a genotype heterozygosity model (GT-HM). Instead of including GBH effect hb,bτb,b as in Equation 4, the GT-HM had an effect of the proportion of loci that are heterozygous across the genome:
      y2=1μ+ZfHηH+ZfJηJ+Zkκ+Zepe+e,
      [6]


      where vector k contains the proportion of marker loci that are heterozygous for each animal and κ is the fixed effect of genotype heterozygosity. Genotype heterozygosity can also be considered using the dominance relationship matrix, which accounts for heterozygosity at individual markers. The dominance heterozygosity model (D-HM) extends GT-HM:
      y2=1μ+ZfHηH+ZfJηJ+Zkκ+Zd+Zepe+e,
      [7]


      where the d vector has the random dominance deviation effects for each animal. It was assumed that dN(0,Dσd2), where D=WWtr(WW)/n is the genomic dominance relationship matrix following
      • Vitezica Z.G.
      • Varona L.
      • Elsen J.-M.
      • Misztal I.
      • Herring W.
      • Legarra A.
      Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs.
      . The W matrix has the genotypes coded as 1 for heterozygote and 0 for either homozygote, and σd2 is the dominance variance. Similar to the estimated RA variance in RAM, the σd2 estimated here cannot be interpreted as a population dominance variance (
      • Vitezica Z.G.
      • Varona L.
      • Elsen J.-M.
      • Misztal I.
      • Herring W.
      • Legarra A.
      Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs.
      ).

      Model Fit and Variance Components

      We tested the models on the data presented above. First, we estimated variance components for the models using all data and compared the goodness-of-fit of the models.
      For the LBPM models, we used the marker-based SNP-BLUP type models (Equation [2]) for segment lengths 100 markers, but the equivalent similarity matrix model for LBPM(1). Similarly, we used the SNP-BLUP version in Equation [5] for LB-HM(100), and the equivalent similarity matrix models for LB-HM(1).
      We built and inverted the relationship and similarity matrices using Julia (
      • Bezanson J.
      • Edelman A.
      • Karpinski S.
      • Shah V.B.
      Julia: A fresh approach to numerical computing.
      ). We added a small value, 0.0001, to the diagonal of the LBP and LBH similarity matrices to ensure that they were invertible. We estimated REML variance components for all models based on all data using the average-information REML algorithm in the DMU package (
      • Madsen P.
      • Jensen J.
      DMU, A package of analyzing multivariate mixed models. Version 6, release 5.2.
      ). However, for a few models, the algorithm did not converge. In those instances, we used the EM-REML algorithm instead, also using the DMU package.
      We compared the goodness-of-fit of the more complicated models to the models having fixed global effects (i.e., BaseM, BOA-HM or GT-HM) with the likelihood ratio test. Significance in likelihood ratio test was determined by comparing difference in −2log(L), between models with a mixture (50/50) of 2 chi-squared distributions with Nm-Nbase and Nm-Nbase −1 degrees of freedom. Here, L is the REML likelihood, Nm is the number of parameters in the more advanced model and Nbase is the number of parameters in the nested model. Additionally, we computed Bayesian information criterion [BIC; BIC = −2log(L)+Nm × log(n-Nfixed)], where n is the number of animals and Nfixed is the number of fixed effects (including intercept), and Akaike information criterion [AIC; AIC = −2log(L)+2Nm] for all the models (
      • Meyer K.
      Estimates of direct and maternal covariance functions for growth of Australian beef calves from birth to weaning.
      ). Consequently, lower AIC or BIC indicates models with higher likelihood, with a penalty for the number of parameters. Therefore, the models with the lowest AIC or BIC are preferred.

      Cross-Validation

      We assessed the predictive ability of the models using random 5-fold cross-validation using similar settings as in
      • Aliloo H.
      • Pryce J.E.
      • González-Recio O.
      • Cocks B.G.
      • Hayes B.J.
      Accounting for dominance to improve genomic evaluations of dairy cows for fertility and milk production traits.
      . We constructed 5 nonoverlapping validation sets out of the 5,214 crossbred cows of approximately equal size. The split was done randomly with 2 restrictions: first, paternal half-sib groups were kept in the same set, and second, cows that had daughters among the crossbred cows were not included in the validation set to avoid the unrealistic scenario of predicting dams based on information from their daughters. For each of the validation sets, we set the phenotypes in y1(y2) as missing for the validation animals and estimated the effects in the models based on the remaining data. The total GEBV for animal i was subsequently constructed as GEBVi=a^pGEBV,i+y^1,i, where y^1,i is the sum of all estimated effects, except the permanent environment, in each model, including the intercept.
      Predictive ability of the models for predicting GEBV, denoted rGEBV,y, was estimated as weighted correlation between total GEBV and corrected phenotypes adjusted for their respective fixed effects. If a cow had yield records from multiple lactations (i.e., from second and third in addition to the first lactation), we used an average of the 2 or 3 lactation yields. The weighted correlation used the reliability based on the number of records and assuming the heritability of 0.30 and the repeatability of 0.45 as weights (the weights ranged from 0.3 to 0.47). Using reliability as weights follows the Interbull standard validation test for genomic predictions (
      • Mäntysaari E.A.
      • Liu Z.
      • VanRaden P.
      Interbull validation test for genomic evaluations.
      ). We also estimated the dispersion bias of the prediction with the b1 coefficient from the weighted linear regression of y1 on GEBV.
      We evaluated the predicted total genetic values from the ETGV models in a similar manner as for the GEBV models. We calculated the ETGV for animal i as ETGVi=a^pGEBV,i+y^2,i, where y^2,i is the sum of all estimated effects, except permanent environment, from the respective model for cow i, including the intercept. The ETGV were compared to the corrected phenotypes that were not corrected for heterozygosity, y2 = yX2β2. The predictive ability correlation (rETGV,y) and b1 was assessed in the same manner as for GEBV.
      To get an estimate of the sampling variance of the correlation estimates and to assess whether they were different, we used 10,000 bootstrap samples of the validation animal GEBV and corrected phenotypes for each trait. From each sample, we calculated rGEBV,y and b1 from the models. We tested statistical significance at P < 0.05 for each pair of models from the distribution of the difference of rGEBV,y and b1 between the alternative models [i.e., whether 0 (no difference) was within the 95% CI of the differences in rGEBV,y and b1]. We performed the bootstrap procedure in the same way to test for differences in the models for ETGV. Note that this testing of difference between 2 models takes into account that the estimated rGEBV,y and b1 are strongly correlated between the alternative models.

      RESULTS

      Model Fit and Variance Components

      Table 1 shows the estimated variance components, −2log(L), BIC, and AIC for models for the BaseM, LBPM(1), LBPM(100), RAM, and RA+LBPM(100) models for GEBV. The LBPM(1), LBPM(100), RAM, and RA+LBPM(100) models fitted the data significantly (P < 0.01) better than the BaseM for all 3 traits based on likelihood ratio test. For all 3 traits, BIC was lowest for RAM whereas AIC was lowest for RA+LBPM(100).
      Table 1Estimated variance components, the log of the maximum likelihood [−2log(L)], Bayesian information criterion (BIC), and Akaike information criterion (AIC) in models for predicting breeding value
      σv,H2 = Local breed proportion variance for Holstein; σv,J2 = local breed proportion variance for Jersey; σv,R2 = local breed proportion variance for Red dairy cattle; σa2 = residual additive genetic variance; σpe2 = permanent environment variance; σe2 = residual variance. The value for BaseM was subtracted from all −2log(L), BIC, and AIC.
      Model
      BaseM = base model, fitting global breed proportions; LBPM(m) = local breed proportion model with segment length of m markers; RAM = residual additive effect model; RA+LBPM(100) = residual additive effect model including local breed proportion effects with segment length of 100 markers.
      σv,H2σv,J2σv,R2σa2σpe2σe2−2log(L)BICAIC
      Milk yield
      Variances for milk yield are given in (10 kg)2.
       BaseM9,01416,550000
       LBPM(1)2070998,72016,552−19
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      7−13
       LBPM(100)1981988,72216,552−19
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      7−13
       RAM3,6647,25816,515−56
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −47−54
       RA+LBPM(100)1540853,3047,18616,518−67
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −33−59
      Fat yield
      Variances for fat yield and protein yield are given in kg2.
       BaseM1,5552,943000
       LBPM(1)51701,5032,943−16
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      10−10
       LBPM(100)48701,5062,943−15
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      10−9
       RAM5581,2922,937−40
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −31−38
       RA+LBPM(100)45055161,2682,937−51
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −17−43
      Protein yield
      Variances for fat yield and protein yield are given in kg2.
       BaseM1,0391,761000
       LBPM(1)25071,0091,761−18
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      8−12
       LBPM(100)23071,0101,761−18
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      8−12
       RAM4628211,756−62
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −54−60
       RA+LBPM(100)19064308111,756−74
      Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      −39−66
      1 σv,H2 = Local breed proportion variance for Holstein; σv,J2 = local breed proportion variance for Jersey; σv,R2 = local breed proportion variance for Red dairy cattle; σa2 = residual additive genetic variance; σpe2 = permanent environment variance; σe2 = residual variance. The value for BaseM was subtracted from all −2log(L), BIC, and AIC.
      2 BaseM = base model, fitting global breed proportions; LBPM(m) = local breed proportion model with segment length of m markers; RAM = residual additive effect model; RA+LBPM(100) = residual additive effect model including local breed proportion effects with segment length of 100 markers.
      3 Variances for milk yield are given in (10 kg)2.
      4 Variances for fat yield and protein yield are given in kg2.
      * Models have significantly (P < 0.01) better fit than the BaseM model based on likelihood ratio test.
      The total of estimated LBP variance summed over the 3 breeds for MY from the LBPM(1) was 30,600 kg2, which equals 2.9% of the estimated genetic variance from the pedigree-based animal model. The LBP variance for MY accounted for 0.9% of the phenotypic variance. The estimated LBP variances for all traits were slightly lower when we considered the LBP for chromosome segments rather than individual markers. For MY, the estimated LBP variances were 29,600 kg2 for LBPM(100). The total estimated LBP variances from LBPM(1) were 57.9 and 31.2 kg2 for FY and PY, respectively, summed over the 3 breeds. The total estimated LBP variances were 1.2 and 1.0% of the phenotypic variance for FY and PY, respectively.
      Table 2 has the estimated variance components, −2log(L), BIC and AIC for Ped-HM, BOA-HM, LB-HM(1), LB-HM(100), GT-HM, and D-HM models for ETGV. For PY and MY, inclusion of LBH effects did not improve goodness-of-fit. Further, BIC and AIC were lowest for BOA-HM and the estimated LBH variances were not significantly different from zero for any of the models for MY and PY. For FY, however, the LB-HM models fitted the data significantly (P < 0.01) better than the BOA-HM based on likelihood ratio test. The AIC was also lower for the full LB-HM models, irrespective of segment length, than for BOA-HM for FY. Because of the different fixed effects in the models, the GT-HM and D-HM models could not be compared with the Ped-HM, BOA-HM, and LB-HM models based on the REML likelihoods. For all 3 traits, the D-HM fitted the data significantly (P < 0.01) better than the GT-HM based on likelihood ratio test and had lower BIC and AIC.
      Table 2Estimated variance components, the log of the maximum likelihood [−2log(L)], Bayesian information criterion (BIC), and Akaike information criterion (AIC) in models for predicting total genetic value
      sigmav,H,R2 = Local breed heterozygosity variance for Holstein/Red dairy cattle cross; σv,H,J2 = local breed heterozygosity variance for Holstein/Jersey cross; σv,J,R2 = local breed heterozygosity variance for Jersey/Red dairy cattle cross; σd2 = dominance variance; σpe2 = permanent environment variance; σe2 = residual variance. The value for BOA-HM was subtracted from all −2log(L), BIC, and AIC.
      Model
      Ped-HM = pedigree global heterozygosity model. BOA-HM = breed of origin global heterozygosity model. LB-HM(m) = local breed heterozygosity model with segment length of m markers. GT-HM = global genotype heterozygosity model. D-HM = dominance model.
      σu,H,R2σu,H,J2σu,J,R2σd2σpe2σe2−2log(L)BICAIC
      Milk yield
      Variances are given in (10 kg)2.
       Ped-HM8,97816,547101010
       BOA-HM8,94716,546000
       LB-HM(1)8008,94016,5460266
       LB-HM(100)9408,93916,5460266
       GT-HM8,32617,498300
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      300
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      300
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
       D-HM2,7765,85217,478265
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      Models have significantly (P < 0.01) better fit than the GT-HM model based on likelihood ratio test.
      273
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      267
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      Fat yield
      Variances are given in kg2.
       Ped-HM1,5432,943111111
       BOA-HM1,5372,943000
       LB-HM(1)61321,4962,944−11
      Models have significantly (P < 0.01) better fit than the BOA-HM model based on likelihood ratio test.
      15−5
       LB-HM(100)61311,4972,944−11
      Models have significantly (P < 0.01) better fit than the BOA-HM model based on likelihood ratio test.
      15−5
       GT-HM1,3943,182422
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      422
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      422
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
       D-HM6731,0113,178398
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      Models have significantly (P < 0.01) better fit than the GT-HM model based on likelihood ratio test.
      406
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      400
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      Protein yield
      Variances are given in kg2.
       Ped-HM1,0021,786484848
       BOA-HM1,0281,760000
       LB-HM(1)0011,0271,7600266
       LB-HM(100)0011,0281,7600266
       GT-HM1,0021,78655
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      55
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      55
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
       D-HM5486891,78315
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      Models have significantly (P < 0.01) better fit than the GT-HM model based on likelihood ratio test.
      23
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      17
      Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      1 sigmav,H,R2 = Local breed heterozygosity variance for Holstein/Red dairy cattle cross; σv,H,J2 = local breed heterozygosity variance for Holstein/Jersey cross; σv,J,R2 = local breed heterozygosity variance for Jersey/Red dairy cattle cross; σd2 = dominance variance; σpe2 = permanent environment variance; σe2 = residual variance. The value for BOA-HM was subtracted from all −2log(L), BIC, and AIC.
      2 Ped-HM = pedigree global heterozygosity model. BOA-HM = breed of origin global heterozygosity model. LB-HM(m) = local breed heterozygosity model with segment length of m markers. GT-HM = global genotype heterozygosity model. D-HM = dominance model.
      3 Variances are given in (10 kg)2.
      4 Variances are given in kg2.
      5 Not comparable to Ped-HM, BOA-HM, and LB-HM models because the fixed effects are different.
      * Models have significantly (P < 0.01) better fit than the BOA-HM model based on likelihood ratio test.
      ** Models have significantly (P < 0.01) better fit than the GT-HM model based on likelihood ratio test.

      Cross-Validation

      The cross-validation results for calculating GEBV are in Table 3. Differences between the models were generally small, with the largest difference between the highest and the lowest rGEBV,y being 0.012 for PY. The highest rGEBV,y was from the RA+LBPM(100) for all 3 traits, 0.581, 0.396, and 0.432 for MY, FY, and PY, respectively. The standard deviations of the rGEBV,y across the 10,000 bootstrap samples were 0.010, 0.012, and 0.012, for MY, FY, and PY, respectively, with only very minor differences between models. For all 3 traits, the LBPM models gave significantly (P < 0.05) higher rGEBV,y than BaseM. The RAM and the LBPM models had similar rGEBV,y for all 3 traits, significantly (P < 0.05) higher rGEBV,y than from BaseM with the exception of RAM for FY where the difference was not significant.
      Table 3Correlations between genomic estimated breeding values and corrected phenotypes (rGEBV,y) and dispersion bias (b1) of prediction for milk yield, fat yield, and protein yield
      Model
      BaseM = base model, fitting global breed proportions; LBPM(m) = local breed proportion model with segment length of m markers; RAM = residual additive effect model; RA+LBPM(100) = residual additive effect model including local breed proportion effects with segment length of 100 markers.
      Milk yieldFat yieldProtein yield
      rGEBV,yb1rGEBV,yb1rGEBV,yb1
      BaseM0.574
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1.000
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.387
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.885
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.420
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.961
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      LBPM(1)0.577
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1.009
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.392
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.906
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.425
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.981
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      LBPM(100)0.578
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1.011
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.392
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.906
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.426
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.982
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      RAM0.578
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.983
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.391
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.881
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.428
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.944
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      RA+LBPM(100)0.581
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.993
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.396
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.901
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.432
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.961
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      a–c Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1 BaseM = base model, fitting global breed proportions; LBPM(m) = local breed proportion model with segment length of m markers; RAM = residual additive effect model; RA+LBPM(100) = residual additive effect model including local breed proportion effects with segment length of 100 markers.
      In general, the b1 coefficients for GEBV were close to 1 for all models for MY and PY, indicating low dispersion bias. The values ranged from 0.983 to 1.011 for MY, and 0.944 to 0.982 for PY. The predictions were somewhat inflated for FY, with b1 ranging from 0.881 to 0.907 for FY across models. For all 3 traits, the lowest b1 was from RAM and the highest from LBPM(100).
      The cross-validation results for ETGV are in Table 4. The rETGV,y and b1 values across the models were almost identical for all traits. The range in rETGV,y was from 0.568 to 0.573 for MY, 0.425 to 0.429 for FY, and 0.399 to 0.407 for PY. Standard deviations of rETGV,y across the 10,000 bootstrap samples were 0.010, 0.012, and 0.012, for MY, FY, and PY, respectively, for all tested models. Among the models that only included fixed regressions on genome-wide heterozygosity and no random effects (i.e., Ped-HM, BOA-HM, and GT-HM), the BOA-HM had the highest rETGV,y for all the traits. However, the differences in rETGV,y and b1 between these models were not statistically significant. Similar to the model fit results, the cross-validation results for FY from the LB-HM models differed from those for MY and PY. For MY and PY, the rETGV,y from BOA-HM and both LB-HM models were almost identical. The highest rETGV,y for MY and PY was achieved by D-HM, significantly (P < 0.05) higher than from GT-HM. However, the differences were not significant when compared with the BOA-HM and the LB-HM models. For FY, the LB-HM models had the highest rETGV,y for both segment lengths, significantly higher than BOA-HM. Further, the D-HM had similar rETGV,y as the LB-HM models, and significantly higher than GT-HM. Similar to the effects on rETGV,y, the inclusion of LBH effects did not affect b1 for MY and PY. The b1 for FY were lower than for the other traits, but without considerable differences between the studied models.
      Table 4Correlations between estimated total genetic value and corrected phenotypes (rETGV,y) and dispersion bias (b1) of prediction for milk yield, fat yield, and protein yield
      Model
      Ped-HM = Pedigree heterozygosity model; BOA-HM = global breed heterozygosity model; LB-HM(m) = local breed heterozygosity model with segment length of m markers; GT-HM = global genotype heterozygosity model; D-HM = dominance model.
      Milk yieldFat yieldProtein yield
      rETGV,yb1rETGV,yb1rETGV,yb1
      Ped-HM0.568
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1.000
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.425
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.907
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.400
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.960
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      BOA-HM0.569
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.999
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.426
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.907
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.402
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.961
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      LB-HM(1)0.569
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.999
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.429
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.913
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.402
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.962
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      LB-HM(100)0.569
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.999
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.429
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.913
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.402
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.962
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      GT-HM0.568
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1.000
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.423
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.906
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.399
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.958
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      D-HM0.573
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.992
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.429
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.905
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.407
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      0.953
      Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      a–d Differences in correlations and b1 between models with a common superscript are not statistically significant (P < 0.05) based on the confidence interval of the differences between models in 10,000 bootstrap samples.
      1 Ped-HM = Pedigree heterozygosity model; BOA-HM = global breed heterozygosity model; LB-HM(m) = local breed heterozygosity model with segment length of m markers; GT-HM = global genotype heterozygosity model; D-HM = dominance model.

      DISCUSSION

      We have presented an investigation about whether the assigned BOA could provide information on breed proportions and breed heterozygosity that are useful for genomic prediction, in addition to BOA allowing for breed-specific marker effects. We found that inclusion of either LBP or RA did improve model fit for genomic models for production traits of Danish crossbred dairy cows, and resulted in a slight improvement of predictive ability, of up to 1 percentage point. Local effects of heterozygosity did not improve fit or predictive ability for ETGV for MY and PY, but for FY a statistically significant LBH effect of J × R was found.

      Breed Proportions

      In BOA models for the calculation of GEBV for the crossbred animals,
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      and
      • Guillenea A.
      • Su G.
      • Lund M.S.
      • Karaman E.
      Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles.
      accounted for breed levels based on estimates of breed proportions from BOA assignment, similarly as GBP. Other studies have used pedigree-based estimates of breed proportions (
      • Makgahlela M.L.
      • Mäntysaari E.A.
      • Strandén I.
      • Koivula M.
      • Nielsen U.S.
      • Sillanpää M.J.
      • Juga J.
      Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
      ), breed base representation (
      • VanRaden P.M.
      • Tooker M.E.
      • Chud T.C.S.
      • Norman H.D.
      • Megonigal Jr., J.H.
      • Haagen I.W.
      • Wiggans G.R.
      Genomic predictions for crossbred dairy cattle.
      ), or the output from the Admixture software (
      • Khansefid M.
      • Goddard M.E.
      • Haile-Mariam M.
      • Konstantinov K.V.
      • Schrooten C.
      • de Jong G.
      • Jewell E.G.
      • O'Connor E.
      • Pryce J.E.
      • Daetwyler H.D.
      • MacLeod I.M.
      Improving genomic prediction of crossbred and purebred dairy cattle.
      ) to account for breed proportions in crossbred animals for genomic prediction. Genotype-based approaches should be able to account for deviations in proportion of genome from the expectation based on breed composition of parents. This is evident for 3-way terminal crosses, where the crossbred parent, typically the dam, has a breed proportion of 0.50 for both contributing breeds, but the genomic breed proportion of the offspring can vary considerably. Figure 3 shows how this distribution looks for the 3-way crosses in our data. Among the 695 cows with H grandsire or granddam, 18% had GBP of H outside the 0.20 to 0.30 range. Based on pedigree information they would all get the breed proportion 0.25.
      We are not aware of any published studies using BOA to investigate the effects of LBP as done in this study. Differences in the QTL allele frequencies between breeds can lead to differences in genetic levels of breeds. Modeling the breed levels globally, as in BaseM, only accounts for the effects of the QTL allele frequency difference averaged over the genome (i.e., it assumes that these effects are evenly distributed across the genome). When LBP is included, local differences in QTL allele frequencies are accommodated for. The estimated marker effects from the pure breeds, which we used for pGEBV, can only predict the effects of QTL that are both segregating within breed and in LD with markers that are also segregating within the breed. Consequently, QTL that are fixed in one breed, but segregating or fixed with another allele in the other breeds, are not contributing to pGEBV. However, the LBP effects capture the effect of having the chromosome segment with the QTL from the breed with the fixed QTL.
      The results in this study (Table 1) indicate a significant LBP variance in crossbred cows from H, J, and R breeds for production traits, particularly for the H proportion. However, for comparison of the estimated variances, the contribution of each breed to the crossbred group has to be considered. Of the 3 breeds considered, the contribution of H to the crossbred animals was the largest (Figure 2). The presented LBP variances, σv,b2, depend on λv, a normalizing constant including the trace of P*bP*b', and hence on the diagonal of the matrix. An animal without the breed b contribution has 0 as diagonal element in the Qb matrix. The same is true if both of its parents are purebred, such that the LBP is constant (0.5) throughout the genome. More than half of the animals in this study had no J contribution, which could partly explain the low LBP variance estimated for J compared to H and R. However, J is less related to the H and R breeds than the relationship between the H and R breeds (Figure 1 and
      • Gautason E.
      • Schönherz A.A.
      • Sahana G.
      • Guldbrandtsen B.
      Relationship of Icelandic cattle with Northern and Western European cattle breeds, admixture and population structure.
      ). Therefore, larger LBP effects could be expected for J, if the breed contributions were the same for all breeds, which was not the case in this study (Figure 2).
      • Christensen O.F.
      • Legarra A.
      • Lund M.S.
      • Su G.
      Genetic evaluation for three-way crossbreeding.
      constructed the segregation partial relationship matrix between the maternal breeds of 3-way crossbred animals based on BOA in a similar manner as we constructed the LBP effects in our study. Segregation between breeds such as LBP is related to difference in allele frequencies of QTL (
      • Lo L.L.
      • Fernando R.L.
      • Grossman M.
      Covariance between relatives in multibreed populations: Additive model.
      ). The segregation term is, however, defined between pairs of breeds (
      • Lo L.L.
      • Fernando R.L.
      • Grossman M.
      Covariance between relatives in multibreed populations: Additive model.
      ;
      • García-Cortés L.A.
      • Toro M.A.
      Multibreed analysis by splitting the breeding values.
      ), whereas LBP is defined only for a given breed. The relationship between LBP and breed segregation needs further investigation.
      The estimated LBP variances decreased slightly with increasing chromosome segment length for the LBP from 1 to 100 markers (Table 1). However, the segment length did not affect rGEBV,y. The longer the segment length, the more likely there are recombination points within segments that show difference from one breed origin to another. Consequently, the values in the p(100),ib vector becomes closer to GBP and, therefore, dilute the variation in LBP. Additionally, QTL with opposite effects are more likely to be within the same segments when the segments are long, also diluting possible LBP effects. Majority of the crossbred animals were from simple crosses such as F1, 3-way crosses, and first generations of rotational crossbreeding. Therefore, the genomes of the animals were expected to be constructed from relatively long segments originating from the pure breeds [Further information on the genotyped cows is in
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      ]. After applying crossbreeding for many generations, the chromosome segments with the same BOA are expected to be shorter. The differences between model fit and predictive ability between LBPM(1) and LBPM(100) might thus be larger for more complicated crossbreeding scenarios. We made additional tests using LBP models with segment length of 20 and 500 markers. For segment length of 20 markers, the results were almost identical to those from LBPM(1). For segment length of 500 markers, the estimated LBP variances were somewhat lower than those from LBPM(100), but the predictive ability was similar.
      In this study, we calculated pGEBV from estimated marker effects from separate purebred evaluations that used data from different purebred animals. Other authors (
      • Karaman E.
      • Su G.
      • Croue I.
      • Lund M.S.
      Genomic prediction using a reference population of multiple pure breeds and admixed individuals.
      ;
      • Guillenea A.
      • Su G.
      • Lund M.S.
      • Karaman E.
      Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles.
      ) have estimated within-breed marker effects in BOA models from combined data set of crossbred and purebred animals. In theory, LBP would also be relevant for such models because the marker effects depend on the within-breed-allele frequency.

      Residual Additive Effects

      The RAM considers the additive genetic effects twice. First, the within-breed genetic effect was predicted (i.e., pGEBV) based on solutions from the separate purebred evaluations. Second, the RA effect aims at capturing additive genetic effects that the within-breed prediction failed to capture. Among the reasons for the presence of RA variance could be, first, the different genetic background of purebred and crossbred animals, which results in different allele effects in the purebred and the crossbred animals (
      • Christensen O.F.
      • Madsen P.
      • Nielsen B.
      • Su G.
      Genomic evaluation of both purebred and crossbred performances.
      ). Second, the estimated marker effects from the purebred populations are estimates, and therefore not completely accurate despite large reference groups. Third, the phenotypes of crossbred animals themselves are an information source for their genetic effects, which could improve model fit. Fourth, the RA variance could partly come from the allele frequency differences of the breeds, which should also be accounted by the LBP variance. The estimated σa2 from RA+LPBM(100) was indeed lower than its estimate from RAM for all 3 traits (Table 1) but the reduction was only a small proportion of the total estimated σa2. The small increase in rGEBV,yfrom RAM compared with BaseM indicates that the RA effects were not very important for prediction in our data, where the number of crossbred animals was small, compared to the large reference data of purebred animals contributing to pGEBV.

      Breeding Value Estimation

      The proposed models here are add-ons to the GEBV calculation based on BOA and solutions from purebred evaluations (
      • Eiríksson J.H.
      • Karaman E.
      • Su G.
      • Christensen O.F.
      Breed of origin of alleles and genomic predictions for crossbred dairy cows.
      ,
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      ). The simplest model in our comparison for breeding value estimation, BaseM, accounted for the breed levels based on crossbred information and GBP rather than using phenotypes of purebred as in (
      • Eiríksson J.H.
      • Byskov K.
      • Su G.
      • Thomasen J.R.
      • Christensen O.F.
      Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
      ), and required, therefore, no assumptions on the same environment for the involved purebred animals. For this data, only slight increase in predictive ability was obtained by adding either the LBP effects or the RA effects to the model. In practice, the simpler BaseM may therefore be the preferred model.
      As presented above, the relevance of LBP depends on the breed composition of the crossbred animals included, and particularly the F1 crosses do not contribute to the LBP variation. The difference in allele frequencies between the breeds in question also affects the relevance of LBP with large differences being likely to result in larger LBP effects. Therefore, the inclusion of LBP effects may have larger effect on prediction accuracy than observed in this study in the following situations: (1) groups of crossbred animals that include no or few F1 crosses, and (2) crosses of breeds with large difference in allele frequencies of QTL. Further, the accuracy of genomic predictions depends on the number of phenotypic records included in the training set for estimation of marker effects (
      • Meuwissen T.H.
      • Hayes B.J.
      • Goddard M.E.
      Prediction of total genetic value using genome-wide dense marker maps.
      ). In our study, phenotypic records on around 4,100 cows were included in each of the training sets for estimating LBP effects for 3 breeds in the cross-validation. Training on larger data may result in more accurate estimates of the LBP effects, which should result in higher predictive ability.

      Heterozygosity

      When BOA assignment of the marker alleles is available, calculating GBH as an estimate of breed heterozygosity is simple. In principle, GBH should be able to account for realized breed heterozygosity, in contrast with estimates based on pedigree, which require the assumption that offspring receive exactly half of the breed components in their parents. The contribution of grandparent's breed in the genome of crossbreds can however vary considerably as shown in Figure 3.
      The kκ term in Equations [6] and [7] was defined as the opposite of a homozygosity term in
      • Xiang T.
      • Christensen O.F.
      • Vitezica Z.G.
      • Legarra A.
      Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs.
      and
      • Doekes H.P.
      • Bijma P.
      • Veerkamp R.F.
      • de Jong G.
      • Wientjes Y.C.J.
      • Windig J.J.
      Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle.
      (i.e., regression on genome-wide heterozygosity), except that in our study the κ parameter was estimated across breeds. Comparison of prediction ability of the BOA-HM, Ped-HM, and GT-HM models indicated that the BOA-derived heterozygosity indicator GBH was a similar or a better indicator for heterozygosity than the pedigree or genotype genome-wide heterozygosity indicators for ETGV calculation (Table 4). Breed heterozygosity indicates the proportion of the genome that has alleles from alternating breeds, including alleles of QTL. The difference in allele frequency between the breeds increases the probability of the 2 haplotypes at chromosome segments to be different compared to when the 2 haplotypes come from the same breed. That results in increased QTL heterozygosity in breed-heterozygous chromosome segments. Genotype heterozygosity indicates that the markers, which supposedly are linked to QTL, are heterozygous, regardless of breeds. However, for crossbred animals, linkage can be inconsistent based on the breed origin (
      • Ibánez-Escriche N.
      • Fernando R.L.
      • Toosi A.
      • Dekkers J.C.
      Genomic selection of purebreds for crossbred performance.
      ), and, thus, the marker heterozygosity might not be a good indicator for heterozygosity of QTL across breeds. Therefore, in the case of crossbred animals of breeds with low degree of shared QTL-marker LD, GBH may be a better heterozygosity indicator than genotype heterozygosity.
      The results on LBH effect on FY are out of line with the other results of this study (Table 2, Table 4). First, we detected no variance of J × R LBH on the other 2 traits and no improvement of including LBH effect. Second, we detected no, or very low, variance of the other breed pair LBH effects on FY. Milk fat percentage is considerably higher for J cows than the other breeds (
      • Årstatistik Avl
      Husdyrinnovation Kvæg, SEGES, Skejby.
      ). It would therefore not be surprising to detect genetic effects related to fat production that were unique for J or J crossbreeding. However, if the observed J × R LBH variance was related to J specific alleles, similar result would be expected for the H × J LBH effect. More data were available on H × J crossing than R × J (Figure 4), and yet we observed no variance explained with H × J effects (Table 2). Therefore, given the limited data we had for estimating the variance of J × R LBH effect (Figure 4), and the inconsistency with other results, further investigation is needed before any conclusions are made on the LBH effect on FY.
      In purebred dairy cattle, dominance variance has been estimated for yield traits (
      • Sun C.
      • VanRaden P.M.
      • Cole J.B.
      • O'Connell J.R.
      Improvement of prediction ability for genomic selection of dairy cattle by including dominance effects.
      ;
      • Aliloo H.
      • Pryce J.E.
      • González-Recio O.
      • Cocks B.G.
      • Hayes B.J.
      Accounting for dominance to improve genomic evaluations of dairy cows for fertility and milk production traits.
      ). However, results on increased accuracy of prediction when dominance effects are included, compared with models only including additive effects, are inconsistent (
      • Sun C.
      • VanRaden P.M.
      • Cole J.B.
      • O'Connell J.R.
      Improvement of prediction ability for genomic selection of dairy cattle by including dominance effects.
      ;
      • Aliloo H.
      • Pryce J.E.
      • González-Recio O.
      • Cocks B.G.
      • Hayes B.J.
      Accounting for dominance to improve genomic evaluations of dairy cows for fertility and milk production traits.
      ). Further,
      • Doekes H.P.
      • Bijma P.
      • Veerkamp R.F.
      • de Jong G.
      • Wientjes Y.C.J.
      • Windig J.J.
      Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle.
      reported low dominance variance when regression on genome-wide inbreeding was included in the model for Holstein cattle, and only limited variation in inbreeding depression across the genome. The limited benefit of the dominance effects, when included in addition to the global heterozygosity indicators in this study, are consistent with these results. Additionally, low level of LD in the crossbred group and relatively few data may hamper our ability to estimate dominance effects.
      For selection of crossbred heifers for milk production, providing ETGV can facilitate a good selection basis, in addition to GEBV. As an example, for crossbreeding systems where crossbred dairy cows are inseminated with beef semen (

      Kargo, M., J. F. Ettema, M. Fjordside, and L. Hjortø. 2014. Combi-Cross–The use of new technologies for improving dairy crossbreeding programs. Proceedings of the 10th World Congr. Genet. Appl. Livest. Prod. Vancouver, Canada.

      ;
      • Clasen J.B.
      • Kargo M.
      • Østergaard S.
      • Fikse W.F.
      • Rydhmer L.
      • Strandberg E.
      Genetic consequences of terminal crossbreeding, genomic test, sexed semen, and beef semen in dairy herds.
      ), all selection among the crossbred cows is on their potential for production, rather than their potential to produce good offspring for dairy production. Therefore, accurate ETGV is more relevant than GEBV in that case. For predicting heterosis, GBH could be an interesting alternative to pedigree-based heterozygosity estimates. Further, GBH is an option for accounting for heterosis in phenotypes of genotyped crossbred animals for inclusion into genetic evaluation. Our results suggest that accounting for local heterozygosity is not important for production traits in crossbred dairy cows.

      CONCLUSIONS

      Assigned BOA can give information on breed proportions, both globally in the genome and locally in genome regions, which are useful for genomic prediction of crossbred dairy cows. We found significant variance for LBP effects on production traits in Danish crossbred dairy cows, of magnitude around 1% of phenotypic variance. The importance of LBP and the size of this variance depends on the breed composition of the crossbred animals. In our data, including LBP or RA effects improved GEBV prediction for crossbred cows slightly, when the effects were added to prediction from solutions from separate purebred genomic evaluation. The increase in predictive ability was around a 0.5 percentage point. Assigned BOA can further give information on breed heterozygosity, which can be useful for either accounting for, or predicting, heterosis. From our results, we cannot see clear benefit from modeling heterozygosity locally in the genome rather than only globally across all loci.

      ACKNOWLEDGMENTS

      This study was a part of the DairyCross project supported by the Green Development and Demonstration Program (GUDP) from the Danish Ministry of Food, Agriculture and Fisheries (J. nr. 34009-18-1365; Copenhagen, Denmark). The authors have not stated any conflicts of interest.

      REFERENCES

        • Aliloo H.
        • Pryce J.E.
        • González-Recio O.
        • Cocks B.G.
        • Hayes B.J.
        Accounting for dominance to improve genomic evaluations of dairy cows for fertility and milk production traits.
        Genet. Sel. Evol. 2016; 48 (26830030): 8
        • Årstatistik Avl
        Husdyrinnovation Kvæg, SEGES, Skejby.
        • Bezanson J.
        • Edelman A.
        • Karpinski S.
        • Shah V.B.
        Julia: A fresh approach to numerical computing.
        SIAM Rev. 2017; 59: 65-98
        • Calus M.P.L.
        • Bijma P.
        • Veerkamp R.F.
        Evaluation of genomic selection for replacement strategies using selection index theory.
        J. Dairy Sci. 2015; 98 (26142859): 6499-6509
        • Christensen O.F.
        • Legarra A.
        • Lund M.S.
        • Su G.
        Genetic evaluation for three-way crossbreeding.
        Genet. Sel. Evol. 2015; 47 (26694257): 98
        • Christensen O.F.
        • Madsen P.
        • Nielsen B.
        • Su G.
        Genomic evaluation of both purebred and crossbred performances.
        Genet. Sel. Evol. 2014; 46 (24666469): 23
        • Clasen J.B.
        • Fikse W.F.
        • Kargo M.
        • Rydhmer L.
        • Strandberg E.
        • Østergaard S.
        Economic consequences of dairy crossbreeding in conventional and organic herds in Sweden.
        J. Dairy Sci. 2020; 103 (31733860): 514-528
        • Clasen J.B.
        • Kargo M.
        • Østergaard S.
        • Fikse W.F.
        • Rydhmer L.
        • Strandberg E.
        Genetic consequences of terminal crossbreeding, genomic test, sexed semen, and beef semen in dairy herds.
        J. Dairy Sci. 2021; 104 (33814139): 8062-8075
        • Doekes H.P.
        • Bijma P.
        • Veerkamp R.F.
        • de Jong G.
        • Wientjes Y.C.J.
        • Windig J.J.
        Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle.
        Genet. Sel. Evol. 2020; 52 (33115403): 64
        • Eiríksson J.H.
        • Byskov K.
        • Su G.
        • Thomasen J.R.
        • Christensen O.F.
        Genomic predictions for crossbred dairy cows by combining solutions from purebred evaluation based on breed origin of alleles.
        J. Dairy Sci. 2022; 105 (35465992): 5178-5191
        • Eiríksson J.H.
        • Karaman E.
        • Su G.
        • Christensen O.F.
        Breed of origin of alleles and genomic predictions for crossbred dairy cows.
        Genet. Sel. Evol. 2021; 53 (34742238): 84
        • EuroGenomics
        EuroGenomics genotyping microarray.
        https://www.eurogenomics.com/eurog-md-chip.html
        Date: 2019
        Date accessed: March 18, 2022
        • Falconer D.S.
        • Mackay T.F.C.
        Introduction to Quantitative Genetics.
        4th ed. Longman Group Ltd, 1996
        • García-Cortés L.A.
        • Toro M.A.
        Multibreed analysis by splitting the breeding values.
        Genet. Sel. Evol. 2006; 38 (17129562): 601-615
        • Gautason E.
        • Schönherz A.A.
        • Sahana G.
        • Guldbrandtsen B.
        Relationship of Icelandic cattle with Northern and Western European cattle breeds, admixture and population structure.
        Acta Agric. Scand. A Anim. Sci. 2020; 69: 25-38
        • Guillenea A.
        • Su G.
        • Lund M.S.
        • Karaman E.
        Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles.
        J. Dairy Sci. 2022; 105 (35033341): 2426-2438
        • Hjortø L.
        • Ettema J.F.
        • Kargo M.
        • Sørensen A.C.
        Genomic testing interacts with reproductive surplus in reducing genetic lag and increasing economic net return.
        J. Dairy Sci. 2015; 98 (25465627): 646-658
        • Ibánez-Escriche N.
        • Fernando R.L.
        • Toosi A.
        • Dekkers J.C.
        Genomic selection of purebreds for crossbred performance.
        Genet. Sel. Evol. 2009; 41 (19284703): 12
        • Karaman E.
        • Su G.
        • Croue I.
        • Lund M.S.
        Genomic prediction using a reference population of multiple pure breeds and admixed individuals.
        Genet. Sel. Evol. 2021; 53 (34058971): 46
      1. Kargo, M., J. F. Ettema, M. Fjordside, and L. Hjortø. 2014. Combi-Cross–The use of new technologies for improving dairy crossbreeding programs. Proceedings of the 10th World Congr. Genet. Appl. Livest. Prod. Vancouver, Canada.

        • Khansefid M.
        • Goddard M.E.
        • Haile-Mariam M.
        • Konstantinov K.V.
        • Schrooten C.
        • de Jong G.
        • Jewell E.G.
        • O'Connor E.
        • Pryce J.E.
        • Daetwyler H.D.
        • MacLeod I.M.
        Improving genomic prediction of crossbred and purebred dairy cattle.
        Front. Genet. 2020; 11 (33381150)598580
        • Lidauer M.H.
        • Mäntysaari E.A.
        • Strandén I.
        • Pösö J.
        • Pedersen J.
        • Nielsen U.S.
        • Johansson K.
        • Eriksson J.-A. A.
        • Madsen P.
        • Aamand G.P.
        Random heterosis and recombination loss effects in a multibreed evaluation for Nordic Red Dairy Cattle. Abstract c24–02 in Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil.
        Brazilian Society of Animal Breeding, 2006
        • Lo L.L.
        • Fernando R.L.
        • Grossman M.
        Covariance between relatives in multibreed populations: Additive model.
        Theor. Appl. Genet. 1993; 87 (24190314): 423-430
        • Lund M.S.
        • Su G.
        • Janss L.
        • Guldbrandtsen B.
        • Brøndum R.F.
        Genomic evaluation of cattle in a multi-breed context.
        Livest. Sci. 2014; 166: 101-110
        • Madsen P.
        • Jensen J.
        DMU, A package of analyzing multivariate mixed models. Version 6, release 5.2.
        • Makgahlela M.L.
        • Mäntysaari E.A.
        • Strandén I.
        • Koivula M.
        • Nielsen U.S.
        • Sillanpää M.J.
        • Juga J.
        Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle.
        J. Anim. Breed. Genet. 2013; 130 (23317061): 10-19
        • Mäntysaari E.A.
        • Liu Z.
        • VanRaden P.
        Interbull validation test for genomic evaluations.
        Interbull Bull. 2010; 41: 17-22
        • Meuwissen T.H.
        • Hayes B.J.
        • Goddard M.E.
        Prediction of total genetic value using genome-wide dense marker maps.
        Genetics. 2001; 157 (11290733): 1819-1829
        • Meyer K.
        Estimates of direct and maternal covariance functions for growth of Australian beef calves from birth to weaning.
        Genet. Sel. Evol. 2001; 33 (11712971): 487-514
        • NAV
        NAV routine genetic evaluation of dairy cattle—Data and genetic models.
        • Sargolzaei M.
        • Chesnais J.P.
        • Schenkel F.S.
        A new approach for efficient genotype imputation using information from relatives.
        BMC Genomics. 2014; 15 (24935670): 478
        • Sevillano C.A.
        • Vandenplas J.
        • Bastiaansen J.W.M.
        • Bergsma R.
        • Calus M.P.L.
        Genomic evaluation for a three-way crossbreeding system considering breed-of-origin of alleles.
        Genet. Sel. Evol. 2017; 49 (29061123): 75
        • Sørensen M.K.
        • Norberg E.
        • Pedersen J.
        • Christensen L.G.
        Invited review: Crossbreeding in dairy cattle: A Danish perspective.
        J. Dairy Sci. 2008; 91 (18946115): 4116-4128
        • Sun C.
        • VanRaden P.M.
        • Cole J.B.
        • O'Connell J.R.
        Improvement of prediction ability for genomic selection of dairy cattle by including dominance effects.
        PLoS One. 2014; 9 (25084281)e103934
        • Vandenplas J.
        • Calus M.P.L.
        • Sevillano C.A.
        • Windig J.J.
        • Bastiaansen J.W.M.
        Assigning breed origin to alleles in crossbred animals.
        Genet. Sel. Evol. 2016; 48 (27549177): 61
        • VanRaden P.M.
        Efficient methods to compute genomic predictions.
        J. Dairy Sci. 2008; 91 (18946147): 4414-4423
        • VanRaden P.M.
        • Tooker M.E.
        • Chud T.C.S.
        • Norman H.D.
        • Megonigal Jr., J.H.
        • Haagen I.W.
        • Wiggans G.R.
        Genomic predictions for crossbred dairy cattle.
        J. Dairy Sci. 2020; 103 (31837783): 1620-1631
        • Vitezica Z.G.
        • Varona L.
        • Elsen J.-M.
        • Misztal I.
        • Herring W.
        • Legarra A.
        Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs.
        Genet. Sel. Evol. 2016; 48 (26825279): 6
        • Xiang T.
        • Christensen O.F.
        • Vitezica Z.G.
        • Legarra A.
        Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs.
        Genet. Sel. Evol. 2016; 48 (27887565): 92