Advertisement
Research| Volume 102, ISSUE 12, P11081-11091, December 2019

Inclusion of herdmate data improves genomic prediction for milk-production and feed-efficiency traits within North American dairy herds

Open AccessPublished:September 20, 2019DOI:https://doi.org/10.3168/jds.2019-16820

      ABSTRACT

      Genomic data are widely available in the dairy industry and provide a cost-effective means of predicting genetic merit to inform selection decisions and increase genetic gains. As more dairy farms adopt genomic selection practices, dairy producers will soon have genomic data available on all of the animals within their herds. This is a very rich, but currently underused, source of information. Herdmates provide an excellent indication of how a selection candidate's genetics will perform within a given herd, noting that herdmates often include close relatives that share a similar environment. The study objective was to evaluate the utility of incorporating herdmate data into genomic predictions in a data set composed of 3,303 Holsteins from one herd in Canada and 6 herds throughout the United States. Within-herd prediction accuracy was assessed for milk-production and feed-efficiency traits determined from genomic best linear unbiased prediction under 4 different scenarios. Scenario 1 did not include herdmates in the training population. Scenarios 2 through 4 included herdmates in the training population, and scenarios 3 and 4 also included modeling of herd-specific marker effects. Leave-one-out cross validation was used to maximize the number of herdmates in the training population in scenarios 2 through 4, while maintaining constant training population size with scenario 1. Results from the present study reveal the importance of incorporating herdmate data into genomic evaluations. Inclusion of herdmates in the training population improved mean within-herd prediction accuracy for milk-production traits (± standard error) by 0.08 ± 0.03 (milk yield), 0.07 ± 0.03 (fat percentage), and 0.05 ± 0.01 (protein percentage) and feed-efficiency traits by 0.07 ± 0.02 (milk energy), 0.03 ± 0.02 (DMI), and 0.08 ± 0.01 (metabolic body weight). Modeling herd-specific marker effects further improved mean within-herd prediction accuracy for milk yield and energy by 0.03 ± 0.01 and 0.02 ± 0.01, respectively. Herds with higher within-herd heritability and low genomic correlation with the remaining herds benefitted most from the inclusion of herdmate data.

      Key words

      INTRODUCTION

      Routine genotyping of all heifer calves within a dairy herd is a common and cost-effective means of predicting genetic merit to inform selection and culling decisions. Genomic selection increases the rate of genetic gain through increased accuracy of EBV, as compared with pedigree-based approaches that ignore Mendelian segregation within families (
      • Meuwissen T.H.
      • Hayes B.J.
      • Goddard M.E.
      Prediction of total genetic value using genome-wide dense marker maps.
      ;
      • Wiggans G.R.
      • Cole J.B.
      • Hubbard S.M.
      • Sonstegard T.S.
      Genomic selection in dairy cattle: The USDA experience.
      ). Dairy producers can test (genotype) their calves, obtain predictions of genetic merit derived from an established reference population, and cull young animals with poor genetic potential; this can reduce feed costs during the rearing period and genetically improve the herd by propagating animals with superior genetic merit. As more dairy farms adopt genomic selection practices, dairy producers will soon have genomic data available on all of the animals within their herds. This is a very rich, but currently underused, source of information. Herdmates provide an excellent indication of how a selection candidate's genetics will perform within a given herd, noting that herdmates often include close relatives that share a similar environment and management system.
      The goal of a national genetic evaluation system is to identify sires (and, more recently, cows) whose offspring will perform well, on average, across a wide range of environments. Environmental conditions and herd management practices are considered as “nuisance factors,” and their effects are minimized or eliminated by spreading offspring across many different herds and environments in a progeny testing program. Genetic differences that could allow some animals to perform well under certain conditions, such as feeding a total mixed ration versus relying on rotational grazing, or in the presence or absence of heat stress (or heat abatement devices), are ignored. This approach is in contrast to that of plant breeders, in which the performance and resilience of various crop varieties are optimized for local soil types and growing season conditions. The current approach also conflicts with an individual dairy producer's goal of breeding cattle that will perform well under the housing, nutrition, and management conditions present on their specific farms.
      The statistical models for predicting breeding values assume 2 components: the additive genetic effects and the environmental effects sum to an individual animal's phenotype. This approach ignores gene-by-environment interactions, meaning the expression of an individual's genetic potential is currently assumed to be independent of the environmental conditions to which the animal was exposed. In a genomic analysis, this means that the estimated effects of genetic markers are assumed to be constant across all environments. Carrying out genetic improvement at the national level using this one-size-fits-all approach has provided tremendous genetic progress in dairy cattle and other livestock species; however, it may not be well suited for selecting the optimal animals for a targeted environment or production system.
      Several studies have investigated and reported on the existence of genotype-by-environment interactions in dairy cattle.
      • Strandberg E.
      • Brotherstone S.
      • Wall E.
      • Coffey M.P.
      Genotype by environment interaction for first-lactation female fertility traits in UK dairy cattle.
      and
      • Tiezzi F.
      • de Los Campos G.
      • Parker Gaddis K.L.
      • Maltecca C.
      Genotype by environment (climate) interaction improves genomic prediction for production traits in US Holstein cattle.
      both used sire models to examine interactions with environmental covariates using reaction norm models.
      • Yao C.
      • De Los Campos G.
      • VandeHaar M.
      • Spurlock D.
      • Armentano L.
      • Coffey M.
      • De Haas Y.
      • Veerkamp R.
      • Staples C.
      • Connor E.
      Use of genotype × environment interaction model to accommodate genetic heterogeneity for residual feed intake, dry matter intake, net energy in milk, and metabolic body weight in dairy cattle.
      explicitly modeled constant and across-environment (geographic region) marker effects to examine genotype-by-environment effects. However, the extent to which phenotypic variation reflects herd-specific marker effects is largely unknown. It is known that sire effects often vary by herd, which implies the existence of herd-specific marker effects.
      The objective of our study was to test the predictive ability of models that incorporate herdmate data and explicitly model genotype-by-herd interactions for prediction of within-herd performance for milk-production and feed-efficiency traits in North American herds.

      MATERIALS AND METHODS

      Phenotypic Data

      The data used in this study consisted of 3,303 Holstein cows from one herd in Canada (216 cows) and 6 herds located throughout the United States (Florida, Iowa, Michigan, Maryland, and 2 Wisconsin herds with 331, 908, 253, 483, 341, and 771 cows, respectively). More detailed information about the data sources can be found in the study by
      • Tempelman R.J.
      • Spurlock D.M.
      • Coffey M.
      • Veerkamp R.F.
      • Armentano L.E.
      • Weigel K.A.
      • de Haas Y.
      • Staples C.R.
      • Connor E.E.
      • Lu Y.
      • VandeHaar M.J.
      Heterogeneity in genetic and nongenetic variation and energy sink relationships for residual feed intake across research stations and countries.
      .
      Phenotypic data included 6 milk-production and feed-efficiency traits: milk yield (MY), fat %, protein %, milk energy (MKE), DMI, and metabolic BW. Phenotypes consisted of averaged data collected during a 6-wk period 50 to 200 d postpartum. Data from an individual cow's most recent parity were used in the analysis. Milk energy was calculated based on the gross energy per kilogram in fat, protein, and lactose. Metabolic BW was computed as BW to the 0.75 power. Each trait was adjusted for known fixed and random factors using a linear mixed-effects model. Fixed effects included herd-year-season at calving, DIM, and both linear and quadratic effects of the continuous variable parity. Experimental ration group was included as a random effect. The residuals for each cow were used as adjusted phenotypes in the following genomic evaluation.

      Genotypic Data

      The SNP genotypes were available for all 3,303 cows. Genotypes were obtained from the Council on Dairy Cattle Breeding (Bowie, MD) Holstein genotype database. Raw genotypes represented various low- and medium-density arrays, and missing genotypes were imputed to higher density using genotype information from bulls and cows in the Council on Dairy Cattle Breeding database (
      • Wiggans G.R.
      • Cooper T.A.
      • VanRaden P.M.
      • Van Tassell C.P.
      • Bickhart D.M.
      • Sonstegard T.S.
      Increasing the number of single nucleotide polymorphisms used in genomic evaluation of dairy cattle.
      ). Any remaining missing genotypes (about 2%) were filled in using rounded allele frequencies from the current US Holstein population. Genotypes at each locus were coded as 0, 1, or 2 copies of the minor allele. A total of 56,575 SNP per cow with a minor allele frequency greater than 5% were available for the genomic evaluation analyses.

      Statistical Models

      We considered 4 scenarios for genomic evaluation. Scenario 1 excluded herdmates of the test animal in the training population. Scenario 2 included herdmates of the test animal in the training population. Both scenarios 1 and 2 assumed marker effects were constant across environments. Scenario 3 included herdmates of the test animal in the training population and allowed marker-by-herd interactions. Similar to scenario 3, scenario 4 included herdmates in the training population; however, we considered a 2-step approach to modeling marker-by-herd interactions that would allow post hoc adjustment of national EBV, where in step 1 phenotypes (preadjusted for known fixed effects) within each herd are regressed on EBV generated under scenario 1. The regression coefficient from step 1 represents a herd-specific scaling factor (expansion or shrinkage adjustment of the EBV). The EBV is the predicted deviation from the mean phenotypic performance and assumes that a one-unit difference in EBV equates to a one-unit phenotypic difference for that trait. Fitted values from this model adjust for the expansion or shrinkage of EBV within a particular herd but do not result in the reranking of animals. In step 2 the model residuals within each herd are examined for remaining herd-specific marker effects in an effort to more accurately rank animals within a herd [i.e., the residuals (from step 1) used as the dependent variable in step 2 are computed as y − intercept − scaling factor × EBV]. The herd-adjusted breeding value is equivalent to the sum of the fitted value generated by regressing phenotypes on the scenario 1 EBV (i.e., scenario 1 EBV multiplied by a herd-specific scaling factor) and the EBV generated in step 2.

      Interaction Model

      The interaction model used in scenario 3 assumed marker effects had 2 components: one that was common across all herds (β0) and one that was specific for each of the herds in our study population (β1, β2, β3,…). The equation (assuming 3 herds for ease of notation) was as follows:
      [y1y2y3]=[1μ11μ21μ3]+[X1X2X3]β0+[X1000X2000X3][β1β2β3]+[ɛ1ɛ2ɛ3],
      [1]


      where y1, y2, and y3 represented a vector of phenotypes and X1, X2, and X3 represented matrices of genotypes for individuals in herd 1, herd 2, and herd 3; μ1, μ2, and μ3 were herd-specific intercepts; and ε1, ε2, and ε3 represented a vector of residuals for each of the herds. Residuals were assumed to be independent [ɛ1~N(0,Iσɛ12),ɛ2~N(0,Iσɛ22),ɛ3~N(0,Iσɛ32)]; σɛ12, σɛ22 and σɛ32 were herd-specific residual variances; and I is an identity matrix.
      The model can be represented as a genomic BLUP model, by letting
      u0=[X1X2X3]β0,u1=[X100]β1,u2=[0X20]β2,u3=[00X3]β3,
      [2]


      and represented as
      y = μ + u0 + u1 + u2 + u3 + ε,
      [3]


      where u0~N(0,G0σg02) is a vector of additive genetic effects due to constant marker effects across herds and u1, u2, and u3 [u1~N(0,G1σg12),u2~N(0,G2σg22),u3~N(0,G3σg32)] are vectors of herd-specific genetic effects due to deviation of marker effects for a particular herd; σg02 is the common genomic variance and σg12, σg22 and σg32 are herd-specific genomic variances. For example, vector u1 contains the herd-specific genetic effects for individuals in herd 1. Elements of vector u1 for individuals in herds 2 and 3 are equal to zero as indicated by Equation [2]. The sum of the across-herd genetic effects (u0) and herd-specific genetic effects (u1, u2, and u3) represent a herd-customized breeding value.
      G0=[Z1Z12p1j(1-p1j)Z1Z22p1j(1-p1j)2p2j(1-p2j)Z1Zn2p1j(1-p1j)2pnj(1-pnj)Z2Z12p2j(1-p2j)2p1j(1-p1j)Z2Z22p2j(1-p2j)Z2Zn2p2j(1-p2j)2pnj(1-pnj)ZnZ12pnj(1-pnj)2p1j(1-p1j)ZnZ22pnj(1-pnj)2p2j(1-p2j)ZnZn2pnj(1-pnj)],
      [4]


      The marker-derived genomic relationship matrix G0 was computed as a multipopulation genomic relationship matrix with each herd being considered a separate population using the method described in the study by
      • Wientjes Y.C.J.
      • Bijma P.
      • Vandenplas J.
      • Calus M.P.L.
      Multi-population genomic relationships for estimating current genetic variances within and genetic correlations between populations.
      and is shown in Equation [4]. In this equation, Zn represented a mean-centered genotype matrix for herd n and pnj represented the minor allele frequency of the jth marker in herd n. The genomic relationship matrix for herd n (Gn) was computed as
      Gn=ZnZn2pnj(1-pnj),
      [5]


      where Zn represented a mean-centered genotype matrix for herd n and pnj represented the minor allele frequency of the jth marker in herd n. A classical G matrix {[ZZ′/∑2p(1 − p)], p = allele frequency in entire study population} was also considered to estimate the common genomic variance component instead of the multipopulation G matrix. In this study population, the 2 approaches resulted in almost identical estimation of breeding values for each trait (correlation >0.98) and very similar within-herd prediction accuracies (correlation >0.999999, mean difference <0.0001), see Supplemental Tables S1–S3 (https://doi.org/10.3168/jds.2019-16820).

      Across-Herd Model

      The across-herd model used in scenarios 1, 2, and step 1 of scenario 4 assumed marker effects were common across all herds (β0) and could be obtained by setting β1 = β2 = β3 = 0 from the interaction model. The equations became
      [y1y2y3]=[1μ11μ21μ3]+[X1X2X3]β0+[ɛ1ɛ2ɛ3],
      [6]


      and, when represented as a genomic BLUP model,
      u0=[X1X2X3]β0,y=μ+u0+ɛ.
      [7]


      Inference on Variance Components

      Genomic variance components were estimated from the interaction model using the full data set. The genomic heritabilities (h2) for each trait within herd i were calculated as
      h2=(σgi2+σg02)(σgi2+σg02+σɛi2).
      [8]


      Trait-specific genomic correlation for a single herd with the remaining herds was measured as the square root of the proportion of genetic variance within a herd explained by common marker effects using the following equation:
      cor=σg02(σg02+σgi2).
      [9]


      Assessment of Within-Herd Prediction Accuracy

      Prediction accuracy was calculated as the correlation between genomic predicted breeding values and adjusted phenotypes. An efficient leave-one-out cross validation (
      • Cheng H.
      • Garrick D.J.
      • Fernando R.L.
      Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
      ) was used to maximize the number of herdmates included in the training population for scenarios 2 through 4. While estimating breeding values within each herd, the training population size was held constant under all 4 scenarios by removing a subset of individuals from the remaining herds in scenarios 2, 3, and step 2 of scenario 4 to maintain an equivalent number of individuals comprising the scenario 1 training population. A paired sample t-test was used to test for significant differences in mean within-herd prediction accuracy among the different scenarios evaluated.

      Software

      The mixed model equations for generating adjusted phenotypes, estimating (co)variance components, and predicting breeding values were solved with restricted maximum likelihood (REML) method using the R package “sommer” version 3.6 (
      • Covarrubias-Pazaran G.
      Genome-assisted prediction of quantitative traits using the R package sommer.
      ).

      RESULTS

      Population Stratification

      The genetic substructure of the data is plotted in Figure 1 using the first and second largest eigenvectors derived from the genomic relationship matrix of all cows. Each herd is highlighted in a separate plot to illustrate the degree of clustering of individuals from a particular herd relative to the remaining herds. In a scenario where genetic similarity within a herd is similar to the level of genetic similarity between random individuals of the remaining herds, one would observe the data points for a particular herd to random overlay data points belonging to the remaining herds. Figure 1 illustrates that for some of the herds, animals within the same herd tend to cluster together, and therefore, animals from these herds are more genetically similar to one another. The 2 most geographically distant herds, the Florida and Canada herds, each cluster toward the outer edges of the plotted data points. The Fst values were calculated using Weir and Hill's method (
      • Weir B.S.
      • Hill W.G.
      Estimating F-statistics.
      ) to measure the degree of genetic differentiation of each herd with the remaining herds. The Fst is a fixation index ranging from zero to one, with zero indicating no differentiation. The Fst values range from 0.002 to 0.010, with the Canadian herd being the most differentiated. The Fst values for 10 random subsets of individuals equivalent to the herd sizes in the study population are all less than 0.0001.
      Figure thumbnail gr1
      Figure 1Plot of principal components 1 and 2 for all 3,303 genotyped animals. Dark data points indicate animals from an individual featured herd (Canada, Florida, Iowa, Michigan, Maryland, Wisconsin 1, and Wisconsin 2). Fst = fixation index ranging from 0 to 1 that measures population differentiation.

      Genomic Variance Component Estimation

      The genomic variance component estimates for milk-production and feed-efficiency traits for each of the herds are presented in Table 1, Table 2 and Figure 2A. Within-herd heritability determined from the interaction model is reported in Table 1. Decomposition of the variance is presented in Figure 2A. Estimated genomic heritabilities for all of the traits vary among herds. The observed differences are a result of both varying residual variances among herds and varying herd-specific genomic variances (Figure 2A). Trait-specific genomic correlations of individual herds with the remaining herds are reported in Table 2. A genomic correlation estimate less than one for an individual herd with the remaining herds indicates the presence of herd-specific genomic variance. This herd-specific genomic variance is composed of 2 components: scaling and reranking. Scaling is a herd-specific expansion or shrinkage of the EBV that does not result in reranking of animals within the herd. In general, genomic correlations are high; however, notable differences in genomic correlation are present among herds for MY and MKE traits.
      Table 1Estimates (±SE) of within-herd heritability for milk-production and feed-efficiency traits: milk yield (MY), fat %, protein %, milk energy (MKE), DMI, and metabolic BW (MBW)
      HerdMYFat %Protein %MKEDMIMBW
      Canada0.47 ± 0.140.82 ± 0.090.77 ± 0.090.40 ± 0.140.17 ± 0.120.53 ± 0.11
      Florida0.44 ± 0.120.51 ± 0.090.59 ± 0.090.35 ± 0.120.26 ± 0.110.57 ± 0.10
      Iowa0.17 ± 0.050.52 ± 0.050.55 ± 0.040.20 ± 0.050.29 ± 0.050.49 ± 0.05
      Michigan0.50 ± 0.130.70 ± 0.100.60 ± 0.100.44 ± 0.130.29 ± 0.120.44 ± 0.10
      Maryland0.21 ± 0.070.52 ± 0.060.56 ± 0.060.16 ± 0.070.27 ± 0.070.45 ± 0.06
      Wisconsin 10.23 ± 0.110.51 ± 0.090.54 ± 0.080.11 ± 0.100.33 ± 0.110.46 ± 0.09
      Wisconsin 20.15 ± 0.050.50 ± 0.050.53 ± 0.050.14 ± 0.050.21 ± 0.050.44 ± 0.05
      Table 2Estimates (±SE) of genomic correlation with other North American herds for milk-production and feed-efficiency traits: milk yield (MY), fat %, protein %, milk energy (MKE), DMI, and metabolic BW (MBW)
      HerdMYFat %Protein %MKEDMIMBW
      Canada0.74 ± 0.140.97 ± 0.081.00 ± 0.080.74 ± 0.161.00 ± 0.361.00 ± 0.12
      Florida0.64 ± 0.101.00 ± 0.100.96 ± 0.080.61 ± 0.120.85 ± 0.180.85 ± 0.09
      Iowa1.00 ± 0.150.88 ± 0.051.00 ± 0.040.75 ± 0.120.89 ± 0.091.00 ± 0.05
      Michigan0.56 ± 0.090.78 ± 0.070.88 ± 0.090.58 ± 0.111.00 ± 0.211.00 ± 0.12
      Maryland0.83 ± 0.150.90 ± 0.070.97 ± 0.060.85 ± 0.190.89 ± 0.131.00 ± 0.07
      Wisconsin 10.81 ± 0.200.97 ± 0.091.00 ± 0.081.00 ± 0.460.74 ± 0.140.94 ± 0.10
      Wisconsin 21.00 ± 0.180.95 ± 0.061.00 ± 0.051.00 ± 0.201.00 ± 0.131.00 ± 0.06
      Figure thumbnail gr2
      Figure 2(A) Plots of within-herd variance decomposition of milk-production and feed-efficiency traits estimated from the interaction model using the full data set. (B) Bar plots of improvement in predictive ability determined under 4 scenarios: scenario 1 = exclusion of herdmates from training population; scenario 2 = inclusion of herdmates in training population; scenario 3 = inclusion of herdmates in training population and modeling of herd-specific genetic effects; scenario 4 = inclusion of herdmates in training population and modeling of herd-specific genetic effects using 2-step approach. MY = milk yield; MKE = milk energy content; MBW = metabolic BW; CAN = Canada.

      Prediction Accuracy

      Table 3 presents the correlation between genomic predicted breeding values and phenotypes for 4 scenarios. For most traits, prediction accuracies are significantly higher for scenarios 2 through 4 in comparison with scenario 1, which includes the same number of individuals in the training population but does not include herdmates of the test animals. Inclusion of herdmates in the training population improved mean within-herd prediction accuracy for milk-production traits (±SE) by 0.08 ± 0.03 (MY; P < 0.05), 0.07 ± 0.03 (fat %; P < 0.05), and 0.05 ± 0.01 (protein %; P < 0.05) and feed-efficiency traits by 0.07 ± 0.02 (MKE; P < 0.05), 0.03 ± 0.02 (DMI; NS), and 0.08 ± 0.01 (metabolic BW; P < 0.001).
      Table 3Within-herd correlation estimates (±SE) for genomic predicted breeding values with milk-production and feed-efficiency traits determined under 4 scenarios
      ItemHerd
      CanadaFloridaIowaMichiganMarylandWisconsin 1Wisconsin 2
      MY
       Scenario 10.11 ± 0.070.03 ± 0.060.14 ± 0.030.14 ± 0.060.17 ± 0.040.26 ± 0.050.21 ± 0.04
       Scenario 20.30 ± 0.07
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.21 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.18 ± 0.030.23 ± 0.06
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.23 ± 0.04
      P < 0.05,
      0.29 ± 0.050.18 ± 0.04
       Scenario 30.33 ± 0.06
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.27 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.18 ± 0.030.33 ± 0.06
      P < 0.01,
      0.24 ± 0.04
      P < 0.05,
      0.28 ± 0.050.18 ± 0.04
       Scenario 40.33 ± 0.06
      P < 0.01,
      0.27 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.19 ± 0.03
      P < 0.05,
      0.32 ± 0.06
      P < 0.01,
      0.23 ± 0.04
      P < 0.05,
      0.29 ± 0.050.21 ± 0.04
      Fat%
       Scenario 10.15 ± 0.070.34 ± 0.050.32 ± 0.030.45 ± 0.060.44 ± 0.040.44 ± 0.050.46 ± 0.03
       Scenario 20.32 ± 0.06
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.30 ± 0.050.47 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.52 ± 0.05
      P < 0.01,
      0.50 ± 0.04
      P < 0.01,
      0.47 ± 0.050.49 ± 0.03
       Scenario 30.32 ± 0.06
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.30 ± 0.050.47 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.52 ± 0.05
      P < 0.05,
      0.50 ± 0.04
      P < 0.01,
      0.46 ± 0.050.49 ± 0.03
       Scenario 40.30 ± 0.07
      P < 0.05,
      0.34 ± 0.050.47 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.53 ± 0.05
      P < 0.05,
      0.52 ± 0.04
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.46 ± 0.050.52 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      Protein%
       Scenario 10.45 ± 0.060.35 ± 0.050.44 ± 0.030.57 ± 0.050.51 ± 0.040.49 ± 0.050.52 ± 0.03
       Scenario 20.51 ± 0.06
      P < 0.05,
      0.43 ± 0.05
      P < 0.01,
      0.49 ± 0.03
      P < 0.01,
      0.59 ± 0.050.58 ± 0.04
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.53 ± 0.05
      P < 0.05,
      0.54 ± 0.03
       Scenario 30.51 ± 0.06
      P < 0.05,
      0.43 ± 0.05
      P < 0.01,
      0.49 ± 0.03
      P < 0.01,
      0.60 ± 0.050.58 ± 0.04
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.53 ± 0.05
      P < 0.05,
      0.54 ± 0.03
       Scenario 40.51 ± 0.06
      P < 0.05,
      0.41 ± 0.05
      P < 0.05,
      0.52 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.60 ± 0.05
      P < 0.05,
      0.58 ± 0.04
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.52 ± 0.050.55 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      MKE
       Scenario 10.12 ± 0.070.09 ± 0.050.15 ± 0.030.12 ± 0.060.12 ± 0.050.17 ± 0.050.16 ± 0.04
       Scenario 20.27 ± 0.07
      P < 0.01,
      0.22 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.22 ± 0.03
      P < 0.01,
      0.19 ± 0.06
      P < 0.05,
      0.18 ± 0.04
      P < 0.05,
      0.19 ± 0.050.13 ± 0.04
       Scenario 30.30 ± 0.07
      P < 0.01,
      0.26 ± 0.05
      P < 0.01,
      0.23 ± 0.03
      P < 0.01,
      0.26 ± 0.06
      P < 0.05,
      0.18 ± 0.04
      P < 0.05,
      0.18 ± 0.050.12 ± 0.04
       Scenario 40.30 ± 0.07
      P < 0.01,
      0.27 ± 0.05
      P < 0.01,
      0.22 ± 0.03
      P < 0.01,
      0.26 ± 0.06
      P < 0.05,
      0.18 ± 0.040.18 ± 0.050.16 ± 0.04
      DMI
       Scenario 10.13 ± 0.070.21 ± 0.050.21 ± 0.030.22 ± 0.060.24 ± 0.040.20 ± 0.050.19 ± 0.04
       Scenario 20.07 ± 0.070.25 ± 0.050.32 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.26 ± 0.060.29 ± 0.04
      P < 0.05,
      0.26 ± 0.05
      P < 0.01,
      0.18 ± 0.04
       Scenario 30.06 ± 0.070.25 ± 0.050.32 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.26 ± 0.060.30 ± 0.040.28 ± 0.05
      P < 0.05,
      0.19 ± 0.04
       Scenario 40.13 ± 0.070.26 ± 0.050.32 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.26 ± 0.060.31 ± 0.04
      P < 0.05,
      0.30 ± 0.05
      P < 0.01,
      0.22 ± 0.04
      MBW
       Scenario 10.34 ± 0.060.29 ± 0.050.37 ± 0.030.45 ± 0.060.38 ± 0.040.26 ± 0.050.34 ± 0.03
       Scenario 20.38 ± 0.060.43 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.45 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.51 ± 0.05
      P < 0.01,
      0.44 ± 0.04
      P < 0.01,
      0.35 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.43 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
       Scenario 30.39 ± 0.060.44 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.45 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.51 ± 0.05
      P < 0.01,
      0.44 ± 0.04
      P < 0.01,
      0.36 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.43 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
       Scenario 40.38 ± 0.060.42 ± 0.05
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.45 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      0.47 ± 0.060.45 ± 0.04
      P < 0.01,
      0.33 ± 0.05
      P < 0.05,
      0.43 ± 0.03
      P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      Scenario 1 = training population excludes herdmates; scenario 2 = training population includes herdmates; scenario 3 = training population includes herdmates, and the genomic prediction model includes both constant across-herd and herd-specific genetic marker effects; scenario 4 = 2-step genotype-by-herd interaction model. Step 1 = regress phenotypes within each herd on EBV generated under scenario 1; step 2 = examine step 1 residuals within each herd for remaining herd-specific genetic variance, estimate herd-specific marker effects, and adjust breeding values from scenario 1 (adjusted EBV = step 1 fitted value + step 2 EBV). MY = milk yield; MKE = milk energy content; MBW = metabolic BW.
      * P < 0.05,
      ** P < 0.01,
      *** P < 0.001 (Hotelling's overlapping correlated correlation coefficient comparison test for differences from scenario 1 correlation estimates).
      Scenarios 2 and 3 both include herdmates in the training population; however, scenario 2 assumes marker effects are constant across herds, whereas scenario 3 allows marker effects to have a herd-specific component. Inclusion of this interaction term further improves mean within-herd prediction accuracy for MY and MKE (±SE) by 0.03 ± 0.01 and 0.02 ± 0.01, respectively. More notable improvements were observed for herds with higher heritabilities but low genomic correlation with the other herds.
      Scenarios 3 and 4 both include a herd-specific genomic variance component; however, scenario 4 takes a 2-step approach to modeling herd-specific marker effects. In scenario 4, step 1 involves determining genomic predicted breeding values in a manner equivalent to scenario 1 (assumes constant marker effects across herds, herdmates not included in the training population). The second step involves a within-herd analysis of residuals obtained from regressing phenotypes on the step 1 genomic predicted breeding values to determine whether additional herd-specific genomic variance is present. In the case that it does exist, step 1 predicted breeding values are adjusted for herd-specific marker effects determined in step 2. The herd adjusted breeding value is equivalent to the sum of the fitted value generated by regressing phenotypes on step 1 EBV (i.e., scenario 1 EBV multiplied by a herd-specific scaling factor, see Table 4) and the EBV generated in step 2. The 2-step approach yields very similar accuracies to scenario 3 with mean within-herd accuracy differences less than 0.02 for all traits.
      Table 4Regression coefficients (±SE) for milk production and feed efficiency phenotypes regressed on genomic predicted breeding values determined under 4 scenarios
      Item
      Scenario 1 = training population excludes herdmates; scenario 2 = training population includes herdmates; scenario 3 = training population includes herdmates, and the genomic prediction model includes both constant across-herd and herd-specific genetic marker effects; scenario 4 = 2-step genotype-by-herd interaction model. Step 1 = regress phenotypes within each herd on EBV generated under scenario 1; step 2 = examine step 1 residuals within each herd for remaining herd-specific genetic variance, estimate herd-specific marker effects, and adjust breeding values from scenario 1 (adjusted EBV = step 1 fitted value + step 2 EBV). MY = milk yield; MKE = milk energy content; MBW = metabolic BW.
      Herd
      CanadaFloridaIowaMichiganMarylandWisconsin 1Wisconsin 2
      MY
       Scenario 10.52 ± 0.320.12 ± 0.250.71 ± 0.170.67 ± 0.310.88 ± 0.231.38 ± 0.281.13 ± 0.19
       Scenario 21.04 ± 0.230.94 ± 0.240.82 ± 0.151.05 ± 0.281.06 ± 0.201.34 ± 0.240.98 ± 0.20
       Scenario 30.98 ± 0.190.90 ± 0.170.99 ± 0.180.99 ± 0.181.02 ± 0.191.23 ± 0.231.08 ± 0.22
       Scenario 41.06 ± 0.201.04 ± 0.201.00 ± 0.171.03 ± 0.190.94 ± 0.181.00 ± 0.181.00 ± 0.17
      Fat %
       Scenario 10.19 ± 0.090.63 ± 0.100.85 ± 0.081.10 ± 0.140.96 ± 0.091.14 ± 0.131.40 ± 0.10
       Scenario 20.52 ± 0.110.60 ± 0.101.03 ± 0.071.17 ± 0.120.95 ± 0.081.08 ± 0.111.13 ± 0.07
       Scenario 30.56 ± 0.110.64 ± 0.111.00 ± 0.061.06 ± 0.110.96 ± 0.081.10 ± 0.121.10 ± 0.07
       Scenario 40.99 ± 0.221.01 ± 0.151.00 ± 0.060.97 ± 0.100.98 ± 0.070.95 ± 0.101.01 ± 0.06
      Protein %
       Scenario 10.81 ± 0.110.71 ± 0.100.89 ± 0.061.38 ± 0.120.99 ± 0.081.00 ± 0.101.21 ± 0.07
       Scenario 20.83 ± 0.100.89 ± 0.100.96 ± 0.061.26 ± 0.111.01 ± 0.061.06 ± 0.091.08 ± 0.06
       Scenario 30.84 ± 0.100.89 ± 0.100.98 ± 0.061.16 ± 0.100.99 ± 0.061.07 ± 0.091.09 ± 0.06
       Scenario 40.99 ± 0.121.08 ± 0.131.02 ± 0.060.98 ± 0.080.98 ± 0.060.99 ± 0.091.00 ± 0.05
      MKE
       Scenario 10.60 ± 0.330.55 ± 0.331.05 ± 0.230.65 ± 0.340.64 ± 0.241.17 ± 0.360.83 ± 0.18
       Scenario 21.02 ± 0.251.12 ± 0.271.01 ± 0.150.96 ± 0.320.93 ± 0.241.23 ± 0.350.69 ± 0.20
       Scenario 30.98 ± 0.220.95 ± 0.191.00 ± 0.140.94 ± 0.221.01 ± 0.251.49 ± 0.450.88 ± 0.26
       Scenario 41.04 ± 0.231.01 ± 0.200.95 ± 0.140.99 ± 0.231.00 ± 0.261.03 ± 0.300.83 ± 0.18
      DMI
       Scenario 10.68 ± 0.351.04 ± 0.271.09 ± 0.170.77 ± 0.220.97 ± 0.180.92 ± 0.250.77 ± 0.14
       Scenario 20.34 ± 0.341.16 ± 0.241.11 ± 0.110.84 ± 0.200.98 ± 0.151.06 ± 0.210.78 ± 0.15
       Scenario 30.35 ± 0.381.17 ± 0.241.06 ± 0.100.91 ± 0.211.02 ± 0.151.03 ± 0.190.91 ± 0.17
       Scenario 40.68 ± 0.351.10 ± 0.221.03 ± 0.101.03 ± 0.241.05 ± 0.151.06 ± 0.191.00 ± 0.16
      MBW
       Scenario 10.86 ± 0.160.81 ± 0.151.00 ± 0.081.24 ± 0.160.97 ± 0.110.72 ± 0.150.96 ± 0.09
       Scenario 20.87 ± 0.141.12 ± 0.130.95 ± 0.061.18 ± 0.130.93 ± 0.090.91 ± 0.131.04 ± 0.08
       Scenario 30.89 ± 0.151.05 ± 0.120.98 ± 0.061.19 ± 0.130.95 ± 0.090.91 ± 0.131.06 ± 0.08
       Scenario 41.03 ± 0.171.09 ± 0.130.98 ± 0.061.01 ± 0.120.95 ± 0.091.06 ± 0.171.00 ± 0.07
      1 Scenario 1 = training population excludes herdmates; scenario 2 = training population includes herdmates; scenario 3 = training population includes herdmates, and the genomic prediction model includes both constant across-herd and herd-specific genetic marker effects; scenario 4 = 2-step genotype-by-herd interaction model. Step 1 = regress phenotypes within each herd on EBV generated under scenario 1; step 2 = examine step 1 residuals within each herd for remaining herd-specific genetic variance, estimate herd-specific marker effects, and adjust breeding values from scenario 1 (adjusted EBV = step 1 fitted value + step 2 EBV). MY = milk yield; MKE = milk energy content; MBW = metabolic BW.
      The regression coefficient for phenotypes regressed on EBV should be close to one. A one-unit change in EBV should equate to a one-unit change in phenotype. Within-herd regression coefficients for EBV generated under scenarios 1 through 4 are reported in Table 4. The regression coefficients represent a herd-specific scaling (expansion or shrinkage) of the EBV. Bias (deviation from 1) is reduced with the inclusion of herdmate information and explicit modeling of genotype × herd interactions.
      Figure 2 illustrates increased gains in prediction accuracy achievable with inclusion of herdmate data, especially for herds with a large herd-specific variance component. Gain in within-herd prediction accuracy indicates the herd-specific genomic variance is due to reranking of animals within a herd and not just herd-specific scaling (shrinkage or expansion) of the EBV. The within-herd correlation of herdmate information unadjusted (scenario 1) and adjusted (scenario 4 − 2-step genotype × herd approach) genomic predicted breeding values is reported in Table 5. A correlation coefficient <1 indicates the presence of herd-specific genomic variance independent of scaling effects. The 2-step approach to modeling marker-by-herd interactions would facilitate post hoc adjustment of national EBV. The gains in within-herd explained phenotypic variance using this approach are also reported in Table 5 along with the improvement in ranking accuracy (measured by Spearman's correlation coefficient). Improvement in ranking accuracy is observed for herds with herd-specific genomic variance that is independent of scaling effects (i.e., herds that demonstrate gain in phenotypic variation explained by herdmate information adjusted EBV and herds that demonstrate decreased correlation of herdmate information adjusted and unadjusted EBV). For example, the herdmate information adjusted EBV and unadjusted EBV exhibit near zero correlation (r = −0.03) for the milk-yield trait in the Florida herd. However, the herdmate information adjusted EBV better explains phenotypic variation within the herd, and a large gain in ranking accuracy is attained with the herdmate adjusted EBV.
      Table 5Within-herd correlation of genomic predicted breeding values determined under scenarios with exclusion or inclusion of herdmate information and differences in explained phenotype variation (R2) and ranking accuracy for milk-production and feed-efficiency traits
      Item
      Scenario 1 (Sc1) = training population excludes herdmates; scenario 4 (Sc4) = 2-step genotype-by-herd interaction model. Step 1 = regress phenotypes within each herd on EBV generated under scenario 1; step 2 = examine step 1 residuals within each herd for remaining herd-specific genetic variance, estimate herd-specific marker effects, and adjust breeding values from scenario 1 (adjusted EBV = step 1 fitted value + step 2 EBV). ΔR2 = difference in R2 values determined from phenotype regression on EBV; ΔRanking accuracy = difference in Spearman's rank correlation coefficients for phenotype and EBV; MY = milk yield; MKE = milk energy content; MBW = metabolic BW.
      Herd
      CanadaFloridaIowaMichiganMarylandWisconsin 1Wisconsin 2
      MY
       EBV correlation (Sc1, Sc4)0.34−0.030.720.380.740.901.00
       ΔR2 (Sc4 − Sc1)0.100.070.020.090.030.020.00
       ΔRanking accuracy (Sc4 − Sc1)0.220.190.060.170.040.030.00
      Fat %
       EBV correlation (Sc1, Sc4)0.440.990.660.820.850.940.87
       ΔR2 (Sc4 − Sc1)0.070.000.120.070.070.020.06
       ΔRanking accuracy (Sc4 − Sc1)0.140.010.150.080.070.020.06
      Protein %
       EBV correlation (Sc1, Sc4)0.880.850.840.960.880.940.94
       ΔR2 (Sc4 − Sc1)0.060.040.080.030.070.020.04
       ΔRanking accuracy (Sc4 − Sc1)0.040.060.080.010.060.010.03
      MKE
       EBV correlation (Sc1, Sc4)0.430.350.690.380.660.951.00
       ΔR2 (Sc4 − Sc1)0.070.060.030.050.020.000.00
       ΔRanking accuracy (Sc4 − Sc1)0.150.130.090.160.060.020.00
      DMI
       EBV correlation (Sc1, Sc4)1.000.750.650.840.780.690.89
       ΔR2 (Sc4 − Sc1)0.000.030.060.020.040.050.01
       ΔRanking accuracy (Sc4 − Sc1)0.000.060.120.030.080.090.02
      MBW
       EBV correlation (Sc1, Sc4)0.900.690.810.940.870.800.79
       ΔR2 (Sc4 − Sc1)0.030.090.070.030.050.040.07
       ΔRanking accuracy (Sc4 − Sc1)0.070.130.080.010.070.050.10
      1 Scenario 1 (Sc1) = training population excludes herdmates; scenario 4 (Sc4) = 2-step genotype-by-herd interaction model. Step 1 = regress phenotypes within each herd on EBV generated under scenario 1; step 2 = examine step 1 residuals within each herd for remaining herd-specific genetic variance, estimate herd-specific marker effects, and adjust breeding values from scenario 1 (adjusted EBV = step 1 fitted value + step 2 EBV). ΔR2 = difference in R2 values determined from phenotype regression on EBV; ΔRanking accuracy = difference in Spearman's rank correlation coefficients for phenotype and EBV; MY = milk yield; MKE = milk energy content; MBW = metabolic BW.

      DISCUSSION

      In the near future, many dairy herds will have complete genomic data as increasing numbers of dairy farmers incorporate genomic testing as a routine management tool. This is a potentially rich and untapped source of contextual information that can be leveraged to capture genotype-by-herd interactions and tailor breeding programs to the individual dairy producer. Results from the present study reveal the importance of incorporating herdmate data into genomic evaluations. Significant gains in within-herd prediction accuracy are attainable when herdmate data are included in the training population. The inclusion of herd-specific genetic effects will improve the accuracy of selection decisions, particularly on farms where the genetic background of the herd or the environmental conditions differ from the typical herd in the national reference population.
      The observed improvement in prediction accuracy may be multifactorial. It is well known that accuracy of genomic prediction is dependent on the degree of relatedness between the selection candidates and the reference population (
      • Habier D.
      • Fernando R.L.
      • Dekkers J.C.
      The impact of genetic relationship information on genome-assisted breeding values.
      ,
      • Edwards S.M.
      • Buntjer J.B.
      • Jackson R.
      • Bentley A.R.
      • Lage J.
      • Byrne E.
      • Burt C.
      • Jack P.
      • Berry S.
      • Flatman E.
      The effects of training population design on genomic prediction accuracy in wheat.
      ).
      • Schultz N.E.
      • Weigel K.A.
      An improved genomic prediction model in populations featuring shared environments and familial relatedness.
      demonstrated in data simulated to be similar to Holstein herd structure, improved prediction accuracy with inclusion of herdmates irrespective of the prediction method (genomic BLUP vs. a Bayesian approach). Herdmates tend to have similar genetic background, due to reliance on semen from a single AI company; consistency in the herd's breeding objectives over time; and retention of progeny within the herd. Second, improvement in prediction accuracy may be in part due to the presence and capture of gene-by-herd interactions, even when they are not explicitly modeled. With the limited number of herds included in this study, herd-specific genetic effects may have a strong influence on the across-herd marker effect estimation. The simple inclusion of herdmates in the training data set while assuming a constant marker effect across herds revealed large improvements in prediction accuracy, whereas explicitly modeling marker-by-herd interactions often resulted in only modest further improvement.
      Results from this study reveal herd-specific genetic variance contributes to phenotypic variance. Interestingly, the herds included in this study were previously analyzed as part of a larger data set to identify genotype-by-region interactions for North America, Scotland, and the Netherlands (
      • Yao C.
      • De Los Campos G.
      • VandeHaar M.
      • Spurlock D.
      • Armentano L.
      • Coffey M.
      • De Haas Y.
      • Veerkamp R.
      • Staples C.
      • Connor E.
      Use of genotype × environment interaction model to accommodate genetic heterogeneity for residual feed intake, dry matter intake, net energy in milk, and metabolic body weight in dairy cattle.
      ). The authors observed that prediction accuracy was not a reflection of the size of the reference population. For example, the North American population was largest but accuracies were not the best for any trait, whereas the Scotland reference population was the smallest, composed of only 2 herds, but had the highest accuracies for all traits examined except residual feed intake. The authors concluded greater genetic and environmental variation in the North American data may have contributed to the reduced accuracy of that region. The herd-specific genetic effects observed among the North American herds in our current study support their conclusion and demonstrate opportunities for improvement of genomic evaluations. For example, the within-herd heritability estimate for milk production in the Florida herd was relatively high, given one might expect the increased environmental stress in that geographic region would result in lower heritability. Results from this study indicate that within-herd heritability for the Florida herd would be much lower if herd-specific genetic effects were not modeled. A large portion of the genomic variance within the Florida herd is herd-specific, and therefore, within-herd ranking of animals will benefit more from EBV adjustment for herd-specific genetic effects. The herdmate information adjusted and unadjusted EBV were very poorly correlated. The climate of the Florida herd is very different compared with that of the remaining herds in the study and may be a source of herd-specific genetic effects on milk yield. Adjustment of EBV within the Florida herd for the herd-specific genetic effects results in a substantial increase in ranking accuracy.
      Practical application of the study findings for genotyped herds may entail an examination of nationally EBV correlation with observed phenotypes within a herd. The national EBV has the benefits of a very large reference population and ability to most accurately estimate small constant marker effects across herds. Poor correlation may indicate unaccounted for herd-specific genetic effects. Available phenotypes within the herd (ideally from a single snapshot of time to reduce random noise) should be regressed on the national EBV. The regression coefficient from this model represents a herd-specific expansion or shrinkage adjustment of the EBV. The EBV is the predicted deviation from the mean phenotypic performance and assumes that a one-unit difference in EBV equates to a one-unit phenotypic difference for that trait. Fitted values from this model adjust for the expansion or shrinkage of EBV within a particular herd but do not result in the reranking of animals. Although, expansion or shrinkage of EBV for component traits of net merit indices could result in reranking of animals within a herd based on net merit.
      In an effort to more accurately rank animals within a herd, phenotypes preadjusted for national EBV effects can be examined for additional genetic variance not captured by the nationally estimated constant marker effects. If the genetic variance is greater than zero, cross validation can be used to determine whether within-herd prediction accuracy improves with modeling of herd-specific genetic effects. Results from this study indicate this 2-step approach is an effective method of modeling herd-specific genetic effects and improving within-herd prediction accuracy. Even small changes in the ranking of animals due to herd-specific genetic effects may be effective when dairy producers are making selection and culling decisions every day.
      The authors acknowledge the sample size in this study is small for accurate variance component estimation. Replacement of the scenario 1 EBV in this study with a national EBV determined from a large reference population would more accurately estimate the common genomic variance within herds, which might in turn reduce the gains in prediction accuracy observed in this study. Further research in this area is warranted.
      The 2-step approach in this study focused on evaluating the presence of herd-specific genetic effects and their effect on prediction accuracy. A logical direction for future research would be to identify regions of the genome associated with large herd-specific genetic effects. A similar 2-step approach could be performed to identify markers within a herd associated with phenotypes preadjusted for national EBV. Replication of associated markers in multiple herds would serve as validation and facilitate mapping of candidate genes that may serve as targets for management strategies to optimize a herd's genetic potential.

      CONCLUSIONS

      Herdmates tend to be more genetically similar to each other than animals belonging to separate herds, in addition to sharing a common environment and production management system. Thus, herdmates provide an excellent example of how the test animal's genetics will perform in the current herd environment. Significant gains in within-herd prediction accuracy are attainable by well-representing herdmates of the test animals in the training population. We constructed a novel, but practical, approach to incorporate herd-specific marker effects in genomic predicted breeding values. The approach can be applied on an individual-herd basis by performing a post hoc adjustment of national EBV for herd-specific genetic effects. Herd-customized approaches will aid dairy producers in optimally using their genomic data investment and improve selection decisions involving animals in their herds.

      ACKNOWLEDGMENTS

      This project was supported by Agriculture and Food Research Initiative Competitive Grants no. 2017-67012-26108 and 2011-68004-30340 from the USDA National Institute of Food and Agriculture (Washington, DC). This study uses data generated by the USDA–Agriculture and Food Research Initiative project supported by grant no. 2011-68004-30340 (VandeHaar et al., 2011), and data have been made publicly available in the form of data sharing with scientists from the USDA–Agricultural Research Service and Council on Dairy Cattle Breeding (Bowie, MD).

      Supplementary Material

      REFERENCES

        • Cheng H.
        • Garrick D.J.
        • Fernando R.L.
        Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
        J. Anim. Sci. Biotechnol. 2017; 8 (28469846): 38
        • Covarrubias-Pazaran G.
        Genome-assisted prediction of quantitative traits using the R package sommer.
        PLoS One. 2016; 11 (27271781)e0156744
        • Edwards S.M.
        • Buntjer J.B.
        • Jackson R.
        • Bentley A.R.
        • Lage J.
        • Byrne E.
        • Burt C.
        • Jack P.
        • Berry S.
        • Flatman E.
        The effects of training population design on genomic prediction accuracy in wheat.
        bioRxiv. 2018;
        • Habier D.
        • Fernando R.L.
        • Dekkers J.C.
        The impact of genetic relationship information on genome-assisted breeding values.
        Genetics. 2007; 177 (18073436): 2389-2397
        • Meuwissen T.H.
        • Hayes B.J.
        • Goddard M.E.
        Prediction of total genetic value using genome-wide dense marker maps.
        Genetics. 2001; 157 (11290733): 1819-1829
        • Schultz N.E.
        • Weigel K.A.
        An improved genomic prediction model in populations featuring shared environments and familial relatedness.
        in: Proc. World Congr. Genet. Appl. Livest. Prod., Auckland, New Zealand. 2018: 520
        • Strandberg E.
        • Brotherstone S.
        • Wall E.
        • Coffey M.P.
        Genotype by environment interaction for first-lactation female fertility traits in UK dairy cattle.
        J. Dairy Sci. 2009; 92 (19528622): 3437-3446
        • Tempelman R.J.
        • Spurlock D.M.
        • Coffey M.
        • Veerkamp R.F.
        • Armentano L.E.
        • Weigel K.A.
        • de Haas Y.
        • Staples C.R.
        • Connor E.E.
        • Lu Y.
        • VandeHaar M.J.
        Heterogeneity in genetic and nongenetic variation and energy sink relationships for residual feed intake across research stations and countries.
        J. Dairy Sci. 2015; 98 (25582589): 2013-2026
        • Tiezzi F.
        • de Los Campos G.
        • Parker Gaddis K.L.
        • Maltecca C.
        Genotype by environment (climate) interaction improves genomic prediction for production traits in US Holstein cattle.
        J. Dairy Sci. 2017; 100 (28109596): 2042-2056
        • Weir B.S.
        • Hill W.G.
        Estimating F-statistics.
        Annu. Rev. Genet. 2002; 36 (12359738): 721-750
        • Wientjes Y.C.J.
        • Bijma P.
        • Vandenplas J.
        • Calus M.P.L.
        Multi-population genomic relationships for estimating current genetic variances within and genetic correlations between populations.
        Genetics. 2017; 207 (28821589): 503-515
        • Wiggans G.R.
        • Cole J.B.
        • Hubbard S.M.
        • Sonstegard T.S.
        Genomic selection in dairy cattle: The USDA experience.
        Annu. Rev. Anim. Biosci. 2017; 5 (27860491): 309-327
        • Wiggans G.R.
        • Cooper T.A.
        • VanRaden P.M.
        • Van Tassell C.P.
        • Bickhart D.M.
        • Sonstegard T.S.
        Increasing the number of single nucleotide polymorphisms used in genomic evaluation of dairy cattle.
        J. Dairy Sci. 2016; 99 (27040793): 4504-4511
        • Yao C.
        • De Los Campos G.
        • VandeHaar M.
        • Spurlock D.
        • Armentano L.
        • Coffey M.
        • De Haas Y.
        • Veerkamp R.
        • Staples C.
        • Connor E.
        Use of genotype × environment interaction model to accommodate genetic heterogeneity for residual feed intake, dry matter intake, net energy in milk, and metabolic body weight in dairy cattle.
        J. Dairy Sci. 2017; 100 (28109605): 2007-2016