Research Article | Volume 94, Issue 5, P2613-2620, May 2011

# Differences among methods to validate genomic evaluations for dairy cattle

## Abstract

Two methods of testing predictions from genomic evaluations were investigated. Data were from the August 2006 and April 2010 official USDA genetic evaluations of dairy cattle. The training data set consisted of cows and bulls that were proven (had own or daughter information) as of August 2006. It included 8,022 Holsteins, 1,959 Jerseys, and 1,056 Brown Swiss. The validation data set consisted of bulls that were unproven as of August 2006 and proven by April 2010. For the production traits, it included 2,653 Holsteins, 411 Jerseys, and 132 Brown Swiss. Method 1 used the training animals' predicted transmitting ability (PTA) from August 2006 to estimate single nucleotide polymorphism (SNP) effects. Method 2 used the training animals' April 2010 PTA. Both methods were tested with several regressions on the same validation animals. In both cases, the validation animals were tested against their deregressed April 2010 PTA. All traits with genomic evaluations in the official USDA April 2010 genetic evaluation were tested. Results included bias, differences from expected regressions (calculated using selection intensities), and coefficients of determination. Genomic information increased predictive ability for most traits in all breeds. The 2 methods of testing produced some differences that would affect interpretation of results. The coefficient of determination was higher for almost all traits using method 2. This was expected because the data were not independent: evaluations of the validation bulls contributed to their sires' evaluations. Regression coefficients from method 2 were often higher than those from method 1. With method 2, many traits had regression coefficients more than 2 SE above the expected regressions, partly because the training and validation data sets were not independent. Most traits had some level of bias in the prediction equations, regardless of breed. Method 1 also made it possible to evaluate how much genomic information increased the accuracy of proven first-crop bull evaluations, and these bulls did gain accuracy from it. Method 1 is recommended for validation of genomic evaluations.

## Introduction

Methods of validating genomic evaluations are becoming an important topic as more countries use within-country genomic evaluations and as multi-country genomic evaluations approach. The topic has received some attention in the literature, but most results have come from simulation studies.

Mäntysaari et al. (2010) outline genomic validation procedures that could be used with varying populations and data structures.
Amer and Banos (*Implications of avoiding overlap between training and testing data sets when evaluating genomic predictions of genetic merit*) found it best not to overlap the training and validation data sets. Using a single national genetic evaluation run, they showed the consequences of overlapping them. However, methods that do include overlap between the 2 data sets have gained acceptance. Validation typically tests the regression of deregressed PTA on genomic predicted transmitting ability (GPTA) for animals not in the reference population.
Interbull currently allows young genomically tested bulls to be marketed from countries that have passed genomic validation (Interbull, 2010). Interbull allows 2 types of validation method for genomic evaluations (Interbull, 2010). The first method uses the current PTA of the training animals to make predictions for the validation animals. The other method uses the training animals' PTA from 4 yr earlier. Both methods test the GPTA against the current PTA of the validation animals (Interbull, 2010). Using current PTA for both training and validation would be simpler than regenerating the proofs from 4 yr earlier. Not all countries have access to data from 4 yr ago, and regenerating those data may not be possible. Some concern exists, however, about using current PTA for genomic predictions. As demonstrated by Amer and Banos, the validation bulls contribute information back to their sires, which are typically training bulls. Using 4-yr-old instead of current PTA has another advantage: the gain in genomic reliability for first-crop proven bulls can be validated.
The objective of this study was to investigate 2 common methods of genomic evaluation validation using the genomic data available in the US dairy cattle population.

## Materials and Methods

### Data

The animals were genotyped using the BovineSNP50 BeadChip (Illumina, San Diego, CA). Genotyped animals passed the general edits explained by Wiggans et al. (*Selection and management of DNA markers for use in genomic evaluations*). A common set of 43,382 SNP was used across Holsteins, Jerseys, and Brown Swiss. This is 3 fewer than reported by Wiggans et al. (*Selection and management of DNA markers for use in genomic evaluations*), because of poor SNP performance and quality. The data included PTA from both the August 2006 and April 2010 official USDA genetic evaluations of dairy cattle. Proven cows and bulls were used in the training data set. A cow adjustment was applied for Holsteins and Jerseys to account for cow bias (*Cow adjustments for genomic predictions of Holstein and Jersey bulls*). The adjustment was not applied for Brown Swiss because too few cows were genotyped to estimate the adjustment factors (*Cow adjustments for genomic predictions of Holstein and Jersey bulls*). The training data sets consisted of 8,022 Holsteins, 1,959 Jerseys, and 1,056 Brown Swiss. They included foreign animals, some of which had no US daughters (e.g., Canadian Holsteins and Brown Swiss from Switzerland). The validation data set included only bulls that met 3 conditions: they had US registration numbers, were unproven (no daughter information) in August 2006, and were proven (daughters in at least 10 US herds) in April 2010. There were 2,653 Holstein, 411 Jersey, and 132 Brown Swiss validation bulls. These numbers varied slightly by trait, partly because of the 10-herd restriction. This was especially true for conformation traits, because fewer herds classify. There were far fewer validation bulls for calving traits because bulls obtain a traditional evaluation for these traits at a younger age. Many therefore already had a PTA in the 4-yr cut-off data set and were excluded from the validation data set for those traits.
The traits analyzed included all traits with genomic evaluations in the official USDA April 2010 genetic evaluation. Holstein conformation traits were analyzed with PTA provided by Holstein Association USA (Brattleboro, VT). The August 2006 data were converted to the January 2010 base so that they were consistent with the April 2010 data. The formula used for PTA base conversion was

$$\mathrm{PTA}_{\mathrm{new}} = \text{SD ratio} \times \left(\mathrm{PTA}_{\mathrm{old}} - \text{base change}\right), \tag{1}$$

where $\mathrm{PTA}_{\mathrm{new}}$ is the PTA on the 2010 base and $\mathrm{PTA}_{\mathrm{old}}$ is the PTA on the 2005 base. Specific base change values and corresponding SD ratios are given by VanRaden et al. (2009a).
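The Python sketch below shows equation 1 applied to individual PTA. The base-change and SD-ratio numbers in it are hypothetical placeholders. They are not the trait-specific values published by VanRaden et al. (2009a), and the function name is illustrative.

```python
def to_2010_base(pta_old: float, sd_ratio: float, base_change: float) -> float:
    """Convert a PTA from the 2005 base to the 2010 base with equation 1.

    The base change is subtracted first, and the difference is then
    rescaled by the SD ratio.
    """
    return sd_ratio * (pta_old - base_change)


# Hypothetical values for a single trait (placeholders, not published figures).
HYPOTHETICAL_BASE_CHANGE = 200.0
HYPOTHETICAL_SD_RATIO = 1.05

if __name__ == "__main__":
    august_2006_ptas = [500.0, -150.0, 0.0]
    converted = [
        to_2010_base(p, HYPOTHETICAL_SD_RATIO, HYPOTHETICAL_BASE_CHANGE)
        for p in august_2006_ptas
    ]
    print(converted)
```

The order of operations matters. Rescaling before subtracting the base change would give a different result whenever the SD ratio differs from 1.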

### Methods

The training animals were used to compute predictions, which were then applied to and tested on the validation animals. Genomic predictions were computed using nonlinear genomic models (*Efficient methods to compute genomic predictions*), which regress SNP with smaller effects more strongly toward zero. To be consistent with official USDA genomic evaluations, the model also included a polygenic effect. It was set to 0.10, so that 90% of the genetic variation came from SNP effects and the other 10% was inherited through pedigree relationships (*Invited review: Reliability of genomic predictions for North American Holstein bulls*).
Two methods of genomic validation were used. Method 1 used the genotypes and PTA of proven bulls and cows (with daughter or own information) from the August 2006 official USDA genetic evaluation. Method 2 used the traditional (nongenomic) PTA of the same animals from the April 2010 USDA genetic evaluation.
In both methods, the estimated SNP effects were applied to the genotypes of the validation bulls and combined with parent average (PA) to produce GPTA. These GPTA were then tested against the validation bulls' deregressed April 2010 PTA. Both methods used the same validation animals. Only the training-animal PTA used to estimate the SNP effects differed, which produced different GPTA for the validation animals. The regressions used to test the genomic predictions were
$$DD = b_0 + b_1 \times \mathrm{GPTA}_1 + e_1 \tag{2}$$

and

$$DD = b_0 + b_2 \times \mathrm{GPTA}_2 + e_2, \tag{3}$$

The terms are defined as follows:

- DD is the deregressed daughter deviation, computed with the methodology explained in *Invited review: Reliability of genomic predictions for North American Holstein bulls*.
- $b_0$ is the intercept.
- $b_1$ is the regression coefficient for $\mathrm{GPTA}_1$, for which SNP effects were estimated from the August 2006 PTA.
- $b_2$ is the regression coefficient for $\mathrm{GPTA}_2$, for which SNP effects were estimated from the April 2010 PTA.
- $e_1$ and $e_2$ are random residual errors.

These equations were compared with each other and with the traditional evaluations. For the traditional evaluation, PA was substituted for GPTA in equations 2 and 3. Equation 2 used PA from the August 2006 evaluation ($\mathrm{PA}_1$), and equation 3 used PA from the April 2010 evaluation ($\mathrm{PA}_2$).
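Equations 2 and 3 are simple linear regressions of DD on a single predictor. The sketch below fits them by unweighted ordinary least squares. This is an assumption, because the text does not say whether records were weighted (for example, by reliability), so the code is illustrative rather than the USDA procedure. The function and type names are hypothetical. The same function serves for $\mathrm{GPTA}_1$, $\mathrm{GPTA}_2$, $\mathrm{PA}_1$, or $\mathrm{PA}_2$, since only the predictor changes.

```python
import math
from typing import NamedTuple, Sequence


class RegressionResult(NamedTuple):
    intercept: float  # b0
    slope: float      # b1 or b2
    slope_se: float   # standard error of the slope
    r_squared: float  # coefficient of determination


def regress_dd_on_prediction(dd: Sequence[float],
                             prediction: Sequence[float]) -> RegressionResult:
    """Unweighted OLS fit of DD = b0 + b * prediction + e."""
    n = len(dd)
    if n != len(prediction) or n < 3:
        raise ValueError("need at least 3 paired (DD, prediction) records")
    mean_x = sum(prediction) / n
    mean_y = sum(dd) / n
    sxx = sum((x - mean_x) ** 2 for x in prediction)
    if sxx == 0:
        raise ValueError("prediction has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prediction, dd))
    syy = sum((y - mean_y) ** 2 for y in dd)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used for both the slope SE and R^2.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(prediction, dd))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    r_squared = 1.0 - sse / syy
    return RegressionResult(intercept, slope, slope_se, r_squared)


if __name__ == "__main__":
    # Toy numbers, not data from this study.
    dd = [0.0, 1.0, 1.0, 2.0]
    gpta = [0.0, 1.0, 2.0, 3.0]
    print(regress_dd_on_prediction(dd, gpta))
```

Fitting the same DD against each predictor in turn gives quantities of the kind reported in Tables 1 to 4: slope ± SE, intercept, and squared correlation.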
In addition to the validation bulls, first-crop proven bulls in the training set were analyzed for gains in accuracy. Their deregressed PTA used only new daughter information. The contribution of the new daughters alone was calculated in a way similar to Norman et al. (*Consistency of maturity rate for milk yield across countries and generations*), with the formula

$$D_2 = \frac{(n_1 + n_2)\,D_{1,2} - n_1 D_1}{n_2}, \tag{4}$$

where $D_2$ is the contribution of the daughters added over the last 4 yr; $D_1$ is the contribution of the daughters in the bull's genetic evaluation 4 yr ago; $D_{1,2}$ is the contribution of all daughters; $n_1$ is the number of daughters that contributed to the PTA 4 yr ago; and $n_2$ is the number of daughters added between August 2006 and April 2010.
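Equation 4 isolates the new daughters' contribution by removing the old daughters' share from the combined value. The sketch below assumes, as equation 4 implies, that the combined contribution behaves like a daughter-count-weighted average of the old and new groups. The function name and example numbers are hypothetical.

```python
def new_daughter_contribution(d_all: float, d_old: float,
                              n_old: int, n_new: int) -> float:
    """Equation 4: D2 = [(n1 + n2) * D_{1,2} - n1 * D1] / n2."""
    if n_new <= 0:
        raise ValueError("n_new (n2) must be positive to isolate new daughters")
    return ((n_old + n_new) * d_all - n_old * d_old) / n_new


if __name__ == "__main__":
    # Hypothetical first-crop bull: 60 daughters averaging 100 in August 2006,
    # then 40 more daughters averaging 200, so the combined mean is 140.
    d2 = new_daughter_contribution(d_all=140.0, d_old=100.0, n_old=60, n_new=40)
    print(d2)  # 200.0, the mean of the 40 new daughters
```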
The expected regressions [$E(b_1)$] were calculated with the formula used by Interbull (Mäntysaari et al., 2010):

$$E(b_1) = \frac{1 - i(i - x)}{1 - i(i - x)\,r^2}, \tag{5}$$

where $i$ is the standardized selection differential for the genotyped bulls; $x$ is the selection differential for the genotyped animals from the truncated data set; and $r^2$ is the coefficient of determination of the validation bulls.
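Equation 5 can be evaluated directly once $i$, $x$, and $r^2$ are known. The sketch below also includes a helper that derives $i$ and $x$ from a proportion selected. It assumes the textbook case of truncation selection on a standard normal variable, where $x$ is the standardized truncation point, $i = \phi(x)/p$, and $i(i - x)$ is the proportional reduction in variance. That helper and both function names are illustrative assumptions. The text does not say how $i$ and $x$ were obtained.

```python
from statistics import NormalDist
from typing import Tuple


def truncation_terms(proportion_selected: float) -> Tuple[float, float]:
    """Return (i, x) for selecting the top fraction p of a standard normal.

    x is the standardized truncation point, and i = phi(x) / p is the mean
    of the selected group in SD units (the selection intensity).
    """
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must be in (0, 1)")
    std = NormalDist()  # mean 0, SD 1
    x = std.inv_cdf(1.0 - proportion_selected)
    i = std.pdf(x) / proportion_selected
    return i, x


def expected_regression(i: float, x: float, r2: float) -> float:
    """Equation 5: E(b1) = [1 - i(i - x)] / [1 - i(i - x) r^2]."""
    k = i * (i - x)  # proportional variance reduction caused by selection
    return (1.0 - k) / (1.0 - k * r2)


if __name__ == "__main__":
    i, x = truncation_terms(0.5)  # hypothetical: top half selected
    for r2 in (0.0, 0.5, 1.0):
        print(r2, round(expected_regression(i, x, r2), 3))
```

With $r^2 = 1$ the expected regression is exactly 1. With $r^2$ below 1, stronger selection (a larger $i(i - x)$) pulls $E(b_1)$ further below 1.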

## Results and Discussion

Results for methods 1 and 2 (equations 2 and 3), with the corresponding expected regression coefficients for validation bulls, are in Tables 1, 2, and 3 for Holsteins, Jerseys, and Brown Swiss, respectively. Squared correlations for the PA of the validation bulls from the August 2006 and April 2010 evaluations are in Table 4, where PA was used instead of GPTA in equations 2 and 3. Table 5 contains coefficients of determination for predicting later daughters of the first-crop training bulls for yield, health, and fertility traits, using equation 4 to isolate new daughter information. Table 5 also contains the regressions of the additional daughter information on 3 predictors: the traditional PTA; the subset PTA computed with the pedigree relationship matrix of the genotyped animals; and the direct genomic value (DGV), which includes information from genomics only.
Table 1. Regressions and squared correlations for prediction of Holstein validation data using GPTA1 or GPTA2

| Trait | Expected regression | Regression ± SE, GPTA1 | Regression ± SE, GPTA2 | n | Intercept ± SE, GPTA1 | Intercept ± SE, GPTA2 | Squared correlation (×100), GPTA1 | Squared correlation (×100), GPTA2 |
|---|---|---|---|---|---|---|---|---|
| Milk (kg) | 0.93 | 0.90 ± 0.02 | 0.96 ± 0.02 | 2,653 | −10.4 ± 7.5 | 43.5 ± 6.4 | 40 | 45 |
| Fat (kg) | 0.88 | 0.94 ± 0.02 | 0.97 ± 0.02 | 2,653 | −1.06 ± 0.29 | 1.5 ± 0.22 | 41 | 46 |
| Protein (kg) | 0.88 | 0.87 ± 0.02 | 0.91 ± 0.02 | 2,653 | 0.39 ± 0.22 | 1.5 ± 0.18 | 37 | 43 |
| PL (mo) | 0.77 | 1.03 ± 0.04 | 1.00 ± 0.03 | 2,653 | −1.87 ± 0.08 | −1.57 ± 0.07 | 24 | 32 |
| SCS | 0.81 | 0.88 ± 0.03 | 0.86 ± 0.02 | 2,653 | −0.02 ± 0.004 | −0.03 ± 0.004 | 29 | 36 |
| DPR (%) | 0.86 | 1.08 ± 0.04 | 1.04 ± 0.03 | 2,653 | −0.24 ± 0.04 | 0.00 ± 0.04 | 21 | 31 |
| Final score | 0.76 | 0.86 ± 0.03 | 0.88 ± 0.02 | 2,598 | 0.56 ± 0.02 | −0.01 ± 0.02 | 34 | 51 |
| Stature | 0.90 | 0.95 ± 0.02 | 1.01 ± 0.02 | 2,184 | 0.32 ± 0.02 | −0.16 ± 0.02 | 49 | 66 |
| Strength | 0.91 | 0.91 ± 0.03 | 0.99 ± 0.02 | 2,184 | 0.35 ± 0.02 | −0.06 ± 0.02 | 36 | 57 |
| Dairy form | 0.95 | 1.02 ± 0.03 | 1.07 ± 0.02 | 2,183 | 0.58 ± 0.02 | −0.19 ± 0.02 | 42 | 60 |
| Foot angle | 0.82 | 0.90 ± 0.03 | 0.96 ± 0.02 | 2,183 | 0.57 ± 0.03 | 0.02 ± 0.03 | 35 | 56 |
| Rear legs | 0.96 | 0.94 ± 0.03 | 1.09 ± 0.02 | 2,184 | 0.04 ± 0.03 | −0.05 ± 0.02 | 36 | 57 |
| Body depth | 0.92 | 0.93 ± 0.03 | 1.00 ± 0.02 | 2,184 | 0.45 ± 0.02 | −0.10 ± 0.02 | 37 | 58 |
| Rump angle | 0.97 | 0.92 ± 0.02 | 1.07 ± 0.02 | 2,184 | −0.15 ± 0.03 | −0.06 ± 0.02 | 42 | 61 |
| Rump width | 0.86 | 0.86 ± 0.02 | 0.99 ± 0.02 | 2,184 | 0.32 ± 0.02 | −0.12 ± 0.02 | 40 | 59 |
| Fore udder | 0.79 | 0.89 ± 0.03 | 0.99 ± 0.02 | 2,184 | 0.51 ± 0.02 | −0.07 ± 0.02 | 37 | 57 |
| Rear udder height | 0.78 | 0.85 ± 0.03 | 0.90 ± 0.02 | 2,184 | 0.79 ± 0.03 | −0.01 ± 0.03 | 31 | 51 |
| Udder depth | 0.83 | 0.87 ± 0.02 | 1.07 ± 0.02 | 2,184 | 0.10 ± 0.02 | −0.15 ± 0.02 | 41 | 60 |
| Udder cleft | 0.84 | 0.94 ± 0.03 | 1.07 ± 0.02 | 2,184 | 0.40 ± 0.03 | −0.14 ± 0.02 | 37 | 60 |
| Front teat placement | 0.87 | 0.88 ± 0.02 | 1.08 ± 0.02 | 2,129 | 0.23 ± 0.03 | −0.14 ± 0.02 | 38 | 58 |
| Teat length | 0.95 | 0.93 ± 0.03 | 1.06 ± 0.02 | 2,184 | −0.06 ± 0.02 | 0.02 ± 0.02 | 34 | 58 |
| Sire CE (%) | 0.85 | 0.79 ± 0.03 | 0.82 ± 0.02 | 1,618 | 2.6 ± 0.22 | 2.10 ± 0.18 | 28 | 30 |
| Daughter CE (%) | 0.86 | 0.90 ± 0.05 | 0.81 ± 0.03 | 1,622 | −0.06 ± 0.43 | 1.34 ± 0.25 | 17 | 21 |
| Sire stillbirth (%) | 0.86 | 0.79 ± 0.07 | 0.82 ± 0.05 | 1,617 | 2.77 ± 0.53 | 2.08 ± 0.36 | 7 | 10 |
| Daughter stillbirth (%) | 0.97 | 0.86 ± 0.07 | 0.97 ± 0.04 | 1,611 | 0.79 ± 0.57 | 0.55 ± 0.34 | 9 | 18 |

GPTA1: training animal August 2006 data used to compute genomic PTA for validation animals. GPTA2: training animal April 2010 data used to compute genomic PTA for validation animals. PL = productive life; DPR = daughter pregnancy rate; CE = calving ease.
Table 2. Regressions and squared correlations for prediction of Jersey validation data using GPTA1 or GPTA2

| Trait | Expected regression | Regression ± SE, GPTA1 | Regression ± SE, GPTA2 | n | Intercept ± SE, GPTA1 | Intercept ± SE, GPTA2 | Squared correlation (×100), GPTA1 | Squared correlation (×100), GPTA2 |
|---|---|---|---|---|---|---|---|---|
| Milk (kg) | 0.94 | 1.02 ± 0.05 | 1.09 ± 0.05 | 411 | 65 ± 14 | 99 ± 12 | 48 | 55 |
| Fat (kg) | 0.91 | 0.83 ± 0.05 | 0.95 ± 0.05 | 411 | 5.03 ± 0.59 | 5.90 ± 0.50 | 37 | 43 |
| Protein (kg) | 0.91 | 0.91 ± 0.06 | 0.99 ± 0.05 | 411 | 2.67 ± 1.00 | 3.8 ± 0.36 | 39 | 46 |
| PL (mo) | 0.91 | 1.08 ± 0.12 | 1.08 ± 0.09 | 411 | −0.11 ± 0.18 | −0.25 ± 0.17 | 16 | 25 |
| SCS | 0.93 | 0.71 ± 0.08 | 0.79 ± 0.07 | 411 | 0.06 ± 0.01 | 0.02 ± 0.01 | 17 | 22 |
| DPR (%) | 0.95 | 1.24 ± 0.15 | 1.27 ± 0.07 | 381 | −0.02 ± 0.11 | −0.06 ± 0.09 | 15 | 29 |
| Final score | 0.90 | 0.61 ± 0.07 | 0.93 ± 0.07 | 378 | 0.25 ± 0.05 | 0.09 ± 0.05 | 16 | 32 |
| Stature | 0.93 | 0.93 ± 0.05 | 0.95 ± 0.04 | 384 | −0.10 ± 0.06 | 0.08 ± 0.05 | 46 | 54 |
| Strength | 0.94 | 0.94 ± 0.08 | 0.98 ± 0.06 | 384 | −0.12 ± 0.04 | −0.03 ± 0.04 | 29 | 39 |
| Dairy form | 0.90 | 0.63 ± 0.07 | 0.86 ± 0.07 | 370 | 0.16 ± 0.06 | 0.01 ± 0.05 | 18 | 30 |
| Foot angle | 0.98 | 0.78 ± 0.08 | 0.90 ± 0.07 | 383 | 0.01 ± 0.04 | 0.01 ± 0.03 | 21 | 31 |
| Rear legs | 0.97 | 1.04 ± 0.10 | 1.04 ± 0.08 | 323 | −0.02 ± 0.04 | 0.01 ± 0.03 | 24 | 30 |
| Rump angle | 0.92 | 0.95 ± 0.06 | 1.10 ± 0.06 | 385 | −0.08 ± 0.04 | −0.02 ± 0.04 | 36 | 48 |
| Rump width | 0.96 | 0.97 ± 0.07 | 0.96 ± 0.06 | 383 | −0.13 ± 0.04 | −0.06 ± 0.03 | 34 | 43 |
| Fore udder | 0.98 | 0.85 ± 0.07 | 0.90 ± 0.05 | 293 | 0.07 ± 0.05 | 0.03 ± 0.04 | 35 | 46 |
| Rear udder height | 0.89 | 0.63 ± 0.06 | 0.86 ± 0.06 | 370 | 0.30 ± 0.06 | 0.10 ± 0.05 | 22 | 36 |
| Udder depth | 1.00 | 1.00 ± 0.08 | 0.97 ± 0.05 | 233 | 0.12 ± 0.07 | 0.14 ± 0.05 | 41 | 48 |
| Udder cleft | 0.90 | 0.74 ± 0.08 | 0.94 ± 0.07 | 385 | 0.09 ± 0.04 | 0.06 ± 0.04 | 22 | 33 |
| Front teat placement | 0.95 | 0.92 ± 0.07 | 0.97 ± 0.06 | 385 | 0.01 ± 0.05 | −0.03 ± 0.05 | 34 | 42 |
| Teat length | 0.97 | 0.86 ± 0.07 | 0.91 ± 0.06 | 376 | −0.05 ± 0.05 | −0.01 ± 0.04 | 28 | 36 |

GPTA1: training animal August 2006 data used to compute genomic PTA for validation animals. GPTA2: training animal April 2010 data used to compute genomic PTA for validation animals. PL = productive life; DPR = daughter pregnancy rate.
Table 3. Regressions and squared correlations for prediction of Brown Swiss validation data using GPTA1 or GPTA2

| Trait | Expected regression | Regression ± SE, GPTA1 | Regression ± SE, GPTA2 | n | Intercept ± SE, GPTA1 | Intercept ± SE, GPTA2 | Squared correlation (×100), GPTA1 | Squared correlation (×100), GPTA2 |
|---|---|---|---|---|---|---|---|---|
| Milk (kg) | 0.85 | 0.65 ± 0.14 | 1.17 ± 0.15 | 132 | −154 ± 46 | −116 ± 33 | 14 | 31 |
| Fat (kg) | 0.93 | 0.53 ± 0.12 | 1.02 ± 0.13 | 132 | −6.0 ± 1.6 | −4.8 ± 1.1 | 14 | 32 |
| Protein (kg) | 0.88 | 0.54 ± 0.11 | 1.05 ± 0.14 | 132 | −3.7 ± 1.3 | −3.4 ± 1.0 | 14 | 31 |
| PL (mo) | 0.94 | 1.26 ± 0.29 | 1.38 ± 0.21 | 132 | −1.9 ± 0.50 | −1.05 ± 0.42 | 13 | 26 |
| SCS | 0.90 | 1.04 ± 0.18 | 1.20 ± 0.16 | 132 | 0.01 ± 0.02 | 0.00 ± 0.02 | 17 | 31 |
| DPR (%) | 0.91 | 0.55 ± 0.37 | 0.90 ± 0.25 | 132 | 0.23 ± 0.31 | 0.44 ± 0.27 | 2 | 9 |
| Final score | 0.93 | 1.37 ± 0.23 | 1.51 ± 0.24 | 109 | −0.11 ± 0.07 | −0.09 ± 0.08 | 25 | 24 |
| Stature | 0.96 | 1.07 ± 0.16 | 1.26 ± 0.14 | 120 | 0.06 ± 0.14 | −0.08 ± 0.12 | 28 | 40 |
| Strength | 0.98 | 0.72 ± 0.20 | 0.87 ± 0.18 | 107 | −0.09 ± 0.09 | −0.05 ± 0.08 | 10 | 15 |
| Dairy form | 0.89 | 0.90 ± 0.25 | 1.11 ± 0.19 | 100 | 0.04 ± 0.13 | −0.03 ± 0.11 | 12 | 22 |
| Foot angle | 1.00 | 1.34 ± 0.30 | 1.50 ± 0.25 | 112 | 0.04 ± 0.10 | −0.02 ± 0.09 | 15 | 23 |
| Rear legs, side | 0.95 | 0.84 ± 0.16 | 0.96 ± 0.15 | 121 | 0.02 ± 0.08 | 0.08 ± 0.07 | 18 | 24 |
| Rump angle | 0.98 | 0.88 ± 0.17 | 1.01 ± 0.15 | 120 | 0.05 ± 0.13 | 0.12 ± 0.11 | 19 | 26 |
| Rump width | 0.96 | 0.97 ± 0.23 | 1.27 ± 0.18 | 113 | −0.12 ± 0.09 | −0.08 ± 0.07 | 14 | 29 |
| Fore udder | 0.96 | 0.98 ± 0.26 | 1.15 ± 0.19 | 87 | 0.00 ± 0.15 | 0.06 ± 0.12 | 15 | 24 |
| Rear udder | 0.84 | 1.01 ± 0.21 | 1.13 ± 0.15 | 115 | 0.07 ± 0.11 | 0.02 ± 0.10 | 18 | 31 |
| Udder depth | 0.96 | 1.09 ± 0.15 | 1.19 ± 0.14 | 120 | 0.11 ± 0.11 | 0.07 ± 0.10 | 31 | 37 |
| Udder cleft | 0.93 | 0.92 ± 0.17 | 0.99 ± 0.14 | 120 | 0.11 ± 0.11 | 0.08 ± 0.10 | 19 | 28 |
| Front teat | 0.93 | 1.28 ± 0.21 | 1.45 ± 0.17 | 115 | −0.15 ± 0.13 | −0.20 ± 0.11 | 26 | 36 |
| Teat length | 0.96 | 1.21 ± 0.18 | 1.13 ± 0.15 | 120 | 0.17 ± 0.15 | 0.10 ± 0.14 | 27 | 31 |
| Rear legs, rear | 0.99 | −0.26 ± 0.10 | 1.51 ± 0.26 | 121 | −0.09 ± 0.08 | −0.01 ± 0.07 | 6 | 21 |
| Sire CE (%) | 0.88 | 0.41 ± 0.66 | 1.05 ± 0.22 | 47 | 3.9 ± 2.9 | 0.35 ± 1.13 | 1 | 15 |
| Daughter CE (%) | 0.93 | 1.56 ± 0.53 | 0.42 ± 0.20 | 46 | −3.4 ± 2.9 | 3.03 ± 1.04 | 16 | 4 |

GPTA1: training animal August 2006 data used to compute genomic PTA for validation animals. GPTA2: training animal April 2010 data used to compute genomic PTA for validation animals. PL = productive life; DPR = daughter pregnancy rate; CE = calving ease.
Table 4. Squared correlations for traditional evaluations of validation bulls based on August 2006 or April 2010 parent average

| Trait | Holstein, 2006 | Jersey, 2006 | Brown Swiss, 2006 | Holstein, 2010 | Jersey, 2010 | Brown Swiss, 2010 |
|---|---|---|---|---|---|---|
| Milk | 0.19 | 0.36 | 0.05 | 0.32 | 0.47 | 0.20 |
| Fat | 0.17 | 0.30 | 0.06 | 0.29 | 0.40 | 0.21 |
| Protein | 0.20 | 0.31 | 0.05 | 0.33 | 0.42 | 0.19 |
| Productive life | 0.17 | 0.08 | 0.07 | 0.21 | 0.21 | 0.19 |
| SCS | 0.14 | 0.10 | 0.09 | 0.28 | 0.21 | 0.24 |
| DPR | 0.16 | 0.07 | 0.01 | 0.28 | 0.27 | 0.09 |
| Final score | 0.19 | 0.11 | 0.10 | 0.35 | 0.34 | 0.18 |
| Stature | 0.23 | 0.36 | 0.15 | 0.41 | 0.49 | 0.26 |
| Strength | 0.16 | 0.21 | 0.07 | 0.37 | 0.36 | 0.15 |
| Dairy form | 0.19 | 0.12 | 0.06 | 0.39 | 0.30 | 0.16 |
| Foot angle | 0.24 | 0.11 | 0.06 | 0.45 | 0.24 | 0.16 |
| Rear legs | 0.18 | 0.20 | 0.14 | 0.41 | 0.27 | 0.19 |
| Body depth | 0.16 | NA | NA | 0.37 | NA | NA |
| Rump angle | 0.17 | 0.24 | 0.10 | 0.36 | 0.43 | 0.19 |
| Rump width | 0.21 | 0.24 | 0.10 | 0.42 | 0.39 | 0.25 |
| Fore udder | 0.12 | 0.22 | 0.15 | 0.34 | 0.42 | 0.27 |
| Rear udder height | 0.17 | 0.16 | 0.09 | 0.35 | 0.34 | 0.21 |
| Udder depth | 0.10 | 0.30 | 0.25 | 0.31 | 0.44 | 0.30 |
| Udder cleft | 0.21 | 0.16 | 0.11 | 0.43 | 0.34 | 0.20 |
| Front teat placement | 0.13 | 0.27 | 0.23 | 0.34 | 0.41 | 0.35 |
| Teat length | 0.10 | 0.19 | 0.17 | 0.34 | 0.31 | 0.22 |
| Sire calving ease | 0.21 | NA | 0.14 | 0.22 | NA | 0.14 |
| Daughter calving ease | 0.10 | NA | 0.16 | 0.13 | NA | 0.03 |
| Sire stillbirth | 0.06 | NA | NA | 0.10 | NA | NA |
| Daughter stillbirth | 0.07 | NA | NA | 0.12 | NA | NA |

DPR = daughter pregnancy rate; NA = not applicable.
Table 5. Gains from genomic information for the training bulls that gained daughter information for yield, health, and fertility traits

| Trait | Subset PTA ± SE | DGV ± SE | PTA ± SE | R² | PTA only ± SE | R², PTA only |
|---|---|---|---|---|---|---|
| **Holstein** | | | | | | |
| Milk (kg) | −0.52 ± 0.33 | 0.45 ± 0.35 | 1.10 ± 0.10 | 0.78 | 0.89 ± 0.04 | 0.68 |
| Fat (kg) | −0.53 ± 0.29 | 0.41 ± 0.30 | 1.21 ± 0.10 | 0.76 | 0.91 ± 0.04 | 0.62 |
| Protein (kg) | −0.54 ± 0.30 | 0.53 ± 0.33 | 1.04 ± 0.11 | 0.73 | 0.89 ± 0.04 | 0.62 |
| PL (mo) | 0.56 ± 0.21 | −0.34 ± 0.22 | 0.77 ± 0.09 | 0.64 | 0.93 ± 0.03 | 0.60 |
| SCS | 0.33 ± 0.25 | −0.12 ± 0.28 | 0.90 ± 0.08 | 0.75 | 1.03 ± 0.03 | 0.69 |
| DPR (%) | 1.36 ± 0.30 | −1.19 ± 0.32 | 1.05 ± 0.11 | 0.68 | 1.14 ± 0.04 | 0.62 |
| **Jersey** | | | | | | |
| Milk (kg) | 1.29 ± 0.60 | −1.29 ± 0.61 | 0.96 ± 0.27 | 0.63 | 0.93 ± 0.08 | 0.58 |
| Fat (kg) | 0.88 ± 0.56 | −0.59 ± 0.61 | 0.66 ± 0.22 | 0.72 | 0.93 ± 0.06 | 0.70 |
| Protein (kg) | 1.43 ± 0.61 | −1.46 ± 0.63 | 0.92 ± 0.25 | 0.55 | 0.79 ± 0.08 | 0.48 |
| PL (mo) | 1.79 ± 0.64 | −1.43 ± 0.69 | 0.73 ± 0.26 | 0.46 | 1.07 ± 0.09 | 0.43 |
| SCS | −0.12 ± 0.72 | −0.21 ± 0.75 | 1.27 ± 0.23 | 0.35 | 0.66 ± 0.09 | 0.25 |
| DPR (%) | 0.36 ± 1.31 | −0.79 ± 1.36 | 1.92 ± 0.37 | 0.52 | 1.22 ± 0.13 | 0.41 |
| **Brown Swiss** | | | | | | |
| Milk (kg) | −0.30 ± 0.50 | −0.03 ± 0.64 | 0.97 ± 0.45 | 0.32 | 0.55 ± 0.10 | 0.26 |
| Fat (kg) | 0.54 ± 0.47 | −1.04 ± 0.60 | 0.94 ± 0.42 | 0.17 | 0.41 ± 0.12 | 0.13 |
| Protein (kg) | −0.16 ± 0.39 | −0.11 ± 0.54 | 0.75 ± 0.39 | 0.28 | 0.43 ± 0.09 | 0.24 |
| PL (mo) | 0.50 ± 0.96 | −0.33 ± 1.10 | 0.96 ± 0.55 | 0.50 | 1.07 ± 0.12 | 0.49 |
| SCS | −0.99 ± 0.66 | 1.17 ± 0.74 | 1.14 ± 0.38 | 0.53 | 0.89 ± 0.10 | 0.46 |
| DPR (%) | 0.41 ± 0.98 | −0.32 ± 1.10 | 1.41 ± 0.53 | 0.48 | 1.34 ± 0.16 | 0.45 |

PL = productive life; DPR = daughter pregnancy rate. Subset PTA = PTA computed with the relationship matrix of the genomic animals. DGV = direct genomic value. In the first 4 numeric columns, the new-daughter information is regressed jointly on subset PTA, DGV, and traditional PTA. The last 2 columns use traditional PTA alone.
For almost all traits, adding genomic information to the traditional evaluation (PA) increased predictive ability. This is evident from comparing the squared correlations in Table 4 with those in Tables 1, 2, and 3. For example, Holstein milk had a squared correlation of 0.19 for PA1 and 0.40 for GPTA1, a gain of 0.21 in squared correlation. Holsteins had the largest gains in accuracy, followed by Jerseys. This was expected because Holsteins had the largest training data set. Standard errors were also smallest for Holsteins, because of their large number of validation bulls, followed by Jerseys and then Brown Swiss. The expected regressions show how much selection has occurred among the genotyped bulls for each trait. Traits with expected regressions closer to 1 have had less selection pressure.
In the Brown Swiss analysis, neither daughter calving ease nor sire calving ease gained accuracy from genomic evaluation, as a comparison of Table 3 with Table 4 shows. Several reasons are likely. First, fewer animals contributed to the calving ease analysis because some Brown Swiss training animals were foreign and had no calving ease information available. Second, the number of validation bulls for calving traits was extremely low (47 for sire and 46 for daughter calving ease), which makes it difficult to validate properly the effect of genomic information for these traits. Third, the heritability of calving ease traits is very low. Lower-heritability traits, especially with fewer daughters per bull, have been found to have lower squared correlations (Luan et al., *The accuracy of genomic selection in Norwegian Red cattle assessed by cross-validation*; Guo et al., *Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables*).
The difference in squared correlations between the 2 validation methods was fairly large for some traits. In most instances, validation with GPTA2 gave a higher squared correlation than validation with GPTA1 (equations 3 and 2, respectively). This was consistent with Amer and Banos (*Implications of avoiding overlap between training and testing data sets when evaluating genomic predictions of genetic merit*). They found that apparent realized accuracy was often inflated when the sons' daughters contributed back to the sires' breeding values. Deregressing the PTA with a full pedigree should have limited the contribution of sons' daughters to the sires' deregressed proofs. More inflation probably existed in the USDA genetic evaluations because cow information was used in both the traditional and genomic evaluations. A son would contribute more information back to his dam's evaluation than to his sire's, because bulls typically have many more offspring than cows. The inflation of regressions and squared correlations from using GPTA2 instead of GPTA1 was more evident for conformation traits. This was probably because validation bulls had fewer daughters for these traits than for milk production traits. Holstein conformation traits were also affected by changes in how pedigree and phenotypic data were processed between August 2006 and April 2010. This was one reason the difference between methods in squared correlation was larger for these traits than for other traits. Amer and Banos also found that low-heritability traits and low numbers of daughters for validation bulls caused greater inflation of results.
Genomic validation is necessary for marketing young bulls in other countries. To pass it, Interbull guidelines require that genomic regressions fall within 2 SE of the expected regression (Interbull, 2010). For Holsteins, this gave a narrow acceptable interval for the regression coefficients. As Table 1 shows, the production traits had an SE of 0.02 for Holsteins, resulting in an interval width of 0.08 (expected regression ± 0.04). In comparison, Table 3 shows Brown Swiss SE of about 0.13 for production traits, for an interval width of 0.52. Meeting the requirement would be less stringent for traits whose PA performed well. The 2-SE restriction could be a disadvantage to countries with larger genotyped populations. The reason is that SE depends on the SD and on the number of genotyped animals, and it shrinks as that number grows. As a test, the Holstein analysis was repeated with a random 25% of the validation bulls. The regression coefficients were very similar to those in Table 1, but the SE were about 0.05, which would allow many traits to pass validation (results not shown). For trend validation in traditional evaluations, Interbull uses an interval based on the genetic SD and then allows an adjustment above or below it based on SE (Interbull, 2008). A similar method would not penalize countries with more genotyped validation animals.
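A short sketch makes the 2-SE rule and its dependence on the number of validation bulls concrete. The pass check below is a hypothetical helper, not Interbull software. The subsample function assumes the textbook approximation that a slope's SE scales with $1/\sqrt{n}$ when everything else is unchanged. For a 25% subsample, this predicts an SE of about 0.04 from 0.02, the same order as the approximately 0.05 reported above.

```python
import math


def within_2se(b: float, expected_b: float, se: float, tol: float = 1e-9) -> bool:
    """True if regression b lies within 2 SE of the expected regression.

    tol absorbs floating-point noise when |b - E(b)| equals 2 SE exactly,
    which happens often with values rounded to 2 decimals.
    """
    return abs(b - expected_b) <= 2.0 * se + tol


def subsample_se(se_full: float, fraction_kept: float) -> float:
    """Approximate slope SE after keeping a random fraction of validation bulls."""
    return se_full / math.sqrt(fraction_kept)


if __name__ == "__main__":
    # (expected regression, regression, SE) transcribed from Tables 1 and 3.
    cases = {
        "Holstein milk, GPTA1": (0.93, 0.90, 0.02),
        "Holstein fat, GPTA2": (0.88, 0.97, 0.02),
        "Brown Swiss fat, GPTA1": (0.93, 0.53, 0.12),
        "Brown Swiss fat, GPTA2": (0.93, 1.02, 0.13),
    }
    for label, (expected_b, b, se) in cases.items():
        print(label, "pass" if within_2se(b, expected_b, se) else "fail")
    print(round(subsample_se(0.02, 0.25), 3))
```

The 4 cases print pass, fail, fail, and pass. This matches the text: Holstein fat yield was too high with GPTA2, and Brown Swiss fat yield passed only with GPTA2.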
With GPTA1, most traits for Holsteins and Jerseys fell within 2 SE of the expected regressions. With GPTA2, however, many traits had regressions that were too high, exceeding the expected regressions (Tables 1 and 2). This is especially evident for Holsteins. Their type traits, fat yield, productive life, SCS, and daughter pregnancy rate were all too high to pass current validation standards. For a few traits, the GPTA1 regression fell outside the 2-SE confidence interval of the expected regression, but the GPTA2 regression fell within it. This occurred for Jerseys and Brown Swiss. Brown Swiss passed using GPTA2 but not GPTA1 for fat yield, protein yield, and rear legs (rear view) (Table 3). With GPTA1, the regression coefficients were 0.53 for fat yield and 0.54 for protein yield; with GPTA2, they rose to 1.02 and 1.05, respectively. Jerseys passed using GPTA2 but not GPTA1 for SCS, final score, dairy form, foot angle, and rear udder height. Both Jerseys and Brown Swiss had fewer training and validation animals. Holsteins had no trait that passed using GPTA2 but not GPTA1. Instead, the opposite occurred: 7 type traits passed using GPTA1 but were too high with GPTA2. It was somewhat surprising that the same genotyped population would pass one form of validation but not the other.
Several countries use method 2 (GPTA2), or a version of it, for genomic validation studies (Lund et al., 2010). Results in that study varied by country. Their PA was a pedigree index, which also varied by country depending on the data available. In most cases, the pedigree index did not account for dam information. Genomic information produced high accuracy in that study.
Another concern with using GPTA2 for validation is the comparison with traditional methods. PA is typically used for comparison and to assess the gains from genomic evaluation, and when GPTA2 is used, comparing with PA2 would seem logical. However, USDA genetic evaluations include the dam's information, which is correlated with the son's later data. This causes rather large inflation of PA2 predictive ability, as evident in Table 4. The gains in accuracy were therefore not inflated by using GPTA2, because PA2 exceeded PA1 by more than GPTA2 exceeded GPTA1. The average difference in squared correlations between PA1 and PA2 was −0.16, −0.15, and −0.09 for Holsteins, Jerseys, and Brown Swiss, respectively. In contrast, the average difference between GPTA1 and GPTA2 was −0.14, −0.10, and −0.09. Method 2 could therefore lead to lower reported gains in accuracy from genomic information.
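The Holstein averages in the preceding paragraph can be re-derived from the tables. The sketch below uses squared correlations transcribed from Tables 1 and 4 and stored ×100 as integers. It assumes the averages are simple unweighted means over all 25 Holstein traits, an assumption that reproduces the reported −0.16 and −0.14 after rounding. It also shows the consequence noted above: the average genomic gain over PA is smaller under method 2.

```python
# Holstein squared correlations (x100), in the trait order of Tables 1 and 4:
# milk, fat, protein, PL, SCS, DPR, final score, stature, strength, dairy form,
# foot angle, rear legs, body depth, rump angle, rump width, fore udder,
# rear udder height, udder depth, udder cleft, front teat placement,
# teat length, sire CE, daughter CE, sire stillbirth, daughter stillbirth.
PA1 = [19, 17, 20, 17, 14, 16, 19, 23, 16, 19, 24, 18, 16,
       17, 21, 12, 17, 10, 21, 13, 10, 21, 10, 6, 7]
PA2 = [32, 29, 33, 21, 28, 28, 35, 41, 37, 39, 45, 41, 37,
       36, 42, 34, 35, 31, 43, 34, 34, 22, 13, 10, 12]
GPTA1 = [40, 41, 37, 24, 29, 21, 34, 49, 36, 42, 35, 36, 37,
         42, 40, 37, 31, 41, 37, 38, 34, 28, 17, 7, 9]
GPTA2 = [45, 46, 43, 32, 36, 31, 51, 66, 57, 60, 56, 57, 58,
         61, 59, 57, 51, 60, 60, 58, 58, 30, 21, 10, 18]


def mean_difference(a, b):
    """Unweighted mean of (a - b) across traits, rescaled from x100."""
    if len(a) != len(b):
        raise ValueError("both lists must cover the same traits")
    return sum(x - y for x, y in zip(a, b)) / len(a) / 100


if __name__ == "__main__":
    print("PA1 - PA2:", round(mean_difference(PA1, PA2), 2))
    print("GPTA1 - GPTA2:", round(mean_difference(GPTA1, GPTA2), 2))
    gain_method1 = mean_difference(GPTA1, PA1)  # genomic gain over PA, method 1
    gain_method2 = mean_difference(GPTA2, PA2)  # genomic gain over PA, method 2
    print("gain, method 1:", round(gain_method1, 3))
    print("gain, method 2:", round(gain_method2, 3))
```

For Holsteins, this gives an average gain of about 0.168 under method 1 and about 0.156 under method 2.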
In several situations, validation using GPTA1 would be extremely difficult and would not be the ideal method. For example, when countries combine their genomic reference populations, they would have to use phenotypic data from international multiple-trait across-country evaluations (MACE), because some training bulls (in some cases, most of them) would have no daughters in the domestic country. This is problematic because many countries have changed their traditional domestic evaluations in the last 4 yr, and evaluation systems that have changed are hard to validate with MACE values from 4 yr ago. The problem could be addressed by having Interbull conduct a special traditional evaluation that applies the countries' current models to data truncated at a date 4 yr earlier. National PTA could be submitted to Interbull for this special MACE, and the MACE results could then be used for genomic validation by the member countries. This would avoid the problems associated with using GPTA2 as well as the difficulty of obtaining valid data for GPTA1.
Although genomics has focused mainly on young bulls, first-crop bulls also gain accuracy from genomic predictions. Table 5 indicates that training bulls that gained at least 10 daughters from August 2006 to April 2010 also benefited from gains in accuracy from genomic evaluations. On average, the squared correlation for the production traits increased by 0.10, 0.05, and 0.05 for Holsteins, Jerseys, and Brown Swiss, respectively. Traits that take longer to evaluate, such as productive life, also gained from genomic information, although the gains tended to be smaller; for example, the squared correlation for Holstein productive life increased from 0.60 to 0.64. These gains for first-crop bulls could not be quantified with method 2, because method 2 estimates SNP effects from the April 2010 PTA, which already include those bulls' new daughters.
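Method 1 depends on truncating data at a cutoff, whether the cutoff is applied to domestic data or to a special MACE run. The resulting split can be sketched from each bull's daughter counts at the two evaluation dates: training bulls are those proven at the cutoff, validation bulls are those proven only later, and first-crop training bulls (at least 10 new daughters, as in the Table 5 comparison) form a subset whose gains method 1 can measure. This sketch covers bulls only (the training set in this study also included proven cows), and the `Bull` record, the `split_bulls` function, and the `min_daughters` threshold are hypothetical stand-ins for the official definitions.

```python
from dataclasses import dataclass


@dataclass
class Bull:
    bull_id: str          # hypothetical identifier
    daughters_2006: int   # daughters with data at the August 2006 cutoff
    daughters_2010: int   # daughters with data by April 2010


def split_bulls(bulls, min_daughters=1, first_crop_gain=10):
    """Split genotyped bulls the way method 1 (truncated data) requires.

    training:   proven at the 2006 cutoff
    validation: unproven in 2006 but proven by 2010
    first_crop: training bulls that added >= first_crop_gain daughters by 2010
    Bulls unproven at both dates are left out. min_daughters is an assumed
    stand-in for the official definition of "proven".
    """
    training, validation, first_crop = [], [], []
    for bull in bulls:
        if bull.daughters_2006 >= min_daughters:
            training.append(bull.bull_id)
            if bull.daughters_2010 - bull.daughters_2006 >= first_crop_gain:
                first_crop.append(bull.bull_id)
        elif bull.daughters_2010 >= min_daughters:
            validation.append(bull.bull_id)
    return training, validation, first_crop


# Hypothetical bulls: (id, daughters in 2006, daughters in 2010).
bulls = [Bull("A", 50, 120), Bull("B", 30, 35), Bull("C", 0, 40), Bull("D", 0, 0)]
print(split_bulls(bulls))
```

Under method 2, the training step would use the April 2010 PTA, which already reflect the first-crop bulls' new daughters, so no comparable held-out group of first-crop bulls remains.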

## Conclusions

Genomic validation methods should not use overlapping data. Both methods showed that genomic evaluations improved predictive ability over the PA for most traits. Using actual truncated data is advised because it most closely resembles the point at which selection decisions are made in practice. A validation approach that does not penalize countries with large validation populations is needed.

## References

Amer, P. R., and G. Banos. 2010. Implications of avoiding overlap between training and testing data sets when evaluating genomic predictions of genetic merit. J. Dairy Sci. 93:3320–3330.

Guo, G., M. S. Lund, Y. Zhang, and G. Su. 2010. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables. J. Anim. Breed. Genet. 127:423–432.

Interbull. 2008. Description of National Genetic Evaluation Systems for dairy cattle traits as applied in different Interbull member countries. Accessed June 2010. http://www-interbull.slu.se/national_ges_info2/framesida-ges.htm.

Interbull. 2010. Interbull validation test for Genomic evaluations—GEBV test. Accessed August 10, 2010. http://www.interbull.org/images/stories/GEBV_validationtest_June2010.pdf.

Luan, T., J. A. Woolliams, S. Lien, M. Kent, M. Svendsen, and T. H. E. Meuwissen. 2009. The accuracy of genomic selection in Norwegian Red cattle assessed by cross-validation. Genetics 183:1119–1126.

Lund, M. S., A. P. W. de Roos, A. G. de Vries, T. Druet, V. Ducrocq, S. Fritz, F. Guillaume, B. Guldbrandtsen, Z. Liu, R. Reents, C. Schrooten, M. Seefried, and G. Su. 2010. Improving genomic prediction by EuroGenomics collaboration. Commun. No. 880 in Proc. 9th World Congr. Genet. Appl. Livest. Prod., Leipzig, Germany. Gesellschaft für Tierzuchtwissenschaften e.V., Gießen, Germany.

Mäntysaari, E., Z. Liu, and P. VanRaden. 2010. Interbull validation test for genomic evaluations. Interbull Bull. 41:5 pages. Accessed March 3, 2011. http://www.interbull.org/images/stories/Mntysaari.pdf.

Norman, H. D., J. R. Wright, R. L. Powell, F. Miglior, and G. de Jong. 2007. Consistency of maturity rate for milk yield across countries and generations. J. Dairy Sci. 90:3937–3944.

Efficient methods to compute genomic predictions. 2008. J. Dairy Sci. 91:4414–4423.

VanRaden, P. M., J. B. Cole, M. E. Tooker, and T. A. Cooper. 2009a. Genetic base changes for January 2010. AIPL Res. Rep. BASE2 (8-09). Accessed September 5, 2010. http://aipl.arsusda.gov/reference/base2010.htm.

Van Tassell, C. P., G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. S. Schenkel. 2009. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16–24.

• Wiggans G.R.
• Cooper T.A.