## ABSTRACT

The variance of gametic diversity (
${\sigma}_{\text{gamete}}^{2}$) can be used to find individuals that more likely produce progeny with extreme breeding values. The aim of this study was to obtain this variance for individuals from routine genomic evaluations, and to apply gametic variance in a selection criterion in conjunction with breeding values to improve genetic progress. An analytical approach was developed to estimate
${\sigma}_{\text{gamete}}^{2}$ by the sum of binomial variances of all individual quantitative trait loci across the genome. Simulation was used to verify the predictability of this variance in a range of scenarios. The accuracy of prediction ranged from 0.49 to 0.85, depending on the scenario and model used. Compared with sequence data, SNP data are sufficient for estimating
${\sigma}_{\text{gamete}}^{2}$ Results also suggested that markers with low minor allele frequency and the covariance between markers should be included in the estimation. To incorporate
${\sigma}_{\text{gamete}}^{2}$ into selective breeding programs, we proposed a new index, relative predicted transmitting ability, which better utilizes the genetic potential of individuals than traditional predicted transmitting ability. Simulation with a small genome showed an additional genetic gain of up to 16% in 10 generations, depending on the number of quantitative trait loci and selection intensity. Finally, we applied
${\sigma}_{\text{gamete}}^{2}$ to the US genomic evaluations for Holstein and Jersey cattle. As expected, the

*DGAT1*gene had a strong effect on the estimation of ${\sigma}_{\text{gamete}}^{2}$ for several production traits. However, inbreeding had a small impact on gametic variability, with greater effect for more polygenic traits. In conclusion, gametic variance, a potentially important parameter for selection programs, can be easily computed and is useful for improving genetic progress and controlling genetic diversity.## Key words

## INTRODUCTION

Since the introduction of marker-assisted selection and genomic selection, technological improvements have resulted in widespread incorporation of molecular information into genetic evaluations (

Nejati-Javaremi et al., 1997

; Meuwissen et al., 2001

; Schaeffer, 2006

). Increased prediction accuracy, along with reduced generation intervals, has made genomic selection an important tool for achieving fast progress in dairy selection programs (García-Ruiz et al., 2016

). Despite concerns about inbreeding in selection and mating designs, most selection programs only consider breeding values when making selection decisions. Even with genomic selection models, genomic breeding value or PTA and evaluation of future progeny are mostly based on expected breeding values without consideration of the variability of those values due to Mendelian sampling.In addition to breeding value or PTA, other selection strategies have been proposed to increase the rate of genetic progress. One idea was to select animals that will provide greater genetic gains in the future rather than choosing the best animals in the current population.

Goiffon et al., 2017

showed improved genetic gains when selecting for the best gametes from a subset of individuals in a population. Segelke et al., 2014

discussed the potential use of the variation within groups of offspring, which allows the assignment of probabilities to obtain progeny with a breeding value over a given threshold, as well as the number of matings required. In a follow-up study, Bonk et al., 2016

showed how exact within-family genetic variation can be calculated using data from phased genotypes. Recently, Müller et al., 2018

proposed a new selection criterion based on the expected maximum haploid breeding value. Collectively, these studies suggest that the incorporation of variation of future gametic values into mating decisions can improve genetic progress on top of the selection on breeding values.However, a few questions need to be answered before the application of gametic variance to breeding programs, such as how to assess the variation of future gametic values of an individual, how large is the gametic variance, how to use this information for selection, and how to estimate the variance of gametic diversity and use it in existing genomic evaluations. In this study, we aimed to address these questions from a statistical point of view, demonstrating the equivalence between gametic variance and Mendelian sampling variance in the classical BLUP (pedigree) model. We also sought to explore how this variance can be used as a selection criterion in conjunction with breeding values, with the goal of maximizing future genetic gains. We propose an approach for estimating this variance from routine genomic evaluations, verifying the adequacy of the estimates for individuals with and without progeny, and estimating the variance of breeding values of future progeny for a given mating. Finally, we evaluate the application of gametic variance to improve the selection of dairy traits in the US Holstein and Jersey populations.

## MATERIALS AND METHODS

### Estimation of the Variance of Gametic Diversity

We refer to the variance of gametic diversity as
${\sigma}_{\text{gamete}}^{2}$, which is equivalent to half of the Mendelian sampling variance (Appendix A1).
${\sigma}_{\text{gamete}}^{2}$ measures the deviation of progeny breeding values from parent average and can be calculated using the probabilities of transmission of alleles at all QTL from an individual to its gametes. Gametic variance represents the variability of all possible gametic values generated by the permutation and recombination of each parental chromosome. In fact, only the heterozygous loci of an individual contribute to
${\sigma}_{\text{gamete}}^{2}$, so we only consider heterozygous loci in the following text.

Let's first consider one locus. For a biallelic locus

where

*j*of an individual*i*with allele substitution effect*αj*, ${\sigma}_{\text{gamete}}^{2}$ can be calculated from a binomial variance of ${\sigma}_{\left[j\right]}^{2}=np\left(1-p\right){\alpha}_{j}^{2}$, where the probability of transmission of a reference allele to a gamete*p*= 0.5 and the number of alleles transmitted to a gamete*n*= 1. When 2 loci,*j*and*k*, are considered for an individual*i*, the resulting variance can be obtained as${\sigma}_{\left[j+k\right]}^{2}={\sigma}_{\left[j\right]}^{2}+{\sigma}_{\left[k\right]}^{2}+2{\sigma}_{jk}$

$\text{and}{\sigma}_{jk}=\left({p}_{jk}-{p}_{j}{p}_{k}\right){\alpha}_{j}{\alpha}_{k},$

1

where

*pj*=*pk*= 0.5, and*pjk*is the probability that the 2 reference alleles of the 2 loci are transmitted together;*pjk*can be obtained from the linkage phase and recombination rate between the 2 loci. For example,*pjk*= 0.25 and σ_{jk}= 0 when the loci are in linkage equilibrium;*pjk*= 0.5 and σ_{jk}= 0.25α_{j}α_{k}when the 2 reference alleles are on the same chromosome and the loci are in complete linkage.Extending this calculation from 2 loci to all QTL on the genome, the
${\sigma}_{\text{gamete}}^{2}$ of individual

This can be represented in matrix format as follows:

where α

where

Instead of using genetic distances,

*i*can be obtained as the sum across all*N*heterozygous QTL:${\sigma}_{\text{gamete}}^{2}=\sum _{j=1}^{N}{\sigma}_{\left[j\right]}^{2}+2\sum _{j=1}^{N}\sum _{k=j+1}^{N}{\sigma}_{jk}.$

This can be represented in matrix format as follows:

${\sigma}_{\text{gamete}}^{2}=\left[{\alpha}_{1}\cdots {\alpha}_{N}\right]M\left[{\alpha}_{1}\cdots {\alpha}_{N}\right]\prime ,$

2

where α

_{j}(*j*= 1,…,*N*) are the allele substitution effects, and**M**is the (co)variance matrix of the Mendelian transmission probabilities for the*N*heterozygous loci:$\text{M}=\left[\begin{array}{ccc}\text{0.25}& \cdots & a{l}_{1,N}\left(-\frac{c{M}_{1,N}}{200}+0.25\right)\\ \vdots & \ddots & \vdots \\ a{l}_{N,1}\left(-\frac{c{M}_{N,1}}{200}+0.25\right)& \dots & 0.25\end{array}\right],$

where

*aljk*is a phase indicator for loci*j*and*k*, with value 1 when both loci have the reference allele on the same chromosome and −1 otherwise;*cMjk*is the genetic distance between the 2 loci (in centimorgans). Any 2 loci with genetic distance >50 cM on the same chromosome, or on different chromosomes, are assumed to be independent and thus have zero values for the corresponding elements of**M**. When all the loci are independent,$\text{M}=\left[\begin{array}{ccc}\text{0.25}& 0& 0\\ 0& \ddots & 0\\ 0& 0& 0.25\end{array}\right].$

Instead of using genetic distances,

**M**can be set up when direct recombination rates are available.To estimate gametic variance in real data where genomic evaluation is available, we proposed to use the estimated SNP effects to replace true QTL effects in Equation [2]. This approximation of QTL with SNP marker effects is similar to that described by

Bonk et al., 2016

. Note that using estimated SNP effects in [2] may bias the estimation due to the covariance between estimated effects of SNP in linkage disequilibrium (LD) and potential biases from shrunken estimators of SNP effects, which warrants further investigation.### Application of Gametic Variance in Selection Programs

A new selection strategy using
${\sigma}_{\text{gamete}}^{2}$ can be proposed, focusing on the future genetic progress (

Bijma et al., 2018

). When a small proportion of animals are selected for breeding,
${\sigma}_{\text{gamete}}^{2}$ can help identify those that are most likely to produce progeny with extreme breeding values. Assuming selection intensity is maintained across generations, the average genetic value of the animals selected in the future will be related to the variance of gametes of the selected animals in the current generation. The average breeding value transmitted to future progeny can be calculated by summing the expected value and *i*times the standard deviation of gametic diversity $\left(i{\sigma}_{\text{gamete}}\right)$. The selection intensity (*i*) represents the number of standard deviations between the population average and the average of selected individuals. The same intensity can be applied when using PTA as the expected value and ${\sigma}_{\text{gamete}}^{2}$ as standard deviation, to obtain the mean breeding value transmitted to the selected individuals in the next generation. Similar approaches have been proposed byLehermeier et al., 2017

via a usefulness criterion (**UC**) with genomic EBV (**GEBV**) and the standard deviation of a given mating.Here, we propose a new selection criterion relative to the intensity of selection applied in the next generation (

where

*if*) for an individual*i*(unknowing mating),$RPT{A}_{i}=PT{A}_{i}+{\sigma}_{gamete\_i}\times {i}_{f},$

[3]

where

*RPTAi*(relative PTA) is the average of the genetic values relative to the group of progeny that will be selected in the future (see Appendix A2). In addition, we introduce a new concept of coefficient of relative variation (**CRV**) as a measure of the variability of the additive genetic values (*u*) transmitted from an individual to its progeny (Appendix A3). The CRV of an individual*i*is defined as follows (where*E*indicates expected value):$CR{V}_{i}=\frac{{\sigma}_{gamete}}{0.5\sqrt{E\left({u}_{i}^{2}\right)}}.$

[4]

### Simulation

To verify the estimation of
${\sigma}_{\text{gamete}}^{2}$ by genomic models and the use of this new parameter to aid selection, we simulated different scenarios with various QTL, genotype, and phenotype data using the QMSim version 1.10 software (

Sargolzaei and Schenkel, 2009

). In brief, we simulated a historical population, a 10-generation recent population, and a 10-generation future population (Table 1).Table 1Summary of simulation parameters

Parameter | Value |
---|---|

Genome parameter | |

Genome size | 200 cM |

Number of chromosomes | 4 |

Number of QTL | 20 and 200 |

Number of markers | 10,000 (high-density panel) and 20,000+ QTL (sequence data) |

Mutation rate, QTL | 2.5 × 10^{−5} |

Mutation rate, marker | 2.5 × 10^{−3} |

Marker positions in genome | Evenly spaced |

QTL position in genome | Random (uniform distribution) |

QTL allele effect | Gamma distribution (β = 0.4) |

Trait parameters | |

Number of traits | 6 |

Heritability | 0.10, 0.30, 0.50 |

Phenotypic variance | 1 |

Sex-limited trait | No |

Population structure parameters | |

Historical generation | |

Phase 1 | |

Number of generations | 500 |

Number of animals | Constant (500 males and 500 females) |

Mating | Random |

Phase 2 | |

Number of generations | 500 |

Initial number of animals | 1,000 |

Final number of animals | 200 (100 males and 100 females) |

Mating | Random |

Phase 3 | |

Number of generations | 10 |

Initial number | 200 (100 males and 100 females) |

Final number | 3,000 (1,500 males and 1,500 females) |

Mating | Random |

Recent generation | |

Number of generations | 10 |

Reference population | 9th |

Validation population | 9th and 10th |

Number of offspring per dam | 5 |

Founders | 1,000 (200 males 800 females) |

Mating | Random |

Selection | BLUP |

Cutting | BLUP |

Replacement | 20% females and 60% males |

Overlapping generation | Yes |

Generation 9–10 (predictability) | Correlation$\left({\sigma}_{\text{gamete},}^{2}/{\sigma}_{\text{gamete}\_\text{estimated}}^{2}\right)$ |

Future generation | |

Number of generations | 10 |

Criterion of selection | T_PTA = TRUE/2 or T_RPTA (TRUE/2) +${\sigma}_{\text{gamete}}^{2}$ |

Number of offspring per dam | 5 or 10 |

Replacement | 100% females and 100% males |

Better criterion | Genetic gain per generation |

1 T_PTA = true PTA; T_RPTA = true relative PTA;
${\sigma}_{\text{gamete}}^{2}$ = variance of gametic diversity.

To mimic real populations, a historical population was simulated with the same proportion of males and females that were mated randomly. This population was generated in 3 phases: the first phase consisted of 500 generations with a constant population size of 1,000 individuals; the second phase had 500 generations with a constant reduction of population size from 1,000 to 200 to generate LD and establish drift-mutation balance; and the third phase included 10 generations of expansion, where the population size increased from 200 to 3,000. From the last generation of this historical population, 200 males and 800 females were randomly selected as founders to generate the study population, which consisted of 10 generations with 5 progeny per dam and a ratio of 50% males in the offspring. We simulated a selection for breeding values estimated by the classical BLUP (

Henderson, 1975

). The replacement ratio was set at 20% for dams and 60% for sires (Brito et al., 2011

), and mating was random among selected individuals. The replacement ratio is the proportion of animals to be culled and replaced in each generation.From the study population (last 10 generations of the simulation), genotype and QTL data were obtained for the 9th generation (treated as a reference population) and the 10th generation (the validation population). The marker effects were first estimated in the reference generation. The
${\sigma}_{\text{gamete}}^{2}$ values for all individuals were estimated for both the reference and validation populations using the marker effects estimated in the reference generation. For comparison, true gametic variance was also calculated using the QTL effects and their genotype data from the simulation.

To reduce computational load, a small genome, with 4 autosomal chromosomes of 50 cM each, was simulated. The mutation rate was fixed at 2.5 × 10

^{−5}in the historical population. The number of crossovers was sampled from a Poisson distribution. A total of 200,000 markers and different sets of QTL were simulated to be randomly distributed along the genome. After the genome was simulated, a panel with 10% of the polymorphic markers was sampled every 0.5 cM and another panel with 20% of the markers was sampled every 0.5 cM. The first panel was chosen to mimic a high-density SNP panel and the second for sequence data. A detailed description of the parameters is reported in Table 1.Six traits were simulated with heritabilities of 0.1, 0.3, and 0.5 and 20 QTL (i.e., 0.1 QTL per cM) or 200 QTL (i.e., 1 QTL per cM), respectively. We used 2 QTL densities similar to those used by

Meuwissen et al., 2001

. The QTL effects were generated based on a gamma distribution with parameter β = 0.4 (Hayes and Goddard, 2001

). The phenotypic variance was assumed to be 1 for all traits. Four replicates were used for each trait. In addition, 10 future generations were simulated where the individuals were selected either by the true breeding value (T_PTA) or by true RPTA (T_RPTA) to verify and compare the genetic gains obtained using these criteria. To assess the effect of these indices on selection in the future generations, the replacement ratio was maintained at 100% and the number of offspring per dam was 5 (corresponding to a selection intensity of 0.996 for females and 1.76 for males) or 10 (corresponding selection intensities of 1.4 for females and 2.06 for males). As the predicted
${\sigma}_{\text{gamete}}^{2}$ is a latent variance, its realized value depends on the number of progeny of an individual. Any inference using this variance should be regarded as a bet (probability of an event considering the number of attempts). Therefore, the selection intensity applied to RPTA (*if*) may need to be adjusted accordingly, and 3 values of*if*(0.5, 0.8, and 1) were tested in this study.### Genomic Analysis

Because
${\sigma}_{\text{gamete}}^{2}$ depends on the marker effects in genomic models, we used a model that assumed homogeneity of variance of marker effects,

**GBLUP**(SNP-BLUP), and another model that allowed heterogeneity of marker effects with differential shrinkage through the improved Bayesian LASSO (**BLASSO**;Legarra et al., 2011

). The analyses were performed using the GS3 v.3 software (Legarra et al., 2015

). The model included the population mean, marker effects, and residual. Only markers with minor allele frequency (**MAF**) >0.05 were considered. For estimation of additive and residual variances, the simulated true values were used as initial values to reduce computational complexity, followed by 20,000 iterations with the burn-in of 2,000 initial chains.### Application of Gametic Variance to Real Data

The data used were part of the 2017 US genomic evaluations from the Council on Dairy Cattle Breeding (CDCB, Bowie, MD), consisting of 1,364,278 Holstein and 164,278 Jersey cattle from the national dairy cattle database. Five dairy traits based on up to 5 lactations were analyzed: milk (

when the recombination rate is <0.5; and

**MY**), fat (**FY**) and protein (**PY**) yields, and fat (**F%**) and protein (**P%**) percentages. The genotype data were generated from different SNP arrays with the number of SNP ranging from 7K to 50K. All individuals were imputed to a common panel of 60,671 SNP and their linkage phase were determined by FindHap version 3 (VanRaden et al., 2011

). The
${\sigma}_{\text{gamete}}^{2}$ was calculated using Equation [2] with estimated SNP effects (
${\stackrel{\u02c6}{\alpha}}_{1}$). The marker effects were derived from the PTA obtained from the genomic evaluation. Sex-specific recombination rates between SNP markers in Holstein and Jersey cattle were directly used in this study (Ma et al., 2015

; Shen et al., 2018

). Thus, a modification to the off-diagonal elements of the **M**matrix in Equation [2] was applied to incorporate recombination rate${\text{M}}_{jk}=a{l}_{jk}\left(-\frac{rat{e}_{jk}}{2}+0.25\right),$

when the recombination rate is <0.5; and

**M***jk*= 0 when the rate ≥0.5.## RESULTS AND DISCUSSION

### Estimation of Gametic Variance with Genomic Models

The variance of progeny breeding values has been investigated in previous studies (

Cole and VanRaden, 2011

; Segelke et al., 2014

; Bonk et al., 2016

). Here, we sought to use simulation to evaluate the predictability of gametic variance as a parameter for selection. To evaluate the predictability, a comparison with classical simulation studies with genomic prediction was adopted. The variance of gametic diversity (
${\sigma}_{\text{gamete}}^{2}$) was calculated considering both dependence and independence between loci, using all QTL and QTL with MAF ≥5%, and utilizing high-density SNP and sequence data with marker effects obtained from genomic models. The Pearson correlation between the true and estimated
${\sigma}_{\text{gamete}}^{2}$ ranged from medium to high (Table 2), similar to those studies on breeding values (Meuwissen et al., 2001

; Daetwyler et al., 2010

; Clark et al., 2011

). In general, the correlation increased when the heritability (*h*^{2}) of traits increased, whereas the same relation was not apparent when the number of QTL was large. Differently, for the GEBV prediction, the increase in accuracy has been reported with increased*h*^{2}and for scenarios with a small number of QTL, particularly when these were estimated by differential shrinkage models (Daetwyler et al., 2010

; Clark et al., 2011

).Table 2Pearson correlations between variance of gametic diversity for all QTL
$\left({\sigma}_{g}^{2}\right)$ for QTL with minor allele frequency (MAF) ≥0.05
$\left({\sigma}_{\mathrm{gm}}^{2}\right)$ and disregarding the covariances for all QTL
$\left({\sigma}_{d}^{2}\right)$ and QTL with MAF ≥0.05
$\left({\sigma}_{\mathrm{dm}}^{2}\right)$ and their estimations using a high-density marker panel and sequence data by genomic BLUP (

*bp*) and Bayesian LASSO (*ls*), considering $\left({\sigma}_{gbp}^{2}\text{and}{\sigma}_{gls}^{2}\right)$ and disregarding $\left({\sigma}_{dbp}^{2}\text{and}{\sigma}_{dls}^{2}\right)$ the dependency between the markersTrait | Gametic variance | High-density SNP | Sequence data | QTL data | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

h^{2} | QTL (no.) | ${\sigma}_{gbp}^{2}$ | ${\sigma}_{gls}^{2}$ | ${\sigma}_{dpb}^{2}$ | ${\sigma}_{dls}^{2}$ | ${\sigma}_{gbp}^{2}$ | ${\sigma}_{gls}^{2}$ | ${\sigma}_{dpb}^{2}$ | ${\sigma}_{dls}^{2}$ | ${\sigma}_{g}^{2}$ | ${\sigma}_{gm}^{2}$ | ${\sigma}_{d}^{2}$ | ${\sigma}_{dm}^{2}$ | |

0.1 | 20 | ${\sigma}_{g}^{2}$ | 0.49 | 0.56 | 0.17 | 0.39 | 0.46 | 0.57 | 0.20 | 0.40 | — | 0.75 | 0.96 | 0.69 |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.53 | 0.74 | 0.21 | 0.54 | 0.48 | 0.75 | 0.25 | 0.55 | 0.75 | — | 0.66 | 0.93 | ||

${\sigma}_{d}^{2}$ | 0.45 | 0.53 | 0.15 | 0.43 | 0.43 | 0.53 | 0.19 | 0.43 | 0.96 | 0.66 | — | 0.71 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.50 | 0.74 | 0.18 | 0.61 | 0.45 | 0.73 | 0.24 | 0.61 | 0.69 | 0.93 | 0.71 | — | ||

200 | ${\sigma}_{g}^{2}$ | 0.50 | 0.60 | 0.29 | 0.37 | 0.46 | 0.61 | 0.29 | 0.40 | — | 0.96 | 0.50 | 0.48 | |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.48 | 0.61 | 0.29 | 0.39 | 0.45 | 0.63 | 0.30 | 0.41 | 0.96 | — | 0.46 | 0.49 | ||

${\sigma}_{d}^{2}$ | 0.29 | 0.28 | 0.51 | 0.30 | 0.28 | 0.27 | 0.48 | 0.31 | 0.50 | 0.46 | — | 0.97 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.27 | 0.29 | 0.52 | 0.32 | 0.26 | 0.29 | 0.49 | 0.33 | 0.48 | 0.49 | 0.97 | — | ||

0.3 | 20 | ${\sigma}_{g}^{2}$ | 0.64 | 0.83 | 0.28 | 0.66 | 0.59 | 0.83 | 0.07 | 0.65 | — | 0.94 | 0.95 | 0.90 |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.65 | 0.87 | 0.28 | 0.68 | 0.59 | 0.87 | 0.07 | 0.68 | 0.94 | — | 0.90 | 0.95 | ||

${\sigma}_{d}^{2}$ | 0.60 | 0.81 | 0.30 | 0.69 | 0.54 | 0.81 | 0.07 | 0.68 | 0.95 | 0.90 | — | 0.95 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.60 | 0.85 | 0.30 | 0.71 | 0.55 | 0.85 | 0.07 | 0.70 | 0.90 | 0.95 | 0.95 | — | ||

200 | ${\sigma}_{g}^{2}$ | 0.63 | 0.77 | 0.25 | 0.49 | 0.59 | 0.77 | 0.29 | 0.48 | — | 0.95 | 0.55 | 0.52 | |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.62 | 0.78 | 0.25 | 0.51 | 0.57 | 0.78 | 0.29 | 0.49 | 0.95 | — | 0.53 | 0.53 | ||

${\sigma}_{d}^{2}$ | 0.42 | 0.48 | 0.52 | 0.63 | 0.40 | 0.49 | 0.54 | 0.62 | 0.55 | 0.53 | — | 0.99 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.41 | 0.48 | 0.52 | 0.63 | 0.39 | 0.48 | 0.54 | 0.63 | 0.52 | 0.53 | 0.99 | — | ||

0.5 | 20 | ${\sigma}_{g}^{2}$ | 0.54 | 0.67 | 0.28 | 0.50 | 0.48 | 0.66 | 0.18 | 0.49 | — | 0.86 | 0.94 | 0.81 |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.51 | 0.67 | 0.26 | 0.47 | 0.44 | 0.65 | 0.16 | 0.46 | 0.86 | — | 0.79 | 0.93 | ||

${\sigma}_{d}^{2}$ | 0.52 | 0.64 | 0.30 | 0.53 | 0.46 | 0.63 | 0.19 | 0.51 | 0.94 | 0.79 | — | 0.85 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.49 | 0.64 | 0.28 | 0.51 | 0.43 | 0.63 | 0.18 | 0.49 | 0.81 | 0.93 | 0.85 | — | ||

200 | ${\sigma}_{g}^{2}$ | 0.79 | 0.85 | 0.37 | 0.51 | 0.76 | 0.84 | 0.29 | 0.51 | — | 0.95 | 0.65 | 0.62 | |

${\sigma}_{\mathrm{gm}}^{2}$ | 0.77 | 0.86 | 0.37 | 0.55 | 0.74 | 0.86 | 0.30 | 0.55 | 0.95 | — | 0.64 | 0.65 | ||

${\sigma}_{d}^{2}$ | 0.53 | 0.61 | 0.49 | 0.83 | 0.52 | 0.61 | 0.38 | 0.83 | 0.65 | 0.64 | — | 0.98 | ||

${\sigma}_{\mathrm{dm}}^{2}$ | 0.51 | 0.61 | 0.49 | 0.85 | 0.50 | 0.61 | 0.37 | 0.85 | 0.62 | 0.65 | 0.98 | — |

1 Values in bold represent the best estimates.

We observed higher correlations between the true and predicted
${\sigma}_{\text{gamete}}^{2}$ using BLASSO compared with GBLUP in all scenarios (Table 2). These results were partly due to the small genome and large QTL effects simulated. Although GBLUP can have a similar or slightly better performance for prediction of GEBV than differential shrinkage models for scenarios with a large number of QTL (

Daetwyler et al., 2010

), the accuracy of the estimated marker effects, mainly for QTL regions, is greater from differential shrinkage models (Meuwissen et al., 2001

; Shepherd et al., 2010

; Legarra et al., 2011

). For estimating
${\sigma}_{\text{gamete}}^{2}$ the marker effect has a greater impact than for GEBV prediction because
${\sigma}_{\text{gamete}}^{2}$ uses the squared marker effects as well as the dependency of the chromosome segments. Therefore, this observation can also be attributed to the greater accuracy of the marker effects estimated by BLASSO and to the high dependency of the chromosome segments simulated.The effect on prediction was inferred by a linear regression between true and estimated
${\sigma}_{\text{gamete}}^{2}$ For the intercept of regression (a), GBLUP had a lower scale effect (close to zero) than BLASSO but the difference was not large (Table 3). A low scale effect is important for
${\sigma}_{\text{gamete}}^{2}$ prediction because it affects the precision of the limit values of the confidence interval for future progeny PTA. The scale effect may be affected by the prediction models and by factors inherent to the trait. However, GBLUP had a larger prediction bias, worse values of mean squared error, and regression coefficients (b) more different from 1 (Table 3). For genomic prediction, lower bias has also been reported for differential shrinkage models (

Meuwissen et al., 2001

). Our result can be attributed to the accuracy of the estimated marker effects and to the small number of independent chromosome segments simulated.Table 3Mean squared prediction (MSE), intercept (a), and coefficient (b) of the linear regression between the variance of gametic diversity for QTL and its estimates using a high-density SNP panel and sequence data by genomic models

Trait | Model | High-density SNP | Sequence data | |||||
---|---|---|---|---|---|---|---|---|

h | QTL (no.) | MSE | a | b | MSE | a | b | |

0.1 | 20 | GBLUP | 0.0014 | −0.0010 | 0.27 | 0.0022 | −0.00033 | 0.20 |

BLASSO | 8e-05 | 0.0027 | 1.20 | 8e-05 | 0.00185 | 1.26 | ||

200 | GBLUP | 0.0010 | 0.0058 | 0.23 | 0.0016 | 0.00637 | 0.18 | |

BLASSO | 0.0001 | 0.0074 | 1.01 | 0.0001 | 0.00681 | 1.03 | ||

0.3 | 20 | GBLUP | 0.0017 | −0.00697 | 0.43 | 0.0028 | −0.00625 | 0.35 |

BLASSO | 0.0002 | 0.00282 | 1.46 | 0.0002 | 0.00247 | 1.41 | ||

200 | GBLUP | 0.0021 | 0.00979 | 0.40 | 0.0035 | 0.01123 | 0.33 | |

BLASSO | 0.0004 | 0.00945 | 1.14 | 0.0004 | 0.00950 | 1.13 | ||

0.5 | 20 | GBLUP | 0.0019 | −0.00294 | 0.26 | 0.0030 | −0.002039 | 0.19 |

BLASSO | 0.0001 | 0.00188 | 1.41 | 0.0001 | 0.001866 | 1.37 | ||

200 | GBLUP | 0.0022 | 0.00560 | 0.62 | 0.0033 | 0.006547 | 0.56 | |

BLASSO | 0.0008 | 0.00851 | 1.10 | 0.0007 | 0.008799 | 1.09 |

1 Values in bold represent the least-biased estimates.

2 GBLUP = genomic BLUP; BLASSO = Bayesian LASSO.

For a trait with

*h*^{2}= 0.10 and 20 QTL (Table 2), the correlations between ${\sigma}_{\text{gamete}}^{2}$ obtained with all QTL and with QTL of MAF ≥5% were of moderate to high magnitude, lower than that of other traits (high magnitude), resulting in lower correlations with the ${\sigma}_{\text{gamete}}^{2}$ estimated by genomic models. Although this result may be due to allele frequency fluctuations in historical population, it also implies that QTL with low MAF are important for obtaining accurate estimates of ${\sigma}_{\text{gamete}}^{2}$. This variance does not depend directly on population allele frequencies but on the individual's heterozygote status. Although MAF filtering (≥5%) can be used to improve the prediction of GEBV (Uemoto et al., 2015

), markers with low MAF may have greater linkage disequilibrium with QTL with low MAF, providing better predictions of gametic variance.To facilitate rapid calculation of
${\sigma}_{\text{gamete}}^{2}$ we tested some scenarios without considering the covariance (dependence) between markers. However, the correlation between true and estimated
${\sigma}_{\text{gamete}}^{2}$ was always lower compared with the full model, with the difference ranging from moderate to high when the estimates were obtained from QTL, and from low to high when obtained from the marker effects (Table 2). However, the high correlation observed for one of the scenarios (

*h*^{2}= 0.30 and QTL = 20) can be attributed to the random distribution of QTL in the genome. Therefore, covariance between markers should always be included for calculating ${\sigma}_{\text{gamete}}^{2}$ and thus, be preferred over the traditional Mendelian sampling variance (Appendix A1). This result is consistent withBonk et al., 2016

, who recommended the use of haplotype and direct recombination data (Cole and VanRaden, 2011

).No difference in correlation between true and estimated
${\sigma}_{\text{gamete}}^{2}$ from BLASSO was observed between the high-density SNP and sequence data scenarios (Table 2), indicating that SNP panels with moderate densities are sufficient for estimating
${\sigma}_{\text{gamete}}^{2}$ However, a decrease in correlation was observed for estimates obtained with GBLUP when the sequence data panel was used, regardless of the number of simulated QTL. For GEBV prediction,

Clark et al., 2011

observed a small difference in performance using differential shrinkage with sequence data compared with medium-density SNP panels. Pérez-Enciso et al., 2017

also reported a modest increase in accuracy using differential shrinkage model on sequence data. Therefore, sequence data are unlikely to offset SNP panels for predicting GEBV when the number of loci is large and the prior given to each SNP is uniform. Although no improvement in accuracy for
${\sigma}_{\text{gamete}}^{2}$ was observed with an increased number of markers, the difference in performance between the 2 types of methods was in line with the literature on GEBV studies. This fact, together with the increase in overestimation due to an increased number of markers (Table 3), confirms the preference of shrinkage models for estimating
${\sigma}_{\text{gamete}}^{2}$ in our simulation of small genome and relative large QTL effects.The correlation between true and predicted CRV was lower than that of
${\sigma}_{\text{gamete}}^{2}$ (Supplemental Table S1; https://doi.org/10.3168/jds.2018-15971). There was no unanimous model, but GBLUP showed better prediction performance for many scenarios, whereas BLASSO had better results when ignoring the covariance between markers in scenarios with moderate heritability and a small number of QTL. Generally, the prediction with high-density markers showed a higher accuracy than that with sequence data. The CRV is a relative parameter that indicates how variable the GEBV of an individual is when transmitted to its gametes. The magnitude of the correlation showed that this parameter can be predicted, although the decreased accuracy with an increased number of markers indicated some difficulty for prediction in these cases.

These results may also be explained by a partition of CRV (Appendix A3). Similar results were observed for
${\sigma}_{\text{gamete}}^{2}$ and CRV in the 10th generation using the marker effects estimated from the 9th generation. It means that predictions for these parameters can follow the same design in genomic selection programs to calculate GEBV, and
${\sigma}_{\text{gamete}}^{2}$ can be estimated using the training data from previous generations (

Habier et al., 2007

).### Application of Gametic Variance in Selection Programs

The percentage of additional genetic gain (

**ΔG**) per generation in selection by using RPTA compared with PTA (ΔG_{RPTA-PTA}/ΔG_{PTA}), as well as the accumulated gain for a period of 10 generations, was used to assess the suitability of the new selection index (Figure 1 and Supplemental Figure S1; https://doi.org/10.3168/jds.2018-15971). The accumulated genetic gains obtained with RPTA were higher than those obtained with PTA when the number of QTL increased. No significant increase was observed for a small number of QTL (20). However, in scenarios with more QTL, the genetic gain was close to expected (Appendix A2), with ΔG ranging from 5 to 16% in 10 generations, indicating an advantage of RPTA for traits with large numbers of QTL. These results were in agreement with those reported byDaetwyler et al., 2015

using a genomic optimal haploid value (**OHV**) for selection. In addition, we noted that RPTA tended to increase the frequency of the best alleles in the population more quickly than PTA, which resulted in a permanent effect over generations (Supplemental Figure S1; https://doi.org/10.3168/jds.2018-15971). This trend was also observed byDaetwyler et al., 2015

using OHV, which verified the increase of genetic gain and a smaller reduction of genetic diversity compared with genomic selection. According to those authors, the selection of the individuals with the highest breeding values can lead to the loss of rare favorable alleles in the population, but individuals that carry these alleles may be more favorable in the long term, even though their GEBV can be below the truncation point. The importance of the selection for favorable minor alleles was also reported by Sun and VanRaden, 2014

using a weighted genomic selection (**WGS**) by the favorable MAF.The percentages of additional genetic gain per generation using RPTA were generally positive across scenarios (Figure 1 and Supplemental Figure S1; https://doi.org/10.3168/jds.2018-15971). Some were negative in scenarios with a few QTL but they were all positive and relatively large for scenarios with more QTL, between 3 and 8%. These results highlight the advantage of using RPTA compared with conventional PTA. Although for all scenarios we observed that the first generation under the RPTA selection obtained less genetic gain than PTA, a large increase was obtained in subsequent generations.

Daetwyler et al., 2015

reported similar trends in the first few generations of selection using OHV. Thus, the selection by RPTA initially provided an increase of variability, with a subsequent reduction by selection of the best alleles in the following generations.The effect of

*h*^{2}was not evident across the scenarios, and in general, the best RPTA performance was observed for scenarios with moderate heritability.Lehermeier et al., 2017

compared UC and OHV in 2 scenarios with low and high *h*^{2}and observed greater gain in the latter scenario. However, as the true values (PTA and RPTA) were used for selection in this study, heritability would not affect the estimation but the magnitude of genetic variances.When the number of progeny per dam increased, the genetic gain also increased for both PTA and RPTA, but the RPTA achieved a faster, and therefore greater, genetic gain (Figure 1 and Supplemental Figure S1; https://doi.org/10.3168/jds.2018-15971). These results agreed with those reported by

Daetwyler et al., 2015

, who observed a rapid increase in genetic gain using OHV when the number of progeny increased from 10 to 1,000. Lehermeier et al., 2017

also observed greater gains for higher selection intensities for UC and OHV. For the scenarios with a few QTL, there was no large gain by using RPTA when increasing selection intensity (Figure 1 and Supplemental Figure S1). For scenarios with many QTL, we observed a greater gain using RPTA in early generations than in scenarios with lower selection intensity, as predicted by the expected gain (Appendix A2). For traits with many QTL, higher values of *if*had better performance than those with lower intensities, although the trend was not observed for scenarios with a few QTL. It suggests that for scenarios with many QTL, the increase in selection intensity allows the use of higher values for*if*.One of the advantages of using RPTA is the choice of

*if*for weighting ${\sigma}_{\text{gamete}}^{2}$ in the index (Figure 1 and Supplemental Figure S1). Initially, integral values (without adjustment) for*if*were tested but gains were much smaller than those with traditional selection. The component ${i}_{f}\times {\sigma}_{\text{gamete}}^{2}$ from RPTA is stochastic, and high values for*if*can be risky when the number of progeny per individual is small.Lehermeier et al., 2017

used integral value of *if*to obtain UC; however, they simulated plant crosses with large numbers of progeny (100) to realize the predicted variance in the offspring. The increase in*if*proved to be unsuitable for scenarios with few QTL, where lowest values should be prioritized for this type of trait. However, for scenarios with many QTL, large*if*values are desired, especially when selection intensity is high. Thus, the risk related to the number of progeny should be considered and standardized equally for all individuals, preferably by increasing the minimum number of offspring.Lehermeier et al., 2017

obtained the selection intensity value in UC from the proportion of individuals selected within plant homogeneous crosses, which is equivalent to using dam selection intensity in animal breeding. Empirically, to find the optimal value for *if*, we can adjust the real value of the future selection intensity of dams (have the least number of offspring) by $1-\overline{PV}$, so the adjusted intensity value is ${i}_{f}^{*}=\left(1-\overline{PV}\right)\times {i}_{f}$ where the individual percentage of variation (*PV*) is obtained from equation [A3.1] in Appendix A3. Given the number of progeny per dam (*n*) and the average CRV of the population, we have $\overline{PV}=\frac{\left({Z}_{1-\alpha /2}\right)\times \overline{CRV}}{\sqrt{n}}$, where*Z*is the corresponding percentile value from a standard normal distribution.In this section, we verified the feasibility of using
${\sigma}_{\text{gamete}}^{2}$ in selection programs to explore the whole additive genetic potential of individuals. The proposed index (RPTA) is easy to obtain and apply. As the variance of GEBV of future progeny is
${\sigma}_{gameteSire}^{2}+{\sigma}_{gameteDam}^{2}$, the UC for a given mating can be easily obtained as RPTA

_{sire}+ RPTA_{dam}. With greater genetic gains and better preservation of genetic diversity (Supplemental Figures S2 and S3; https://doi.org/10.3168/jds.2018-15971), the RPTA can accelerate genetic selection compared with other indices that also preserve diversity, such as the OHV, WGS, optimal population value, and genotype building (Goiffon et al., 2017

). This statement is supported by the literature, because first Goiffon et al., 2017

showed a better performance of OHV than cited indices, and then Lehermeier et al., 2017

introduced the UC (similar to RPTA, but used for mating) and demonstrated its superiority to OHP within the crosses when different selection intensities are applied. Thus, RPTA can be a better option to maximize genetic gain per generation and the profitability of a breeding program. In contrast, although RPTA represents an optimal equilibrium between the expected value (PTA) and the variability, the weighted value for
${\sigma}_{gamete}^{2}\left({i}_{f}^{*}\right)$ can be modified to preserve diversity and accelerate genetic progress. Another point to consider is that although RPTA and WGS have a similar purpose, the expected component of RPTA (PTA) can still be adjusted for a greater emphasis in the selection of favorable minor alleles if desired, as suggested by Sun and VanRaden, 2014

. Besides, although the genetic gain obtained using RPTA with random mating has been impressive for traits with many QTL, greater gains can be obtained with this criterion using a sophisticated mating design (Allaire, 1980

; Sonesson and Meuwissen, 2000

; Lehermeier et al., 2017

).### Estimation and Application of Gametic Variance in Real Data

The suitability of applying
${\sigma}_{\text{gamete}}^{2}$ in livestock breeding programs was evaluated using 2 dairy populations, Holstein and Jersey, for 5 milk production traits. Because chromosomes are independent genome segments,
${\sigma}_{\text{gamete}}^{2}$ was calculated separately for each chromosome (Figure 2 and Supplemental Figure S4; https://doi.org/10.3168/jds.2018-15971). The average, standard deviation, and amplitude of the estimated
${\sigma}_{\text{gamete}}^{2}$ were largest on BTA14 for all production traits. This was expected because BTA14 contains the largest QTL for milk production,

*DGAT1*(Grisart et al., 2002

; - Grisart B.
- Coppitiers W.
- Farnir F.
- Karim L.
- Ford C.
- Berzi P.
- Cambisano N.
- Mni M.
- Reid S.
- Simon P.
- Spelman R.
- Georges M.
- Snell R.

Positional candidate cloning of a QTL in dairy cattle: Identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition.

*Genome Res.*2002; 12 (11827942): 222-231

Bouwman et al., 2011

, Bouwman et al., 2018

). However, the 5 traits had different distributions of
${\sigma}_{\text{gamete}}^{2}$ among chromosomes. The
${\sigma}_{\text{gamete}}^{2}$ for PY was more evenly distributed, but for other traits, especially for F%,
${\sigma}_{\text{gamete}}^{2}$ showed a skewed distribution with major mass concentrated on BTA14. This is due to the greater effect of - Bouwman A.C.
- Daetwyler H.D.
- Chamberlain A.J.
- Ponce C.H.
- Sargolzaei M.
- Schenkel F.S.
- Sahana G.
- Govignon-Gion A.
- Boitard S.
- Dolezal M.
- Pausch H.
- Brøndum R.F.
- Bowman P.J.
- Thomsen B.
- Guldbrandtsen B.
- Lund M.S.
- Servin B.
- Garrick D.J.
- Reecy J.
- Vilkki J.
- Bagnato A.
- Wang M.
- Hoff J.L.
- Schnabel R.D.
- Taylor J.F.
- Vinkhuyzen A.A.E.
- Panitz F.
- Bendixen C.
- Holm L.E.
- Gredler B.
- Hozé C.
- Boussaha M.
- Sanchez M.P.
- Rocha D.
- Capitan A.
- Tribout T.
- Barbat A.
- Croiseau P.
- Drögemüller C.
- Jagannathan V.
- Jagt C.V.
- Crowley J.J.
- Bieber A.
- Purfield D.C.
- Berry D.P.
- Emmerling R.
- Götz K.U.
- Frischknecht M.
- Russ I.
- Sölkner J.
- Van Tassell C.P.
- Fries R.
- Stothard P.
- Veerkamp R.F.
- Boichard D.
- Goddard M.E.
- Hayes B.J.

Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals.

*Nat. Genet.*2018; 50 (29459679): 362-367

*DGAT1*on milk fat than on milk protein (Thaller et al., 2003

). Similar results were observed in both Holstein and Jersey cattle. Although the recombination rate is different between males and females in cattle (Shen et al., 2018

), little difference was observed for
${\sigma}_{\text{gamete}}^{2}$ between the 2 sexes (Figure 2 and Supplemental Figure S4).The distribution of
${\sigma}_{\text{gamete}}^{2}$ varied in the 2 cattle populations (Figure 3 and Supplemental Figure S5; https://doi.org/10.3168/jds.2018-15971). The results showed a distribution close to the typical Gaussian curve for PY, but non-Gaussian curves for other traits. For F%, the distribution had 2 peaks. Similar results were observed for Holstein and Jersey, but the effect of BTA14 was more pronounced in Jersey, possibly because this breed has a higher milk fat percentage, as well as different composition of fatty acid content in milk (

White et al., 2001

). Segelke et al., 2014

, studying genetic variation in Holstein, observed a similar distribution for PY and FY.The predictability of the offspring variance of dairy traits for bulls was assessed using a Pearson correlation between the variance of progeny breeding values and the
${\sigma}_{\text{gamete}}^{2}$ of bulls estimated by genomic models (Table 4). This correlation increased with an increased number of offspring. The moderate to high correlation indicated the feasibility of predictions. These results were consistent with

Segelke et al., 2014

, who reported the same trend with slightly higher values for the correlation with standard deviations of gamete breeding value (**SDGBV**) in Holstein. Although the variance of progeny GEBV also contains a dam effect, our results using only sires validated the ${\sigma}_{\text{gamete}}^{2}$ estimated by genomic models as a predictor of the variance of GEBV for future offspring. In general, traits with larger coefficient of variation, such as PY, MY, and FY, had a lower correlation, whereas traits with lower coefficient of variation, F%, and P%, exhibited larger correlations. The difference in variability also explains the second trend, where traits with a biased distribution of ${\sigma}_{\text{gamete}}^{2}$ per chromosome had a better prediction than those showing even distributions.Segelke et al., 2014

also reported a larger correlation for FY than PY using SDGBV.Table 4Pearson correlations (r) of
${\sigma}_{\text{gamete}}^{2}$ for milk, protein, and fat yields (

*MY*,*PY*, and*FY*, respectively) and protein and fat percentages (*P%*and*F%*, respectively) with variances of progeny breeding values for different minimum numbers of offspring per sireBreed and minimum no. of offspring | Sires (no.) | rMY | rPY | rFY | rP% | rF% |
---|---|---|---|---|---|---|

Jersey | ||||||

10 | 1,109 | 0.24 | 0.16 | 0.20 | 0.30 | 0.58 |

50 | 451 | 0.40 | 0.33 | 0.46 | 0.50 | 0.75 |

100 | 311 | 0.53 | 0.34 | 0.47 | 0.60 | 0.85 |

200 | 183 | 0.64 | 0.31 | 0.49 | 0.77 | 0.95 |

300 | 128 | 0.68 | 0.40 | 0.55 | 0.86 | 0.96 |

400 | 97 | 0.66 | 0.43 | 0.61 | 0.90 | 0.97 |

500 | 77 | 0.66 | 0.51 | 0.62 | 0.90 | 0.97 |

600 | 66 | 0.69 | 0.54 | 0.66 | 0.92 | 0.97 |

Holstein | ||||||

10 | 6,797 | 0.29 | 0.16 | 0.32 | 0.39 | 0.57 |

50 | 2,753 | 0.55 | 0.24 | 0.60 | 0.69 | 0.85 |

100 | 1,887 | 0.66 | 0.27 | 0.67 | 0.78 | 0.91 |

200 | 1,241 | 0.71 | 0.23 | 0.70 | 0.83 | 0.93 |

300 | 903 | 0.75 | 0.26 | 0.74 | 0.85 | 0.94 |

400 | 706 | 0.77 | 0.29 | 0.77 | 0.87 | 0.95 |

500 | 569 | 0.78 | 0.30 | 0.78 | 0.89 | 0.96 |

600 | 478 | 0.78 | 0.30 | 0.77 | 0.89 | 0.95 |

1 ${\sigma}_{\text{gamete}}^{2}$ = variance of gametic diversity.

The correlations of
${\sigma}_{\text{gamete}}^{2}$ between traits were all positive, ranging from moderate to high magnitude, although most of the estimated correlations between production and content percentage traits were negative (Table 5). In general, many large correlations of
${\sigma}_{\text{gamete}}^{2}$ were observed with MY, indicating that selection using gametic variation of MY can result in the preservation of variability in other production traits as well. The magnitude of correlation for FY and PY was similar to those studies using other measures of progeny variation by

Segelke et al., 2014

and by Bonk et al., 2016

.Table 5Pearson correlations between
${\sigma}_{\text{gamete}}^{2}$ for different traits (above diagonal) and breeding values (below diagonal) and inbreeding coefficients (right side columns)

Breed and trait | MY | FY | PY | F% | P% | F_{G} | F_{P} |
---|---|---|---|---|---|---|---|

Jersey | |||||||

MY | — | 0.77 | 0.61 | 0.92 | 0.84 | −0.07 | −0.03 |

FY | 0.47 | — | 0.55 | 0.83 | 0.73 | −0.08 | −0.02 |

PY | 0.86 | 0.74 | — | 0.41 | 0.37 | −0.15 | −0.08 |

F% | −0.67 | 0.34 | −0.30 | — | 0.87 | −0.03 | −0.00 |

P% | −0.59 | 0.24 | −0.10 | 0.83 | — | −0.07 | −0.01 |

F_{G} | 0.04 | −0.03 | 0.03 | −0.07 | −0.03 | — | 0.48 |

F_{P} | 0.19 | 0.13 | 0.20 | −0.10 | −0.06 | 0.48 | — |

Holstein | |||||||

MY | — | 0.73 | 0.62 | 0.86 | 0.67 | −0.12 | −0.03 |

FY | 0.44 | — | 0.43 | 0.90 | 0.46 | −0.11 | −0.04 |

PY | 0.83 | 0.68 | — | 0.42 | 0.28 | −0.19 | −0.07 |

F% | −0.50 | 0.56 | −0.12 | — | 0.59 | −0.06 | −0.01 |

P% | −0.40 | 0.33 | 0.18 | 0.69 | — | −0.10 | −0.03 |

F_{G} | 0.24 | 0.30 | 0.31 | 0.07 | 0.10 | — | 0.51 |

F_{P} | 0.24 | 0.31 | 0.33 | 0.08 | 0.13 | 0.51 | — |

1 MY = milk yield; PY = protein yield; FY = fat yield; F% = fat percentage; P% = protein percentage; F

_{G}and F_{P}= genomic and pedigree inbreeding coefficients, respectively; ${\sigma}_{\text{gamete}}^{2}$ = variance of gametic diversity.Correlations of
${\sigma}_{\text{gamete}}^{2}$ with inbreeding coefficients, given as the diagonal of the

Wright, 1931

and genomic (VanRaden, 2008

) relationship matrices, were negative but close to zero across dairy traits, with the highest correlation observed for PY and the lowest for F% (Table 5). The results observed for PY and FY were similar to those reported by Segelke et al., 2014

using SDGBV. The negative correlation was expected because the greater the inbreeding, the lower the heterozygosity of the loci and consequently the lower the
${\sigma}_{\text{gamete}}^{2}$. However, because
${\sigma}_{\text{gamete}}^{2}$ depends on the heterozygosity of those genomic regions with QTL effects, the low correlations were not unexpected, because inbreeding coefficients are global indicators of whole-genome heterozygosity.## CONCLUSIONS

This study verified the feasibility of estimating and applying variance of gametic diversity in livestock breeding programs. The
${\sigma}_{\text{gamete}}^{2}$ can be accurately obtained from genomic models. To improve the estimation of
${\sigma}_{\text{gamete}}^{2}$ covariance between markers needs to be considered.
${\sigma}_{\text{gamete}}^{2}$ can be especially useful for traits with many QTL using a newly developed selection index, RPTA. Additionally, the confidence level of this index can be adjusted by the number of future progeny, making it suitable for dairy cattle breeding. For Holstein and Jersey cattle,

*DGAT1*had a large effect on the prediction of ${\sigma}_{\text{gamete}}^{2}$ across all production traits. Inbreeding coefficients had a small impact on gametic variability of dairy traits, with a greater effect on traits less affected by*DGAT1*. Collectively, ${\sigma}_{\text{gamete}}^{2}$ can be easily obtained and applied to existing genomic selection programs to improve genetic progress and control genetic diversity.## ACKNOWLEDGMENTS

This research was supported in part by the Agriculture and Food Research Initiative grant no. 2016-67015-24886 from the USDA National Institute of Food and Agriculture (Washington, DC) and grant no. US-4997-17 from the US-Israel Binational Agricultural Research and Development Fund. Santos was also supported by São Paulo Research Foundation (FAPESP; research scholarship no. 2017/00462-5 and no. 2015/12396-1). Cole and VanRaden were supported by appropriated project 8042-31000-002-00-D, “Improving Dairy Animals by Increasing Accuracy of Genomic Prediction, Evaluating New Traits, and Redefining Selection Goals,” of the Agricultural Research Service of the United States Department of Agriculture. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## APPENDIX A1

### Variance of Gametic Diversity in Traditional BLUP Models

The additive genetic value of an individual (

where the variance of

Here, the

Separating by components related to equality (α

Knowing that ${\sigma}_{u}^{2}=\sum _{i}^{N}2{p}_{i}{q}_{i}{\sigma}_{a}^{2}$, where ${\sigma}_{a}^{2}$ is the variance of the effect of allelic substitution considered homogeneous for all loci in BLUP,

When homogeneous variance is assumed for the allelic substitution effects across all loci that are inherited independently:

*u*) can be expressed as the sum of half the value of sire (*S*), half of the dam (*D*), and a Mendelian error (*m*). The components transmitted by the sire (*TS*) and dam (*TD*) can be represented as${T}_{s}=\left(\frac{{u}_{s}}{2}\right)+{m}_{s}$

$\text{and}{\text{T}}_{D}=\left(\frac{{u}_{D}}{2}\right)+{m}_{D},$

[A1.1]

where the variance of

*m*, the Mendelian sample variance,*var*(*m*), is obtained as inMrode, 2005

:
$var\left(m\right)=\frac{\left(1-F\right){\sigma}_{u}^{2}}{4}.$

[A1.2]

Here, the

*var*(*m*) of an EBV is obtained from the parental inbreeding coefficients (*F*) (Dempfle, 1990

). From the genomics point of view, *var*(*m*) can be obtained considering locus by locus of the parent. In the traditional BLUP model, the diagonal elements of the relationship matrix are obtained by doubling the expected variance if this diploid individual mated with itself. This variance can be divided into 2 parts for the progeny having the same alleles (α_{i}= α_{j}) for a given locus or different alleles (α_{i}≠ α_{j}) Considering that a homozygous locus has probability Pr(α_{i}= α_{j}) = 1 and a heterozygous locus Pr(α_{i}= α_{j}) = 0.5 and Pr(α_{i}≠ α_{j}) = 0.5, the total variance of a hypothetical mating will be a function of the number of homozygous loci (*NHom*) and the number of heterozygous loci (*NHet*):$2\times var\left({T}_{s}+{T}_{D}\right)=\frac{2\times \left(NHom+NHet\right)}{\sum _{i}^{N}2{p}_{i}{q}_{i}}{\sigma}_{u}^{2}.$

[A1.3]

Separating by components related to equality (α

_{i}= α_{j}) and to the difference (α_{i}≠ α_{j}) among the loci of the future gametes, we have$\begin{array}{c}2\times var\left({T}_{s}+{T}_{D}\right)=\frac{2\times \left[NHom+NHetPr\left({a}_{i}={a}_{j}\right)\right]}{\sum _{i}^{N}2{p}_{i}{q}_{i}}{\sigma}_{u}^{2}\\ +\frac{2\times NHetPr\left({a}_{i}\ne {a}_{j}\right)}{\sum _{i}^{N}2{p}_{i}{q}_{i}}\\ 2\times var\left({T}_{S}+{T}_{D}\right)=\left(1+F\right){\sigma}_{u}^{2}+\left(1-F\right){\sigma}_{u}^{2}.\end{array}$

Knowing that ${\sigma}_{u}^{2}=\sum _{i}^{N}2{p}_{i}{q}_{i}{\sigma}_{a}^{2}$, where ${\sigma}_{a}^{2}$ is the variance of the effect of allelic substitution considered homogeneous for all loci in BLUP,

*var*(*m*) is$var\left(m\right)=\frac{1-F}{4}{\sigma}_{u}^{2}=0.25\frac{2\times NHetPr\left({a}_{i}\ne {a}_{j}\right)}{\sum _{i}^{N}2{p}_{i}{q}_{i}}{\sigma}_{u}^{2}$

$var\left(m\right)=0.25\times NHet{\sigma}_{a}^{2}.$

When homogeneous variance is assumed for the allelic substitution effects across all loci that are inherited independently:

$var\left(m\right)={\sigma}_{gamete}^{2}.$

## APPENDIX A2

### Expected Genetic Gain Using the Relative PTA

The RPTA

The relative genetic gain (Δ

where ${\sigma}_{u}^{2}$ is the additive genetic variance, $var\left({\sigma}_{gamete}^{2}\right)$ is the variance of gametic diversity, and

_{i}(relative PTA) refers to the average of the genetic value relative to the group of gametes that will be selected in the future. Thus, from the key equation of genetic change, the future genetic gain considering the selection of animals can be estimated. We will differentiate between the 2 selection intensities,*ir*(recent) and*if*(future), although they are identical in most cases. The relative selection differential (SR) for the next generation is given by $\text{SR}=\text{E}\left[RBVi>\text{selected}\right]$, where the $RB{V}_{i}=2\times \left(PT{A}_{i}+{\sigma}_{i}\times {i}_{f}\right)$. The variance of $RBV=2\times \left(PT{A}_{i}+{\sigma}_{gamete}\times {i}_{f}\right)$ can be obtained as$\begin{array}{l}E\left\{{\left[RBV-E\left(RBV\right)\right]}^{2}\right\}=\hfill \\ {\sigma}_{u}^{2}+4\times {i}_{f}^{2}\times \left[E\left({\sigma}_{gamete}^{2}\right)-E{\left({\sigma}_{gamete}\right)}^{2}\right].\hfill \end{array}$

[A2.1]

The relative genetic gain (Δ

*GR*) can then be obtained as$GR=r\times {i}_{r}\times \sqrt{{\sigma}_{u}^{2}+4\times var\left({\sigma}_{gamete}^{2}\right)\times {i}_{f}^{2}\times {r}_{g}^{2},}$

[A2.2]

where ${\sigma}_{u}^{2}$ is the additive genetic variance, $var\left({\sigma}_{gamete}^{2}\right)$ is the variance of gametic diversity, and

*r*and ${r}_{g}^{2}$ are, respectively, accuracies of the genetic evaluation and the prediction of the ${\sigma}_{\text{gamete}}^{2}$.The increase in rate of genetic gain from using the relative criterion in place of the traditional criterion, disregarding the accuracy of
${\sigma}_{\text{gamete}}^{2}$, will then be

$\Delta GR-\Delta G=\sqrt{{\sigma}_{u}^{2}+4{i}_{f}^{2}var\left({\sigma}_{gamete}^{2}\right)-\sqrt{{\sigma}_{u}^{2}.}}$

[A2.3]

The expected genetic gain can be stratified using different

*if*for sex, where the term $4{i}_{f}^{2}var\left({\sigma}_{gamete}^{2}\right)$ is expanded to explicitly express male (*S*) and female (*D*) contributions:$\begin{array}{l}{i}_{f}^{2}Svar\left({\sigma}_{gameteS}^{2}\right)+{i}_{fD}^{2}var\left({\sigma}_{gameteD}^{2}\right)\hfill \\ +2\times {i}_{fS}{i}_{fD}cov\left({\sigma}_{gameteS}{\sigma}_{gameteD}\right).\hfill \end{array}$

[A2.4]

## APPENDIX A3

### Sample Size and Coefficient of Relative Variation

Given
${\sigma}_{\text{gamete}}^{2}$, it is useful to find the optimal number of progeny for an individual to realize the expected variability in its offspring. This is similar to the optimal number of daughters needed for a reliable conventional progeny test. It is difficult to obtain such an estimate using a general model, but estimating a percentage variability in real data is possible.

Van Belle and Martin, 1993

proposed an approach to obtain the number of samples from the coefficient of variation considering the margin of error as a percentage of variation. Because the gametic variation is a random component proportional to the additive genetic variance, a modification of Van Belle and Martin, 1993

by substituting the coefficient of variation for the relative variation (CRV) of the value transmitted to the progeny can be used to estimate the required sample size (*n*); CRV (equation [4]) considers the value related to the average transmission of additive variance of an individual*i*to its progeny $\left(\sqrt{E\left[{\left(0.5{u}_{i}\right)}^{2}\right]}\right)$ as $0.5\sqrt{\sum _{i}^{NHom}2{\alpha}_{i}^{2}+{\sigma}_{gamete}^{2}}$.Thus, we can use the CRV to obtain the sample size according to levels of percentage variation admitted between an estimated value from the sample and the expected value (

where Z

Van Belle and Martin, 1993

). The percentage of variation in a sample relative to the expected value (*PV*) in $0.5\sqrt{E\left[{u}_{i}^{2}\right]}$ can be represented as $P{V}_{0.5u}=\frac{0.5{T}_{1}-0.5T}{0.5\sqrt{E\left[{u}_{i}^{2}\right]}}$ where*T*_{1}represents the change (alternative) in the mean of PTA that may differ from the expected transmitted value (*T*). The smaller the change desired in PTA, the greater the number of progeny will be required and, consequently, greater variability will be observed among these progenies. The number of progeny (*n*) to be used from the percentage of variation ( $P{V}_{0.5{u}_{i}}$) as a margin of error can be represented as$n=\frac{{\left({Z}_{1-\alpha /2}\right)}^{2}\times {\left(CR{V}_{i}\right)}^{2}}{{\left(PV0.5{u}_{i}\right)}^{2}},$

[A3.1]

where Z

_{1 - α/2}is the critical value associated with the degree of confidence. For example, at significance level of 95% if we accept a percentage change in the mean of only 10% ( $\left(P{V}_{0.5{u}_{i}}=0.1\right)$) of the PTA, we have*n*= 400 (CRV_{i})^{2}, where the homozygous animals will have a lower CRV than the heterozygous animals, requiring a smaller number of offspring for progeny testing.## Supplementary Material

## REFERENCES

- Mate selection by selection index theory.
*Theor. Appl. Genet.*1980; 57 (24301147): 267-272 - Increasing genetic gain by selecting for higher Mendelian sampling variance.in: Proc. World Congr. Genet. Appl. Livest. Prod., Auckland, New Zealand. 2018: 11.47
- Mendelian sampling covariability of marker effects and genetic values.
*Genet. Sel. Evol.*2016; 48 (27107720): 36 - Genome-wide association of milk fatty acids in Dutch dairy cattle.
*BMC Genet.*2011; 12 (21569316): 43 - Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals.
*Nat. Genet.*2018; 50 (29459679): 362-367 - Accuracy of genomic selection in simulated populations mimicking the extent of linkage disequilibrium in beef cattle.
*BMC Genet.*2011; 12 (21933416): 80 - Different models of genetic variation and their effect on genomic evaluation.
*Genet. Sel. Evol.*2011; 43 (21575265): 18 - Use of haplotypes to estimate Mendelian sampling effects and selection limits.
*J. Anim. Breed. Genet.*2011; 128 (22059578): 446-455 - Selection on optimal haploid value increases genetic gain and preserves more genetic diversity relative to genomic selection.
*Genetics.*2015; 200 (26092719): 1341-1348 - The impact of genetic architecture on genome-wide evaluation methods.
*Genetics.*2010; 185 (20407128): 1021-1031 - Problems in the use of the relationship matrix in animal breeding.in: Gianola D. Hammond K. Advances in Statistical Methods for Genetic Improvement of Livestock. Vol 1. Springer, New York, NY1990: 454-473
- Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection.
*Proc. Natl. Acad. Sci. USA.*2016; 113 (27354521): E3995-E4004 - Improving response in genomic selection with a population-based selection strategy: Optimal population value selection.
*Genetics.*2017; 206 (28526698): 1675-1682 - Positional candidate cloning of a QTL in dairy cattle: Identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition.
*Genome Res.*2002; 12 (11827942): 222-231 - The impact of genetic relationship information on genome-assisted breeding values.
*Genetics.*2007; 177 (18073436): 2389-2397 - The distribution of effects of genes affecting quantitative traits in livestock.
*Genet. Sel. Evol.*2001; 33 (11403745): 209-229 - Best linear unbiased estimation and prediction under a selection model.
*Biometrics.*1975; 31 (1174616): 423-447 - GS3 Genomic Selection — Gibbs Sampling — Gauss Seidel (and BayesCπ).
- Improved Lasso for genomic selection.
*Genet. Res. (Camb.).*2011; 93 (21144129): 77-87 - Genetic gain increases by applying the usefulness criterion with improved variance prediction in selection of crosses.
*Genetics.*2017; 207 (29038144): 1651-1661 - Cattle sex-specific recombination and genetic control from a large pedigree analysis.
*PLoS Genet.*2015; 11 (26540184): e1005387 - Prediction of total genetic value using genome-wide dense marker maps.
*Genetics.*2001; 157 (11290733): 1819-1829 - Linear Models for the Prediction of Animal Breeding Values.2nd ed. CAB International, Wallingford, UK2005
- Selection on expected maximum haploid breeding values can increase genetic gain in recurrent genomic selection.
*G3 (Bethesda).*2018; 8 (29434032): 1173-1181 - Effect of total allelic relationship on accuracy of evaluation and response to selection.
*J. Anim. Sci.*1997; 75 (9222829): 1738-1745 - Evaluating sequence-based genomic prediction with an efficient new simulator.
*Genetics.*2017; 205 (27913617): 939-953 - QMSim: A large-scale genome simulator for livestock.
*Bioinformatics.*2009; 25 (19176551): 680-681 - Strategy for applying genome-wide selection in dairy cattle.
*J. Anim. Breed. Genet.*2006; 123 (16882088): 218-223 - Prediction of expected genetic variation within groups of offspring for innovative mating schemes.
*Genet. Sel. Evol.*2014; 46 (24990472): 42 - Characterization of recombination features and the genetic basis in multiple cattle breeds.
*BMC Genomics.*2018; 19 (29703147): 304 - Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers.
*BMC Bioinformatics.*2010; 11 (20969788): 529 - Mating schemes for optimum contribution selection with constrained rate of inbreeding.
*Genet. Sel. Evol.*2000; 32 (14736390): 231-248 - Increasing long-term response by selecting for favorable minor alleles.
*PLoS One.*2014; 9 (24505495): e88510 - Effects of DGAT1 variants on milk production traits in German cattle breeds.
*J. Anim. Sci.*2003; 81 (12926772): 1911-1918 - Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle.
*BMC Genet.*2015; 16 (26586567): 134 - Sample size as a function of coefficient of variation and ratio of means.
*Biometrics.*1993; 58: 612-620 - Efficient methods to compute genomic predictions.
*J. Dairy Sci.*2008; 91 (18946147): 4414-4423 - Genomic evaluations with many more genotypes.
*Genet. Sel. Evol.*2011; 43 (21366914): 10 - Comparison of fatty acid content of milk from Jersey and Holstein cows consuming pasture or a total mixed ration.
*J. Dairy Sci.*2001; 84 (11699461): 2295-2301 - Evolution in Mendelian populations.
*Genetics.*1931; 16 (17246615): 97-159

## Article Info

### Publication History

Published online: April 10, 2019

Accepted:
February 27,
2019

Received:
November 10,
2018

### Identification

### Copyright

© 2019 American Dairy Science Association®.