INRA, UMR1282 Infectiologie et Santé Publique, F-37380 Nouzilly, France; Université François Rabelais de Tours, UMR1282 Infectiologie et Santé Publique, F-37000 Tours, France
Genomic selection in Lacaune dairy sheep was investigated based on genotypes from the OvineSNP50 BeadChip (Illumina Inc., San Diego, CA). Historical artificial insemination progeny-tested rams formed a population of 2,892 genotyped rams. Additional ungenotyped rams and females were included by single-step genomic BLUP (ssGBLUP). Three prediction strategies were tried: pseudo-BLUP (using all rams and daughter yield deviations with pedigree relationships), pseudo-ssGBLUP (using all rams and daughter yield deviations with combined pedigree and genomic relationships), and regular ssGBLUP (using all phenotypes and pedigree in an animal model). The population linkage disequilibrium was determined, with an average squared correlation coefficient of 0.11 for markers closer than 0.1 cM (lower than in dairy cattle). The estimated effective population size is 370 individuals. Gain in accuracy of genomic selection over parent averages ranged from 0.10 to 0.20. The highest accuracies and lowest bias were found using regular ssGBLUP. Transition to a genomic breeding scheme is possible, but costs need to be carefully evaluated.
Accurate genomic predictions that combine genotypic, phenotypic, and pedigree data available early in the life of livestock have the potential to reduce the generation interval for breeding schemes that focus on progeny testing (
). However, the potential reduction of generation interval for Lacaune dairy sheep is limited because the use of AI with fresh semen and a very rapid turnover of males already result in a short generation interval of 4.2 yr. Nevertheless, early selection of fewer rams might decrease the maintenance costs of AI rams that are currently waiting for first-crop progeny test results.
The Lacaune dairy sheep breed was defined in the 1950s to 1960s by the pooling of several local breeds (
). Genetic improvement programs for Lacaune dairy sheep started in the 1960s and have been managed by 2 AI companies [Confédération Générale de Roquefort (Millau, France) and Ovitest (Onet-le-Château, France)] since 1972. They progeny test 470 rams annually in a nucleus group of 368 commercial flocks (i.e., the companies are AI centers that do not own the flocks), with around 170,000 ewes recorded yearly for several traits. Those nucleus flocks, which are at the top of a pyramidal organization and include 20% of the Lacaune population (
), benefit from extensive performance recording and AI. This is important because, unlike in dairy cattle, performance recording for sheep has high associated costs.
In practice, the existence of 2 companies implies 2 weakly related subpopulations (which can be clearly seen using, for example, principal components analysis). Modeling the 2 subpopulations as a single one or as separate ones did not significantly change the results of this work (results not shown). That is, similar accuracies were observed when validation was performed within each subpopulation or on the joint data set (results not shown).
Extensive AI for sheep is accomplished through the use of fresh semen, which requires the availability of numerous living rams to meet seasonal AI demands that peak, in Lacaune, at 26,000 inseminations per week per company. Consequently, no frozen semen is stored that could serve as a source of DNA, and collection and storage of blood samples of AI rams began in 1995 with a view to QTL detection and localization and marker-assisted selection.
To develop a dense SNP chip for sheep, 20 countries organized the International Sheep Genomics Consortium (http://www.sheephapmap.org) in 2002. The OvineSNP50 BeadChip (Illumina Inc., San Diego, CA) was finally released in 2009.
Recently, a large set of Lacaune rams has been genotyped, with a focus on QTL detection and genomic selection.
compared methods for genomic selection of Lacaune dairy sheep and reported an absolute increase in accuracy of 0.05 to 0.10 as assessed by forward validation compared with parent average. They used the same data for the reduced and full data sets as defined by Interbull (
), which makes the benefit of genomic selection for candidates unclear.
In addition, linkage disequilibrium (LD; the nonrandom association of alleles between 2 loci) was investigated because it strongly influences the power of QTL detection and the accuracy of genomic prediction (
, genomic predictions reach an accuracy of 0.85 with a simulated LD of 0.20. Such a level of LD was hypothesized to be achieved with 35,000 multiallelic markers (
The International Sheep Genomics Consortium (in Genome-wide analysis of the world's sheep breeds reveals high levels of historic mixture and strong recent selection) reported lower LD with the commercial OvineSNP50 BeadChip, which achieves such a density.
A factor related to LD is the effective population size: the smaller the effective population size, the higher the LD and the expected accuracy of genomic prediction (
The objectives of this study were to provide consolidated and updated LD measures across SNP markers and to determine the accuracy of genomic predictions through forward validation in the Lacaune breed. Concerning estimation of accuracy, this study differs from that of
), in the use of reduced and full data sets to assess accuracy by forward validation, and in a larger data set.
Materials and Methods
Genomic Data
Currently, 5,000 progeny-tested rams with more than 10 daughters have DNA stored. Because exhaustive genotyping of rams was too expensive, the following genotyping strategy was chosen. Progeny-tested rams born from 2008 through 2009 formed the validation population (see Table 1). Then, complete generations were genotyped backward as far as possible, including most rams born from 1998 through 2007, to form the training population. These included most ancestors (sires and maternal grandsires) of the validation population. No ram with fewer than 20 daughters in progeny testing was genotyped.
Table 1. Distribution of Lacaune dairy rams in the genomic selection test.
Extraction of DNA from blood samples and genotyping was conducted by the LABOGENA laboratory (Jouy-en-Josas, France). Extracted DNA was available for 90% of the AI rams born from 2003 to 2009 and 25% for rams born from 1998 through 2002. Genotyped rams were roughly structured in 452 half-sib families, with a mean of 9 sons. Both breeding companies contributed to samples in accordance with the number of rams enrolled in their progeny-testing programs. The Illumina OvineSNP50 BeadChip developed through the International Sheep HapMap project (
) was used for genotyping. Illumina GenomeStudio software was used with default thresholds and cluster definitions for genotype calling.
From the set of SNP markers available, 92% were read for 99% of rams. Data for some SNP were discarded because of low minor allele frequency (<0.01), Hardy-Weinberg disequilibrium ($P < 10^{-5}$), or insufficient genotyping rate (<0.97). Mendelian inconsistencies were set to missing. Missing genotypes and genotyping errors (0.25%) were imputed using BEAGLE v3.4 software (
). The daughter yield deviations (DYD), weighted by their effective daughter contributions (EDC), were used as the response variable to compute genomic predictions from a pseudo-ssGBLUP or pedigree-based predictions from a pseudo-BLUP, both of which are detailed later. Alternatively, observed performance measures (as used in official evaluation) were used in ssGBLUP. Training and validation populations were defined according to Interbull (Uppsala, Sweden) rules (
) for both methods. This defines a reduced data set and a full data set to fairly predict genetic merit of validation rams.
The full data set (all data available in 2011) was required to compute reliable EBV estimates for the validation population. More than 4,431,000 phenotypic records were available for 1,436,000 animals. For the reduced data set, data from 2008 through 2011 were excluded from the full data set (i.e., the reduced data set included records available for genetic evaluation in 2007). The reduced data set contained 3,822,000 phenotypic records for 1,262,000 animals. Prediction was based on rams born up to 2005, whereas validation was based on rams born after 2007 (Table 1). Rams born in 2006 or 2007 (24% of rams) were excluded from the test to comply with Interbull rules (<25%;
In addition to genotyped rams, 5,904 nongenotyped AI and natural-mating rams with progeny and born from 1990 through 2005 were included in the prediction step via pseudo-BLUP or ssGBLUP. Figure 2 shows the proportion of genotyped to nongenotyped rams by birth year from 1996 through 2009. Between 1990 and 1996, only nongenotyped rams were included.
Figure 2. Proportion of genotyped rams (gray) among all AI rams included in the genomic test by birth year.
formulated that the approximate expectation of r2 is $E(r^2) \approx 1/(4N_e c + 1)$, where Ne is the effective population size and c is the recombination distance in morgans between SNP. Further, LD at short distance is dependent on long-term population history (
). The Ne for c = 10 Mbp (0.1 morgan, assuming 1 cM = 1 Mbp) was considered to be the most recent Ne (5 generations ago).
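As a minimal sketch of how this relationship can be used, the Python snippet below inverts E(r2) = 1/(4Nec + 1) to obtain Ne from an average r2 at a given distance. The function names and the example r2 value are hypothetical. Dating the estimate with a 1/(2c) rule is an assumption, chosen because it reproduces the 5 generations quoted above for c = 0.1 morgan; it is not necessarily the exact procedure used in the study.

```python
def mbp_to_morgan(mbp: float, cm_per_mbp: float = 1.0) -> float:
    """Convert physical to genetic distance (default assumes 1 cM = 1 Mbp)."""
    return mbp * cm_per_mbp / 100.0


def ne_from_r2(r2: float, c_morgan: float) -> float:
    """Invert E(r2) = 1 / (4 * Ne * c + 1) to obtain Ne."""
    if not 0.0 < r2 < 1.0 or c_morgan <= 0.0:
        raise ValueError("need 0 < r2 < 1 and c > 0")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgan)


def generations_ago(c_morgan: float) -> float:
    """Approximate age, in generations, of the Ne reflected by LD at distance c.

    Uses the 1/(2c) rule, which is an assumption of this sketch.
    """
    return 1.0 / (2.0 * c_morgan)


c = mbp_to_morgan(10.0)        # 10 Mbp -> 0.1 morgan
print(generations_ago(c))      # age of the Ne estimate at this distance
print(ne_from_r2(0.05, c))     # Ne implied by a hypothetical mean r2 of 0.05
```

Applying `ne_from_r2` to the mean r2 of each distance class gives one Ne estimate per class, which can then be plotted against `generations_ago` for that class.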
The pedigree-based estimator of Ne was the realized effective size, defined as $\bar{N}_e = 1/(2\,\overline{\Delta F})$, where $\overline{\Delta F}$ is the average of individual increase in inbreeding. Individual increase in inbreeding was computed following the definition of
). A subpedigree of 1,636,751 animals born from 1999 through 2010 and all their ancestors for 11 generations resulted in a total of 1,947,402 animals.
Availability of Ne, heritabilities of traits, and size of the training population makes it possible to predict the expected accuracy of future genomic predictions. According to
, $r_{g\hat{g}} = \sqrt{N_P h^2 / (N_P h^2 + M_e)}$, where $r_{g\hat{g}}$ is the correlation between true and estimated genetic values, $N_P$ is the number of ewes and rams phenotyped, $h^2$ is the heritability of the trait, and $M_e$ is the effective number of loci (
, who did not observe any improvement using either nonparametric or Bayesian methods such as BayesCPi, only BLUP-like methods were considered. Production traits were evaluated with a single-trait animal model with permanent environmental effects. Heterogeneous variances among herds were also considered for production traits (similar to the French dairy cattle model;
For pseudo-BLUP, phenotypes were DYD, and the pedigree-based relationship matrix included genotyped and nongenotyped rams. The “pseudo” name indicates that we used neither a true BLUP animal model nor parent average. The model for pseudo-BLUP was $\mathbf{y}_{DYD} = \mathbf{1}\mu + \mathbf{W}\mathbf{u}_{rams} + \boldsymbol{\varepsilon}$, where $\mathbf{y}_{DYD}$ are DYD for each ram; $\mu$ is the overall mean; $\mathbf{W}$ is the incidence matrix allocating DYD to the breeding values; $\mathbf{u}_{rams}$ are breeding values, and it was assumed that $Var(\mathbf{u}_{rams}) = \mathbf{A}_{rams}\sigma_u^2$, where $\mathbf{A}_{rams}$ is the numerator relationship matrix across rams and $\sigma_u^2$ is the genetic variance; and $\boldsymbol{\varepsilon}$ is a vector of residuals. Pseudo-residual variances of $\boldsymbol{\varepsilon}$ were weighted by the inverse of the EDC number for each DYD.
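The pseudo-BLUP model above can be illustrated with a small numerical sketch that builds and solves Henderson's mixed-model equations. This is only an illustration under stated assumptions: the function name `pseudo_blup`, the toy data, and the variance ratio `lam` (residual variance over genetic variance, taken as known) are hypothetical, every ram is assumed to have exactly one DYD, and the actual analysis was run with the BLUPF90 programs rather than code like this.

```python
import numpy as np


def pseudo_blup(dyd, edc, A, lam):
    """Solve Henderson's mixed-model equations for y = 1*mu + W*u + e.

    Var(u) = A * s2u and Var(e_i) = s2e / EDC_i; lam = s2e / s2u is assumed known.
    """
    dyd = np.asarray(dyd, dtype=float)
    n = dyd.size
    X = np.ones((n, 1))                        # column for the overall mean
    W = np.eye(n)                              # one DYD per ram (assumption)
    D = np.diag(np.asarray(edc, dtype=float))  # EDC weights (inverse residual variance / s2e)
    lhs = np.block([
        [X.T @ D @ X, X.T @ D @ W],
        [W.T @ D @ X, W.T @ D @ W + lam * np.linalg.inv(A)],
    ])
    rhs = np.concatenate([X.T @ D @ dyd, W.T @ D @ dyd])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]                     # (estimate of mu, predicted breeding values)


# Toy example: 3 rams with hypothetical DYD, EDC, and pedigree relationships.
A = np.array([[1.0, 0.25, 0.0],
              [0.25, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
mu_hat, u_hat = pseudo_blup([1.0, 2.0, 0.5], [20.0, 40.0, 30.0], A, lam=3.0)
print(mu_hat, u_hat)
```

Replacing `A` with a combined pedigree and genomic relationship matrix for the same rams would give the pseudo-ssGBLUP analogue of this sketch.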
For pseudo-ssGBLUP, the model was as above, but it was assumed that $Var(\mathbf{u}_{rams}) = \mathbf{H}_{rams}\sigma_u^2$, where $\mathbf{H}_{rams}$ is the combined pedigree and genomic relationship matrix described by Legarra et al. (2009). Finally, ssGBLUP included all animals using the following animal model: $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{W}_u\mathbf{u}\ (+\ \mathbf{W}_p\mathbf{p}) + \mathbf{e}$, which includes observed performances (i.e., ewe's records; y), a matrix X relating the fixed effects (b) to the performances, whole-population breeding values (u), permanent effects (p) for the respective traits, and the residuals e. This model included all genotyped and nongenotyped animals (rams and ewes) and a joint genomic-pedigree relationship matrix (
Arams=numerator relationship matrix across rams; Hrams=combined pedigree and genomic relationship matrix described by Legarra et al. (2009), across all rams; Hewes+rams=combined pedigree and genomic relationship matrix described by Legarra et al. (2009), across all ewes and rams.
yDYD=DYD for each ram; μ=overall mean; W=incidence matrix allocating DYD to the breeding values; urams=breeding values; ε=vector of residuals; y=ewe's records; X=incidence matrix relating the fixed effects to the performances; b=fixed effects; u=whole-population breeding values; p=permanent effects; e=the residuals.
5 Progeny tested.
6 Including the 7,497 rams above (1,593 of them genotyped).
). This blending is automatically done in ssGBLUP. Both ssGBLUP methods used the genomic relationship matrix (G) constructed as $\mathbf{G} = 0.95\,\mathbf{Z}\mathbf{Z}'/(2\sum_i p_i q_i) + 0.05\,\mathbf{A}_{22}$, where Z is the incidence matrix of marker effects, corrected by the observed genotype frequencies; $p_i$ is the allele frequency at marker i; $q_i = 1 - p_i$; and $\mathbf{A}_{22}$ is the matrix of pedigree relationships of genotyped animals obtained from the whole pedigree (
). These weights make G invertible and imply that 95% of the genetic variance is explained by markers. The matrix G was then corrected using $\mathbf{G}^* = (1 - F_{ST})\mathbf{G} + 2F_{ST}\mathbf{1}\mathbf{1}'$
(where $F_{ST}$ is defined as the mean relationship between gametes in a recent population with respect to an older base population) as in
to make it compatible with pedigree relationships. The programs PREGSf90 (computation of needed matrices), BLUPF90 (storage of mixed-model equations), and BLUP90IOD2 (iteration on data) from the BLUPF90 program family were used.
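To make the construction of G concrete, here is a minimal NumPy sketch. It assumes genotypes coded 0/1/2, centers each marker by twice its observed allele frequency, scales by 2Σpq, and blends the result with A22 using the 0.95 and 0.05 weights mentioned in the text. The function name and the simulated genotypes are hypothetical, and the subsequent correction for compatibility with pedigree relationships is not reproduced here.

```python
import numpy as np


def genomic_relationship(M, A22, w=0.95):
    """Blend a marker-based relationship matrix with pedigree relationships.

    M   : animals x markers array of genotype codes 0/1/2
    A22 : pedigree relationship matrix of the genotyped animals
    w   : weight on the marker-based part (0.95 in the text)
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0               # observed allele frequencies
    Z = M - 2.0 * p                        # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))    # 2 * sum(p_i * q_i)
    return w * (Z @ Z.T) / scale + (1.0 - w) * np.asarray(A22, dtype=float)


rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(5, 200))      # 5 animals x 200 SNP, simulated
A22 = np.eye(5)                            # unrelated animals, for illustration only
G = genomic_relationship(M, A22)
print(np.round(np.diag(G), 2))
```

Because the centered marker part alone is singular, the 0.05 weight on A22 is what guarantees that the blended matrix can be inverted.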
The derivation of expected reliability was by sparse inversion of the mixed-model equations in pseudo-BLUP and pseudo-ssGBLUP. However, expected reliabilities could not be computed for the reduced data set used in ssGBLUP, which had 1,254,767 animals in pedigrees and 3,738,475 phenotypic records. Those reliabilities were obtained using the approximation described by
. Regardless of method (ssGBLUP or pseudo-ssGBLUP), pseudo-phenotypes (DYD) of the validation rams obtained from the full data set were taken as a surrogate variable for true breeding value. Different estimators of breeding values were compared with DYD using the regression $DYD = b_0 + b_1 EBV + e$, where $b_0$ is the intercept, $b_1$ is the slope, and e is the residual. The coefficient of determination ($R^2$) was the reliability of the prediction. The realized reliability (accuracy) was adjusted (adj) to take into account the reliability of the surrogate variable.
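The validation regression can be sketched in a few lines of NumPy. The function name and the simulated data are hypothetical, and the adjustment for the reliability of the surrogate variable is deliberately left out because its formula is not given here.

```python
import numpy as np


def forward_validation(dyd, ebv):
    """Regress DYD on predicted breeding values: DYD = b0 + b1 * EBV + e.

    Returns (b0, b1, R2); R2 is the unadjusted coefficient of determination.
    """
    dyd = np.asarray(dyd, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    b1, b0 = np.polyfit(ebv, dyd, 1)       # polyfit returns the slope first
    r = np.corrcoef(ebv, dyd)[0, 1]
    return b0, b1, r ** 2


# Simulated validation rams: predictions inflated relative to DYD (true slope 0.8).
rng = np.random.default_rng(0)
ebv = rng.normal(size=300)
dyd = 0.8 * ebv + rng.normal(scale=0.5, size=300)
b0, b1, r2 = forward_validation(dyd, ebv)
print(round(b0, 2), round(b1, 2), round(r2, 2))
```

A slope below 1 in this output means the predictions are more spread out than the DYD they are meant to predict.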
, the degree of relationship between the training set and selection candidates affects prediction accuracy. As indicated in a preliminary investigation of a genomic breeding scheme for Lacaune sheep (
), in a fast genomic scheme, a fraction of candidates will have sires that will have no progeny test or daughter performance. Therefore, the effect of removing from the training data set the sires of individuals in the validation data set was tested using pseudo-ssGBLUP as described previously.
Results
LD
For the 42,039 SNP genotyped on the 2,892 rams included in the genomic selection test, the mean minor allele frequency was 0.29 ± 0.13. All possible SNP pairs separated by <10 Mbp within a chromosome produced 6,859,395 pairwise r2 values on the 26 ovine autosomes. To visualize the decay of LD, r2 values were averaged within categories of intermarker spacing (0.02 Mbp wide) and plotted against intermarker distance (Figure 3). The observed pattern shows LD in inverse relationship with genetic distance, as regularly observed in cattle (e.g.,
Given that the map coverage of the informative SNP reached 2,419 cM, the mean SNP interval was approximately 0.058 cM (2,419 cM divided by 42,039 SNP). Assuming that 1 cM = 1 Mbp, the mean r2 was 0.11 ± 0.16 for a distance of <0.1 cM and 0.13 ± 0.18 for a marker spacing of <0.05 cM. A similar study by
other members of the International Sheep Genomics Consortium, who reported that LD for an intermarker distance of 1 cM could reach 0.1.
Compared with LD decay for a population with Ne = 1,000, r2 for Lacaune sheep was lower for small SNP intervals and higher for large intervals (Figure 3). According to
, the LD for small SNP intervals indicates that the Lacaune sheep had broad diversity (Ne of >1,000) 5,000 generations ago. Intense selection later led to high LD for large SNP intervals.
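The comparison between observed LD decay and the expectation for a given Ne can be sketched as follows. The snippet averages pairwise r2 within 0.02-Mbp distance classes and evaluates 1/(4Nec + 1) at the class centers. Function names are hypothetical, the toy input is invented, and the conversion assumes 1 cM = 1 Mbp as in the text.

```python
import numpy as np


def mean_r2_by_distance(dist_mbp, r2, bin_width=0.02, max_dist=10.0):
    """Average pairwise r2 within distance classes of width bin_width (Mbp)."""
    dist = np.asarray(dist_mbp, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    keep = dist < max_dist
    dist, r2 = dist[keep], r2[keep]
    n_bins = int(round(max_dist / bin_width))
    idx = np.minimum(np.floor(dist / bin_width).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=r2, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = (np.arange(n_bins) + 0.5) * bin_width
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts              # NaN for empty classes
    return centers, means


def expected_r2(dist_mbp, ne, cm_per_mbp=1.0):
    """Expected r2 = 1 / (4 * Ne * c + 1), with c in morgans."""
    c = np.asarray(dist_mbp, dtype=float) * cm_per_mbp / 100.0
    return 1.0 / (4.0 * ne * c + 1.0)


# Toy input: three SNP pairs with invented distances (Mbp) and r2 values.
centers, means = mean_r2_by_distance([0.005, 0.015, 0.03], [0.2, 0.4, 0.1])
reference = expected_r2(centers, ne=1000)  # curve for Ne = 1,000
print(means[:2], reference[:2])
```

Plotting `means` and `reference` against `centers` gives the kind of comparison described for Figure 3.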
Effective Population Size
Plotting the effective population size at various generations (Figure 4) showed a linear decrease of Ne from 6,000 individuals 5,000 generations ago to 3,000 individuals 500 generations ago. Ne was then reduced roughly 10-fold, including a 2-fold reduction from 800 to 370 individuals in the last 50 generations. This confirms the reduction in effective size due to the definition of the Lacaune standards after the pooling of small local breeds and, more recently, to intense dairy selection. Based on pedigree, the mean number of known ancestors was 5.41 (range: 0.5 to 9.42) and the realized effective population size was 192. Considering an older population (5 generations ago) to be fully comparable with the LD estimate, Ne reached 252. Those figures are of the same order as the estimate based on markers.
Figure 4. Effective population size (Ne) for the last 5,000 generations of Lacaune dairy sheep.
, the expected accuracy of genomic prediction across traits is, on average, 0.57. Parameters used were an Ne of 370 and a training population of 1,593 genotyped rams. Progeny testing was assumed to include 40 daughters per ram.
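For readers who want to explore how such an expected accuracy responds to its inputs, the sketch below evaluates a deterministic formula of the form r = sqrt(NP h2 / (NP h2 + Me)), using the symbols defined in Materials and Methods. That this is the exact expression behind the 0.57 figure is an assumption of the sketch. The effective number of loci Me must be supplied by the user, and the numbers in the example are hypothetical rather than taken from the study.

```python
import math


def expected_accuracy(n_p: float, h2: float, m_e: float) -> float:
    """Expected accuracy r = sqrt(N_P * h2 / (N_P * h2 + M_e)).

    n_p : number of phenotyped individuals in the training population
    h2  : heritability (or reliability) of the phenotype used
    m_e : effective number of loci (user supplied; an assumption here)
    """
    if n_p <= 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_p > 0, 0 < h2 <= 1, and m_e > 0")
    info = n_p * h2
    return math.sqrt(info / (info + m_e))


# Hypothetical inputs, only to show how accuracy grows with training-set size.
for n in (500, 1000, 2000, 4000):
    print(n, round(expected_accuracy(n, h2=0.5, m_e=1000.0), 3))
```

The loop illustrates the diminishing returns from enlarging the training population when Me is held fixed.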
The observed accuracy for different scenarios of genomic evaluation is listed in Table 3. The accuracy of genomic predictions from pseudo-ssGBLUP and ssGBLUP was around 0.6 to 0.7, regardless of trait heritability, except for accuracy around 0.45 for milk yield. The pseudo-BLUP accuracy was 0.12 lower, on average, for all traits. Accuracy gain (genomic minus pedigree) was maximal for udder cleft (h2 = 0.23) and minimal for fat percentage (h2 = 0.35). Molecular data roughly enhanced parent-average accuracy by 0.10. In regard to regression slopes (Table 4), pseudo-ssGBLUP outperformed pseudo-BLUP (i.e., slopes were closer to 1.00) except for SCS. Single-step GBLUP clearly improved slopes, except for fat percentage and SCS.
Table 3. Accuracies obtained with pseudo-BLUP, pseudo-single-step genomic BLUP (pseudo-ssGBLUP), or ssGBLUP.
Table 4. Slopes for regression of predictions from pseudo-BLUP, pseudo-single-step genomic BLUP (pseudo-ssGBLUP), or ssGBLUP on daughter yield deviations.
The exclusion of the 93 sires of validation rams from the training data set decreased the size of the training set by 6%. Based on the entire genomic relationship matrix, reduction of the mean genomic relationship between the training and validation sets was limited to 0.48%. Also, the maximum genomic relationship coefficient was 0.46 instead of 0.65. Forward validation accuracy decreased in the validation set for all traits (Table 5). The accuracy loss was severe for udder depth, teat angle, and SCS (0.12–0.13) and moderate for milk yield (0.05). Applying the same truncation to pseudo-BLUP led to an even higher (0.20) loss of accuracy. Hence, this confirms that a selection scheme with rapid turnover of males is more efficient using genomic than pedigree information.
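The summary statistics used above (mean and maximum genomic relationship between training and validation animals, with and without the sires) can be computed from a relationship matrix as in the following sketch. The function name, the index lists, and the 4-animal toy matrix are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np


def cross_relationship_summary(G, train_idx, valid_idx):
    """Mean and maximum relationship between training and validation animals."""
    block = np.asarray(G, dtype=float)[np.ix_(train_idx, valid_idx)]
    return float(block.mean()), float(block.max())


# Toy matrix: animal 0 is the sire of validation animal 3.
G = np.array([[1.00, 0.10, 0.05, 0.50],
              [0.10, 1.00, 0.02, 0.10],
              [0.05, 0.02, 1.00, 0.00],
              [0.50, 0.10, 0.00, 1.00]])
with_sires = cross_relationship_summary(G, [0, 1, 2], [3])
without_sires = cross_relationship_summary(G, [1, 2], [3])
print(with_sires, without_sires)
```

Removing the sire from the training indices lowers the maximum relationship sharply in this toy case; how much the mean changes depends on how many other training animals remain.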
Table 5. Accuracies obtained with pseudo-BLUP and pseudo-single-step genomic BLUP (pseudo-ssGBLUP), with sires of rams included in or excluded from the training population.
The International Sheep Genomics Consortium study was based on 103 Lacaune dairy sheep. However, as r2 tends to be biased downward for low minor allele frequencies, the threshold of 0.01 minor allele frequency in the current study (versus 0.10 in the International Sheep Genomics Consortium study) might have led to underestimated r2. Such low LD levels suggest that the Lacaune sheep come from highly heterogeneous populations because 500 generations ago Ne was 3,000, whereas the current Ne confirmed by a pedigree-based approach was between 200 (contemporary population) and 370 individuals (5 generations ago). Most of the breeds that the International Sheep Genomics Consortium studied had a similar LD pattern and level. Compared with dairy cattle, the steeper LD trajectory in Lacaune sheep points to moderate selection and bottlenecks. Although LD was not as favorable as in dairy cattle (
. The smaller gain might result from the following reasons: lower level of LD, lower DYD accuracy (or, equivalently, less information per genotyped animal), lower coverage of old history (essentially all bulls are genotyped in Holsteins, whereas this is not the case in the current work), and less reliable forward validation because of the less accurate EBV in the full data set. In addition, higher reliability for pseudo-BLUP predictions than in dairy cattle was caused by beneficial aspects of data structure, such as the large size of contemporary groups in large sheep flocks and lack of preferential treatments (
Beyond accuracies, slopes of DYD versus genomic predictions were of major importance. A slope >1 indicated underdispersion of genomic predictions, whereas a slope <1 indicated overdispersion. In this study, although pseudo-ssGBLUP was computed taking into account DYD of nongenotyped rams, ssGBLUP gave better results for slopes, which agrees with the findings of
. Enhanced fit with ssGBLUP results from including more and less biased information. In particular, ssGBLUP considers information from dams of candidates (or, in other words, full parent average), which is not the case in pseudo-ssGBLUP, where the maternal information is plainly ignored (although the maternal grandsire is considered). In dairy cattle, either using selection index to combine sources of information (genomic proofs + parent averages;
Presently, using a matrix that includes both pedigree-based relationships and differences between pedigree and genomic relationships (ssGBLUP) is equivalent to imputing missing genotypes for females with phenotypic records (
); using such a matrix enlarges the training population. However, slopes are still 0.14 points lower than 1.00 and should be improved, or they will result in giving an unfair advantage to juvenile over older selection candidates. By weighting the genomic and pedigree relationships matrices,
pointed out that accuracy of genomic selection greatly depends on the degree of relationship between validation and training populations. Indeed, excluding sires of validation rams lowered accuracy by 0.10 and 0.20 for pseudo-ssGBLUP and pseudo-BLUP, respectively. Greater sensitivity to the additive relationship compared with that reported by
might be explained by the smaller size of the training set and low LD (0.11 instead of 0.20 at <100 kbp), and also by the different design: full data sets and random sampling were used for
also observed an increase in accuracy using BayesB but this increase was only visible for relationships lower than 0.125 (e.g., if candidates were great-grandsons of progeny-tested males), which seems an unrealistic case. It was also possibly explained by the existence of the major gene diacylglycerol O-acyltransferase 1 (DGAT1).
Accounting for preselection of genomic candidates is of additional interest when using ssGBLUP. Even with low selection intensity (25%) at moderate heritability,
found deteriorated accuracy of GEBV after only 1 generation. For future genomic breeding schemes, a selection intensity of 25% or more might be practiced. The ssGBLUP method takes selection into account and therefore provides unbiased GEBV.
Conclusions
This study confirms that a low level of LD is detected with the 50,000-marker (50K) chip in the Lacaune breed compared with the Holstein breed. In agreement with this, we assessed a gain in accuracy between 0.10 and 0.20 across traits and models, which is considerably lower than in dairy cattle. The technical parameters governing the accuracy of genomic prediction in Lacaune sheep are now known. The routine evaluation to be used under genomic selection is still to be determined, but ssGBLUP offers the most general framework, possibly with improvements to consider heterogeneity of variances within herds. In our case, it resulted in better accuracy and lower bias at the same computational cost compared with a multistep procedure. The current focus is on designing a genomic breeding scheme. First, technical and biological parameters such as number of rams or seasonality of AI need to be considered. The high cost of genotyping relative to the individual value of the animal (even using low-density chips and imputation) imposes a significant cost/benefit challenge, one that makes the optimal use of genomic prediction in dairy sheep likely to be quite different from that in dairy cattle.
Acknowledgments
This work benefited from financial support of the Agence Nationale de la Recherche (ANR)-SheepSNPQTL (Paris, France), ApisGene (Paris, France), Fonds Unique Interministériel (FUI)-Roquefort’in projects, Midi-Pyrénées region (France), Le Fonds européen de développement régional (FEDER), Aveyron and Tarn departments (France), and the city of Rodez (France). We thank the 5 breeder partners of the Roquefort’in project, the genotyping platform LABOGENA (http://www.labogena.fr/; Jouy-en-Josas, France), the bioinformatics support of SIGENAE (http://www.sigenae.org/; Toulouse, France), the computing facilities of the Centre de Traitement de l’Information Génétique (CTIG; Jouy-en-Josas, France), and the bioinformatics platform GenoToul (http://bioinfo.genotoul.fr/; Toulouse, France). We thank the 2 reviewers for their constructive comments. Ignacy Misztal, Shogo Tsuruta (University of Georgia, Athens), and Ignacio Aguilar [Instituto Nacional de Investigación Agropecuaria (INIA), Canelones, Uruguay] answered many questions related to the use of software. Manuscript review by S. M. Hubbard of the Animal Improvement Programs Laboratory (US Department of Agriculture-Agricultural Research Service, Beltsville, MD) is gratefully acknowledged.
References
Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
Taking into account functional traits in dairy sheep breeding programs through the French example. European Association for Animal Production (EAAP) Publ. No. 121. in: Kyntäjä J., Lampinen K., Rosati A., Mosconi C. Breeding, Production Recording, Health and the Evaluation of Farm Animals. Wageningen Academic Publishers, Wageningen, the Netherlands, 2007: 57-64.
Assessment of technical and economic efficiency of genomic-based breeding programs in dairy sheep in France. in: Book of Abstracts of the 64th Annual Meeting of the European Federation of Animal Science, Nantes, France. Wageningen Academic Publishers, Wageningen, the Netherlands, 2013: 369.