If you don't remember your password, you can reset it by entering your email address and clicking the Reset Password button. You will then receive an email that contains a secure link for resetting your password
If the address matches a valid account an email will be sent to __email__ with instructions for resetting your password
Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122Institute of Agricultural Biology and Biotechnology (IBBA), Consiglio Nazionale di Ricerca, Milano, Italy
Sperm motility is directly related to the ability of sperm to move through the female reproductive tract to reach the ovum. Sperm motility is a complex trait that is influenced by environmental and genetic factors and is associated with male fertility, oocyte penetration rate, and reproductive success of cattle. In this study we carried out a GWAS in Italian Holstein bulls to identify candidate regions and genes associated with variations in progressive and total motility (PM and TM, respectively). After quality control, the final data set consisted of 5,960 records from 949 bulls having semen collected in 10 artificial insemination stations and genotyped at 412,737 SNPs (call rate >95%; minor allele frequency >5%). (Co)variance components were estimated using single trait mixed models, and associations between SNPs and phenotypes were assessed using a genomic BLUP approach. Ten windows that explained the greatest percentage of genetic variance were located on Bos taurus autosomes 1, 2, 4, 6, 7, 23, and 26 for TM and Bos taurus autosomes 1, 2, 4, 6, 8, 16, 23, and 26 for PM. A total of 150 genes for TM and 72 genes for PM were identified within these genomic regions. Gene Ontology enrichment analyses identified significant Gene Ontology terms involved with energy homeostasis, membrane functions, sperm-egg interactions, protection against oxidative stress, olfactory receptors, and immune system. There was significant enrichment of quantitative trait loci for fertility, calving ease, immune response, feed intake, and carcass weight within the candidate windows. These results contribute to understanding the architecture of the genetic control of sperm motility and may aid in the development of strategies to identify subfertile bulls and improve reproductive success.
In 2020, the world market for AI was valued ∼$3.96 billion with an expected annual growth of ∼5.94% by 2028. Among other factors, the quantity, quality, and sanitary status of the semen will determine the reproductive success of a bull. Although fertility of semen is highly influenced by age, season, nutrition, management, and the interval between ejaculations (
). Sperm motility is routinely measured by insemination centers using computer-assisted sperm analysis. However, it has been shown that SM is a complex phenotype which is affected by the environmental (
). Consequently, a better understanding of the genomic architecture of semen-related complex traits is expected to help improve fertility through the assisted selective breeding of bulls. The heritability estimates for SM vary widely among studies on dairy and beef cattle, from very low values of 0.01–0.13 (
A large number of genes, some having pleiotropic effects on different semen traits and cell types, are involved in the production, maturation, and sperm viability and can affect semen quality and fertility (
). Here, we identify novel genomic regions and candidate genes putatively involved in total and progressive motility (PM) in Italian Holstein bulls using a GWAS approach.
MATERIALS AND METHODS
Phenotypic and Genotypic Data
We used 5,960 records of total and progressive SM from 949 Italian Holstein bulls with ages between 1 and 11 yr from 10 AI centers (Figure 1a), collected between 1995 and 2014 (Figure 1b). The number of measurements per sire varied between 3 and 78 for both total motility and PM traits. The procedures for semen collection are standardized across AI centers. Frozen-thawed sperm are routinely analyzed in the collecting centers using a CASA Integrated Visual Optical System (IVOS; CASA System, Hamilton Thorne Inc.) and a standardized protocol to assess SM. Total motility (TM) is the percentage of sperm moving irrespective of direction, and PM is the percentage of sperm that exhibit linear movement. Sperm phenotypic data were retrieved from the database of the Istituto Lazzari Spallanzani, in charge of official sperm quality controls of national bulls in Italy. The DNA extracted from semen was already available at the Biobank of Università Cattolica (see
), and duplicated markers from both data sets. For the BovineSNP50 data set, only SNPs in common with HD data set were used. Finally, the filtered reference data set, comprising 569,630 SNPs, was phased and missing genotypes imputed through Beagle 5.2 (
). Low quality imputed variants, with DR2 (dosage R2, which is the estimated squared correlation between the estimated allele dose and the true allele dose), lower than 0.3 were excluded (
), giving a final imputed data set of 417,256 SNPs. As 91 animals were genotyped both with 50k and HD panels, these were removed from the imputed data set and their HD data were considered for the downstream analysis. Therefore, the GWAS analysis was performed using 949 animals with phenotype and genotype information (404 animals genotyped with BovineHD and 545 animals imputed from BovineSNP50). A further quality control was carried out to exclude SNPs with minor allele frequency <0.05. After quality control, 412,737 SNPs and 949 animals were used for further analyses.
Genome-Wide Association Study
The genomic BLUP using individuals with repeated measurements were analyzed with a repeatability single trait animal model:
y = Xβ + Zu + Wpe + e,
where y is a vector of observed phenotypes (TM or PM) for all sires, β is the vector of fixed effects (AI center, contemporary group: year-season, quadratic, and linear effects of age), u is the random vector of additive genetic effects, pe is the random vector of permanent environmental effects, and e is the random vector of residual effects. The matrices X, Z, and W are the incidence matrices of fixed effects, additive genetic effects, and permanent environmental effects, respectively. Random effects were assumed to follow a multivariate normal distribution, as follows:
where
and
are the additive genetic variance, the permanent environment variance, and the residual variance, respectively; I is an identity matrix. G is a genomic relationship matrix derived from marker information (
). The heritability was estimated as the ratio of additive genetic variance to the total phenotypic variance
+
+
and the repeatability was calculated as sum of individual variance
+
divided by phenotypic variance.
The percentage of additive genetic variance explained by the ith SNP window was then calculated as
where ui is the genetic value of the ith genomic region under consideration,
is the total additive genetic variance, x is the total number of adjacent SNPs within the 1 Mb region, Zj is the vector of gene content of the jth SNP for all windows, and uj is the effect of the jth SNP within ith window. GWAS results are reported as the percentage of genetic additive variance explained by windows of 1 Mb and plotted using CMplot package in R v.1.3.1073 (R core team 2020). Additionally, we calculated the P-value of individual SNP according to
where m is the number of times to be tested, Pmax is the genome-wide significance level empirical P-value of FDR adjusted, and n is the number of significant SNPs at the assigned FDR level (i.e., 0.05).
Gene Annotation and Functional Enrichment
Our results indicated that single SNPs with the smallest P-values did not explain a large percentage of genetic variance. Therefore, we used the percentage of variance explained by SNPs within consecutive 1 Mb windows to identify genomic regions of interest for further investigation. Genes within the 10 windows explaining the largest amount of variance were annotated using the GALLO v.0.99 R package (
) and the ENSEMBL annotation of the Bos taurus ARS-UCD 1.2 genome assembly (http://bovinegenome.org). The genes identified from ENSEMBL were collected to perform a functional annotation by DAVID v6.8 (https://david.ncifcrf.gov/home.jsp) and REVIGO (
) using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (https://www.genome.jp/kegg) databases for B. taurus. The annotation and enrichment analyses of QTL in adjacent windows (1 Mb) were assessed for the effects of neighboring SNPs, which can be captured due to linkage disequilibrium. The QTL enrichment was evaluated for all the annotated QTL and the P-values were calculated and corrected for multiple testing, using a value of FDR <0.05 to identify the enriched QTL. The annotation and enrichment analyses were performed using the GALLO v.0.99. package and the Animal QTLdb for the ARS-UCD 1.2 bovine genome (http://bovinegenome.org).
RESULTS
Descriptive Statistics and Genetic Parameters
The bulls had a mean of 55.33% (12.70 SD) for TM and 43.21% (11.04 SD) for PM. The fixed effect of the contemporary group and the quadratic and linear effect of age (covariable) were highly significant in the model for both traits (P < 0.001). The effect of the AI center was significant (P < 0.01) and the estimates of heritability were low for both TM (0.13 ± 0.0062) and PM (0.10 ± 0.0045). The genetic correlation between TM and PM was high (0.62 ± 0.010) and the repeatability was similar for the 2 traits: 0.31 and 0.33 for TM and PM, respectively (Table 1).
Table 1Additive genetic
permanent environmental
and residual
variance and genetic parameters heritability
genetic correlation
and repeatability (± SE) for total motility (TM) and progressive motility (PM) in Italian Holsteins bulls
As previously noted, our results indicated that the SNPs with the smallest P-values did not explain a large percentage of genetic variance (Figure 2–3), contrary to expectation. P-values are influenced by sample size (
Figure 2Manhattan plot displaying the association of all SNP for progressive motility (PM; P-values of individual SNP effects) and quantile-quantile plots (right), which show the observed distribution of P-values against the expected P-values under the null hypothesis of no association. The horizontal axis shows −log10 transformed expected P-values and the vertical axis indicates −log10 transformed observed P-values.
Figure 3Manhattan plot displaying the association of all SNP for progressive motility (PM; P-values of individual SNP effects) and quantile-quantile plots (right), which show the observed distribution of P-values against the expected P-values under the null hypothesis of no association. The horizontal axis shows −log10 transformed expected P-values and the vertical axis indicates −log10 transformed observed P-values.
The 10 windows that explained the greatest percentage of genetic variance were located on BTA 1, 2, 4, 6, 7, 23, and 26 for TM and BTA 1, 2, 4, 6, 8, 16, 23, and 26 for PM (Figure 4). These regions explained 6.91% and 6.07% of the additive genetic variance for TM and PM, respectively (Table 2). A total of 150 genes for TM and 107 genes for PM were annotated by GALLO within these significant windows (Table 2). Among them, 130 genes for TM and 90 genes for PM were found in the Bos taurus DAVID database.
Figure 4Circular Manhattan plot for the proportion of genetic variance explained by nonoverlapping windows of 1 Mb for (a) total motility (TM) and (b) progressive motility (PM) in Italian Holstein Bulls. The scale bar represents the percentage of variance explained by each window. Chr = chromosome.
Table 2Genomic regions and genes in the 10 windows explaining the highest percentage of genetic variance for total motility (TM) and progressive motility (PM) in Italian Holstein Bulls
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM. *Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
* Regions not identified for the trait; agenes associated only with TM, bgenes associated only PM.*Regions not identified for the trait; agenes associated only with TM, cgenes associated with TM and PM.
The DAVID annotations and their relative P-values were grouped using REVIGO into biological processes (BP), cellular components (CC), and molecular functions (MF), each comprising 13, 23, and 45 genes for TM and 24, 18, and 61 for PM (Table 3). A total of 42 genes are in common between TM and PM.
Table 3Gene Ontology terms significantly associated with biological processes (BP), cellular components (CC), and molecular functions (MF) related to total motility (TM) and progressive motility (PM)
The BoLA and JSP.1 genes on BTA 23 were associated with both PM and TM, whereas the OR (olfactory receptor) family genes were only related to PM. The gutathione S-transferase alpha genes, involved with the metabolism of glutathione, which is an antioxidant, were only associated with TM.
Some QTL within the genomic windows explaining the greatest percentage of genetic variance were associated with reproductive traits: these included BTA 1 and 6 for TM, and BTA 1 and 23 for PM. The enrichment analysis of the QTL (FDR-corrected P < 0.05) for TM identified 28 QTL related to the ease of calving and 6 QTL associated with the immune response on BTA 6 and BTA 23. In addition, QTL involved with feed intake (7), carcass weight (24), and fat thickness (10) were associated with TM on BTA 6. For PM, 2 QTL related to reproductive efficiency and with milk yield were identified on BTA 23 (Table 4).
Table 4Quantitative trait loci annotated and significant in the enrichment analysis for total (TM) and progressive (PM) motility
The heritability for TM and PM estimated in this study was low (TM: 0.14 ± 0.0062 and PM: 0.10 ± 0.0045) but comparable to estimates obtained by others using the pedigree relationship matrix A (0.13,
). The differences in heritability estimates may be the result of environmental and management differences that we found to have a significant effect on these traits. In our study estimated repeatability was moderate (0.31 and 0.33 for TM and PM, respectively) which suggests an effect of the permanent environment and an even higher effect of the residual variance.
Candidate Genes and QTL Annotation
In the present study, no loci with significant P-values were found using a single SNP analysis. The SNPs with the smallest P-values would be expected to explain a relatively higher proportion of genetic variation. However, this is not always the case, as P-values are affected by sample size, the magnitude of the effect detected (
, in a single-step GWAS in swine, found no overlap between the first 20 SNPs with lowest P-values and those explaining the largest variance. These authors attribute this discrepancy to linkage disequilibrium between SNPs and the causal variants and interaction between SNPs. Therefore, we used the variance explained by sliding window intervals to identify genomic regions involved in sperm motility.
The ten 1-Mb windows with highest additive genetic variance for TM and PM explained, respectively, 6.91% and 6.07% of the additive genetic variance, indicating the highly polygenic nature of these traits. Other studies have also found that several regions across the genome each make small contributions to the total genetic variation for SM. For example,
found that the most significant regions, located on BTA 1, 3, 4, 5, 11, 17, and 19, explained less than 0.1% of the genetic variance for SM in Holstein bulls. In beef cattle,
identified 10 windows on BTA 9, 13, 20, and 24 that explained 7.17% of the genetic variance of this trait. There is little consistency in the regions identified across studies, even for signals on the same chromosomes;
identified genes associated with SM on BTA 5, 11, 22, and 25 in Jersey cattle, with no regions shared with other studies.
The window with highest additive genetic variance we identified was on BTA1 (102,482,169–103,484,977), which explained 1.29% of the genetic variance for TM and 1.62% for PM. This window included the gene E1BHJ0, which codes for profilin. Profilins are ubiquitous proteins present in mammals that regulate actin polymerization and are essential for cell viability (
). The second most important region (0.98% variance) for PM was on BTA26 (46,055,585–47,048,342) and included cytokinesis dedicator 1 (DOCK1) which encodes a member of the migration and invasion dedicator of the cytokinesis protein superfamily which is involved in cell motility (
). With regard to TM, the second most important region (0.83% variance) was on BTA 23 (24,376,529–25,372,486) and included IL17A and IL17F which have an immunoregulatory role in the male reproductive tract (
). Many of the genes in the other significant windows that fell into BP, CC, and MF categories (Table 3) are associated with spermatogenesis and male fertility [e.g., the peptidylarginine deiminase family (PAD), on BTA 2]. The PAD are a class of Ca+2 dependent enzymes that are widely expressed in mammalian tissues including ovules, ovaries, testes, and in early embryos (
). The PAD enzymes perform the citrullination of arginine residues, which affects the stability and degradation of proteins, and play a crucial role during cell development, differentiation, and fertility (
). The BoLA class 1 genes and JSP.1 are associated with both PM and TM. These genes map on BTA 23 and are part of the major histocompatibility complex involved in antigen presentation (
). The major histocompatibility complex antigens play a key role in immune response and affect production, diseases, and fertility traits, in particular the sperm-ovum interaction (
Five genes in the most significant GO biological processes term (GO:BP), were associated exclusively with TM. Glutathione S-transferase alpha (GSTA), which is within one of the regions with the highest effect on BTA 23, belongs to the GST superfamily of multigenic and multifunctional isoenzymes that catalyze glutathione-dependent reactions (GO:BP Glutathione metabolic process and GO:MF Glutathione transferase activity). Expression of the GST has been reported in male reproductive tissues in cattle (
). GSTA also plays a role in spermatogenesis, sperm capacitation and sperm-oocyte binding, and is involved in the nuclear decondensation of the sperm chromatin (
Genes on BTA 23 associated with TM included tripartite motifs (TRIM31 and TRIM10) and interleukins (IL17A, IL17F) that code for proteins involved in immune processes in mammals (GO:BP: antigen processing and presentation and immune system process). TRIM31 is a member of the E3 ubiquitin ligases protein family that is involved with innate immunity (
) and to play an important role in successful embryo implantation. In addition, IL-17 regulates the secretion of cytokines (IL-1β and TNF-α), chemokines, and promotes the recruitment of neutrophils (
) which are related to the modulation of the immune response to sperm antigens and possibly the protection of sperm from the response of the female immune system (
The GO:CC terms contained the α-tubulin (ATAT1), tubulin beta class I (TUBB), and flotillin 1 (FLOT1) genes, which code for proteins that have been proposed as biomarkers of male fertility (
). Alpha-tubulin is a cytoskeletal protein that contributes to microtubule organization, sperm morphology, and flagellar function, and is present in the cytoplasm of Sertoli cells, Leydig cells, spermatogonia, primary spermatocytes, secondary spermatocytes, and spermatids of seminiferous tubules (
). In buffalo sperm, α-tubulin has been shown to provide structural support, facilitate cell motility, and to participate in oocyte binding and activation (
), respectively. MDC1, EPHAH2, ATAT1, and FLOT1 code for proteins that are associated with focal adhesion, a process involved in fertilization, ovum transport, and sperm-ovum interaction (
), PPP2R5A codes for protein phosphatase 2 regulatory subunit B alpha and is correlated with spermatogonial mitosis and it is related to sphingolipid signaling pathway which is associated with SM (
). Genes within the region of BTA 23, which showed an association with PM in the current study, belong to the OR family and KHDRBS2. The OR are involved in chemotaxis and determine speed, strength, and direction of movement of the spermatozoa toward the ovum (
). In humans, the OR influence spermatogenesis, epididymal growth, acrosome reaction, initiate flagella movement and facilitate cell-to-cell communication (
The QTL enrichment analysis for TM revealed QTL associated with feed intake, calving ease, carcass weight, and fat thickness on BTA 6, and antibody-mediated immune response and cell-mediated immune response on BTA 23. Previous studies have identified other QTL on BTA 23 related to genes such as IL4 (interleukin-4), antibody-mediated immune response, and DMI that may be associated with SM (
In this study, we identified a large number of genes associated with SM and involved with sperm and sperm-related traits in Italian Holsteins bulls. These findings suggest that many genes controlling SM could have pleiotropic effects and be functionally correlated with other traits related to fertility. Our results show the highly polygenic nature of SM, with each of numerous candidate genes responsible for a small proportion of the genetic variance. SM influences male fertility, a trait that has probably experienced strong selective pressure even before cattle domestication, about 10,000 years ago. As such, beneficial alleles of genes with major effects on this trait are likely to have reached fixation, leaving only variants with small effect in the domesticated populations, which were identified by GWAS as presented here. The characterization of selection signatures across different timescales, as well as the analysis of ancient genomes from domestic and wild animals, may help in identifying variants related to fertility that have experienced selection before and after domestication events.
ACKNOWLEDGMENTS
J.L.W. was supported by the project LEO Livestock Environment Open-data, 16.2 – PSRN 2014-2020 funded through Fondo Europeo Agricolo per lo Sviluppo Rurale (FEASR). A.A. and P.A.M. received support from the Italian Ministry of Education, University and Research (MUR; Rome, Italy) for Progetti PRIN2017 20174BTC4R. The authors have not stated any conflicts of interest.
REFERENCES
Aguilar I.
Legarra A.
Cardoso F.
Masuda Y.
Lourenco D.
Misztal I.
Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle.