Identification of genomic regions associated with total and progressive sperm motility in Italian Holstein bulls

Sperm motility is directly related to the ability of sperm to move through the female reproductive tract to reach the ovum. Sperm motility is a complex trait that is influenced by environmental and genetic factors and is associated with male fertility, oocyte penetration rate, and reproductive success of cattle. In this study we carried out a GWAS in Italian Holstein bulls to identify candidate regions and genes associated with variations in progressive and total motility (PM and TM, respectively). After quality control, the final data set consisted of 5,960 records from 949 bulls having semen collected in 10 artificial insemination stations and genotyped at 412,737 SNPs (call rate >95%; minor allele frequency >5%). (Co)variance components were estimated using single trait mixed models, and associations between SNPs and phenotypes were assessed using a genomic BLUP approach. Ten windows that explained the greatest percentage of genetic variance were located on Bos taurus autosomes 1, 2, 4, 6, 7, 23, and 26 for TM and Bos taurus autosomes 1, 2, 4, 6, 8, 16, 23, and 26 for PM. A total of 150 genes for TM and 72 genes for PM were identified within these genomic regions. Gene Ontology enrichment analyses identified significant Gene Ontology terms involved with energy homeostasis, membrane functions, sperm-egg interactions, protection against oxidative stress, olfactory receptors, and immune system. There was significant enrichment of quantitative trait loci for fertility, calving ease, immune response, feed intake, and carcass weight within the candidate windows. These results contribute to understanding the architecture of the genetic control of sperm motility and may aid in the development of strategies to identify subfertile bulls and improve reproductive success.


INTRODUCTION
In 2020, the world market for AI was valued ~$3.96 billion with an expected annual growth of ~5.94% by 2028. Among other factors, the quantity, quality, and sanitary status of the semen will determine the reproductive success of a bull. Although fertility of semen is highly influenced by age, season, nutrition, management, and the interval between ejaculations (Fuerst-Waltl et al., 2006;Suchocki and Szyda, 2015), it is also known that genetics and biological processes involved in sperm production and sperm-ovum interactions play a significant role (Druet et al., 2009;Yin et al., 2019b;Sweett et al., 2020).
Sperm motility (SM) affects the ability of the sperm to reach and penetrate the zona pellucida, fertilize the ovum, and initiate embryogenesis (Alves et al., 2020). Sperm motility is routinely measured by insemination centers using computer-assisted sperm analysis. However, it has been shown that SM is a complex phenotype which is affected by the environmental (Druet et al., 2009) and both additive and nonadditive genetic effects (VanRaden et al., 2011;Peñagaricano et al., 2012). Consequently, a better understanding of the genomic architecture of semen-related complex traits is expected to help improve fertility through the assisted selective breeding of bulls. The heritability estimates for SM vary widely among studies on dairy and beef cattle, from very low values of 0.01-0.13 (Gredler et al., 2007;Corbet et al., 2013;Suchocki and Szyda, 2015) to moderate and high values of 0.43-0.60 (Hering et al., 2014).
A large number of genes, some having pleiotropic effects on different semen traits and cell types, are involved in the production, maturation, and sperm viability and can affect semen quality and fertility (Ballow et al., 2006;Zhang et al., 2016). Here, we identify novel genomic regions and candidate genes putatively involved in total and progressive motility (PM) in Italian Holstein bulls using a GWAS approach.

Phenotypic and Genotypic Data
We used 5,960 records of total and progressive SM from 949 Italian Holstein bulls with ages between 1 and 11 yr from 10 AI centers (Figure 1a), collected between 1995 and 2014 ( Figure 1b). The number of measurements per sire varied between 3 and 78 for both total motility and PM traits. The procedures for semen collection are standardized across AI centers. Frozenthawed sperm are routinely analyzed in the collecting centers using a CASA Integrated Visual Optical System (IVOS; CASA System, Hamilton Thorne Inc.) and a standardized protocol to assess SM. Total motility (TM) is the percentage of sperm moving irrespective of direction, and PM is the percentage of sperm that exhibit linear movement. Sperm phenotypic data were retrieved from the database of the Istituto Lazzari Spallanzani, in charge of official sperm quality controls of national bulls in Italy. The DNA extracted from se-men was already available at the Biobank of Università Cattolica (see Mancini et al., 2014).
Illumina BovineSNP50 genotypes for 636 animals were imputed to high-density (777,962) with Beagle 4.1 (Browning and Browning, 2016) using a reference set of 1,006 animals, genotyped with the Illumina BovineHD array. Both data sets were previously updated to the ARS-UCD1.2 (Rosen et al., 2020) bovine assembly version using Plink 1.9 (Chang et al., 2015). Quality control excluded indels, SNPs with minor allele frequency lower than 0.05 (Wiggans et al., 2009), and duplicated markers from both data sets. For the BovineSNP50 data set, only SNPs in common with HD data set were used. Finally, the filtered reference data set, comprising 569,630 SNPs, was phased and missing genotypes imputed through Beagle 5.2 (Browning et al., 2021). Low quality imputed variants, with DR2 (dosage R 2 , which is the estimated squared correlation between the estimated allele dose and the true allele dose), lower than 0.3 were excluded (Wu et al., 2019;Chen et al., 2021), giving a final imputed data set of 417,256 SNPs. As 91 animals were genotyped both with 50k and HD panels, these were removed from the imputed data set and their HD data were considered for the downstream analysis. Therefore, the GWAS analysis was performed using 949 animals with phenotype and genotype information (404 animals genotyped with BovineHD and 545 animals imputed from BovineSNP50). A further quality control was carried out to exclude SNPs with minor allele frequency <0.05. After quality control, 412,737 SNPs and 949 animals were used for further analyses.

Genome-Wide Association Study
The genomic BLUP using individuals with repeated measurements were analyzed with a repeatability single trait animal model: where y is a vector of observed phenotypes (TM or PM) for all sires, β is the vector of fixed effects (AI center, contemporary group: year-season, quadratic, and linear effects of age), u is the random vector of additive genetic effects, pe is the random vector of permanent environmental effects, and e is the random vector of residual effects. The matrices X, Z, and W are the incidence matrices of fixed effects, additive genetic effects, and permanent environmental effects, respectively. Random effects were assumed to follow a multivariate normal distribution, as follows: where σ u 2 , σ pe 2 , and σ e 2 are the additive genetic variance, the permanent environment variance, and the residual variance, respectively; I is an identity matrix. G is a genomic relationship matrix derived from marker information (VanRaden, 2008). The heritability was estimated as the ratio of additive genetic variance to the total phenotypic variance (σ u 2 + σ pe 2 + σ e 2 ) and the repeatability was calculated as sum of individual variance (σ u 2 + σ pe 2 ) divided by phenotypic variance.
The percentage of additive genetic variance explained by the ith SNP window was then calculated as where u i is the genetic value of the ith genomic region under consideration, σ u 2 is the total additive genetic variance, x is the total number of adjacent SNPs within the 1 Mb region, Z j is the vector of gene content of the jth SNP for all windows, and u j is the effect of the jth SNP within ith window. GWAS results are reported as the percentage of genetic additive variance explained by windows of 1 Mb and plotted using CMplot package in R v.1.3.1073 (R core team 2020). Additionally, we calculated the P-value of individual SNP according to Aguilar et al. (2019). All procedures were performed using BLUPF90 family programs (Misztal et al., 2014). To control false positives, we applied the false discovery rate (FDR) method for multiple testing according to Benjamini and Hochberg (1995) as follows: where m is the number of times to be tested, Pmax is the genome-wide significance level empirical P-value of FDR adjusted, and n is the number of significant SNPs at the assigned FDR level (i.e., 0.05).

Gene Annotation and Functional Enrichment
Our results indicated that single SNPs with the smallest P-values did not explain a large percentage of genetic variance. Therefore, we used the percentage of variance explained by SNPs within consecutive 1 Mb windows to identify genomic regions of interest for further investigation. Genes within the 10 windows explaining the largest amount of variance were annotated using the GALLO v.0.99 R package  and the ENSEMBL annotation of the Bos taurus ARS-UCD 1.2 genome assembly (http: / / bovinegenome .org). The genes identified from ENSEMBL were collected to perform a functional annotation by DAVID v6.8 (https: / / david .ncifcrf .gov/ home .jsp) and REVIGO (Supek et al., 2011) using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (https: / / www .genome .jp/ kegg) databases for B. taurus. The annotation and enrichment analyses of QTL in adjacent windows (1 Mb) were assessed for the effects of neighboring SNPs, which can be captured due to linkage disequilibrium. The QTL enrichment was evaluated for all the annotated QTL and the P-values were calculated and corrected for multiple testing, using a value of FDR <0.05 to identify the enriched QTL. The annotation and enrichment analyses were performed using the GALLO v.0.99. package and the Animal QTLdb for the ARS-UCD 1.2 bovine genome (http: / / bovinegenome .org).

Descriptive Statistics and Genetic Parameters
The bulls had a mean of 55.33% (12.70 SD) for TM and 43.21% (11.04 SD) for PM. The fixed effect of the contemporary group and the quadratic and linear effect of age (covariable) were highly significant in the model for both traits (P < 0.001). The effect of the AI center was significant (P < 0.01) and the estimates of herita-bility were low for both TM (0.13 ± 0.0062) and PM (0.10 ± 0.0045). The genetic correlation between TM and PM was high (0.62 ± 0.010) and the repeatability was similar for the 2 traits: 0.31 and 0.33 for TM and PM, respectively (Table 1).

QTL Regions and Candidate Genes
As previously noted, our results indicated that the SNPs with the smallest P-values did not explain a large percentage of genetic variance ( Figure 2-3), contrary to expectation. P-values are influenced by sample size (Greenland et al., 2016) and, in particular, by allele frequencies (Aguilar et al., 2019).
The 10 windows that explained the greatest percentage of genetic variance were located on BTA 1, 2, 4, 6, 7, 23, and 26 for TM and BTA 1, 2, 4, 6, 8, 16, 23, and 26 for PM ( Figure 4). These regions explained 6.91% and 6.07% of the additive genetic variance for TM and PM, respectively (Table 2). A total of 150 genes for TM and 107 genes for PM were annotated by GALLO within these significant windows (Table 2). Among them, 130 genes for TM and 90 genes for PM were found in the Bos taurus DAVID database.
The DAVID annotations and their relative P-values were grouped using REVIGO into biological processes (BP), cellular components (CC), and molecular functions (MF), each comprising 13, 23, and 45 genes for TM and 24, 18, and 61 for PM (Table 3). A total of 42 genes are in common between TM and PM.
The BoLA and JSP.1 genes on BTA 23 were associated with both PM and TM, whereas the OR (olfactory receptor) family genes were only related to PM. The gutathione S-transferase alpha genes, involved with the metabolism of glutathione, which is an antioxidant, were only associated with TM.
Some QTL within the genomic windows explaining the greatest percentage of genetic variance were associated with reproductive traits: these included BTA 1 and 6 for TM, and BTA 1 and 23 for PM. The enrichment analysis of the QTL (FDR-corrected P < 0.05) for TM identified 28 QTL related to the ease of calving and 6 QTL associated with the immune response on BTA 6 and BTA 23. In addition, QTL involved with feed intake (7), carcass weight (24), and fat thickness (10) were associated with TM on BTA 6. For PM, 2 QTL related to reproductive efficiency and with milk yield were identified on BTA 23 (Table 4).

Genetic Parameters
The heritability for TM and PM estimated in this study was low (TM: 0.14 ± 0.0062 and PM: 0.10 ± 0.0045) but comparable to estimates obtained by others using the pedigree relationship matrix A (0.13, Druet et al., 2009;0.12, Yin et al., 2019a) for PM. High heritability estimates, of up to 0.43 have been reported for motility in Holstein bulls (Al-Kanaan et al., 2015;Butler et al., 2020); however, estimates for PM in this breed are low (0.13-0.15, Butler et al., 2020). The differences in heritability estimates may be the result of environmental and management differences that we found to have a significant effect on these traits. In our study estimated repeatability was moderate (0.31 and 0.33 for TM and PM, respectively) which suggests an effect of the permanent environment and an even higher effect of the residual variance.

Candidate Genes and QTL Annotation
In the present study, no loci with significant P-values were found using a single SNP analysis. The SNPs with the smallest P-values would be expected to explain a relatively higher proportion of genetic variation. However, this is not always the case, as P-values are affected by sample size, the magnitude of the effect detected (Greenland et al., 2016), and in particular, by allele frequencies (Aguilar et al., 2019). Liu et al. (2021), in a single-step GWAS in swine, found no overlap between the first 20 SNPs with lowest P-values and those explaining the largest variance. These authors attribute this discrepancy to linkage disequilibrium between SNPs and the causal variants and interaction between SNPs. Therefore, we used the variance explained by sliding window intervals to identify genomic regions involved in sperm motility.
The ten 1-Mb windows with highest additive genetic variance for TM and PM explained, respectively, 6.91% and 6.07% of the additive genetic variance, indicating the highly polygenic nature of these traits. Other studies have also found that several regions across the genome each make small contributions to the total genetic variation for SM. For example, Yin et al. (2019b) found that the most significant regions, located on BTA 1, 3, 4, 5, 11, 17, and 19, explained less than 0.1% of the genetic variance for SM in Holstein bulls. In beef cattle, Sweett et al. (2020) identified 10 windows on BTA 9, 13, 20, and 24 that explained 7.17% of the genetic variance of this trait. There is little consistency in the regions identified across studies, even for signals on the same chromosomes; Rezende et al. (2018) identified genes associated with SM on BTA 5, 11, 22, and 25 in Jersey cattle, with no regions shared with other studies.
The window with highest additive genetic variance we identified was on BTA1 (102,482,484,977), which explained 1.29% of the genetic variance for TM and 1.62% for PM. This window included the gene E1BHJ0, which codes for profilin. Profilins are ubiquitous proteins present in mammals that regulate actin polymerization and are essential for cell viability (Witke et al., 2001), fertilization, and normal embryo development in cattle (Rawe et al., 2006). Four profilin isoforms are expressed in mammalian testis, including spermatogenic cells (Behnen et al., 2009). Profilin-1 and profilin-3 play an important role in mouse sperm metabolism and motility (Vicens et al., 2014) and the loss of profilin-3 has been associated with impairment of spermiogenesis (Umer et al., 2021). The second most important region (0.98% variance) for PM was on BTA26 (46,055,585-47,048,342) and included cytokinesis dedicator 1 (DOCK1) which encodes a member of the migration and invasion dedicator of the cytokinesis protein superfamily which is involved in cell motility (Valente et al., 2016). With regard to TM, the second most important region (0.83% variance) was on BTA 23 (24,376,372,486) and included IL17A and IL17F which have an immunoregulatory role in the male reproductive tract (Bagri et al., 2022). Many of the genes in the other significant windows that fell into BP, CC, and MF categories (Table 3) are associ-ated with spermatogenesis and male fertility [e.g., the peptidylarginine deiminase family (PAD), on BTA 2]. The PAD are a class of Ca +2 dependent enzymes that are widely expressed in mammalian tissues including ovules, ovaries, testes, and in early embryos (Wang and Wang, 2013). The PAD enzymes perform the citrullination of arginine residues, which affects the stability and degradation of proteins, and play a crucial role during cell development, differentiation, and fertility (Fichtner et al., 2020). The enzyme PADI6 is involved in decondensation of sperm chromatin (Wright et al., 2003;Alghamdi et al., 2019). In humans, sperm chromatin organization has been shown to affect sperm function during fertilization and early embryo development (Miller et al., 2010) and has been implicated     . Gene Ontology terms significantly associated with biological processes (BP), cellular components (CC), and molecular functions (MF) related to total motility (TM) and progressive motility (PM) in subfertility (Souza et al., 2018). The BoLA class 1 genes and JSP.1 are associated with both PM and TM. These genes map on BTA 23 and are part of the major histocompatibility complex involved in antigen presentation (Tijjani et al., 2019). The major histocompatibility complex antigens play a key role in immune response and affect production, diseases, and fertility traits, in particular the sperm-ovum interaction (Ellis and Hammond, 2014). BoLA-NC1 is essential for the establishment and maintenance of pregnancy and in the maternal immune response to the fetus (Shu et al., 2012).
Five genes in the most significant GO biological processes term (GO:BP), were associated exclusively with TM. Glutathione S-transferase alpha (GSTA), which is within one of the regions with the highest effect on BTA 23, belongs to the GST superfamily of multigenic and multifunctional isoenzymes that catalyze glutathionedependent reactions (GO:BP Glutathione metabolic process and GO:MF Glutathione transferase activity). Expression of the GST has been reported in male reproductive tissues in cattle (Juyena andStelletta, 2012), pigs (Pérez-Patiño et al., 2019), and humans (Dutta et al., 2021) including sperm, seminal plasma, and in testicular somatic cells, particularly in Leydig and Sertoli cells (Benbrahim-Tallaa et al., 2002). GSTA also plays a role in spermatogenesis, sperm capacitation and sperm-oocyte binding, and is involved in the nuclear decondensation of the sperm chromatin (Kumar et al., 2014;Kwon et al., 2014;Llavanera et al., 2020).
Genes on BTA 23 associated with TM included tripartite motifs (TRIM31 and TRIM10) and interleukins (IL17A, IL17F) that code for proteins involved in immune processes in mammals (GO:BP: antigen processing and presentation and immune system process). TRIM31 is a member of the E3 ubiquitin ligases protein family that is involved with innate immunity (Ozato et al., 2008;Tijjani et al., 2019) and IL17A is a member of the interleukin family that initiate a potent inflammatory response (Zenobia and Hajishengallis, 2015). In humans the level of IL17 in seminal fluid has been shown to inversely correlate with SM and mortality (Qian, 2012) and to play an important role in successful embryo implantation. In addition, IL-17 regulates the secretion of cytokines (IL-1β and TNF-α), chemokines, and promotes the recruitment of neutrophils (Song et al., 2016;Wattegedera et al., 2017) which are related to the modulation of the immune response to sperm antigens and possibly the protection of sperm from the response of the female immune system (Vera et al., 2003).
The GO:CC terms contained the α-tubulin (ATAT1), tubulin beta class I (TUBB), and flotillin 1 (FLOT1) genes, which code for proteins that have been proposed as biomarkers of male fertility (Gilbert et al., 2007;Fu et al., 2019;Patil et al., 2020). Alpha-tubulin is a cytoskeletal protein that contributes to microtubule organization, sperm morphology, and flagellar function, and is present in the cytoplasm of Sertoli cells, Leydig cells, spermatogonia, primary spermatocytes, secondary spermatocytes, and spermatids of seminiferous tubules (Kalebic et al., 2013). Tubulin plays a role in microtubule formation in sperm cells of buffaloes (Fu et al., 2019) and is expressed in spermatogonia, Sertoli cells, spermatids, and spermatozoa of pig testes (Luo et al., 2010). In buffalo sperm, α-tubulin has been shown to provide structural support, facilitate cell motility, and to participate in oocyte binding and activation (Patil et al., 2020). JAKMIP1 and PKDH1 are related to sperm function and cell polarization (Hatzirodos et al., 2014), embryo development and survival (Gigarel et al., 2008), respectively. MDC1, EPHAH2, ATAT1, and FLOT1 code for proteins that are associated with focal adhesion, a process involved in fertilization, ovum transport, and sperm-ovum interaction (Talbot et al., 2003). Additionally, PPP1R10, PPP1R11, PPP2R2C, and FSCN3 genes were identified and code for proteins related to spermatogenesis, SM, and fertility (Han et , 2008Cheng et al., 2009;Gao et al., 2019), PPP2R5A codes for protein phosphatase 2 regulatory subunit B alpha and is correlated with spermatogonial mitosis and it is related to sphingolipid signaling pathway which is associated with SM (Sarakul et al., 2018;Cai et al., 2021). Genes within the region of BTA 23, which showed an association with PM in the current study, belong to the OR family and KHDRBS2. The OR are involved in chemotaxis and determine speed, strength, and direction of movement of the spermatozoa toward the ovum (Ali et al., 2021). Of the 20 to 66 testicular OR known in mammals, at least 8 are expressed in sperm cells, testis, and epididymus (Milardi et al., 2018). In humans, the OR influence spermatogenesis, epididymal growth, acrosome reaction, initiate flagella movement and facilitate cell-to-cell communication (Olaniyan et al., 2021). The OR in the sperm cell activate proteins that increase intracellular calcium ions and hence motility (Vanderhaeghen et al., 1997). KHDRBS2, has been associated with fertility in San Martinero cattle (De León et al., 2019) and pregnancy status in Brahman beef cattle (Reverter et al., 2016) The QTL enrichment analysis for TM revealed QTL associated with feed intake, calving ease, carcass weight, and fat thickness on BTA 6, and antibody-mediated immune response and cell-mediated immune response on BTA 23. Previous studies have identified other QTL on BTA 23 related to genes such as IL4 (interleukin-4), antibody-mediated immune response, and DMI that may be associated with SM (Wang et al. 2012).

CONCLUSIONS
In this study, we identified a large number of genes associated with SM and involved with sperm and sperm-related traits in Italian Holsteins bulls. These findings suggest that many genes controlling SM could have pleiotropic effects and be functionally correlated with other traits related to fertility. Our results show the highly polygenic nature of SM, with each of numerous candidate genes responsible for a small proportion of the genetic variance. SM influences male fertility, a trait that has probably experienced strong selective pressure even before cattle domestication, about 10,000 years ago. As such, beneficial alleles of genes with major effects on this trait are likely to have reached fixation, leaving only variants with small effect in the domesticated populations, which were identified by GWAS as presented here. The characterization of selection signatures across different timescales, as well as the analysis of ancient genomes from domestic and wild animals, may help in identifying variants related to fertility that have experienced selection before and after domestication events.