
Identification of sperm proteins as biomarkers of field fertility in Holstein-Friesian bulls used for artificial insemination

Open Access. Published: October 25, 2022. DOI: https://doi.org/10.3168/jds.2022-22273

      ABSTRACT

      Despite passing stringent quality control, bulls used in artificial insemination can vary significantly in their fertility, emphasizing the need for reliable markers of sperm quality. This study aimed to identify sperm proteins acting as biomarkers of fertility in 2 different populations of dairy bulls classified based on their field fertility. Semen was collected and cryopreserved from: 54 Holstein bulls located in Ireland, classified according to fertility indexes as low fertility (LF, n = 23), medium fertility (n = 14), or high fertility (HF, n = 17); and 18 Holstein bulls located in Denmark, classified as LF (n = 8) or HF (n = 10). The proteome was measured through liquid chromatography-mass spectrometry and data were analyzed with the R software. Differentially abundant proteins between HF and LF bulls and biomarker proteins were determined through a modified t-test and random forest, respectively, selecting 301 differentially abundant proteins and 34 biomarker proteins. The predictive ability of the 34 biomarkers was evaluated employing support vector machine as the classifier, using their abundance levels in the Irish bulls to train the model and in the Danish bulls for validation. The prediction accuracy was 94.4%, with only one HF bull misclassified, corresponding to the lowest fertility index bull in the HF group. The biomarkers more abundant in sperm of HF bulls enriched axoneme assembly and sperm motility (false discovery rate <0.05), according to functional analysis. In conclusion, a robust model coupled with the application of appropriate bioinformatic tools allowed the identification of functionally relevant sperm proteins predictive of the fertility of Holstein bulls used in artificial insemination.

      INTRODUCTION

      In livestock species of economic importance, such as cattle, reproductive efficiency is critical for achieving on-farm profitability (
      • Shalloo L.
      • Cromie A.
      • McHugh N.
      Effect of fertility on the economics of pasture-based dairy systems.
      ). Thus, improving pregnancy per AI is an area of considerable basic and applied research interest. Much of the research effort in this regard has understandably been focused on the female, due in part to the well-described effects of lactation-induced metabolic stress on the quality of the oocyte, embryo, and reproductive tract environment (
      • Walsh S.W.
      • Williams E.J.
      • Evans A.C.
      A review of the causes of poor fertility in high milk producing dairy cows.
      ). Nonetheless, the fertility of the bull is a major contributor to overall reproductive performance and can have a major effect on productivity and economic return. This is true not only in herds where natural service is predominantly used but even more so where AI is employed, given that semen from elite bulls can be used simultaneously in several countries, frequently reaching tens of thousands of inseminations per year.
      Artificial insemination is one of the most extensively used assisted-reproductive technologies, facilitating high selection intensity and the exploitation of a sire's genetic value through widespread dissemination of his semen. Ejaculates of bulls used for AI are subjected to extensive quality control checks in breeding centers to minimize the chances of adverse pregnancy outcomes after semen is released into the field (
      • Thundathil J.
      • Gil J.
      • Januskauskas A.
      • Larsson B.
      • Soderquist L.
      • Mapletoft R.
      • Rodriguez-Martinez H.
      Relationship between the proportion of capacitated spermatozoa present in frozen-thawed bull semen and fertility with artificial insemination.
      ;
      • Harstine B.R.
      • Utt M.D.
      • DeJarnette J.M.
      Review: Integrating a semen quality control program and sire fertility at a large artificial insemination organization.
      ). Semen assessments before cryopreservation, and post-thaw, to evaluate potential fertility include microscopy-based parameters, such as motility, viability and morphology, and other more advanced and objective methods, using computer-assisted sperm analysis and flow cytometry (
      • Harstine B.R.
      • Utt M.D.
      • DeJarnette J.M.
      Review: Integrating a semen quality control program and sire fertility at a large artificial insemination organization.
      ).
      Nevertheless, despite improvements and modernization of semen evaluation methods, semen from bulls used in AI still exhibits significant variation in field fertility, even after passing stringent quality control checks (
      • Fair S.
      • Lonergan P.
      Review: Understanding the causes of variation in reproductive wastage among bulls.
      ). The challenge is exemplified in the study of
      • Sellem E.
      • Broekhuijse M.L.
      • Chevrier L.
      • Camugli S.
      • Schmitt E.
      • Schibler L.
      • Koenen E.P.
      Use of combinations of in vitro quality assessments to predict fertility of bovine semen.
      who correlated bull fertility (153 ejaculates from 19 Holstein bulls) with a combination of sperm functional parameters. At best, these authors could account for no more than 40% of the variation in sire fertility. Taking a similar approach of using flow cytometric and computer-assisted sperm analysis-based parameters, our group recently reported that 47% of the variation in AI bull field fertility could be explained (
      • Bernecic N.C.
      • Donnellan E.
      • O'Callaghan E.
      • Kupisiewicz K.
      • O'Meara C.
      • Weldon K.
      • Lonergan P.
      • Kenny D.A.
      • Fair S.
      Comprehensive functional analysis reveals that acrosome integrity and viability are key variables distinguishing artificial insemination bulls of varying fertility.
      ). High-throughput (omics) technologies have emerged as promising tools to determine seminal plasma or sperm components associated with fertility at the molecular level (
      • Klein E.K.
      • Swegen A.
      • Gunn A.J.
      • Stephen C.P.
      • Aitken R.J.
      • Gibb Z.
      The future of assessing bull fertility: Can the 'omics fields identify usable biomarkers?.
      ). Previous studies have employed bioinformatic approaches for the analysis of proteomic (
      • Viana A.G.A.
      • Martins A.M.A.
      • Pontes A.H.
      • Fontes W.
      • Castro M.S.
      • Ricart C.A.O.
      • Sousa M.V.
      • Kaya A.
      • Topper E.
      • Memili E.
      • Moura A.A.
      Proteomic landscape of seminal plasma associated with dairy bull fertility.
      ;
      • Kasimanickam R.K.
      • Kasimanickam V.R.
      • Arangasamy A.
      • Kastelic J.P.
      Sperm and seminal plasma proteomics of high- versus low-fertility Holstein bulls.
      ), transcriptomic (
      • Bissonnette N.
      • Levesque-Sergerie J.P.
      • Thibault C.
      • Boissonneault G.
      Spermatozoal transcriptome profiling for bull sperm motility: a potential tool to evaluate semen quality.
      ;
      • Feugang J.M.
      • Rodriguez-Osorio N.
      • Kaya A.
      • Wang H.
      • Page G.
      • Ostermeier G.C.
      • Topper E.K.
      • Memili E.
      Transcriptome analysis of bull spermatozoa: Implications for male fertility.
      ), or metabolomic (
      • Menezes E.B.
      • Velho A.L.C.
      • Santos F.
      • Dinh T.
      • Kaya A.
      • Topper E.
      • Moura A.A.
      • Memili E.
      Uncovering sperm metabolome to discover biomarkers for bull fertility.
      ;
      • Saraf K.K.
      • Kumaresan A.
      • Dasgupta M.
      • Karthikkeyan G.
      • Prasad T.S.K.
      • Modi P.K.
      • Ramesha K.
      • Jeyakumar S.
      • Manimaran A.
      Metabolomic fingerprinting of bull spermatozoa for identification of fertility signature metabolites.
      ) data from semen samples to identify proteins, mRNAs, or metabolites, respectively, as markers of bull fertility. However, a robust and repeatable set of such biomarkers is still lacking. The challenge of identifying potential biomarkers could benefit from the application of machine-learning approaches, which aim to find patterns in complex data sets such as the proteome (
      • Li R.
      • Li L.
      • Xu Y.
      • Yang J.
      Erratum to: Machine learning meets omics applications and perspectives.
      ). The relevance of these methods can be demonstrated through validations with external, independent data.
      We hypothesized that differences in fertility of AI sires can be predicted through the proteomic profile of their sperm. This study aimed to identify sperm proteins behaving as biomarkers of fertility in 2 different populations of dairy bulls whose semen had already passed quality control checks and was used in the field. One of the populations was employed to screen the proteome for selection of the potential biomarkers, while the other was used to validate the sperm proteins as biomarkers of fertility.

      MATERIALS AND METHODS

      No human or animal subjects were used, so this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

      Animals

      Frozen-thawed sperm from 3 ejaculates per bull were pooled within bull and analyzed. The Holstein-Friesian bulls were located in AI centers in Ireland (n = 54), where they were classified as high, medium, or low fertility, or in Denmark (n = 18), where they were classified as high or low fertility.

      Irish Bulls

      A panel of Holstein-Friesian bulls (n = 54) was selected from the national population from which cryopreserved semen was used commercially for AI in Ireland. Data on field fertility were obtained from the Irish Cattle Breeding Federation database based on an adjusted sire fertility index (FI;
      • Berry D.P.
      • Evans R.D.
      • Mc Parland S.
      Evaluation of bull fertility in dairy and beef cattle using cow field data.
      ). Sire fertility was defined as pregnancy to a given service identified retrospectively either from a calving event or where a repeat service (or a pregnancy scan) deemed the animal not to be pregnant. These raw data were then adjusted for factors including semen type (frozen, fresh), cow parity, month of service, day of the week when serviced, service number, cow genotype, herd, AI technician, and bull breed. The adjusted sire FI given for each bull was then weighted for the number of service records, resulting in an adjusted pregnancy rate. Holstein-Friesian bulls that had a minimum of 500 inseminations formed the base population (840 bulls), from which low fertility (LF; FI: −0.268 to −0.016, n = 23), medium fertility (MF; FI: 0.023 to 0.029, n = 14), or high fertility (HF; FI: 0.058 to 0.072, n = 17) bulls were selected. Some of these same bulls were used in previous studies by our group (
      • Bernecic N.C.
      • Donnellan E.
      • O'Callaghan E.
      • Kupisiewicz K.
      • O'Meara C.
      • Weldon K.
      • Lonergan P.
      • Kenny D.A.
      • Fair S.
      Comprehensive functional analysis reveals that acrosome integrity and viability are key variables distinguishing artificial insemination bulls of varying fertility.
      ;
      • Donnellan E.M.
      • O'Brien M.B.
      • Meade K.G.
      • Fair S.
      Comparison of the uterine inflammatory response to frozen-thawed sperm from high and low fertility bulls.
      ;
      • O'Callaghan E.
      • Sanchez J.M.
      • McDonald M.
      • Kelly A.K.
      • Hamdi M.
      • Maicas C.
      • Fair S.
      • Kenny D.A.
      • Lonergan P.
      Sire contribution to fertilization failure and early embryo survival in cattle.
      ;
      • Donnellan E.M.
      • Lonergan P.
      • Meade K.G.
      • Fair S.
      An ex-vivo assessment of differential sperm transport in the female reproductive tract between high and low fertility bulls.
      ).

      Danish Bulls

      Danish bulls were selected from a Holstein-Friesian population of 2,220 bulls used for AI in Denmark. Data on field fertility were obtained from the Danish National Cattle Register and were expressed as 56-d nonreturn rates, adjusted for the parity of the inseminated cow, month of service, herd, year, and season. Eighteen bulls with a minimum of 500 matings were selected for the study based on their 56-d nonreturn rates and semen availability. These bulls were classified as LF (FI: −0.07 to −0.049, n = 8) or HF (FI: 0.062 to 0.114, n = 10).
      The mean FI of both populations of bulls was zero. Information about each bull can be found in Supplemental Table S1 (https://figshare.com/articles/dataset/20367708;
      • Rabaglino M.
      • Lonergan P.
      Table S1. Details about the bulls used in Project 16/IA/4474 (Science Foundation Ireland). figshare. Dataset.
      ).

      Determination of the Sperm Proteome

      Protein Extraction

      Three straws of semen from each bull were thawed in a water bath at 37°C for 30 s. The contents were centrifuged at 3,000 × g for 7 min at room temperature to obtain a sperm pellet. After removal of the supernatant, 4 washes were performed to eliminate the extender (1 in water and 3 in PBS, both supplemented with cOmplete Protease Inhibitor cocktail; Roche). The resulting pellet was resuspended in 200 µL of extraction buffer (4% SDS, 0.2% deoxycholic acid, 50 mM Tris, 100 mM ammonium bicarbonate, 10 mM dithiothreitol, pH 8), boiled for 5 min, and incubated for 2 h in an ultrasound bath. Complete lysis was checked by microscopy. After centrifugation to remove cellular debris (12,000 × g for 15 min at 4°C), the supernatant was retained and proteins were precipitated with acetone by mixing 1 vol of sample with 4 vol of cold acetone (−20°C) overnight. Precipitated proteins were centrifuged (12,000 × g for 15 min at 4°C), and the pellet was dried for 5 min under nitrogen flux followed by resuspension in 0.4% SDS. Protein sample content was quantified using the BCA Protein Assay Kit (Thermo Fisher Scientific).

      Nano-Liquid Chromatography-Mass Spectrometry Analysis

      To carry out liquid chromatography-mass spectrometry analysis, 50 µg of sperm protein extracts were concentrated using stacking acrylamide SDS-PAGE gels. The migration was performed for 20 min at a constant 150 V. Then, proteins were stained using colloidal Coomassie blue R-250 and the stained band was excised. In-gel digestion was performed with an automated protein digestion system, a MassPrep Station (Waters). The gel plugs were washed twice with 50 µL of 25 mM ammonium hydrogen carbonate (NH4HCO3) and 50 µL of acetonitrile. The cysteine residues were reduced by addition of 50 µL of 10 mM dithiothreitol at 57°C and alkylated by addition of 50 µL of 55 mM iodoacetamide. After dehydration with acetonitrile, the proteins were cleaved in-gel with a 12.5 ng/µL solution of modified porcine trypsin (Promega) in 25 mM NH4HCO3 (∼30 µL). The digestion was performed overnight at room temperature. Tryptic peptides were extracted twice: the first time with 60% acetonitrile in 5% formic acid for 1 h, and the second time with a 100% acetonitrile solution until the gel pieces were dehydrated. The collected extracts were pooled to a final volume of 60 µL. Excess acetonitrile was evaporated at 37°C before analysis. Peptide mixtures were analyzed by nano-liquid chromatography-mass spectrometry using an UltiMate 3000 system coupled to a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific). For each sample, peptide mixtures were automatically fractionated on a commercial C18 reverse phase column (75 µm × 150 mm, 2-µm particle, PepMap100 RSLC column, Thermo Fisher Scientific) at 35°C. Trapping was performed for 4 min at 5 µL/min, with 98% H2O, 2% acetonitrile, and 0.1% formic acid. Elution was performed using 2 solvents, A (0.1% formic acid in water) and B (0.1% formic acid in acetonitrile) at a flow rate of 300 nL/min. Gradient separation was 2 min from 2 to 5% B, 12 min from 5 to 25% B, 2 min from 25 to 80% B, and 3 min at 80% B. The column was equilibrated for 8 min with 2% buffer B before the next sample analysis. Eluted peptides were electro-sprayed in positive-ion mode at 1.9 kV through a nanoelectrospray ion source heated at 275°C. Full MS scans were acquired in the Orbitrap mass analyzer over the m/z 400–1,200 range with a resolution of 70,000 (m/z 200). The target value was 5.00E+5 and the maximum allowed ion accumulation time was 250 ms. The 15 most intense peaks with charge state between 2 and 5 were fragmented in the HCD collision cell with 27 eV, and tandem mass spectra were acquired with a resolution of 17,500 (m/z 200). The target value was 5.00E+4 and the maximum allowed ion accumulation time was 150 ms. Dynamic exclusion was set to 7 s.

      Data Processing, Protein Identification, and Abundance

      Raw data from the MS/MS were processed and converted into *.mgf peak list format with Proteome Discoverer 1.4 (Thermo Fisher Scientific). Data were interpreted with Mascot v2.4 (Matrix Science) against the Bos taurus database (i.e., 32,284 sequences) fused with the sequences of recombinant trypsin and a list of classical contaminants (118 entries). The following parameters were considered for the search: precursor mass tolerance of 0.2 Da and fragment mass tolerance of 0.2 Da, a maximum of one missed trypsin cleavage site, and carbamidomethylation (C), oxidation (M), propionamidation (C), and protein N-terminal acetylation set as variable modifications. For each sample, peptides were filtered according to the cut-off set for protein hits: 2 or more peptides longer than 9 residues, ion score >15, and false discovery rate <1%. Protein identification was validated when at least 2 peptides originating from one protein showed statistically significant identity.

      Bioinformatic Analysis of the Proteomic Data

      Determination of Differentially Abundant Proteins and Identification of Putative Biomarker Proteins in the Sperm of Bulls Located in Ireland

      All bioinformatic procedures were carried out using the R software (
      • R Core Team
      R: A language and environment for statistical computing.
      ). Data on protein abundance were log10 transformed, filtered to retain proteins present in more than 14 samples (smaller class), and normalized with the ComBat method (
      • Leek J.
      • Johnson W.
      • Parker H.
      • Fertig E.
      • Jaffe A.
      • Zhang Y.
      • Storey J.D.
      • Torres L.
      sva: Surrogate variable analysis. R package version 3.42.0 (2021).
      ) to control for the potential effect of different diluents. Data from the 2,156 proteins retained after filtering (out of 2,742) were analyzed with the limma package (
      • Smyth G.K.
      Limma: Linear models for microarray data.
      ). Briefly, an empirical Bayes method was applied to calculate a moderated t statistic for differential abundance for each protein by performing a linear model fit on the data. Empirical Bayes moderated the standard errors of the estimated log fold changes to produce more stable estimates. A contrast was made between HF and LF bulls to determine the differentially abundant sperm proteins (DAP, P < 0.05) between bulls at the extremes of the FI, which were visualized with a volcano plot. A volcano plot is a type of scatterplot that shows statistical significance (P-value) for the comparison between 2 groups on the y-axis versus the magnitude of change between the 2 groups on the x-axis. Each dot represents a protein; those on the right (in red) are proteins upregulated (i.e., more abundant in the HF bulls), whereas those on the left (in green) are downregulated (i.e., more abundant in the LF bulls). The remainder (in black) are not differentially abundant between HF and LF bulls. A P-value >0.05 and <0.1 was considered a tendency.
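      The preprocessing and volcano-plot labeling described above can be illustrated with a minimal sketch. The study itself used R (sva and limma); the Python below is only an illustration, and the function names, the use of None for an undetected protein, and the handling of a P-value of exactly 0.05 are assumptions rather than details from the paper. The moderated t-test is not reimplemented here; the sketch takes the fold change and P-value as given.

```python
import math

def log10_and_filter(abundance, min_present=14):
    """Log10-transform abundances and keep proteins detected in more than
    `min_present` samples. None marks an undetected protein (assumption)."""
    kept = {}
    for protein, values in abundance.items():
        detected = [v for v in values if v is not None and v > 0]
        if len(detected) > min_present:
            kept[protein] = [
                math.log10(v) if v is not None and v > 0 else None
                for v in values
            ]
    return kept

def volcano_class(fold_change, p_value, alpha=0.05, trend=0.10):
    """Label one protein the way the volcano plot colors it.
    A positive fold change means more abundant in HF bulls."""
    if p_value < alpha:
        return "more abundant in HF" if fold_change > 0 else "more abundant in LF"
    if p_value < trend:  # P exactly 0.05 falls here (assumption)
        return "tendency"
    return "not different"

# Two rows taken from Table 1 (PRDX6 and GET3).
print(volcano_class(6.3552, 0.0003))   # more abundant in HF
print(volcano_class(-6.0164, 0.0031))  # more abundant in LF
```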
      Biomarker proteins were selected through a wrapper algorithm around random forest using the Boruta package (
      • Kursa M.B.
      • Rudnicki W.R.
      Feature selection with the Boruta package.
      ). Briefly, the algorithm adds shadow features to the data to train a random forest classifier, evaluating the importance of each feature (protein) through a Mean Decrease Accuracy. Selected features are those with a higher score than the best of their shadow features. The algorithm was set to run a maximum of 100,000 iterations, to select important features with a P < 0.0001.
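      The shadow-feature idea can be sketched in a few lines. This is not the Boruta implementation: the importance score below is a simple stand-in (absolute difference between class means) instead of the random forest Mean Decrease Accuracy, and the majority-vote rule replaces Boruta's statistical test on the number of hits. All names and the toy data are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def importance(values, labels):
    """Stand-in importance: absolute difference between class means.
    (Boruta uses random forest Mean Decrease Accuracy instead.)"""
    hf = [v for v, y in zip(values, labels) if y == "HF"]
    lf = [v for v, y in zip(values, labels) if y == "LF"]
    return abs(mean(hf) - mean(lf))

def shadow_select(features, labels, n_rounds=50, seed=0):
    """Keep features that beat the best shadow feature in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        best_shadow = 0.0
        for values in features.values():
            shadow = values[:]   # copy the feature ...
            rng.shuffle(shadow)  # ... and permute it across samples
            best_shadow = max(best_shadow, importance(shadow, labels))
        for name, values in features.items():
            if importance(values, labels) > best_shadow:
                hits[name] += 1
    return [name for name, h in hits.items() if h > n_rounds / 2]

# Toy data: one protein separates the classes, one carries no signal.
labels = ["HF"] * 6 + ["LF"] * 6
features = {
    "informative": [5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    "flat": [2.0] * 12,
}
print(shadow_select(features, labels))
```

      Permuting a feature across samples breaks its link with the fertility class while preserving its distribution, so the best shadow score estimates how large an importance can arise by chance.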
      Sample distribution and clustering, according to the abundance of the DAP and biomarker proteins, were evaluated through a principal component analysis (PCA) and a hierarchical clustering dendrogram, respectively, using internal packages of R. Briefly, a PCA is a dimension reduction technique that captures sample variability in several components (corresponding to the number of samples, 40 in this case), with component 1 being the one that explains most of the variability. The PCA plot is a type of scatterplot showing the result of the PCA, where samples are clustered based on their similarity. Hierarchical clustering is an algorithm that groups samples according to their similarities (in protein abundance levels for this study). A Spearman Rank Correlation was employed as the similarity metric and complete linkage as the clustering method. Plotting and coloring of the dendrogram were done with the weighted gene co-expression network analysis (WGCNA) package (
      • Langfelder P.
      • Horvath S.
      WGCNA: An R package for weighted correlation network analysis.
      ).
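      As a rough illustration of the clustering settings named above (Spearman rank correlation and complete linkage), the sketch below implements both in plain Python. The authors used R for this step; converting the correlation to a distance as 1 minus the correlation is an assumption of this sketch, and the sample names and values are invented.

```python
def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman_distance(a, b):
    """Spearman correlation is Pearson correlation on ranks;
    1 - correlation is used as the distance (assumption)."""
    return 1.0 - pearson(ranks(a), ranks(b))

def complete_linkage(samples, n_clusters=2):
    """Agglomerative clustering in which the distance between two
    clusters is the largest distance between any pair of members."""
    clusters = [[name] for name in samples]

    def dist(c1, c2):
        return max(spearman_distance(samples[x], samples[y])
                   for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented abundance profiles over 4 proteins.
samples = {
    "HF_a": [1.0, 2.0, 3.0, 4.0],
    "HF_b": [1.1, 2.5, 2.9, 4.2],
    "LF_a": [4.0, 3.0, 2.0, 1.0],
    "LF_b": [3.9, 3.5, 1.8, 1.2],
}
print(complete_linkage(samples))  # [['HF_a', 'HF_b'], ['LF_a', 'LF_b']]
```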
      The lists of DAP and biomarkers more or less abundant in the HF bulls compared with the LF bulls were interrogated for enriched functional terms (false discovery rate <0.05) with ShinyGO v0.75 (
      • Ge S.X.
      • Jung D.
      • Yao R.
      ShinyGO: A graphical gene-set enrichment tool for animals and plants.
      ). Enriched terms were visualized through lollipop plots using the ggplot2 package (
      • Wickham H.
      Ggplot2: Elegant Graphics for Data Analysis.
      ) and networks downloaded from the ShinyGO software.

      Validation of the Biomarker Proteins With an External Population

      Data from the sperm of bulls located in Denmark were used to validate the selected biomarkers. The training data consisted of the abundance data of the biomarker proteins in sperm from the bulls located in Ireland. In contrast, the abundance of those proteins in the sperm of bulls located in Denmark was employed to test the model. The testing data were normalized with the training data through an add-on batch effect adjustment with the bapred package (
      • Hornung R.
      • Causeur D.
      • Bernau C.
      • Boulesteix A.
      Improving cross-study prediction through addon batch effect adjustment or addon normalization.
      ). The classifier was a support vector machine with linear kernels, employing the leave-one-out cross-validation (LOOCV) method as the internal control, and applied with the kernlab package (
      • Karatzoglou A.
      • Smola A.
      • Hornik K.
      • Zeileis A.
      kernlab - An S4 package for kernel methods in R.
      ), through the caret package (
      • Kuhn M.
      Building predictive models in R using the caret package.
      ).
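      The validation design (internal leave-one-out cross-validation on the training population, then prediction in an independent population) can be sketched as follows. The study used a linear support vector machine through caret and kernlab in R; to stay self-contained, this Python sketch swaps in a nearest-centroid classifier as a stand-in, and it omits the add-on batch adjustment. Function names and the toy data are hypothetical.

```python
def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def fit(X, y):
    """Stand-in classifier: one centroid per class
    (the study used a linear support vector machine)."""
    return {label: centroid([x for x, lab in zip(X, y) if lab == label])
            for label in sorted(set(y))}

def predict(model, x):
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sqdist(model[label]))

def loocv_accuracy(X, y):
    """Leave-one-out: hold out each training sample once, refit on
    the rest, and check the prediction for the held-out sample."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

def external_accuracy(X_train, y_train, X_test, y_test):
    """Train on one population, test on an independent one."""
    model = fit(X_train, y_train)
    hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
    return hits / len(X_test)

# Toy data: 2 "proteins" per sample.
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y_train = ["HF", "HF", "LF", "LF"]
X_test = [[0.8, 0.2], [0.2, 0.8]]
y_test = ["HF", "LF"]
print(loocv_accuracy(X_train, y_train))
print(external_accuracy(X_train, y_train, X_test, y_test))
```

      The internal estimate guards against overfitting within the training population, whereas the external estimate is the one that speaks to generalization across populations.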

      RESULTS

      There were 301 DAP in the sperm from HF and LF bulls located in Ireland, of which 106 and 195 proteins were more abundant in the HF or LF bulls, respectively (Figure 1A and Supplemental Table S2; https://figshare.com/articles/dataset/20367708;
      • Rabaglino M.
      • Lonergan P.
      Table S2. List of differentially abundant proteins in the sperm of high fertility bulls compared to low fertility bulls. figshare. Dataset.
      ), which separated samples from each group in the PCA plot (Figure 1B). This plot shows that samples belonging to the HF and LF bulls cluster apart in the first component, which captures the main source of variability among the samples, explaining 18.9% of overall variability when considering the 40 components. Samples from LF bulls are more widely spread in both the first and second components than samples from HF bulls, reflecting the larger standard deviation (SD) in the FI of LF bulls than of HF bulls (mean FI ± SD for: HF bulls: 0.065 ± 0.004; LF bulls: −0.064 ± 0.059). Those DAP that were more abundant in HF bulls enriched 15 biological terms mainly related to the axoneme and cilium assembly, and ribonucleotide binding (Figure 2A). In contrast, the 195 DAP that were more abundant in the sperm of LF bulls enriched 221 terms involved mainly in metabolism, proteasome, and cell-cell recognition, including binding of sperm to the zona pellucida (Figure 2B).
      Figure 1. Differentially abundant proteins (DAP) between the sperm of high fertility (HF) and low fertility (LF) bulls. (A) The volcano plot shows the DAP more abundant in HF bulls (106 red dots) or LF bulls (195 green dots). (B) Principal component (PC) analysis plot depicting distribution of samples from HF and LF bulls according to the DAP. FDR = false discovery rate; FC = fold change.
      Figure 2. Functional analysis of differentially abundant proteins (DAP) between the sperm of high fertility (HF) and low fertility (LF) bulls. The lollipop plots illustrate the main 10 ontological terms enriched with the DAP in (A) HF bulls or (B) LF bulls. N Genes refers to the number of genes involved in the ontological term, also represented by the size of the dot. The significance of the overrepresentation for each term is depicted by the length of the line and the color: the longer the line and the redder, the more significant the enrichment for that term. FDR = false discovery rate.
      Application of the machine learning method selected 34 DAP after 27,061 iterations (Table 1). According to the abundance levels of these proteins, samples from HF and LF bulls clustered apart (Figure 3A). When samples from the MF bulls were considered, most of them tended to cluster with those from LF bulls, with only 2 clustering with the HF bulls, as can be appreciated in the hierarchical clustering (Figure 3B). Accordingly, the PCA plot showed, in the first component, that although samples from MF bulls clustered in between samples from HF and LF bulls, most of them were on the right side of the plot, together with samples from the LF bulls (Figure 3C). Functional analysis of the 19 biomarker proteins more abundant in HF Irish bulls revealed enrichment of ontological terms involved mainly in the axoneme and sperm motility (Figure 4). These terms were enriched because of the following proteins (coded by the following genes): E1BMD1 (CFAP43, cilia and flagella associated protein 43); E1BAJ3 (CCDC40, coiled-coil domain containing 40); and F1N5R7 (DNAH7, dynein axonemal heavy chain 7). The 15 biomarker proteins more abundant in LF Irish bulls enriched only 2 terms: hexose and monosaccharide metabolic process.
      Table 1. Differentially abundant proteins between the sperm of high fertility and low fertility bulls selected as biomarker proteins according to a random forest algorithm
      UniProt | Symbol | Gene ID | Name | Fold change | P-value
      O77834 | PRDX6 | 282438 | Peroxiredoxin 6 | 6.3552 | 0.0003
      A6QP39 | MSLN | 516237 | Mesothelin | 3.7983 | 0.0032
      A4IFP7 | ARF5 | 511918 | ADP ribosylation factor 5 | 3.6376 | 0.0052
      A6QP01 | SELENOT | 783831 | Selenoprotein T | 3.5342 | 0.0063
      O46385 | SVIL | 281509 | Supervillin | 3.3720 | 0.0070
      Q2KII2 | MS4A5 | 777599 | Membrane spanning 4-domains A5 | 2.6080 | 0.0074
      P43896 | TSFM | 281551 | Elongation factor Ts, mitochondrial | 2.3838 | 0.0019
      F1MBB7 | TGM1 | 407997 | Transglutaminase 1 | 2.2742 | 0.0129
      A2VDZ8 | RUSC2 | 507809 | RUN and SH3 domain containing 2 | 2.0932 | 0.0099
      E1BMP8 | SBNO1 | 540582 | Strawberry notch homolog 1 | 2.0070 | 0.0298
      A8E4P2 | FARSB | 788792 | Phenylalanyl-tRNA synthetase subunit beta | 1.9099 | 0.0297
      Q32PH2 | TMEM143 | 505862 | Transmembrane protein 143 | 1.7967 | 0.0400
      Q08DF4 | DNM1 | 508794 | Dynamin 1 | 1.7513 | 0.0038
      Q07536 | ALDH6A1 | 327692 | Aldehyde dehydrogenase 6 family member A1 | 1.6952 | 0.0002
      F1MIZ9 | KPNA3 | 540719 | Karyopherin subunit alpha 3 | 1.6944 | 0.0302
      E1BJI6 | FBXL13 | 101905084 | F-box and leucine rich repeat protein 13 | 1.5639 | 0.0035
      E1BMD1 | CFAP43 | 518801 | Cilia and flagella associated protein 43 | 1.2611 | 0.0042
      E1BAJ3 | CCDC40 | 529788 | Coiled-coil domain containing 40 | 1.1488 | 0.0019
      F1N5R7 | DNAH7 | 517781 | Dynein axonemal heavy chain 7 | 1.0559 | 0.0462
      E1BHC8 | ESPN | 104976499 | Espin | −1.6091 | 0.0042
      Q148H0 | APOO | 513847 | Apolipoprotein O | −1.6257 | 0.0075
      Q5E9B7 | CLIC1 | 515646 | Chloride intracellular channel 1 | −2.3651 | 0.0018
      P16368 | TIMP2 | 282093 | TIMP metallopeptidase inhibitor 2 | −2.3830 | 0.0335
      E1BP31 | ANKRD42 | 522004 | Ankyrin repeat domain 42 | −2.4965 | 0.0053
      G5E5M8 | LOC532207 | 532207 | Epididymis-specific alpha-mannosidase-like | −2.5723 | 0.0023
      A6QP36 | LMAN2 | 790870 | Lectin, mannose binding 2 | −2.6366 | 0.0088
      P29392 | SPADH1 | 282373 | Spermadhesin 1 | −2.8615 | 0.0261
      F1MQN0 | PPP2R1B | 100138150 | Protein phosphatase 2 scaffold subunit Abeta | −3.2250 | 0.0028
      Q3T0H0 | LCMT1 | 618021 | Leucine carboxyl methyltransferase 1 | −3.3412 | 0.0063
      G3X7H8 | LOC617406 | 617406 | Serpin peptidase inhibitor, clade B (ovalbumin), member 6-like | −3.5103 | 0.0118
      F1MQ88 | FHDC1 | 784913 | FH2 domain containing 1 | −3.8545 | 0.0063
      Q58DL1 | ARG2 | 518752 | Arginase 2 | −3.9494 | 0.0001
      F1N1E9 | LEPR | 497205 | Leptin receptor | −4.7861 | 0.0020
      A5PJI5 | GET3 | 504586 | Guided entry of tail-anchored proteins factor 3, ATPase | −6.0164 | 0.0031
      Figure 3. Distribution of samples according to the abundance levels of the 34 sperm proteins selected as biomarkers. Hierarchical clustering of the samples from (A) high fertility (HF) and low fertility (LF) bulls (red and blue colors, respectively) and (B) including the medium fertility (MF) bulls (green color). (C) Principal component (PC) analysis plot for samples from HF, MF, and LF bulls.
      Figure 4. Functional analysis of 19 sperm proteins selected as biomarkers more abundant in samples from high fertility bulls when compared with low fertility bulls. The lollipop plot illustrates the main 10 ontological terms. N Genes refers to the number of genes involved in the ontological term, also represented by the size of the dot. The significance of the overrepresentation for each term is depicted by the length of the line and the color: the longer the line and the redder, the more significant the enrichment for that term. FDR = false discovery rate.
      Construction of the machine learning model using the abundance levels of the 34 biomarker proteins identified in the samples from HF and LF bulls in the Irish population led to a prediction of the HF or LF bulls in the Danish validation population with 94.4% accuracy: all 8 LF samples were correctly classified as such, whereas 9 out of 10 HF samples were correctly classified as HF. The only HF sample misclassified corresponded to the bull with the lowest FI among the HF bulls (bull 64). This high accuracy could not be reached if the model was built using the abundance levels of all 301 DAP; although all 8 LF samples were predicted as such, only 3 of the 10 HF samples were correctly classified (61% accuracy). Selection of the 301 DAP was based on the raw P-value and, therefore, it is possible that several of those proteins were considered DAP when in fact they were not differentially abundant (i.e., false positives). Thus, a further selection of important sperm proteins for fertility through machine learning tools improved the pinpointing of those that would behave as such in an external independent population. This was reflected when the hierarchical clustering was performed on all 27 HF and all 31 LF samples from bulls in both populations. Considering the abundance of the 301 DAP, 3 LF samples clustered with the HF samples but 10 HF samples clustered with the LF samples (Figure 5A). However, only 3 HF samples clustered with the LF samples when the abundances of the 34 biomarkers were used to construct the hierarchical clustering (Figure 5B).
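      The two accuracy figures reported above follow directly from the counts of correctly classified Danish bulls. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only.

```python
def accuracy(correct_by_class, total_by_class):
    """Share of all samples that were assigned to their true class."""
    return sum(correct_by_class.values()) / sum(total_by_class.values())

totals = {"LF": 8, "HF": 10}  # Danish validation population

# Model built on the 34 biomarkers: 8/8 LF and 9/10 HF correct.
biomarkers_34 = accuracy({"LF": 8, "HF": 9}, totals)
# Model built on all 301 DAP: 8/8 LF but only 3/10 HF correct.
all_dap_301 = accuracy({"LF": 8, "HF": 3}, totals)

print(f"{biomarkers_34:.1%}")  # 94.4%
print(f"{all_dap_301:.1%}")    # 61.1%
```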
      Figure 5. Hierarchical clustering of sperm samples from high fertility (HF) and low fertility (LF) bulls from 2 different populations. Samples were clustered according to the abundance levels of (A) 301 differentially abundant proteins (DAP) between sperm of HF and LF bulls (red and blue colors, respectively), or (B) 34 DAP selected as biomarkers. Sperm samples obtained from bulls located in Ireland or Denmark are represented in gray or black colors, respectively.

      DISCUSSION

      Genomically assisted selection has revolutionized how the dairy industry identifies and selects bulls for use in AI. However, even though it has increased the reliability of breeding values and shortened the generation interval, the reliable prediction of the field fertility of individual bulls is still challenging. This has never been more important, given the widespread international use of first-season sires before data are available on their field fertility. Identification of bulls of compromised fertility before their semen is released into the field is critical to avoid the economic effect of low pregnancy rates, particularly in seasonal systems of production with short breeding seasons where compact calving patterns are essential (Shalloo et al., 2014).
      A reliable in vitro test, or a combination of tests predictive of bull field fertility, would enable AI companies to identify and eliminate bulls with potential LF before their widespread use in the field, thus improving overall reproductive and productive efficiency. Most AI centers use pre- and post-thawing assessment of the sperm through traditional microscopy-based methods, more modern computational approaches such as computer-assisted sperm analysis, or assays using fluorescent staining and high sample throughput, such as flow cytometry (Harstine et al., 2018). Nevertheless, in terms of pregnancy rate after AI, fertility in the field can vary by up to 20 to 30%, despite the semen being deemed acceptable after stringent quality control measures (Donnellan et al., 2022). Thus, there is still a need to identify sperm biomarkers that can reliably and repeatedly predict bull fertility. The use of omics technologies combined with traditional bioinformatic approaches, such as regression analysis for hypothesis testing, could determine potentially relevant molecules from the vast data generated through these techniques. Furthermore, the application of machine learning tools could help identify the most important molecules, as these methods are not limited to the scores from the statistical test. In this sense, a random forest classifier is a powerful wrapper method for feature selection, particularly when applied with the Boruta algorithm (Kumar and Shaikh, 2017). In this study, we employed both statistical and machine learning methods to select sperm proteins characterizing bulls classified according to their fertility in the field. We used the protein data of the sperm from bulls classified at both extremes (HF and LF) to avoid the ambiguity that bulls of medium fertility (MF) could introduce into the data. Semen collected from these animals had satisfactorily passed all the quality control measures carried out before and after thawing. In other words, animals classified as LF were not infertile, but, in the overall population, semen from LF bulls achieved a lower pregnancy rate than that from HF bulls.
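The idea behind Boruta-style selection mentioned above can be illustrated with a toy sketch: each feature is compared with shuffled ("shadow") copies, and only features that repeatedly beat the best shadow are kept. The Python code below is a simplified illustration under stated assumptions, not the procedure used in this study: it replaces random forest importance with a stand-in score (the absolute difference in group means), replaces Boruta's statistical test with a simple majority rule, and uses invented protein names and abundance values.

```python
import random


def importance(values, labels):
    """Stand-in importance: absolute difference in mean abundance between HF and LF."""
    hf = [v for v, g in zip(values, labels) if g == "HF"]
    lf = [v for v, g in zip(values, labels) if g == "LF"]
    return abs(sum(hf) / len(hf) - sum(lf) / len(lf))


def shadow_select(features, labels, n_iter=20, seed=0):
    """Keep features that beat the best shuffled (shadow) copy in most iterations."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Build one shadow per feature by shuffling its values across samples.
        shadow_scores = []
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(importance(shuffled, labels))
        best_shadow = max(shadow_scores)
        # A feature scores a hit only if it is strictly better than every shadow.
        for name, values in features.items():
            if importance(values, labels) > best_shadow:
                hits[name] += 1
    return [name for name, count in hits.items() if count > n_iter / 2]


# Hypothetical data: 6 HF and 6 LF samples, 2 made-up proteins.
labels = ["HF"] * 6 + ["LF"] * 6
features = {
    "protein_a": [10.0] * 6 + [0.0] * 6,  # separates the two groups
    "protein_b": [1.0, 0.0] * 6,          # same mean in both groups
}
selected = shadow_select(features, labels)
print(selected)
```

The shadow comparison gives each feature a data-driven threshold: a protein is retained only when its score exceeds what shuffled, uninformative copies of the same data can reach by chance.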
      Among the various molecules that can be measured through high-throughput techniques, protein presence or abundance has been related to male fertility in several species (Druart et al., 2019; Griffin et al., 2020; Leahy et al., 2020; Mills et al., 2020), and proteins may represent a better source than transcripts for finding biomarkers, as sperm are transcriptionally silent. Several reports have employed proteomics in sperm, seminal plasma, or both, from bulls with divergent fertility (Peddinti et al., 2008; D'Amours et al., 2010; Soggiu et al., 2013; Somashekar et al., 2017; Kasimanickam et al., 2019). In a recently published review, Klein et al. (2022) identified the proteins that were commonly more abundant in HF or LF bulls across several studies. Among the 28 DAP they identified in sperm, 23 are present among the sperm proteins detected in the current study. Six of the 23 proteins were differentially abundant between HF and LF bulls (P < 0.1), and 2 are consistent with the results found by other authors: CCT5 (chaperonin containing TCP1 subunit 5; P = 0.005) and PEBP4 (phosphatidylethanolamine binding protein 4; P = 0.06), which were more abundant in LF and HF bulls, respectively. The abundance of CCT5 in sperm decreased with increasing fertility in bulls from the Irish population, whereas it tended to decrease in bulls from the Danish population (Figure 6A), consistent with previous findings (D'Amours et al., 2010; Kasimanickam et al., 2019). This protein is a chaperonin involved in protein folding, particularly of actin and tubulin (Sternlicht et al., 1993). Cytoplasmic CCT in the spermatid should be discarded in residual bodies at spermiation, at least in rats (Souès et al., 2003). Therefore, the higher abundance of this protein in sperm samples could indicate an incomplete process during spermatogenesis, potentially compromising fertility.
      Figure 6. Box plots showing the abundance levels for selected proteins in the sperm of bulls with high fertility (HF; red boxes), medium fertility (MF; green boxes), and low fertility (LF; blue boxes). The top or bottom plots correspond to data from bulls from an Irish or Danish population, respectively. CCT5: chaperonin containing TCP1 subunit 5; CFAP43: cilia and flagella associated protein 43; CCDC40: coiled-coil domain containing 40; and DNAH7: dynein axonemal heavy chain 7. The solid or dotted horizontal bars show differences (P < 0.05) or a tendency toward a difference (P > 0.05 and < 0.1), respectively, in protein abundance between groups after ANOVA. Upper and lower edges of boxes represent the third and first quartiles, respectively. The midlines show the median, or second quartile. The upper and lower whiskers extend from the third and first quartile to the maximum and minimum values, respectively. Dots represent extreme values, i.e., values more than 1.5 times the interquartile range (difference between the third and first quartiles) above the third quartile or below the first quartile.
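The box plot elements described in the Figure 6 caption can be computed as in the following Python sketch. The abundance values are invented for illustration, and two choices are assumptions rather than details given in the caption: quartiles are obtained with the "inclusive" method of the standard library, and whiskers are taken to end at the most extreme values that are not plotted as dots.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends, and extreme points for one group of samples."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    extremes = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "extremes": extremes,
    }


abundance = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical abundance values
stats = box_stats(abundance)
print(stats)
```

In this made-up example the interquartile range is 4, so any value above 13 or below -3 would be drawn as a dot; only the value 30 qualifies.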
      Apart from CCT5 and PEBP4, the other 21 proteins in common between our study and that of Klein et al. (2022) were not differentially abundant or did not follow the same trend in our study, most likely due to the many factors involved in fertility and differences in animal and sperm handling across the experiments. Among these factors are the different diluents used during the cryopreservation of sperm; nonetheless, it is worth noting that we applied a bioinformatic approach to remove the potential effect of the diluent, as the diluent has been shown to affect sperm parameters (Murphy et al., 2018).
      Sperm proteins have great potential as predictive biomarkers of fertility. Nevertheless, although each new study provides increased clarity, the complex nature of male fertility and the multifactorial causes of subfertility mean that the selection of a single marker of male fertility is unlikely. In contrast, the combined abundance levels of many proteins can lead to a reliable and repeatable indicator of fertility. In this study, biomarker proteins were selected in one of the bull populations (Ireland) and validated in the other (Denmark), correctly classifying the sperm samples from the latter population with 94.4% accuracy. Among the proteins selected as potential biomarkers, CFAP43, CCDC40, and DNAH7 may have a major role in fertility, as they are involved in axoneme assembly and cilium movement, including sperm motility and regulation of the beating frequency. Indeed, the abundance of these proteins increased with increasing fertility in the sperm from bulls in the Irish population. For bulls in the Danish population, the abundance of DNAH7 was greater, whereas that of CCDC40 tended to be greater, in the sperm of HF bulls compared with LF bulls. The abundance of CFAP43 followed the same pattern, but the difference was not significant (P = 0.1; Figure 6B–D). Although these proteins have not previously been associated with fertility in cattle, several studies in other species have found an association between mutations in the coding genes and infertility. Two independent studies have reported that mutations in CFAP43 and CFAP44 are related to male infertility in men and mice because of axonemal disorganization in the sperm tail, which can lead to 100% immotile, morphologically abnormal sperm if the mutation is biallelic (Tang et al., 2017; Coutton et al., 2018). Mutations in CCDC39 and CCDC40 are involved in primary ciliary dyskinesia, characterized by abnormal ciliary motility because of axonemal disorganization (Blanchon et al., 2012; Liu et al., 2021). Finally, mutations in DNAH7, which encodes a heavy chain of the ciliary or flagellar axonemal inner dynein arm (Neesen et al., 1997), are a cause of idiopathic asthenozoospermia (sperm with low motility) in men, given the defects in the structure and function of the sperm flagellum (Wang et al., 2020; Wei et al., 2021). The sperm of bulls employed in this study and classified as LF passed the commercial quality control parameters, including progressive motility. This would suggest that the issues leading to LF were post-insemination events, for example, sperm transport in the female reproductive tract, the ability to fertilize the oocyte, or the ability to sustain the embryo.

      CONCLUSIONS

      The development of an accurate and robust bull fertility prediction model is challenging. Ultimately, state-of-the-art genetic, physiological, immunological, and molecular approaches, together with in vitro sperm functional and bioinformatic analyses, will likely lead to a better prediction of field fertility. In this study, we screened the sperm proteome in bulls with high or low field fertility to identify potential biomarker proteins. Application of bioinformatics and machine learning methods identified 34 sperm proteins, the abundance of which could predict fertility in an independent population of bulls with 94.4% accuracy. Some of these proteins, more abundant in HF bulls, were involved in sperm motility. Thus, this study introduces a set of proteins that could help to discriminate the semen from bulls with undesirable performance when used in AI.

      ACKNOWLEDGMENTS

      This project was funded by Science Foundation Ireland (Dublin, Ireland; Project 16/IA/4474). MBL was supported by an H2020-MSCA-Individual Fellowship (Proposal 101021311). The authors have not stated any conflicts of interest.

      REFERENCES

        • Bernecic N.C.
        • Donnellan E.
        • O'Callaghan E.
        • Kupisiewicz K.
        • O'Meara C.
        • Weldon K.
        • Lonergan P.
        • Kenny D.A.
        • Fair S.
        Comprehensive functional analysis reveals that acrosome integrity and viability are key variables distinguishing artificial insemination bulls of varying fertility.
        J. Dairy Sci. 2021; 104 (34253371): 11226-11241
        • Berry D.P.
        • Evans R.D.
        • Mc Parland S.
        Evaluation of bull fertility in dairy and beef cattle using cow field data.
        Theriogenology. 2011; 75 (20875673): 172-181
        • Bissonnette N.
        • Levesque-Sergerie J.P.
        • Thibault C.
        • Boissonneault G.
        Spermatozoal transcriptome profiling for bull sperm motility: a potential tool to evaluate semen quality.
        Reproduction. 2009; 138 (19423662): 65-80
        • Blanchon S.
        • Legendre M.
        • Copin B.
        • Duquesnoy P.
        • Montantin G.
        • Kott E.
        • Dastot F.
        • Jeanson L.
        • Cachanado M.
        • Rousseau A.
        • Papon J.F.
        • Beydon N.
        • Brouard J.
        • Crestani B.
        • Deschildre A.
        • Desir J.
        • Dollfus H.
        • Leheup B.
        • Tamalet A.
        • Thumerelle C.
        • Vojtek A.M.
        • Escalier D.
        • Coste A.
        • de Blic J.
        • Clement A.
        • Escudier E.
        • Amselem S.
        Delineation of CCDC39/CCDC40 mutation spectrum and associated phenotypes in primary ciliary dyskinesia.
        J. Med. Genet. 2012; 49 (22693285): 410-416
        • Coutton C.
        • Vargas A.S.
        • Amiri-Yekta A.
        • Kherraf Z.E.
        • Ben Mustapha S.F.
        • Le Tanno P.
        • Wambergue-Legrand C.
        • Karaouzene T.
        • Martinez G.
        • Crouzy S.
        • Daneshipour A.
        • Hosseini S.H.
        • Mitchell V.
        • Halouani L.
        • Marrakchi O.
        • Makni M.
        • Latrous H.
        • Kharouf M.
        • Deleuze J.F.
        • Boland A.
        • Hennebicq S.
        • Satre V.
        • Jouk P.S.
        • Thierry-Mieg N.
        • Conne B.
        • Dacheux D.
        • Landrein N.
        • Schmitt A.
        • Stouvenel L.
        • Lores P.
        • El Khouri E.
        • Bottari S.P.
        • Faure J.
        • Wolf J.P.
        • Pernet-Gallay K.
        • Escoffier J.
        • Gourabi H.
        • Robinson D.R.
        • Nef S.
        • Dulioust E.
        • Zouari R.
        • Bonhivers M.
        • Toure A.
        • Arnoult C.
        • Ray P.F.
        Mutations in CFAP43 and CFAP44 cause male infertility and flagellum defects in Trypanosoma and human.
        Nat. Commun. 2018; 9 (29449551): 686
        • D'Amours O.
        • Frenette G.
        • Fortier M.
        • Leclerc P.
        • Sullivan R.
        Proteomic comparison of detergent-extracted sperm proteins from bulls with different fertility indexes.
        Reproduction. 2010; 139 (19952166): 545-556
        • Donnellan E.M.
        • Lonergan P.
        • Meade K.G.
        • Fair S.
        An ex-vivo assessment of differential sperm transport in the female reproductive tract between high and low fertility bulls.
        Theriogenology. 2022; 181 (35063920): 42-49
        • Donnellan E.M.
        • O'Brien M.B.
        • Meade K.G.
        • Fair S.
        Comparison of the uterine inflammatory response to frozen-thawed sperm from high and low fertility bulls.
        Theriogenology. 2021; 176 (34564014): 26-34
        • Druart X.
        • Rickard J.P.
        • Tsikis G.
        • de Graaf S.P.
        Seminal plasma proteins as markers of sperm fertility.
        Theriogenology. 2019; 137 (31285051): 30-35
        • Fair S.
        • Lonergan P.
        Review: Understanding the causes of variation in reproductive wastage among bulls.
        Animal. 2018; 12 (29779500): s53-s62
        • Feugang J.M.
        • Rodriguez-Osorio N.
        • Kaya A.
        • Wang H.
        • Page G.
        • Ostermeier G.C.
        • Topper E.K.
        • Memili E.
        Transcriptome analysis of bull spermatozoa: Implications for male fertility.
        Reprod. Biomed. Online. 2010; 21 (20638337): 312-324
        • Ge S.X.
        • Jung D.
        • Yao R.
        ShinyGO: A graphical gene-set enrichment tool for animals and plants.
        Bioinformatics. 2020; 36 (31882993): 2628-2629
        • Griffin R.A.
        • Swegen A.
        • Baker M.
        • Aitken R.J.
        • Skerrett-Byrne D.A.
        • Silva Rodriguez A.
        • Martin-Cano F.E.
        • Nixon B.
        • Pena F.J.
        • Delehedde M.
        • Sergeant N.
        • Gibb Z.
        Mass spectrometry reveals distinct proteomic profiles in high- and low-quality stallion spermatozoa.
        Reproduction. 2020; 160 (32805711): 695-707
        • Harstine B.R.
        • Utt M.D.
        • DeJarnette J.M.
        Review: Integrating a semen quality control program and sire fertility at a large artificial insemination organization.
        Animal. 2018; 12: 1-12
        • Hornung R.
        • Causeur D.
        • Bernau C.
        • Boulesteix A.
        Improving cross-study prediction through addon batch effect adjustment or addon normalization.
        Bioinformatics. 2017; 33 (27797760): 397-404
        • Karatzoglou A.
        • Smola A.
        • Hornik K.
        • Zeileis A.
        kernlab - An S4 package for kernel methods in R.
        J. Stat. Softw. 2004; 11: 1-20
        • Kasimanickam R.K.
        • Kasimanickam V.R.
        • Arangasamy A.
        • Kastelic J.P.
        Sperm and seminal plasma proteomics of high- versus low-fertility Holstein bulls.
        Theriogenology. 2019; 126 (30529997): 41-48
        • Klein E.K.
        • Swegen A.
        • Gunn A.J.
        • Stephen C.P.
        • Aitken R.J.
        • Gibb Z.
        The future of assessing bull fertility: Can the 'omics fields identify usable biomarkers?.
        Biol. Reprod. 2022; 106 (35136971): 854-864
        • Kuhn M.
        Building predictive models in R using the caret package.
        J. Stat. Softw. 2008; 28: 5
        • Kumar S.
        • Shaikh T.
        Empirical evaluation of the performance of feature selection approaches on Random Forest.
        in: 2017 International Conference on Computer and Applications. 2017: 227-231
        • Kursa M.B.
        • Rudnicki W.R.
        Feature selection with the Boruta package.
        J. Stat. Softw. 2010; 36: 1-13
        • Langfelder P.
        • Horvath S.
        WGCNA: An R package for weighted correlation network analysis.
        BMC Bioinformatics. 2008; 9 (19114008): 559
        • Leahy T.
        • Rickard J.P.
        • Pini T.
        • Gadella B.M.
        • Graaf S.P.
        Quantitative proteomic analysis of seminal plasma, sperm membrane proteins, and seminal extracellular vesicles suggests vesicular mechanisms aid in the removal and addition of proteins to the ram sperm membrane.
        Proteomics. 2020; 20 (32383290): 1900289
        • Leek J.
        • Johnson W.
        • Parker H.
        • Fertig E.
        • Jaffe A.
        • Zhang Y.
        • Storey J.D.
        • Torres L.
        sva: Surrogate variable analysis.
        R package version 3.42.0. 2021
        • Li R.
        • Li L.
        • Xu Y.
        • Yang J.
        Erratum to: Machine learning meets omics applications and perspectives.
        Brief. Bioinform. 2022; 23: bbab560
        • Liu L.
        • Zhou K.
        • Song Y.
        • Liu X.
        CCDC40 mutation as a cause of infertility in a Chinese family with primary ciliary dyskinesia.
        Medicine (Baltimore). 2021; 100 (34941110): e28275
        • Menezes E.B.
        • Velho A.L.C.
        • Santos F.
        • Dinh T.
        • Kaya A.
        • Topper E.
        • Moura A.A.
        • Memili E.
        Uncovering sperm metabolome to discover biomarkers for bull fertility.
        BMC Genomics. 2019; 20 (31533629): 714
        • Mills K.M.
        • Aryal U.K.
        • Sobreira T.
        • Minton A.M.
        • Casey T.
        • Stewart K.R.
        Shotgun proteome analysis of seminal plasma differentiate boars by reproductive performance.
        Theriogenology. 2020; 157 (32810790): 130-139
        • Murphy E.M.
        • O'Meara C.
        • Eivers B.
        • Lonergan P.
        • Fair S.
        Comparison of plant- and egg yolk-based semen diluents on in vitro sperm kinematics and in vivo fertility of frozen-thawed bull semen.
        Anim. Reprod. Sci. 2018; 191 (29496341): 70-75
        • Neesen J.
        • Koehler M.R.
        • Kirschner R.
        • Steinlein C.
        • Kreutzberger J.
        • Engel W.
        • Schmid M.
        Identification of dynein heavy chain genes expressed in human and mouse testis: chromosomal localization of an axonemal dynein gene.
        Gene. 1997; 200 (9373155): 193-202
        • O'Callaghan E.
        • Sanchez J.M.
        • McDonald M.
        • Kelly A.K.
        • Hamdi M.
        • Maicas C.
        • Fair S.
        • Kenny D.A.
        • Lonergan P.
        Sire contribution to fertilization failure and early embryo survival in cattle.
        J. Dairy Sci. 2021; 104 (33714587): 7262-7271
        • Peddinti D.
        • Nanduri B.
        • Kaya A.
        • Feugang J.M.
        • Burgess S.C.
        • Memili E.
        Comprehensive proteomic analysis of bovine spermatozoa of varying fertility rates and identification of biomarkers associated with fertility.
        BMC Syst. Biol. 2008; 2 (18294385): 19
        • R Core Team
        R: A language and environment for statistical computing.
        R Foundation for Statistical Computing, 2020
        • Rabaglino M.
        • Lonergan P.
        Table S1. Details about the bulls used in Project 16/IA/4474 (Science Foundation Ireland). figshare. Dataset.
        • Rabaglino M.
        • Lonergan P.
        Table S2. List of differentially abundant proteins in the sperm of high fertility bulls compared to low fertility bulls. figshare. Dataset.
        • Saraf K.K.
        • Kumaresan A.
        • Dasgupta M.
        • Karthikkeyan G.
        • Prasad T.S.K.
        • Modi P.K.
        • Ramesha K.
        • Jeyakumar S.
        • Manimaran A.
        Metabolomic fingerprinting of bull spermatozoa for identification of fertility signature metabolites.
        Mol. Reprod. Dev. 2020; 87 (32452071): 692-703
        • Sellem E.
        • Broekhuijse M.L.
        • Chevrier L.
        • Camugli S.
        • Schmitt E.
        • Schibler L.
        • Koenen E.P.
        Use of combinations of in vitro quality assessments to predict fertility of bovine semen.
        Theriogenology. 2015; 84 (26296523): 1447-1454.e5
        • Shalloo L.
        • Cromie A.
        • McHugh N.
        Effect of fertility on the economics of pasture-based dairy systems.
        Animal. 2014; 8 (24679449): 222-231
        • Smyth G.K.
        Limma: Linear models for microarray data.
        in: Gentleman R. Carey V. Dudoit S. Irizarry R. Huber W. Bioinformatics and Computational Biology Solutions using R and Bioconductor. Springer, 2005: 397-420
        • Soggiu A.
        • Piras C.
        • Hussein H.A.
        • De Canio M.
        • Gaviraghi A.
        • Galli A.
        • Urbani A.
        • Bonizzi L.
        • Roncada P.
        Unravelling the bull fertility proteome.
        Mol. Biosyst. 2013; 9 (23392320): 1188-1195
        • Somashekar L.
        • Selvaraju S.
        • Parthipan S.
        • Patil S.K.
        • Binsila B.K.
        • Venkataswamy M.M.
        • Karthik Bhat S.
        • Ravindra J.P.
        Comparative sperm protein profiling in bulls differing in fertility and identification of phosphatidylethanolamine-binding protein 4, a potential fertility marker.
        Andrology. 2017; 5 (28859251): 1032-1051
        • Souès S.
        • Kann M.
        • Fouquet J.
        • Melki R.
        The cytosolic chaperonin CCT associates to cytoplasmic microtubular structures during mammalian spermiogenesis and to heterochromatin in germline and somatic cells.
        Exp. Cell Res. 2003; 288 (12915127): 363-373
        • Sternlicht H.
        • Farr G.
        • Sternlicht M.
        • Driscoll J.
        • Willison K.
        • Yaffe M.
        The T-complex polypeptide 1 complex is a chaperonin for tubulin and actin in vivo.
        Proc. Natl. Acad. Sci. USA. 1993; 90 (8105476): 9422-9426
        • Tang S.
        • Wang X.
        • Li W.
        • Yang X.
        • Li Z.
        • Liu W.
        • Li C.
        • Zhu Z.
        • Wang L.
        • Wang J.
        • Zhang L.
        • Sun X.
        • Zhi E.
        • Wang H.
        • Li H.
        • Jin L.
        • Luo Y.
        • Wang J.
        • Yang S.
        • Zhang F.
        Biallelic mutations in CFAP43 and CFAP44 cause male infertility with multiple morphological abnormalities of the sperm flagella.
        Am. J. Hum. Genet. 2017; 100 (28552195): 854-864
        • Thundathil J.
        • Gil J.
        • Januskauskas A.
        • Larsson B.
        • Soderquist L.
        • Mapletoft R.
        • Rodriguez-Martinez H.
        Relationship between the proportion of capacitated spermatozoa present in frozen-thawed bull semen and fertility with artificial insemination.
        Int. J. Androl. 1999; 22 (10624605): 366-373
        • Viana A.G.A.
        • Martins A.M.A.
        • Pontes A.H.
        • Fontes W.
        • Castro M.S.
        • Ricart C.A.O.
        • Sousa M.V.
        • Kaya A.
        • Topper E.
        • Memili E.
        • Moura A.A.
        Proteomic landscape of seminal plasma associated with dairy bull fertility.
        Sci. Rep. 2018; 8 (30397208): 16323
        • Walsh S.W.
        • Williams E.J.
        • Evans A.C.
        A review of the causes of poor fertility in high milk producing dairy cows.
        Anim. Reprod. Sci. 2011; 123 (21255947): 127-138
        • Wang Y.Y.
        • Ke C.C.
        • Chen Y.L.
        • Lin Y.H.
        • Yu I.S.
        • Ku W.C.
        • O'Bryan M.K.
        • Lin Y.H.
        Deficiency of the Tbc1d21 gene causes male infertility with morphological abnormalities of the sperm mitochondria and flagellum in mice.
        PLOS Genet. 2020; 16 (32976492): e1009020
        • Wei X.
        • Sha Y.
        • Wei Z.
        • Zhu X.
        • He F.
        • Zhang X.
        • Liu W.
        • Wang Y.
        • Lu Z.
        Bi-allelic mutations in DNAH7 cause asthenozoospermia by impairing the integrality of axoneme structure.
        Acta Biochim. Biophys. Sin. (Shanghai). 2021; 53 (34476482): 1300-1309
        • Wickham H.
        Ggplot2: Elegant Graphics for Data Analysis.
        Springer-Verlag, New York, 2016