Identification of sperm proteins as biomarkers of field fertility in Holstein-Friesian bulls used for artificial insemination

Despite passing stringent quality control, bulls used in artificial insemination can vary significantly in their fertility, emphasizing the need for reliable markers of sperm quality. This study aimed to identify sperm proteins acting as biomarkers of fertility in 2 different populations of dairy bulls classified based on their field fertility. Semen was collected and cryopreserved from: 54 Holstein bulls located in Ireland, classified according to fertility indexes as low fertility (LF, n = 23), medium fertility (n = 14), or high fertility (HF, n = 17); and 18 Holstein bulls located in Denmark, classified as LF (n = 8) or HF (n = 10). The proteome was measured through liquid chromatography-mass spectrometry and data were analyzed with the R software. Differentially abundant proteins between HF and LF bulls and biomarker proteins were determined through a modified t-test and random forest, respectively, selecting 301 differentially abundant proteins and 34 biomarker proteins. The predictive ability of the 34 biomarkers was evaluated employing support vector machine as the classifier, using their abundance levels in the Irish bulls to train the model and in the Danish bulls for validation. The prediction accuracy was 94.4%, with only one HF bull misclassified, corresponding to the lowest fertility index bull in the HF group. The biomarkers more abundant in sperm of HF bulls enriched axoneme assembly and sperm motility (false discovery rate <0.05), according to functional analysis. In conclusion, a robust model coupled with the application of appropriate bioinformatic tools allowed the identification of functionally relevant sperm proteins predictive of the fertility of Holstein bulls used in artificial insemination.


INTRODUCTION
In livestock species of economic importance, such as cattle, reproductive efficiency is critical for achieving on-farm profitability (Shalloo et al., 2014). Thus, improving pregnancy per AI is an area of considerable basic and applied research interest. Much of the research effort in this regard has understandably been focused on the female, due in part to the well-described effects of lactation-induced metabolic stress on the quality of the oocyte, embryo, and reproductive tract environment (Walsh et al., 2011). Nonetheless, the fertility of the bull is a major contributor to overall reproductive performance and can have a major effect on productivity and economic return. This is true not only in herds where natural service is predominantly used but even more so where AI is employed, given that semen from elite bulls can be used simultaneously in several countries, frequently reaching tens of thousands of inseminations per year.
Artificial insemination is one of the most extensively used assisted-reproductive technologies, facilitating high selection intensity and the exploitation of a sire's genetic value through widespread dissemination of his semen. Ejaculates of bulls used for AI are subjected to extensive quality control checks in breeding centers to minimize the chances of adverse pregnancy outcomes after semen is released into the field (Thundathil et al., 1999; Harstine et al., 2018). Semen assessments before cryopreservation, and post-thaw, to evaluate potential fertility include microscopy-based parameters, such as motility, viability, and morphology, and other more advanced and objective methods, using computer-assisted sperm analysis and flow cytometry (Harstine et al., 2018).
Nevertheless, despite improvements and modernization of semen evaluation methods, semen from bulls used in AI still exhibits significant variation in field fertility, even after passing stringent quality control checks (Fair and Lonergan, 2018). The challenge is exemplified in the study of Sellem et al. (2015), who correlated bull fertility (153 ejaculates from 19 Holstein bulls) with a combination of sperm functional parameters. At best, these authors could account for no more than 40% of the variation in sire fertility. Taking a similar approach of using flow cytometric and computer-assisted sperm analysis-based parameters, our group recently reported that 47% of the variation in AI bull field fertility could be explained (Bernecic et al., 2021). High-throughput (omics) technologies have emerged as promising tools to determine seminal plasma or sperm components associated with fertility at the molecular level (Klein et al., 2022). Previous studies have employed bioinformatic approaches for the analysis of proteomic (Viana et al., 2018; Kasimanickam et al., 2019), transcriptomic (Bissonnette et al., 2009; Feugang et al., 2010), or metabolomic (Menezes et al., 2019; Saraf et al., 2020) data from semen samples to identify proteins, mRNAs, or metabolites, respectively, as markers of bull fertility. However, a robust and repeatable set of such biomarkers is still lacking. The challenge of identifying potential biomarkers could benefit from the application of machine-learning approaches, which aim to find patterns in complex data sets such as the proteome (Li et al., 2022). The relevance of these methods can be demonstrated through validations with external, independent data.
We hypothesized that differences in fertility of AI sires can be predicted through the proteomic profile of their sperm. This study aimed to identify sperm proteins behaving as biomarkers of fertility in 2 different populations of dairy bulls whose semen had already passed quality control checks and was used in the field. One of the populations was employed to screen the proteome for selection of the potential biomarkers, while the other was used to validate the sperm proteins as biomarkers of fertility.

MATERIALS AND METHODS
No human or animal subjects were used, so this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

Animals
Frozen-thawed sperm from 3 ejaculates collected from Holstein-Friesian bulls located in AI centers in Ireland (n = 54) or Denmark (n = 18) were pooled within bull and analyzed. Bulls were classified as high, medium, or low fertility (Ireland) or as high or low fertility (Denmark).
Irish Bulls. A panel of Holstein-Friesian bulls (n = 54) was selected from the national population from which cryopreserved semen was used commercially for AI in Ireland. Data on field fertility were obtained from the Irish Cattle Breeding Federation database based on an adjusted sire fertility index (FI; Berry et al., 2011). Sire fertility was defined as pregnancy to a given service identified retrospectively either from a calving event or where a repeat service (or a pregnancy scan) deemed the animal not to be pregnant. These raw data were then adjusted for factors including semen type (frozen, fresh), cow parity, month of service, day of the week when serviced, service number, cow genotype, herd, AI technician, and bull breed. The adjusted sire FI given for each bull was then weighted for the number of service records, resulting in an adjusted pregnancy rate. Holstein-Friesian bulls that had a minimum of 500 inseminations formed the base population (840 bulls), from which low fertility (LF; FI: −0.268 to −0.016, n = 23), medium fertility (MF; FI: 0.023 to 0.029, n = 14), and high fertility (HF; FI: 0.058 to 0.072, n = 17) bulls were selected. Some of these same bulls were used in previous studies by our group (Bernecic et al., 2021; Donnellan et al., 2021; O'Callaghan et al., 2021; Donnellan et al., 2022).
Danish Bulls. Danish bulls were selected from a Holstein-Friesian population of 2,220 bulls used for AI in Denmark. Data on field fertility were obtained from the Danish National Cattle Register and were expressed as 56-d nonreturn rates, adjusted for the parity of the inseminated cow, month of service, herd, year, and season. Eighteen bulls with a minimum of 500 matings were selected for the study based on their 56-d nonreturn rates and semen availability. These bulls were classified as LF (FI: −0.07 to −0.049, n = 8) or HF (FI: 0.062 to 0.114, n = 10).
The mean FI of both populations of bulls was zero. Information about each bull can be found in Supplemental Table S1 (https://figshare.com/articles/dataset/20367708; Rabaglino and Lonergan, 2022a).

Determination of the Sperm Proteome
Protein Extraction. Three straws of semen from each bull were thawed in a water bath at 37°C for 30 s. The contents were centrifuged at 3,000 × g for 7 min at room temperature to obtain a sperm pellet. After removal of the supernatant, 4 washes were performed to eliminate the extender (1 in water and 3 in PBS, both supplemented with cOmplete Protease Inhibitor cocktail; Roche). The resulting pellet was resuspended in 200 µL of extraction buffer (4% SDS, 0.2% deoxycholic acid, 50 mM Tris, 100 mM ammonium bicarbonate, 10 mM dithiothreitol, pH 8), boiled for 5 min, and incubated for 2 h in an ultrasound bath. Complete lysis was checked by microscopy. After centrifugation to remove cellular debris (12,000 × g for 15 min at 4°C), the supernatant was retained and proteins were precipitated with acetone by mixing 1 vol of sample with 4 vol of cold acetone (−20°C) overnight. Precipitated proteins were centrifuged (12,000 × g for 15 min at 4°C), and the pellet was dried for 5 min under nitrogen flux, followed by resuspension in 0.4% SDS. Protein sample content was quantified using the BCA Protein Assay Kit (Thermo Fisher Scientific).
Nano-Liquid Chromatography-Mass Spectrometry Analysis. To carry out liquid chromatography-mass spectrometry analysis, 50 µg of sperm protein extracts were concentrated using stacking acrylamide SDS-PAGE gels. The migration was performed for 20 min at a constant 150 V. Then, proteins were stained using colloidal Coomassie blue R-250 and the stained band was excised. In-gel digestion was performed with an automated protein digestion system, a MassPrep Station (Waters). The gel plugs were washed twice with 50 µL of 25 mM ammonium hydrogen carbonate (NH4HCO3) and 50 µL of acetonitrile. The cysteine residues were reduced by addition of 50 µL of 10 mM dithiothreitol at 57°C and alkylated by addition of 50 µL of 55 mM iodoacetamide. After dehydration with acetonitrile, the proteins were cleaved in-gel with a 12.5 ng/µL solution of modified porcine trypsin (Promega) in 25 mM NH4HCO3 (∼30 µL). The digestion was performed overnight at room temperature. Tryptic peptides were extracted twice: the first time with 60% acetonitrile in 5% formic acid for 1 h, and the second time with a 100% acetonitrile solution until the gel pieces were dehydrated. The collected extracts were pooled to a final volume of 60 µL. Excess acetonitrile was evaporated at 37°C before analysis. Peptide mixtures were analyzed by nano-liquid chromatography-mass spectrometry using an UltiMate 3000 system coupled to a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific). For each sample, peptide mixtures were automatically fractionated onto a commercial C18 reverse-phase column (75 µm × 150 mm, 2-µm particle, PepMap100 RSLC column, Thermo Fisher Scientific) at 35°C. Trapping was performed for 4 min at 5 µL/min, with 98% H2O, 2% acetonitrile, and 0.1% formic acid. Elution was performed using 2 solvents, A (0.1% formic acid in water) and B (0.1% formic acid in acetonitrile), at a flow rate of 300 nL/min. Gradient separation was 2 min from 2 to 5% B, 12 min from 5 to 25% B, 2 min from 25 to 80% B, and 3 min at 80% B. The column was equilibrated for 8 min with 2% buffer B before the next sample analysis. Eluted peptides were electrosprayed in positive-ion mode at 1.9 kV through a nanoelectrospray ion source heated at 275°C. Full MS scans were acquired in the Orbitrap mass analyzer over the m/z 400-1,200 range with a resolution of 70,000 (m/z 200). The target value was 5.00E+5 and the maximum allowed ion accumulation time was 250 ms. The 15 most intense peaks with charge state between 2 and 5 were fragmented in the HCD collision cell with 27 eV, and tandem mass spectra were acquired with a resolution of 17,500 (m/z 200). The target value was 5.00E+4 and the maximum allowed ion accumulation time was 150 ms. Dynamic exclusion was set to 7 s.
Data Processing, Protein Identification, and Abundance. Raw data from the MS/MS were processed and converted into *.mgf peak list format with Proteome Discoverer 1.4 (Thermo Fisher Scientific). Data were interpreted with Mascot v2.4 (Matrix Science) against the Bos taurus database (i.e., 32,284 sequences) fused with the sequences of recombinant trypsin and a list of classical contaminants (118 entries). The following parameters were considered for the search: precursor mass tolerance of 0.2 Da and fragment mass tolerance of 0.2 Da, a maximum of one missed trypsin cleavage site, and carbamidomethylation (C), oxidation (M), propionamidation (C), and protein N-terminal acetylation set as variable modifications. For each sample, peptides were filtered according to the cut-offs set for protein hits: 2 or more peptides longer than 9 residues, ion score >15, and false discovery rate <1%. Protein identification was validated when at least 2 peptides originating from one protein showed statistically significant identity.
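The acceptance rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple layout of the peptide matches and the function name are hypothetical, the thresholds are read from the text as "longer than 9 residues" and "ion score >15", and the false discovery rate control performed by the search engine is not modeled here.

```python
from collections import defaultdict

def validated_proteins(matches, min_residues=10, min_score=15.0, min_peptides=2):
    """Return protein IDs supported by enough qualifying peptides.

    matches: iterable of (protein_id, peptide_sequence, ion_score) tuples
             (hypothetical layout, not the Mascot export format).
    """
    peptides = defaultdict(set)
    for protein_id, sequence, ion_score in matches:
        # Keep peptides longer than 9 residues with an ion score above 15.
        if len(sequence) >= min_residues and ion_score > min_score:
            peptides[protein_id].add(sequence)
    # A protein is accepted when at least 2 distinct peptides support it.
    return sorted(p for p, seqs in peptides.items() if len(seqs) >= min_peptides)
```

Counting distinct sequences rather than raw matches is a design choice of this sketch, so that repeated spectra of the same peptide do not validate a protein on their own.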

Determination of Differentially Abundant Proteins and Identification of Putative Biomarker Proteins in the Sperm of Bulls Located in Ireland.
All bioinformatic procedures were carried out using the R software (R Core Team, 2020). Data on protein abundance were log10 transformed, filtered to retain proteins present in more than 14 samples (smaller class), and normalized with the ComBat method (Leek et al., 2021) to control for the potential effect of different diluents. Data from the 2,156 proteins retained after filtering (out of 2,742) were analyzed with the limma package (Smyth, 2005). Briefly, an empirical Bayes method was applied to calculate a moderated t statistic for differential abundance for each protein by performing a linear model fit on the data. Empirical Bayes moderated the standard errors of the estimated log fold changes to produce more stable estimates. A contrast was done between HF and LF bulls to determine the differentially abundant sperm proteins (DAP, P < 0.05) between bulls at the extremes of the FI, which were visualized by a volcano plot. A volcano plot is a type of scatterplot that shows statistical significance (P-value) for the statistical comparison between 2 groups on the y-axis versus the magnitude of change between the 2 groups on the x-axis. Each dot represents a protein; those on the right (in red) are proteins upregulated (i.e., more abundant in the HF bulls), whereas those on the left (in green) are downregulated (i.e., more abundant in the LF bulls). The remainder (in black) are not differentially abundant between HF and LF bulls. A P-value >0.05 and <0.1 was considered a tendency.
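To make the idea of a moderated t statistic concrete, the following minimal sketch computes it for a single protein in a two-group contrast. It is not the limma implementation: limma estimates the prior degrees of freedom and prior variance from all proteins, whereas here they are passed in as assumed inputs (`prior_df`, `prior_var`), and all names are hypothetical.

```python
import math

def moderated_t(log_fc, sample_var, n_hf, n_lf, prior_df, prior_var):
    """Moderated t statistic for one protein in a two-group comparison.

    log_fc:     estimated log fold change (HF minus LF)
    sample_var: residual variance of this protein
    prior_df, prior_var: hyperparameters of the variance prior (assumed
                         given; limma estimates them from the whole data set)
    Returns (t, total_df).
    """
    residual_df = n_hf + n_lf - 2
    # Shrink the protein-wise variance toward the prior variance.
    posterior_var = (prior_df * prior_var + residual_df * sample_var) / (
        prior_df + residual_df
    )
    # Unscaled variance of the difference between two group means.
    unscaled = 1.0 / n_hf + 1.0 / n_lf
    t = log_fc / math.sqrt(posterior_var * unscaled)
    return t, prior_df + residual_df
```

With `prior_df = 0` the expression reduces to the ordinary pooled-variance t statistic; a larger `prior_df` pulls extreme protein-wise variances toward `prior_var`, which is what stabilizes the estimates.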
Biomarker proteins were selected through a wrapper algorithm around random forest using the Boruta package (Kursa and Rudnicki, 2010). Briefly, the algorithm adds shadow features to the data to train a random forest classifier, evaluating the importance of each feature (protein) through a Mean Decrease Accuracy. Selected features are those with a higher score than the best of their shadow features. The algorithm was set to run a maximum of 100,000 iterations, to select important features with a P < 0.0001. Sample distribution and clustering, according to the abundance of the DAP and biomarker proteins, were evaluated through a principal component analysis (PCA) and a hierarchical clustering dendrogram, respectively, using internal packages of R. Briefly, a PCA is a dimension reduction technique that captures sample variability in several components (corresponding to the number of samples, 40 in this case), with component 1 being the one that explains most of the variability. The PCA plot is a type of scatterplot showing the result of the PCA, where samples are clustered based on their similarity. Hierarchical clustering is an algorithm that groups samples according to their similarities (in protein abundance levels for this study). Spearman rank correlation was employed as the similarity metric and complete linkage as the clustering method. Plotting and coloring of the dendrogram were done with the weighted gene co-expression network analysis (WGCNA) package (Langfelder and Horvath, 2008).
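The clustering step can be illustrated with a small self-contained sketch. It assumes that the Spearman correlation is converted to a dissimilarity as 1 minus the correlation (the text does not state the conversion), uses a naive agglomerative loop rather than the R routines, and all function and sample names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (non-constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def complete_linkage(samples, n_clusters=2):
    """samples: dict of sample name -> abundance vector. Returns a list of sets."""
    names = list(samples)
    dist = {(a, b): 1 - spearman(samples[a], samples[b])
            for a in names for b in names if a != b}
    clusters = [{name} for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

Because Spearman correlation depends only on ranks, samples with the same ordering of protein abundances are grouped together even if their absolute abundance levels differ.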
The lists of DAP and biomarkers more or less abundant in the HF bulls compared with the LF bulls were interrogated for enriched functional terms (false discovery rate <0.05) with ShinyGO v0.75 (Ge et al., 2020). Enriched terms were visualized through lollipop plots using the ggplot2 package (Wickham, 2016) and networks downloaded from the ShinyGO software.
Validation of the Biomarker Proteins With an External Population. Data from the sperm of bulls located in Denmark were used to validate the selected biomarkers. The training data consisted of the abundance data of the biomarker proteins in sperm from the bulls located in Ireland. In contrast, the abundance of those proteins in the sperm of bulls located in Denmark was employed to test the model. The testing data were normalized with the training data through an add-on batch effect adjustment with the bapred package (Hornung et al., 2017). The classifier was a support vector machine with linear kernels, employing the leave-one-out cross-validation (LOOCV) method as the internal control, and applied with the kernlab package (Karatzoglou et al., 2004), through the caret package (Kuhn, 2008).
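The leave-one-out procedure used as the internal control can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the linear support vector machine; this substitution is an assumption of the sketch, not the method of the study, and all names are hypothetical. Any classifier exposing the same `fit` and `predict` callables could be plugged in.

```python
def nearest_centroid_fit(X, y):
    """Compute one mean abundance vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def loocv_accuracy(X, y, fit=nearest_centroid_fit, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)
```

In this scheme every training sample is predicted exactly once by a model that never saw it, which gives an internal estimate of performance before the model is applied to the external population.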

RESULTS
There were 301 DAP in the sperm from HF and LF bulls located in Ireland, of which 106 and 195 proteins were more abundant in the HF or LF bulls, respectively (Figure 1A and Supplemental Table S2; https://figshare.com/articles/dataset/20367708; Rabaglino and Lonergan, 2022b), which separated samples from each group in the PCA plot (Figure 1B). This plot shows that samples belonging to the HF and LF bulls cluster apart in the first component, which captures the main source of variability among the samples, explaining 18.9% of overall variability when considering the 40 components. Samples from LF bulls are more spread in both the first and second component than samples from HF bulls, reflecting the larger standard deviation (SD) in the FI of LF bulls than of HF bulls (mean FI ± SD for: HF bulls: 0.065 ± 0.004; LF bulls: −0.064 ± 0.059). Those DAP that were more abundant in HF bulls enriched 15 biological terms mainly related to the axoneme and cilium assembly, and ribonucleotide binding (Figure 2A). In contrast, the 195 DAP that were more abundant in the sperm of LF bulls enriched 221 terms involved mainly in metabolism, proteasome, and cell-cell recognition, including binding of sperm to the zona pellucida (Figure 2B).
Application of the machine learning method selected 34 DAP after 27,061 iterations (Table 1). According to the abundance levels of these proteins, samples from HF and LF bulls clustered apart (Figure 3A). When samples from the MF bulls were considered, most of them tended to cluster with those from LF bulls, with only 2 clustering with the HF bulls, as can be appreciated in the hierarchical clustering (Figure 3B). Accordingly, the PCA plot showed, in the first component, that although samples from MF bulls clustered in between samples from HF and LF bulls, most of them were on the right side of the plot, together with samples from the LF bulls (Figure 3C). Functional analysis of the 19 biomarker proteins more abundant in HF Irish bulls revealed enrichment of ontological terms involved mainly in the axoneme and sperm motility (Figure 4). These terms were enriched because of the following proteins (coded by the following genes): E1BMD1 (CFAP43, cilia and flagella associated protein 43); E1BAJ3 (CCDC40, coiled-coil domain containing 40); and F1N5R7 (DNAH7, dynein axonemal heavy chain 7). The 15 biomarker proteins more abundant in LF Irish bulls enriched only 2 terms: hexose and monosaccharide metabolic process.
Construction of the machine learning model using the abundance levels of the 34 biomarker proteins identified in the samples from HF and LF bulls in the Irish population led to a prediction of the HF or LF bulls in the Danish validation population with 94.4% accuracy: all 8 LF samples were correctly classified as such, whereas 9 out of 10 HF samples were correctly classified as HF. The only HF sample misclassified corresponded to the bull with the lowest FI among the HF bulls (bull 64). This high accuracy could not be reached if the model was built using the abundance levels of all 301 DAP; although all 8 LF samples were predicted as such, only 3 of the 10 HF samples were correctly classified (61% accuracy). Selection of the 301 DAP was based on the raw P-value and, therefore, it is possible that several of those proteins were considered DAP when in fact they were not differentially abundant (i.e., false positives). Thus, a further selection of important sperm proteins for fertility through machine learning tools improved pinpointing those that would behave as such in an external independent population. This fact was reflected when the hierarchical clustering was performed on all 27 HF and all 31 LF samples from bulls in both populations. Considering the abundance of the 301 DAP, 3 LF samples clustered with the HF samples but 10 HF samples clustered with the LF samples (Figure 5A). However, only 3 HF samples clustered with the LF samples when the abundances of the 34 biomarkers were used to construct the hierarchical clustering (Figure 5B).
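The two accuracy figures follow directly from the classification counts reported above for the 18 Danish bulls. The short sketch below reproduces the arithmetic; the function and variable names are illustrative only.

```python
def accuracy(correct_per_class, total_per_class):
    """Overall accuracy: correctly classified samples over all samples."""
    return sum(correct_per_class.values()) / sum(total_per_class.values())

# Counts taken from the text: 8 LF and 10 HF bulls in the validation set.
totals = {"LF": 8, "HF": 10}
biomarker_model = {"LF": 8, "HF": 9}   # model built on the 34 biomarkers
dap_model = {"LF": 8, "HF": 3}         # model built on all 301 DAP

print(f"{accuracy(biomarker_model, totals):.1%}")  # 94.4%
print(f"{accuracy(dap_model, totals):.1%}")        # 61.1%
```

Both models classify every LF bull correctly, so the difference in accuracy comes entirely from the HF class (9 of 10 versus 3 of 10 correct).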

DISCUSSION
Genomically assisted selection has revolutionized how the dairy industry identifies and selects bulls for use in AI. However, even though it has increased the reliability of breeding values and shortened the generation interval, the reliable prediction of the field fertility of individual bulls is still challenging. This has never been more important, given the widespread use of first-season sires internationally before data are available on their field fertility. Identification of bulls of compromised fertility before their semen is released into the field is critical to avoid the economic effect of low pregnancy rates, particularly in seasonal systems of production with short breeding seasons where compact calving patterns are essential (Shalloo et al., 2014).
A reliable in vitro test, or a combination of tests predictive of bull field fertility, would enable AI companies to identify and eliminate bulls with potential LF before their widespread use in the field, thus improving overall reproductive and productive efficiency. Most AI centers use pre- and post-thawing assessment of the sperm through traditional microscopy-based methods, more modern computational approaches such as computer-assisted sperm analysis, or assays using fluorescent staining and high sample throughput, such as flow cytometry (Harstine et al., 2018). Nevertheless, in terms of pregnancy rate after AI, fertility in the field can vary by up to 20 to 30%, despite the semen being deemed acceptable after stringent quality control measures (Donnellan et al., 2022). Thus, there is still a need to identify sperm biomarkers that can reliably and repeatedly predict bull fertility. The use of omics technologies combined with traditional bioinformatic approaches, such as regression analysis for hypothesis testing, could determine potentially relevant molecules from the vast data generated through these techniques.
Furthermore, the application of machine learning tools could help identify the most important molecules, as these methods are not limited to the scores from the statistical test. In this sense, a random forest classifier is a powerful wrapper method for feature selection, particularly when applied with the Boruta algorithm (Kumar and Shaikh, 2017). In this study, we have employed both statistical and machine learning methods to select sperm proteins characterizing bulls classified according to their fertility in the field. We used the protein data of the sperm from bulls classified in both extremes (HF and LF) to avoid confusion in the data that could be introduced by bulls classified in the middle (MF). Semen collected from these animals had satisfactorily passed all the quality control measures carried out before and after thawing. In other words, animals classified as LF were not infertile, but, in the overall population, semen from LF bulls achieved a lower pregnancy rate than that from HF bulls. Among the various molecules that can be measured through high-throughput techniques, protein presence or abundance has been related to male fertility in several species (Druart et al., 2019; Griffin et al., 2020; Leahy et al., 2020; Mills et al., 2020), and may represent a better source than transcripts for finding biomarkers, as sperm are transcriptionally silent. Several reports have employed proteomics in sperm, seminal plasma, or both, from bulls with divergent fertility (Peddinti et al., 2008; D'Amours et al., 2010; Soggiu et al., 2013; Somashekar et al., 2017; Kasimanickam et al., 2019). In a recently published review, Klein et al. (2022) identified the common, more abundant proteins across several studies, in HF or LF bulls. Among the 28 DAP they identified in sperm, 23 are present among the sperm proteins detected in the current study. Six of the 23 proteins were differentially abundant between HF and LF bulls (P < 0.1), and 2 are consistent with the results found by other authors: CCT5 (chaperonin containing TCP1 subunit 5; P = 0.005) and PEBP4 (phosphatidylethanolamine binding protein 4; P = 0.06), which were more abundant in LF or HF bulls, respectively. The abundance of CCT5 in sperm decreased with fertility in bulls from the Irish population, whereas it tended to decrease in bulls from the Danish population (Figure 6A), consistent with previous findings (D'Amours et al., 2010; Kasimanickam et al., 2019). This protein is a chaperonin involved in protein folding, particularly of actin and tubulin (Sternlicht et al., 1993). Cytoplasmic CCT in the spermatid should be discarded in residual bodies at spermiation, at least in rats (Souès et al., 2003). Therefore, the higher abundance of this protein in sperm samples could indicate an incomplete process during spermatogenesis, potentially compromising fertility.
Apart from CCT5 and PEBP4, the other 21 proteins in common between our study and that of Klein et al. (2022) were not differentially abundant or did not follow the same trend in our study, most likely due to the many factors involved in fertility and differences in animal and sperm handling across the experiments. Among these factors are the different diluents used during the cryopreservation of sperm; nonetheless, it is worth noting that we applied a bioinformatic approach to remove the potential effect of the diluent, as it has been shown that it can affect sperm parameters (Murphy et al., 2018).
Sperm proteins have great potential as predictive biomarkers of fertility. Nevertheless, although each new study provides increased clarity, the complex nature of male fertility, and the multifactorial nature of the causes of subfertility, mean that the selection of a single marker of male fertility is unlikely. In contrast, the combined abundance levels of many proteins can lead to a reliable and repeatable indicator of fertility. In this study, biomarker proteins were selected in one of the bull populations (Ireland) and validated in the other (Denmark), classifying the sperm samples correctly from this latter population with 94.4% accuracy. Among the proteins selected as potential biomarkers, CFAP43, CCDC40, and DNAH7 may have a major role in fertility, as they are involved with axoneme assembly and cilium movement, including sperm motility and regulation of the beating frequency. Indeed, the abundance of these proteins increased with increasing fertility in the sperm from bulls in the Irish population. For bulls in the Danish population, the abundance of DNAH7 increased, whereas CCDC40 tended to increase in the sperm of HF bulls compared with LF bulls. The abundance of CFAP43 followed the same pattern, but it was not different (P = 0.1; Figure 6B-D). Although these proteins have not previously been associated with fertility in cattle, several studies in other species have found an association between mutations in the coding genes and infertility. Two independent studies have reported that mutations in CFAP43 and CFAP44 are related to male infertility in humans and mice because of axonemal disorganization in the sperm tail, which can lead to 100% immotile, morphologically abnormal sperm if the mutation is biallelic (Tang et al., 2017; Coutton et al., 2018). Mutations in CCDC39 and CCDC40 are involved in primary ciliary dyskinesia, characterized by abnormal ciliary motility because of axonemal disorganization (Blanchon et al., 2012; Liu et al., 2021). Finally, mutations in DNAH7, which is part of the ciliary or flagellar axonemal inner dynein arm-heavy chain (Neesen et al., 1997), are a cause of idiopathic asthenozoospermia in men (sperm with low motility) given the defects in the structure and function of the sperm flagellum (Wang et al., 2020; Wei et al., 2021). The sperm of bulls employed in this study and classified as LF passed the commercial quality control parameters, including progressive motility. This would suggest that the issues leading to LF were postinsemination events, for example, sperm transport in the female reproductive tract, the ability to fertilize the oocyte, or the ability to sustain the embryo.

CONCLUSIONS
The development of an accurate and robust bull fertility prediction model is challenging. Ultimately, the combination of state-of-the-art genetic, physiological, immunological, and molecular approaches, combined with in vitro sperm functional and bioinformatic analyses, will likely lead to a better prediction of field fertility. In this study, we screened the sperm proteome in bulls with high or low field fertility to identify potential biomarker proteins. Application of bioinformatics and machine learning methods identified 34 sperm proteins, the abundance of which could predict fertility in an independent population of bulls with 94% accuracy. Some of these proteins, more abundant in HF bulls, were involved in sperm motility. Thus, this study introduces a set of proteins that could help to discriminate the semen from bulls with undesirable performance when used in AI.

Figure 1. Differentially abundant proteins (DAP) between the sperm of high fertility (HF) and low fertility (LF) bulls. (A) The volcano plot shows the DAP more abundant in HF bulls (106 red dots) or LF bulls (195 green dots). (B) Principal component (PC) analysis plot depicting distribution of samples from HF and LF bulls according to the DAP. FDR = false discovery rate; FC = fold change.

Figure 2. Functional analysis of differentially abundant proteins (DAP) between the sperm of high fertility (HF) and low fertility (LF) bulls. The lollipop plots illustrate the main 10 ontological terms enriched with the DAP in (A) HF bulls or (B) LF bulls. N Genes refers to the number of genes involved in the ontological term, also represented by the size of the dot. The significance of the overrepresentation for each term is depicted by the length of the line and the color: the longer the line and the redder, the more significant the enrichment for that term. FDR = false discovery rate.

Figure 3. Distribution of samples according to the abundance levels of the 34 sperm proteins selected as biomarkers. Hierarchical clustering of the samples from (A) high fertility (HF) and low fertility (LF) bulls (red and blue colors, respectively) and (B) including the medium fertility (MF) bulls (green color). (C) Principal component (PC) analysis plot for samples from HF, MF, and LF bulls.

Figure 4. Functional analysis of 19 sperm proteins selected as biomarkers more abundant in samples from high fertility bulls when compared with low fertility bulls. The lollipop plot illustrates the main 10 ontological terms. N Genes refers to the number of genes involved in the ontological term, also represented by the size of the dot. The significance of the overrepresentation for each term is depicted by the length of the line and the color: the longer the line and the redder, the more significant the enrichment for that term. FDR = false discovery rate.

Figure 6. Box plots showing the abundance levels for selected proteins in the sperm of bulls with high fertility (HF; red boxes), medium fertility (MF; green boxes), and low fertility (LF; blue boxes). The top or bottom plots correspond to data from bulls from an Irish or Danish population, respectively. CCT5: chaperonin containing TCP1 subunit 5; CFAP43: cilia and flagella associated protein 43; CCDC40: coiled-coil domain containing 40; and DNAH7: dynein axonemal heavy chain 7. The solid or dotted horizontal bars show differences (P < 0.05) or a tendency toward a difference (P > 0.05 and < 0.1), respectively, in protein abundance between groups after ANOVA. Upper and lower edges of boxes represent the third and first quartiles, respectively. The midlines show the median, or second quartile. The upper and lower whiskers extend from the third and first quartile to the maximum and minimum values, respectively. Dots represent extreme values, i.e., higher or lower than the third or first quartiles, respectively, plus 1.5 times the interquartile range (difference between the third and first quartile).

Rabaglino et al.: SPERM PROTEINS AS BOVINE FERTILITY BIOMARKERS

Table 1. Differentially abundant proteins between the sperm of high fertility and low fertility bulls selected as biomarker proteins according to a random forest algorithm