Underlying genetic architecture of resistance to mastitis in dairy cattle: A systematic review and gene prioritization analysis of genome-wide association studies

Mastitis, the most frequent disease in dairy cattle. Resistance to mastitis is a complex, polygenic trait controlled by several genes, each with small effects. Genome-wide association studies have been widely used to identify genomic variants associated with complex traits, including resistance to mastitis, to elucidate the underlying genetic architecture of the trait. However, no systematic review and gene prioritization analysis have been conducted to date on GWAS results for resistance to mastitis in dairy cattle. Hence, the ob-jective was to perform a systematic review and gene prioritization analysis of GWAS studies to identify potential functional candidate genes associated with resistance to mastitis-related traits in dairy cattle. Four electronic databases were searched from inception to December 2020, supplemented with multiple sources of gray literature, to identify eligible articles. Annotation for genes and quantitative trait loci (QTL), and QTL enrichment analysis were conducted using GALLO. Gene prioritization analysis was performed by a guilty-by-association approach using GUILDify and ToppGene. From 52 articles included within this systematic review, 30 articles were used for further functional analyses. Gene and QTL annotation resulted in 9,125 and 43,646 unique genes and QTL, respectively, from 39 studies. In general, overlapping of genes across studies was very low (mean ± SD = 0.02% ± 0.07%). Most annotated genes were associated with somatic cell count-related traits and the Holstein breed. Within all annotated genes, 74 genes were shared among Holstein, Jersey, and Ayrshire breeds. Approximately 7.5% of annotated QTL were related to QTL class “health.” Within


INTRODUCTION
Mastitis is the most common and expensive infectious disease in dairy cattle (Halasa et al., 2007). It is caused by IMI with pathogenic bacteria in the udder which, if not cleared, can manifest as subclinical mastitis (SCM) or clinical mastitis (CM). The prevalence of IMI ranges from 17 to 76% depending on lactation stage and parity (Oliver et al., 2003;Piepers et al., 2007;Narayana et al., 2021), whereas the prevalence of SCM in first-parity Canadian Holstein heifers in early lactation ranges from 13 to 24% (Narayana et al., 2018). Incidence rate of CM ranges from 23.7 cases per 100 cows per year in Canada (Levison et al., 2016), to 32.5 cases per 100 cows per year in the Netherlands (Santman-Berends et al., 2015), and 39.6 cases per 100 cows per year on large dairy farms in China (Gao et al., 2017). Mastitis can affect animal welfare and farm profitability through reduced milk yield, milk quality, and treatment costs (Halasa et al., 2007). Costs associated with mastitis have been estimated at Can$66,178/100 cows per year for a typical Canadian dairy farm (Aghamohammadi et al., 2018).
Effectiveness of traditional methods to control mastitis (e.g., antimicrobial therapy and disinfectants) is relatively low (Pyörälä, 2002). Therefore, selective genetic breeding for enhanced mastitis resistance is a potential strategy to control mastitis, together with better management practices. Genetic improvement is cumulative and results in permanent and cost-efficient changes. However, the success of genetic selection of a trait depends on its heritability and additive genetic variation. Because the heritability of resistance to mastitis is low (≤0.05; Koeck et al., 2012;Narayana et al., 2018Narayana et al., , 2021, genomics provides exceptional opportunities to increase the frequency of favorable alleles in the population, compared with conventional pedigree-based breeding. This was demonstrated by an increase in genetic gain of 0.46 (standard units) when comparing the past 5 years to the 5 years before introduction of genomic selection for mastitis resistance (CDN, 2019). In addition, identifying genomic variants associated with mastitis resistance aids in understanding its genetic architecture (Tiezzi et al., 2015).
Genome-wide association studies are advanced techniques commonly used to detect significant genetic markers (SNPs, haplotypes) associated with complex traits, resulting in identification of QTL and candidate genes. Resistance to mastitis is a complex and polygenic trait controlled by several genes, each with small effects (Rupp and Boichard, 2003;Tiezzi et al., 2015). Therefore, numerous studies have performed GWAS of CM (Tiezzi et al., 2015;Welderufael et al., 2018;Kurz et al., 2019) and SCC, or its log-transformation SCS, an indicator trait of SCM (Durán Aguilar et al., 2017;Oliveira et al., 2019) to identify genetic variants associated with resistance to mastitis and to understand its architecture. Several different as well as a few similar genetic variants associated with CM and SCC-related traits have been reported. However, these genetic variants depended on the phenotype, population, breed, genotyping, and GWAS approach used for the analysis. This has resulted in a long list of positional candidate genes associated with resistance to mastitis, although only a few genes have been validated (Leyva-Baca et al., 2007;Chen et al., 2015). In the absence of appropriate prioritization analysis, a long list of candidate genes causes pitfalls in confirming potential functional candidate genes. A review of reported SNPs associated with SCC, using 7 published GWAS studies (Chen et al., 2015), did not use a systematic approach to select relevant literature and there were no formal comparisons between reported genes. More recently, several other studies have conducted GWAS and reported novel and similar genomic variants associated with resistance to mastitis. Therefore, there is a need for a systematic review of GWAS on resistance to mastitis and prioritization analyses to pinpoint potential functional candidate genes to improve mastitis resistance through genetic selection.
To our knowledge, there are no formal systematic reviews nor gene prioritization analyses on this topic. Given this knowledge gap, our objectives were to: (1) conduct a systematic review of GWAS on resistance to mastitis; and (2) perform gene prioritization analysis of GWAS results of resistance to mastitis to identify potential functional candidate genes.

MATERIALS AND METHODS
No human or animal subjects were used, so this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

Systematic Review Search Strategy
This systematic review followed a pre-specified study protocol and the standards of preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement (Page et al., 2021). A combination of online databases and conference proceedings were searched to identify relevant literature. Online databases searched included CAB Abstracts (EBSCO), MEDLINE (OVID), Web of Science, and BIOSIS (Web of Science) from the database inception to December, 2020, with

Gene Prioritization Analysis
A "guilt by association" based gene prioritization analysis (Walker et al., 1999) was conducted using "trained" (genes obtained from GUILDify software; Guney et al., 2014;Aguirre-Plans et al., 2019) and "test" genes' list (positional candidate genes derived from annotation of chosen articles from the systematic review) using GUILDify and ToppGene. The top 100 ranked trained genes (after removing the 36 common genes with test gene list, GUILD score ≥0.7322) were obtained from GUILDify v2.0 Web Server. GUILDify uses a Biologic Interaction and Network Analysis (BI-ANA) knowledge database to retrieve genes associated with the outcome (mastitis or SCC-related traits), using given keywords related to the outcome. Keywords used in the present study were; "mastitis," "subclinical mastitis," "somatic cell count," "somatic cell score," "SCC," "SCS," "intramammary infection," "lactation," "milk," and "mammary gland." Subsequently, GUILDify uses selected genes and species-specific (Homo sapiens) protein interaction network and applies graph theory algorithms to prioritize genes.
Only genes annotated from studies with ≥400 samples were used for the prioritization analysis to minimize a potential sample size effect. To obtain the test gene list, first, for the positional candidate genes without gene name, human orthologs in Ensembl database 106 were used to derive gene names. If the percentage of id. query gene identical to target human was ≥70%, the human gene name was used for further analyses. Second, final sets of test and trained gene lists were simultaneously used in ToppGene (Chen et al., 2009), which performs functional annotation-based prioritization analysis using a fuzzy-based multivariate approach to estimate the relationship between any 2 genes, based on semantic annotations. Multivariate analyses were performed using functional information shared between test and trained gene sets derived from various sources including: Gene Ontology (GO) terms for molecular function (MF), biological process (BP), and cellular component (CC); human and mouse phenotypes; metabolic pathways; PubMed publications; co-expression pattern; and diseases. An overall P-value per functional candidate was estimated using a statistical meta-analysis where P-values for each functional annotation of each gene were combined into a final P-value. Finally, prioritized genes were selected after false discovery rate (FDR) multiple correction of 5% from significant genes.

Gene Ontology, Metabolic Pathway, and Gene Network Analyses
Gene ontology and metabolic pathway analyses for Bos taurus were performed using Over-Representation analysis in WebGestalt (Liao et al., 2019) for prioritized genes. Gene ontology analyses were performed for 3 GO term categories: BP, MF, and CC using a nonredundant function database. Additionally, metabolic pathway analyses were performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway within the WebGestalt database. Gene network analyses were conducted for the list of prioritized genes using STRING database (Szklarczyk et al., 2015) using co-expression as an interaction source with a medium confidence (0.400) interaction score. The resulting network was imported into Cytoscape 3.9.0 (Shannon et al., 2003), and the Cytoscape network analyzer plugin with undirected network criteria was used for network analyses. Key genes (nodes) were selected from proteins with degree ≥8, betweenness centrality ≥0.28, and closeness centrality ≥0.62.

Systematic Review
A summary of the search strategy is displayed in Figure 1. A total of 4,877 studies were identified from the databases (CAB Abstracts, MEDLINE, Web of Science and BIOSIS) and gray literature (115) search. After exclusion of duplicates (n = 2,828), 2,164 abstracts were screened separately by Saranya Narayana and Ellen de Jong from which 1,471 studies were excluded. A total of 693 studies were included in the full-text review, where 641 were excluded due to facts such as: not original study, duplication, analyzed population was not dairy cattle, outcome was not resistance to mastitis, did not use GWAS, or did not have full-text available. In total, 52 unique publications met our inclusion criteria. Descriptive details of these 52 articles, including country, breed and phenotype (Table 1), were extracted. Most of the studies were from Europe (34 studies) and North America (11 studies). Of the 52 manuscripts, 38 and 40 reported on studies using SCC-related traits as phenotype and Holstein breed, respectively, for GWAS analysis. We also extracted other details regarding genotyping (SNP chip, assembly), quality control summary (call rate, minor allele frequency and Hardy-Weinberg threshold; Table 2), and GWAS summary(e.g., approach used for GWAS, number of animals, and SNPs used for GWAS, and significant SNPs per windows identified; Table 3). Most of the studies used 50K SNP chip (31 studies) with 3 different versions. Of 52 manuscripts, 27 and 7 studies used assembly UMD 3.1 and UMD 3.1.1, respectively, whereas only 2 studies used the new bovine assembly ARS-UCD 1.2. Number of animals used for GWAS ranged from 34 to 293,467.
Moreover, significant markers or windows associated with CM and SCC-related traits and their coordinates were extracted for further analyses. Functional analyses, such as gene and QTL annotation, QTL enrichment analysis, GO terms enrichment, and KEGG pathway analyses, were conducted for 40 studies, with 37 and 7 studies reported markers and windows, respectively, and 4 studies reporting both. Twelve studies were excluded in further analysis due to not reporting the significant makers or windows or inability to convert reported coordinates to new bovine assembly, due to inadequate information.

Conversion of Genomic Coordinates to ARS-UCD1.2.
All markers and windows identified in the 40 studies are provided in Supplementary Material 2 (https: / / doi .org/ 10 .7910/ DVN/ HNKBJS; Narayana et al., 2022). During conversion of genomic coordinates to new bovine assembly (ARS-UCD1.2) 1, 7, 1, and 1 significant markers from Jiang et al. (2019), Oliveira et al. (2019b), Pegolo et al. (2020), andTribout et al. (2020), respectively were deleted. Likewise, 8, 1, and 9 windows from Durán Aguilar et al. (2017), Meredith et al. (2013), and Oliveira et al. (2019b), respectively, were deleted during the conversion. These markers and windows were deleted due to one of the following reasons: previously reported genomic coordinates did not align well with the new bovine assembly or partially aligned, but not enough to convert, or the region was split up into various parts. After conversion, an additional 8 markers were removed from the data set as they did not have chromosome number [7 from Cole et al. (2011)

Annotation of Markers and Windows.
A total of 19,403 positional candidate genes (10,624 genes from markers and 8,779 genes from windows) were annotated for markers and windows, resulting in a total of 9,125 unique positional candidate genes within 0.  Mulder et al. (2013). Moreover, genes GC, ENSBTAG00000049290, and NPFFR2 were reported by 9, 8, and 7 studies, whereas the remaining genes were reported by ≤6 studies.
In terms of traits associated with resistance to mastitis, 52 types of traits were derived from 39 studies. These traits were mainly grouped into CM and SCC-related traits. Many annotated genes (8,664; unique genes per trait) were related to SCC-related traits, whereas fewer genes (930; unique genes per trait) were associated with CM-related traits ( Figure 2a). However, 469 genes were shared between those 2 traits. Among 13 studies that conducted CM-related GWAS, in general, gene overlap among studies was low (mean ± SD = 0.02 ± 0.10; Figure 2b). However, among those studies, Sahana et al. (2013) and Fang et al. (2018) shared 39 genes. Similarly, gene overlap across studies which performed GWAS on SCC-related traits was in general also low Narayana et Narayana et al., 2022). Interestingly, Durán Aguilar et al. (2017) who used copy number variations of SCS to perform GWAS shared a high number of genes with Cole et al. (2011) and Oliveira et al. (2019a,b). In terms of breeds used for GWAS analysis of resistance to mastitis, there were 5,843, 2,311, and 1,915 genes annotated for Holstein, Jersey and Ayrshire, respectively (the 3 main breeds; Figure 3). Furthermore, 74 common genes were identified among the 3 main breeds in the present study. QTL annotation yielded 132,140 QTL (87,737 QTL from markers and 44,403 QTL from windows) which resulted in 43,646 unique QTL from 39 studies. A high number of QTL of 8,513, 6,450 and 2,780 were located in BTA 14, 6 and 20, respectively. A high proportion of 64% of annotated QTL were associated with milk QTL class, whereas only 7.5% was associated with health QTL class ( Figure 4). Moreover, within health QTL class, 2.6 and 2.2% of QTL were associated with CM and SCC-related traits (SCC_SCS; Figure 5). Interestingly, a low percentage of QTL was also related to susceptibility to other diseases such as ketosis, bovine tuberculosis, and bovine respiratory disease. QTL enrichment analysis demonstrated that a high number of enriched QTL was associated with SCS in BTA 5, 6, 16, and 20 ( Figure 6). Additionally, QTL enrichment analysis for CM resulted in fewer significant QTL on BTA 8, 10, 18, and 25.

Gene Prioritization Analysis
Thirty studies (Supplementary Material 3; https: / / doi .org/ 10 .7910/ DVN/ HNKBJS; Narayana et al., 2022) with ≥400 samples were used for the prioritization analysis, which resulted in 7,280 unique positional candidate genes (5,188 with unique gene symbol). A total of 1,689 genes from 7,280 unique positional candidate genes did not have a gene symbol. Therefore, the human gene symbol was retrieved for 197 unique genes from Ensembl if the query gene was ≥70% identical to the human gene. Thirty-six genes of the 197 genes were common with 5,188 genes. After removing the duplicate genes, a total of 5,349 test genes with gene symbol (5,188 + 197-36) Narayana et al., 2022) were used for the analyses, due to unavailability of some of the genes in the ToppGene software. Gene prioritization analy-  (Figure 7). The greatest number of prioritized genes were reported for Holsteins (243), and 139 and 102 genes were reported for Jersey and Ayrshire, respectively, whereas 5 genes (ARC, IL7, ITPR2, MAP2K1, and MC3R) were shared among all 3 breeds.

DISCUSSION
This systematic review identified 52 unique GWAS on resistance to mastitis, of which 30 (sample size ≥400) were used for further gene prioritization analysis. To our knowledge, this is the most comprehensive

337
review of this topic to date and provides valuable insight into the gene architecture of resistance to bovine mastitis. Heterogeneity was high among the 52 included manuscripts, in terms of population, breed, phenotype, number of animals used, genotyping, GWAS quality control, and GWAS approach. Most studies were from Europe (34 studies), with fewer from North America (11 studies). Holstein was the most common breed, as it is the most prevalent breed in dairy farming, particularly in North America and Europe. Many studies (38) used SCC-related traits for GWAS analysis, as SCC is recorded regularly, readily available, and costeffective compared with obtaining the CM phenotype. As identification of significant genomic variants in a GWAS differs depending on the traits, hence genes associated with mastitis resistance-related traits will also differ. Therefore, we did not impose any restriction on number of samples, GWAS approach, software and population or breed used for the GWAS during the annotation process to capture the breadth of published QTL and genes. Heterogeneity among studies was considered 1 reason for the low overlap of genes among studies. Because many (31 out of 40 studies) studies used SCC-related traits for GWAS analysis, a high number of annotated genes were related to SCC-related traits compared with CM-related ones. However, in this study, 469 genes were common to both SCC-related traits and CM, indicating that even though SCM and CM were controlled largely by separate genes, some genes regulated both SCM and CM. Annotation analysis of QTL resulted in a high percentage (64%) of QTL in the "milk" QTL class and low percentage (7.5%) in the "health" QTL class, perhaps due to publication bias. Milk-related traits are more researched and published, resulting in more QTL in the "milk" QTL class database. To account for this bias, QTL enrichment analysis was performed, where a hypergeometric test was conducted using the number of annotated QTL within the candidate regions and the total number of the same QTL in the QTL database using the GALLO (Fonseca et al., 2020). Moreover, within the "health" QTL class, annotated QTL were also associated with other health traits. This provided evidence for genetic correlation for susceptibility between mastitis and other diseases such as ketosis and bovine tuberculosis (Bermingham et al., 2010;Koeck et al., 2012) and pleiotropic effects of these regions over various health traits.
Gene prioritization analysis conducted in this study used several sources of functional information, as follows: GO terms for MF, BP, and CC; human and mouse phenotypes; metabolic pathways; PubMed publications; co-expression pattern; and diseases to prioritize genes, which promoted identifying interesting functional candidate genes. Many prioritized genes (197; 46%) were from the cytokine superfamily, such as chemokines, interleukins, transforming growth factors, and tumor necrosis factor genes. Inflammatory mediators such as chemokines and other cytokines produced by mammary epithelial cells, drive leucocytes from the blood stream to mammary gland to confer protection against invading foreign pathogens that cause mastitis (Rainard and Riollet, 2006).
Chemokines and their receptors have vital roles in development and homeostasis of the immune system, especially in maintaining innate and adaptive immunity (Gangur et al., 2002;Raman et al., 2011;Hughes and Nibbs, 2018). Chemokines belong to a family of chemoattractant cytokines, small structurally related proteins secreted by cells (Matsushima, 2000;Mollica Poeta et al., 2019). Eighteen chemokines and their receptors were identified among prioritized genes. The first ranked chemokine receptor CCR2 (C-C Motif Chemokine Receptor 2) gene located on BTA 22 was identified by Qanbari et al. (2014) for the Fleckvieh breed and was associated with SCC-related traits. Interestingly, Leyva-Baca et al. (2007) demonstrated that the CCR2 gene had a significant (at the comparison-wise level) allele substitution effect (−0.04 ± 0.02, P = 0.05) on SCS in Canadian Holstein bulls. However, in the present study, CCR2 was not annotated for the Holstein breed. Perhaps manuscripts included in this study that used Holstein breed for GWAS did not identify genomic coordinate (52945500 bp) that harbor the CCR2 gene. Some of the interesting BP nonredundant GO terms related to CCR2 were immune response related, regulation of response to external stimulus, cell activation and adhesion secretion, and locomotion (FDR <1e −15 ). Molecular process GO terms associated with CCR2 were cytokine receptor binding, G protein-coupled receptor binding, cytokine receptor activity, and cytokine binding (FDR ≤2.69e −7 ). Moreover, CCR2 was involved in  cytokine-cytokine receptor interaction, and chemokine signaling KEGG pathways (FDR ≤2.90e −14 ).
Other interesting chemokine and receptor genes were CCL2, CCR5, CXCL10, CCR6, CXCL9, CXCL12, CCR10, CCL3, CXCR5, CCRL2, CCL11, CCL4, CXCL11, CCL28, CXCL8, CCL24, and CCL26. Genes CCL2 and CXCL10, which are involved in initial recruitment of leucocytes, were more highly expressed in bovine mammary epithelial cells with mastitis induced by Escherichia coli than with mastitis induced by Staphylococcus aureus (Gilbert et al., 2013). Genes CCL2, CXCL10, and CCL3 were significantly upregulated after an intramammary infusion of lipopolysaccharide containing E. coli in a mouse model (Zheng et al., 2006). Chemokine receptor CCR5 is the receptor for leukocidins secreted by S. aureus such as γ-hemolysin (HlgAB) and leukotoxin ED (LukED) during bovine mastitis (Alonzo III et al., 2013;Vrieling et al., 2016). C-C motif chemokine receptor 6 (CCR6) only binds with CCL20 chemokine (Schutyser et al., 2003), expressed in udder and mammary epithelial cells during E. coli-induced mastitis (Günther et al., 2009;Günther et al., 2010), and the ligand-receptor pair is involved in a humoral immune response (Schukken et al., 2011). Recently, CXCL9 was reported to be a potential molecular marker of mastitis caused by infection with S. aureus in Chinese Holstein dairy cattle . Gene CXCL12 was highly differentially expressed between control quarters from cows infected with S. aureus than by E. coli . Expression of CCRL2 was reported in the mammary gland and mammary epithelial cells (Suzuki et al., 2015). Additionally, CCL4 and CXCL8 genes were identified in mammary tissue of cattle as hub genes with important roles in inflammation and immune response (Gorji et al., 2019).
Approximately 17 interleukin and receptor-related genes were among 427 prioritized genes. Protein encoded by interleukin 6 cytokine family signal trans- ducer (IL6ST) on BTA20, associated with SCC-related traits, was ranked 11th among prioritized genes. The protein encoded by IL6ST binds with proinflammatory cytokines such as IL6 which has an important role in immune response during mastitis. Concentrations of IL6 were higher in milk and blood of cows with SCM infected with coagulase-negative staphylococci (432.09 and 254.32 pg/mL) compared with healthy cows (164.47 and 13.02 pg/mL, respectively; Bochniarz et al., 2017). In the present study, IL6 was not annotated; however, only IL6ST was identified from Meredith et al. (2013) in Holsteins. Perhaps genomic coordinates that harbor IL6 were not reported in studies that included annotation analysis. Other interleukin genes (e.g., those coding for IL15, IL5, IL10, IL3, IL7, IL19, IL20, IL24) and its receptors (e.g., IL12RB1, IL23A, IL7R, IL10RB, IL31RA, IL10RA, IL17B, and IL21R) were some of the interesting prioritized genes. Interleukins 15 gene was differentially expressed in blood mononuclear cells and milk somatic cells of cows with chronic S. aureus mastitis versus healthy controls in Canadian Holsteins cows (Tao and Mallard, 2007). Interleukins 10 gene was highly expressed in Holstein and Gyr cows with CM compared with healthy cows (Fonseca et al., 2009). A similar result was also reported by Faaz and Abdullah (2022), where authors stated that the expression of IL10 was higher (P < 0.001) in cows with CM than in healthy animals. Interleukins 19 gene was upregulated in the breast milk somatic cells of women with mastitis as a response to probiotic (de Andrés et al., 2018).
Genes ARC, GNRHR, MC3R, and PRLR were identified in 4 studies, whereas other prioritized genes were Narayana et al.: GENETIC ARCHITECTURE OF MASTITIS RESISTANCE Figure 6. Top 25 traits identified in QTL enrichment analysis. Area of bubbles represents the number of observed QTL for that trait, and color of bubbles represents the P-value scale (darker color indicates smaller P-value). P < 10 −50 were truncated to 10 −50 . Richness factor on the x-axis represents the ratio of number of QTL and expected umber of that QTL.
identified in <4 studies. Gene ARC located on BTA 14 was identified by Oliveira et al. (2019b), Oliveira et al. (2019a), Tiezzi et al. (2015), and Wang et al. (2015). Additionally, ARC was ranked 402nd among prioritized genes and some BP GO terms related to ARC genes were positive regulation of transport, regulation of signaling receptor activity, ion transport, and transmembrane transport (FDR <1e −15 ). Interestingly, ARC was associated with both CM and SCC-related traits and identified in both Holstein, Jersey, and Ayrshire cows.
Prolactin receptor (PRLR; 142nd ranked prioritized gene) located on BTA20 is a protein coding gene that belongs to the type I cytokine receptor family; it was identified by Sahana et al. (2013), Jiang et al. (2019), Oliveira et al. (2019a), and Oliveira et al. (2019b). Gene PRLR was associated with both CM and SCC-related traits and identified in both Holsteins and Jerseys. Prolactin receptor (PRLR) transmits signal from prolactin to milk protein genes (Brym et al., 2005). Many studies reported associations of PRLR with milk performance  traits (Brym et al., 2005;Viitala et al., 2006;Zhang et al., 2008). Fontanesi et al. (2014) reported that gene PRLR was associated with milk, fat yield, protein yield, protein percentage, SCC, longevity index, fertility index and productivity, functionality, and type index, except for fat percentage. Authors also stated that favorable alleles for yield traits were unfavorable for SCC and longevity and fertility index. In the present study, the PRLR gene was involved in cytokine receptor activity MF GO terms and PI3K-Akt signaling pathway, cytokine-cytokine receptor interaction, neuroactive ligand-receptor interaction, and JAK-STAT signaling pathway KEGG pathway (FDR <1e −15 ). Gonadotropin releasing hormone receptor (GNRHR; 255th) on BTA6 was identified in Abdel-Shafy et al. (2014), Sahana et al. (2013), Cole et al. (2011) andJiang et al. (2019). Gene GNRHR was associated with both CM and SCC-related traits and was identified only in Holstein in the current study. Gene GNRHR was reported to be associated with SCC in 7 different population of Holsteins (Chen et al., 2015) and mastitis resistance in Canadian Holsteins (Grossi et al., 2014). Melanocortin 3 receptor (MC3R; 346th) gene on BTA13 was reported by Oliveira et al. (2019b), Cole et al. (2011), Fang et al. (2018 and Sahana et al. (2013) as being associated with SCC-related traits and CM in Ayrshire, Holsteins, and Jersey breeds, respectively. However, MC3R gene has not been explored widely related to mastitis.
Most prioritized genes (397) were associated with SCC-related traits, as many studies used SCC-related traits for their GWAS. In that regard, obtaining the SCC-related phenotype is easier and cheaper compared with obtaining CM phenotype. In the present study, 24 genes (ABCC9, ACHE, ADCYAP1, ARC, BCL2L1, CDKN1A, EPO, GABBR2, GDNF, GNRHR, IKBKE, Narayana et al.: GENETIC ARCHITECTURE OF MASTITIS RESISTANCE Figure 8. Co-expression gene network analysis produced using STRING (Szklarczyk et al., 2015) and Cytoscape (Shannon et al., 2003), consisting of 48 genes and 92 edges. JAG1, KCNJ8,KCNQ1,LIFR,MC3R,MYOZ3,NFKB1,OSMR,PPP3CA,PRLR,SHARPIN,and TNFRSF25) were identified as associated with both CM and SCC-related traits. From these genes, KCNQ1 was among the top 50 prioritized genes (38th). This gene is located on BTA29 and was identified by Cole et al. (2011) and Marete et al. (2018b) in the Holstein breed as associated with CM and SCCrelated traits. This gene is a protein coding gene and encodes a voltage-gated potassium channel required for repolarization phase of the cardiac action potential (Stelzer et al., 2016). In human mammary epithelium, KCNQ1 gene plays an important role in regulating cell volume and potentially mediating transepithelial K + secretion (vanTol et al., 2007). Moreover, most of the top 50 prioritized genes were from Oliveira et al., (2019a,b). These authors used the same population of 3 main breeds (Holsteins, Jersey, and Ayrshire) for both analyses, and as a result shared more genes.
Most annotated genes (243) were associated in Holsteins, the most prevalent dairy breed. In the present study, genes ARC, IL7, ITPR2, MAP2K1, and MC3R were common for Holsteins, Jerseys, and Ayrshires. Gene IL7 is located on BTA14 and ranked 186th among prioritized genes. Interleukin 7 is a cytokine protein coding gene, and its protein is important for development of both B and T cells (Stelzer et al., 2016). Similar to the current study, IL7 gene was identified to be associated with SCC in 7 different Holstein population (Chen et al., 2015). In the present study, IL7 gene was reported to be involved in pathway in cancer, PI3K-Akt signaling pathway, cytokine-cytokine receptor interactions, and JAK-STAT signaling pathway (FDR <1e −15 ) KEGG pathways and positive regulation of intracellular signal transduction, immune system development, positive regulation of immune system process, response to cytokine, and regulation of cell activation, adhesion, and differentiation BP GO terms (FDR <1e −15 ). Gene ITPR2 on BTA5 (342nd) encodes a protein that mediates a rise in cytoplasmic calcium in response to receptor activated production of inositol triphosphate (Stelzer et al., 2016). Gene ITPR2 was reported to be associated with SCM in Norwegian Red cattle (Kirsanova et al., 2020). Gene MAP2K1 on BTA10 encodes kinase protein, which is involved in many cellular processes, such as proliferation, differentiation, transcription regulation, and development (Stelzer et al., 2016). Gene MAP2K1 was identified in dairy cows with mastitis caused by E. coli  and Staph. aureus ). In the current study, gene MAP2K1 was involved in pathways in cancer, PI3K-Akt signaling pathway, MAPK signaling pathway, Kaposi sarcoma-associated herpesvirus infection and proteoglycans in cancer KEGG pathways (FDR <1e −15 ). However, LD phase across breeds needs to be considered when utilizing these genes for genomic predictions across breeds .
Gene network analysis is important in understanding the shared biological processes among prioritized candidate genes. Gene network analysis based on coexpression as interaction source provides insights into genes that are expressed in similar magnitude for similar stimulus and are controlled by the same pathway or transcriptional regulatory program and are functionally related (Weirauch, 2011). Of 427 prioritized genes in this study, 48 were identified as co-expressed during mastitis and were primarily involved in various immune-related pathways: pathways in cancer, cytokine-cytokine receptor interaction, PI3K-Akt signaling pathway, MAPK signaling KEGG pathways primary immunodeficiency, chemokine signaling, T-cell receptor signaling, toll-like receptor signaling, and Th17 cell differentiation (FDR ≤1.74e −5 ). Six key genes (CSF1R, CD3E, CXCL10, STAT1, CD3D, and IRF1) identified in co-expression network analysis were more informative, as they had high interactions with other co-expressed genes. Colony stimulating factor 1 receptor (CSF1R; 153rd; degree = 12, betweenness centrality = 0.93, closeness centrality = 1.00) is a cytokine receptor that encodes protein that serves as receptor for colony stimulating factor 1 (CSF1), with important roles in production, differentiation, and function of macrophages (Stelzer et al., 2016;Sayers et al., 2021). Thus, the CSF1R gene has an important part in the process of innate immunity and inflammation. Gene CSF1R has been associated with mastitis in cows, sheep, and swine (Sajjanar et al., 2019;Hu et al., 2020;Wathes et al., 2021). The protein encoded by CD3E (53rd) plays an essential role in Tcell development and fault in this gene can cause immunodeficiency (Stelzer et al., 2016). Gene CD3E has also been identified as one of the most interconnected hub gene related to bovine mastitis (Bakhtiarizadeh et al., 2020). Signal Transducer and Activator of Transcription 1 (STAT1; 16th) is a member of STAT protein family. The protein encoded by STAT1 gene has an important part in immune responses to viral, fungal, and mycobacterial pathogens (Stelzer et al., 2016). The role of STAT1 gene in mastitis resistance and milk production had been reported in several studies (Cobanoglu et al., 2006;Bakhtiarizadeh et al., 2020;Ghahramani et al., 2021). Protein encoded by CD3D gene (163rd) is part of T-cell receptor and defects in this gene will lead to severe immunodeficiency issues (Stelzer et al., 2016). Gene CD3D has been reported by several studies as an important gene involved in immune response and mastitis resistance in Holstein cows (Bakhtiarizadeh et al., 2020) and sheep (Bonnefont et al., 2011). Gene IRF1 (332nd) produces a protein which serves as an activator of genes involved in both innate and acquired immune responses (Stelzer et al., 2016). The IRFI gene has been reported as a potential marker for the detection of mastitis in cows (Huma et al., 2020).

Limitations
The main limitation of this study was that our systematic review was confined to GWAS, a common high-throughput technology to identify genetic variants associated with traits of interest. Studies that had used other technologies including molecular technology, validation of makers associated with resistance to mastitis, and other omics technologies (e.g., high-throughput transcriptome profiling technologies that uses microarray and RNA sequencing), were not included. Hence, the present study may not have included genomic coordinates of all significant markers or windows associated with resistance to mastitis. Therefore, all genes associated with resistance to mastitis may not have been annotated and used in further analyses. Additionally, during conversion of genome assembly to new bovine assembly (ARS-UCD1.2) some studies were lost due to an inability to convert. Another limitation is that 1,689 genes did not have a gene name required to be included in prioritization analyses, so their names had to be retrieved from Ensembl using human orthologs. During this process, names of only 374 genes (197 unique gene symbols) were retrieved, with loss of 1,315 genes. Hence, some genes without a name may not be included in the prioritization analysis.
The GUILDify software, which was used to obtain the trained gene list used for the prioritization analysis, uses 3 algorithms (NetScore, NetZcore, and NetShort) available to prioritize genes potentially involved in diseases using a priori disease-gene associations and protein-protein interactions. This might introduce some level of bias of trained gene list related to diseases in the initial stage of prioritization analysis. However, as the ToppGene software ranks the positional candidate genes (test genes) based on the functional similarity between each gene from the list of positional candidate genes and functional profile of the trained list, effect of above-mentioned bias on the prioritization analysis should not be significant. As the software GUILDify does not have a database for cattle, human (Homo sapiens) protein interaction network was used to prioritize the genes. However, the use of the human database instead of cattle database is not expected to have substantial effect on the results as cattle share 80% of human orthologs (Weitzman, 2000;Elsik et al., 2009). Moreover, the conservation between protein structures is expected to be higher than to gene sequence. Consequently, the interactions between proteins tend to be more conserved (Kwon et al., 2018;Pérez-Bercoff et al., 2013).

Implications
Many studies have conducted GWAS to identify genetic variants associated with resistance to mastitisrelated traits and they had reported numerous genetic variants. These genetic variants differed based on population, breed, traits, genotyping, and approach used for GWAS. In the present study, heterogeneity among these articles is highlighted in Tables 1-3. This heterogeneity among articles emphasized the importance of and need for a systematic review and reinforced the need to confirm previously identified genetic variants associated with resistance to mastitis.
This systematic review identified studies that performed GWAS on resistance to mastitis-related traits and summarized them into one comprehensive document, extracted significant variants reported on those studies, and annotated to genes and QTL. Moreover, in the present study gene prioritization analyses and other functional analyses were performed to identify interesting functional candidate genes. This study also mapped previously reported genomic regions and will inform evidence-based makers for genetic selection for mastitis resistance. However, these candidate genes and their markers should be validated before being used in the SNP chip panels for genetic selection for resistance to mastitis. This study opens the possibility to further evaluate in depth the key functional candidate regions identified. For example, it is possible to perform target sequencing of those regions or use whole genome sequences to identify causal mutations and evaluate the potential effect using tools such as Variant Effect Predictor (McLaren et al., 2016).
Moreover, this study greatly enhances the current knowledge of the underlying genetic architecture of resistance to mastitis, thereby facilitating development of better genomic selection methods to improve mastitis resistance in dairy cattle.

CONCLUSIONS
In this first systematic review of GWAS for mastitis resistance-related traits in dairy cattle, we identified 52 relevant manuscripts, of which 30 qualified for gene prioritization analysis. There was a high heterogeneity in populations, phenotypes, genotypes, and methods used for GWAS, resulting in the annotation of 9,125 and 43,646 unique genes and QTL, respectively. The high heterogeneity among GWAS led to lower overlap of genes across studies. Subsequent gene prioritization analysis resulted in 427 prioritized genes, among which the CCR2 gene ranked first. Most prioritized genes were from the cytokine family and were related to the immune response. Future studies are needed to validate and identify SNPs in these key candidate genes, which may enhance genomic prediction. Moreover, this study confirmed and fine-mapped previously identified genomic regions and candidate genes associated with mastitis resistance, providing additional insights into the genetic architecture of mastitis resistance in dairy cattle. Consequently, the implementation of results from this study is expected to reduce mastitis incidence and prevalence in dairy cattle through genetic selection in the medium and long-term.