Genome-wide association analysis for susceptibility to infection by Mycobacterium avium ssp. paratuberculosis in US Holsteins

Paratuberculosis, or Johne’s disease, is a chronic, granulomatous, gastrointestinal tract disease of cattle and other ruminants caused by the bacterium Mycobacterium avium subspecies paratuberculosis (MAP). Control of Johne’s disease is based on programs of testing and culling animals positive for infection with MAP and concurrently modifying management to reduce the likelihood of infection. The current study was motivated by the hypothesis that genetic variation in host susceptibility to MAP infection can be dissected and quantifiable associations with genetic markers identified. Two separate GWAS analyses were conducted, the first using 897 genotyped Holstein artificial insemination sires with phenotypes derived from incidence of MAP infection among daughters based on milk ELISA testing records. The second GWAS analysis was a case-control design using US Holstein cows phenotyped for MAP infection by serum ELISA or fecal culture tests. Cases included cows positive for either serum ELISA, fecal culture, or both. Controls consisted of animals negative for all tests conducted. A total of 376 samples (70 cases and 306 controls) from a University of Minnesota Johne’s management demonstration project and 184 samples (76 cases and 108 controls) from a Michigan State University study were used. Medium-density (sires) and high-density (cows) genotype data were imputed to full genome sequence for the analyses. Marker-trait associations were analyzed using the single-step (ss)GWAS procedure implemented in the BLUPF90 suite of programs. Evidence of significant genomic contributions for susceptibility to MAP infection were observed on multiple chromosomes. Results were combined across studies in a meta-analysis, and increased support for genomic regions on BTA7 and BTA21 were observed. Gene set enrichment analysis suggested pathways for antigen processing and presentation, antimicrobial peptides and natural killer cell


INTRODUCTION
Paratuberculosis, or Johne's disease, is a chronic, granulomatous, gastrointestinal tract disease of cattle and other ruminants caused by the bacterium Mycobacterium avium ssp.paratuberculosis (MAP).The clinical signs of disease in cattle are pipestream diarrhea, weight loss, and edema due to hypoproteinemia caused by protein-losing enteropathy (Sweeney et al., 2012).Calves less than 6 mo of age are generally considered to be at the greatest risk of becoming infected with MAP (Lombard, 2011), but clinical signs of infection usually do not appear until second or third lactation (Jubb and Galvin, 2000).Even if not showing clinical signs of disease, MAP test-positive cows produce less milk and are culled earlier in their productive lives (Lombard, 2011).
The disease occurs worldwide in dairy cattle and other ruminants.Control programs for paratuberculosis have been established in some nations including Australia (Kennedy and Allworth, 2000), Norway (Tharaldsen et al., 2003), Iceland (Gunnarsson et al., 2003), Japan (Momotani, 2012), the Netherlands (Groenendaal et al., 2003), and the United States (Carter, 2011).Recent estimates suggest that 68% of US dairy herds (National Animal Health Monitoring System, 2008) and 7.9% of US beef herds have infected animals (Dargatz et al., 2001).After accounting for assay sensitivity, true prevalence at a herd level has been estimated to exceed 90% (Lombard et al., 2013).The economic impact of paratuberculosis on the US dairy industry has been estimated to be from $200 million to $1.5 billion annually (Ott et al., 1999;Harris and Barletta, 2001).An additional concern is the potential zoonotic role of MAP in Crohn's disease in humans, which, at the current time, remains uncertain (Chiodini et al., 2012;Timms et al., 2016;Kuenstner et al., 2017).
There is currently no cure for Johne's disease, and vaccination is problematic.A reduction in incidence of infection has been shown in herds of dairy cattle after management changes to reduce exposure of cattle as well as testing to inform management decisions (Espejo et al., 2012).One modeling study showed that testing and culling is an important aspect (Smith et al., 2017).Knowledge concerning genetics of susceptibility to MAP infection can contribute to disease control programs by facilitating genetic selection for a less susceptible population to reduce incidence of infection in the future.The opportunity for genetic improvement in susceptibility to infection is evidenced by estimates of heritability of MAP infection in dairy cattle ranging from 0.03 to 0.28 (Koets et al., 2000;Mortensen et al., 2004;Gonda et al., 2006;Hinger et al., 2008;van Hulzen et al., 2011;Küpper et al., 2012;Zare et al., 2014;Kirkpatrick and Lett, 2018).
Genetic factors affect susceptibility to common diseases, and GWAS provide a tool for identifying these genetic factors (Brachi et al., 2011;Chapman and Hill, 2012;Finlay et al., 2012).The objectives of this study were to identify genomic regions associated with susceptibility to MAP infection in Holstein cattle and develop tools for genomic selection for lesser susceptibility to MAP infection.

MATERIALS AND METHODS
Three different groups of animals were used in the following studies.The first consisted of the US commercial dairy cattle population from 2010 to 2016 based on records for MAP infection recorded in databases of farm records maintained by the Dairy Records Management System and AgSource Cooperative.These data consist of milk ELISA test results for US Holstein cows and have been described in more detail previously (Kirkpatrick and Lett, 2018).After eliminating herds with no positive tests, fewer than 100 test records, and sires with fewer than 50 daughters, records remained for 202,367 cows.A total of 897 sires met these criteria and also had genotype data available from the Council on Dairy Cattle Breeding.The other 2 groups of animals were resource populations developed by researchers at the University of Minnesota and Michigan State University by testing cows from commercial dairy herds in their respective states.All cows tested were of the Holstein breed.The University of Minnesota project demonstrated management strategies aimed at control-ling Johne's disease and enrolled 8 herds ranging in size from 181 to 341 cows.Duration of participation ranged from 4 to 10 yr between 2000 and 2009 (Espejo et al., 2012).Animals in the Minnesota herds were tested by both ELISA and fecal culture or fecal PCR for evidence of MAP infection at least annually as described previously (Ferrouillet et al., 2009;Espejo et al., 2012).Briefly, serum samples were collected for ELISA testing with an Idexx test kit (Idexx Laboratories, Inc.) with a cutoff for positive samples of a sample-to-positive ratio ≥0.25, per the manufacturer's instructions.Fecal samples were tested by bacterial culture as described previously (Wells et al., 2002).Fecal and serum samples were collected at the same time.In the latter years of the project, DNA samples were obtained from project animals and are used herein.The Michigan State University project characterized immune phenotypes in cows testing positive or negative for MAP infection based upon milk ELISA in all cases, and with serum ELISA and fecal PCR testing as a follow-up.Testing used the same Idexx test kit with testing done at a commercial laboratory (CentralStar Cooperative) or the Michigan State Veterinary Diagnostic Laboratory.Fecal testing used a quantitative PCR test (mParaTeq) by the commercial laboratory (CentralStar Cooperative).A total of 368 samples (69 cases and 299 controls) from the Minnesota project and 183 samples (75 cases and 108 controls) from the Michigan State University project were used with case defined as an animal being positive to any (ELISA or fecal) MAP test, whereas controls were negative to all tests.
The DNA was extracted from buffy coats of whole blood samples obtained from Minnesota and Michigan State project animals using a proteolytic digestion, phenol-chloroform extraction protocol (Cruickshank et al., 2004).The DNA quality was assessed by spectrophotometry and gel electrophoresis, and quantification was by spectrophotometry and pico-green assay.
Sire genotypes were obtained from the Council on Dairy Cattle Breeding as medium-density 60K genotype data.For the Minnesota and Michigan cows, DNA samples were genotyped using both the bovine HD (Illumina) and GGP F-250 (Neogen Genomics) beadchips with 777,962 and 221,115 SNPs, respectively (cow genotype data are accessible online at https: / / figshare .com/articles/ dataset/ Cow _genotypes/ 19285472, Kirkpatrick, 2022a).After removing SNPs for redundancy between chips, low call rate (<0.95), low minor allele frequency (<0.02), and failed Hardy-Weinberg Equilibrium testing (P < 0.00001), a total of 685,003 SNPs remained.Sire genotypes were imputed from medium to high density using the cow genotype data as a reference.Imputation of sire genotypes to high density was performed using Beagle 5.1 (Browning et al., 2018) and Kirkpatrick et al.: GWAS ANALYSIS ON MYCOBACTERIUM AVIUM SUSCEPTIBILITY IN HOLSTEINS the computational resources of the UW-Madison's Center for High Throughput Computing.The SNP data were pruned to remove SNPs that were proxies for one another using the indep-pairwise command in PLINK 1.9 (https: / / www .cog-genomics .org/plink/ ; Chang et al., 2015) with an r 2 linkage disequilibrium threshold of 0.98.After pruning, 343,705 SNPs remained in the data set.
Both sire and cow genotypes were imputed from high density to full genome sequence using the variant information from 700 Holstein-Friesian animals used in run 7 of the 1000 Bull genome projects as reference (Hayes and Daetwyler, 2019).Imputation was performed chromosome by chromosome using FImpute version 3 (Sargolzaei et al., 2014) and the computational resources of the UW-Madison's Center for High Throughput Computing.
The SNP association analysis under an additive model was performed using the single-step (ss)GWAS method (Wang et al., 2012) as implemented in the BLUPF90 suite of programs (Misztal et al., 2018).Data from the Minnesota and Michigan projects were combined and analyzed as one set of data.Phenotypes were dichotomous and coded as 0 for unaffected (no positive test results) and 1 for affected (1 or more positive test results).Sire data were analyzed separately from the Minnesota and Michigan cow data with phenotypes being daughter averages for incidence of infection as described above.Data were analyzed both for individual effects of SNP loci and also for contributions of moving 0.5-Mb windows to phenotypic variance.Considerable variation existed in number of daughters per sire (from 50-2,205); consequently, sire phenotypes (daughter averages) were weighted to account for differences between sires in number of daughter records (Garrick et al., 2009).For individual SNP effects, a meta-analysis was performed combining results across the 2 analyses (sires, Minnesota, and Michigan cows) by calculating combined P-values for SNPs with effect estimates sharing a common sign.Combining P-values was conducted using the weighted Z-test, sometimes called "Stouffer's method," which has more power and precision than does Fisher's test (Whitlock, 2005).The threshold applied for assessment of significance was P < 5 × 10 −8 (Pe'er et al., 2008;Fadista et al., 2016), and an arbitrary threshold of P < 5 × 10 −5 was applied for results suggestive of association.In addition, a false discovery rate approach was used to identify the most significant SNPs using a threshold of 0.05, meaning a 95% or greater probability that the identified association was not a false positive (Benjamini and Hochberg, 1995).False discovery rate was calculated using the qvalue package in R (Storey JD, 2020).
Gene set enrichment analysis was conducted to identify gene sets and pathways implicated from the SNP association data.The SNPs were assigned to genes and gene associations calculated using Multimarker Analysis of GenoMic Annotation (MAGMA; de Leeuw et al., 2015).The SNP assignment to genes was based on SNP location ± 25 kb from the start and end of a gene's transcribed region.Gene sets most likely associated with susceptibility to MAP infection were identified using the SetRank package for R with the gene-based P-values from MAGMA output used as input.SetRank accounts for overlap in gene set membership and performs a correction for multiple testing in identifying gene sets with greatest likelihood of association (Simillion et al., 2017).SetRank considers multiple annotation databases simultaneously, removing overlapping gene sets and thus reducing the occurrence of false positives.Annotation databases included in the analysis were Kyoto Encyclopedia of Genes and Genomes (KEGG; http: / / www .genome.jp/kegg/ ; Kanehisa et al., 2016), Gene Ontology -biological processes, cellular components, molecular function (http: / / geneontology .org/; Ashburner et al., 2000;Gene Ontology, 2021), BioCyc (https: / / biocyc .org/; Karp et al., 2019), REACTOME (https: / / reactome .org/; Jassal et al., 2020), and WikiPathways (http: / / www .wikipathways.org/; Martens et al., 2021).
Efficacy of genomic prediction was examined using the sire data set and a 5-fold cross-validation.Both the original 60K genotype data and the imputed highdensity genotype data were used in separate analyses.Sires were randomly assigned to 5 groups of approximately equal size with data from 4 of the 5 groups combined and used as training data and the remaining fifth used for testing in all 5 possible combinations.The BLUPF90 suite of programs (Misztal et al., 2018) was used to generate SNP effect estimates from the training data sires and genomic breeding value estimates (gEBV) for testing data sires.Accuracy of genomic prediction was assessed by correlation between gEBV and daughter averages of testing sires.

RESULTS
Inspection of QQ plots (Supplemental Figure S1, https: / / figshare .com/articles/ figure/ QQ _plots _for _paratuberculosis _GWAS _studies/ 19100027, Kirkpatrick, 2022b) from the initial ssGWAS analysis of the sire data with SNPs equally weighted and 3 rounds of reweighting indicated that P-values in the initial ssGWAS analysis were inflated above expected values, whereas P-values corresponded better with expectation after 2 rounds of reweighting.Additional rounds of reweighting resulted in essentially no further change.Thus, results following 2 rounds of SNP reweighting are reported.Analysis of the sire data identified SNPs exceeding a false discovery rate of q < 0.01 on BTA4,7,9,10,12,15,17,18,23,and 28 (Figure 1, Table 1).Considering contribution to variance of SNPs in 0.5-Mb windows, BTA3, 7, 9, 13, 23, and 25 contained windows that accounted for greater than 0.38% of variance (Fig- ure 1, Table 2), a threshold representing 20 times the average variance explained for a 0.5-Mb window under an infinitesimal model of genetic variance.Window variance indicated contributions from 4 separate, but nearby regions on BTA7 (Figure 2).BTA23 was the chromosome making the next greatest contribution to variation explained with 2 separate, but nearby windows (Figure 3).There was relatively little correspondence between location of most significant SNPs and 0.5-Mb windows accounting for greatest variance.
Inspection of QQ plots (Supplemental Figure S2, https: / / figshare .com/articles/ figure/ QQ _plots _for _paratuberculosis _GWAS _studies/ 19100027, Kirkpatrick, 2022b) from the initial ssGWAS analysis of the Minnesota and Michigan cow data with SNPs equally weighted and 1 round of reweighting indicated that reweighting deflated P-values relative to expected values; thus, results from the unweighted analysis are reported.The ssGWAS analysis identified SNPs exceeding a suggestive level of significance (P < 5 × 10 −5 ) on 27 of 30 chromosomes (Figure 4), but false discovery rates in all cases exceeded 0.2, suggesting a strong likelihood of false-positive associations.Just one association, at 337,626 bp on BTA21, was significant at P < 5 × 10 −8 .Two 0.5-Mb windows contained SNPs, accounting for more than 0.38% of variance.One window was on BTA21 in the 19.6 to 20.1 Mb region, and the second was on BTA6 in the 79.7 to 80.2 Mb region.
Meta-analysis of the sire and cow data, by combining P-values, identified SNPs in 2 genomic locations that were supported across data sets with false discovery rates of q < 0.05 (Table 3, Figure 5).One of the 2 was at 12.6 Mb on BTA7, near windows contributing greatly to variance explained in the sire analysis.The other SNP was located on BTA21 near the beginning of the chromosome, and was the most significant SNP in the cow analysis.
Gene set enrichment analysis of sire and cow SNP associations generated lists of over 40 gene sets for each (Supplemental Tables S1 and S2, https: / / figshare .com/articles/ figure/ Supplementary _Tables _1 _and _2/ 19280105, Kirkpatrick, 2022c) that were either significant (P < 0.05) for the SetRank value, which accounted for the interaction between sets, or for the contribution of genes within the set alone at a corrected or adjusted P < 0.05.Three gene sets were in common across these separate analyses, including bta05320 (KEGG: autoimmune thyroid disease), bta00590 (KEGG: arachidonic acid metabolism), and R-BTA-381753 (REACTOME: olfactory signaling pathway).These 3 were among the 12 most strongly associated gene sets (SetRank P < 0.05 or corrected or adjusted P-values < 0.001) from a meta-analysis of sire and cow SNP associations (Table 4).Several of the gene sets identified by this analysis had clear relevance to susceptibility to MAP infection and Johne's disease, including KEGG bta04612 (antigen processing and presentation), REACTOME R-BTA-6803157 (antimicrobial peptides), and KEGG bta04650 (natural killer cell-mediated cytotoxicity).Correlation of testing-set sire gEBV and sire's daughter averages were consistently mid-range in the crossvalidation analysis (Table 5), ranging from ~0.43 to 0.53.Estimating SNP effects following reweighting (applying greater weight to more significant SNPs) did not consistently improve the correlation between genomic predictions and daughter average, though the reduction in the correlation was slight (~0.002) and the great-est change occurred in the initial round of reweighting.Results from using 79K and high-density genotypes in genomic prediction were not greatly different, suggesting that use of the 79K genotype data was sufficient to explain most of the genetic variation with little added benefit from imputation to high density in the genomic prediction context.

DISCUSSION
Multiple genomic regions and genetic markers were identified as being associated with variation in susceptibility to MAP infection.As indicated by the positional candidate genes listed in Tables 1 and 2, several of these windows or markers corresponded very well with the locations of genes or gene clusters relevant to host response to pathogens.Given the number of daughter records contributing to sire phenotypes, and the consequent higher reliability for sire phenotypes based on  daughter averages, it is not surprising that the most significant window and marker information comes from the sire analysis.Nonetheless, the cow analysis was useful in providing validation of some results observed for BTA7 in the sire analysis.
The variants identified in Table 1 are all located within intergenic or intronic locations, with one exception.The first variant listed is a SNP within the coding region of a predicted T-cell receptor β chain V region C5-like gene.The base change observed, from A to C, is a missense mutation altering the encoded AA from glycine to proline, with the C allele associated with increased susceptibility to MAP infection.The A allele is the more common allele with an allele frequency of 0.97.
The associations on BTA7 observed in analysis of the sire data are part of several results from this study for which there is identification of a corresponding region associated with susceptibility to MAP infection from previous genomic analyses.A positional candidate gene analysis for BTA7 in Holstein cattle, originally focused on a different region of the chromosome, identified the  strongest association (P < 1 × 10 −7 ) for a SNP at 14.2 Mb on BTA7 (Sallam et al., 2017), overlapping with one of the windows contributing most to variation in the current study.Results from the sire analysis in the current study suggested the possibility that this SNP is located between 2 nearby genomic regions with significant contribution to variation, and thus may reflect associations with both.Authors of the previous study suggested a potential candidate gene in this region (14.4Mb) is peptidylprolyl cis/trans isomerase, NIMA-interacting 1 (PIN1).PIN1 is known for its effect on key proteins involved in the immune response in humans (Lu and Zhou, 2007;Manganaro et al., 2010).
As indicated in Table 2, other plausible positional candidate genes exist in this multiwindow region of BTA7 as well including PRDX2, MUC16, ATG4D, S1PR5, TMED1, and ILF3.Among these, mucin 16 (MUC16) in the 12.3 to 13.3 Mb window is an interesting candidate as a potential physical block to infection.Mucins are high molecular weight proteins that are part of the mucous barrier found at the apical surface of the epithelia, which provide protection against pathogen infection.Variation in this gene could relate to the ability of MAP to enter the body.Interestingly, another mucin gene family member (MUC1) is a positional candidate for the window identified in BTA3.Autophagy related 4D cysteine peptidase (ATG4D), less than 50 kb proximal to the 15.1 to 15.6 Mb window on BTA7, is likewise a strong candidate, as compromised autophagy could contribute to less effective clearing of an intracellular pathogen such as MAP.Sphingosine-1-phosphate receptor 5 (S1PR5), located adjacent to ATG4D, is likewise a plausible candidate gene given its function as a receptor for sphingosine 1-phosphate, which regu-lates apoptosis.The corresponding region of the human genome has shown association with Crohn's disease, an inflammatory bowel disease with similarity to Johne's disease.The gene in this region most strongly associated with Crohn's disease (Ellinghaus et al., 2016) is kelch-like ECH associated protein 1 (KEAP1), which contributes to regulation of response to oxidative stress (Eggler et al., 2005;2009).
A second genomic location, as indicated by significant SNPs at 26.0 and 27.8 Mb on BTA23, is supported by results of previous studies.Two previous studies have identified this region as being associated with susceptibility to MAP infection in French and Chinese Holstein populations (Gao et al., 2018;Sanchez et al., 2020).Additionally, GWAS of cell-mediated and antibody-mediated immune response identified this as the most significantly associated genomic region for both responses (Thompson-Crispi et al., 2014).This region is gene-dense with multiple candidate genes, including but not limited to bovine leukocyte antigen (BoLA) genes, tumor necrosis factor (TNF), NFKB inhibitor like 1 (NFKBIL1), and immediate early response 3 (IER3).Genes of the BoLA family comprise the bovine form of the major histocompatibility complex, which is a fundamental component of infectious disease resistance in vertebrates (Amills et al., 1998).TNF and NFKBIL1 contribute to apoptosis, for which IER3 can be an inhibitor (Wu et al., 1998).Reduced macrophage apoptosis in response to MAP infection has been suggested as a possible mechanism for MAP evasion of the immune system (Kabara and Coussens, 2012).
The second greatest window on BTA23 for variance explained includes 2 cytokines that are plausible candidate genes.Both IL17F and IL17A are expressed by P-values for SNPs with effects having consistent signs across analyses were combined using a weighted Z-test (Whitlock, 2005).Single SNP associations are indicated by dots with genomic location along the x-axis and significance (minus log 10 of P-value) on the y-axis.Horizontal line denotes the threshold for significant linkage (orange, P < 5 × 10 −8 ).
activated T-cells and are proinflammatory, with upregulation of IL17A and IL17F observed in inflammatory bowel disease in humans (Seiderer et al., 2008;Rovedatti et al., 2009).Likewise, increased IL-17 has been observed in plasma from cows that are both ELISA and fecal test-positive for antibodies to MAP (Dudemaine et al., 2014).Also, treatment of bovine peripheral blood mononuclear cells with MAP antigens has resulted in increased secretion of IL-17A (DeKuiper and Coussens, 2019), and MAP antigens induce increased expression of IL17A mRNA in cultured bovine T-cells (DeKuiper et al., 2020).
A third genomic location identified in this study (73.3-73.8Mb window of BTA12) was within approximately 1 Mb of a region (71.7-71.9Mb) identified previously as associated with paratuberculosis (Sanchez et al., 2020).This window is relatively gene-sparse, with only 1 characterized gene within (heparan sulfate 6-Osulfotransferase 3, HS6ST3), which is not an obvious candidate gene.
The 37.1 to 37.6 Mb window on BTA25 has not been associated with genomic variation for Johne's disease in previous studies; nonetheless, it does contain plausible candidate genes relative to Johne's disease and has been associated with Crohn's disease in humans.The genes showing the strongest association with Crohn's disease in this region are SMAD specific E3 ubiquitin protein ligase 1 (SMURF1) and karyopherin subunit α 7 (KPNA7; Liu et al., 2015;Ellinghaus et al., 2016).SMURF1 contributes to regulation of cell motility, cell signaling, and cell polarity through the bone morphogenetic signaling pathway (Zhu et al., 1999), and KPNA7 is involved in nuclear protein import (Kelley et al., 2010).Additionally, tectonin β-propeller repeat containing 1 (TECPR1), immediately adjacent (37.8 Mb) to this window, is a tethering factor in autophagy and has been implicated in selective autophagy of bacterial pathogens (Ogawa et al., 2011).
The candidate genes discussed above and listed in Tables 1 and 2 correspond well with the results of the gene set enrichment analyses, which in part suggest that gene sets and pathways for antigen processing and presentation, antimicrobial peptides, and natural killer cell-mediated cytotoxicity are relevant to variation in host susceptibility to MAP infection.The contribution of some of the other pathways identified as among the most significant, such as hemoglobin complex and olfactory signaling, are not as obvious.The significance of the 3 gene sets mentioned above suggests their consideration not only for explanation of genetic variance but as well as for targets for understanding and modifying host susceptibility to MAP infection.
The cross-validation analysis of genomic predictions for susceptibility to MAP infection suggests SNP effect estimates from the current study would have utility in genomic selection.Genomic predictions for susceptibility to MAP infection could be used by producers in selecting AI service sires or replacement females as a means of producing a herd with lesser susceptibility to infection.Considering only sires with >200 daughters in the current data, the highest and lowest proportion of affected daughters for a sire were 14.79% and 0.65%, respectively, suggesting considerable opportunity to affect disease prevalence within herd by genomic selection.Genomic selection could be a tool for producers that complements other aspects of management aimed at reducing herd incidence of Johne's disease, though not a complete solution alone.Genomic selection for health traits using producer-recorded health trait data has been proposed (Parker Gaddis et al., 2014) and recently implemented (CDCB, 2018).Although there are quantifiable economic costs associated with Johne's disease (Gonda et al., 2007;Smith et al., 2009), which could aid in incorporation of this trait in a selection index, attaching an economic cost to the zoonotic aspect of this disease, arguably the greater concern, would be challenging.
Figure1.Genome-wide association analysis results for US Holstein sires.A: Single SNP associations are indicated by dots with genomic location along the x-axis and significance (minus log 10 of P-value) on the y-axis.Horizontal line denotes the threshold for significant linkage (orange, P < 5 × 10 −8 ).B: Variation attributable to SNPs in moving 0.5-Mb windows are indicated by dots with genomic location along the xaxis and percent variance explained on the y-axis.Only windows with greatest variation explained within a local region are indicated in the plot.

Figure 2 .
Figure 2. Expanded view of window variance and SNP association for the region on chromosome 7 accounting for the greatest variance.Green bars denote % variance explained by 0.5-Mb windows for segments of the 10-20 Mb region of BTA7 (y-axis scale on left).Blue dots denote SNP significance across the same region (y-axis scale on right).Horizontal (orange) line denotes the threshold for significant association (P < 5 × 10 −8 ).

Figure 3 .
Figure 3. Expanded view of window variance and SNP association for chromosome 23.Green bars denote % variance explained by 0.5-Mb windows, and blue dots denote SNP significance across the same region (y-axis scale on right).Horizontal (orange) line denotes the threshold for significant association (P < 5 × 10 −8 ).
Figure 4. Genome-wide association analysis results for Minnesota and Michigan cows.A: Single SNP associations are indicated by dots with genomic location along the x-axis and significance (minus log 10 of p-value) on the y-axis.Horizontal line denotes the threshold for significant linkage (orange, P < 5 × 10 −8 ).B: Variation attributable to SNPs in moving 0.5-Mb windows are indicated by dots with genomic location along the x-axis and percent variance explained on the y-axis.Only windows with greatest variation explained within a local region are indicated in the plot.
Figure5.Genome-wide association analysis results for meta-analysis of US Holstein sires and Minnesota and Michigan Holstein cows.P-values for SNPs with effects having consistent signs across analyses were combined using a weighted Z-test(Whitlock, 2005).Single SNP associations are indicated by dots with genomic location along the x-axis and significance (minus log 10 of P-value) on the y-axis.Horizontal line denotes the threshold for significant linkage (orange, P < 5 × 10 −8 ).
Kirkpatrick et al.: GWAS ANALYSIS ON MYCOBACTERIUM AVIUM SUSCEPTIBILITY IN HOLSTEINS Table 4.Most significant gene sets from gene set enrichment meta-

Table 1 .
Kirkpatrick et al.: GWAS ANALYSIS ON MYCOBACTERIUM AVIUM SUSCEPTIBILITY IN HOLSTEINS Variants with false discovery rate (FDR) <0.01 (most significant within a region) in sire analysis 1 1If another variant meeting the FDR threshold was within 1 Mb of the reported variant but was less significant on the basis of P-value and FDR, it is not reported.2Genomiclocations based on ARS-UCD1.2genomeassembly (https: / / www .ncbi.nlm.nih.gov/assembly/GCF _002263795 .1/).3Genes with potential relevance to MAP infection and located within 100 kb of variant.

Table 2 .
Half-megabase windows accounting for the largest percentage of variance in sire and cow analyses 1 1 Under an infinitesimal model of genetic variance, a 0.5-Mb window would be expected to contribute 0.0191% of the total variance (100%/5,231 windows).Windows reported here exceed 20 times this value.

Table 3 .
Variants increasing in significance in meta-analysis (false discovery rate < 0.05, only most significant within a 1-Mb region shown)