Genome-wide association study with imputed whole-genome sequence variants including large deletions for female fertility in 3 Nordic dairy cattle breeds

Fertility is an economically important trait in live-stock. Poor fertility in dairy cattle can be due to loss-of-function variants affecting any essential gene that causes early embryonic mortality in homozygotes. To identify fertility-associated quantitative trait loci, we performed single-marker association analyses for 8 fertility traits in Holstein, Jersey, and Nordic Red Dairy cattle using imputed whole-genome sequence variants including SNPs, indels, and large deletion. We then performed stepwise selection of independent markers from GWAS loci using conditional and joint association analyses. From single-marker analyses for fertility traits, we reported genome-wide significant associations of 30,384 SNPs, 178 indels, and 3 deletions in Holstein; 23,481 SNPs, 189 indels, and 13 deletions in Nordic Red; and 17 SNPs in Jersey cattle. Conditional and joint association analyses identified 37 and 23 independent associations in Holstein and Nordic Red Dairy cattle, respectively. Fertility-associated GWAS loci were enriched for developmental and cellular processes (Gene Ontology enrichment, false discovery rate < 0.05). For these quantitative trait loci regions (top marker and 500 kb of surrounding regions), we proposed several candidate genes with functional annotations corresponding to embryonic lethality and various fertility-related phenotypes in mouse and cattle. The inclusion of these top markers in future releases of the custom SNP chip used for genomic evaluations will enable their validation in independent populations and improve the accuracy of genomic predictions.


INTRODUCTION
Over the years, GWAS have provided a better understanding of underlying genetic architectures for many economical traits in dairy cattle.These traits include milk yield (Iso-Touru et al., 2016), milk composition (Gebreyesus et al., 2019;Sanchez et al., 2019), milking speed (Jardim et al., 2018;Marete et al., 2018), clinical mastitis (Kadri et al., 2015;Cai et al., 2018), body conformation (Abo-Ismail et al., 2017), stature (Bouwman et al., 2018), and more.Female fertility is an economically important trait, which, before the advent of genomic selection, was difficult to improve by selection due to its very low heritability.The use of GWAS for female fertility provided an opportunity to detect QTL (Höglund et al., 2014(Höglund et al., , 2015a,b) ,b) and to include these QTL in genomic prediction (Brøndum et al., 2015).
However, the accuracy of prediction for female fertility is lower compared with production traits (Brøndum et al., 2015).There is a potential to identify QTL or markers in strong linkage disequilibrium (LD) with QTL using GWAS for female fertility to use the QTL information in breeding value prediction.Two ways to increase the power of GWAS in mapping populations of fixed size are (1) to consider additional classes of genetic variants in addition to SNPs such as small indels and various types of structural variations, and (2) to impute SNP chip genotyping data to the whole-genome sequence.
To include deletions in addition to SNPs and indels in gene mapping and prediction of breeding values in cattle, we previously performed genome-wide mapping of deletions in 175 whole-genome sequence (WGS) animals, 67 Holstein (HOL), 81 Nordic Red (RDC), and 27 Jersey (JER) cattle, and identified ~8,500 variants ranging from 199 bp to 773 kb in size (Mesbah-Uddin et al., 2018).Later, we imputed these deletions along with WGS SNPs and indels, with an average imputation accuracy squared Pearson correlation (r 2 ) > 0.9, in a cohort of 13,000 bulls from HOL, RDC, and JER genotyped using medium-and high-density SNP array chip (Mesbah-Uddin et al., 2019).In this study, we performed GWAS for 8 fertility traits in HOL, RDC, and JER using imputed WGS variants including SNPs, indels, and deletions.We hypothesized that inclusion of indels and large deletions in addition to whole-genome single nucleotide variants could identify candidate genes for female fertility traits in dairy cattle.

METHODS
No animal handling or animal experiment was conducted for this study; therefore, approval from institutional animal care and use was not necessary.

Genotypes and Phenotypes
In an earlier study, using a multibreed WGS reference panel of 772 animals that included 548 HOL, 132 RDC, and 92 JER animals, we performed imputation from the Illumina BovineSNP50 (50k) to WGS SNPs, small indels, and deletions in a cohort of ~13,000 bulls [6,371 HOL,4,955 RDC,and 1,640 JER; for details see Mesbah-Uddin et al. (2018, 2019)].In short, for the imputation reference panel, SNPs and indels were identified from 175 WGS samples from the Nordic sequence data (Brøndum et al., 2014) and 597 WGS samples from the Run-6 of the 1000 Bull Genomes Project (Daetwyler et al., 2014) using the GATK v1.6 (McKenna et al., 2010) and SAMtools 0.1.18mpileup (Li et al., 2009) software, respectively. Additionally, chromosomal deletions (199 bp-773 kb) were mapped and genotyped in the 175 Nordic WGS samples (Mesbah-Uddin et al., 2018) using Genome STRiP software v2.00.1678 (Handsaker et al., 2011).Subsequently, deletion genotyping was extended to the remaining 597 WGS samples of the reference panel using the read-depth data from the variant call format file in a Gaussian mixture model-based genotyping approach (Mesbah-Uddin et al., 2019).
The phenotypic data in each breed were deregressed breeding values of AI bulls (VanRaden and Wiggans, 1991;Garrick et al., 2009) for 8 fertility traits as follows: number of inseminations per conception in heifers, number of inseminations per conception in cows (AISc), interval (number of days) from calving to first insemination in cows (ICF), interval (number of days) from first to last insemination in heifers, interval from first to last insemination in cows (IFLc), fertility index (FI), nonreturn rate in heifers, and nonreturn rate in cows (NRRc); for details, see NAV (2013).Deregressed proofs are a proxy of the average performance of bulls' daughters, adjusted for nongenetic effects and breeding values of their dams.Summary of these phenotypes

Genome-Wide Association Study
We performed single-marker association analyses using the following mixed linear model: where y was a vector of phenotypic records for fertility traits; 1 n was a vector of ones; µ was the mean of the given trait; b was the additive allele substitution effect of the candidate marker to be tested; x was a vector of allele dosages (coded as 0, 1, or 2), PC i was a vector of animals' scores for principal components of genetic ancestry i [first 10 principal components (PC) were considered, calculated from a random set of 10% markers selected from the 29 autosomes]; c i was the regression coefficient for score i; g was a vector of polygenic effects with g ~, , N g 0 2

(
) where G was the genomic relation- ship matrix calculated using the 777k markers without the test chromosome [leave-one-chromosome-out (LOCO) approach]; σ g 2 was the additive genetic variance explained by the 777k markers in LOCO approach; ε was the vector of residual effect with ε ~, , N 0 2 Iσ ε ( ) where I was an n × n identity matrix and σ ε 2 was the residual variance.Differences in reliabilities of deregressed proofs were ignored as we used deregressed breeding values from progeny-tested bulls.The analysis was performed using GCTA version 1.92.1 (Yang et al., 2011a).Any association with the trait of interest was considered statistically significant at P < 5 × 10 −8 , considering 1 million independent tests at 5% level of significance.

Conditional and Joint Association Analysis
Next, to refine the analysis within each QTL region and account for LD, we performed stepwise model selection using genome-wide complex trait analysis (GCTA)-conditional and joint association analysis (COJO) (Yang et al., 2012).For each chromosome, from the single-marker associations at P < 5 × 10 −8 , we selected the marker with lowest association P-value (top marker) and reran the GWAS for the chromosome including the genotype of the top marker and first 10 PC as fixed effects, and the LOCO-genomic relationship matrix as random effect.We used an LD threshold of 0.80 to identify independent marker(s) and assumed complete independence for markers at least 10 Mb away from a conditional marker.After each iteration, the marker with lowest P-value was selected when it was below the GWAS threshold of 5 × 10 −8 and was added to the conditional marker list for the next iteration.
When there were multiple markers with same (lowest) P-value, the first marker based on chromosome position was arbitrarily chosen.Finally, for each chromosome, all selected markers on the chromosome were fitted simultaneously.

RESULTS AND DISCUSSION
We performed single-marker association analysis for 8 female fertility traits (Table 1) in HOL, RDC, and JER cattle using imputed WGS SNPs, indels, and deletions.The GWAS Manhattan plots for 8 traits in the 3 breeds are presented in Figures 1, 2, and 3, and the summary of variants that reached genome-wide significant threshold of P < 5 × 10 −8 is presented in Table 2.
Association test statistics were inflated when we only accounted for relatedness using the genomic relationship matrix derived from the 777k markers in a LOCO approach (Table 3).Therefore, following Jiang et al. (2019), we adjusted for both population stratification using the first 10 PC as fixed effects, and relationships among animals using the LOCO approach.Genomic inflation rates (λ) before and after fitting the PC are presented in Table 3.We observed a decrease in λ values after the inclusion of PC as covariates, although λ was still very far from 1 under the assumption of the distribution of test statistics under the null hypothesis.However, when a trait is under polygenic inheritance [i.e., multiple (small effect) causal variants underlie the trait variation (also known as infinitesimal model)], a substantial inflation of GWAS test statistics is expected even in the absence of population structure (Yang et al., 2011b).The extent of this inflation is governed by several factors, namely, heritability of the trait, study sample size, pattern of LD, and number of causal variants [for detail, see Yang et al. (2011b)].The genomic inflation seen in this study despite the correction for population stratification and relatedness could be explained by the nature of polygenic inheritance of the fertility traits we analyzed (e.g., several nonzero effect genetic variants in Bayesian mixture models as reported by Brøndum et al., 2015).The phenotypes used here are deregressed breeding values for bulls with high reliability.The long-range LD pattern seen in the study population (de Roos et al., 2008;van den Berg et al., 2016) could also explain inflation of λ values observed in this study.Among the 3 breeds, HOL had the largest sample size, and thus the highest power to identify QTL; we also observed the highest λ values in HOL.However, the adjustment for both population stratification and relatedness could be very conservative; nonetheless, we expect that the test statistics will be robust against spurious associations and false discoveries (Jiang et al., 2019).

Fertility-Associated Loci Are Enriched for GO Terms Related to Cellular and Developmental Processes
We performed GO enrichment using genes within a 500-kb surrounding area of FI-associated loci in HOL, RDC, and JER.Interestingly, only 22 genes (within chromosome 13: 51-55 mb) were common among the 3 breeds (Figure 4a).Despite limited overlap, gene-set enrichment analysis showed that genes near the FI- associated GWAS loci are enriched for developmental and cellular processes (false discovery rate < 0.05) in HOL, RDC, and JER (Figure 4b-d).Furthermore, majority of the GO enrichments were in similar molecular, cellular, and biological processes in the 3 breeds (Supplemental Tables S1-S3; https: / / doi .org/ 10 .6084/m9 .figshare.16910521.v2;Mesbah-Uddin, 2021).Although our GWAS was underpowered, especially for JER, this GO enrichment highlighted that common biological pathways may be involved in female fertility in dairy cattle.This pathway sharing across-breeds showed that there is a potential to use multibreed reference in predicting fertility in dairy cattle, where substantial gains in accuracy could be achieved for breeds with numerically small reference populations.
In RDC, 13 deletions reached genome-wide significance; associated phenotypes included 7 fertility traits (Table 4).The strongest association was observed for a previously known lethal deletion on chromosome 12 located between position 20,100,648 and 20,763,119 bp (structural variant id esv4015629, P = 3.5 × 10 −42 ).We found that esv4015629 was significantly associated with 7 fertility traits, in agreement with results reported by Kadri et al. (2014).However, we did not find any association between esv4015629 and ICF.It is worth mentioning that ICF is related to estrous cycle (prepregnancy), and thus is not affected by early embryo losses, whereas the 7 other associated traits are related to pregnancy success.Several traits had multiple associated deletions.Three traits, FI, IFLc, and AISc, had strong genetic correlations (correlation of 0.98 between FI vs. IFLc, and 0.89 between FI vs. AISc).They shared associations with 7 deletions (Table 4).Several genes within these deletions have known fertility-related phenotypes, including embryonic lethality in mice (Supplemental Table S4).
In mice, homozygous mutation of ADARB1 causes postnatal lethality (entry MGI: 891999 in the Mouse Genome Informatics database, http: / / www .informatics.jax.org/), whereas the inactivation of POFUT2 causes embryonic lethality during organogenesis as well as lethality between implantation and somite formation (MGI: 1916863).

FI-Associated Loci in RDC.
In RDC, we identified 23,481 SNPs and 189 indels with genome-wide significant associations with fertility traits (Table 2).The COJO analysis identified 23 FI-associated independent GWAS signals, 9 of which were located on chromosome 12 (Table 6).The LD pattern (correlation) of these 9 lead markers is presented in Table 7. Functional annotations of the genes located within these QTL regions (top markers and 500 kb of the surrounding area) are presented in Supplemental Table S7 (https: / / doi .org/ 10 .6084/m9 .figshare.16910521.v2;Mesbah-Uddin, 2021).Interestingly, the candidate genes within these loci also have known fertility-related functions in cattle, and mouse (Supplemental Table S7).Overall, the COJO analyses for the 8 traits resulted in 111 independent loci associated with either FI or different component traits (Supplemental Table S8; https: / / doi .org/ 10 .6084/m9 .figshare.16910521.v2;Mesbah-Uddin, 2021).
FI-Associated Loci in JER.Finally, for JER, 17 SNPs were significantly associated with either ICF, IFLc, or NRRc (Figure 3 and Table 2).After conditional analyses, we identified 2 independent GWAS loci each for ICF and NNRc, and 1 locus for IFLc (Supplemental Table S9; https: / / doi .org/ 10 .6084/m9 .figshare.16910521.v2;Mesbah-Uddin, 2021).However, our GWAS was underpowered to identify FI-associated loci in JER.In an earlier study, Höglund et al. (2015b) reported 6 FI-associated QTL with P < 5.6 × 10 −9 for this JER population.The allele frequencies and the effect estimates (β) for the Bonferroni significant FIassociated SNPs from the Höglund et al. study was perfectly correlated (r > 0.99) with the corresponding results from this study (Supplemental Figure S1a,b; https: / / doi .org/ 10 .6084/m9 .figshare.16910521.v2;Mesbah-Uddin, 2021); however, on average, the β values were 1.8 times higher in the former study (Supplemental Figure S1b).Moreover, we did not observe significant association with FI in JER, though there were suggestive signals at those loci (Supplemental Figure S1c).Two factors, namely, sample size and control for genomic inflation, could explain this situation.In JER, we had about 4 times fewer samples compared with HOL and RDC, and thus lower QTL detection power.Sample size did not differ much between Höglund et al. and our study (1,225 vs. 1,211 here).However, control of genomic inflation was much more stringent in this study with a λ of 1.27 after correction versus 1.88 in Höglund et al. (2015b).

CONCLUSIONS
In this study, we identified several novel QTL for female fertility in Nordic dairy cattle.With our imputed data, we detected a previously observed deletion acting as a fertility QTL in RDC.Here, with imputed data, the GWAS signal for this QTL was stronger than in the previous report by Kadri et al. (2014).Few other deletions also reached the GWAS significant threshold but were not among the top signals of those loci.We hypothesize that these deletions could be potential causal variants for those loci, given that deletion of an essential gene could be lethal; however, this needs further validation.The majority of the candidate genes within the reported QTL regions had established fertility-related function in mice, illustrating the functional and biological relevance of these QTL for female fertility in dairy cattle.However, our analyses were limited    to within-breed QTL discovery, which needs replication in independent cohorts of animals.Nevertheless, QTL identified in this study will be included in future releases of the custom SNP chip used for genomic evaluations; this will enable their validation in independent populations, and subsequent studies will elucidate the effect of these markers in prediction accuracy for female fertility in dairy cattle.
Figure 1.Manhattan plots of single-marker GWAS for 8 female fertility traits in Holstein cattle.Each dot represents 1 whole-genome sequence (WGS) marker; dashed horizontal line is GWAS threshold of 5 × 10 −8 .AISh = number of inseminations per conception in heifers; AISc = number of inseminations per conception in cows; ICF = interval (number of days) from calving to first insemination (in cows); IFLh = interval (number of days) from first to last insemination in heifers; IFLc = interval (number of days) from first to last insemination in cows; FI = fertility index; NRRh = nonreturn rate in heifers; NRRc = nonreturn rate in cows.

b
= allele substitution effect; P = P-value from single-marker GWAS. 5 Subscript "J" indicates values from conditional and joint analysis.6 Nearby gene = genes within 500 kb upstream and 500 kb downstream of the lead variants.Mesbah-Uddin et al.: GENOME-WIDE ASSOCIATION STUDY FOR FEMALE FERTILITY IN DAIRY CATTLE

b
= allele substitution effect; P = P-value from single-marker GWAS.

Table 1 .
Mesbah-Uddin et al.: GENOME-WIDE ASSOCIATION STUDY FOR FEMALE FERTILITY IN DAIRY CATTLE Summary 1 AISc = number of inseminations per conception in cows; AISh = number of inseminations per conception in heifers; ICF = interval (number of days) from calving to first insemination (in cows); IFLc = interval (number of days) from first to last insemination in cows; IFLh = interval (number of days) from first to last insemination in heifers; FI = fertility index; NRRc = nonreturn rate in cows; NRRh = nonreturn rate in heifers.

Table 2 .
Mesbah-Uddin et al.: GENOME-WIDE ASSOCIATION STUDY FOR FEMALE FERTILITY IN DAIRY CATTLE Summary of single-marker association results (markers with P-value < 5 × 10 −8 )

Table 3 .
Genomic inflation rate, λ, before and after correcting for population stratification using first 10 principal components (PC) 1 AISh = number of inseminations per conception in heifers; AISc = number of inseminations per conception in cows; ICF = interval (number of days) from calving to first insemination (in cows); IFLh = interval (number of days) from first to last insemination in heifers; IFLc = interval (number of days) from first to last insemination in cows; FI = fertility index; NRRh = nonreturn rate in heifers; NRRc = nonreturn rate in cows.
Top 8 GO terms are highlighted.The GO enrichment analysis was performed using g:Profiler webserver (https: / / biit .cs.ut.ee/gprofiler/ gost).Full results of GO enrichment are presented in Supplemental TablesS1-S3(https: / / doi .2Positionbase pair positions in UMD3.1.3Traits included number of inseminations per conception in cows (AISc) and in heifers (AISh); interval (number of days) from calving to first insemination (ICF); interval from first to last insemination in cows (IFLc) or in heifers (IFLh); fertility index (FI); nonreturn rate in cows (NRRc) or in heifers (NRRh).When a deletion was associated with multiple traits, GWAS summary for trait with strongest signal (first trait in the corresponding row) was presented in the table.

Table 5 .
Mesbah-Uddin et al.: GENOME-WIDE ASSOCIATION STUDY FOR FEMALE FERTILITY IN DAIRY CATTLE Fertility index-associated GWAS loci in Holstein cattle

Table 6 .
Fertility index-associated GWAS loci in Nordic Red Dairy cattle

Table 7 .
Linkage disequilibrium (r) pattern among the top markers of chromosome 12 (Chr12) from Table6SNP