Inheritance of genomic regions and genes associated with number of oocytes and embryos in Gir cattle through daughter design

Over the past decades, daughter designs, including genotyped sires and their genotyped daughters, have been used as an approach to identify QTL related to economic traits. The aim of this study was to identify genomic regions inherited by Gir sire families and genes associated with number of viable oocytes (VO), total number of oocytes (TO), and number of embryos (EMBR) based on a daughter design approach. In total, 15 Gir sire families were selected. The number of daughters per family ranged from 26 to 395, which were genotyped with different SNP panels and imputed to the Illumina BovineHD BeadChip (777K) and had phenotypes for oocyte and embryo production. Daughters had phenotypic data for VO, TO, and EMBR. The search for QTL was performed through GWAS based on GBLUP. The QTL were found for each trait among and within families based on the top 10 genomic windows with the greatest genetic variance. For EMBR, genomic windows identified among families were located on BTA4, BTA5, BTA6, BTA7, BTA8, BTA13, BTA16, and BTA17, and they were most frequent on BTA7 within families. For VO, genomic windows were located on BTA2, BTA4, BTA5, BTA7, BTA17, BTA21, BTA22, BTA23, and BTA27 among families, being most frequent on BTA8 within families. For TO, the top 10 genomic windows were identified on BTA2, BTA4, BTA5, BTA7, BTA17, BTA21, BTA22, BTA26, and BTA27, being most frequent on BTA7 and BTA8 within families. Considering all results, the greatest number of genomic windows was found on BTA7, where the VCAN , XRCC4 , TRNAC - ACA , HAPLN1 , and EDIL3 genes were identified in the common regions. In conclusion, 15 Gir sire families with 26 to 395 daughters per family with phenotypes for oocyte and embryo production helped to identify the inheritance of several genomic regions, especially on BTA7, where the EDIL3 , HAPLN1 , and VCAN candidate genes were associated with number of oocytes and embryos in Gir cattle families.


INTRODUCTION
Artificial selection for reproductive traits promotes a slow response to selection due to the highly polygenic nature and low heritability of this type of trait, as they depend on the action of many genes and are influenced by environmental factors (Fathoni et al., 2022;Rocha et al., 2022).In recent years, collection of genomic information of animals has started to be used to identify genetic variants that affect several traits in livestock, in addition to including this molecular information in genetic evaluations, promoting a faster response to selection.As an example, evaluations for molecular traits in dairy Gir in Brazil allowed the inclusion of genes related to milk production and hereditary conditions in its breeding program, enabling genomic selection and increasing the accuracy of the genetic values for Gir sires (Panetto et al., 2021).Over the years, the search for genetic variants associated with innumerous traits has grown in livestock farming as well as the methodologies applied to identify them (Grisart et al., 2002;Otto et al., 2020;Rocha et al., 2023).
Advances in DNA-based marker technology have enabled the development of new methodologies, including family analyses based on daughter and granddaughter designs, as proposed first by Weller et al. (1990).These methodologies have been instrumental in discovering genomic regions associated with important traits in livestock species.Over the past decades, these approaches have been used to find QTL and SNPs associated with economic traits in several cattle breeds (Jiang et al., 2010;Silva et al., 2011;Oliveira Júnior et al., 2019).
In a daughter design approach, genotypic information for sires and their daughters is considered, with phenotypes collected from the daughters, aiming to identify quantitative effects associated with marker alleles as a significant component of variance between daughter subgroups receiving alternative marker alleles from their sires (Weller et al., 1990).According to Weller et al. (1990), this method is helpful to identify genes with relatively small quantitative effects, which is the case for polygenic traits.
Some examples of the use of both approaches (daughter and granddaughter design) are in Grisart et al. (2002), who identified a nonconservative K232A substitution in the DGAT1 gene with a major effect on milkfat content and other milk traits in Holstein-Friesian cattle.In another example, daughter design employed with GWAS in a Holstein cattle population revealed 105 genome-wise significant SNPs for milk production traits, in which the majority (86 out of 105) of the significant SNPs were located within the reported QTL regions, and others, in particular 2 SNPs, ARS-BFGL-NGS-4939 and BFGL-NGS-118998, were located 160 base pairs apart from the DGAT1 gene (Jiang et al., 2010).Using a similar approach to the daughter design in Nellore cattle breed, Oliveira Júnior et al. ( 2019) found QTL on BTA 5 and 14, in which potential causal mutations were associated with reproduction-related biological processes such as oocyte maturation, embryo development, placenta development, and response to reproductive hormones.
The Gir cattle breed is one of the most important Bos taurus indicus used to maintain pure Gir and for crossbreeding with Holstein cattle to generate the composite Girolando breed (Milanesi et al., 2022).Given the importance of the Gir cattle breed for the dairy production system in tropical countries, a daughter design was used to search QTL affecting milk production traits (Silva et al., 2011).The Gir breed has a greater oocyte quality compared with Bos taurus taurus breeds, which allows greater success of in vitro embryo production (Drum et al., 2019).Identification of genetic variants related to such traits are important to provide clues to understanding fertility regulation and genetic architecture in Gir cattle (Ma et al., 2019;Rocha et al., 2023).However, few studies using the daughter design approach for Gir cattle have been reported in literature, and none so far have been reported for embryo production-related traits in this breed.Considering the importance of female fertility to livestock production systems, the aim of this study is to identify genomic regions inherited by Gir sire families and genes associated with number of viable oocytes (VO), total number of oocytes (TO), and number of embryos (EMBR) based on daughter design approach.
The hypothesis was that common genomic regions inherited across Gir cattle families could be identified and, in these regions, genes related to oocytes and embryo production could be found.

MATERIALS AND METHODS
Because no human or animal subjects were used, this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

Population Structure
In this study, the daughter design methodology proposed by Weller et al. (1990) was applied.This methodology is carried out considering genotypic information for sires and their daughters, with phenotypes collected from the daughters.A pedigree file with 4,679 Gir animals was provided by Brazilian Agricultural Research Corporation (EMBRAPA), Dairy Cattle Center.From this pedigree file, 2,093 animals had genomic data, of which 169 were sires.As exclusion criteria for the daughter design, sire families with fewer than 20 phenotyped and genotyped daughters were not considered for the analyses (Silva et al., 2011).In this study, 15 families of the most important sires of the Gir cattle breed were evaluated.The number of daughters per sire family ranged from 26 to 395, totaling 1,474 daughters in all families (Table 1).These daughters had phenotypic data for VO, TO, and EMBR.Table 2 shows the descriptive statistics of VO, TO, and EMBR for 1,474 daughters of all 15 families.To be included, cows were required to have a minimum of 3 observations for all traits.The description for the phenotypic data were previously described by a recent study from our laboratory (Rocha et al., 2022).Briefly, the ovum pick-up (OPU) sessions in all analyzed herds were conducted between January 2005 and June 2020 in herds located in Minas Gerais State (Brazil).Eight companies responsible for OPU and in vitro fertilization (IVF) processes were listed in the data set, each company using their own protocols; in this way, we considered 8 protocols in total for the analysis, varying among farms or sometimes within farms.The OPU seasons were classified as (1) January to March, (2) April to June, (3) July to September, and (4) October to December.Animals were separated into contemporary groups (CG) by intersecting year and season of oocyte aspiration, farm, and OPU-IVF protocol.Records belonging to CG with fewer than 3 animals were removed.

Genotypic Information
A genotype file with 2,093 animals and 420,718 SNPs was provided by EMBRAPA Dairy Cattle Center.This file is the result of an imputation process where different density SNP panels and their respective numbers of animals (Illumina BovineSNP50 BeadChip v2 [50K; n and GGP Indicus [50K; n = 227]) were extracted or imputed to an HD v. 2 panel using FIMPUTE 2.2 software (Sargolzaei et al., 2014).The quality control parameters considered in the imputation process as exclusion criteria for variants included SNP with minor allele frequency ≤0.05, an absolute difference between observed and expected allele frequency of the heterozygote genotype for Hardy Weinberg Equilibrium of >0.15, GenCall score ≤0.70, call rate ≤0.98, and SNP-SNP correlation >0.995.Samples with call rate <0.90 were also excluded.

Genome-Wide Association Study
The search for QTL among and within families was performed through GWAS analyses.From the provided data with 2,093 animals and 420,718 SNPs, genotypes of the 15 sires and their 1,474 daughters were selected.Thereafter, quality control was applied to the genotype data of each family and all families together.The follow-ing parameters for variant exclusion were considered: SNP with minor allele frequency ≤0.05, an absolute difference between observed and expected allele frequency for Hardy Weinberg Equilibrium >0.15, and SNP heritability (h2 ), in which markers with estimated h 2 < 0.98 were discarded (Forneris et al., 2015).By default, the PreGSf90 program used for molecular quality control also verifies parent-progeny Mendelian conflicts and eliminates progenies with conflicts (Misztal et al., 2002).Numbers of SNPs remaining within and among families after quality control are presented in Table 1.

Statistical Analyses
The GWAS based on GBLUP within and among families was conducted using the BLUPF90 software family (Misztal et al., 2002).Because the considered traits (VO, TO, and EMBR) did not have a normal distribution of the residuals, they were transformed using the logarithmic scale ln(X + 1), where ln is the natural logarithm and X is the original data.After this transformation, the normal distribution of the data was verified via the Anderson-Darling test (Stephens, 1986;Thode, 2002).The following model was used: where, y, β, a, p, and e are the vectors of observations, fixed effects, additive genetic random effects, permanent environment random effects, and residual effects, respectively; X, Z, and W are the incidence matrices of fixed, additive genetic, and permanent environment effects, respectively.For VO and TO, CG was considered as a fixed effect, and for EMBR, bull and bull breed used in the insemination process (Gir or Holstein) were additionally considered as fixed effects.Donor's age in days (linear and quadratic components) was included as a covariate for all traits.The OPU interval (linear component) was included as a covariate for VO and TO.The additive genetic, permanent environment, and residual effects were assumed to be random.In the single-trait model, it was assumed that a , σ ( ) , where σ a 2 , σ p 2 , and σ e 2 are the additive ge-

QTL and Gene Search
To identify regions possibly harboring QTL for each trait (TO, VO, EMBR), the top 10 genomic windows explaining the greatest percentage of additive genetic variance were selected both within each family and among all families (Tiezzi et al., 2015;Vargas et al., 2018;Otto et al., 2020).The genomic windows of all families were overlapped to identify the most common inherited genomic regions across families.Considering these regions, the lower and upper positions in base pairs were noted.Gene searches were carried out within this interval using the National Center for Biotechnology Information (NCBI, 2023, https: / / www .ncbi.nlm.nih.gov/ ) to identify candidate genes associated with all traits.

RESULTS
The top 10 genomic windows with the greatest genetic variance among and within families are shown in Figure 1.For EMBR, the top 10 genomic windows were located on BTA4, BTA5, BTA6, BTA7, BTA8, BTA13, BTA16, and BTA17 among families.For the results in the top 10 genomic windows within each family, BTA7 was the most common for EMBR, present in 8 of 15 families.
For VO, genomic windows in the top 10 were located on BTA2, BTA4, BTA5, BTA7, BTA17, BTA21, BTA22, BTA23, and BTA27 among families.BTA8 was the most common in the top 10 genomic windows within families for VO, and was also present in 8 of 15 families.For TO, genomic windows in the top 10 were present on BTA2, BTA4, BTA5, BTA7, BTA17, BTA21, BTA22, BTA26, and BTA27.Within families, BTA7 and BTA8 were the most common chromosomes, both present in 7 of 15 families.The VO and TO traits shared similar chromosome regions among and within families.
Considering the top 10 genomic windows within 15 families for 3 traits, in total, 450 genomic windows were identified.In addition, animals from all 15 families were included in a single analysis, of which the top 10 genomic windows were also selected for all 3 traits, adding 30 more genomic windows to the results.In total, 480 genomic windows were identified, of which 227 overlapped in different chromosomes.For example, unique genomic windows (with the same initial and final positions in bp) were identified for all traits on BTA4 in positions 69,914,886 to 70,357,035 bp and 92,208,844 to 92,628,843 bp.Other examples of overlapping genomic windows (not necessarily with the same initial and final position in bp) were 2 genomic windows on BTA5, 3 on BTA6, 3 on BTA7, 1 on BTA13, 1 on BTA20, and 2 on BTA24 for all traits.Considering VO and TO traits, 110 unique genomic windows were found for both traits.
Considering all top 10 genomic windows for each trait within 15 families, 9% of these genomic windows were present on BTA7; that is, 42 of 450 genomic windows.In addition to being the most frequent chromosome across the results, common genomic windows were found for all traits among and within some of the families on BTA7 (Table 3; Figure 2).With a gene search, the tRNAC-ACA, VCAN, XRCC4, HAPLN1, and EDIL3 genes were identified in this common region (82,974,837 to 83,997,563 bp).

DISCUSSION
In the current work, we aimed to identify QTL regions and genes associated with VO, TO, and EMBR based on a daughter design approach with GWAS.Considering the results for all 3 traits, BTA7 had the greatest number of genomic windows.In the overlapping genomic windows identified on BTA7 (82,974,837 to 83,997,563 bp) that are common to 40% of the families (6 out of 15 families), tRNAC-ACA, VCAN, XRCC4, HAPLN1, and EDIL3 genes were identified.
Considering the EMBR trait, 8 out of 29 chromosomes were present in the top 10 genomic windows among families.In a previous study, we found that the number of embryos can be affected by many genes, being therefore a polygenic trait (Rocha et al., 2023).The identification of several genomic regions in different chromosomes in the current work corroborates this.A recent study from our laboratory (Rocha et al., 2022) revealed high genetic correlation between VO and TO (1.00), VO and EMBR (0.82), and TO and EMBR (0.82), which may explain why most of the genomic windows identified for VO and TO in the current work are similar in chromosomes and  base pair positions.It is worth mentioning that the population used in this study is a smaller genotyped sample from the data used in our previous study (Rocha et al., 2022), of which variance components were obtained and used in the current study.The results obtained in our previous study indicated that it is possible to select for VO, TO, and EMBR in Gir animals and encouraged us to conduct the present study looking for possible genomic markers related to these traits, which can be incorporated into genetic breeding programs and genomic selection.
On the Animal QTL Database website (Animal QTLdb, 2023; https: / / www .animalgenome.org/QTLdb) only 5 published studies, all in Holsteins, are cited for the traits number of embryos and number of transferable embryos, but none mention the number of oocytes.This demonstrates the scarcity of QTL found for the number of oocytes and embryos, further emphasizing the importance of this study in Zebu breeds.Jaton et al. (2018), studying Holstein cattle, mapped QTL on BTA2, BTA11, BTA25, and BTA29 for number of embryos, and, in addition to other studies, QTL were mapped on BTA2, BTA11, and BTA15 for number of transferable embryos and on BTA2 and BTA11 for number of embryos degenerated.In our study, 12 genomic windows were identified on BTA2 and 10 on BTA11 for EMBR, VO, and TO within families, indicating that regions of these chromosomes may be a source of genetic influence on bovine female fertility.Kadarmideen and Mazzoni (2018) identified expression QTL targeting candidate genes for oocytes and embryos in Holstein cattle, and from those significant identified expression QTL for oocytes, 2 similar regions were also identified in our study, on BTA16 for number of VO and BTA18 for TO.Cai et al. (2019) found 7 QTL associated with female fertility index in Nordic Holstein cattle and identified significant SNPs in 6 chromosomes.One of the SNPs identified by Cai et al. (2019) on BTA17 (71,393,345 bp) indicated the LIF gene as a candidate gene for female fertility, because it is required for blastocyst implantation and establishment of pregnancy.Similarly, in our work, the same region was identified in one of the top 10 genomic windows associated with the number of embryos.
Although many regions were identified in our study for the number of embryos and oocytes, special attention will be given to the region on BTA7 (82,974,837-83,997,563 bp), where most of the genomic windows were identified within families.VCAN was one of the genes found in these common regions within families on BTA7, and according to GeneCards (2023; https: / / www .genecards.org/ ) this gene may take part in the regulation of cell motility, growth, and differentiation.Expression levels of the VCAN gene are related to the cumulus cell expansion process, being associated with the early embryo morphology score (Shen et al., 2020) and clinical pregnancy in humans (Ocampo et al., 2018), and with oocyte quality in cattle (Kussano et al., 2023).The gene HAPLN1 is one of the cumulus cell expansion-related genes in pigs, and, like the VCAN gene, it is critical for oocyte maturation and production of high-quality embryos (Mito et al., 2013;Park et al., 2018).Association between the expression of VCAN and HAPLN1 genes and oocyte maturation found in literature may explain why these genes were identified in the most common BTA7 regions within families for the number of oocytes and embryos in our work, so these genes can be considered good candidates for improving the production of oocytes and embryos in the Gir breed.Expression of the VCAN and HAPLN1 genes can be explored in the future to confirm their role in oocyte maturation and embryo quality in Gir cattle.Another gene found on BTA7 was XRCC4, which encodes a protein responsible for DNA double-strand break repair.This was verified by XRCC4 transcript levels in porcine aged oocytes (Lin et al., 2021).However, DNA repair as basic function of body cells can also be observed in other types of tissue.Ruis et al. (2020) mention that the absence of XRCC4 in human cells mostly influences DNA repair and recombination.
Another interesting gene found in our work was EDIL3, which encodes an integrin ligand protein and may be involved in regulation of vascular morphogenesis of remodeling in embryonic development, according to GeneCards (2023, https: / / www .genecards.org/).It is interesting to mention that chromosomal locations of the EDIL3/MFGE8 genes revealed a nested syntenous relationship with HAPLN1/HAPLN3 and VCAN/ACAN, genes involved in vertebrate calcification, and EDIL3 overexpression in eggshell tissue occurs at the onset of shell calcification in chicken (Stapane et al., 2019).The relationship mentioned by Stapane et al. (2019) between the genes EDIL3, HAPLN1, and VCAN in chickens, which were also identified in our work, indeed suggests that these genes contribute to embryo development.Willforss et al. (2021), studying Holstein bull fertility through protein markers in seminal plasma, mention a negative correlation of EDIL3 expression with an increase in nonreturn rates to estrus (i.e., the proportion of cows not coming back to estrous after insemination and therefore deemed to be pregnant) and suggested that the EDIL3 gene has a regulatory role in the immune response of the female genital tract.As well as Willforss et al. (2021), we also suggest a further investigation not only for the EDIL3 gene but also for HAPLN1 and VCAN.Exploring the expression of EDIL3, HAPLN1, and VCAN may give a better idea of how these genes regulate the oocyte and embryo production.
Limitations regarding the daughter design are related to the number of phenotyped and genotyped daughters for each sire family.In addition, not only the sires but also the dams have an important contribution, but this methodology becomes more limited regarding the evaluation of dam families, because it is easier to have a sufficient number of daughters per sire than to have enough daughters per dam.In our data, only 1 dam reached the threshold of having at least 20 phenotyped and genotyped daughters that were used to select sires included in our study.For most cattle breeds, it is more common to genotype the sires, because sires can have a greater number of progeny than dams, so sires' genomic information becomes more frequently used in genomic selection.In this study it was possible to obtain sufficient genomic information thanks to the Brazilian Dairy Gir Breeding Program, which has collected genomic data of this breed since 2001 (Panetto et al., 2021).However, for other breeds or even species, application of this methodology might be limited by the availability of genomic data.

CONCLUSIONS
The approach used in this work allowed us to uncover many genomic regions in different chromosomes for the number of total and viable oocytes and for the number of embryos, which is consistent with these traits being polygenic.The results suggest that the largest genomic regions affecting the number of total and viable oocytes may be the same in different families from the Gir breed, which is consistent with earlier observations that these traits are highly genetically correlated.The identified genomic regions located on BTA7, between 82,974,837 and 83,997,563 bp, can be highlighted as the most interesting result, because it is where the most common regions were found among and within families.Several genes (i.e., EDIL3, HAPLN1, and VCAN) associated with oocyte maturation and embryo development-related functions in the literature were identified here as possible candidate genes associated with the numbers of oocytes and embryos.

NOTES
Coordination and Improvement of Higher Level Personnel (CAPES; Brasília, Brazil), the National Council for Scientific and Technological Development (CNPq; Brasília, Brazil), the Ministério da Ciência Tecnologia e Inovação (MCTI; Brasília, Brazil), and the Brazilian National Institute of Science and Technology in Animal Science (INCT-CA; Viçosa, Brazil) provided financial support toward this study.We thank the farms and Brazilian Agriculture Research Corporation (EMBRAPA) Dairy Cattle Research Center (Juiz de Fora, Brazil) for providing data for this study.Because no human or animal subjects were used, this analysis did not require approval by an Institutional Animal Care and Use Com-mittee or Institutional Review Board.The authors have not stated any conflicts of interest.

Figure 1 .
Figure 1.Number of genomic windows per chromosome among and within families associated with number of embryos (EMBR), number of viable oocytes (VO), and total number of oocytes (TO) present in the top 10 with the greatest genetic variance.

Table 1. Families, number and percentage of daughters, and number of SNPs after quality control per family used in the analyses
Rocha et al.: FAMILY ANALYSES IN GIR CATTLE FAMILIES

Table 2 .
Rocha et al.:FAMILY ANALYSES IN GIR CATTLE FAMILIES Descriptive statistics on ovum pick-up (OPU) interval, donors' age, and structures 1 I is the identity matrix; and G is the genomic relationship matrix.The GWAS results are reported as the percentage of genetic variance explained by a sliding genomic window of 100 adjacent SNPs for TO, VO, and EMBR (logarithmic scale).

Table 3 .
Rocha et al.: FAMILY ANALYSES IN GIR CATTLE FAMILIES Common regions on BTA7 for number of embryos (EMBR), number of viable oocytes (VO), and total number of oocytes (TO) within Gir cattle families