Genome-wide association study on milk production and somatic cell score for Thai dairy cattle using weighted single-step approach with random regression test-day model

S Genome-wide association studies are a powerful tool to identify genomic regions and variants associated with phenotypes. However, only limited mutual confirmation from different studies is available. The objectives of this study were to identify genomic regions as well as genes and pathways associated with the first-lactation milk, fat, protein, and total solid yields; fat, protein, and total solid percentage; and somatic cell score (SCS) in a Thai dairy cattle population. Effects of SNPs were estimated by a weighted single-step GWAS, which back-solved the genomic breeding values predicted using single-step genomic BLUP (ssGBLUP) fitting a single-trait random regression test-day model. Genomic regions that explained at least 0.5% of the total genetic variance were selected for further analyses of candidate genes. Despite the small number of genotyped animals, genomic predictions led to an improvement in the accuracy over the traditional BLUP. Genomic predictions using weighted ssGBLUP were slightly better than the ssGBLUP. The genomic regions associated with milk production traits contained 210 candidate genes on 19 chromosomes [ Bos taurus autosome (BTA) 1 to 7, 9, 11 to 16, 20 to 21, 26 to 27 and 29], whereas 21 candidate genes on 3 chromosomes (BTA 11, 16, and 21) were associated with SCS. Many genomic regions explained a small fraction of the genetic variance, indicating polygenic inheritance of the studied traits. Several candidate genes coincided with previous reports for milk production traits in Holstein cattle, especially a large region of genes on BTA14. We identified 141 and 5 novel genes related to milk production and SCS, respectively. These novel genes were also found to be functionally related to heat tolerance (e.g., SLC45A2 , IRAG1 , and LOC101902172 ), longevity (e.g., SYT10 and LOC101903327 ), and fertility (e.g., PAG1 ). These findings may be attributed to indirect selection in our population. Identified biological networks including intracellular cell transportation and protein catabolism implicate milk production, whereas the immunological pathways such as lymphocyte activation are closely related to SCS. Further studies are required to validate our findings before exploiting them in genomic selection.


ABSTRACTS
Genome-wide association studies are a powerful tool to identify genomic regions and variants associated with phenotypes. However, only limited mutual confirmation from different studies is available. The objectives of this study were to identify genomic regions as well as genes and pathways associated with the first-lactation milk, fat, protein, and total solid yields; fat, protein, and total solid percentage; and somatic cell score (SCS) in a Thai dairy cattle population. Effects of SNPs were estimated by a weighted single-step GWAS, which back-solved the genomic breeding values predicted using single-step genomic BLUP (ssGBLUP) fitting a single-trait random regression test-day model. Genomic regions that explained at least 0.5% of the total genetic variance were selected for further analyses of candidate genes. Despite the small number of genotyped animals, genomic predictions led to an improvement in the accuracy over the traditional BLUP. Genomic predictions using weighted ssGBLUP were slightly better than the ssGBLUP. The genomic regions associated with milk production traits contained 210 candidate genes on 19 chromosomes [Bos taurus autosome (BTA) 1 to 7, 9, 11 to 16, 20 to 21, 26 to 27 and 29], whereas 21 candidate genes on 3 chromosomes (BTA 11,16,and 21) were associated with SCS. Many genomic regions explained a small fraction of the genetic variance, indicating polygenic inheritance of the studied traits. Several candidate genes coincided with previous reports for milk production traits in Holstein cattle, especially a large region of genes on BTA14. We identified 141 and 5 novel genes related to milk production and SCS, respectively. These novel genes were also found to be functionally related to heat tolerance (e.g., SLC45A2, IRAG1, and LOC101902172), longevity (e.g., SYT10 and LOC101903327), and fertility (e.g., PAG1). These findings may be attributed to indirect selection in our population. Identified biological networks including intracellular cell transportation and protein catabolism implicate milk production, whereas the immunological pathways such as lymphocyte activation are closely related to SCS. Further studies are required to validate our findings before exploiting them in genomic selection.

INTRODUCTION
Milk production and udder health are economically important traits affecting dairy farming profitability. Improvement of milk production traits will directly bring greater benefits to dairy operations, and improved mastitis resistance will reduce cost of mastitis treatments. These polygenic traits are affected by a variety of factors (e.g., management practices, environmental conditions, animal physiological stage), and controlled by many genes and variants with small effects on the observed phenotype (Snelling et al., 2013). Improvement of management and nutrition along with intense genetic selection can lead to increased milk production and reduce the prevalence of mastitis (Rupp and Boichard, 2003).
In the last decade, the use of genomic information to infer the genomic estimated breeding values (GEBV) of an individual, known as genomic selection (Meuwissen et al., 2001), has become a routine practice in several livestock species facilitated by rapid advances in highthroughput SNP genotyping technologies. Genomewide association studies are becoming practical tools to identify SNP associated with QTL or genes with major effects on phenotypes . The GWAS are advantageous compared with the traditional QTL mapping strategy due to a greater power to detect causal variants with modest effects and ability to define narrower genomic regions harboring causal variants (Hirschhorn and Daly, 2005). Therefore, GWAS have been widely regarded as a primary tool for identifying QTL and genomic regions associated with phenotypes.
The weighted single-step GWAS (WssGWAS; Wang et al., 2012) are a method that allows the estimation of SNP effects using GEBV predicted by single-step genomic BLUP (ssGBLUP, Aguilar et al., 2010) based on all phenotypes, genotypes, and pedigree information. This approach allows complex models such as random regression and multiple traits to be efficiently implemented for genomic prediction of longitudinal traits (Kang et al., (2017). In addition, WssGWAS allow unequal variances for SNPs, which result in improved precision of the estimation of SNP effects . The WssGWAS may perform better than classical GWAS methods when the number of animals with both phenotypes and genotypes is small, and the traits are controlled by QTL with large effects (Marques et al., 2018). The WssGWAS have been implemented in a variety of traits in livestock including broiler chickens (Fragomeni et al., 2014;Wang et al., 2014), dairy cattle Zhou et al., 2019), pigs (Howard et al., 2015;Marques et al., 2018;Wu et al., 2018), and beef cattle Lemos et al., 2016;Valente et al., 2016;Silva et al., 2017;de Melo et al., 2017;Silva et al., 2019).
In Thailand, crossbred cattle constitute the majority of the total dairy cattle population. In 2018, the crossbred population reached more than 283,000 and was mainly raised by smallholders (Buaban et al., 2020). The majority of crossbred cows had at least 75% genetic fraction of Holstein Friesian. Other breed fractions were mainly from Bos indicus (Sahiwal, Red Sindhi, Brahman, and Thai native), which are robust to the tropical environment (hot and humid) in Thailand (Buaban, 2005). A single-trait random regression test-day model (RR-TDM) is currently used for the official genetic evaluation for milk production and milk component traits in dairy cattle because it can improve the accuracy of genetic evaluation and provide better modeling (Buaban et al., 2020;Department of Livestock Development, 2020). In addition, RR-TDM maximizes the amount of information gathered from each animal. Considering the advantages of RR-TDM and ssGBLUP in genetic prediction, it is expected that their combination should result in more accurate predictions, less bias, (Koivula et al., 2015;Baba et al., 2017;Kang et al., 2018;Oliveira et al., 2019b,d) and a better performance of GWAS (Oliveira et al., 2019a,c;Freitas et al., 2020).
Information about regions of the genome involved in milk production traits and SCS is currently lacking in the Thai dairy cattle population. Mastitis and high SCC can lead to changes in milk composition as a result of a decrease in synthesis due to epithelial cell damage (Malek dos Reis et al., 2013). This finding is directly related to mastitis, which is a major problem in the Thai dairy population (Jarassaeng et al., 2012;Hinthong et al., 2017;Horpiencharoen et al., 2019;Pumipuntu et al., 2019). The objectives of this study were to (1) identify genomic regions associated with milk production traits (milk yield and milk components) and SCS in Thai dairy cattle using WssGWAS, and (2) identify genes and pathways influencing the studied traits.

Phenotypic, Pedigree, and Genomic Data
Monthly test-day records of the first-lactation cows calving between November 1993 and March 2017 were extracted from the centralized database of the Bureau of Biotechnology in Livestock Production, Department of Livestock Development of Thailand (Pathum Thani, Thailand). After the extraction, data were edited to retain only cows with known sire, age at first calving between 18 and 48 mo, DIM between 5 and 305 d, the first test date between 5 and 60 d, and daily milk yield between 1 and 45 kg. In addition, a minimum of 5 records were defined for each herd test-date contemporary group (Table 1). Each test-day record comprised at most 8 traits: yields for milk (MY, kg), fat (FY, kg), protein (PY, kg), and total solids (TSY, kg); percentages for fat (FP), protein (PP), and total solids (TSP); and SCC (1,000 cells/mL). Included in TSP are all constituents such as protein, fat, lactose, mineral materials, trace elements, and vitamins of the milk, except water. As the SCC has a highly skewed distribution, log-transformation was applied as log 2 (SCC/100) + 3 (Ali and Shook, 1980). Because some traits were not available on a test day, we classified the traits into 3 groups by missing pattern: MY, milk component traits (FY, PY, TSY, FP, PP, TSP), and SCS. We split the full data into 3 files by trait groups. In each data set, the pedigree was traced back to 3 generations from the recorded cows.
A total of 892 crossbred Thai dairy animals (154 bulls and 738 cows) were genotyped using the Illumina BovineSNP50 BeadChip (Illumina Inc.) version 2 (containing 54,609 SNPs) and version 3 (containing 53,218 SNPs), and only 50,908 common SNPs were considered. Quality control (QC) retained SNPs with call rates >0.9, minor allele frequency >0.05, and departure from Hardy-Weinberg equilibrium <0.15. The SNPs with unknown position or located on sex chromosomes were not considered in the analyses. Animals with call rates <0.9 were discarded. Parent-progeny pairs were tested for conflicts. After the QC process, 876, 868, and 632 genotyped animals with 41,977, 41,975, and 41,930 informative SNPs for milk yield, milk component traits, and SCS, respectively, were available for analysis. The average distance between consecutive markers was 59.39 kb (SD = 3.02 kb). The PREGSF90 software was used for SNPs and sample QC . A summary of data sets used in this study can be seen in Table 1.

Prediction of GEBV
First, (co)variance components for each trait were estimated based on the integration of the RR-TDM into single-step GBLUP procedure (SS-RR-TDM) using the same RR-TDM currently used in the Thai official genetic evaluations (Department of Livestock Development, 2020). The AIREMLF90 and BLUPF90 programs  were used to estimate the (co)variance components and solve the mixed model equations. In matrix notation, the model equation for the SS-RR-TDM was where y is the vector of test-day milk records, b 1 is the vector of systematic effects consisting of AI unityear-season of calving effect (for MY and component traits), breed-age at calving, or year-season of calving effect (for SCS), b 2 is the vector of fixed regression coefficients on Legendre polynomials (LP) nested within breed-age at calving, h is the vector of random herdyear month of testing effect, a and p are the vectors of random regression coefficients on LP nested within the additive genetic effect and permanent environmental effect, respectively, and e is the vector of the residuals.
The matrices X 1 , X 2 , V, Z, and W are the correspond incidence matrices. The order of LP was calculated based on DIM as defined by Gengler et al. (1999). The LP of third (constant, linear, quadratic, and cubic) and second orders (constant, linear, and quadratic) were used for MY and content traits, and for SCS, respectively. The same order of LP was used for both fixed and random regression effects. The residual variable was assumed to be homogeneous throughout lactation to decrease the model complexity. Random effects were assumed to be normally distributed with zero means, and the covariance structure for models was defined as where G 0 and P 0 are 4×4 (for milk yield and component traits) or 3×3 (for SCS) (co)variance matrices of the random regression coefficients for the animal and permanent environment effects respectively, is a modified genetic relationship matrix that combines pedigreebased (A) and genomic-based (G) relationship matrices as shown by Legarra et al. (2009), I is an identity matrix, ⊗ is the Kronecker product operator, σ htm 2 is the variance of the random herd-year month of testing effects, and σ e 2 is the residual variance. The inverse of H needed for mixed model equations is given by where A is the numerator relationship matrix based on pedigree for all animals; A 22 is the numerator relationship matrix for genotyped animals; and α, β, ω, and τ are weighting factors. To avoid singularity problems, α and β were selected to be 0.95 and 0.05, respectively (VanRaden, 2008). The weights τ and ω (τ = 1.00, ω = 0.50) were chosen such that they provided the least bias in the preliminary validation study. The value of ω was lower than the value (0.7) usually used in beef and dairy cattle ssGBLUP evaluations (Lourenco et al., 2020) and this could be in part due to some degrees of pedigree incompleteness in the studied population. As a result of the potential incomplete pedigree, inbreeding was ignored in A matrix although it could have some effects on the results. The G matrix was created according to the second method of VanRaden (2008) as follows: where Z is a matrix of gene content adjusted for allele frequencies; D is a diagonal matrix of weights for SNP variances (initially D = I); m is the number of SNP markers; and p i is the frequency of the reference allele of the ith SNP. The individual genomic breeding value was presented as a 305-d production basis. The 305-d genomic breeding value was defined as the sum of the breeding values between DIM 5 to 305 for yield traits as described by Jamrozik et al. (1997). For milk component traits and SCS, the sum was replaced with the average of breeding values in a lactation.

Weighted Single-Step Genome-Wide Association Study
The analyses were performed using the WssGWAS methodology. The SNP effects (u) were computed using an iterative process similar to the one described by Wang et al. (2014) as implemented in postGSf90 software . The same linear models described for variance component estimation were applied in the WssGBLUP. The additive genomic breeding value (a g : the GEBV for 305-d basis) was backsolved to SNP effects accounting for their shared genomic variance σ u 2 ( ) , using the following formula: where u is the vector of SNP effects, a g is the vector of GEBV for a 305-d basis, and all other terms are as defined earlier. Using the breeding value for 305-d basis, thus all predictions of random regression coefficients associated with LP for each individual were already condensed in the SNP effects.
In the first round of the iterative process, the variance absorbed by each marker was obtained as , where u i is the effect of the ith SNP estimated in the previous iteration. New marker effects were obtained by considering the weighted G matrix in Equation [5]. The detailed description of the iterative reweighting procedure can be seen in Scenario 1 in Wang et al. (2012).
In the first implementation of WssGBLUP,  suggested using a linear weight following the formula for genetic variance due to an additive locus (Falconer and Mackay, 1996) However, Lourenco et al. (2017) showed that this method did not reach convergence under a more polygenic scenario because of extreme weights. Therefore, nonlinear A weights were used in this study as described by VanRaden (2008): where CT is a constant that determines the departure from normality; û j is the absolute estimated SNP effect for marker j, and sd û ( ) is the standard deviation of the estimated SNP effects. Nonlinear A weights had good convergence properties and avoided extreme values (Garcia et al., 2018). This is because the maximum change in weights is limited by the minimum between 5 and the exponent of CT. In our study, CT was 1.125 following Legarra et al. (2018) and VanRaden (2008). Detection of informative regions or loci based on single SNPs may result in noise or underestimation due to the high ratio between the number of SNPs and the number of genotyped animals . Moreover, adjacent SNPs may be in high LD with the same QTL in high-density SNP panels leading to the QTL effect spreading over all SNPs in high LD (Fan et al., 2011). For this reason, the proportion of genetic variance was calculated for each nonoverlapping 1-Mb windows to account for LD, which is more appropriate than using single SNP.
Performance of genetic and genomic predictions for young bulls was evaluated by comparing individual theoretical accuracy of 305-d GEBV (or EBV). Thus, pedigree-based RR-TDM was also used to obtain EBV in addition to SS-RR-TDM. Genomic predictions with WssGBLUP were also used to compare genomic evaluation methods. The young bulls were progeny-tested bulls in the last 6 yr from the data set with more than 75% reliability of EBV. The theoretical accuracy of 305d GEBV (or EBV) was calculated based on the prediction error variance (Misztal and Wiggans, 1988). The prediction error variance was presented using the 305-d GEBV (or EBV) basis (e.g., Mrode, 2014). The number of iterations for weighting procedure in WssGWAS was determined as the iteration number that maximized the accuracy of predictions. The proportion of genetic variance was then calculated using a nonoverlapping 20-SNP moving windows (approximately 1 Mb).
The percentage of genetic variance explained by the ith region was calculated as follows: where a i is the genetic value of the ith region that consists of 20 adjacent SNPs, σ a 2 is the total additive genetic variance, z j is the vector of gene content of the jth SNP for all individuals, and û j is the marker effect of the jth SNP within the ith region .
To identify important genomic regions related to the analyzed traits, moving windows of 20 adjacent SNPs explaining at least 0.5% of the total genetic variance were selected for further analyses. This threshold has been previously investigated in cattle for complex traits (Fragomeni et al., 2014;Oliveira et al., 2017;Lee et al., 2019;Zhou et al., 2019) and in the expected contribution of SNP windows (Sollero et al., 2017). Manhattan plots of these windows were generated using gnuplot 5.2 (Williams and Kelley, 2019).

Candidate Genes and Functional Analyses
To identify possible candidate genes associated with milk production traits and SCS, genes that were located within the most important nonoverlapping genomic windows (i.e., between the start and end of genomic coordinates of the selected windows) were further investigated. We identified genes using the National Center for Biotechnology Information (NCBI) Map Viewer tool for the bovine genome with the UMD 3.1 assembly as the reference map (https: / / ftp .ncbi .nlm .nih .gov/ genomes/ all/ GCF/ 000/ 003/ 055/ GCF _000003055 .6 _Bos _taurus _UMD _3 .1 .1/ ).
For all identified genes, literature and database searches (NCBI: https: / / www .ncbi .nlm .nih .gov/ ; Gen-eCards: https: / / www .genecards .org/ ; and UniProt: https: / / www .uniprot .org/ ) were performed to investigate the metabolic function of the genes identified. A list of genes located in proximity to the windows was used for performing a gene network analysis using the online resource GeneMania (Warde-Farley et al., 2010). Moreover, gene ontology analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (Huang et al., 2009a,b).

Variance Component and Heritability Estimations
Descriptive statistics for test-day milk production traits and SCS in the data set used for genetic evaluation and 305-d variance components and heritability estimates h 305d 2 ( ) for additive, herd test month, permanent environmental and residual variances are presented in Table 2  traits, WssGBLUP provided the greatest accuracies, followed by ssGBLUP and BLUP. For WssGBLUP, the accuracies were maximized by the second iteration for all traits, then remained constant from the third to fifth iterations (results not shown). Thus, 2 iterations of weighting were used in WssGWAS. The gain in accuracy over BLUP for young bulls was 0.015 and 0.017 for ssGBLUP and WssGBLUP, respectively. The increase in accuracy in genomic predictions over EBV is consistent with previous studies in dairy cattle (VanRaden et al., 2009;Ding et al., 2013). Notably, small improvements of accuracies obtained using WssGBLUP compared with ssGBLUP (accuracy gain of 0.002) could be the fact that the studied traits are (highly) polygenic where small or no gain in accuracy are expected with SNP weighting (Hayes et al., 2009;Lourenco et al., 2017). In this study, the optimal weight with 2 iterations is sufficient to maximize genomic accuracy which is in accordance with those reported in previous studies Zhang et al., 2016). In addition, this iteration number was consistent with other literatures, providing the greatest predictive ability and least inflation (Legarra et al., 2008;Garcia et al., 2018), as well as high stability of the marker effect estimates . In spite of the limited number of genotype animals, fitting a ssGBLUP increases the accuracy of breeding values in Thai dairy cattle, eventually improving overall genetic gain. The resolution of Manhattan plots was slightly improved when SNP weights were added. The second iteration of weights SNP weighting resulted in the least noisy percentage of additive variance explained by windows of 20 adjacent SNPs ( Figure 1). Lourenco et al. (2017) showed that WssGBLUP reduced noise which resulted in an increased precision of estimating QTL effects and positions in simulation study. Small improvement of the resolution observed in this study could be attributed to a polygenic nature of the studied traits. In addition, a sample size can critically affect statistical power in GWAS study (Hong and Park, 2012). The small number of genotyped animals used in this study may lead to a low power to detect SNP related to QTL which resulted in a large number of SNPs with small effects across all traits analyzed (all SNPs explained less than 1% of the genetic variance of a trait).

Comparisons Between Genetic and Genomic Prediction Methods
Although the small number of genotyped animals could limit the improved accuracy of genomic prediction (Kosgey and Okeyo, 2007;Mrode et al., 2019), in the present study a positive effect on the accuracy of young bulls was observed. This could be related to the ongoing breeding program (e.g., the existence of a progeny testing scheme and extensive mating within a population), which enhanced relationships between reference population and selection candidates (Habier et al., 2010;Clark et al., 2012;Mrode et al., 2019). Some degree of improvement in accuracy of genomic prediction with the small number of genotyped animals were previously reported in developing countries (Neves et al., 2014;Silva et al., 2016;Boison et al., 2017). Nevertheless, more data on genotypes and records likely facilitate the improvement of genomic prediction accuracy and the reliability (and resolution) of GWAS results for the Thai dairy cattle population. Therefore, it is necessary to enlarge the reference database with phenotypes and genotypes in both cows and bulls.
In this study, the models were assumed to have constant residual variance along the lactation to decrease model complexity. Different assumptions of residual variance (i.e., homogeneity and heterogeneity) have been previously used in ssGWAS procedures with random regression models (Howard et al., 2015;Oliveira et al., 2019b). Howard et al. (2015) modeled homogeneous residual variance due to its simplicity and similar results being observed with heterogeneity assumption in    their preliminary study. It might be that if the accuracy of models was affected by the assumption of residual variance, the accuracy of estimated SNP effects could also be affected, impacting GWAS results. Further investigations might be necessary to compare model accuracies and GWAS results using different assumptions of residual variance.

Identification of Genomic Region and Candidate Genes
In the present study, a WssGBLUP method was used to investigate genomic regions and identified candidate genes responsible for milk yield, milk components, and SCS using medium density SNP panels. The Manhattan plots of percentage additive genetic variance explained by 20-SNP moving windows are reported in Figure 2. The largest genetic variance explained by a window was 0.7% for milk production traits and 0.6% for SCS, respectively. However, most windows explained less than 0.5%, and these low-contributing regions spread across the genome for all traits analyzed.
This indicates that both milk production traits and SCS are moderately to highly polygenic, in which many regions across the genome contribute to the genetic variation of the traits (Figure 2). A summary of the 36 informative windows that explained the largest proportion of additive genetic variance (at least 0.5%) and the important candidate genes for all traits studied are presented in Table 4. The regions associated with milk production traits were located on chromosomes BTA 1,2,3,4,5,6,7,9,11,12,13,14,15,16,20,21,26,27,and 29, whereas regions associated with SCS were found on chromosomes BTA 11, 16, and 21. We compared the results with databases (NCBI, Genecards, and Uni-Prot) and found 210 and 21 reported QTL related to milk production traits and SCS, respectively. In addition, we found several genes which were not previously reported for the studied traits, but they were related to the analyzed traits. For example, TMEM181, a fatrelated QTL in marbling score (Mokry et al., 2013), was identified as novel candidate genes for fat yield in the present study. Relatively concentrated areas related to milk production traits and SCS were noted on chromosomes BTA 1, 5, 7, 12, 14, 16, 20, and 21.

Candidate Genes for MY
Genomic regions associated with 305-d MY were located on chromosomes 1 (from 103,434,094 to 104,344,681 bp), 29 (from 37,949,466 to 40,507,958 bp), and 3 (from 49,431,468 to 50,835,532 bp). Within these windows, there were 42 known genes and 50 uncharacterized genes ( Figure 2A and Table 4).
Two genes were previously reported to directly regulate milk yield (Benyamin et al., 2014;Zabolewicz et al., 2014;Do et al., 2017b;Yodklaew et al., 2017). The gene DR1 plays a role in lactoferrin expression in milk, which is related to calcium metabolism, skeletal homeostasis, and immune system (Zabolewicz et al., 2014) and influences mRNA network regulation of milk yield (Do et al., 2017b). LOC100847667 has been reported as a marker involving peak yield in Thai multibreed dairy cattle population (Yodklaew et al., 2017). This agreement could be due to the similar genetic background of the dairy cattle breeds in Thailand where Holstein and Bos indicus cattle are commonly crossed.
Other genes were demonstrated to have mechanisms indirectly related to milk yield (Ibeagha-Awemu et al., Yang et al., 2016). The gene VPS37C, a subunit of the protein complex required for sorting ubiquitinated transmembrane proteins into internal vesicles in cell transportation, is related to the milk fatty acid and butyric acid (Ibeagha-Awemu et al., 2016). The gene TMED5 is a protein in a secretory pathway that is required for the maintenance of the Golgi apparatus and vesicular protein trafficking. This protein can be found in a milk fat globule membrane of colostrum and milk (Yang et al., 2016). Moreover, several genes in the significant windows were reported to regulate mammary physiology (Farke et al., 2008;Oh et al., 2013;Benyamin et al., 2014;Chen et al., 2015;Han et al., 2018). A member of ABC transporters, ABCA4, is expressed in the bovine mammary gland and may be implicated in lipid and cholesterol transport into milk (Farke et al., 2008). The gene GCLM acts as an antioxidant in mammary tissues (Han et al., 2018), whereas DNTTIP2 is a candidate marker to identify mastitis in cows (Chen et al., 2015). Both BCAR3 and FNBP1L are responsible for cell proliferation, migration, and cytoskeleton reorganization (Oh et al., 2013;Benyamin et al., 2014).

Candidate Genes for FY
For 305-d FY, there were 5 chromosomes that contain associated genomic regions including chromosomes 26 (from 37,336,603 to 38,233,337 bp), 13 (from 46,150,079 to 46,841,024 bp), 1 (from 103,434,094 to 104,344,681 bp), 20 (from 39,625,124 to 40,437,346 bp), and 9 (from 96,298,532 to 97,397,264 bp). Within these regions, 32 known genes and 24 uncharacterized genes were found to be associated with 305-d fat yield ( Figure 2B and Table 4).
Additionally, other genes were demonstrated to have mechanisms related to fat yield. The gene RXFP3 can inhibit oxytocin and vasopressin release (Kania et al., 2017), which affects milk flow and milk fat concentration (Högberg et al., 2014). The gene AMACR is an enzyme involved in fatty acids' metabolism (Wright et al., 2011); FNDC1 downregulates osteogenic differentiation pathway of the adipose stem cells (Zhao et al., 2018). The G protein-coupled receptor, TMEM181, is a fat-related QTL in marbling score (Mokry et al., 2013). Additionally, KCNK18 regulates many cellular processes, such as muscle contraction, hormonal regulation, osmoregulation, maintenance of the action potential and ion flow, and this gene has been previously reported to be associated with MY and PY (Andres-Enguix et al., 2012;Maher et al., 2013;Cai et al., 2020).
Furthermore, HSPA12A is a member of the heat shock protein family, other members of which, such as Hsp90, are related to thermotolerance and milk yield (Sailo et al., 2015). Both TRNAT-UGU and LOC104975279 were reported as candidates for milk yield in buffalo (da Costa Barros et al., 2018). The gene PNLIPRP2 was reported as a milk protein candidate gene in Holstein cattle (Li et al., 2010a). Both EMX2 and VAX1 are candidate genes for milk casein (Fomichev et al., 2012); SLC18A2 is an ATP-dependent transporter of monoamines and it is also a candidate gene for SCS in Holstein (Kolbehdari et al., 2009). The gene C1QTNF3 is associated with SCS by playing a role in metabolism, inflammation, and cell survival Li et al., 2017). A transporter protein in melanin synthesis, SLC45A2, is a determinant of the skin color (Branicki et al., 2008;Wang et al., 2016;Le et al., 2020). It was reported that skin color is correlated with heat tolerance in sheep (McManus et al., 2011).

Candidate Genes for PY
Genomic regions with a high impact on 305-d PY were located in chromosomes 4 (from 61,234,542 to 62,019,561 bp) and 1 (from 103,434,094 to 104,344,681 bp). A total of 6 known genes and 4 uncharacterized genes associated with protein yield were identified in these regions ( Figure 2C and Table 4). From these regions, genomic regions located in chromosome 1 (from 103,434,094 to 104,344,681 bp) were found in both PY and FY. The region contains 2 genes of unknown function, LOC104970979 and LOC781603. In addition, these 2 uncharacterized genes were identified for MY and TSY, and needed further investigation. However, a few genes have been reported to have mechanism indirectly related to protein yield, and HERPUD2 plays a role in metabolic stress (Rabhi et al., 2016). Metabolic stress occurs in lactating cows when homeostasis is disrupted, causing significant production losses (Putman et al., 2018). The actin-binding protein ANLN hat functions in cell growth and migration and can influence levels of β-CN and α-LA (a major protein component) that are produced by the epithelial cells of the mammary gland (Rezaei et al., 2016;Layman et al., 2018). Finally, EEPD1 was identified, and its function is to maintain genomic stability, which is essential to an animal's health and survival (Wu et al., 2015;Feng and Riddle, 2020).

Candidate Genes for TSY
Genomic regions with a high impact on 305-d TSY were located on 2 chromosomes including chromosomes 1 (from 103,434,094 to 104,344,681 bp) and 15 (from 42,657,355 to 43,538,866 bp). A total of 9 known genes and 5 uncharacterized genes associated with TSY were identified in these windows ( Figure 2D and Table 4). However, only a few genes have previously been reported to have mechanism indirectly related to total solid. The gene IRAG1 is found in the membrane of the endoplasmic reticulum, functioning in intracellular calcium modulation, body growth, and adaptation to the arid environment (Widmann et al., 2013;Oberheim Bush and Nedergaard, 2017;Goshu et al., 2018). The gene ADM is found to be involved in several pathways including vasodilation, hormone regulation, angiogenesis, and antimicrobial activity (Schönauer et al., 2017).
Finally, SBF2, a member of the myotubularin-related protein family, is a candidate for fertility traits (Wu et al., 2014).

Candidate Genes for PP
Four chromosomes were found to carry genomic regions associated with PP, including chromosomes 27 (from 10,697,257 to 11,822,624 bp), 7 (from 17,871,862 to 19,220,954 bp), 20 (from 31,933,394 to 33,433,160 bp) and 14 (from 1,463,676 to 2,553,525 bp). A total of 116 known genes and 29 uncharacterized genes associated with PP were identified ( Figure 2F and Table 4).
Five genes were previously reported to directly regulate milk fat. Candidate gene LOC535121 is responsible for fat and protein percentages in Holstein cow (Cole et al., 2011). The gene MCAT is related to fatty acid synthesis in the lactating period , whereas TRNAS-GGA and TRNAC-GCA7-1 are known to be involved in FY, MY, and PY in buffalo milk (Venturini et al., 2014). In addition, a component of intracellular transportation, ARFGAP3, has been reported as a candidate gene for milk production traits in Danish Jersey (Kartberg et al., 2010;Mai et al., 2010).

Candidate Genes for TSP
Genomic regions that were identified to be associated with TSP were located in chromosomes 6 (from 37,868,743 to 38,892,842 bp), 7 (from 5,774,810 to 6,413,319 bp), 5 (from 113,468,743 to 114,594,935 bp, 21 from 8,031,396 to 8,955,497 bp), 14 (from 7,718,808 to 8,454,690 bp) and 12 (from 40,009,119 to 42,606,004 bp). A total of 61 known and 25 uncharacterized genes associated with TSP were found in these regions (Figure 2G and Table 4).
Several candidate genes for TSP coincide with previous literature such as PPM1K, ARFGAP3, LOC535121, MCAT, and ABCG2-LAP3 region. The gene PPM1K, which regulates mitochondrial permeability, has been previously detected in a region of associated SNPs for lactation persistency and feed efficiency in beef cattle (Do et al., 2017a;Abo-Ismail et al., 2018). ARFGAP3 is a candidate gene for milk production traits in Danish Jersey (Mai et al., 2010). Uncharacterized LOC535121 was identified to be a candidate locus for fat and protein percentages in Holstein breed (Cole et al., 2011). The gene MCAT encodes capable enzyme for fatty acid synthesis during lactation . Previous significant QTL on BTA 6 containing 6 genes (ABCG2, PKD2, SPP1, MEPE, IBSP, and LAP3) related to milk production traits (Cohen-Zinder et al., 2005;Olsen et al., 2005Olsen et al., , 2007 were also identified in this study. In this region, ABCG2 could be a causative gene affecting milk yield and milk composition in Holstein and Norwegian Red cattle breeds (Cohen-Zinder et al., 2005;Olsen et al., 2007;Alim et al., 2013). The gene SPP1 plays a crucial role in mammary gland development, lactation, and the synthesis of β-CN and whey protein (Nemir et al., 2000), whereas LAP3 is a regulator of hormone, protein maturation, protein inactivation and protein degradation (Zheng et al., 2011). Genes PKD2, CHERP, and TMEM38A are linked to calcium homeostasis and osmotic status in blood and milk (Olsen et al., 2007;Yazawa et al., 2007;Lin-Moshier et al., 2013). The gene TRNAC-ACA is a candidate for protein percentage (de Camargo et al., 2015), whereas NCAPG was found to be related to milk production (Weikard et al., 2012). Additionally, TRNAG-CCC6-1, which functions in decoding mRNA into protein, is a candidate gene for milk lactose in Fleckvieh cattle and for SCS in dairy buffaloes (de Camargo et al., 2015;Costa et al., 2019), and TRNAC-GCA7-1 is a candidate for milk production and composition (Venturini et al., 2014).

Candidate Genes for SCS
Genomic regions associated with SCS were located in 3 chromosomes, 16 (from 26,671,488 to 27,692,042 bp), 11 (from 45,736,640 to 46,794,995 bp), and 21 (from 68,751,694 to 70,050,042 bp) where 52 known and 19 uncharacterized genes were found to be associated with the trait ( Figure 2H and Table 4).
Milk somatic cells consist of milk-secreting cells and immune cells (Alhussien and Dang, 2018). Importantly, genes known to be related to immunity, inflammation, or cell proliferation were identified. The gene MIA3 regulates extracellular matrix composition that is expressed in mammary tissue after bacterial infection (Chen et al., 2015;Bauersachs et al., 2017). A member of cytokines in an interleukin superfamily (IL1A, IL1B, IL1F10, IL36A, IL36B, IL36G, and IL37) and 2 IL1 receptors (IL36RN and IL1RN) were identified on BTA 11. They play prominent roles in modulating innate and adaptive immune responses and in the process of inflammation (Benveniste, 2014;Queen et al., 2019). In addition, RCOR1 is implicated in B cell differentiation and inflammatory responses (Yao et al., 2015;Xiong et al., 2020). Also identified was TRAF3, plays crucial roles in mechanisms of immunity in bovine mammary epithelial cells (Song et al., 2017). Genes CDC42BPB, EXOC3L4 and TNFAIP2 are implicated in intracellular transportation of vesicles, and have been previously detected in a candidate region for innate immunity in Canadian Holstein cows (Porat-Shliom et al., 2013;de Klerk et al., 2018). The gene PPP1R13B is an activator of p53, a regulator in immunity and apoptotic pathway (Mitchell et al., 2002;Liu et al., 2005), whereas TAF1A regulates cell proliferation and is implicated in breast cancer (Bergstralh et al., 2007;Rossetti et al., 2016). Other genes were also identified to be linked to SCS in this population; however, they have been also found to be related to milk yield and composition in other studies. These genes are TRNAT-UGU, TRNAC-GCA7-1, HHIPL2 and NCK2 (Venturini et al., 2014;da Costa Barros et al., 2018;Johnston et al., 2018;Oliveira et al., 2018).
In this study, 210 and 21 candidate genes were identified as influence milk production and SCS, respectively (Table 4). These candidate genes largely describe polygenic inheritance of the traits. Several candidate genes were coincided with other previous reports for milk production in Holstein cattle, especially a large region of genes on BTA14 (Atashi et al., 2020;Cai et al., 2020;Silva et al., 2020;Wang et al., 2020). This could be attributed to the predomination of Holstein blood in Thai dairy cattle population by upgrading the local population with Holstein and later, mating within population to keep the proportion of Holstein blood between 87.5 and 93.75% as being considered the most suitable for tropical condition in Thailand (Department of Livestock Development, 2020). On the other hand, some identified genes are different from other breeds; for instance, LRRC24 was a candidate gene for milk composition in our population but Jersey nor Ayrshire (Oliveira et al., 2019a). Moreover, we also identified 141 novel genes for milk production and compositions and 5 prospective genes for SCS (Table 4). Most of these genes have mechanism indirectly related to SCS, and they have not been shown to influence the trait in previous studies.
The genes previously reported in literature to be involved in fertility, longevity, and environmental adaptation such as heat resistance were included in several traits. In Thailand's context, such genes could likely influence milk production and composition because low milk production and fertility, as well as mastitis problems, are the main causes of involuntary culling. In addition, dairy cattle population in Thailand was naturally selected by high temperature and humidity in a tropical environment. That environmental condition is not optimal for Holstein populations compared with that of other countries (Boonkum et al., 2011;Collier et al., 2011). Some genomic regions were associated with 2 or more traits. These regions were located on chromosomes 7 (from 5,774,810 to 6,413,319 bp), 5 (from 113,468,743 to 114,594,935 bp), 14 (from 7,718,808 to 8,454,690 bp) and 12 (from 40,009,119 to 42,606,004 bp) were identified to be associated with both TSP and FP. In addition, 2 uncharacterized genes, LOC104970979 and LOC781603, were identified for MY, FY, PY, and TSY. The results suggest these genes may be biological or spurious pleiotropic effects (Solovieff et al., 2013). Further investigation is needed to be conducted.
For SCS, a large member of interleukin superfamily (IL1A, IL1B, IL1F10, IL36A, IL36B, IL36G, and IL37) as well as other genes related to immunity significantly influencing SCS were identified. One of the limitations of this study is that the small number of genotyped animals used in this study could result in inaccurate estimation of SNP effects using ssGBLUP because the inverse of G may be noninformative. Furthermore, we annotated those SNPs on Bovine UMD 3.1, which was sequenced from a beef breed (Hereford). It should be also noted that UMD 3.1 provided genomic difference from Holstein cow genome such as variations in insertion/deletion accounting for 48,537,190 bp of the entire genome (Kõks et al., 2013). This suggests critical concerns for annotation and linked gene identification (Eilbeck et al., 2009;Florea et al., 2011;Wu et al., 2013). Annotation of these SNPs on their own genome map should be more appropriate.

Network and Pathway Analysis
Milk Production and Composition. Gene network analysis of the gene list in milk production demonstrated using GeneMania revealed the dense coexpression network (74.02%; Figure 3). The network included 170 genes with 1,137 interactions. Among those genes, PKD2, TONSL, TNFSF14, WWTR1, TNFSF14, CHERP, PRAP10 and GPIHBP1 are involved in several regulations of intracellular cell transportations. These findings are consistent with the results from functional analyses in gene ontology (GO, Table 5). The pathways involved in cellular process and cell transportation were identified, such as protein ubiquitination (GO: 0016567), endoplasmic reticulum membrane (GO: 0005789), actin cytoskeleton organization (GO: 0030036), positive regulation of receptor recycling (GO: 0001921), cell-cell adherens junction (GO: 0005913), and extracellular exosome (GO: 0070062). In addition, some pathways were found to be related to calcium release, such as the release of sequestered calcium ion into cytosol (GO: 0051209). These cellular mechanisms may influence the transportation of milk constituents via mammary secretory cells. The GO term analysis also identified osmoregulation (GO: 0006970), which influences fluid quantity in the body including milk. Furthermore, pathways related to protein, a major component in milk, were also identified, such as protein catabolic process (GO: 0030163), aspartic-type endopeptidase activity (GO: 0004190), and proteolysis (GO: 0006508). Other pathways related to oxygen-containing radicals (reactive oxygen species) were also identified, such as age-dependent response to reactive oxygen species (GO: 0001315), superoxide metabolic process (GO: 0006801), superoxide dismutase activity (GO: 0004784). Indeed, the uncontrolled free radicals can lead to the breakdown of vital biochemical compounds, such as lipids and protein, and they can affect milk composition . Immunoregulation and inflammation were also identified in GO of milk production and composition, such as complement activation (GO: 0006958), immune response (GO: 0006955), positive regulation of activation of membrane attack complex (GO: 0001970), endocytosis (GO: 0006897), and blood microparticle (GO: 0072562). Results from this study conformed to gene expression patterns in human mammary epithelial cells and breast cancers in GeneMania (Perou et al., 1999;Bild et al., 2006;Warde-Farley et al., 2010).
Based on our findings, in spite of a small number of genotyped animals, genomic prediction seems to be useful for improving genetic gain in the Thai dairy cattle breeding program. The accuracies of predictions using WssGBLUP were greater than ssGBLUP and pedigree-based BLUP for all studied traits. Results from a weighted ssGWAS revealed many genomic regions and candidate genes associated with 7 milk production traits and SCS. However, further investigation is needed for some genomic regions with known or uncharacterized genes (e.g., LOC104970979 and LOC781603). Furthermore, future studies should also consider the usefulness of the random regression model in estimating genomic merits and SNP effects at particular period within a lactation, which will help to elucidate the associated genomic regions and genes specific to time for milk-related traits. In addition, the heterogeneity assumption of residual variance should be considered to better account for measurement errors at different stages of lactation.

CONCLUSIONS
This study performed a weighted GWAS using singlestep procedure and found many genomic regions and  candidate genes associated with 7 milk production traits and SCS, indicating a polygenic nature of the studied traits. Functional analysis revealed genes potentially influencing milk production and composition, including cellular transportations, protein catabolism, and osmoregulation. Moreover, the identified genes re-lated to immunological pathway strongly support their effects on SCS. Novel candidate genes implicated in milk production traits and SCS were also identified. A group of candidate genes for milk production traits were found to be overlapped with other traits (SCS, fertility, longevity, and heat tolerance), likely being attributed to involuntary selection in the population (e.g., culling of cows with fertility and mastitis problems). Knowledge about the genetic architecture of the tropical dairy cattle population is important and will provide valuable information for future studies. Further research with more information on animals, records, and genotypes is required to validate our findings.

ACKNOWLEDGMENTS
This study was a part of the Genomic Evaluation for Genetic Improvement of Tropical Holstein Dairy Cattle in Thailand project, originally established by the Bureau of Biotechnology in Livestock Production (BBLP), the Department of Livestock Development, and Khon Kaen University, with support from the Agricultural Research Development Agency. The authors also acknowledge BBLP for providing the genotype and test-day data. The authors have not stated any conflicts of interest.