Genetic parameters and genome-wide association studies for mozzarella and milk production traits, lactation length, and lactation persistency in Murrah buffaloes

Genetic and genomic analyses of longitudinal traits related to milk production efficiency are paramount for optimizing water buffaloes breeding schemes. Therefore, this study aimed to (1) compare single-trait random regression models under a single-step genomic BLUP setting based on alternative covariance functions (i.e., Wood, Wilmink, and Ali and Schaeffer) to describe milk (MY), fat (FY), protein (PY), and mozzarella (MZY) yields, fat-to-protein ratio (FPR), somatic cell score (SCS), lactation length (LL), and lactation persis-tency (LP) in Murrah dairy buffaloes ( Bubalus bubalis ); (2) combine the best functions for each trait under a multiple-trait framework; (3) estimate time-dependent SNP effects for all the studied longitudinal traits; and (4) identify the most likely candidate genes associated with the traits. A total of 323,140 test-day records from the first lactation of 4,588 Murrah buffaloes were made available for the study. The model included the average curve of the population nested within herd-year-season of calving, systematic effects of number of milkings per day, and age at first calving as linear and quadratic co-variates, and additive genetic, permanent environment, and residual as random effects. The Wood model had the best goodness of fit based on the deviance information criterion and posterior model probabilities for all traits. Moderate heritabilities were estimated over time for most traits (0.30 ± 0.02 for MY; 0.26 ± 0.03 for FY; 0.45 ± 0.04 for PY; 0.28 ± 0.05 for MZY; 0.13 ± 0.02 for FPR; and 0.15 ± 0.03 for SCS). The heritability estimates for LP ranged from 0.38 ± 0.02 to 0.65 ± 0.03 depending on the trait definition used. Similarly, heritabilities estimated for LL ranged from 0.10 ± 0.01 to 0.14 ± 0.03. The genetic correlation estimates across days in milk (DIM) for all traits ranged from −0.06 (186–215 DIM for MY-SCS) to 0.78 (66–95 DIM for PY-MZY). The SNP effects calculated for the random regression model coefficients were used to estimate the SNP effects throughout the lactation curve (from 5 to 305 d). Numerous relevant genomic regions and candidate genes were identified for all traits, confirming their polygenic nature. The candidate genes identified contribute to a better understanding of the genetic background of milk-related traits in Murrah buffaloes and reinforce the value of incorporating genomic information in their breeding programs.


INTRODUCTION
Water buffaloes (Bubalus bubalis) have worldwide socioeconomic importance due to their contribution to farm labor and milk and meat production, especially in developing countries (Işik and Güi, 2016;Di Stasio and Brugiapaglia, 2021).In addition, their milk has high levels of fat, protein, and total solids that lead to high yielding dairy products, and consequently, attractive economic returns to producers (Warriach et al., 2015;Liu et al., 2018;Tonhati et al., 2008).To optimize the production efficiency of water buffaloes, there is a need to refine the current breeding goals for including additional indicators of production efficiency and animal resilience traits.Easy-to-access indicators of productivity efficiency and resilience include (but are not limited to) milk (MY), fat (FY), protein (PY), and mozzarella (MZY) yields, fat-to-protein ratio (FPR), SCS, lactation persistency (LP), and lactation length (LL).
High-producing dairy species tend to undergo a negative energy balance (NEB), especially during early lactation (Gross et al., 2011;Jorjong et al., 2015), which can cause fertility and health issues (Puangdee et al., 2016), including mastitis, metritis, lameness, and metabolic disorders (Ingvartsen et al., 2003;Toni et al., 2011).Cows in NEB will experience lipolysis and body fat mobilization, which can result in increased fat synthesis in the udder and decreased DM and energy intake (Ranaraja et al., 2018).This process is related to higher milk fat concentration and lower milk protein concentration in the postpartum period (Hossein-Zadeh, 2016).Thus, FPR is an easily quantifiable trait that could be used to identify buffaloes that are more resilient to early lactation challenges (Hossein-Zadeh, 2016;Ranaraja et al., 2018).
Although there are studies reporting genetic parameters for production and quality of water buffalo milk (e.g., de Costa Barros et al., 2018;Cesarani et al., 2021;Matera et al., 2022;Silva et al., 2023), most of them have not included SCC.Somatic cell count (and SCS) is a heritable trait in dairy cattle populations and used as an indicator of mastitis (Heringstad et al., 2000;VanRaden, 2004;DeVries et al., 2010;Martin et al., 2018;Oliveira et al., 2019a).Moreover, milk with high SCC is not desirable as it has reduced shelf life and high SCC is related to production losses and increased reproductive problems (Moore et al., 1991;Ma et al., 2000;Hertl et al., 2011).Somatic cell count also has a direct economic value for dairy producers where there is a penalty in milk prices with high SCC.Direct selection for mastitis resistance has been implemented in dairy cattle (e.g., Govignon-Gion et al., 2012;Jamrozik et al., 2013;Koeck et al., 2015), but not yet in water buffaloes.
Lactation persistency is another important indicator of production efficiency.Lactation persistency refers to the animal's ability to maintain its milk production level after reaching the production peak (Jamrozik et al., 1998).Greater LP has been associated with reduced feed costs, lower incidence of metabolic diseases, and better fertility (Sölkner and Fuchs, 1987;Harder et al., 2006;Weller et al., 2006).Therefore, the inclusion of LP in water buffaloes breeding goals could contribute to improving the production efficiency of buffalo herds and the health of the animals.However, the problem in the study of LP lies in how to express the shape of the lactation curve as a single value.Many attempts have been made to find the best way to express it (Jamrozik et al., 1997;Grossman et al., 1999;Jakobsen et al., 2002).For instance, Jamrozik et al. (1997) and Jakobsen et al. (2002) calculated LP based on the EBV for various lactation stages  using a single-trait random regression test-day model, and Grossman et al. (1999) based on the number of days in which the level of milk yield was maintained reasonably constant using Wood's definition of LP.
Statistical models used for genetic and genomic evaluations of milk production traits should consider the shape of the lactation curve.Consequently, lactation curves can be studied using alternative mathematical functions, such as Wood (1967), Wilmink (1987), Rook et al. (1993), and Brody et al. (1923).These functions are useful for predicting milk variables throughout the whole lactation based on partial results, while having biological meaning for their statistical parameters.Modeling of lactation curves using these mathematical functions has been widely used over the past decades in several dairy species (e.g., Winkelman et al., 2015;Aliloo et al., 2016;Jenko et al., 2017;Brito et al., 2018;Oliveira et al., 2019a,b), but there is still limited information for buffaloes.
Specifically for genomic analyses, evaluating the complete lactation curve of dairy species has been shown to increase the power to detect QTL compared with using accumulated phenotypes and repeatability models (Macgregor et al., 2005;Suchocki et al., 2013;Ning et al., 2017).In this context, we aimed to (1) compare different functions (Wood [WD;Wood, 1967], Wilmink [WL;Wilmink, 1987], and Ali and Schaeffer [AS; Ali and Schaeffer, 1987]) for genetically evaluating MY, FY, PY, MZY, FPR, SCS, LL, and LP in the first lactation of dairy buffaloes, using the singlestep genomic BLUP (ssGBLUP) approach based on single-trait random regression models (STRRM); (2) combine the best functions for each trait under a new multiple-trait random regression models (MTRRM) framework; (3) define the optimal indicators of LP for buffaloes; (4) estimate time-dependent effects of SNP associated with the longitudinal traits studied, as well as SNP effects for LP and LL; and (5) identify candidate genes controlling the traits analyzed in this study.The performance of both multiple-trait and repeatability models for the genetic evaluation of LL in the first 3 lactations was also assessed in this study.

MATERIALS AND METHODS
No human or animal subjects were used, so this analysis did not require approval by an Institutional Animal Care and Use Committee or Institutional Review Board.

Phenotypic Data
A total of 323,140 test-day (TD) records of MY, FY, PY, and SCC from the first lactation of 4,587 Murrah buffaloes were used in this study.For LL, 8,824 records from the first 3 lactations were analyzed, as detailed in Table 1.All data were collected from 1979 to 2017 in 6 herds located in the Brazilian states of Ceará, Rio Grande do Norte, and São Paulo (Supplemental Figure S1; https: / / figshare .com/s/ 19652365cd824ee6c1e6).The traits analyzed are MY, FY, PY, MZY, FPR, SCS, LL (LL-RM is lactation length estimated based on a repeatability model; LL-MT is lactation length estimated based on a multitrait model), and LP.Somatic cell score was calculated as SCS = log 2 (SCC/100) + 3. Mozzarella yield was calculated as MZY = MY × {[3.5 × %PY) + (1.23 × %FY) − 0.88]/100} (Altiero et al., 1989).The FPR was calculated as the percentage of milk fat divided by the percentage of milk protein (Puangdee et al., 2016).
The phenotypic quality control (QC) consisted of removing phenotypic records with unknown birth or calving dates and lactations with less than 3 TD records or with the first TD after 70 d postpartum.This was done because the lack of TD records before 70 DIM could affect the estimation of the lactation peak, which is crucial for modeling lactation curves (Tonhati et al., 2008;Aspilcueta-Borquis et al., 2013).Only TD records within the range of 5 to 305 DIM were considered.The average age at calving was 39 mo, with minimum values ranging from 25 to 45 mo and the maximum age at calving was 98 mo (for first parity animals).For LL, as first, second, and third lactation were used, the QC also included average age at calving of 40 mo, with minimum values from 21 mo and the maximum age at calving was 98 mo.
Contemporary groups (CG) were defined based on herd (n = 6), calving year (n = 31), and calving season (n = 2; October to March and April to September).Phenotypic records deviating from the mean ± 3.5 standard deviations within each milk recording month, and CG with less than 3 TD records were removed.The pedigree file included 79,456 animals (1,163 sires and 15,659 dams) spanning up to 16 generations.The average pedigree-based inbreeding was 0.05 ± 0.03, which was estimated through the Pedigree Viewer software (Kinghorn and Kinghorn, 2015).The descriptive statistics of the datasets used were previously described by Lázaro et al. (2021).

Genomic Datasets
The genotype QC was carried out using the Plink v.1.9software (Chang et al., 2015) and the preGSf90 software of the BLUPF90 family (Misztal et al., 2018).A total of 978 animals, born from 1989 to 2014, were genotyped using the 90K Axiom Buffalo Genotyping Array (Affymetrix/ThermoFisher Scientific, Santa Clara, CA).Genomic DNA purification was done from either blood or tail hair follicles, according to the protocols for the commercial extraction kit Macherey-Nagel NucleoSpin Tissue (Düren, Germany).After QC, only SNPs and samples with a call rate greater than 0.95 and 0.90, respectively, SNPs with no extreme deviation from the Hardy-Weinberg equilibrium (P > 10 −6 ), and those with a minor allele frequency greater than 0.03 were retained for further analyses.The SNPs and samples with unknown or duplicated genomic coordinates or those located in the sex chromosomes were also removed.After QC, 45,690 autosomal SNPs and 960 animals (212 males and 748 females) remained for further analyses.Out of the 749 genotyped females, 643, 632, 307, 308, 684, 653, and 698 had records for MY, FY, PY, SCS, FPR, MZY, and LL, respectively.Out of the 211 genotyped males, 15 were sire of 3,734 animals, in which 1,846 were males and 1,888 were females (most of them had their own phenotypes).

Statistical Analyses
The initial analyses were performed using STRRM.These models included the average curve of the popu- where y ijkl represents the phenotypic TD record i (i = 1, 2, …, i) of the animal for each trait (i.e., MY, FY, PY, MZY, FPR, or SCS) measured on DIM j (j = 5, 6, …, 305), with the number of milkings per day (MN) k (k = 1 or 2), and CG l measured on DIM j (l = 1, 2, ..., L) is the fixed regression coefficient for CG effect for the trait l; a iq and p iq are the random regression coefficients associated with covariate q for the additive genetic and permanent environmental effects of animal i, respectively; Γ ijq is the qth covariate for DIM j from buffaloes i, and e ijkl is the random residual term.These covariates are defined as where y ijkl , MN k , CG l , and e ijkl are the same as defined in Equation 1; a ir and p ir are the random regression coefficients associated with covariate r for additive genetic and permanent environmental effects of animal i, respectively; and Ψ ijr is the sth covariate for DIM j from buffaloes i.These covariates are defined as where y ijkl , MN k , CG l , and e ijkl are the same as defined in Equation 1; a is and p is are the random regression coefficients associated with covariate s for additive genetic and permanent environmental effects of animal i, respectively; and Λ ijs is the sth covariate for DIM j from buffaloes i.These covariates are defined as Λ ij0 = 1, Λ ij1 = ln t, and Λ ij2 = (−t).
After the definition of the best STRRM to describe each trait based on the deviance information criterion (DIC) and posterior model probabilities (PMP), the functions were combined to compose the MTRRM.A detailed description of DIC and PMP is presented later in the Model Comparison section.Thereafter, the best functions for each trait were combined under a MTRRM.The MTRRM form can be described as where y is the vector of phenotypic TD records of the animal for each trait (i.e., MY, FY, PY, MZY, FPR, or SCS); β, h, a, p, and e are the vectors of systematic (i.e., milking number, and age at first calving as linear and quadratic covariates), CG, additive genetic, permanent environmental, and residual effects, respectively; and X, Z, W, and Q are the incidence matrices relating β, a, p, and h to y.The model assumptions are

E
, and y X

I HY H G I
, where y, X, and β already defined above; HY 0 , G 0 , and P 0 are the variance-covariance matrices among traits due to the herd-year-season of calving, additive genetic, and permanent environment effects, respectively; and R 0 is the variance matrix of the residual effects among traits.I is an identity matrix, H is the hybrid relationship matrix (Aguilar et al., 2010), and R is a diagonal matrix of residual variances sorted by TD and trait, as described above.Uniform a priori distributions were assumed for the systematic effects, and Gaussian distributions were assumed for the CG along with additive genetic and permanent environment effect coefficients and an inverse Wishart distribution for the variance components: where k a and k p are the (co)variance matrices between the random regression coefficients for the additive genetic and permanent environment effects of the animal, respectively; HY is the (co)variance matrix among traits due to the herd-year-season of calving; I is an identity matrix whose order is equal to the number of animals with records; S a and v a and S p and v p are the a priori values and degrees of freedom for additive genetic and permanent environment effects, respectively.The residual variances for both single-and multiple-trait models, the analyses were sampled from inverted chisquare and inverted Wishart distributions: | , , , ; where v and S are the degrees of freedom and the a priori values for residual variance in the class i in the single-trait analyses or the residual (co)variance matrix class i in the multiple-trait analyses.The inverse of the H matrix was obtained as follows (Aguilar et al., 2010): where G −1 is the inverse of the genomic relationship matrix (calculated as in the first method proposed by VanRaden, 2008), A −1 is the inverse of the traditional relationship matrix, and A 22 1 − is the inverse of the section of A related to the genotyped animals.To make G invertible, 0.05 of A 22 1 − was added to 0.95 of G. Different τ and ω parameters (i.e., the scaling factors) were evaluated to better account for the reduced genetic variance and different pedigree depths, respectively, and make G −1 more compatible with A 22 1 − and A −1 .After the inclusion of genotyped animals in the training population (based on the Pearson correlation and regression coefficients), 6 values for τ (i.e., 0.3, 0.5, 0.8, 1.0, 1.5, and 2.0) and 5 for ω (i.e., 0.3, 0.5, 0.6, 0.8 and 1.0) were tested.The values used for τ and ω were defined based on the literature (Oliveira et al., 2019d;Freitas et al., 2020), and the best combination of τ and ω parameters for each trait was chosen according to the genomic estimated breeding value (GEBV) accuracy and bias.
The (co)variance components were estimated by Bayesian inference using the Gibbs sampler algorithm implemented in the GIBBS2F90 program (Misztal et al., 2018).A chain length of 1,000,000 cycles was used with a burn-in of 200,000 cycles and a sampling interval of 100 cycles.The use of these parameters resulted in 6,000 samples for subsequent analyses.The convergence of the MCMC chains was verified by graphical inspection and based on the Heidelberger and Welch (1983) and Geweke (1992) criteria using the "boa" R package (Smith, 2007).The genetic parameter estimates for each DIM were obtained based on the variance components estimated for the regression coefficients in each cycle.
The additive genetic effect of the random regression coefficients (estimated by ssGBLUP) was used to derive the GEBV for each DIM.The vectors of the GEBV for all DIM of each animal were obtained as follows: where δjkl � is the vector of the predicted genomic coefficients for each animal k and trait j, and T is a matrix of the independent covariates for each DIM (ranging from 5 to 305 d) associated with the better function (i.e., AS, WL, or WD).

Lactation Persistency
After the estimation of the statistical parameters for MY using the best function, 3 measures were used to describe LP.As a small percentage of the animals included in this study produced milk until the last day (at 305 d: 1% MY), these measures were modified based on the lactation curve conditions of buffaloes and adapted for DIM 270 (Hossein-Zadeh et al., 2017) 3. The difference between EBV for d 257 and 80 (adapted from Cobuci et al., 2004Cobuci et al., , 2007)):

LL-MT and Repeatability Analyses
Data on LL (8,824) from 3,921 Murrah buffaloes born between 1979 and 2014 in 6 Brazilian herds were used for this study.The average LL was 278.50 ± 78.46 d.Variance components, heritability coefficients, and genetic and phenotypic correlations were calculated for LL i -MT (first, second, and third lactations).Multipletrait and repeatability analyses were performed.The LL i -MT model used can be described as where y is the vector of phenotypic of the animal for each trait (LL1, LL2, and LL3); β, h, a, and e are the vectors of systematic (i.e., milking number, and age at first calving as linear and quadratic covariates), CG, additive genetic, and residual effects, respectively; and X, Z, and Q are the incidence matrices relating β, a, and h to y.The model assumptions are the same as defined above.The LL-RM model adopted, represented in matrix notation, was where y is the vector of phenotypic of the animal for LL; β, h, a, p, and e are the vectors of systematic (i.e., milking number, and age at first calving as linear and quadratic covariates), CG, additive genetic, permanent environmental, and residual effects, respectively; and X, Z, W, and Q are the incidence matrices relating β, a, p, and h to y.The model assumptions are the same as defined above.
The (co)variance components were estimated by Bayesian inference using the Gibbs sampler algorithm implemented in the GIBBS2F90 program (Misztal et al., 2018).A chain length of 1,000,000 cycles was used with a burn-in of 200,000 cycles and a sampling interval of 100 cycles.This resulted in 6,000 samples for subsequent analyses.The convergence of the MCMC chains was verified by graphical inspection and based on the Heidelberger and Welch (1983) and Geweke (1992) criteria using the "boa" R package (Smith, 2007).The genetic parameter estimates for each trait obtained based on the variance components estimated in each cycle.

Model Comparison
The STRRM, LL i -MT, and LL-RP were compared based on DIC (Spiegelhalter et al., 2002).The DIC can be described as follows: where D θ ( ) is a point estimate of the deviance obtained by replacing the parameters by their posterior mean estimates in the likelihood function and 2p D is the effective number of parameters in the model.Models with lower DIC are preferred.
To facilitate the interpretation of the DIC values in terms of the superiority of one model over another, the PMP were calculated from DIC following Silva et al. (2011) and Ventura et al. (2015), in which

GEBV Accuracy and Bias
After QC, 2 reduced datasets (GEBV red and EB-V red ) were created from the full datasets (GEBV full and EBV full ) by excluding phenotypic information from the last 4 yr without excluding the animals from the analyses (Mäntysaari et al., 2010).Thus, the reduced datasets included all phenotypic information available until 2010 and was used to predict GEBV for the reduced dataset based on the ssGBLUP method (GEBV red ), and EBV for the reduced dataset based on the traditional pedigree-based BLUP (EBV red ).A total of 150 genotyped females were used as validation animals for all traits.Individuals born from 1987 to 2010 were included in the training population.
The GEBV and EBV of the validation animals were used to evaluate the accuracy (r) and bias of genomic predictions for each trait.The accuracy of genomic prediction was calculated, considering only the validation animals, as the Pearson correlation coefficient (r) between daily GEBV red with daily GEBV full , and daily EBV red with daily EBV full .The EBVs from full datasets have also been used to validate the performance of genomic predictions using random regression models in other studies (e.g., Koivula et al., 2015;Baba et al., 2017).The genomic prediction bias was calculated by obtaining the regression coefficient (b 1 ) estimated using a linear regression of EBV full on EBV red , and GEBV full on GEBV red (i.e., full = b 0 + b 1 × red + e; Legarra and Reverter, 2018).The bias represents the deviation of the b 1 coefficient from 1 (i.e., b 1 = 1 indicates no bias in the analyses).Only animals from the validation population were used to calculate r and b 1 .To analyze the advantages of including genomic information in the genetic evaluation of milk traits, the estimates using EBV were compared with those using GEBV for the animals in the validation population.

GWAS and Functional Analyses for MY, FY, PY, MZY, FPR, and SCS
After calculating the GEBVs using the full database based on the best STRRM, the postGSf90 software (Aguilar et al., 2014) was used to back solve the GEBVs predicted from the additive genomic random regression coefficients for the SNP effects.The SNP effects calculated for the regression coefficients were used to estimate the SNP effects throughout the lactation curve (from 5 to 305 d), as detailed in Oliveira et al. (2019a,c).The SNP effects can be described as follows (Wang et al., 2012): where ûc is the vector of estimated SNP solutions for the cth random regression coefficient; Z is the matrix containing the centered genotypes; and âg is a vector of GEBV for the sth random regression coefficient, estimated by the ssGBLUP analyses, which contains the sth random regression coefficients for all the genotyped animals.The SNP-specific solutions for all random regression coefficients (c = 1, 2, 3, 4) were combined into a vector ˆˆˆˆ[ and used to estimate the SNP effects for all DIM (from 5 to 305), as proposed by Oliveira et al. (2019a): where SNP k is a vector that contains the effect of the kth SNP on the trait estimated for each DIM, T is a matrix of covariates for each DIM, associated with the third-order Legendre orthogonal polynomials, and û is the vector of SNP solutions for all random regression coefficients related to the kth SNP.To identify significant SNP, P-values were obtained as follows: , g g where P-value ij is the P-value associated with the jth SNP in trait i and Ф(•) is the standard normal cumulative distribution (Gualdrón Duarte et al., 2014).The SNP effects were ranked according to the significance of association with each trait and the cutoff for considering a significant association was established by Benjamini and Hochberg multiple testing correction of the P-value (false discovery rate = 0.05; Benjamini and Hochberg, 1995).
The percentage of the genetic variance accounted by the ith SNP (V i ) was also estimated to facilitate the identification of possible relevant chromosome regions related to lactation curve parameters based on the following equation: , where p i and q i are the allele frequencies for the ith SNP estimated across the entire population and ŝi 2 is the estimated additive effect from model of the ith SNP squared.The SNP windows consisted of a region of 20 consecutive SNPs located within 500 kb.
The reference genome UOA_WB_1 (Low et al., 2019) was used to seek candidate genes potentially associated with the selected SNPs.Genes mapped in 100 kb up-or down-stream of the important SNPs were considered to capture the regulatory and functional regions close to the SNPs but not necessarily contained in them (Peñagaricano et al., 2013).Thus, based on the start and end positions of each window, important genes were identified as defined above.

GWAS and Functional Analyses for LL and LP
The GEBVs were obtained with the BLUPF90 program (Misztal et al., 2018), applying LL-RM model for LL trait, and the better model used for MY analysis for LP trait.The SNP effects were obtained from the GEBV values of the genotyped animals using the postGSf90 software from the BLUPF90 family (Misztal et al., 2018).The SNP effects were calculated following the equation: where u is the vector of SNP effects, Z is a matrix that relates genotypes of each locus, and âg is a vector of GEBV estimated by the ssGBLUP analyses for all the genotyped animals.To identify significant SNP, P-values were obtained as described in the previous section.The reference genome UOA_WB_1 (Low et al., 2019) was used for identifying candidate genes potentially associated with the selected SNPs as described in the previous section.

Model Comparison
The DIC and the PMP calculated for the comparison between the STRRM for MY, FY, PY, MZY, FPR, and SCS for the lactation curve models based on WD, WL, and AS are shown in Supplemental Table S1 (https: / / figshare .com/s/ 8510faf7a564a82e85d6).The analyses of the AS model for PY, MZY, FPR, and SCS did not converge, which might be due to the larger number of parameters in this model compared with the other tested ones.Although it had a nonlinear (exponential) component, the model with 5 parameters did not have the ability to generate the 2 bends (right and left) that are required by the typical lactation curves (Dematawewa and Dekkers, 2014).The model based on WD function fitted better for most traits based on the lower DIC values (Supplemental Table S1); thus, only WD results will be discussed here.Abdel-Salam et al. (2011) and Soysal et al. (2016) also indicated that WD was one of the best functions for describing lactation curves of dairy buffaloes at the genetic level.The single-trait results (Supplemental Table S1) were used to define the combined MTRRM between traits.

Descriptive Statistics and Heritability Estimates for LL and Other Milk-Related Traits
The WD function is composed by parameters with biological interpretation.Therefore, the lactation curve parameters may be interpreted as follows (Varona et al., 1998): a time-dependent variable (t), initial yield (parameter a), ascent to lactation peak (parameter b), and descent from the lactation peak (parameter c).Functions of these parameters describe traits of the lactation curve, including initial yield, peak yield, and time of peak yield (Gipson, 1989).The parameters of the models compared in this study allow making important inferences for breeding purposes, such as initial production, total milk production, LP, and lactation peak.
The descriptive statistics (mean, minimum, maximum, and SE) and heritability estimates of the coefficients for WD parameters (a, b, and c), for MY, FY, FPR, PY, MZY, SCS, and LL are presented in Table 2.The function used influenced the variance component estimates (Tables 2 and Supplemental Table S2; https: / / figshare .com/s/ 54dc0fac1864bf7a1010).Thus, we compared the heritability estimates obtained using the STRRM based on the different functions (Figure 1).
Parameter a of the lactation curve can be used as a selection criterion to control peak yield that increases when selecting for cows with a successful dry period.Also, parameter b can be used as a selection criterion indicating the rate of the ascending phase to the production peak (Table 2).The heritability values for a, b, and c (for WD model) indicate that these coefficients can be used as indicator traits in breeding programs aiming to improve the lactation curve pattern.However, the c coefficient of the WD models had low heritability estimates (Table 2 and Supplemental Table S2).2015) (i.e., −0.43) in Anatolian buffalo.Based on these findings, buffaloes with higher initial MY took longer to reach their maximum daily MY.The genetic correlations between parameters a and c were also negative (−0.76), suggesting that animals with greater genetic value for initial production (coefficient a) tend to have a less pronounced decline in MY throughout lactation.Thus, genetic selection for MY at the beginning of the lactation would result in buffaloes with a lower rate of decline in MY and, consequently, greater LP.Cobuci et al. (2001) and Şahin et al. (2015) reported an inverse trend of 0.45 and 0.59 in dairy cattle and buffaloes, respectively.
Milk production traits presented moderate heritabilities, ranging from 0.10 (for LL-RM and LL-MT) to 0.45 (for PY_WD) (Tables 3 and 4).The heritabilities for MY, FY, PY, and MZY were moderate (Table 3), indicating that these traits can be genetically improved through direct selection.The estimates of heritability for MY based on WD model was similar (Table 3), with posterior means of 0.30.Lower estimates (0.20 to 0.25) have been reported in Brazilian buffaloes (Tonhati et al., 2000;Rodrigues et al., 2010;Seno et al., 2010;Aspilcueta-Borquis et al., 2012).In general, the heritability estimates were higher at the beginning of the lactation (before DIM 5; Figure 1), except for PY and SCS in which the estimates were higher at the end of the lactation curve.The lower number of records and animals in the current study, and complexity of the traits evaluated, could explain the differences in the results obtained.Similarly, when using high-order Legendre polynomials, unstable heritability estimates were especially observed, which might be related to the Runge's phenomenon.In brief, the Runge phenomenon is a problem of oscillation at the edges of an interval that occurs when using polynomial interpolation with polynomials of high degree over a set of interpolation points (Boyd and Ong, 2009;Ye et al., 2018).
The moderate heritability for MZY (0.33, MZY-AS; Supplemental Table S2) is higher than the estimate of 0.23 reported by Aspilcueta-Borquis (2010b), and the value of 0.22 reported by Rosati and Van Vleck (2002).Differences in methodology applied to estimate the (co)variance components and in the production system (management and environment) might explain the differences in the results.Different models presented similar heritabilities throughout the lactation (Figure 1) with the highest values (0.54) in DIM 5 to 35.
The low heritability for FPR, SCS, and LL (Tables 3  and 4) indicates that these traits are highly influenced by environmental factors.For FPR, the heritability was low and similar between the tested models (Supplemental Table S2).Buttchereit et al. (2011), studying dairy cows, reported heritability estimates ranging from 0.20 to 0.54, with the highest values found at the beginning of lactation and the end of the data recording period, which corroborates with the results found in this study, with estimates varying from 0.08 (DIM 95-155) to 0.28 (DIM 5; Figure 1).The estimate of heritability for SCS was also low (0.15) for both tested models (SCS-WD and SCS-WL) and was lower than the reported in other studies using buffalo datasets (0.26; Aspilcueta-Borquis et al., 2010a).Figure 1 shows the range of heritability over 305 DIM for the different traits, estimated by the combined random regression models.The estimate of heritability for MY were, in general, higher than the estimates for FY, PY_WL, FPR, MZY, and SCS, and ranged from 0.20 to 0.41(WD), 0.18 to 0.43 (WL), and 0.16 to 0.36 (AS).

Genetic Correlations Between LL Estimates
Genetic and residual correlations among LL estimated using the multiple-trait model are presented in Table 5. Genetic correlations were high and positive (LL1 and LL2, 0.95; LL1 and LL3, 0.82; and LL2 and LL3, 0.92).The lowest residual correlations were observed between LL in the first 3 parities, suggesting that the environmental effects that are influencing LL are not also affecting LP in the other parities.The genetic correlations for LL i -MT were higher between adjacent lactation lengths.In addition, all genetic correlations between LL i -MT were greater than 0.80 and, according to Robertson (1959), genetic correlations below the threshold of 0.80 could be considered as having strong biological importance.Therefore, the selection for the LL i -MT could be carried out until the third lactation without losses in the genetic gain for this trait.

Genetic Correlation Estimates Among Milk-Related Traits
The genetic correlations estimated between MZY and PY or FY at different DIM were moderate to high (Table 6), indicating that these traits are generally influenced by similar sets of genes, which was expected because MZY is calculated from PY and FY.Direct selection for PY and FY should result in faster genetic progress compared with indirect selection for these traits.In general, PY was slightly more strongly correlated with MZY than FY, with values ranging from 0.47 (DIM 5-35 and 156-185) to 0.78 (DIM 66-95) and 0.28 (DIM 276-305) to 0.69 (DIM 66-95), for PY and FY, respectively.The genetic correlations observed between these 2 traits with MZY varied throughout the lactation (Figure 2).The genetic correlation estimated between MY, FY, PY, and SCS were close to zero in the middle of the lactation.Genetic correlation estimates between MY, FY, PY, and SCS were moderate and positive.These results may be due to the high SCS values, which usually lead to a decrease in lipo-protein synthesis due to damage of the secretory epithelium and the lipolytic activity of leukocyte enzymes (Alhussien and Dang, 2018).This genetic correlation increased during early lactation, with slight drop in the middle of the curve, and increasing to moderate, during late lactation.This result is similar to those reported by Lázaro et al. (2021).Some studies have reported that SCS decreases at about 40 d postpartum in dairy cattle (Shoshani et al., 2014) and at about 14 d in dairy buffaloes (Alhussien and Dang, 2018).This decrease in SCS could be due to an abrupt increase in production precisely at the time when they started or reached lactation peak.It is important to highlight that changes at the lactation peak period will probably lead to greater losses for the farmer.High SCS values in the colostrum may be due to excess epithelial cells derived from functional activities in the mammary gland in addition to phagocytic cells of the immune system (Alhussien and Dang, 2018).These findings might partially explain the relationship between MY, FY, PY, and SCS as also observed by Lázaro et al. (2021).
Most of the traits evaluated in this study followed the same pattern, showing higher estimates of genetic correlations at the beginning of lactation, with a drop in the middle of the curve, and increasing after of the middle and a slight drop at the end of the lactation (Table 6).Similar results have been reported in the literature (Breda et al., 2010;Sesana et al., 2010;Kheirabadi, 2018;Lázaro et al., 2021).The genetic correlations between FY and PY were also positive and moderate to high, and followed the same pattern reported by Lázaro et al. (2021).However, in the current study the estimates were slightly higher.This could be attributed to the difference in the models used, because nonlinear models (linearized) and the smaller number of animals in the considered population when compared with the mentioned studies in dairy buffalo.
To date, genetic correlation estimates between MY and milk-related traits during lactation in dairy buffaloes using nonlinear parameters as coefficients are limited in the literature.However, in dairy goats, the genetic correlations between FP and PP were positive and approximately constant throughout the lactation (Oliveira et al., 2016).In this study, correlations between FP and PP were positive across DIM, ranging from 0.37 to 0.64, indicating that direct selection for one of these traits will result in substantial genetic responses in the other correlated trait.Our results for FY with FRP and PY with FPR had a similar pattern compared with the results reported by Oliveira et al. (2016) for FP and PP.However, we obtained slightly lower estimates ranging from 0.06 (DIM 156-185) to 0.38 (DIM 66-95) and 0.37 (DIM 126-185) to 0.55 (DIM 36-95), for PY and FY, respectively.According to these correlation analyses results, a selection scheme focusing only on FPR could lead to an increase in FY and PY.This is an additional reason to consider fat and protein as breeding goals.

Genetic Parameters for LP
The heritability estimates and the genetic and permanent environment correlations, for the 3 measurements of LP are shown in Table 7; LP1, LP2, and LP3 are the first, second, and third measures of LP, respectively.The heritability estimates for LP indicators ranged from 0.38 to 0.65.The highest heritability estimates were observed for LP1 over the first 3 lactations (0.65, 0.38, and 0.45 for the first, second, and third lactations, respectively).In general, the heritability estimate values indicate that genetic progress can be achieved for all of them.LP1 is recommended as the selection criterion in the first 3 lactations of Murrah buffaloes due to greater heritabilities compared with 2 other LP measures.Genetic parameter estimates, such as heritability, are still scarce in the literature for measures of LP in buffaloes.Hossein-Zadeh et al. (2017) reported heritability estimates for LP1, LP2, and LP3 of 0.09, 0.29, and 0.26, respectively, in Murrah buffaloes.Pareek and Narang (2014) reported a heritability estimate for LP of 0.19 in the first lactation of Indian Murrah buffaloes.Madsen (1975) reported that differences between heritabilities for LP can be caused by 3 reasons.The first one is related to the biological efficiency of the LP indicator (if the differences between productions in the different periods should be considered in absolute or relative terms).The second and third reasons are associated, respectively, with the statistical efficiency of the type of measurement of persistency and the part of lactation used in the calculation of persistency in lactation.Furthermore, genetic parameters are population specific as a consequence of genetic variability and selection history.Large random deviations from the expected lactation curve affect these measures.This probably explains the rather low heritabilities (LP2) for these measures.Therefore, types of measurement that exclude these parts of lactation can lead to higher heritability estimates (LP1) than indicators that con-sider milk production throughout the whole lactation period.The genetic correlations between LP i (Table 7) were generally high, indicating a strong genetic association between the measures and ranged from −0.99 to 0.98.The permanent correlations of environment followed the same trend and magnitude as the genetic correlations.The high and negative genetic correlation between LP1 and LP2, and LP2 and LP3, referred to the method of defining these persistency measures.High positive genetic correlation between LP1 and LP3 showed that these measures of persistency evaluated the difference of milk production of 2 similar parts of the lactation.
The procedure for defining the LP2 and LP3 measures resulted in a high and negative genetic association between them, as reported by Nazari et al. (2021) in Iranian buffaloes.In contrast, the same authors and Hossein-Zadeh et al. ( 2017) found positive genetic correlations between LP1 and LP2, indicating that there are similar genetic and physiological processes influencing these LP measures.However, a negative genetic correlation between LP1 and LP2 was observed in the current study.Therefore, high negative and positive genetic correlations between LP3 with 2 other persistency measures implied the existence of various mechanisms influencing their phenotypic expression (Nazari et al., 2021).

Genomic Prediction of Breeding Values
The validation accuracies and regression coefficients between full and reduced datasets, obtained for all traits using EBV and GEBV, were moderate and are shown in Table 8 and Supplemental Table S3 (https: / / figshare .com/s/ 4caee6f6afc7474b3b24).Using genomic information provided similar or higher GEBV prediction accuracies compared with EBV.Similar to higher validation accuracies were obtained when incorporating genomic information (Table 8).The accuracy of genomic predictions relies mainly on the trait heritability (Meuwissen et al., 2001), population structure, and genetic diversity of the population (Daetwyler et al., 2012).For Murrah buffaloes, the prediction accuracies for the pedigree and genomic based models followed the expected pattern in being higher for more heritable traits (Table 3; Figure 1).
Slightly different trends were reported by Lázaro et al. (2021) based on the same population, in which the authors evaluated milk-related traits using random regression models based on Legendre orthogonal polynomials.These differences in the genomic prediction accuracies for the same population might be related to the statistical models and validation methods used in the analyses as previously reported by other authors (e.g., Himmelbauer et al., 2023;Araujo et al., 2023).In addition, the regression coefficients used as an indicator of bias (i.e., regression coefficients deviating from 1.0) were closer to 1.0 for the genomic approach in all traits and functions evaluated, except for FPR based on WD and WL functions (0.61 and 0.65, respectively).For the pedigree approach, almost all traits and models evaluated were closer to 1.0, except for MY based on the WD, WL, and AS functions (0.50, 0.59, and 0.65, respectively) and for FPR based on the WD and WL functions (0.65 and 0.64, respectively).Therefore, lower bias was observed in the genomic analyses in comparison to the pedigree-based results.
When the WD, WL, and AS functions were contrasted, slightly different validation accuracy patterns were observed between the traits or between the evaluations based on genomic approach or pedigree-based analyses (Supplemental Table S3).In general, the 3 models showed similar Pearson correlations patterns for traits evaluated.According to Li et al. (2020), models presenting better goodness of fit do not necessarily indicate better predictive ability, whereas a poorly fit model, such as underfitting, could be induced by bias.This is supported by our results about the goodness of fit (Supplemental Table S1) and regression coefficients (Table 8 and Supplemental Table S3) obtained for the models.For most traits evaluated, the models with better goodness of fit also presented higher regression coefficients, indicating lower bias, whereas the models with the worst goodness of fit tended to be more biased.

GWAS and Functional Analyses
Many candidate genes were identified for all traits.The location of each important genomic window and A smaller sample size usually decreases the power to detect SNP or genomic windows affecting the traits of interest, especially when assuming the same threshold to define genomic windows as important, as done in this study.Therefore, the smaller number of important genomic regions found for FY, PY, MZY, SCS, and FPR may be due to the increase in the proportion of animals carrying associated variants with low MAF for these traits, which consequently may affect the estimated SNP effects (Oliveira et al., 2019d).Only the candidate genes common to all parameters within each trait are discussed here, and the complete results are presented in Tables 9,10,11,12,13,14,15,and 16.The gene network of biological processes that showed those processes to be better for all candidate genes is presented in Supplemental Figures S2 (https: / / figshare .com/s/ 7914dac708c28f5492fd), S3 (https: / / figshare .com/s/ 37006909ca9b2dee9118), S4 (https: / / figshare .com/s/ 7c98174ef680d3e5bf3b), S5 (https: / / figshare .com/s/ 46b76f10c364c9c72335), S6 (https: / / figshare .com/s/ cbe7b1a0490799fcc70c), S7 (https: / / figshare .com/s/ bf8d4e59a702e9126e98), S8 (https: / / figshare .com/s/ 46cf7994a697c5167980), and S9 (https: / / figshare .com/s/ 8fea6c23006a35fd9bc1).
The COL23A1 (collagen type XXIII α 1 chain) gene has been reported to explain 0.41% of the genetic variance for daily milk yield in Egyptian buffalo (Abdel-Shafy et al., 2020).SCRN1 is a gene that encodes a member of the secernin family of proteins.A similar protein in rat functions in regulation of exocytosis in mast cells (Way et al., 2002), in which the main function of mast cells is to store potent chemical mediators of inflammation.Therefore, we speculate that SCRN1 might be influencing mastitis resistance.The galectin-3 gene (LGALS3) is the only chimeric S-type lectin is highly expressed in immune cells including monocytes, macrophages, neutrophils, and dendritic cells (Chen et al., 2009).Thus, it plays important roles in inflammation regulation (Ohkura et al., 2014).Neither SCRN1 nor LGALS3 were associated with SCS, because we did not identify significant SNP for SCS, but both were associated with MY.

Candidate Genes for FY
Overall, no similar candidate genes were identified for all 3 curve parameters.More genomic regions were found to be in common between the "a" and "c" parameters (BBU1, BBU3, BBU6, BBU8, and BBU9).Among the "b" and "c" parameters there were only common regions in BBU23 and BBU24.Details regarding all candidate genes found in each parameter for FY are shown in Table 10.
The matrix metalloproteinases (MMP2) gene is expressed in the mammary gland, during ductal development, lobulo-alveolar development in pregnancy and involution after lactation (Uria and Werb, 1998).MMP2 is also associated with the progress of gesta- tion, especially during the peri-implantation period and in late gestation in bovine (Kizaki et al., 2008).After implantation, the newly formed cells produce some hormones, such as placental lactogen, a hormone produced in the placenta (Uria and Werb, 1998).The LDL receptor-related protein 2 (LRP2) is a member of the family of lipoprotein receptors that binds and internalizes a variety of ligands including protease-protease inhibitor complexes, vitamin-vitamin binding protein complexes, other hormones, and lipoproteins (May et al., 2007).BIRC3 (baculoviral IAP repeat containing 3) has been reported as an important gene in mastitis development (Brand et al., 2011;Bakhtiarizadeh et al., 2020), in addition to presenting functions of an inhibitor of apoptosis protein (Lee et al., 2017).Plexin D1 (PLXND1) has been demonstrated to play a role in body fat distribution and insulin signaling and is conserved across species from zebrafish to humans (Minchin et al., 2015), suggesting that PLXND1 could play a role in FY.Moreover, it has been previously reported that there is a greater abundance of EIF4E (eukaryotic translation initiation factor 4E) in lactating bovine mammary gland at the end of the lactation compared with dry-off resulting from abrupt cessation of milking and in lactating mammary in primiparous cows (Long et al., 2001;Toerien and Cant, 2007).
Table 11 (Continued).Candidate genes for protein yield within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05) Table 12.Candidate genes for mozzarella yield within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05) MAZ, ZG16, and CLDN9) for parameter c.The details about all candidate genes identified for each parameter are presented in Table 11.
The SLC24A3 gene (solute carrier family 24 member 3) plays an important role in extrude calcium and potassium ions outside of the cell by entry of sodium ions, in the uterine endometrium during the estrus cycle and pregnancy in rodents, humans, and pigs (Yang et al., 2010(Yang et al., , 2011;;Lázaro et al., 2021).The solute carrier families (SLC) play an important role in nutrient absorption and transport involved in milk ingredient synthesis and secretion (Shu et al., 2012).In this context, the SLC genes are involved in the transport of the substrate for milk component synthesis.PNMT (phenylethanolamine N-methyltransferase) is a protein-coding gene that catalyzes the final step in epinephrine biosynthesis-one of the major hormones involved in glucose counterregulation and gluconeogenesis (Sharara-Chami et al., 2012).The glucose is a more effective general source of carbon in the components of milk; therefore, it is an important precursor to some of the constituents of milk (Kleiber et al., 1955;Zhang et al., 2018).The calcium homeostasis modulators family member 2 (CALHM2) gene located on BBU23, is identified as a family of ion (cation and anion) channels and ATP channels in the plasma membrane that are sensitive to membrane voltage and extracellular Ca 2+ level (Ma et al., 2016;Demura et al., 2020;Ren et al., 2020).In addition, the ADRM1 gene (adhesion regulating molecule 1) located on BBU22 has been previously associated with LP (Do et al., 2019).CH25H (cholesterol-25-hydroxylase), located on BBU23, regulates lipid metabolism and has been hypothesized to modulate cholesterol homeostasis, inflammation process, and immune response (Zhao et al., 2020).

Candidate Genes for MZY
Genomic regions associated with MZY were located on BBU1 (TACC1, TRNAG-UCC, and TRNAG-CCC), BBU2 (CD2AP and LRP1B), BBU3 (ABCA1), BBU4 (MED21, STK38L, TM7SF3, and ANKS1B), BBU7 (AMBN, MUC7, and AMTN), BBU8 (ICA1, DGKB, ARMC10, FBXL13, and LRRC17), BBU9 (MAR-COL and SPINK13), BBU11 (GABPB1, HDC, and USP8), BBU12 (CHST10, LONRF2, NMS, PDCL3, and MEIS1), BBU15 (CLVS1), and BBU21 (SLC6A1 and SLC6A11).Additional details about all candidate genes identified for MZY parameters are shown in Table 12.  13.Candidate genes for fat-to-protein ratio within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05) The TACC1 (transforming acidic coiled-coil containing protein 1) gene also known as circRNA, located on BBU1, has been associated with the c parameter.The TACC1 gene was reported to be expressed in mammary epithelial tissues mastitis as downregulated in healthy Holstein cows (Bai et al., 2021), and may be associated with mastitis in buffaloes.The DGKB gene (diacylglyc-erol kinase β) influences signal transduction, cell proliferation, development, glucose-sensing, and circadian regulation (Gan et al., 2020).In addition, some studies in the literature have shown that DGKB is associated with insulin levels (Dupuis et al., 2010;Wagner et al., 2011;Gan et al., 2020).We can understand the importance of these genes because the increase in SCS is related to the loss of protein and fat from milk to whey, the reduction in the protein content of the MZY, the increase in moisture and the lower industrial yield of mozzarella cheese.The ABCA1 gene (ATP binding cassette subfamily A member 1) mediates the transport of cholesterol, phospholipids, and other lipophilic molecules across cellular membranes to remove excess cellular cholesterol by export into the high-density lipoprotein pathway (Ikonen, 2008).This gene has also been associated with cardiovascular disease and hereditary diseases in cattle (Albrecht et al., 2004).In contrast, ABCA1 may be involved in the transport of cholesterol in the mammary gland and contribute to maintaining cholesterol homeo- stasis in mammary gland epithelial cells during involution and early lactation (Mani et al., 2009).Therefore, it might determine milk lipid content and composition.
Among the genes identified, the region on BBU5 harbors the LRRC38 gene, which is actively involved in K transport, modulating big K channels (Cavani et al., 2022) in the expression of FPR throughout c parameter.The ACADVL gene (acyl-CoA dehydrogenase very long chain) is involved in converting very-long-chain fatty acids into energy by encoding the coenzyme verylong-chain acyl-coenzyme A dehydrogenase (Alfares et al., 2020).The well-studied ADAM metallopeptidase domain 29 (ADAM29) gene belongs to the ADAM family, is a type I integral membrane protein and secrets a glycoprotein that mediates cell-cell and cell-matrix interaction (Chen and Wang, 2018).Therefore, due to the secretion of glycoproteins, which are present in ma- ture buffalo milk in higher concentrations than those present in cow milk, this gene is a potential candidate associated with FRP trait.In multicellular organisms, the HOXB1 gene (homeobox B1) is a protein-coding gene involved in morphogenesis (Luquetti et al., 2012).This gene was also associated with protein yield, protein percentage, and LP in North American Holstein cattle (Pedrosa et al., 2021).In addition, the ITGA5 gene (integrin subunit α 5) located on BBU4 has been associated with immune and inflammatory responses (Zhang et al., 2018), indicating a potential relationship between the ITGA5 and mastitis.Thus, this gene can be considered one of the potential candidate gene influencing SCS (Chen et al., 2015), due to its potential relationship with mastitis, even though there were no SNPs associated with SCS, but was associated with FRP.

Candidate Genes for LP1
Overall, most candidate genes identified for all LP were common for LP1.The common genomic regions are located on BBU1, BBU3, BBU4, BBU6 BBU7, BBU8, BBU9, BBU16, BBU18, and BBU20.The same genomic regions were identified for parameters b and c.Additional information about the candidate genes identified for LP1 are shown in Table 14.
The CD14 (monocyte differentiation antigen cd14 precursor) gene play an important role in mediating bacterial infections (Moncada-Laínez et al., 2022).This suggest that CD14 has a polymorphic pattern associated with SCS and differential expression between extreme genotypes indicating this gene is a candidate for mastitis resistance (Gupta et al., 2018).In addition, the TGFB1 (transforming growth factor β 1) gene encodes a multifunctional cytokine with immunoregulatory and anti-inflammatory properties (McLennan and Koishi, 2004).It is also important in pregnancy maintenance, hepatic stellate cell activation, and hepatic fibrosis (McLennan and Koishi, 2004;Wang et al., 2004).The AREG (amphiregulin) gene is known as an epidermal growth factor receptor and is involved in cell growth, proliferation, differentiation, and migration (Chankeaw et al., 2021).It has an important element in mammary gland development in cows, so it participates in proliferation of epithelial cells of the bovine mammary gland (Choudhary, 2011;Solodneva et al., 2022).In this context, the ISL1 (ISL LIM homeobox 1) gene was identified showing effects on mammary gland morphology in dairy cattle (Tribout et al., 2020).Additionally, SPRY2 (sprouty RTK signaling antagonist 2) has been shown to influence embryo development in African taurine cattle (Xu et al., 2015;Mauki et al., 2022).

Candidate Genes for LP3
The significant genomic regions in common for all parameters are located on BBU1, BBU3, BBU4, BBU6 BBU7, BBU8, BBU9, BBU16, and BBU18.Additional information about the candidate genes identified for LP3 are shown in Table 15.Three important candidate genes will be highlighted here, including PTPN22, TSPAN2, and DCLRE1B, which are common to the 3 parameters.
The PTPN22 (protein tyrosine phosphatase) gene has been previously associated with the immune system and antiviral responses and to regulate inflammasome activation (Spalinger et al., 2017;Sun et al., 2020;Talker et al., 2022).In addition, the TSPAN2 (tetraspanin 2) gene is closely related to CD9 and CD81 proteins, which have been implicated in biological functions related to cell fusion and adhesion and in processes such as the immune response, and neuroinflammation (Yaseen et al., 2017;Sims et al., 2018).It also may be involved in the neuroinflammatory process by playing a role in microglial activation (Reynolds and Mahajan, 2020).The BCL2L15 (B-cell lymphoma 2 [Bcl-2]-like protein 15) gene has been reported expression in the stomach, ovaries, bone marrow, spleen, and uterus (Coultas et al., 2003).Both PTPN22 and BCL2L15 have been reported to be associated with the immune system response in river buffaloes (Sun et al., 2020;Talker et al., 2022) due to its play role in endometrial receptivity regulation (Yang et al., 2020;Talker et al., 2022).Therefore, these genes mentioned above, might play a role in promoting the immunity pathway.Additionally, the CYSTM1 gene has been reported to be significantly associated with vascular endothelial growth factor levels, intelligence (neurogenesis and myelination), and menarche in human, suggesting its important role in development and fertility (Fang et al., 2019).It also significantly associated with gestation length in dairy cattle (Purfield et al., 2019), which may contribute to the association of CYSTM1 with LP3.
The FOXO3 (forkhead box O3) gene is commonly associated with longevity in humans (Willcox et al., 2008;Flachsbart et al., 2009).It also has potential association with milk protein and fat traits (Gao et al., 2017).The INHBB (inhibin beta B) gene plays important roles in regulating follicle growth and ovulation in livestock, and also showed a high expression in the ovary and uterus of Mus musculus (Xu et al., 2018).This gene has important role of inhibin in reproduction thus, it was considered as a possible candidate gene for the prolificacy trait.It also has a significant effect on litter size in some sheep breeds (Sharma et al., 2015;Laird et al., 2019).Additionally, the GRHL1 (grainyhead-like 1) gene is a transcription factor involved in embryonic development and its abnormal expression can cause birth defects (Carpinelli et al., 2020).In the same context, the TFCP2L1 (transcription factor CP2-like 1) gene, also identified in this study, play a critical role in placenta development (Taracha et al., 2018).It is also involved in various processes related to reproduction, such as placental development (Taracha et al., 2018).In addition, the TMEM100 gene is expressed in the notochord, mammary glands, endocardium, atrioventricular cushion, a subset of the dorsal root ganglia, and the ventral region of the neural tube (Mizuta et al., 2015;Moon et al., 2015;Weng et al., 2015).

CONCLUSIONS
Of all the nonlinear lactation models evaluated (i.e., WD, WL, and AS), WD was the most appropriate model to summarize lactation curve patterns of Murrah dairy buffaloes.The additive genetic variance and heritability estimates for MY, FY, PY, and SCS were moderate, indicating that there is enough genetic variation to achieve genetic progress for these traits in the studied population.The heritability values for the parameter estimates of a, b, c (for WD and WL models) and d and f (for AS model) indicate that the shape of the lactation curves can be altered through genetic selection.The genetic correlation estimates varied from low to high, suggesting that the traits evaluated (and parameters within traits) are genetically correlated in the same direction but at different levels throughout lactation.The validation analyses indicated that moderately accurate GEBVs can be obtained for all traits evaluated.The use of estimated parameters with different functions under a random regression model framework and treated as phenotypes in a multitrait GWAS was efficient at identifying SNP-derived candidate genes with functions related to biological processes of lactations in dairy buffaloes.
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOESTable 6.Estimates of genetic correlation at different DIM, SD (in parentheses), and highest posterior density interval (95%; in brackets) for milk, fat, protein, and mozzarella yields (kg), fat-to-protein ratio, and SCS estimated based on bi-trait Wood models between different traits (only for the first lactation) from test-day records in Murrah buffaloes

Figure 2 .
Figure 2. Estimates genetic correlations (cor) between days of lactation obtained with the univariate by Wood models among different traits, including milk yield (A), fat-to-protein ratio (B), fat yield (C), protein yield (D), mozzarella yield (E), and SCS (F).
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOESTable
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES

Table 2 .
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES Mean ± SE, minimum (Min), and maximum (Max) values for the parameters a, b, and c from the Wood lactation model, milk (MY), fat (FY), protein (PY), and mozzarella (MZY) yields, fat-to-protein ratio (FPR), and SCS 1

Table 4 .
Estimates of heritability, SD, and highest posterior density interval (HPD 95%) for lactation length

Table 5 .
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES Estimates of genetic (above diagonal) and residual (below diagonal) correlations, SD, and highest posterior density interval (HPD 95%, in brackets) for lactation length (LL i -MT) obtained based on a multitrait model

Table 7 .
Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES   Heritability (diagonal), and genetic (above diagonal) and permanent environment (below diagonal) correlation estimates for different lactation persistency (LP) measures (mean ± SD)

Table 8 .
Accuracy measured by Pearson correlation coefficient estimated between EBV (full and reduced [red] dataset) and the genomic EBV (GEBV; full and reduced dataset)1 MZY, SCS, LL, and LP).These SNPs were distributed across all buffalo autosomal chromosomes (BBU).We presented only relevant markers that have influenced all WD parameters for all traits, because the WD function better fitted the data for most traits; for LL, we presented only the SNPs obtained based on the repeatability model (189-LL-RM) as the genetic correlation between the different LL indicators (LL i -MT) was high.

Table 9 .
Lázaro et al.:GENETICS OF MILK TRAITS IN BUFFALOES Candidate genes for milk yield within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05)1

Table 10 .
Candidate genes for fat yield within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05)

Table 11 .
Lázaro et al.:GENETICS OF MILK TRAITS IN BUFFALOES Candidate genes for protein yield within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05) Lázaro et al.: GENETICS OF MILK TRAITS IN BUFFALOES

Table 14 .
Lázaro et al.:GENETICS OF MILK TRAITS IN BUFFALOES Candidate genes for lactation persistency 1 (LP1) within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05)

Table 15 .
Lázaro et al.:GENETICS OF MILK TRAITS IN BUFFALOES Candidate genes for lactation persistency 3 (LP3) within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05)

Table 16 .
Lázaro et al.:GENETICS OF MILK TRAITS IN BUFFALOES Candidate genes for the length lactation-repeatability model within 500 kb of significant windows in each parameter, their positions in base pairs at buffalo chromosome (BBU; Chr) and the SNP window selected that were significant (P < 0.05) 1LL-RM = length lactation with repeatability model.