Genetic analysis of milk citrate predicted by milk mid-infrared spectra of Holstein cows in early lactation

Milk citrate is regarded as an early biomarker of negative energy balance (NEB) in dairy cows during early lactation and serves as a suitable candidate phenotype for genomic selection due to its wide availability across a large number of cows through milk mid-infrared spectra prediction. However, its genetic background is not well known. Therefore, the objectives of this study were to (1) analyze the genetic parameters of milk citrate; (2) identify genomic regions associated with milk citrate; (3) analyze the functional annotation of candidate genes and quantitative trait loci (QTL) related to milk citrate in Walloon Holstein cows. In total, 134,517 test-day milk citrate phenotypes (mmol/L) collected within the first 50 d in milk ( DIM ) on 52,198 Holstein cows were used. These milk citrate phenotypes, predicted by milk mid-infrared spectra, were divided into 3 traits according to the first (citrate1), second (ci-trate2), and third to fifth parity (citrate3+). Genomic information for 566,170 SNPs was available for 4,479 animals. A multiple-trait repeatability model was used to estimate genetic parameters. A single-step genome-wide association study ( GWAS ) was used to identify candidate genes for citrate and post-GWAS analysis was done to investigate relationship and function of the identified candidate genes. The heritabilities estimated for citrate1, citrate2 and citrate3+ were 0.40, 0.37 and 0.35, respectively. The genetic correlations between the 3 traits ranged from 0.98 to 0.99. The genomic correlations between the 3 traits were also nearly 1.00 across the genomic regions (1 Mb) in the whole genome, which means that citrate can be considered as a single trait in the first 5 parities. In total, 603 significant SNPs located on 3 genomic regions (chromo-some7 68.569 – 68.575 Mb, 14 1.31 – 3.05 Mb, and 20 54.00 – 64.28 Mb), were identified to be associated with milk citrate. We identified 89 candidate genes including GPT , ANKH , PPP1R16A and 32 QTL reported in the literature related to the identified significant SNPs. These identified QTL were mainly reported associated with milk fatty acids and metabolic diseases in dairy cows. This study suggests that milk citrate in Holstein cows is highly heritable and has the potential to be used as an early proxy for the NEB of Holstein cows in a breeding objective.


INTRODUCTION
High-yield dairy cows in early lactation often experience negative energy balance (NEB) due to insufficient energy intake to support high milk production demands (Churakov et al., 2021).The NEB can lead to multiple metabolic diseases (e.g., ketosis) and reproductive problems (e.g., decrease in fertility rates) (Walsh et al., 2011;Zachut et al., 2020), reducing animal welfare and farmers' economic profitability.During NEB, dairy cows undergo a mobilization of body fat reserves, resulting in the liberation of glycerol and fatty acids (FA).The excessive presence of nonesterified FA (NEFA) in dairy cows' blood can precipitate the development of hepatic lipidosis, commonly known as fatty liver disease (Herdt, 2000).In such circumstances, the bovine liver significantly amplifies the synthesis of acetyl-CoA from FAs, subsequently leading to the conversion into ketone bodies, especially β-hydroxybutyric acid (BHB), due to diminished glucose levels and an oversupply of FA.The accumulation of these ketone bodies ultimately causes ketosis in cattle (Herdt, 2000;Ospina et al., 2010).
Direct NEB of dairy cows can be calculated by subtracting energy expended (here included milk, excreta, and maintenance) from energy intake.However, monitoring the NEB of individual cows under current commercial conditions is challenging for dairy farmers.Monitoring NEB requires frequent measurement of dry

Genetic analysis of milk citrate predicted by milk midinfrared spectra of Holstein cows in early lactation
Yansen Chen,1* Hongqing Hu, 1 Hadi Atashi,12 Clément Grelet, 3 Katrien Wijnrocx, 1 Pauline Lemal, 1 and Nicolas Gengler 1  matter intake (DMI), milk components, excreta, and body weight to calculate energy requirements and energy intake from nutrients.This approach is therefore not suitable for large-scale commercial assessment of NEB for dairy cows (Coffey et al., 2001;Friggens et al., 2007).The blood NEFA and BHB were proven as indicators for detecting NEB in dairy cows (Ospina et al., 2010;Zachut et al., 2020;Pires et al., 2022).However, both (NEFA and BHB) respond slower to NEB than milk citrate (Bjerre-Harpøth et al., 2012).When cows enter the NEB (reduced glucose), FA synthesis stops because it is a highly energy-consuming process.Citrate plays a role in this process that inhibits de novo FA synthesis in dairy cows (Garnsworthy et al., 2006).Therefore, citrate can be considered as a potential early biomarker for identifying NEB in dairy cows (Bjerre-Harpøth et al., 2012;Xu et al., 2020).
The first requirement for conducting a successful genetic selection for a given trait is to establish a method to measure the trait on a large number of animals at a low cost.Furthermore, it is necessary to estimate the genetic variation of the trait in the considered population to prove whether the trait is heritable.Despite the complexity of conventional methods for measuring milk citrate, novel mid-infrared (MIR) predictions can provide milk citrate concentration at the individual level on a large scale (Grelet et al., 2016).
Although there are some studies conducted on the genetic background of milk citrate in Montbéliarde cows (Sanchez et al., 2018(Sanchez et al., , 2019(Sanchez et al., , 2021)), the genetic background of citrate in Holstein cows has not yet been studied.However, citric acid, a citrate conjugate, has been shown to be heritable [heritability (h 2 ) ± standard error (SE), 0.54 ± 0.19] in small populations (n = 371) in Holstein cows and has been shown to be associated with metabolic energy (Buitenhuis et al., 2013).
Therefore, the objectives of this study were to (1) analyze the genetic parameters of milk citrate; (2) identify genomic regions associated with milk citrate through a single-step genome-wide association study (ssGWAS); (3) analyze the functional annotation of candidate genes and quantitative trait loci (QTL) related to milk citrate in Walloon Holstein cows.The potential relationships between milk citrate and traits of interest were also investigated.

Data
Phenotypic Data.All milk samples were collected by Elevéo (Awé groupe, Ciney, Belgium) from 2012 to 2019 during the official milk recording in the Walloon Region of Belgium.The milk samples were analyzed by MIR spectrometry (commercial instruments from FOSS) to generate MIR spectra.The milk spectra were harmonized into the common European format as described by Grelet et al. (2015) and then were used to predict milk citrate phenotypes (mmol/L, hereafter called citrate) for each MIR.The prediction equation developed by Grelet et al. (2016) was applied to the milk MIR of the present study.The coefficient of determination and root mean square error of validation for the citrate equation were 0.86 and 0.07 mmol/L, respectively.A total of 506 milk citrate reference data from 3 countries (Germany, France, and Luxembourg) and 3 cattle breeds (Holstein, Abondance, and Montbéliarde) was used to develop (380 samples) and validate (126 samples) the predicted model of milk citrate (Grelet et al., 2016).The citrate phenotypes were divided into 3 traits according to parity: citrate1 for the first parity, citrate2 for the second parity, and citrate3+ for the third to fifth parity.
To remove outliers, the filtering methods proposed by Chen et al. (2021) were used.Briefly, 1) the predicted MIR spectra, for which the standardized Mahalanobis distance between the MIR data and the calibration data set is ≤ 3, were retained; 2) the predicted value of citrate was restricted within the range of ± 3 standard deviations of the mean.Then, the citrate was restricted to the first 50 d in milk (DIM), a period in which most high-yield Holstein cows are in NEB.
Genotypic data Genotypic data related to cows with citrate phenotypes were extracted for 4,479 animals from the routine genetic evaluation system of Holstein cattle in the Walloon Region of Belgium.The used chip versions were BovineSNP50 K v1 to v3 (Illumina, San Diego, CA, USA).The single nucleotide polymorphisms (SNPs) common between all 3 chips were kept.Nonmapped SNPs, SNP located on sex chromosomes, and non-biallelic SNPs were excluded.A minimum GenCall Score of 0.15 and a minimum GenTrain Score of 0.55 were used to keep SNP (Wilmot et al., 2022).Next, genotypes were imputed to HD with a reference population of 4,352 HD individuals (1,046 bulls and 3,288 cows) using the FImpute V2.2 software (Sargolzaei et al., 2014).The SNPs with Mendelian conflicts, and those with minor allele frequency less than 5% were excluded.The difference between observed heterozygosity and that expected under Hardy-Weinberg equilibrium was estimated, and if the difference was greater than 0.15, the SNP was excluded (Wiggans et al., 2009) where y was the vector of citrate1, citrate2 and ci-trate3+.For each trait, h was the vector of fixed herdyear-season of calving classes; b was the vector of fixed regression coefficients for DIM, after standardization, and its quadratic; q was the vector of fixed regression coefficients of the age of calving, after standardization, defined as a constant (parity effect), linear, and quadratic regression, defined within parities (first to fifth parity); c was a vector of the non-genetic cow effect (within-parity permanent environment) random effects; p was a vector of non-genetic cow × parity effect (across-parity permanent environment) random effects, modeled only for citrate3+, as they allowed us to distinguish citrates for the same cow occurring during different parities (third to fifth parity); a was a vector of random additive genetic effects; and e was a vector of random residual effects.Additionally, H, X, Q, W 1 , W 2 , and Z were incidence matrices assigning observations to effects.More detailed information about the model can be found in Chen et al. (2021).To calculate the relationship between animals, the H relationship matrix was used.The H matrix combines pedigree (A) and genomic (G) based relationship matrices.The A is the numerator relationship matrix for all animals included in the pedigree; G is the genomic relationship matrix of genotyped animals obtained using the first formula described by VanRaden (2008): where Z is a matrix of gene content adjusted for allele frequencies (0, 1 or 2 for aa, Aa and AA, respectively); N is the number of SNPs; p i is the minor allele frequency of the ith SNP.
(Co)variance components were estimated by using the BLUPF90+ (version 2.42) program through the AI-REML method (Misztal et al., 2014).The h 2 , repeatability, and genetic and phenotypic correlations were calculated based on the estimated (co)variance components as previously described by Chen et al. (2021).Approximate SE of all calculated parameters were obtained according to the algorithm of Meyer and Houle (2013).

Genomic correlations among 3 classes of citrate across the whole genome
The de-regressed proofs (DRP) of animals were used in this section because not all animals had citrate phenotypes.Genomic breeding values (GEBV) of citrate and its reliability for all animals were calculated based on the estimated (co)variance components through BLUPF90+ (version 2.42).Only those genotyped animals (n = 4,435) that had a reliability of over 0.30 for the 3 included traits were used for the next step (Chen et al., 2021).The DRPs of selected animals were calculated based on the GEBVs through DEPROOFSF90 (version 1.4) from the method developed by Garrick et al. (2009).Each selected animal got 3 DRPs (DRP1, DRP2, DRP3+) for the 3 traits respectively (citrate1, citrate2, and citrate3+).A 3-trait marker-based BayesCΠ-model was used to estimate SNP effects for the predicted DRPs (Cheng et al., 2018a).The model was as follows: where y i is a vector of 3 DRPs for animal i; µ is a vector of overall means for 3 DRPs; p is the number of SNP; m ij is the genotype at SNP j (coded as 0, 1, 2) for animal i, with allele substitution effect D j β j , where D j is a diagonal matrix whose kth diagonal entry is an indicator variable indicating whether the marker effect of locus j for trait DRP is zero or nonzero (2 3 = 8 combinations), and β j follows a multivariate normal distribution; e i is a vector of random residuals of 3 DRPs for animal i, which is a priori assumed to be an independently and identically multivariate normal distribution.
The genomic correlation analyses were performed with the JWAS software (version 1.1.2)(Cheng et al., 2018b) using a Monte Carlo Markov Chain of length 50,000, with the first 5,000 iterations discarded as burnin.Genomic correlations between the studied traits across the whole genome (1 Mb as one moving genomic region) were estimated as the posterior mean of the correlation across animals between the sampled breeding values (1 Mb as one moving genomic region).Its standard deviation (SD) was calculated across the kept samples as the approximated SE (Cheng et al., 2022).The breeding values of each 1 Mb genomic region were calculated as the sum of its SNP genotypes multiplied by the sampled marker effects for each kept iteration.In addition, genomic covariances of each 1 Mb genomic region were calculated as sum of its SNP covariances for each kept iteration, and its SE was estimated by the SD across saved iterations.

Single-step genome-wide association study
The GEBV of genotyped animals were used in this section.The SNP effects for each trait (citrate1, 2, and 3+) were estimated by back-solving the animals' GEBV (Wang et al., 2012).The P-values of SNP were calculated as follows (Aguilar et al., 2019): where P-value i and âi are the P-value and effect of SNP i, respectively, and f is the cumulative standard normal function.The 8.83E-8 (0.05/566,170) was used as the significance threshold which was calculated based on the Bonferroni correction for multiple testing.Linkage disequilibrium (LD) (squared correlation coefficient, r 2 ) was calculated for significant SNPs.The SNP effects and P-values were calculated using POSTGSF90 software (version 1.73) (Misztal et al., 2014).

Functional annotation analysis
The protein-encoding genes within 50 kb of the significant SNPs were considered candidate genes for citrate.Based on the results of Cánovas et al. (2022), the position (coordinate) of significant SNPs on reference genome assembly UMD3.1 (the used chip version) was converted to the new position (coordinate) on the new reference genome assembly ARS-UCD1.2through the Lift Genome Annotations tool (https: / / genome .ucsc.edu/cgi -bin/ hgLiftOver).The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were carried out on the identified candidate gene sets through the g:Profiler website (Raudvere et al., 2019).Next, a protein-protein interaction (PPI) analysis was performed through STRING (Szklarczyk et al., 2021) to reveal the relationship between the identified candidate genes.The PPI relationship was based on text mining, experiments, database, co-expression, neighborhood, gene fusion, and co-occurrence, and the minimum required interaction score was set to 0.40 (Zhou et al., 2019).
To explore potential relationships between citrate and other traits, the same genomic regions identified for citrate were annotated with Cattle QTLdb (https: / / www .animalgenome.org/cgi -bin/ QTLdb/ BT/ index ,accessed on October 25, 2022) (Hu et al., 2019).At present, Cattle QTLdb has 170,536 QTL, which were divided into 6 classes including Exterior, Production, Health, Reproduction, Milk, Meat, and Carcass (https: / / www .animalgenome.org/cgi -bin/ QTLdb/ BT/ ontrait ?class _ID = 1).To avoid the deviation caused by the annotation richness of the different traits, the hypergeometric test approach was adopted for the enrichment analysis (Fonseca et al., 2020).The candidate genes and QTL annotations were performed using the GALLO package in R (Fonseca et al., 2020).In all enrichment analyses (GO, KEGG, QTL), the Benjamini-Hochberg method was used for multiple testing corrections.It should be noted that the Cattle QTL data set currently does not include NEB.
Furthermore, the genome region with the most significant impact on citrate was subjected to the GO, KEGG, and QTL analysis.The data preparation and processing were performed using R (version 4.1.2,https: / / www .r-project .org/).

Descriptive statistics and genetic parameters
Descriptive statistics of the studied traits are presented in Table 1.Citrate ranged from 3.97 to 14.18 mmol/L milk.The mean (SD) of citrate1, citrate2, and citrate3+ were 8.93 (1.48), 8.93 (1.63), and 9.18 (1.76) mmol/L, respectively.The average daily citrate showed a decreasing trend with increasing (DIM in early lactation (Figure 1) and remained relatively stable during the period from 20 to 32 DIM.
The estimated (co)variance components, h 2 , and repeatability of 3 classes of citrate are presented in Table 2.The h2 and repeatability of citrate ranged from 0.35 to 0.40 and from 0.43 to 0.44, respectively.The h 2 of Chen et al.: Genetic analysis of predicted milk citrate citrate decreased with advancing parity; however, its repeatability was stable.

Correlations among the different parities
The genetic and phenotypic correlations of 3 classes of citrate are reported in Table 3.The genetic and phenotypic correlations were around 0.98 and 0.41, respectively.
To demonstrate the consistency of the estimated genetic correlations, genomic covariances and correlations of citrate1, citrate2, and citrate3+ were calculated across genomic regions (1Mb) for the whole genome (Figure 2).The highest genomic covariances and correlations among the 3 classes of citrate were found in the same genomic region (Chr20 58.00 -59.00 Mb, UMD3.1;Chr20 57.94 -58.93 Mb, ARS-UCD1.2).The highest genomic covariance of the 3 classes of citrate was much higher than that of all other genomic regions (around 19 times).However, the highest genomic correlations between 3 classes of citrate were similar to that of all other genomic regions (around 0.99).

Single-step genome-wide association study
The Manhattan plots for the results of ssGWAS were presented in Figure 3A, and they were similar between the 3 classes of citrate.The number of significant SNPs associated with citrate1, citrate2, and citrate3+ were 589, 578, and 600, respectively, and 95% of these SNPs were common among the 3 classes of citrate (Figure 3B).These significant SNPs are distributed on Chr7, 14, and 20, respectively, and their positions are close when they are on the same Chr (Table 4).The -log10 P-values and position (in the UMD3.1 and ARS-UCD1.2) for each significant SNP associated with 3 classes of citrate are shown in supplementary file 1.
The LDs were calculated for all SNPs located in the identified genomic regions (Figure 4).Significant SNPs located in the genomic region identified in Chr7 were highly linked, whereas those located in the regions identified in Chr14 and 20 were not.

Functional annotation analysis
A respective total of 89, 88 and 88 candidate genes were identified for citrate1, citrate2 and citrate3+, while 99% of the identified genes were common among the 3 classes of citrate (Figure 3C).The number of candidate genes in the 3 identified genomic regions is presented in Table 4.The citrate1 lacked only the BASP1 gene on Chr20, as compared with citrate2 and citrate3+.
Figure 5 shows the PPI networks of the identified candidate genes.The candidate genes ARHGAP39-MYO10, DGAT1-ANKH, SHARPIN-OTULIN, and   BOP1-ZNF622 represent interactions between candidate genes in Chr14 and Chr20.The GO analyses results are presented in Table 5.The results identified 2 GO terms based on 89 candidate genes located on Chr7, 14 and 20 (Response to salt and LUBAC complex), and one GO term based on 16 candidate genes located on Chr20 (Inorganic diphosphate transmembrane transporter activity).For KEGG analyses, none of the pathways were found to be associated with the candidate genes for citrate.
The 3 selected genomic regions of citrate1, citrate2, and citrate3+ were significantly associated with the same 32 QTL, respectively (supplementary file 2).The classes and top 10 of 32 enrichment QTL are shown in Figures 6A and 6C.Milk and fat yields have the high-est QTL numbers.In addition, 9 QTL related to milk compositions and cow health traits were obtained when only analyzing the Chr20 53 -64 Mb region (Figures 6B and 6D).

Citrate potential for genetic selection
Citrate was considered as a proxy for NEB because it can be predicted accurately and cheaply in a large scale (Grelet et al., 2016), which is one of the requirements for the possibility of implementing genetic selection.The citrate was predicted by milk MIR and its prediction equation was suggested to be used for quantitative screening at the individual level (Grelet et al., 2021).Milk MIR is very inexpensive to obtain as it has been used worldwide to predict milk protein and fat percentages (Gengler et al., 2016).The range for predicted milk citrate obtained in this study was within the range of the reference data (directly measured values) from Grelet et al. (2016).Average citrate was relatively stable between 20 and 32 DIM (Figure 1), which is possibly due to a relatively stable period between cattle intake and metabolic needs.Our predicted mean values (8.93 -9.18 mmol/L) are higher than the predicted mean value (8.27 mmol/L for first parity) reported for Montbéliarde by Sanchez et al. (2018).The difference could be because we only used data from the early lactation (first 50 DIM), while they used data from the whole lactation (from 7 to 350 DIM).
The heritabilities of citrates in Holstein cows are high (0.35 -0.40), which is largely high enough for genetic selection.However, the heritabilities in this study were lower than those previously reported for Montbéliarde cows (0.48, Sanchez et al., 2018Sanchez et al., , 2021)), which may be due to breed differences.The heritabilities of citrates in this study decreased with increasing parity, which may be caused by the gradual increase in non-genetic effects, especially the residual variance (Table 2).The SD of citrate also increased with increasing parity, which also helps explain the increase in phenotypic variation of citrate (Table 1).
Genetic correlations between citrate in different parities in early lactation were close to 1.00, which suggests that the 3 classes of citrates can be considered as a single trait.Also, the results of ssGWAS and functional annotation confirm that citrate in the different parities is similar.

Genomic background of citrate
A total of 603 significant SNPs associated with citrate were identified, distributed in Chr7 (68.57-68.58 Mb), 14 (1.31 -3.05 Mb), and 20 (54.00 -64.28 Mb).The Chr7 region contains 5 consecutive SNPs in high LD (>0.80), suggesting that this region might be caused by one SNP (Chr7 68,574,155 bp) with the highest significance.The Chr14 region has a relatively small effect on citrate compared with the Chr20 region which showed the most significant impact on citrate.Sanchez et al. (2021) found similar results in Montbéliarde cows.The region on Chromosome 20 (55 -63 Mb in UMD3.1) was also found to be associated with subclinical ketosis (Nayeri et al., 2019).In contrast to our findings, Sanchez et al. ( 2021) identified a genomic locus on Chr11 associated with citrate.This difference found in Chr re-  gions may be explained by the different citrate content in breeds of cattle (Sundekilde et al., 2011).
Among the 89 candidate genes identified for citrate in this study, GPT, ANKH, and PPP1R16A have previously been reported to be related to milk citrate (Sanchez et al., 2019(Sanchez et al., , 2021)).Indeed, GPT encodes the glutamic pyruvic transaminase which converts cytosolic pyruvate and glutamate into α-ketoglutarate and alanine (Abla et al., 2020).The resulting α-ketoglutarate can then be converted into cytosolic citrate (Yoshimi et al., 2016).The ANKH encodes a membrane protein known to export pyrophosphate, but it seems also able to export citrate extracellularly (Szeri et al., 2020).Concerning PPP1R16A, it encodes a protein that interacts with the catalytic subunit of Ser/Thr protein phosphatase 1 (PP1) (Wang et al., 2019).Numerous potential substrates of PP1 were studied including the ATP-citrate lyase that converts citrate into oxaloacetate and acetyl-CoA (Ingebritsen and Cohen, 1983).
The DGAT1 was also identified as a candidate gene and is known to affect the blood NEFA (Oikonomou et al., 2009) which provides Acetyl-CoA for citrate synthesis (Van et al., 2020); OTULIN and SHARPIN are associated with individual immunity and inflammation (Damgaard et al., 2016;Zinngrebe et al., 2016) and citrate is considered as a key immunometabolite (Zotta et al., 2020).The PPI network of the identified 89 candidate genes shows an interaction between Chr14 and Chr20 regions (Figure 5), which helps to understand the relationship between genomic regions that regulate citrate.
As for the GO analysis, 2 GO terms (response to salt and inorganic bisphosphate transmembrane transporter activity) were enriched by the genes associated with citrate, probably because citrate regulates the balance  between Ca 2+ and H + ions (Sundekilde et al., 2011;Cánovas et al., 2013).The third enriched GO term, the LUBAC complex, stimulates the formation of linear ubiquitin chains, a signal that prevents inflammation and modulates immunity (Gerlach et al., 2011).This may be related to metabolic disease in cows that are in NEB for a long time.

Potential relationship between citrate and other traits
The top 10 of 32 significant QTL that were identified via QTL enrichment analysis are mainly related to milk and protein yield, and FA content, which may be the effect of genes located on Chr14 1.31 -3.05 Mb (e.g., DGAT1).This region has been reported to be associated with milk and protein yield, and FA in Holstein cows (Bouwman et al., 2011;Atashi et al., 2020).The top 10 significant QTL were all directly related to the NEB of dairy cows (Xu et al., 2018;Gross et al., 2019;Pires et al., 2022), except Lifetime profit index.However, NEB is related to the production and reproduction performance of dairy cows, both of which are directly related to the lifetime profit index (VanRaden, 2004).Six out of the top 10 significant QTL are related to FA; these results were expected because citrate participates in the NEB of dairy cows by regulating FA.McCabe et al. (2012) reported differential gene expression in the liver of dairy cows with severe NEB primarily associated with FA metabolism pathways.
When focusing on Chr20, 9 significant QTL were associated with milk composition and cow health (Figure 6B and 6D).As expected, milk citrate-related QTL from other reports (Sanchez et al., 2019(Sanchez et al., , 2021) ) et al., 2021).The enrichment of milk magnesium may be due to its significant association with Chr20 (54.00 -64.28 Mb) (Sanchez et al., 2021).Amino acid content in milk changes significantly when cows are in NEB (Xu et al., 2018(Xu et al., , 2020)).This may be the reason for the enrichment of milk α-lactalbumin percentage and colostrum albumin concentration.Numerous diseases including ketosis, abomasum displacement, and subacute respiratory acidosis are associated with the state of NEB (Esposito et al., 2014).These can explain why we found 3 of 9 significant QTL linked to citrate and diseases.

Direction of selection for citrate levels
Through the above discussion section, we can conclude that citrate has a potential correlation with the traits related to the NEB of dairy cows.This indirectly helps to demonstrate that citrate can be a potential proxy of NEB of dairy cows.This suggests an interest in selecting cows with lower citrate levels to indirectly select cows with better energy balance at the beginning of lactation.However, citrate levels are also associated with other issues of dairy cows.It has been reported that a decrease of milk citrate is observed during mastitis (Hyvönen et al., 2010).Choosing animals with lower citrate levels may increase the likelihood of selecting individuals with a higher incidence of mastitis.An alternative is to select animals with lower citrate levels only during early lactation, as this period is often linked to NEB, and the decrease in citrate levels is more clearly associated with mastitis during late lactation (Hyvönen et al., 2010).From a different point of view, different milk citrate levels are observed during heat stress events (higher or lower depending on the study) (Tian et al., 2016;Yue et al., 2020).Selecting lower milk citrate levels could also affect cow thermotolerance.This is supported by HSF1, a candidate gene in our study, which has been implied in the thermotolerance (Li et al., 2010;Lemal et al., 2023).
Future research perspectives could involve extending the genetic analysis to cover the entire lactation period of citrate, as the h 2 of energy balance varied significantly throughout the lactation period (Liinamo et al., 2012).Furthermore, the genetic correlations of citrate with other traits of interest, especially milk production and reference energy balance or its other proxies, could be explored.This study shows the potential relationship between citrate and other traits, which requires further confirmation of real genetic correlations.In addition, citrate is a proxy of NEB, so, we need to investigate the genetic correlation between citrate and NEB or its other proxies (e.g., BHB).It is important to know how citrate relates to other traits of interest before including it in a breeding program.

CONCLUSIONS
This study showed that milk citrate predicted by milk MIR spectra has the potential to be used as a proxy of NEB in dairy cows from the animal breeding perspective.If citrate is proved as a suitable proxy for NEB, this could be included as a selection index trait helping to reduce the disease incidence in Holstein cows.

ACKNOWLEDGMENTS
The China Scholarship Council (Beijing) is acknowledged for funding the PhD project of Yansen Chen and Hongqing Hu.The genotyping of animals (CDR "PRE-DICT-2 grant number J.0174.18) and the computation resources of the University of Liège-Gembloux Agro-Bio Tech (ULiège-GxABT, Gembloux, Belgium) were partly supported by the Fonds de la Recherche Scientifique (FRS-FNRS, Brussels, Belgium) which provided also support through the PDR projects "HTwoTHI" (grant number T.W005.23)and "DEEPSELECT" (grant number T.0095.19).The authors acknowledge the support of the Walloon Government (Service Public de Wallonie -Direction Générale Opérationnelle Agriculture, Ressources Naturelles et Environnement, SPW-DGARNE; Namur, Belgium) and the use of the computation resources of the ULiège-GxABT provided by the technical platform Calcul et Modélisation Informatique (CAMI) of the TERRA Teaching and Research Centre.The authors are grateful to Jian Cheng (North Carolina State University), Jiayi Qu (UC Davis), and Hao Cheng (UC Davis) for their help when we used the JWAS program.The authors declare that they have no competing interests.
Authors' contributions YC was responsible for the study design, data analysis and writing of the manuscript.HH estimated the genetic parameters of citrate and revised the manuscript.HA provided help in genotype data imputation and manuscript revision.CG provided the milk citrate data and revised the manuscript.KW and PL revised the manuscript.NG supervised the project and provided help in the study design and manuscript revision.

Figure 1 .
Figure 1.Average milk citrate phenotypes of Holstein cows with days in milk (DIM) Chen et al.: Genetic analysis of predicted milk citrate

Figure 2 .
Figure 2. Genomic covariances (left) and correlations (right) among 3 classes of milk citrate in the first 5 parities.PS: Citrate1 = milk citrate in the first parity; Citrate2 = milk citrate in the second parity; Citrate3+ = milk citrate from third to fifth parity; the black line represents the standard error value of the effect in the genetic window (1 Mb)

Figure 3 .
Figure 3. Manhattan plot (P-value of each SNP, A), number of significant SNP (B) and candidate genes (C) of milk citrate.PS: Citrate1 = milk citrate in the first parity; Citrate2 = milk citrate in the second parity; Citrate3+ = milk citrate from third to fifth parity; the red line in A is the threshold (7.05 = -log10(0.05/566,170)

Figure 4 .
Figure 4. Linkage disequilibrium between 61.656 Mb and 61.658 Mb, 1.31 Mb and 3.05 Mb, 54.00 Mb and 64.28 Mb on chromosome (chr.)7(A), 14 (B) and 20 (C) in genome assembly UMD3.1 associated with milk citrate in the first parity.the green line in A is the threshold (7.05 = -log10(0.05/566,170)) Figure5.Protein-protein interaction network of candidate genes of milk citrate.The zoom parts link candidate genes in chromosome 14 to candidate genes in chromosome 20.

Figure 6 .
Figure 6.Number of enrichment QTL classes and top 10 or 9 QTL associated with significant SNPs in 3 chromosomes (7, 14, and 20; A and C) or chromosome 20 only (B and D) of milk citrate. kov Chen et al.: Genetic analysis of predicted milk citrate . Finally, 566,170 out of 730,539 SNPs, distributed on 29 chromosomes (Chr), were kept.After filtering, 134,517 citrate phenotypes on 52,198 Holstein cows distributed in 774 farms remained.The number of citrate phenotypes (animals) in each parity is shown in Table 1.The pedigree related to the data

Table 1 .
Chen et al.: Genetic analysis of predicted milk citrate Range, mean, and standard deviation (SD) of milk citrate (mmol/L) of Holstein cows in the first 50 d in milk 1: Citrate1 = milk citrate in the first parity; Citrate2 = milk citrate in the second parity; Citrate3+ = milk citrate from third to fifth parity.

Table 2 .
Heritability, repeatability and (co)variance components 1 of milk citrate (mmol/L) of Holstein cows in the first 50 d in milk c = within-parity permanent environment (non-genetic cow) variance; σ 2 p = across-parity permanent environment (non-genetic cow x parity) variance, only for Citrate3+ traits; σ 2 e = residual variance.

Table 3 .
Genetic correlations (above the diagonal) and phenotypic correlations (below the diagonal) among three classes of milk citrate of Holstein cows in the first 50 d in milk 1: Citrate1 = milk citrate in the first parity; Citrate2 = milk citrate in the second parity; Citrate3+ = milk citrate from third to fifth parity.

Table 4 .
Chen et al.: Genetic analysis of predicted milk citrate Significant SNPs and candidate genes of milk citrate (mmol/L) of Holstein cows in the first 50 d in milk 1:Chr -chromosome ; 2 :a represents the same data as in citrate1; 3 : b represents the same data as in citrate1, except that BASP1 is missing.

Table 5 .
Gene Ontology (GO) terms enrichment analysis of candidate genes of milk citrate of Holstein cows in the first 50 d in milk