Advertisement

Diagnostic milk biomarkers for predicting the metabolic health status of dairy cattle during early lactation

  • S. Heirbaut
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium
    Search for articles by this author
  • X.P. Jing
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium

    State Key Laboratory of Grassland and Agro-Ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China
    Search for articles by this author
  • B. Stefańska
    Affiliations
    Department of Grassland and Natural Landscape Sciences, Poznań University of Life Sciences, Dojazd 11 Street, 60-632 Poznań, Poland
    Search for articles by this author
  • E. Pruszyńska-Oszmałek
    Affiliations
    Department of Animal Physiology, Biochemistry, and Biostructure, Poznań University of Life Sciences, Wołyńska 35 Street, 60-637 Poznań, Poland
    Search for articles by this author
  • L. Buysse
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium
    Search for articles by this author
  • P. Lutakome
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium

    School of Agricultural and Environmental Sciences, Mountains of the Moon University, PO Box 837, Fort Portal, Uganda

    Department of Agricultural Production, College of Agricultural and Environmental Sciences, Makerere University, PO Box 7062, Kampala, Uganda
    Search for articles by this author
  • M.Q. Zhang
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium
    Search for articles by this author
  • M. Thys
    Affiliations
    ILVO, Scheldeweg 68, 9090 Melle, Belgium
    Search for articles by this author
  • L. Vandaele
    Affiliations
    ILVO, Scheldeweg 68, 9090 Melle, Belgium
    Search for articles by this author
  • V. Fievez
    Correspondence
    Corresponding author:
    Affiliations
    Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Faculty of Bioscience Engineering, Ghent University, 9000 Gent, Belgium
    Search for articles by this author
Open AccessPublished:November 07, 2022DOI:https://doi.org/10.3168/jds.2022-22217

      ABSTRACT

      Data on metabolic profiles of blood sampled at d 3, 6, 9, and 21 in lactation from 117 lactations (99 cows) were used for unsupervised k-means clustering. Blood metabolic parameters included β-hydroxybutyrate (BHB), nonesterified fatty acids, glucose, insulin-like growth factor-1 (IGF-1) and insulin. Clustering relied on the average and range of the 5 blood parameters of all 4 sampling days. The clusters were labeled as imbalanced (n = 42) and balanced (n = 72) metabolic status based on the values of the blood parameters. Various random forest models were built to predict the metabolic cluster of cows during early lactation from the milk composition. All the models were evaluated using a leave-group-out cross-validation, meaning data from a single cow were always present in either train or test data to avoid any data leakage. Features were either milk fatty acids (MFA) determined by gas chromatography (MFA [GC]) or features that could be determined during a routine dairy herd improvement (DHI) analysis, such as concentration of fat, protein, lactose, fat/protein ratio, urea, and somatic cell count (determined and reported routinely in DHI registrations), either or not in combination with MFA and BHB determined by mid-infrared (MIR), denoted as MFA [MIR] and BHB [MIR], respectively, which are routinely analyzed but not routinely reported in DHI registrations yet. Models solely based on fat, protein, lactose, fat/protein ratio, urea and somatic cell count (i.e., DHI model) were characterized by the lowest predictive performance [area under the receiver operating characteristic curve (AUCROC) = 0.69]. The combination of the features of the DHI model with BHB [MIR] and MFA [MIR] powerfully increased the predictive performance (AUCROC = 0.81). The model based on the detailed MFA profile determined by GC analysis did not outperform (AUCROC = 0.81) the model using the DHI-features in combination with BHB [MIR] and MFA [MIR]. Predictions solely based on samples at d 3 were characterized by lower performance (AUCROC DHI + BHB [MIR] + MFA [MIR] model at d 3: 0.75; AUCROC MFA [GC] model at d 3: 0.73). High predictive performance was found using samples from d 9 and 21. To conclude, overall, the DHI + BHB [MIR] + MFA [MIR] model allowed to predict metabolic status during early lactation. Accordingly, these parameters show potential for routine prediction of metabolic status.

      Key words

      INTRODUCTION

      During early lactation cows commonly encounter negative energy balance resulting in mobilization of body reserves as nonesterified fatty acids (NEFA;
      • Drackley J.K.
      ADSA Foundation Scholar Award: Biology of dairy cows during the transition period: The final frontier?.
      ). An excessive mobilization of adipose tissue is associated with metabolic diseases such as ketosis (
      • Ospina P.A.
      • Nydam D.V.
      • Stokol T.
      • Overton T.R.
      Evaluation of nonesterified fatty acids and β-hydroxybutyrate in transition dairy cattle in the northeastern United States: Critical thresholds for prediction of clinical diseases.
      ). Measuring the concentration of NEFA and BHB in blood is considered as the gold standard to monitor the metabolic health status (
      • Oetzel G.R.
      Monitoring and testing dairy herds for metabolic disease.
      ;
      • LeBlanc S.
      Monitoring metabolic health of dairy cattle in the transition period.
      ). When body reserves are being mobilized, NEFA concentration in the blood will increase. If this NEFA concentration exceeds the oxidizing ability of the liver, ketone bodies (e.g., BHB) are formed as a result of incomplete oxidation (
      • LeBlanc S.
      Monitoring metabolic health of dairy cattle in the transition period.
      ). In addition to BHB and NEFA, additional biochemical blood indices, such as glucose, insulin, and IGF-1 hormones, are important when evaluating success or failure of homeorhetic adaptation in early lactation (
      • Lucy M.C.
      Functional differences in the growth hormone and insulin-like growth factor axis in cattle and pigs: Implications for post-partum nutrition and reproduction.
      ). When single blood metabolites, such as BHB or NEFA, are being used for health monitoring, specific threshold values can be used. However, concomitant use of multiple biochemical blood indices prevents this threshold approach. Hence, unsupervised learning through cluster analysis recently got wide attention to combine multiple blood biomarkers holistically (
      • Tremblay M.
      • Kammer M.
      • Lange H.
      • Plattner S.
      • Baumgartner C.
      • Stegeman J.A.
      • Duda J.
      • Mansfeld R.
      • Döpfer D.
      Identifying poor metabolic adaptation during early lactation in dairy cows using cluster analysis.
      ;
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      ;
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      ;
      • Foldager L.
      • Gaillard C.
      • Sorensen M.T.
      • Larsen T.
      • Matthews E.
      • O’Flaherty R.
      • Carter F.
      • Crowe M.A.
      • Grelet C.
      • Salavati M.
      • Hostens M.
      • Ingvartsen K.L.
      • Krogh M.A.
      Predicting physiological imbalance in Holstein dairy cows by three different sets of milk biomarkers.
      ). As such, cows are clustered according to their metabolic health status based on blood sampling and determination of biomarkers. Afterward, this grouping based on cluster analysis can be targeted by predictive models. This is in particular of interest because under practical farming conditions routine blood sampling for laboratory analysis is labor intensive, invasive, and expensive, hence limiting the application to individual animal cases or as herd monitoring tool. Therefore, predicting metabolic health status from milk parameters is of major interest. Such studies particularly used blood BHB or NEFA concentrations to diagnose metabolic health (e.g.,
      • van der Drift S.G.A.
      • Jorritsma R.
      • Schonewille J.T.
      • Knijn H.M.
      • Stegeman J.A.
      Routine detection of hyperketonemia in dairy cows using Fourier transform infrared spectroscopy analysis of β-hydroxybutyrate and acetone in milk in combination with test-day information.
      ;
      • Jorjong S.
      • van Knegsel A.T.M.
      • Verwaeren J.
      • Lahoz M.V.
      • Bruckmaier R.M.
      • De Baets B.
      • Kemp B.
      • Fievez V.
      Milk fatty acids as possible biomarkers to early diagnose elevated concentrations of blood plasma nonesterified fatty acids in dairy cows.
      ;
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      ), whereas only few more recent studies included a more extended set of reference blood parameters (e.g.,
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      ;
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      ). However, those studies do not address the effect of early milk sampling, which particularly should be assessed in light of early warning of metabolic disorders. The prediction models developed by
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      included on-farm cow data, such as dry period length, parity, milk production traits, and BW, whereas the models by
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      relied on different sets of milk biomarkers, including milk mid-infrared (MIR) wavelengths, metabolites, and glycans. Although milk fatty acids (MFA) are associated with negative energy balance and metabolic status (
      • Jorjong S.
      • van Knegsel A.T.M.
      • Verwaeren J.
      • Lahoz M.V.
      • Bruckmaier R.M.
      • De Baets B.
      • Kemp B.
      • Fievez V.
      Milk fatty acids as possible biomarkers to early diagnose elevated concentrations of blood plasma nonesterified fatty acids in dairy cows.
      ;
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      ), they were not included as predictors in the studies of
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      and
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      . Despite their physiological correlation with metabolic status and hence their potential to improve predictions, gas-chromatographic analysis of MFA hampers their inclusion in practical models. An accurate quantification of some (categories of) MFA, routinely determined using MIR spectra (
      • Soyeurt H.
      • Dehareng F.
      • Gengler N.
      • McParland S.
      • Wall E.
      • Berry D.P.
      • Coffey M.
      • Dardenne P.
      Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries.
      ), could eliminate this analytical obstacle. However, MIR-based fatty acid (FA) quantification does not cover all FA, which could be determined by GC and is still under development. Hence, it is of interest to compare the model performances of MFA [GC]- and MFA [MIR]-based models.
      Obviously, an accurate diagnosis of metabolic health is paramount in this kind of studies. Estimation of the incidence (rather than prevalence) of metabolically impaired health requires repeated sampling and analysis of the reference blood parameters, particularly during the first 2 wk of lactation, which is considered the main risk period (
      • Benedet A.
      • Manuelian C.L.
      • Zidi A.
      • Penasa M.
      • De Marchi M.
      Invited review: β-hydroxybutyrate concentration in blood and milk and its associations with cow performance.
      ). However, the study by
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      is solely based on one blood sample during this critical period.
      Currently, the frequency of milk sampling for DHI is too low and not adapted to measuring during the critical period of metabolic problems. Targeting early lactating cows and a modified sampling frequency could solve this issue. In line with this the effect of days in milk at sampling on the predictive performance of the milk-based models should be assessed. Hence, the current research particularly focused on this early lactation period and responded to the current scientific gaps by inclusion of (1) repeated blood sampling as well as (2) repeated milk samplings to study potential differences in predictive performance based on sampling time.
      In this regard, the hypothesis of this research was that biomarkers in milk can be used to predict the metabolic status in early lactation, but predictive performance depends on the sampling day. Also, we hypothesized predictions using MIR-based features have similar predictive performance compared with GC features. Accordingly, the objective of this research was to build and compare different classes of models to predict metabolic status based on cluster analysis during early lactation using milk composition. Features used in the predictive models are either MFA determined by GC or routinely determinable features during a DHI analysis.

      MATERIALS AND METHODS

      Animals and Management

      The experiment was performed at the research farm of ILVO (Melle, Belgium) from October 2018 until October 2020. The experiment (2018/329) was approved by the Ethics Committee of ILVO. Only multiparous Holstein-Friesian cows (n = 120 lactations) were monitored in this study. After the experimental phase, the data from 3 cows were excluded because of missing data or death during the experiment, resulting in 117 lactations. These lactations contained data from 99 unique cows, from which 18 cows were monitored during 2 lactations. Cows were studied until d 23 in lactation.
      Dry cows and lactating cows were housed separately in a naturally ventilated freestall barn with slatted floor. Stocking density was always <1 cow/cubicle. From imminent calving (e.g., pelvic ligament relaxation, teat filling) until d 3 after calving, cows were housed in maternity pens with straw bedding within the same building. In case of severe disease (reported cases which required intervention by veterinarian or farm staff), cows stayed for a longer period in these pens. During the trial 24 cows were diagnosed as suffering from one or multiple of the following clinical health issues: displaced abomasum (n = 10), hypocalcemia (n = 9), clinical ketosis (n = 9), uterine infection (n = 6), mastitis (n = 5), and other (n = 6).
      From 3 wk before calving, cows received the partial mixed ration of the lactating cows supplemented with a dry cow mineral premix and on average 1 kg of balanced concentrate per cow per day. Belgian-Dutch energy and protein evaluation systems were used: protein digestible at the level of the small intestine requirements and supply were assessed according to the DVE-system (
      • Van Duinkerken G.
      • Blok M.C.
      • Bannink A.
      • Cone J.W.
      • Dijkstra J.
      • Van Vuuren A.M.
      • Tamminga S.
      Update of the Dutch protein evaluation system for ruminants: The DVE/OEB2010 system.
      ) and net energy requirements and supply according to the VEM-system (
      • Van Es A.J.H.
      Feed evaluation for dairy cows.
      ). The partial mixed ration of the lactating cows was calculated for an average adult cow of 650 kg, producing 26 kg of fat- and protein-corrected milk and was based on maize silage, prewilted grass silage, pressed beet pulp, and soybean meal to balance digestible protein (DVE)- and net energy (VEM)-requirements and supplemented with a VEM-DVE-balanced concentrate. The supply of the balanced concentrate changed according to lactation stage and slightly changed during the course of the experiment (running over 2 yr) in relation to the quality and feed values of the used silages. Concentrate intake started at 1.7 kg of balanced concentrates, 0.2 kg formaldehyde-treated soybean meal (Covasoy Braz., FeedValid), 0.3 kg of soybean meal at d 3 after calving. Formaldehyde-treated soybean meal was increased over a period of 7 d to 1 kg, whereas the balanced concentrate was increased to 6 kg over a period of 20 d. Detailed information about the chemical composition and concentrate increment over time is given in Supplemental Tables S1 and S2 (https://doi.org/10.6084/m9.figshare.19626474; Heirbaut et al., 2022). Individual feed intake was monitored throughout the trial using roughage intake control feeding bins (Insentec, Hokofarm Group), except during the period around calving. During lactation, concentrate intake was monitored at the automatic concentrate providers (Greenfeed, C-Lock Inc.; DeLaval) and in the herringbone milking parlor (DeLaval). The cows had access to water ad libitum.

      Blood and Milk Samples

      Blood sampling took place at d 3 (mean ± SD; 3.1 ± 0.28), 6 (5.9 ± 0.55), 9 (9.1 ± 0.49), and 21 (21.0 ± 0.70) after calving using an 18-gauge needle and Venoject system (BD Diagnostics). Samples were taken from the coccygeal vessels (d 3, 6, and 9) or the jugular vein (d 21) in the morning between 0915 and 0945 h (around 1.5 h after feeding). Blood was collected in plasma NaF (4 mL) and serum blood tubes (10 mL; SSTTM II Advance tubes; BD Diagnostics). Serum tubes were kept at room temperature for 30 min before centrifuging and NaF tubes were kept cooled until centrifuging. The NaF tubes were centrifuged at 1,000 × g for 10 min and serum tubes at 1,500 × g for 15 min at room temperature. Serum and plasma samples were divided into aliquots and stored at −20°C [analyses serum: BHB, NEFA, and insulin; analyses plasma (NaF): glucose] or −80°C (analysis serum: IGF-1). Glucose, BHB, and NEFA were analyzed using a Gallery Discrete Analyzer (ThermoFisher Scientific). The BHB was analyzed using kit RB1007 (Randox Laboratories Ltd.), NEFA using kit FA115 (Randox Laboratories Ltd.), and glucose with kit 981779 (ThermoFisher Scientific) by the laboratory of Dierengezondheidszorg Vlaanderen (Belgium). In brief, the determination of BHB concentration is based on the oxidation of D-3-hydroxybutyrate to acetoacetate by the enzyme 3-hydroxybutyrate dehydrogenase. During this reaction NAD+ is reduced to NADH and the associated change of absorbance, measured at 340 nm, can be directly correlated with the D-3-hydroxybutyrate concentration. The analysis of NEFA is based on the reaction with co-enzym A and ATP, in a reaction catalyzed by acyl CoA synthetase. The formed acyl CoA is further oxidized in the presence of acyl CoA oxidase, leading to the formation of hydrogen peroxide. In the presence of 4-aminoantipyrine and TOOS [N-ethyl-N-(2hydroxy-3-sulphopropyl)m-toluidine], the hydrogen peroxide reacts, resulting in a purple color, of which the absorbance is read at 540 nm. Analysis of glucose is based on a phosphorylation by ATP, in a reaction catalyzed by hexokinase. The glucose-6-phosphate formed is oxidized to 6-phosphogluconate by glucose-6-phosphate dehydrogenase. In this same reaction an equimolar amount of NAD is reduced to NADH, with a resulting increase in absorbance at 340 nm. Serum IGF-I was analyzed by an RIA method using the nonextraction IGF-1 IRMA DSL-2800 (LifeSpan Biosciences) by Poznań University of Life Sciences. The concentration of this hormone was determined with isotope J125 by using the automatic gamma radiation reader (Wizard2 2-Detector Gamma Counter, Perkin Elmer). In this method the peptide being determined is “sandwiched” between 2 antibodies. The tubes were coated by the first antibody. The second one [anti-IGF-1 (J-125) reagent] was radiolabeled for detection. The analytic peptide (IGF-1) present in blood serum, standards and controls were bound by both of the antibodies to form a “sandwich” complex. Unbound compounds were removed by decanting. The Mercodia bovine insulin ELISA kit (Bio-connect Diagnostics) was used for insulin analysis.
      Dairy cows were milked twice daily, starting at 0530 h and 1630 h in a 2 × 7 herringbone milking parlor, and their milk yield (kg/d) was recorded electronically. To determine milk performance, milk samples (27 mL) were collected from the cows in a representative way during the morning milking, at daily basis from d 3 to 23 after calving. Samples were stored at 4°C and contained preservatives (sodium azide, maximum concentration 0.02% m/m and bronopol, maximum concentration 0.005% m/m). Milk fat, protein, lactose, urea, BHB, SCC, SFA, UFA, MUFA, and total C18:1 were determined by Qlip laboratory (Zutphen, the Netherlands), which performed routine DHI analysis by means of Fourier transform infrared spectrometry (Milkoscan FT6000, Foss Electric). Milk fat, protein, lactose, and urea were determined according to ISO 9622:2013 (
      • ISO. (International Organization for Standardization)
      Milk and liquid milk products—Guidelines for the application of mid-infrared spectrometry. ISO 9622:2013.
      ). The SCC was analyzed according to ISO 13366–2:2006 (
      • ISO. (International Organization for Standardization)
      Milk—Enumeration of somatic cells—Part 2: Guidance on the operation of fluoro-opto-electronic counters. ISO 13366-2:2006.
      ). Milk BHB and FA were estimated based on the MIR spectra from in-house established equations. Previously MFA predictions based on similar versions of those in-house models by Qlip have also been used in
      • van Gastelen S.
      • Dijkstra J.
      Prediction of methane emission from lactating dairy cows using milk fatty acids and mid-infrared spectroscopy.
      and
      • Tremblay M.
      • Kammer M.
      • Lange H.
      • Plattner S.
      • Baumgartner C.
      • Stegeman J.A.
      • Duda J.
      • Mansfeld R.
      • Döpfer D.
      Prediction model optimization using full model selection with regression trees demonstrated with FTIR data from bovine milk.
      . In Supplemental Table S3 (https://doi.org/10.6084/m9.figshare.19626474; Heirbaut et al., 2022), a comparison between the FA classed predicted by MIR and analyzed by GC is reported for the current data set.
      In addition to the analysis done by Qlip, a second sample, containing the same preservatives as the sample for DHI and stored at −20°C, was analyzed by GC at d 3, 6, 9, and 21 to determine the detailed MFA composition as described by
      • Jing L.
      • Dewanckele L.
      • Vlaeminck B.
      • Van Straalen W.M.
      • Koopmans A.
      • Fievez V.
      Susceptibility of dairy cows to subacute ruminal acidosis is reflected in milk fatty acid proportions, with C18:1 trans-10 as primary and C15:0 and C18:1 trans-11 as secondary indicators.
      . Quantification of FA methyl esters was based on the conversion of peak areas to the mass proportion of FA by a theoretical response factor for each FA (
      • Ackman R.
      • Sipos J.
      Application of specific response factors in the gas chromatographic analysis of methyl esters of fatty acids with flame ionization detectors.
      ;
      • Wolff R.L.
      • Bayard C.C.
      • Fabien R.J.
      Evaluation of sequential methods for the determination of butterfat fatty acid composition with emphasis ontrans-18:1 acids. Application to the study of seasonal variations in French butters.
      ).

      Data Processing

      Data were processed using R (
      • R Core Team
      R: The R project for statistical computing.
      ), version 4.0.2 and RStudio, version 1.3.959. Data reading, manipulation, exploration, tabular reporting, and visualization were done with the packages readxl (v1.3.1;
      • Wickham H.
      • Bryan J.
      Readxl: Read Excel files. R package version 1.3.1.
      ), tidyverse (v1.3.1;
      • Wickham H.
      • Averick M.
      • Bryan J.
      • Chang W.
      • McGowan L.D.
      • François R.
      • Grolemund G.
      • Hayes A.
      • Henry L.
      • Hester J.
      • Kuhn M.
      • Pedersen T.L.
      • Miller E.
      • Bache S.M.
      • Müller K.
      • Ooms J.
      • Robinson D.
      • Seidel D.P.
      • Spinu V.
      • Takahashi K.
      • Vaughan D.
      • Wilke C.
      • Woo K.
      • Yutani H.
      Welcome to the Tidyverse.
      ), ggplot2 (v3.3.5;
      • Wickham H.
      Ggplot2: Elegant Graphics for Data Analysis.
      ), Ggally (v1.5.0;
      • Schloerke B.
      • Cook D.
      • Larmarange J.
      • Briatte F.
      • Marbach M.
      • Thoen E.
      • Elberg A.
      • Toomet O.
      • Crowley J.
      • Hofmann H.
      • Wickham H.
      GGally: Extension to “Ggplot2.”.
      ), skimr (v2.1.3;
      • Waring E.
      • Quinn M.
      • McNamara A.
      • de la Rubia E.A.
      • Zhu H.
      • Lowndes J.
      • Ellis S.
      • McLeod H.
      • Wickham H.
      • Müller K.
      • Kirkpatrick C.
      • Brenstuhl S.
      • Schratz P.
      • lbusett
      • Korpela M.
      • Thompson J.
      • McGehee H.
      • Roepke M.
      • Kennedy P.
      • Possenriede D.
      • Zimmermann D.
      • Butts K.
      • Torges B.
      • Saporta R.
      Skimr: Compact and flexible summaries of data.
      ), data.table (v1.14.0;
      • Dowle M.
      • Srinivasan A.
      data.table: Extension of “data.frame.” R package version 1.14.0.
      ), and flextable (v0.6.6;
      • Gohel D.
      Flextable: Functions for tabular reporting. R package version 0.6.6.
      ).

      Clustering

      K-means clustering was used to cluster cows based on blood BHB, NEFA, glucose, insulin, and IGF-1. Due to the outlier sensitivity of the k-means clustering algorithm a winsorizing strategy was used for IGF-1, using a fixed upper threshold of 250 ng/mL (96.5 percentile). Insulin concentrations below the detection limit were imputed by the minimum concentration in the data set. The k-means clustering was performed using the algorithm of
      • Hartigan J.A.
      • Wong M.A.
      Algorithm AS 136: A k-means clustering algorithm.
      . Before clustering, all variables [20 variables: 5 blood parameters × 4 sampling points (d 3, 6, 9, and 21)] were transformed by taking the natural logarithm to adjust skewed distributions. Rather than using the daily blood values for clustering, the mean and range across the different sampling days were determined for each blood parameter to reduce the dimensionality (to 10 parameters) but to keep the information about the within-animal variation. Finally, the clustering parameters were standardized (mean zero, unit variance), because the Euclidean distance is sensitive to the scale and units of the variables included. As such, each cow was allocated to a single cluster based on the 10 blood parameters. Cows monitored during 2 lactations were clustered separately for each lactation. The number of clusters was based on the elbow method, a visual inspection of the total within sum of the squares as a function of the number of clusters, using the package factoextra (v1.0.7;
      • Kassambara A.
      • Mundt F.
      Factoextra: Extract and visualize the results of multivariate data analyses.
      ). Clustering was done using the base R k-means function (
      • R Core Team
      R: The R project for statistical computing.
      ) with 100 random start centers and a maximum number of iterations of 100. Afterward, clustering stability was assessed using a bootstrap resampling scheme with 100 iterations and by calculating the Jaccard similarities of the original clusters to the clusters based on the bootstrapped data (package fpc; v2.2–5;
      • Hennig C.
      Fpc: Flexible procedures for clustering.
      ). During clustering all lactations were considered as independent. Two different clusters were found, postclustering denoted as clusters metabolically “balanced” and “imbalanced” cluster.

      Mixed Effect Modeling

      Linear mixed effect models using the package lme4 (v1.1-26;
      • Bates D.
      • Mächler M.
      • Bolker B.
      • Walker S.
      Fitting linear mixed-effects models using Ime4.
      ) were used to evaluate the effect of cluster on the different milk parameters. Cow was defined as a random effect, resulting in a compound symmetry structure of the model residuals. The days of sampling, cluster, and parity were used as fixed effects and forced into the models. The inclusion of the 2-way interaction cluster × sampling day was evaluated based on Akaike information criterion. The P-values were obtained via Satterthwaite's degrees of freedom method using the package lmerTest (v3.1-2;
      • Kuznetsova A.
      • Brockhoff P.B.
      • Christensen R.H.B.
      LmerTest: Tests in linear mixed effects models.
      ). Packages emmeans (v1.4.7;
      • Lenth R.V.
      • Buerkner P.
      • Herve M.
      • Love J.
      • Miguez F.
      • Riebl H.
      • Singmann H.
      Emmeans: Estimated marginal means, aka least-squares means.
      ) and multcomp (v1.4-13;
      • Hothorn T.
      • Bretz F.
      • Westfall P.
      Simultaneous inference in general parametric models.
      ) were used for Tukey post hoc tests and calculating least squares means. Normality of model residuals was checked visually using Q-Q plots constructed using package car (v3.0-10;
      • Fox J.
      • Weisberg S.
      • Price B.
      • Adler D.
      • Bates D.
      • Baud-Bovy G.
      • Bolker B.
      • Ellison S.
      • Firth D.
      • Friendly M.
      • Gorjanc G.
      • Graves S.
      • Heiberger R.
      • Krivitsky P.
      • Laboissiere R.
      • Maechler M.
      • Monette G.
      • Murdoch D.
      • Nilsson H.
      • Ogle D.
      • Ripley B.
      • Venables W.
      • Walker S.
      • Winsemius D.
      • Zeileis A.
      • R-Core
      Car: Companion to applied regression.
      ). In case the assumption of normality was violated, variables were square root transformed (BHB, C12:1). Reported parameters were expressed as least squares means ± standard error and were back-transformed values if necessary.

      Predictive Modeling

      To compare the predictive value of the different milk biomarkers, different classes of models were built using data from d 3, 6, 9, and 21 (n = 464 observations). These models were trained to determine whether a cow is considered imbalanced or balanced based on individual milk samples (i.e., d3, 6, 9, or 21). Reference classification of the balanced and imbalanced group was based on the k-means clustering procedure, which relied on the average and range of the 5 blood parameters of all 4 sampling days as explained before. The following classes of predictive models were constructed: (1) MFA [MIR] (MFA based on MIR: saturated, unsaturated, monounsaturated MFA and total C18:1); (2) DHI (fat, protein, fat/protein ratio, lactose, urea, SCC); (3) DHI + BHB [MIR] (features of the DHI model + BHB [MIR]); (4) DHI + MFA [MIR] (features of the DHI model + MFA [MIR]); (5) DHI + BHB [MIR] + MFA [MIR] (features of the DHI model + BHB [MIR] + MFA [MIR]); (6) MFA [GC] model (all 62 MFA based on GC analysis, as well as the saturated, unsaturated, poly-unsaturated, de novo, total C18:1, and odd- and branched-chain FA and the ratio cis-9 C18:1/C15:1 were included). All models also included parity information and day of sampling. No variable selection step was performed: all features were kept in the models.
      Random forest models were constructed using the packages caret (v6.0–84;
      • Kuhn M.
      • Wing J.
      • Weston S.
      • Williams A.
      • Keefer C.
      • Engelhardt A.
      • Cooper T.
      • Mayer Z.
      • Kenkel B.
      • R Core Team
      • Benesty M.
      • Lescarbeau R.
      • Ziem A.
      • Scrucca L.
      • Tang Y.
      • Candan C.
      • Hunt T.
      Caret: Classification and regression training.
      ) and ranger (v0.13.1;
      • Wright M.N.
      • Ziegler A.
      ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.
      ). To increase computational speed the models were trained using a local parallel backend using base R parallel package (
      • R Core Team
      R: The R project for statistical computing.
      ).
      Two modeling approaches were followed. The first modeling approach aimed at comparing the performance for the different types of models. For this purpose, a cross-validation approach with a default parameter tuning was performed for all models. There was no separate test set, because the number of available datapoints for the MFA [GC] model was limited (d 3, 6, 9, and 21). Models were evaluated using 10-fold cross-validation, 15 repeats. Folds were stratified using total C18:1MIR concentration in the milk to ensure a physiologically representative data distribution for model training. Number of trees was set fixed at 500. Minimal node size was kept constant at a value of 1. Typically, random forests do not account for whether different observations are repeated measures, increasing the risk of data leakage (i.e., information from the training data set leaking to the test set and increasing the predictive performance). To avoid such issues, all data from a single cow (repeated samplings within one lactation, but also multiple lactations from the same cow) were blocked either in the training or in the test data. As such, the model was always evaluated on data from cows which were not present during the training process. Model performance was calculated global across all sampling days as well as for each sampling day separately. For each model receiver operating characteristic (ROC) and precision recall (PR) curves were constructed and the AUC was calculated to evaluate the model performance using the ROCR package (v1.0–11;
      • Sing T.
      • Sander O.
      • Beerenwinkel N.
      • Lengauer T.
      ROCR: Visualizing the classifier performance in R.
      ).
      The second modeling approach was a more extended modeling for the DHI + BHB [MIR] + MFA [MIR] model, combined with the use of a separate test set (from the same experiment) to evaluate the model performance in a more robust way. MIR-data were available on a daily basis in the period d 3 till 23 (n = 2,312). Those daily data were all used as individual data records to train the MIR-based models. Again, data from a single cow were always included either in the training or in the test data to avoid any data leakage. This modeling approach concerned an extended parameter tuning on train data using adaptive resampling of the tuning parameter grid and a total of 100 different tuning combinations. Number of trees was set fixed at 500. Adaptive resampling was performed to tune: (1) number of variables randomly sampled as candidates at each split, (2) splitting rule (gini, extratrees and hellinger), and (3) minimal node size. Finally, the best performing model was selected and evaluated on a separate test. This approach permitted to train the best model and evaluate it on independent test data. The test set contained 20% of the balanced cows and 20% of the imbalanced cows.

      RESULTS

      Summary Descriptive Data

      The cows enrolled in the study had a median parity of 3 (range 2–7). The mean daily milk yield (d 3 to 23) was 36.7 ± 6.91 kg (mean ± SD), with an average fat content of 5.1 ± 1.33 g/100 g of milk and an average protein content of 3.6 ± 0.50 g/100 g of milk. The average daily total DMI was 20.5 ± 4.18 kg.
      In Supplemental Tables S4 and S5 (https://doi.org/10.6084/m9.figshare.19626474; Heirbaut et al., 2022) the descriptive information about the blood values and clinical health disorders are given respectively.

      Cluster Analysis

      K-means clustering resulted in 2 clusters with a high cluster stability (Jaccard indices after resampling with bootstrap scheme: 0.81 and 0.89). Based on the values of the blood parameters, the clusters were labeled as imbalanced (n = 42) and balanced (n = 75). An overview of the blood parameters and clusters is given in Table 1. The imbalanced cluster was characterized by high BHB and NEFA and low glucose, insulin, and IGF-1 concentrations. At d 6, 9, and 21 the average BHB concentration exceeded 1.20 mmol/L in the imbalanced cluster, whereas the average BHB did not exceed this threshold at any of the time points in the balanced cluster. The imbalanced cluster was characterized by 31, 50, 64, and 69% observations with a BHB concentration above 1.20 mmol/L at d 3, 6, 9, and 21 respectively, whereas for the balanced cluster it was 0, 0, 4, and 1% respectively. On all postpartum sampling days, the average NEFA concentration was higher than 0.60 mmol/L in the imbalanced cluster and not in the balanced cluster. The imbalanced cluster was characterized by 69, 71, 67, and 64% observations with a NEFA concentration above 0.60 mmol/L at d 3, 6, 9, and 21 respectively, whereas for the balanced cluster this was 32, 45, 31, and 31% respectively. A chi-squared contingency table test did not show differences in parity distribution (defined as parity 2, 3, and >3) across the clusters (P = 0.13). The imbalanced and balanced cluster contained 7 versus 3 cases of displaced abomasum, 6 versus 3 cases of hypocalcemia, 6 versus 1 case(s) of clinical ketosis (based on clinical observations without considering blood BHB concentrations), 4 versus 1 case(s) of uterine infection, 1 versus 3 case(s) of mastitis and 3 versus 3 cases of other diseases/health problems (data are represented as disease cases, multiple disease cases for the same cow are possible), respectively. Generally, cows in the metabolically imbalanced cluster had a 2.97 higher relative risk (95% CI: 1.43–6.21) to develop one of the mentioned health problems.
      Table 1Summary of the blood parameters of the metabolically imbalanced (IMB) and balanced (BAL) clusters at d 3, 6, 9, and 21 in lactation (LSM ± SE); P-values for the fixed effects of cluster and DIM, and the possible interaction cluster × DIM, are reported as well
      Day of blood sampling and cluster were forced into the models. Two-way interaction was evaluated and omitted if P > 0.10. In case of significant interaction, a Tukey post hoc test was performed.
      DIMClusterBHB
      Variables were ln-transformed.
      (mmol/L)
      NEFA
      Variables were ln-transformed.
      (mmol/L)
      Glucose (mmol/L)Insulin
      Variables were ln-transformed.
      (ng/mL)
      IGF-1
      Variables were ln-transformed.
      (ng/mL)
      3IMB1.02 ± 0.059*0.71 ± 0.0523.08 ± 0.076*0.13 ± 0.01371.4 ± 5.46*
      BAL0.75 ± 0.0330.45 ± 0.0273.31 ± 0.0570.22 ± 0.019106.7 ± 6.11
      6IMB1.25 ± 0.072*0.74 ± 0.0533.02 ± 0.0760.12 ± 0.01262.2 ± 4.77*
      BAL0.81 ± 0.0350.46 ± 0.0283.12 ± 0.0570.21 ± 0.018100.1 ± 5.74
      9IMB1.57 ± 0.091*0.76 ± 0.0552.60 ± 0.076*0.11 ± 0.01169.4 ± 4.94*
      BAL0.78 ± 0.0340.48 ± 0.0283.08 ± 0.0570.20 ± 0.01789.0 ± 5.10
      21IMB1.58 ± 0.091
      Significant differences (P < 0.05) at each DIM, which were tested when cluster × DIM interaction was significant.
      0.74 ± 0.0532.88 ± 0.076
      Significant differences (P < 0.05) at each DIM, which were tested when cluster × DIM interaction was significant.
      0.16 ± 0.01657.1 ± 4.37
      Significant differences (P < 0.05) at each DIM, which were tested when cluster × DIM interaction was significant.
      BAL0.79 ± 0.0340.46 ± 0.0283.23 ± 0.0570.27 ± 0.02397.0 ± 5.56
      Fixed effects
       ClusterP < 0.001P < 0.001P < 0.001P < 0.001P < 0.001
       DIMP < 0.001P = 0.788P < 0.001P < 0.001P < 0.001
       Cluster × DIMP < 0.001P = 0.014P = 0.017
      1 Day of blood sampling and cluster were forced into the models. Two-way interaction was evaluated and omitted if P > 0.10. In case of significant interaction, a Tukey post hoc test was performed.
      2 Variables were ln-transformed.
      * Significant differences (P < 0.05) at each DIM, which were tested when cluster × DIM interaction was significant.

      Predictive Models at Days 3, 6, 9, and 21

      In Figure 1, Figure 2 the ROC and PR curves for the different models predicting cows in the metabolically imbalanced cluster are reported. The DHI model was characterized by the lowest AUCROC (0.69) and AUCPR (0.53). The inclusion of BHB or MFA [MIR] resulted in important predictive improvement (DHI + BHB [MIR] model: AUCROC 0.79 and AUCPR 0.69; DHI + MFA [MIR] model: AUCROC 0.77 and AUCPR 0.65). Combining BHB [MIR] as well as MFA [MIR] with DHI resulted in additional predictive gain (AUCROC 0.81 and AUCPR 0.72). A model based on MFA determined by GC (AUCROC 0.81 and AUCPR 0.70) performed better compared with the MFA [MIR] model (AUCROC 0.75 and AUCPR 0.62), but did not outperform the model combining DHI, BHB [MIR] and MFA [MIR]. In the latter model the 4 most important features were: BHB [MIR] and the FA classes (predicted from MIR spectra): saturated, monounsaturated, and total C18:1 (Figure 3). In Figure 4, the 20 most important features of the GC model are reported. The 5 most important predictors were C11:0, C12:0, C14:0, C15:0, and C10:0. The relative variable importance of the ratio cis-9 C18:1/C15:0 and C4:0 also exceeded 50.
      Figure thumbnail gr1
      Figure 1Receiver operating characteristic (ROC) curves and area under the ROC curve (values presented between brackets in the legend) of different random forest prediction models predicting metabolically imbalanced cows using milk composition. K-means clustering, based on averages and ranges of the 4 blood samplings, was used to determine metabolic imbalance, whereas milk models to predict imbalanced cows relied on features of individual milk samples (i.e., d 3, 6, 9, or 21). Results are based on a 10-fold leave-group-out cross-validation with 15 repeats, and include all the hyperparameters investigated during the training process. Models are trained and evaluated based on data from d 3, 6, 9, and 21 (n = 464 observations). The following prediction models were evaluated: (1) DHI-based features (fat, protein, fat/protein ratio, lactose, urea, SCC); (2) DHI and BHB based on mid-infrared (MIR) spectra (DHI + BHB [MIR]); (3) DHI and milk fatty acids (MFA) based on MIR spectra (DHI + MFA [MIR]); (4) DHI, BHB [MIR] and MFA based on MIR spectra (DHI + BHB + MFA [MIR]); (5) MFA based on MIR spectra (MFA [MIR]); and (6) MFA based on GC (MFA [GC]).
      Figure thumbnail gr2
      Figure 2Precision recall (PR) curves and area under the PR curve (values presented between brackets in the legend) of different random forest prediction models predicting metabolically imbalanced cows using milk composition. K-means clustering, based on averages and ranges of the 4 blood samplings, was used to determine metabolic imbalance, whereas milk models to predict imbalanced cows relied on features of individual milk samples (i.e., either d 3, 6, 9, or 21). Results are based on a 10-fold leave-group-out cross-validation with 15 repeats, and include all the hyperparameters investigated during the training process. Models are trained and evaluated based on data from d 3, 6, 9, and 21 (n = 464 observations). The following prediction models were evaluated: (1) DHI-based features (fat, protein, fat/protein ratio, lactose, urea, SCC); (2) DHI and BHB based on mid-infrared (MIR) spectra (DHI + BHB [MIR]); (3) DHI and milk fatty acids (MFA) based on MIR spectra (DHI + MFA [MIR]); (4) DHI, BHB [MIR] and MFA based on MIR spectra (DHI + BHB + MFA [MIR]); (5) MFA based on MIR spectra (MFA [MIR]); and (6) MFA based on GC (MFA [GC]).
      Figure thumbnail gr3
      Figure 3Scaled variable importance of the features in the DHI + BHB [MIR]+ MFA [MIR] model. MIR = mid-infrared; MFA = milk fatty acids.
      Figure thumbnail gr4
      Figure 4Scaled variable importance of the 20 most important features in the MFA [GC] model. MFA = milk fatty acids.
      In Supplemental Table S6 (https://doi.org/10.6084/m9.figshare.19626474; Heirbaut et al., 2022) the milk composition and MFA are compared between metabolically imbalanced and balanced cows. The milk BHB [MIR] concentration was higher in metabolically imbalanced cows at d 6, 9, and 21 (Tukey post hoc test P < 0.05), but not at d 3 in lactation. Regarding FA determined by MIR, cluster had a significant effect on all the FA classes (P < 0.001), and there were no interaction effects. Concentrations of UFA, MUFA, and total C18:1 were found in higher concentrations in metabolically imbalanced cows. The concentration of SFA was higher in the balanced cluster. In order of variable importance in the GC model, the 5 most important variables all showed a significant cluster effect (P-value cluster <0.001). The interaction term cluster × DIM was kept in all statistical models of these 5 MFA [GC] (except for C14:0), but all sampling points differed when performing Tukey post hoc tests, resulting in higher concentrations of C11:0, C14:0, C12:0, C10:0, and C15:0 in the balanced cluster across all sampling days.
      As opposed to the overall model performance in Figure 1, Figure 2, Table 2 shows the variation in model performance (cross-validated AUC and AUCPR) across the different sampling days. Generally, the predictive models of d 3 were characterized by a lower AUCPR which never exceeded 0.60. The DHI + BHB [MIR] + MFA [MIR] had the best predictions at d 9 and 21. AUCPR values above 0.80 were found for the DHI + BHB [MIR] + MFA [MIR] model at d 9 and 21 and the GC model at d 21.
      Table 2Area under the receiver operating characteristic curve (AUCROC) and area under the precision recall curve (AUCPR) for d 3, 6, 9, and 21 in lactation of different random forest models predicting metabolically imbalanced cows using milk composition
      K-means clustering, based on averages and ranges of the 4 blood samplings, was used to determine metabolic imbalance, whereas milk models to predict imbalanced cows relied on features of individual milk samples (i.e., either d 3, 6, 9, or 21).
      Model features
      The following prediction models were evaluated: (1) DHI-based features (fat, protein, fat/protein ratio, lactose, urea, SCC); (2) DHI and milk BHB based on mid-infrared (MIR) spectra (DHI + BHB); (3) DHI and milk fatty acids (MFA) based on MIR spectra (DHI + MFA [MIR]); (4) DHI, BHB, and MFA based on MIR spectra (DHI + BHB + MFA [MIR]); (5) MFA based on MIR spectra (MFA [MIR]); and (6) MFA based on GC (MFA [GC]). Results are based on a 10-fold leave-group-out cross-validation with 15 repeats, and include different model hyperparameters using a limited grid search of 6 combinations (i.e., number of variables sampled as candidates at each split and splitting rule) investigated during the training process. Models are trained and evaluated based on data from d 3, 6, 9, and 21 (n = 464 observations).
      AUCROCAUCPRDIM
      DHI0.700.533
      DHI + BHB [MIR]0.700.533
      DHI + MFA [MIR]0.750.583
      DHI + BHB [MIR]+ MFA [MIR]0.750.583
      MFA [MIR]0.720.573
      MFA [GC]0.730.563
      DHI0.670.546
      DHI + BHB [MIR]0.790.756
      DHI + MFA [MIR]0.760.616
      DHI + BHB [MIR]+ MFA [MIR]0.810.696
      MFA [MIR]0.750.596
      MFA [GC]0.790.666
      DHI0.660.529
      DHI + BHB [MIR]0.830.759
      DHI + MFA [MIR]0.790.719
      DHI + BHB [MIR]+ MFA [MIR]0.850.819
      MFA [MIR]0.780.719
      MFA [GC]0.840.779
      DHI0.710.5421
      DHI + BHB [MIR]0.810.7421
      DHI + MFA [MIR]0.800.7221
      DHI + BHB [MIR]+ MFA [MIR]0.850.8021
      MFA [MIR]0.740.6421
      MFA [GC]0.880.8321
      1 K-means clustering, based on averages and ranges of the 4 blood samplings, was used to determine metabolic imbalance, whereas milk models to predict imbalanced cows relied on features of individual milk samples (i.e., either d 3, 6, 9, or 21).
      2 The following prediction models were evaluated: (1) DHI-based features (fat, protein, fat/protein ratio, lactose, urea, SCC); (2) DHI and milk BHB based on mid-infrared (MIR) spectra (DHI + BHB); (3) DHI and milk fatty acids (MFA) based on MIR spectra (DHI + MFA [MIR]); (4) DHI, BHB, and MFA based on MIR spectra (DHI + BHB + MFA [MIR]); (5) MFA based on MIR spectra (MFA [MIR]); and (6) MFA based on GC (MFA [GC]). Results are based on a 10-fold leave-group-out cross-validation with 15 repeats, and include different model hyperparameters using a limited grid search of 6 combinations (i.e., number of variables sampled as candidates at each split and splitting rule) investigated during the training process. Models are trained and evaluated based on data from d 3, 6, 9, and 21 (n = 464 observations).

      DHI + BHB [MIR] + MFA [MIR] Model, Day 3 to 23

      Based on the performance of the DHI + BHB [MIR] + MFA [MIR] model using data from d 3, 6, 9, and 21 only, a second modeling with separate test set and adaptive resampling was performed using the full data set in the range d 3 to 23. Across the 10 cross-validation runs, 15 repeats and all different hyperparameter runs, the ROC curve had an AUCROC of 0.79 and the PR curve an AUCPR of 0.69 on training data. The predictive performance for each day separately is given in Supplemental Table S7 (https://doi.org/10.6084/m9.figshare.19626474;
      • Heirbaut S.
      • Xiaoping J.
      • Stefanska B.
      • Pruszyńska-Oszmałek E.
      • Buysse L.
      • Lutakome P.
      • Zhang M.
      • Thys M.
      • Vandaele L.
      • Fievez V.
      Supplemental tables and figures. Figshare.
      ). Refitting the best model on training data resulted in an AUCROC of 0.91 and AUCPR of 0.87. When evaluating the model on test data, the performance slightly decreased to an AUCROC of 0.90 and an AUCPR of 0.82.

      DISCUSSION

      Classifying dairy cows during early lactation according to their metabolic status is often challenging and debatable. Traditionally, the metabolic status is assessed by using threshold values for BHB or NEFA (
      • LeBlanc S.
      Monitoring metabolic health of dairy cattle in the transition period.
      ). However, as indicated by
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      and
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      , other parameters without previously determined thresholds (e.g., the hormone IGF-1) may also be important to assess the energy balance and metabolic status in early lactation. These blood parameters then can be combined in an unsupervised learning approach to cluster cows according to their metabolic health status (
      • Tremblay M.
      • Kammer M.
      • Lange H.
      • Plattner S.
      • Baumgartner C.
      • Stegeman J.A.
      • Duda J.
      • Mansfeld R.
      • Döpfer D.
      Identifying poor metabolic adaptation during early lactation in dairy cows using cluster analysis.
      ;
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      ;
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      ). In the current research, the metabolic clusters were based on 5 parameters (BHB, NEFA, glucose, insulin, and IGF-1), which also have been used by
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      (except for insulin) and
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      .
      Before performing a clustering, it is crucial to assess and interpret the variation in the data set, because the clustering algorithm is solely a mathematical approach which groups observations based on a specific distance metric. Supplemental Table S4 shows there is substantial variation in the different blood parameters. Depending on the sampling day, up to 26 and 55% of the cows exceed the 1.2 or 0.6 mmol/L threshold for BHB and NEFA, respectively. Based on this substantial variation, the data set is well suited for applying a clustering, which afterward was also compared by the stable results when performing a bootstrap resampling.
      In our study 36% of the cows were in the imbalanced cluster, which is higher than in other studies (
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      ;
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      ). Interherd differences might be at the origin of this difference (
      • Benedet A.
      • Manuelian C.L.
      • Zidi A.
      • Penasa M.
      • De Marchi M.
      Invited review: β-hydroxybutyrate concentration in blood and milk and its associations with cow performance.
      ), although the repeated blood sampling in the current study also may have contributed. Indeed, this approach aimed to estimate the incidence of metabolically imbalanced cows rather than the prevalence based on a single or limited number of blood samples. In our study 37% of the cows had an elevated blood BHB concentration (BHB >1.2 mmol/L) at least once. Only 6.0% of the cows had an elevated blood BHB across all 4 samplings, whereas only 16.2% of the cows were characterized by one single elevated blood BHB event. This variation addresses the importance of repeated samplings (
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      ;
      • Benedet A.
      • Manuelian C.L.
      • Zidi A.
      • Penasa M.
      • De Marchi M.
      Invited review: β-hydroxybutyrate concentration in blood and milk and its associations with cow performance.
      ).
      Despite these methodological differences across studies (
      • Carrier J.
      • Stewart S.
      • Godden S.
      • Fetrow J.
      • Rapnicki P.
      Evaluation and use of three cowside tests for detection of subclinical ketosis in early postpartum cows.
      ;
      • Oetzel G.R.
      Monitoring and testing dairy herds for metabolic disease.
      ;
      • Brunner N.
      • Groeger S.
      • Canelas Raposo J.
      • Bruckmaier R.M.
      • Gross J.J.
      Prevalence of subclinical ketosis and production diseases in dairy cows in Central and South America, Africa, Asia, Australia, New Zealand, and Eastern Europe.
      ), all report an unequal distribution of metabolically balanced versus imbalanced cows within a herd. Also in our study, this unequal ratio (i.e., 36% of the cows were in the imbalanced cluster) has been observed. In order for the model to be trained on data with sufficient variation, stratification of total C18:1 was performed during cross-validation. This ensures a more accurate sampling compared with a random sampling. Standard predictive models often fail when predicting minority classes (i.e., imbalanced cows). However, this potentially low predictive performance of the minority class is not reflected in the widely used accuracy metric (i.e., proportion of correct predictions), because the accuracy of a model that predicts only the majority class equals the proportion of the majority class (
      • He H.
      • Garcia E.A.
      Learning from imbalanced data.
      ). This challenges the construction of high-performance models predicting the minority class and tempts to focus on the prediction of the majority class (e.g., cows in good or average metabolic status;
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      ). Obviously, this does not contribute to a better identification of metabolically imbalanced cows, whereas the (early) identification of metabolically imbalanced cows is of particular importance under practical farming conditions.
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      built prediction models, addressing metabolically balanced as well as metabolically imbalanced cows. Unfortunately, only the accuracy metric has been used, which challenges direct (numeric) comparisons with our results. Moreover, the value of this metric is highly questionable with skewed class distributions as indicated before (
      • He H.
      • Garcia E.A.
      Learning from imbalanced data.
      ). Surprisingly,
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      found highest prediction accuracy for prediction of imbalanced cows. The best prediction model had an accuracy of 81% for imbalanced cows; however, the number of imbalanced cows was 19 out of 107, which means a naive model predicting all cows as balanced would already achieve an accuracy of 82%. This questions the precision of the prediction. The precision itself or data to calculate precision were not reported. However,
      • Grelet C.
      • Vanlierde A.
      • Hostens M.
      • Foldager L.
      • Salavati M.
      • Ingvartsen K.L.
      • Crowe M.
      • Sorensen M.T.
      • Froidmont E.
      • Ferris C.P.
      • Marchitelli C.
      • Becker F.
      • Larsen T.
      • Carter F.
      • Dehareng F.
      E. Gplus. Consortium
      Potential of milk mid-IR spectra to predict metabolic status of cows through blood components and an innovative clustering approach.
      [study using data of the same experiments as reported by
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      ], but also extended by data from heifers and additional herds) reported the confusion matrix of a partial least squares discriminant analysis, allowing a more detailed comparison: global accuracy was 87% and sensitivity 76%, but the precision was only 56%. In our study a sensitivity of 76% corresponded to a precision of 61%. If a higher sensitivity would be pursued, a precision of 56% would result in a sensitivity of 81%. The values in our study are somewhat higher, but still in a similar range, despite the differences in sampling days and the inclusion of heifers in the study of
      • Grelet C.
      • Vanlierde A.
      • Hostens M.
      • Foldager L.
      • Salavati M.
      • Ingvartsen K.L.
      • Crowe M.
      • Sorensen M.T.
      • Froidmont E.
      • Ferris C.P.
      • Marchitelli C.
      • Becker F.
      • Larsen T.
      • Carter F.
      • Dehareng F.
      E. Gplus. Consortium
      Potential of milk mid-IR spectra to predict metabolic status of cows through blood components and an innovative clustering approach.
      . Low precision has been observed in the study of
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      , using a cis-9 C18:1 to C15:0 ratio of ≥45.0 [as proposed by
      • Jorjong S.
      • van Knegsel A.T.M.
      • Verwaeren J.
      • Bruckmaier R.M.
      • De Baets B.
      • Kemp B.
      • Fievez V.
      Milk fatty acids as possible biomarkers to diagnose hyperketonemia in early lactation.
      ] for detection of hyperketonemia, which resulted in a sensitivity, specificity, and accuracy of respectively 84.0, 63.2, and 69.5%. However, the precision was only 50.0%. Accordingly, only half of the cows predicted with elevated blood NEFA concentrations effectively experienced this metabolic status, whereas the other half were incorrectly alerted. As discussed by
      • He H.
      • Garcia E.A.
      Learning from imbalanced data.
      the use of alternative metrics such as precision could be beneficial in situations with an unequal class distribution. Indeed, from a practical point of view, not only the proportion of imbalanced cows identified by a model (i.e., sensitivity) is of importance, but also the precision [i.e., true positives/(true positives + false positives)]. Obviously, reliable and practically applicable predictions require a limited number of false positives. It has been shown that a model which generates a (too) high number of positives can impair the correct follow-up by dairy farmers (
      • Eckelkamp E.A.
      • Bewley J.M.
      On-farm use of disease alerts generated by precision dairy technology.
      ). From this point of view external model validation is very important before models could be implemented in practice. The models in our study were solely evaluated on data from a single farm and hence require further validation.
      The MFA [GC] model highlighted the predictive power of some minor FA, such as C11:0 and C15:0. Given their low prediction accuracy by MIR, these MFA were not included in the DHI + BHB [MIR] + MFA [MIR] model. Indeed, accurate FA predictions based on MIR are generally lacking for FA at low concentrations (
      • Rutten M.J.M.
      • Bovenhuis H.
      • Hettinga K.A.
      • van Valenberg H.J.F.
      • van Arendonk J.A.M.
      Predicting bovine milk fat composition using infrared spectroscopy based on milk samples collected in winter and summer.
      ;
      • Soyeurt H.
      • Dehareng F.
      • Gengler N.
      • McParland S.
      • Wall E.
      • Berry D.P.
      • Coffey M.
      • Dardenne P.
      Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries.
      ).
      • Rutten M.J.M.
      • Bovenhuis H.
      • Hettinga K.A.
      • van Valenberg H.J.F.
      • van Arendonk J.A.M.
      Predicting bovine milk fat composition using infrared spectroscopy based on milk samples collected in winter and summer.
      suggested a minimum FA concentration of 2.45 g/100 g of FA for obtaining a R2 of minimum 0.6 or higher. Nevertheless, despite their physiological relevance (
      • Vlaeminck B.
      • Fievez V.
      • Tamminga S.
      • Dewhurst R.J.
      • van Vuuren A.
      • De Brabander D.
      • Demeyer D.
      Milk Odd- and branched-chain fatty acids in relation to the rumen fermentation pattern.
      ;
      • Dórea J.R.R.
      • French E.A.
      • Armentano L.E.
      Use of milk fatty acids to estimate plasma nonesterified fatty acid concentrations as an indicator of animal energy balance.
      ) and high variable importance in the MFA [GC] model, the inclusion of these MFA did not seem necessary to accurately predict metabolic status by the DHI + BHB [MIR] + MFA [MIR] model. As opposed to the study of
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      the DHI + BHB [MIR] + MFA [MIR] model did not use the raw MIR spectra, but the FA classes predicted based on MIR. Although this could be considered an unnecessary intermediate step, it should be realized that the use of MIR-data in predictive models require dimension reduction. As such the use of MIR-based FA predictions could be considered as a preprocessing step to reduce the number of parameters, which may be of particular interest because the number of observations is often limited in this type of animal studies with transition animals and relying on blood-based reference metabolites.
      Because DIM and often also parity affect the measured milk parameters, the construction of a model seems more appropriate than using single threshold values to capture these additional factors of variation. Regarding early warning predictions, further research focusing on the construction of specific models targeting this period and possible incorporation of additional data sources could be of interest (e.g., sensor data, data of dry period), because the predictions of the model performed at d 3 in lactation were generally characterized by lower predictive performance compared with the other days.
      • De Koster J.
      • Salavati M.
      • Grelet C.
      • Crowe M.A.
      • Matthews E.
      • O’Flaherty R.
      • Opsomer G.
      • Foldager L.
      • Hostens M.
      Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
      and
      • Xu W.
      • van Knegsel A.T.M.
      • Vervoort J.J.M.
      • Bruckmaier R.M.
      • van Hoeij R.J.
      • Kemp B.
      • Saccenti E.
      Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
      did not focus on this early period. Based on our results early prediction of metabolic status using milk data should not be (solely) based on samples from d 3 in lactation. The lower BHB concentrations in the beginning of the lactation could be at the origin of this lower performance, only 11.1% of the samples was characterized by an elevated BHB concentration at d 3. In line with these results,
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      showed FA in colostrum could not be used to predict elevated BHB and NEFA concentrations during the first 2 wk of lactation. Therefore, they recommended to target the period between 3 and 14 DIM for further studies. Important features in our models generally agree with the relevant milk biomarkers reported in literature. In line with
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      , several de novo synthesized FA (in order of importance: C14:0, C12:0, and C10:0) showed high variable importance. Indeed, imbalanced cows were characterized by lower de novo FA concentrations. As discussed in
      • Craninx M.
      • Steen A.
      • Van Laar H.
      • Van Nespen T.
      • Martín-Tereso J.
      • de Baets B.
      • Fievez V.
      Effect of lactation stage on the odd- and branched-chain milk fatty acids of dairy cattle under grazing and indoor conditions.
      increased concentrations of long-chain MFA as a result of body mobilization concomitantly result in lower concentration of de novo MFA. In addition, C15:0 and the ratio cis-9 C18:1/C15:0 were identified as important predictors (cf.
      • Jorjong S.
      • van Knegsel A.T.M.
      • Verwaeren J.
      • Bruckmaier R.M.
      • De Baets B.
      • Kemp B.
      • Fievez V.
      Milk fatty acids as possible biomarkers to diagnose hyperketonemia in early lactation.
      ;
      • Mann S.
      • Nydam D.V.
      • Lock A.L.
      • Overton T.R.
      • McArt J.A.A.
      Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
      ). Interestingly, in our study, the odd short-chain FA C11:0 had the highest importance in the MFA [GC] model. In line with our results,
      • Dórea J.R.R.
      • French E.A.
      • Armentano L.E.
      Use of milk fatty acids to estimate plasma nonesterified fatty acid concentrations as an indicator of animal energy balance.
      found high sensitivity for odd short-chain FA (e.g., C11:0, to detect elevated NEFA concentrations). C11:0, such as other odd-chain FA, requires propionyl-CoA as precursor (
      • Fievez V.
      • Colman E.
      • Castro-Montoya J.M.
      • Stefanov I.
      • Vlaeminck B.
      Milk odd- and branched-chain fatty acids as biomarkers of rumen function—An update.
      ). A higher availability of propionate in the rumen might therefore have resulted in the higher odd-chain FA observed in the balanced cluster.

      CONCLUSIONS

      Both the DHI + BHB [MIR] + MFA [MIR] model and the MFA [GC] model accurately predict metabolic status during early lactation. As such, GC analysis is not required, which extends the opportunity to routinely screen the metabolic status. Nevertheless, the predictive performance varies considerably across sampling days, with models solely based on data from d 3 in milk resulting in low reliability. Considering this trade-off between predictive performance and early sampling, predictions could be performed on data from d 6 onwards. Therefore, the use of early milk samplings should be further investigated together with the possible incorporation of additional data sources to improve early warning predictions. Finally, before application in practice, additional data from different farms should be added and evaluated to make the model robust to differences in management and rations.

      ACKNOWLEDGMENTS

      We gratefully acknowledge the financial support of Flanders Innovation & Entrepreneurship (VLAIO; Brussels, Belgium; LA170830). The research of Xiaoping Jing and Mingqi Zhang was supported by the Chinese Scholarship Council (CSC, China); the PhD research of Stijn Heirbaut was funded by a PhD grant from the Special Research Fund of the Ghent University (Bijzonder Onderzoeksfonds, BOF, Belgium). The authors gratefully acknowledge the staff from the Animal Science Unit, Flanders Research Institute for Agriculture, Fisheries and Food for their help during this experiment. We also thank Charlotte Melis (Laboratory for Animal Nutrition and Animal Product Quality, Department of Animal Sciences and Aquatic Ecology, Ghent University, Gent, Belgium) and Ilke van Hese [Animal Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Scheldeweg, Melle, 9090, Belgium; Department of Reproduction, Obstetrics and Herd Health Faculty of Veterinary Medicine, Ghent University, Salisburylaan, Merelbeke, 9820, Belgium] for assistance during sampling. The authors have not stated any conflicts of interest.

      REFERENCES

        • Ackman R.
        • Sipos J.
        Application of specific response factors in the gas chromatographic analysis of methyl esters of fatty acids with flame ionization detectors.
        J. Am. Oil Chem. Soc. 1964; 41: 377-378
        • Bates D.
        • Mächler M.
        • Bolker B.
        • Walker S.
        Fitting linear mixed-effects models using Ime4.
        J. Stat. Softw. 2015; 67: 1-48
        • Benedet A.
        • Manuelian C.L.
        • Zidi A.
        • Penasa M.
        • De Marchi M.
        Invited review: β-hydroxybutyrate concentration in blood and milk and its associations with cow performance.
        Animal. 2019; 13: 1676-1689
        • Brunner N.
        • Groeger S.
        • Canelas Raposo J.
        • Bruckmaier R.M.
        • Gross J.J.
        Prevalence of subclinical ketosis and production diseases in dairy cows in Central and South America, Africa, Asia, Australia, New Zealand, and Eastern Europe.
        Transl. Anim. Sci. 2019; 3 (32704781): 84-92
        • Carrier J.
        • Stewart S.
        • Godden S.
        • Fetrow J.
        • Rapnicki P.
        Evaluation and use of three cowside tests for detection of subclinical ketosis in early postpartum cows.
        J. Dairy Sci. 2004; 87 (15483156): 3725-3735
        • Craninx M.
        • Steen A.
        • Van Laar H.
        • Van Nespen T.
        • Martín-Tereso J.
        • de Baets B.
        • Fievez V.
        Effect of lactation stage on the odd- and branched-chain milk fatty acids of dairy cattle under grazing and indoor conditions.
        J. Dairy Sci. 2008; 91 (18565925): 2662-2677
        • De Koster J.
        • Salavati M.
        • Grelet C.
        • Crowe M.A.
        • Matthews E.
        • O’Flaherty R.
        • Opsomer G.
        • Foldager L.
        • Hostens M.
        Prediction of metabolic clusters in early-lactation dairy cows using models based on milk biomarkers.
        J. Dairy Sci. 2019; 102 (30692010): 2631-2644
        • Dórea J.R.R.
        • French E.A.
        • Armentano L.E.
        Use of milk fatty acids to estimate plasma nonesterified fatty acid concentrations as an indicator of animal energy balance.
        J. Dairy Sci. 2017; 100 (28551181): 6164-6176
        • Dowle M.
        • Srinivasan A.
        data.table: Extension of “data.frame.” R package version 1.14.0.
        https://CRAN.R-project.org/package=data.table
        Date: 2021
        Date accessed: October 26, 2022
        • Drackley J.K.
        ADSA Foundation Scholar Award: Biology of dairy cows during the transition period: The final frontier?.
        J. Dairy Sci. 1999; 82 (10575597): 2259-2273
        • Van Duinkerken G.
        • Blok M.C.
        • Bannink A.
        • Cone J.W.
        • Dijkstra J.
        • Van Vuuren A.M.
        • Tamminga S.
        Update of the Dutch protein evaluation system for ruminants: The DVE/OEB2010 system.
        J. Agric. Sci. 2011; 149: 351-367
        • Eckelkamp E.A.
        • Bewley J.M.
        On-farm use of disease alerts generated by precision dairy technology.
        J. Dairy Sci. 2020; 103 (31759584): 1566-1582
        • Fievez V.
        • Colman E.
        • Castro-Montoya J.M.
        • Stefanov I.
        • Vlaeminck B.
        Milk odd- and branched-chain fatty acids as biomarkers of rumen function—An update.
        Anim. Feed Sci. Technol. 2012; 172: 51-65
        • Foldager L.
        • Gaillard C.
        • Sorensen M.T.
        • Larsen T.
        • Matthews E.
        • O’Flaherty R.
        • Carter F.
        • Crowe M.A.
        • Grelet C.
        • Salavati M.
        • Hostens M.
        • Ingvartsen K.L.
        • Krogh M.A.
        Predicting physiological imbalance in Holstein dairy cows by three different sets of milk biomarkers.
        Prev. Vet. Med. 2020; 179 (32361640)105006
        • Fox J.
        • Weisberg S.
        • Price B.
        • Adler D.
        • Bates D.
        • Baud-Bovy G.
        • Bolker B.
        • Ellison S.
        • Firth D.
        • Friendly M.
        • Gorjanc G.
        • Graves S.
        • Heiberger R.
        • Krivitsky P.
        • Laboissiere R.
        • Maechler M.
        • Monette G.
        • Murdoch D.
        • Nilsson H.
        • Ogle D.
        • Ripley B.
        • Venables W.
        • Walker S.
        • Winsemius D.
        • Zeileis A.
        • R-Core
        Car: Companion to applied regression.
        https://cran.r-project.org/web/packages/car/index.html
        Date: 2019
        Date accessed: October 26, 2022
        • Gohel D.
        Flextable: Functions for tabular reporting. R package version 0.6.6.
        https://CRAN.R-project.org/package=flextable
        Date: 2021
        Date accessed: October 26, 2022
        • Grelet C.
        • Vanlierde A.
        • Hostens M.
        • Foldager L.
        • Salavati M.
        • Ingvartsen K.L.
        • Crowe M.
        • Sorensen M.T.
        • Froidmont E.
        • Ferris C.P.
        • Marchitelli C.
        • Becker F.
        • Larsen T.
        • Carter F.
        • Dehareng F.
        • E. Gplus. Consortium
        Potential of milk mid-IR spectra to predict metabolic status of cows through blood components and an innovative clustering approach.
        Animal. 2019; 13: 649-658
        • Hartigan J.A.
        • Wong M.A.
        Algorithm AS 136: A k-means clustering algorithm.
        Appl. Stat. 1979; 28: 100-108
        • He H.
        • Garcia E.A.
        Learning from imbalanced data.
        IEEE Trans. Knowl. Data Eng. 2009; 21: 1263-1284
        • Heirbaut S.
        • Xiaoping J.
        • Stefanska B.
        • Pruszyńska-Oszmałek E.
        • Buysse L.
        • Lutakome P.
        • Zhang M.
        • Thys M.
        • Vandaele L.
        • Fievez V.
        Supplemental tables and figures. Figshare.
        • Hennig C.
        Fpc: Flexible procedures for clustering.
        https://CRAN.R-project.org/package=fpc
        Date: 2020
        Date accessed: October 26, 2022
        • Hothorn T.
        • Bretz F.
        • Westfall P.
        Simultaneous inference in general parametric models.
        Biom. J. 2008; 50: 346-363
        • ISO. (International Organization for Standardization)
        Milk and liquid milk products—Guidelines for the application of mid-infrared spectrometry. ISO 9622:2013.
        ISO, 2013
        • ISO. (International Organization for Standardization)
        Milk—Enumeration of somatic cells—Part 2: Guidance on the operation of fluoro-opto-electronic counters. ISO 13366-2:2006.
        ISO, 2006
        • Jing L.
        • Dewanckele L.
        • Vlaeminck B.
        • Van Straalen W.M.
        • Koopmans A.
        • Fievez V.
        Susceptibility of dairy cows to subacute ruminal acidosis is reflected in milk fatty acid proportions, with C18:1 trans-10 as primary and C15:0 and C18:1 trans-11 as secondary indicators.
        J. Dairy Sci. 2018; 101 (30172392): 9827-9840
        • Jorjong S.
        • van Knegsel A.T.M.
        • Verwaeren J.
        • Bruckmaier R.M.
        • De Baets B.
        • Kemp B.
        • Fievez V.
        Milk fatty acids as possible biomarkers to diagnose hyperketonemia in early lactation.
        J. Dairy Sci. 2015; 98 (26094221): 5211-5221
        • Jorjong S.
        • van Knegsel A.T.M.
        • Verwaeren J.
        • Lahoz M.V.
        • Bruckmaier R.M.
        • De Baets B.
        • Kemp B.
        • Fievez V.
        Milk fatty acids as possible biomarkers to early diagnose elevated concentrations of blood plasma nonesterified fatty acids in dairy cows.
        J. Dairy Sci. 2014; 97 (25200787): 7054-7064
        • Kassambara A.
        • Mundt F.
        Factoextra: Extract and visualize the results of multivariate data analyses.
        https://CRAN.R-project.org/package=factoextra
        Date: 2020
        Date accessed: October 26, 2022
        • Kuhn M.
        • Wing J.
        • Weston S.
        • Williams A.
        • Keefer C.
        • Engelhardt A.
        • Cooper T.
        • Mayer Z.
        • Kenkel B.
        • R Core Team
        • Benesty M.
        • Lescarbeau R.
        • Ziem A.
        • Scrucca L.
        • Tang Y.
        • Candan C.
        • Hunt T.
        Caret: Classification and regression training.
        https://CRAN.R-project.org/package=caret
        Date: 2021
        Date accessed: October 26, 2022
        • Kuznetsova A.
        • Brockhoff P.B.
        • Christensen R.H.B.
        LmerTest: Tests in linear mixed effects models.
        J. Stat. Softw. 2017; 82: 1-26
        • LeBlanc S.
        Monitoring metabolic health of dairy cattle in the transition period.
        J. Reprod. Dev. 2010; 56 (20629214): S29-S35
        • Lenth R.V.
        • Buerkner P.
        • Herve M.
        • Love J.
        • Miguez F.
        • Riebl H.
        • Singmann H.
        Emmeans: Estimated marginal means, aka least-squares means.
        https://cran.r-project.org/package=emmeans
        Date: 2020
        Date accessed: October 26, 2022
        • Lucy M.C.
        Functional differences in the growth hormone and insulin-like growth factor axis in cattle and pigs: Implications for post-partum nutrition and reproduction.
        Reprod. Domest. Anim. 2008; 43 (18638098): 31-39
        • Mann S.
        • Nydam D.V.
        • Lock A.L.
        • Overton T.R.
        • McArt J.A.A.
        Short communication: Association of milk fatty acids with early lactation hyperketonemia and elevated concentration of nonesterified fatty acids.
        J. Dairy Sci. 2016; 99 (27179869): 5851-5857
        • Oetzel G.R.
        Monitoring and testing dairy herds for metabolic disease.
        Vet. Clin. North Am. Food Anim. Pract. 2004; 20 (15471629): 651-674
        • Ospina P.A.
        • Nydam D.V.
        • Stokol T.
        • Overton T.R.
        Evaluation of nonesterified fatty acids and β-hydroxybutyrate in transition dairy cattle in the northeastern United States: Critical thresholds for prediction of clinical diseases.
        J. Dairy Sci. 2010; 93 (20105526): 546-554
        • R Core Team
        R: The R project for statistical computing.
        https://www.R-project.org/
        Date: 2020
        Date accessed: October 26, 2022
        • Rutten M.J.M.
        • Bovenhuis H.
        • Hettinga K.A.
        • van Valenberg H.J.F.
        • van Arendonk J.A.M.
        Predicting bovine milk fat composition using infrared spectroscopy based on milk samples collected in winter and summer.
        J. Dairy Sci. 2009; 92 (19923625): 6202-6209
        • Schloerke B.
        • Cook D.
        • Larmarange J.
        • Briatte F.
        • Marbach M.
        • Thoen E.
        • Elberg A.
        • Toomet O.
        • Crowley J.
        • Hofmann H.
        • Wickham H.
        GGally: Extension to “Ggplot2.”.
        https://CRAN.R-project.org/package=GGally
        Date: 2020
        Date accessed: October 26, 2022
        • Sing T.
        • Sander O.
        • Beerenwinkel N.
        • Lengauer T.
        ROCR: Visualizing the classifier performance in R.
        Bioinformatics. 2005; 21: 3940-3941
        • Soyeurt H.
        • Dehareng F.
        • Gengler N.
        • McParland S.
        • Wall E.
        • Berry D.P.
        • Coffey M.
        • Dardenne P.
        Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries.
        J. Dairy Sci. 2011; 94 (21426953): 1657-1667
        • Tremblay M.
        • Kammer M.
        • Lange H.
        • Plattner S.
        • Baumgartner C.
        • Stegeman J.A.
        • Duda J.
        • Mansfeld R.
        • Döpfer D.
        Identifying poor metabolic adaptation during early lactation in dairy cows using cluster analysis.
        J. Dairy Sci. 2018; 101 (29729924): 7311-7321
        • Tremblay M.
        • Kammer M.
        • Lange H.
        • Plattner S.
        • Baumgartner C.
        • Stegeman J.A.
        • Duda J.
        • Mansfeld R.
        • Döpfer D.
        Prediction model optimization using full model selection with regression trees demonstrated with FTIR data from bovine milk.
        Prev. Vet. Med. 2019; 163 (30670181): 14-23
        • van der Drift S.G.A.
        • Jorritsma R.
        • Schonewille J.T.
        • Knijn H.M.
        • Stegeman J.A.
        Routine detection of hyperketonemia in dairy cows using Fourier transform infrared spectroscopy analysis of β-hydroxybutyrate and acetone in milk in combination with test-day information.
        J. Dairy Sci. 2012; 95 (22916893): 4886-4898
        • Van Es A.J.H.
        Feed evaluation for dairy cows.
        Livest. Prod. Sci. 1975; 2: 95-107
        • van Gastelen S.
        • Dijkstra J.
        Prediction of methane emission from lactating dairy cows using milk fatty acids and mid-infrared spectroscopy.
        J. Sci. Food Agric. 2016; 96 (26996655): 3963-3968
        • Vlaeminck B.
        • Fievez V.
        • Tamminga S.
        • Dewhurst R.J.
        • van Vuuren A.
        • De Brabander D.
        • Demeyer D.
        Milk Odd- and branched-chain fatty acids in relation to the rumen fermentation pattern.
        J. Dairy Sci. 2006; 89 (16960070): 3954-3964
        • Waring E.
        • Quinn M.
        • McNamara A.
        • de la Rubia E.A.
        • Zhu H.
        • Lowndes J.
        • Ellis S.
        • McLeod H.
        • Wickham H.
        • Müller K.
        • Kirkpatrick C.
        • Brenstuhl S.
        • Schratz P.
        • lbusett
        • Korpela M.
        • Thompson J.
        • McGehee H.
        • Roepke M.
        • Kennedy P.
        • Possenriede D.
        • Zimmermann D.
        • Butts K.
        • Torges B.
        • Saporta R.
        Skimr: Compact and flexible summaries of data.
        https://CRAN.R-project.org/package=skimr
        Date: 2021
        Date accessed: October 26, 2022
        • Wickham H.
        Ggplot2: Elegant Graphics for Data Analysis.
        Springer-Verlag, 2016
        • Wickham H.
        • Averick M.
        • Bryan J.
        • Chang W.
        • McGowan L.D.
        • François R.
        • Grolemund G.
        • Hayes A.
        • Henry L.
        • Hester J.
        • Kuhn M.
        • Pedersen T.L.
        • Miller E.
        • Bache S.M.
        • Müller K.
        • Ooms J.
        • Robinson D.
        • Seidel D.P.
        • Spinu V.
        • Takahashi K.
        • Vaughan D.
        • Wilke C.
        • Woo K.
        • Yutani H.
        Welcome to the Tidyverse.
        J. Open Source Softw. 2019; 41686
        • Wickham H.
        • Bryan J.
        Readxl: Read Excel files. R package version 1.3.1.
        • Wolff R.L.
        • Bayard C.C.
        • Fabien R.J.
        Evaluation of sequential methods for the determination of butterfat fatty acid composition with emphasis ontrans-18:1 acids. Application to the study of seasonal variations in French butters.
        J. Am. Oil Chem. Soc. 1995; 72: 1471-1483
        • Wright M.N.
        • Ziegler A.
        ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.
        J. Stat. Softw. 2017; 77: 1-17
        • Xu W.
        • van Knegsel A.T.M.
        • Vervoort J.J.M.
        • Bruckmaier R.M.
        • van Hoeij R.J.
        • Kemp B.
        • Saccenti E.
        Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms.
        J. Dairy Sci. 2019; 102 (31477295): 10186-10201