
Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data

Open Archive | Published: April 14, 2021 | DOI: https://doi.org/10.3168/jds.2020-19861

      ABSTRACT

Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank reduction, and variable selection, and that can model nonlinear relations between phenotype and FTIR spectra, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes in Holstein-Friesian cattle under 2 cross-validation scenarios. The phenotypes differ in biological meaning and in their relationships with milk composition (i.e., phenotypes measurable directly or not directly in milk, reflecting different biological processes that can be captured using milk spectra). The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split according to 2 cross-validation scenarios: samples-out random, in which the population was randomly split into 10 folds (8 folds for training, 1 fold for validation, and 1 fold for testing); and herd/date-out, in which herds were randomly assigned to the training (70% of herds), validation (10%), and testing (20%) sets based on the herd and the date on which the samples were collected. A random grid search was performed using the training subset for hyperparameter optimization, and the validation set was used to assess the generalization of the prediction error. The trained model was then used to obtain the final predictions in the testing subset. The grid search for penalized regression indicated that the elastic net (EN) was the best regularization, with an increase in predictive ability of about 5%. The performance of PLS (standard model) was compared against the 2 machine learning techniques and penalized regression under the 2 cross-validation scenarios. Machine learning methods showed greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out random cross-validation. In herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform the other methods, with gains in predictive ability of around 4%, 1%, and 7% over EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar and differed statistically from the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences in predictive ability were observed because of the large standard deviation of the predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions of different phenotypes in dairy cattle.


      INTRODUCTION

Fourier-transform infrared (FTIR) spectroscopy is now widely available, creating new opportunities and many challenges for high-throughput prediction of dairy cattle traits that are expensive and hard to measure (Ferragina et al.; Grelet et al.; Dórea et al.). Milk FTIR spectra have been found to have considerable prediction power for traits considered difficult to measure, such as milk fatty acids (Maurice-Van Eijndhoven et al.), milk coagulation properties (Cecchinato et al.), and curd firming (Ferragina et al.), and moderate predictive ability for milk protein fractions (Bonfatti et al.). Some studies have also explored the possibility of using FTIR as a tool to predict phenotypes related to animal health (Grelet et al.; Belay et al.), fertility (Toledo-Alvarado et al.), body energy status (McParland et al.), feed efficiency (McParland et al.), and methane emission (Wang and Bovenhuis; Bittante et al.).
Transforming large amounts of infrared wavelength data into useful information for predicting complex traits has been an important challenge for the dairy cattle industry, and different statistical methods have been proposed for developing FTIR calibration equations (Ferragina et al.; Hempstalk et al.; Bonfatti et al.; Pralle et al.). Partial least squares (PLS) regression is the most common approach used for phenotypic prediction of difficult-to-measure traits in dairy cattle from FTIR data (Bonfatti et al.; Soyeurt et al.; McParland et al.). Partial least squares regression combines the most informative predictor variables into latent variables (i.e., PLS factors) that are used to develop prediction equations, retaining the relevant information while reducing the effect of noisy spectral regions that can impair predictive ability and bias predictions (Høy et al.; Baum et al.). Nevertheless, new approaches, mainly machine learning techniques, that select informative wavelength regions and disregard noisy regions have been suggested for predicting complex phenotypes in dairy cattle because of their ability to deal with complex associations between infrared spectra and target phenotypes (Dórea et al.; Vásquez et al.; Mendez et al.; Neto et al.).
Recently, the use of machine learning techniques has gained much interest for phenotypic prediction from high-throughput phenotypic or genomic information, as these methods reduce the problems associated with a large number of predictors, few observations, or both, and can capture and describe complex relations between predictors and target phenotypes (Morota et al.; Abdollahi-Arpanahi et al.). Thus, the main advantage of using machine learning for phenotype prediction is the opportunity to use large data sets to discover patterns by looking at combinations of features rather than analyzing each feature individually. In practice, machine learning techniques and penalized regression offer greater flexibility in modeling complex relationships between high-throughput data and phenotypes. Recent studies indicate that some of these techniques provide similar or even better predictive ability than the standard approach (i.e., PLS regression; Dórea et al.; Vásquez et al.; Neto et al.). Machine learning techniques such as the decision tree-based ensemble methods random forest (RF) and gradient boosting machine (GBM) perform variable selection of a subset of the relevant variables, aiming to reduce the dimensionality of the calibration data without compromising the predictive ability of the model.
On the other hand, penalized regression methods with variable selection, such as the elastic net (EN) and Lasso, select relevant variables by shrinking the coefficients of the less contributive variables toward, or exactly to, zero (Zou and Hastie). These algorithms learn and make predictions directly from the data and thus provide more robust predictions because of their ability to deal with variable interactions, nonlinear relationships, and outliers, while imposing few or no prior assumptions about the trait distribution, leading to improvements in model fit (Hastie et al.; Neto et al.; Lopez-Cruz et al.). These techniques, based on different mathematical algorithms (e.g., penalized regression, RF, and GBM), have been shown to improve predictive performance compared with PLS regression for different phenotypes, not only in animals (Hempstalk et al.) but also in plants (Morellos et al.). Recently, machine learning algorithms have demonstrated their utility for predicting difficult-to-measure traits such as tuberculosis status (Denholm et al.), lactoferrin content (Soyeurt et al.), metabolic status (Xu et al.), conception success to a given insemination (Hempstalk et al.), blood BHB concentration (Pralle et al.), and feed intake (Dórea et al.). These authors observed increases in predictive ability ranging from 1 to 30% using machine learning compared with PLS regression. Thus, an appropriate choice among these sophisticated tools is crucial for obtaining the maximum benefit for phenotype prediction from milk spectra.
The aim of this study was to compare the predictive ability of different machine learning methods (GBM, RF, and penalized regression) against the standard model (PLS regression) for 3 phenotypes differing in biological meaning and in their relationships with milk composition, variability, and heritability, namely BCS, blood BHB, and κ-CN, under 2 different cross-validation scenarios (samples-out random and leave-20%-of-herd/date-out).

      MATERIALS AND METHODS

      Ethics Statement

This study did not require any specific ethics permit. The cows sampled belonged to private commercial herds and were not experimentally manipulated. Milk and blood samples were collected during routine milk recording carried out by technicians from the Breeders Federation of Trento Province (FPA, Trento, Italy) and were therefore authorized by a local authority.

      Field Data

The present study is part of a broader project (Cowplus project), described by Cecchinato et al., which deals with cattle farming in mountain areas. Briefly, blood and milk samples were collected from 1,508 cows of 3 specialized dairy breeds (Holstein-Friesian, Brown Swiss, and Jersey) and 3 dual-purpose breeds of Alpine origin (Simmental, Rendena, and Alpine Grey). The cows were housed in 41 multibreed farms (with at least 2 breeds per farm). For this study, which aimed at developing FTIR calibration equations, we selected the Holstein-Friesian breed, which was widely represented within the multibreed farms. The cows were sampled once (one herd per day) after their health status had been determined on the basis of rectal temperature, heart rate, respiratory profile, appetite, and fecal consistency. Only cows that were clinically healthy at the time of the visit were included in the study.
A milk sample (50 mL) was taken from each cow during the evening milking by a trained technician and kept at 4°C (without preservative) until processing (within 24 h). An additional aliquot (50 mL) was brought to the laboratory of the Breeders Federation of Trento Province (Trento, Italy) for the routine milk analyses and FTIR spectra storage. Blood samples were taken by a veterinarian via jugular venipuncture (Venosafe) with no anticoagulant additive. At the same time, spectral data were stored by the Breeders Federation of Trento Province (Trento, Italy).
For this study, we used 471 Holstein cows from 31 dairy farms, with an average milk yield of 27.95 ± 9.42 kg, in first to sixth parity, and with DIM ranging from 10 to 380. The sampled farms operate under different dairy farming systems, ranging from the traditional small farms of the mountainous areas to larger, more modern operations (Stocco et al.). Milk production was monitored by the official milk recording system. Milk samples from each cow were collected during the evening milking (before feeding) and either (1) stored overnight at 4°C (without preservative) until assessment of technological properties (within 24 h) at the Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE) of the University of Padova, or (2) stored at −80°C (without preservative) until chromatographic analyses at DAFNAE's Central Chemical Laboratory.

      Phenotypes

      BCS.

On the same day as milk sampling, the BCS of each Holstein cow was measured by a trained technician and scored according to the Edmonson et al. classification on a scale ranging from 1 (emaciated) to 5 (extremely fat).

      Blood BHB.

All blood samples were centrifuged at 1,780 × g for 10 min at 4°C, and the plasma obtained was transferred at 4°C to the laboratory of the Department of Animal Medicine, Production and Health of the University of Padua (Italy), where it was stored at −20°C until analysis (Cecchinato et al.). The BHB concentration was determined with the Ranbut RX Monza test (Randox) on a Cobas C-501 analyzer (Roche Diagnostics).

      κ-CN.

Milk proteins were separated using a validated reversed-phase high-performance liquid chromatography method, as described in detail by Amalfitano et al. The milk κ-CN protein fraction was expressed as a percentage of the total milk nitrogen content (% N).
Phenotypic records for BCS, BHB, and κ-CN were examined for possible outliers within each farm separately. Observations more than 3.0 standard deviations below or above the farm mean were excluded. After quality control, the distribution of each phenotype was checked for normality; phenotypic values for all traits are shown in Figure 1.
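As an illustration of this screening step, the following minimal R sketch applies the ±3 SD rule within each farm. It is not the authors' script; the data frame and column names (pheno, herd, BCS) are assumptions.

```r
# Per-farm outlier screen: keep records within mean +/- 3 SD of their own farm.
remove_outliers_by_farm <- function(data, trait, herd_col = "herd", k = 3) {
  keep <- unlist(lapply(split(seq_len(nrow(data)), data[[herd_col]]), function(idx) {
    x <- data[idx, trait]
    m <- mean(x, na.rm = TRUE)
    s <- sd(x, na.rm = TRUE)
    idx[is.na(x) | abs(x - m) <= k * s]   # retain missing values and in-range records
  }))
  data[sort(keep), ]
}

# Example usage (column names are assumptions):
# pheno_bcs <- remove_outliers_by_farm(pheno, trait = "BCS")
```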
Figure 1. Descriptive statistics of the phenotypic data: boxplots and histograms of the phenotypic values for BCS (A), blood BHB (B; mmol/L), and κ-CN expressed as % N (C). Min = minimum; Max = maximum. Horizontal lines within each boxplot represent the median value.

      FTIR Spectra.

Individual milk samples were analyzed using a MilkoScan FT6000 (Foss Electric). For each sample, 1,060 absorbance values were recorded, and 2 spectra covering the infrared region from 5,011 to 925 cm−1 were obtained and averaged before data analysis (Bittante and Cecchinato). This FTIR spectrum covers the short-wavelength to the long-wavelength infrared regions, and all 1,060 spectral points were included in the analyses (Toledo-Alvarado et al.; Cecchinato et al.). The FTIR spectral transmittance (T) was transformed to absorbance (A) using the equation A = log(1/T) (Figure 2A). FTIR quality control was assessed through principal component analysis, using the Mahalanobis distance to remove possible outliers in accordance with Shah and Gemperline (Figure 2B). After FTIR quality control, milk spectral data from 463 Holstein cows belonging to 31 herds were included in the subsequent analyses. The number of cows per herd ranged from 4 to 73.
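A minimal R sketch of this preprocessing is given below, assuming a cows × 1,060 transmittance matrix named trans; the number of principal components retained and the chi-squared cutoff are illustrative assumptions, not values reported here.

```r
# Transmittance -> absorbance (A = log(1/T); base-10 log is the usual spectroscopic convention)
absorb <- log10(1 / trans)

# PCA followed by a Mahalanobis-distance screen in the PC space
pca    <- prcomp(absorb, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:5]                              # number of PCs retained is an assumption

md <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Flag samples beyond a chi-squared cutoff (threshold choice is an assumption)
cutoff       <- qchisq(0.999, df = ncol(scores))
absorb_clean <- absorb[md <= cutoff, ]
```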
Figure 2. (A) Average Fourier-transform infrared (FTIR) absorbance spectrum (solid line represents the average; the shaded region represents the mean ± 3 × SD), and (B) principal components (PC) of the FTIR spectral data of milk samples recorded on Holstein dairy cows.

      Cross-Validation Scenarios.

Predictive ability for each target phenotype was assessed using 2 alternative cross-validation scenarios, namely samples-out random and herd/date-out. The training, tuning (validation), and testing sets in each scenario were kept fixed across all methods evaluated.

      Samples-Out Random

In this cross-validation scenario, the data set was split at random into 10 folds of approximately equal size. The training population consisted of 8 folds (368 cows for BCS, 360 for BHB, and 288 for κ-CN), and the validation and testing sets each consisted of 1 fold (46 cows for BCS, 45 for BHB, and 35 for κ-CN). The samples-out random cross-validation was rotated so that each fold was predicted in the testing set. The scenario was replicated 10 times; the correlation was estimated for each replication, and the average over replications was taken as the predictive ability.
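The sketch below illustrates one way to build this 10-fold rotation in R; the seed and object names are assumptions rather than the authors' settings.

```r
# Samples-out random split: 10 folds, with 8 used for training, 1 for validation, 1 for testing.
set.seed(2021)                                   # seed value is an assumption
n    <- 463                                      # number of cows with spectra after quality control
fold <- sample(rep(1:10, length.out = n))        # random assignment of cows to 10 folds

splits <- lapply(1:10, function(k) {
  test_fold  <- k
  valid_fold <- if (k == 10) 1 else k + 1        # rotate so every fold is tested once
  list(test  = which(fold == test_fold),
       valid = which(fold == valid_fold),
       train = which(!fold %in% c(test_fold, valid_fold)))
})
```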

      Herd/Date-Out

The data set was randomly assigned to training, validation, and testing sets based on the herd and date on which the samples were collected. The training set consisted of 70% of the herds (n = 25), 10% of the herds (n = 2) were assigned to the tuning (validation) set, and the remaining 20% (n = 6) formed the testing set. This cross-validation scenario was rotated so that each herd was assigned to the testing set to be predicted, allowing the predictive ability of the model to be evaluated. The herd/date-out scenario was replicated 10 times, and model accuracy was taken as the mean over the replicates. Given the variability in the number of animals per herd, the random sampling was performed so as to ensure greater homogeneity in the number of animals assigned to the training, tuning, and testing subsets. The number of cows in the training set ranged from 306 to 330 for BCS, 314 to 338 for BHB, and 246 to 268 for κ-CN; in the validation set from 44 to 52 for BCS, 38 to 46 for BHB, and 30 to 38 for κ-CN; and in the testing set from 88 to 104 for BCS, 75 to 91 for BHB, and 60 to 75 for κ-CN.
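A corresponding sketch for the herd/date-out split is shown below; the data frame pheno with a herd column, and the seed, are assumptions.

```r
# Herd/date-out split: whole herds (each sampled on a single date) are assigned to
# training, tuning, or testing, so no herd appears in more than one subset.
set.seed(2021)                                   # seed value is an assumption
herds <- sample(unique(pheno$herd))              # shuffle the herd identifiers

n_train <- round(0.70 * length(herds))
n_valid <- round(0.10 * length(herds))

train_herds <- herds[seq_len(n_train)]
valid_herds <- herds[seq_len(n_valid) + n_train]
test_herds  <- setdiff(herds, c(train_herds, valid_herds))

train_idx <- which(pheno$herd %in% train_herds)
valid_idx <- which(pheno$herd %in% valid_herds)
test_idx  <- which(pheno$herd %in% test_herds)
```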

      Statistical Analysis

      PLS Regression.

The PLS regression was included because it is the most common approach for assessing FTIR predictive ability. The target phenotype (y) is predicted using a set of predictors (X) representing the standardized FTIR spectral wavelengths. The prediction is performed by extracting those linear combinations of the predictors (i.e., latent variables) that explain most of the variation in both the response and the predictors, and that therefore have the best predictive power (Martens). Thus, in the PLS method, the predictors X are decomposed into factor scores T and loadings P as X = TP^T. The target phenotype y can then be predicted as y = TBC^T + E, where B is the diagonal matrix of regression weights, T is the matrix of latent vectors, C is the weight matrix for the dependent variable, and E is the residual matrix.
The PLS regression was fitted using the R pls package (Mevik and Wehrens). The number of latent variables included in each calibration equation was determined as that giving the smallest root mean square error (RMSE) while maximizing the percentage of variance captured for each trait. In each cross-validation scenario, the training folds were used to optimize the model parameter (number of latent variables), and the validation set was used to evaluate model performance, with the aim of developing the most accurate model to be applied to the disjoint testing set of each cross-validation scenario (Goodfellow et al.; Eraslan et al.).
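The following sketch illustrates this calibration step with the pls package. The objects spectra, y, and the index vectors (for example, from the herd/date-out sketch above), as well as the maximum number of latent variables tried, are assumptions rather than the settings used in this study.

```r
# PLS calibration: fit on the training folds, pick ncomp by validation RMSE, predict the test set.
library(pls)

Xs    <- scale(spectra)                                  # standardized FTIR spectra (assumption)
train <- data.frame(y = y[train_idx], X = I(Xs[train_idx, ]))
valid <- data.frame(y = y[valid_idx], X = I(Xs[valid_idx, ]))

fit <- plsr(y ~ X, ncomp = 30, data = train, validation = "none")   # 30 is a placeholder maximum

# Number of latent variables giving the lowest RMSE on the validation set
rmse_by_ncomp <- sapply(1:30, function(k) {
  pred <- predict(fit, newdata = valid, ncomp = k)
  sqrt(mean((valid$y - pred)^2))
})
best_ncomp <- which.min(rmse_by_ncomp)

pred_test <- predict(fit, newdata = data.frame(X = I(Xs[test_idx, ])), ncomp = best_ncomp)
```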

      Penalized Regression and Supervised Machine Learning Methods.

Two supervised machine learning methods (GBM and RF) and penalized regression were applied. The target phenotype of the ith individual (yi) was predicted given the FTIR spectral wavelengths (xij), with j ranging from 1 to 1,060. The h2o R package was used to fit these 3 algorithms (https://github.com/h2oai/h2o-3).
Penalized regression performs phenotypic prediction through the least absolute shrinkage and selection operator (Lasso) or ridge regression (RR). The elastic net (EN) combines the Lasso (λ1 regularization term) and RR (λ2 regularization term) penalties, providing a balance between the two methods. These regularizations can handle correlated and noisy predictors: the λ1 term imposes a penalty that performs both coefficient shrinkage and variable selection, whereas the λ2 penalty shrinks coefficients but does not perform variable selection. In the penalized regression, the Lasso penalty term Σ|βj| (weighted by λ1) and the RR penalty term Σβj² (weighted by λ2) are balanced by the mixing parameter α.
Considering the linear model $y_i = \beta_0 + \sum_{j=1}^{1,060} x_{ij}\beta_j + e_i$, the coefficients in the penalized regression are obtained by minimizing the loss function

$$L(\lambda_1, \lambda_2, \boldsymbol{\beta}) = \min_{\boldsymbol{\beta}}\left[\frac{1}{2N}\sum_{i=1}^{N}\left\{y_i - \left(\beta_0 + \sum_{j=1}^{1,060} x_{ij}\beta_j\right)\right\}^2 + P_\alpha(\boldsymbol{\beta})\right],$$

where N is the number of animals for each trait and $P_\alpha(\boldsymbol{\beta}) = (1-\alpha)\sum_{j}\beta_j^2 + \alpha\sum_{j}|\beta_j|$ is the EN penalty, which combines the RR and Lasso penalties with α = λ1/(λ1 + λ2). Thus, if α = 0 the elastic net becomes RR, whereas if α = 1 it becomes a Lasso regression.
Random forest is a modification of bootstrap aggregation that fits several weak models and then combines their predictions into a final predictive model. In this framework, RF constructs a series of regression trees by partitioning the original training data and randomly selecting subsets of the explanatory variables (FTIR spectra) as candidate predictors for splitting the tree nodes. The RF algorithm comprises several major components used in prediction: the response (y) and predictor variables (X; i.e., the FTIR wavenumbers) on which the decision trees are built; mtries, the randomly chosen subset of the predictors in X considered when splitting a tree node; and Ntree, the number of decision trees used in the model. The RF prediction is obtained by averaging the output of the trees {T(x, Ψb), b = 1, ..., B} (Hastie et al.):

$$\hat{y}_i = \frac{1}{B}\sum_{b=1}^{B} T(\mathbf{x}, \Psi_b),$$

where Ψb refers to the bth RF tree, characterized by its split variables, the threshold at each node, and the terminal-node values (Hastie et al.), and B is the number of bootstrap samples drawn from the training data.
Overall, the RF algorithm (1) randomly selects a subset of observations (y) from the training data set; (2) randomly selects a subset of FTIR variables (mtries); (3) creates a single tree by recursively splitting the subset of FTIR variables to create tree nodes, with the aim of dividing the subset of observations into distinct groups, where, for each node built during the bootstrap sampling process, the predictor variable (FTIR wavenumber) giving the greatest reduction in the mean square error (MSE) of the child nodes is selected to split the node; (4) uses the out-of-bag data to determine the prediction MSE of the tree; and (5) repeats steps 1 to 4 to generate a forest of trees. Node splitting is repeated until there is no further change in the MSE. The final predictions were obtained by averaging the values predicted by the individual trees.
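For illustration, the sketch below fits a single forest of this kind with the h2o package used in this study. The H2OFrame construction and the Ntree and mtries values shown are placeholders; the actual values were chosen by the grid search described below.

```r
# Illustrative single random forest fit in h2o (placeholder hyperparameter values).
library(h2o)
h2o.init()

train_h2o  <- as.h2o(data.frame(y = y[train_idx], spectra[train_idx, ]))
valid_h2o  <- as.h2o(data.frame(y = y[valid_idx], spectra[valid_idx, ]))
predictors <- setdiff(colnames(train_h2o), "y")

rf_fit <- h2o.randomForest(
  x = predictors, y = "y",
  training_frame   = train_h2o,
  validation_frame = valid_h2o,
  ntrees = 500,                      # Ntree: number of trees (placeholder)
  mtries = 100,                      # number of FTIR variables sampled per split (placeholder)
  seed   = 2021
)
h2o.rmse(rf_fit, valid = TRUE)       # validation RMSE of the averaged-tree predictions
```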
Gradient boosting machine builds a predictive model by iteratively assembling weak learners into a strong learner for regression problems, to reduce both bias and variance (Hastie et al.). Gradient boosting machine builds regression trees sequentially, with some shrinkage and variable selection, in a fully distributed way (i.e., each tree is built in parallel over the data; Friedman et al.; Friedman; Hastie et al.). During the process, new models are added sequentially to minimize the prediction error ($e = y - \hat{y}$) of the previous model until no further improvement can be made (Friedman et al.). More details on gradient boosting are given by Natekin and Knoll. The GBM method can be described as follows:

$$\hat{y} = \sum_{m=1}^{M}\beta_m b(\mathbf{x}, \gamma_m),$$

where M represents the number of iterations (expansion terms), βm is the function increment, also known as the "boost," and b(x, γm) is the base learner, a function of the multivariate argument x with a set of parameters γm = {γ1, γ2, ..., γm}. The expansion coefficients {βm, m = 1, ..., M} and parameters {γm, m = 1, ..., M} are used to map the FTIR predictor variables (x) to the target phenotype (y), considering the joint distribution of all values (y, x), by minimizing the loss function L{y, F(x)}. At iteration m, the FTIR variable pm (only 1 FTIR spectral variable is selected at each iteration) and the corresponding increment h are chosen to minimize $\sum_{i=1}^{n} L\left[y_i, F_{m-1}(\mathbf{x}_i) + h(y_i; \mathbf{x}_i, p_m)\right]$, where Fm−1 is the model from the previous iteration. The GBM follows the algorithm specified by Hastie et al.

      Grid Search.

To identify the best combination of hyperparameters (i.e., adjustable parameters that must be tuned to control the learning process and obtain a model with optimal performance) in the supervised machine learning algorithms (RF and GBM) and the penalized regression method, we performed a random grid search (Figure 3) using the h2o.grid function in the h2o R package (https://cran.r-project.org/web/packages/h2o). The search grid was defined by specifying the main hyperparameters that maximized model performance (highest coefficient of determination and lowest RMSE) for each trait. The grid search for machine learning and penalized regression followed the specifications indicated by Goodfellow et al. and Eraslan et al. In this framework, we split the population into training, validation, and testing sets within the samples-out random or herd/date-out cross-validation scenario (Figure 3). The training subset was used for hyperparameter optimization in each cross-validation scenario: samples-out random (368 cows for BCS, 360 for BHB, and 288 for κ-CN) and herd/date-out (306 to 330 for BCS, 314 to 338 for BHB, and 246 to 268 for κ-CN). The validation set was used to assess the generalization of the prediction error in each scenario: samples-out random (46 cows for BCS, 45 for BHB, and 35 for κ-CN) and herd/date-out (44 to 52 for BCS, 38 to 46 for BHB, and 30 to 38 for κ-CN). The trained model with the lowest RMSE was then applied to the disjoint testing set of the samples-out random (46 cows for BCS, 45 for BHB, and 35 for κ-CN) and herd/date-out (88 to 104 for BCS, 75 to 91 for BHB, and 60 to 75 for κ-CN) scenarios to obtain the final predictions.
Figure 3. Workflow of the grid search for hyperparameter optimization of the supervised machine learning and penalized regression methods. The general cross-validation process includes splitting the whole population into training, validation, and testing subsets. The training and validation subsets are used to select the main hyperparameters for each approach used for phenotypic prediction. The trained model with the best adjustment is then evaluated on the disjoint test subset in each cross-validation scenario. FTIR = Fourier-transform infrared spectra.

      Penalized Regression.

A random search was performed to find the optimal values of the regularization parameters α and λ that maximized model performance, based on the highest coefficient of determination and the lowest RMSE. The grid for α considered values ranging from 0.0 (RR) to 1.0 (Lasso regression) in intervals of 0.05, and λ was tuned using the lambda search option (lambda_search) of the h2o.glm function in the h2o R package. The parameter combination with the highest coefficient of determination and lowest RMSE indicated EN as the best regularization, outperforming Lasso by 5.8% and RR by 6.5% in predictive ability.
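A minimal sketch of this search is given below, looping h2o.glm over the α grid with lambda_search enabled. The H2OFrames and predictor list are assumed from the earlier sketches, and the seed is a placeholder.

```r
# Penalized-regression search: alpha from 0 (ridge) to 1 (Lasso) in steps of 0.05,
# with lambda tuned internally over a regularization path.
library(h2o)

alphas <- seq(0, 1, by = 0.05)
fits <- lapply(alphas, function(a) {
  h2o.glm(x = predictors, y = "y",
          training_frame   = train_h2o,
          validation_frame = valid_h2o,
          family        = "gaussian",
          alpha         = a,
          lambda_search = TRUE,        # h2o searches a path of lambda values
          seed          = 2021)
})

valid_rmse <- sapply(fits, h2o.rmse, valid = TRUE)
best_en    <- fits[[which.min(valid_rmse)]]        # alpha/lambda pair with lowest validation RMSE
```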

      RF.

Two major parameters have a high impact on the outcome of an RF predictive model, namely the number of trees grown per forest (Ntree) and the number of predictors (FTIR variables) randomly sampled at each node (mtries; Goldstein et al.). Increasing the values assigned to Ntree and mtries improves the predictive ability of RF until a plateau is reached (Goldstein et al.). To determine the minimum requirement for these parameters, we performed a random grid search to find their best combination. The values of Ntree ranged from 10 to 10,000 in intervals of 10, and the mtries values ranged, in intervals of 20, up to M, where M is the total number of FTIR predictors (Brieuc et al.). The best combination of these parameters was identified and used in the subsequent analyses.

      GBM.

The 4 most important parameters with a strong effect on the predictive ability of the model were tuned in a random grid search aimed at selecting the combination that minimizes the validation error and increases the predictive ability (Natekin and Knoll). These hyperparameters are Ntree (the total number of trees in the sequence used in the model), the learning rate (which determines the contribution of each tree to the final model and performs shrinkage to avoid overfitting), the maximum tree depth (which controls the depth of the individual trees considered in the model), and the minimum number of samples per leaf (which controls the complexity of each tree). The Ntree values used in the GBM were similar to those used in the RF method; the learning rate ranged from 0.001 to 1 in intervals of 0.001; the maximum tree depth ranged from 1 to 80; and the minimum number of samples per leaf ranged from 1 to 100 in intervals of 5. The hyperparameters considered in each model for each cross-validation scenario are described in Supplemental Table S1 (https://figshare.com/articles/dataset/Untitled_Item/14355863).

      Model Fit Parameter Assessment.

The predictive ability of the different statistical methods was assessed by the Pearson correlation (r) between the observed and predicted phenotypes, together with its standard deviation across the 10 replications of each cross-validation scenario. The second parameter used to assess model performance was the RMSE.
The slope of the linear regression of the observed on the predicted values in each model and cross-validation scenario was used to assess prediction bias. The Hotelling-Williams t-test, which takes account of the number of individuals in the validation set (Dunn and Clark), was used to determine the significance of the difference in predictive ability (Pearson correlation) and in slope between the machine learning and penalized regression methods and PLS regression, with each of the 10 replications treated as paired samples. Similarity between the predictive performance of the different models was assessed using Ward's hierarchical clustering with a Euclidean distance. The relative gain (RG) in predictive ability was measured as

$$RG = \frac{r_m - r_{PLS}}{r_{PLS}} \times 100,$$

where $r_m$ represents the predictive ability assessed using GBM, RF, or EN, and $r_{PLS}$ is the predictive ability obtained using PLS regression.

      RESULTS

      Machine Learning Models with Samples-Out Random Cross-Validation

The predictive ability of the different machine learning methods (GBM, EN, and RF) and of PLS regression for the different traits using samples-out random cross-validation is shown in Table 1. Overall, the machine learning models exhibited greater predictive ability than the standard model (PLS) for all traits. The greatest accuracy was obtained with GBM (ranging from 0.63 to 0.81), followed by RF and EN (ranging from 0.59 to 0.80), and finally PLS (ranging from 0.57 to 0.77). Comparison of the machine learning models (i.e., GBM, EN, and RF) with the standard model (PLS) using the Hotelling-Williams test (Dunn and Clark) showed that GBM and RF had significantly higher accuracy than PLS regression (P < 0.05; Figure 4A). The similarity in model predictive ability across traits assessed by hierarchical clustering indicated a clear separation of GBM and RF from PLS regression (Figure 4B), whereas EN was similar to PLS regression in terms of predictive ability (Figure 4B).
Table 1. Predictive ability (r), root mean square error (RMSE), and slope of prediction of the standard model (PLS) and the machine learning methods for BCS, BHB, and κ-CN using milk spectral data (SD in parentheses refers to the variation between the replications used in the cross-validation scenarios)(1)

Samples-out random cross-validation

Trait          Model   r, training(2)   r, validation(3)   RMSE, validation(4)   Slope(5)
BCS            EN      0.92 (0.003)     0.59 (0.030)       0.27 (0.025)          1.22 (0.035)
               GBM     0.91 (0.002)     0.63 (0.023)       0.25 (0.017)          1.07 (0.030)
               RF      0.95 (0.001)     0.61 (0.028)       0.26 (0.038)          1.29 (0.038)
               PLS     0.95 (0.001)     0.57 (0.034)       0.35 (0.036)          0.89 (0.047)
BHB (mmol/L)   EN      0.89 (0.004)     0.78 (0.023)       0.10 (0.012)          1.22 (0.034)
               GBM     0.90 (0.001)     0.80 (0.023)       0.09 (0.009)          1.03 (0.026)
               RF      0.90 (0.001)     0.79 (0.027)       0.10 (0.011)          1.30 (0.034)
               PLS     0.88 (0.002)     0.76 (0.030)       0.10 (0.012)          0.89 (0.037)
κ-CN (% N)     EN      0.96 (0.001)     0.79 (0.027)       1.25 (0.049)          0.92 (0.035)
               GBM     0.97 (0.001)     0.81 (0.025)       1.08 (0.046)          1.06 (0.034)
               RF      0.96 (0.001)     0.80 (0.030)       1.18 (0.052)          1.21 (0.037)
               PLS     0.90 (0.008)     0.77 (0.034)       1.41 (0.062)          1.37 (0.041)

Herd/date-out random cross-validation

Trait          Model   r, training(2)   r, validation(3)   RMSE, validation(4)   Slope(5)
BCS            EN      0.85 (0.001)     0.55 (0.051)       0.24 (0.042)          1.28 (0.041)
               GBM     0.88 (0.002)     0.58 (0.048)       0.23 (0.042)          1.13 (0.039)
               RF      0.82 (0.007)     0.58 (0.051)       0.24 (0.048)          1.30 (0.045)
               PLS     0.89 (0.004)     0.53 (0.057)       0.25 (0.053)          0.79 (0.095)
BHB (mmol/L)   EN      0.86 (0.003)     0.70 (0.039)       0.12 (0.019)          1.29 (0.042)
               GBM     0.90 (0.001)     0.73 (0.030)       0.11 (0.016)          1.14 (0.033)
               RF      0.85 (0.003)     0.73 (0.040)       0.12 (0.017)          1.44 (0.037)
               PLS     0.92 (0.002)     0.68 (0.048)       0.12 (0.015)          0.92 (0.061)
κ-CN (% N)     EN      0.86 (0.004)     0.74 (0.030)       1.29 (0.067)          1.20 (0.061)
               GBM     0.88 (0.002)     0.77 (0.029)       1.14 (0.048)          1.13 (0.047)
               RF      0.90 (0.002)     0.77 (0.031)       1.17 (0.054)          1.25 (0.062)
               PLS     0.90 (0.003)     0.73 (0.035)       1.26 (0.073)          0.90 (0.067)

(1) EN = elastic net; GBM = gradient boosting machine; RF = random forest; PLS = partial least squares.
(2) Average predictive ability in the training set for each trait across 10 replicates.
(3) Average predictive ability in the validation set across 10 replicates.
(4) Average RMSE for each trait in the validation set.
(5) Average slope of the regression of observed on predicted values across the cross-validation scenarios for each trait.
Figure 4. (A) Boxplots of predictive ability over the 10 replicates of samples-out random cross-validation for each model for BCS, BHB (mmol/L), and κ-CN (% N). The horizontal line within each boxplot represents the median, dots represent the individual predictive ability values, and red dots are the means across the 10 replicates. (B) Ward's hierarchical clustering of models based on predictive ability across all cross-validation and trait combinations. (C) Boxplots of the slope values for each trait across the models evaluated; dots represent outlier slope values. EN = elastic net; GBM = gradient boosting machine; RF = random forest; PLS = partial least squares. Asterisks indicate a significant difference between the machine learning method and the standard model (PLS), with each of the 10 replications treated as paired samples (*P < 0.05 and **P < 0.01).
The slope values indicate the effect of the different models on phenotype prediction (Table 1 and Figure 4C). The slopes for EN, RF, and PLS differed significantly from 1, indicating biased predictions for all the traits evaluated (Figure 4C). The regression coefficients for BCS using PLS and for κ-CN using EN were biased downward, with values lower than 1 leading to an increased variance of the predictions (Figure 4C). The GBM produced predictions of the difficult-to-measure traits that were less biased than those of the other models and that differed significantly from those obtained by PLS regression (Figure 4C). Conversely, the RF models had prediction slopes greater than 1 for all traits, ranging from 1.41 for BCS to 1.50 for κ-CN (i.e., exceeding unity by 41 to 50%), which reduces the variance of the predictions (Figure 4C).
To quantify the effect of the machine learning methods on predictive ability, we assessed the RG of the machine learning techniques relative to PLS regression. Among the models evaluated, GBM increased the predictive ability (RG) by 7.46%, RF by 5.42%, and EN by 3.29% compared with the PLS model. The tree-based ensemble models (GBM and RF) increased predictive accuracy across traits, with the highest gain in accuracy for BCS (11.92%) and the smallest for κ-CN (3.90%). The Hotelling-Williams t-test showed that the increase in predictive ability for GBM and RF was significant (P < 0.05) for all traits, whereas no statistical difference was observed for EN (P = 0.105; Figure 4A). Assessment of model fit based on RMSE indicated that the machine learning techniques considerably reduced the prediction error, by 33% for GBM, 26% for RF, and 25% for EN, compared with the PLS predictions (Table 1). Overall, GBM yielded the lowest RMSE values for all traits evaluated (Table 1).

      Machine Learning Models with Herd/Date-Out Cross-Validation

In the herd/date-out cross-validation scenario, the predictive ability of the models based on FTIR milk spectra was lower than in the samples-out random cross-validation scenario (Table 1). Prediction accuracy across traits ranged from 0.53 ± 0.171 to 0.77 ± 0.014 (Table 1). The performance of the GBM and RF models was similar for all traits evaluated (Table 1); however, the RF model exhibited the greatest variation in predictive ability and the highest RMSE values (Table 1). The machine learning techniques (GBM, EN, and RF) outperformed PLS regression, although no differences in accuracy were found when they were compared using the Hotelling-Williams t-test (P > 0.05; Figure 5A). Regarding the predictions of the difficult-to-measure traits, GBM and RF were more similar to each other than to EN and PLS (Figure 5B).
Figure 5. (A) Boxplots of predictive ability over the 10 replicates of herd/date-out cross-validation for each model for BCS, BHB (mmol/L), and κ-CN (% N). The horizontal line within each boxplot represents the median, dots represent the individual predictive ability values, and red dots are the means across the 10 replicates. (B) Ward's hierarchical clustering of models based on predictive ability across all trait combinations. (C) Boxplots of the slope values for each trait across the models evaluated; dots represent outlier slope values. EN = elastic net; GBM = gradient boosting machine; RF = random forest; PLS = partial least squares. The asterisk indicates a significant difference between the machine learning method and the standard model (PLS), with each of the 10 replications treated as paired samples (*P < 0.05 and **P < 0.01).
As shown in Figure 5C, the slopes of the regression of the observed phenotype on the predicted values indicated a significant difference between the machine learning methods and PLS regression for all traits. Using PLS regression, the prediction slopes were lower than 1 for all the traits, deviating from unity by −21% for BCS to −11.6% for blood BHB, indicating an increase in the variance of the predictions (Figure 5C). The average slope for GBM ranged from 1.12 to 1.17, indicating a smaller deviation from unity, whereas the slopes for RF and EN differed significantly from unity, indicating biased predictions (Figure 5C); the deviation ranged from 23% to 32% for EN and from 35% to 44% for RF.
The machine learning methods outperformed PLS, with relative gains in predictive ability of 8.49% for GBM, 8.10% for RF, and 3.71% for EN. The highest relative improvements were found with the tree-based models for BCS (10.81% for GBM and 10.20% for RF) and for blood BHB (10.20% for GBM and 7.32% for RF). Assessed by RMSE, slight reductions in prediction error were obtained with the GBM model compared with the EN and RF models for BCS and BHB, and substantial reductions were obtained with the GBM model compared with the PLS model (Table 1).

      Comparing the Samples-Out Random and Herd/Date-Out Cross-Validation Scenarios

As shown in Table 1, samples-out random cross-validation gave a higher accuracy of prediction than the herd/date-out cross-validation scenario. Both schemes used random sampling, but in the samples-out random scenario the testing set had a closer relationship with the training set (reference population) because records from the same farms were included in the training, validation, and testing subsets. In the herd/date-out cross-validation (leaving 20% of the herds out), by contrast, records from different farms were assigned to different subsets (training or testing), resulting in an 8% reduction in model performance compared with samples-out random cross-validation (Table 1). In general, we found the machine learning techniques evaluated here to be useful tools for improving FTIR-based predictions of phenotypes differing in biological background and variability across the different cross-validation strategies.

      DISCUSSION

      Predictive Performance of Machine Learning Versus PLS Models

This work demonstrates that it is feasible to predict different phenotypic traits in Holstein cattle from FTIR spectral data using machine learning techniques. Previous research has suggested some promise in using different machine learning methods to improve FTIR-based prediction (Hempstalk et al., 2015; Dórea et al., 2018; Pralle et al., 2018; Xu et al., 2019; Denholm et al., 2020; Soyeurt et al., 2020). For high-throughput phenotyping in dairy cattle, machine learning takes advantage of its model flexibility, outperforming PLS regression (Table 1). When benchmarked against EN and PLS regression, the RF and GBM methods exhibited competitive predictive ability (Figure 4, Figure 5), with lower RMSE for all traits (Table 1). This result was not entirely unexpected, because RF and GBM can better model the complex relationships (e.g., nonlinearities and interactions) between the FTIR variables and the target trait (Friedman, 2002; Natekin and Knoll, 2013) and handle the problems associated with multicollinearity among wavenumber regions (Supplemental Figure S1, https://figshare.com/articles/dataset/Untitled_Item/14355863). Tomaschek et al. (2018) indicated that models using regression with regularization (EN) and RF outperform PLS regression combined with a generalized linear model (SCGLR) in managing collinearity in data sets. Indeed, differences in predictive ability between machine learning techniques and PLS regression can be related to differences in how the milk FTIR spectra are processed and subsequently used to make predictions from new data. On the other hand, the superior performance of RF and GBM could be explained by the power of ensemble methods, which combine several models to generate high-performance predictions rather than relying on a single model (Brieuc et al., 2018).
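To make the comparison concrete, the following Python sketch benchmarks PLS against EN, RF, and GBM on a synthetic spectra-like matrix using scikit-learn estimators; the data, hyperparameter values, and scoring are chosen only for illustration and are not taken from this study.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic "spectra" (X) and phenotype (y); replace with real FTIR data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 500))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "PLS": PLSRegression(n_components=20),
    "EN": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=5000),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "GBM": GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    r = np.corrcoef(pred, y_te)[0, 1]  # predictive ability as Pearson correlation
    print(f"{name}: r = {r:.2f}")
```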
The performance of the algorithms determines their application in practice; the relatively higher predictive ability observed for RF and GBM indicates the potential usefulness of these models for FTIR-based predictions of different phenotypic traits. Further testing of these machine learning algorithms using a larger data set, as well as more comprehensive on-farm cow data (i.e., data across lactation), is nevertheless required.

      Predictive Performance of Alternative Machine Learning Techniques

In FTIR-based phenotype prediction, the usefulness of supervised machine learning techniques depends on the extent to which the methods accurately encode prior knowledge about the predictors and the target trait. Comparing the predictive ability of PLS regression with that of machine learning techniques applying different degrees of noise reduction, we found that regression tree-based models (GBM and RF) significantly improved predictive ability (Table 1 and Figure 4). The statistical methods evaluated differ in how they select predictor variables, leading to different levels of noise reduction in the training data set. Some studies have shown that machine learning techniques are powerful and flexible and make more accurate predictions from milk infrared spectra than PLS regression (Dórea et al., 2018; Pralle et al., 2018). However, other studies using Bayesian variable selection or shrinkage methods found no significant improvements (Ferragina et al., 2015; Bonfatti et al., 2017; El Jabri et al., 2019).
Statistical analysis of model fitting using milk FTIR spectra showed that the machine learning techniques, mainly GBM, produced more accurate predictions than PLS regression for the different traits evaluated (Table 1). Overall, considering the different cross-validation scenarios, GBM increased predictive ability by 10% for BCS, 6.3% for BHB, and 6.1% for κ-CN (Figure 4, Figure 5) and reduced RMSE by 17% for BCS, 9% for BHB, and 16% for κ-CN (Table 1). These improvements in predictive ability can be explained by the fact that this method selects the wavelengths that are more informative and explain a large amount of the phenotypic variation in the target trait. The predictive equations for PLS and the machine learning techniques were developed from whole-milk spectra without removing the water absorption regions, which gives an advantage to the machine learning techniques, as these models can select informative wavelengths and mitigate the effects of noninformative regions (Hastie et al., 2009; Natekin and Knoll, 2013). The main idea behind the use of machine learning techniques is that a shrinkage or variable selection procedure can be performed to identify a subset of predictors, providing tremendous flexibility to adapt to complex associations between predictors and the target phenotype (Friedman, 2002; Zou and Hastie, 2005). Our results show that tree-based variable selection using the RF and GBM methods was the most parsimonious approach for making accurate predictions. According to Gianola (2013) and Hapfelmeier and Ulm (2013), models able to remove noninformative or redundant predictors reduce the uncertainty of predictions and hence increase model predictive ability.
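As an illustration of tree-based variable selection, the sketch below ranks spectral points by the impurity-based importances of a fitted GBM; the data are synthetic and the approach is a generic example rather than the study's exact procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic spectra in which only 2 of 200 points are truly informative
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 200))
y = 2.0 * X[:, 50] - 1.5 * X[:, 120] + rng.normal(scale=0.3, size=300)

gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=2)
gbm.fit(X, y)

# Near-zero importances flag noninformative regions; large values flag useful ones
top = np.argsort(gbm.feature_importances_)[::-1][:10]
print("Most informative spectral points:", top)
```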
Tree-based ensembles have advantages, such as being robust algorithms for dealing with a large number of covariates. Agjee et al. (2018) investigated the effects of simulated spectral noise on RF model performance and found a decrease in classification accuracy when unfavorable levels of noise were added to the data. According to Ross and Kelleher (2013), the predictive ability of the RF model decreased by 8% when noninformative regions were included in the predictor database. In this context, the GBM approach was able to deal with complex scenarios and achieved higher prediction accuracies using whole-milk FTIR spectra.
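A toy simulation in the spirit of these noise studies (not a reproduction of their protocols; all values are synthetic) shows how adding increasing amounts of spectral noise erodes RF predictive ability:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 200))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.3, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

for noise_sd in (0.0, 0.5, 1.0):
    # Corrupt the spectra with increasing Gaussian noise
    Xn_tr = X_tr + rng.normal(scale=noise_sd, size=X_tr.shape)
    Xn_te = X_te + rng.normal(scale=noise_sd, size=X_te.shape)
    rf = RandomForestRegressor(n_estimators=300, random_state=3).fit(Xn_tr, y_tr)
    r = np.corrcoef(rf.predict(Xn_te), y_te)[0, 1]
    print(f"noise SD {noise_sd}: r = {r:.2f}")
```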

      Predictive Ability of Complex Traits Using FTIR Spectroscopy

The moderate prediction accuracies for BCS using samples-out random (r = 0.57–0.63) and herd/date-out (r = 0.53–0.58) cross-validation suggest that FTIR data could be useful for predicting changes in body reserves (Table 1). These moderate accuracies highlight the potential of FTIR spectral data to capture the biological association between milk compounds and energy balance, key components of the cow's energy status. Some studies have shown that milk infrared spectra can be used to assess BCS, BW, and body energy status (McParland et al., 2011, 2012, 2014; Wallén et al., 2018). The moderate accuracy achieved by GBM in predicting BCS demonstrates the usefulness of this technique for predicting body energy status-related traits in milk recording schemes. Furthermore, this predictive ability could be increased by including the major milk components, which are routinely assessed during milk recording (McParland et al., 2011). The GBM model might therefore be used to effectively and accurately assess energy balance-related traits in cows, with the aim of improving management strategies and mitigating the effects of negative energy balance during milk production.
The performance of the calibration equations using milk FTIR spectra showed that the machine learning techniques accurately predicted blood BHB and κ-CN expressed as % N (Figure 4, Figure 5), which can be explained by the fact that these methods select the most informative spectral regions to explain a large amount of the phenotypic variation. As robust algorithms, the GBM and RF models have the advantage of being less sensitive to noisy spectral regions (Natekin and Knoll, 2013).
The prediction accuracy for blood BHB ranged from 0.68 to 0.80, higher than that obtained with the PLS model by Belay et al. (2017; 0.46 to 0.66) and Bonfatti et al. (2019; around 0.72). Luke et al. (2019) found the mid-infrared spectroscopy prediction model for blood BHB to be accurate (0.69 ≤ r ≤ 0.77) and concluded that milk spectra could be useful for evaluating ketosis risk in dairy cattle. Caldeira et al. (2020) evaluated the ability of milk FTIR spectra to predict blood BHB in several breeds (Holstein, Brown Swiss, and Swiss Fleckvieh) and observed a lower level of accuracy (r = 0.37) than in the present study. Pralle et al. (2018) instead found that an artificial neural network increased predictive ability relative to PLS, with values of 0.65 for the artificial neural network and 0.58 for PLS. Grelet et al. (2019) obtained a higher predictive ability with PLS (0.83) than the current study (0.76 for samples-out random and 0.68 for herd/date-out), having excluded noisy FTIR wavelengths and included parity and daily milk yield in the prediction models. In this regard, the inclusion of on-farm information, such as herd effects and milk production, can be a key factor in improving the predictive ability of machine learning techniques, but it is not always feasible for immediate prediction during milk recording.
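If such on-farm covariates were available at prediction time, they could in principle simply be appended to the spectral matrix before model fitting; the sketch below (hypothetical variable names and dimensions, synthetic values) shows this augmentation step.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
spectra = rng.normal(size=(300, 200))               # FTIR spectra (illustrative size)
parity = rng.integers(1, 6, size=(300, 1))          # on-farm covariate
milk_yield = rng.normal(30.0, 5.0, size=(300, 1))   # kg/d, on-farm covariate
y = rng.normal(size=300)                            # placeholder phenotype (e.g., blood BHB)

# Append the covariates as extra columns alongside the spectral predictors
X_aug = np.hstack([spectra, parity, milk_yield])
model = GradientBoostingRegressor(random_state=4).fit(X_aug, y)
```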
The predictive ability for the milk protein fraction κ-CN (% N) across the 2 cross-validation scenarios ranged from 0.72 for PLS to 0.81 for GBM, a significant difference in prediction accuracy (Table 1). The superior performance achieved using machine learning techniques confirms these ensemble methods as powerful tools for learning the complex relationships between milk spectra and the target phenotype (κ-CN) and for capturing the biological relationships among them. Several studies obtained a lower predictive ability for κ-CN than reported in the current study, with values ranging from 0.50 to 0.67 (Bonfatti et al., 2011; Rutten et al., 2011; McDermott et al., 2016). Ferragina et al. (2015) and Bonfatti et al. (2017) compared PLS and Bayesian models and found the Bayesian models to have slightly better predictive ability. The moderate predictive ability of FTIR for κ-CN expressed as % N could be related to the fact that this casein, expressed in this way, is a fraction of the milk protein content and hence depends on the ratio between the content of the individual protein and the total milk protein content.

      Predictive Ability Across Cross-Validation Strategies

Cross-validation is considered the gold-standard technique for evaluating model prediction performance in dairy cattle. In this study, model predictive ability differed between the samples-out random and herd/date-out cross-validation scenarios. This is because with samples-out random cross-validation the training and validation data sets contain samples from all the herds, whereas with herd/date-out cross-validation the data are split into training and validation sets that are independent of each other, which increases the variability in predictive accuracy (higher standard deviation) and the model predictive error. Nevertheless, the herd/date-out scenario reflects the real conditions of milk recording and avoids apparent overfitting of the prediction equations. A similar pattern for FTIR predictive ability using samples-out random and herd/date-out (leaving 20% of herds out) cross-validation was observed by Wang and Bovenhuis (2019), who concluded that samples-out random cross-validation results in over-optimistic estimates of predictive ability. Studies in other fields of science have also reported reductions in the predictive ability of models when dependence between training and validation sets is reduced (Roberts et al., 2017; Meyer et al., 2018). The main idea behind leaving 20% of herds/dates out in cross-validation is to ensure that the training and validation sets are independent of each other. It is important to note that the herd/date-out cross-validation scenario, compared with samples-out random cross-validation, reduced prediction performance in the test data by around 5% to 8% for BCS, 8% to 10.5% for blood BHB, and 4% to 6.5% for κ-CN. However, reducing the dependence between training and validation sets is fundamental for an accurate assessment of prediction performance and for implementing FTIR prediction at the population level. In this context, the tree-based models (RF and GBM) improved model robustness in both cross-validation scenarios, resulting in greater model precision and accuracy.

      CONCLUSIONS

In this study, we evaluated the prediction of different phenotypes using FTIR spectra and different statistical methods, including standard PLS regression and 3 machine learning methods, namely GBM, RF, and EN. The machine learning techniques, in particular GBM, consistently improved predictive ability for BCS, blood BHB, and κ-CN, and tended to outperform the other models. The superior ability of GBM to predict difficult-to-measure traits shows that this approach is a promising technique for improving the robustness and accuracy of predictions, potentially exploitable both for selective breeding and by the dairy cattle industry, provided that spectra are routinely acquired within milk recording schemes and made available to breeders. However, an independent validation scenario, for instance one including more cows, herds sampled in different years, and different cattle breeds, would be desirable for an accurate evaluation of the calibration equations.

      ACKNOWLEDGMENTS

      We thank the Breeders Federation of Trento Province (FPA, Trento, Italy) for providing milk spectra. The authors have not stated any conflicts of interest.

      REFERENCES

Abdollahi-Arpanahi, R., D. Gianola, and F. Peñagaricano. 2020. Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes. Genet. Sel. Evol. 52:12.
Agjee, N. H., O. Mutanga, K. Peerbhay, and R. Ismail. 2018. The impact of simulated spectral noise on random forest and oblique random forest classification performance. J. Spectrosc. 2018:1-8.
Amalfitano, N., G. Stocco, A. Maurmayr, S. Pegolo, A. Cecchinato, and G. Bittante. 2020. Quantitative and qualitative detailed milk protein profiles of 6 cattle breeds: Sources of variation and contribution of protein genetic variants. J. Dairy Sci. 103:11190-11208.
Baum, A., P. Hansen, L. Nørgaard, J. Sørensen, and J. Mikkelsen. 2016. Rapid quantification of casein in skim milk using Fourier transform infrared spectroscopy, enzymatic perturbation, and multiway partial least squares regression: Monitoring chymosin at work. J. Dairy Sci. 99:6071-6079.
Belay, T. K., B. S. Dagnachew, Z. M. Kowalski, and T. Ådnøy. 2017. An attempt at predicting blood β-hydroxybutyrate from Fourier-transform mid-infrared spectra of milk using multivariate mixed models in Polish dairy cattle. J. Dairy Sci. 100:6312-6326.
Bittante, G., and A. Cecchinato. 2013. Genetic analysis of the Fourier-transform infrared spectra of bovine milk with emphasis on individual wavelengths related to specific chemical bonds. J. Dairy Sci. 96:5991-6006.
Bittante, G., C. Cipolat-Gotet, and A. Cecchinato. 2020. Genetic parameters of different FTIR-enabled phenotyping tools derived from milk fatty acid profile for reducing enteric methane emissions in dairy cattle. Animals (Basel) 10:1654.
Bonfatti, V., G. Di Martino, and P. Carnier. 2011. Effectiveness of mid-infrared spectroscopy for the prediction of detailed protein composition and contents of protein genetic variants of individual milk of Simmental cows. J. Dairy Sci. 94:5776-5785.
Bonfatti, V., F. Tiezzi, F. Miglior, and P. Carnier. 2017. Comparison of Bayesian regression models and partial least squares regression for the development of infrared prediction equations. J. Dairy Sci. 100:7306-7319.
Bonfatti, V., S.-A. Turner, B. Kuhn-Sherlock, T. D. W. Luke, P. N. Ho, C. V. C. Phyn, and J. E. Pryce. 2019. Prediction of blood β-hydroxybutyrate content and occurrence of hyperketonemia in early-lactation, pasture-grazed dairy cows using milk infrared spectra. J. Dairy Sci. 102:6466-6476.
Breiman, L. 2001. Random forests. Mach. Learn. 45:5-32.
Brieuc, M. S. O., C. D. Waters, D. P. Drinan, and K. A. Naish. 2018. A practical introduction to Random Forest for genetic association studies in ecology and evolution. Mol. Ecol. Resour. 18:755-766.
Caldeira, M. O., D. Dan, A. Neuheuser, R. Stürmlin, C. Weber, D. L. Glauser, M. Stierli, U. Schuler, J. Moll, S. Wegmann, R. M. Bruckmaier, and J. J. Gross. 2020. Opportunities and limitations of milk mid-infrared spectra-based estimation of acetone and β-hydroxybutyrate for the prediction of metabolic stress and ketosis in dairy cows. J. Dairy Res. 87:196-203.
Cecchinato, A., T. Bobbo, P. L. Ruegg, L. Gallo, G. Bittante, and S. Pegolo. 2018. Genetic variation in serum protein pattern and blood β-hydroxybutyrate and their relationships with udder health traits, protein profile, and cheese-making properties in Holstein cows. J. Dairy Sci. 101:11108-11119.
Cecchinato, A., M. de Marchi, L. Gallo, G. Bittante, and P. Carnier. 2009. Mid-infrared spectroscopy predictions as indicator traits in breeding programs for enhanced coagulation properties of milk. J. Dairy Sci. 92:5304-5313.
Cecchinato, A., H. Toledo-Alvarado, S. Pegolo, A. Rossoni, E. Santus, C. Maltecca, G. Bittante, and F. Tiezzi. 2020. Integration of wet-lab measures, milk infrared spectra, and genomics to improve difficult-to-measure traits in dairy cattle populations. Front. Genet. 11:563393.
Denholm, S. J., W. Brand, A. Mitchell, A. Wells, T. Krzyzelewski, S. Smith, E. Wall, and M. Coffey. 2020. Predicting bovine tuberculosis status of dairy cows from mid-infrared spectral data of milk using deep learning. J. Dairy Sci. 103:9355-9367.
Dórea, J. R. R., G. J. M. Rosa, K. A. Weld, and L. E. Armentano. 2018. Mining data from milk infrared spectroscopy to improve feed intake predictions in lactating dairy cows. J. Dairy Sci. 101:5878-5889.
Dunn, O. J., and V. Clark. 1971. Comparison of tests of the equality of dependent correlation coefficients. J. Am. Stat. Assoc. 66:904-908.
Edmonson, A. J., I. J. Lean, L. D. Weaver, T. Farver, and G. Webster. 1989. A body condition scoring chart for Holstein dairy cows. J. Dairy Sci. 72:68-78.
El Jabri, M., M. P. Sanchez, P. Trossat, C. Laithier, V. Wolf, P. Grosperrin, E. Beuvier, O. Rolet-Répécaud, S. Gavoye, Y. Gaüzère, O. Belysheva, E. Notz, D. Boichard, and A. Delacroix-Buchet. 2019. Comparison of Bayesian and partial least squares regression methods for mid-infrared prediction of cheese-making properties in Montbéliarde cows. J. Dairy Sci. 102:6943-6958.
Eraslan, G., Ž. Avsec, J. Gagneur, and F. J. Theis. 2019. Deep learning: New computational modelling techniques for genomics. Nat. Rev. Genet. 20:389-403.
Ferragina, A., C. Cipolat-Gotet, A. Cecchinato, M. Pazzola, M. L. Dettori, G. M. Vacca, and G. Bittante. 2017. Prediction and repeatability of milk coagulation properties and curd-firming modeling parameters of ovine milk using Fourier-transform infrared spectroscopy and Bayesian models. J. Dairy Sci. 100:3526-3538.
Ferragina, A., G. de los Campos, A. I. Vazquez, A. Cecchinato, and G. Bittante. 2015. Bayesian regression models outperform partial least squares methods for predicting milk components and technological properties using infrared spectral data. J. Dairy Sci. 98:8133-8151.
Friedman, J., T. Hastie, and R. Tibshirani. 2000. Additive logistic regression: A statistical view of boosting. Ann. Stat. 28:337-407.
Friedman, J. H. 2002. Stochastic gradient boosting. Comput. Stat. Data Anal. 38:367-378.
Gianola, D. 2013. Priors in whole-genome regression: The Bayesian alphabet returns. Genetics 194:573-596.
Goldstein, B. A., A. E. Hubbard, A. Cutler, and L. F. Barcellos. 2010. An application of Random Forests to a genome-wide association dataset: Methodological considerations and new findings. BMC Genet. 11:49.
Goodfellow, I., Y. Bengio, A. Courville, and Y. Bengio. 2016. Deep Learning (Vol. 1, No. 2). MIT Press.
Grelet, C., C. Bastin, M. Gelé, J. B. Davière, M. Johan, A. Werner, R. Reding, J. A. Fernandez Pierna, F. G. Colinet, P. Dardenne, N. Gengler, H. Soyeurt, and F. Dehareng. 2016. Development of Fourier transform mid-infrared calibrations to predict acetone, β-hydroxybutyrate, and citrate contents in bovine milk through a European dairy network. J. Dairy Sci. 99:4816-4825.
Grelet, C., A. Vanlierde, M. Hostens, L. Foldager, M. Salavati, K. L. Ingvartsen, M. Crowe, M. T. Sorensen, E. Froidmont, C. P. Ferris, C. Marchitelli, F. Becker, T. Larsen, F. Carter, and F. Dehareng. 2019. Potential of milk mid-IR spectra to predict metabolic status of cows through blood components and an innovative clustering approach. Animal 13:649-658.
Hapfelmeier, A., and K. Ulm. 2013. A new variable selection approach using Random Forests. Comput. Stat. Data Anal. 60:50-69.
Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning. Springer Series in Statistics. Springer.
Hempstalk, K., S. McParland, and D. P. Berry. 2015. Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows. J. Dairy Sci. 98:5262-5273.
Høy, M., K. Steen, and H. Martens. 1998. Review of partial least squares regression prediction error in Unscrambler. Chemom. Intell. Lab. Syst. 44:123-133.
Lopez-Cruz, M., E. Olson, G. Rovere, J. Crossa, S. Dreisigacker, S. Mondal, R. Singh, and G. de los Campos. 2020. Regularized selection indices for breeding value prediction using hyper-spectral image data. Sci. Rep. 10:8195.
Luke, T. D. W., S. Rochfort, W. J. Wales, V. Bonfatti, L. Marett, and J. E. Pryce. 2019. Metabolic profiling of early-lactation dairy cows using milk mid-infrared spectra. J. Dairy Sci. 102:1747-1760.
Martens, H. 2001. Reliable and relevant modelling of real world data: A personal account of the development of PLS Regression. Chemom. Intell. Lab. Syst. 58:85-95.
Maurice-Van Eijndhoven, M. H. T., H. Soyeurt, F. Dehareng, and M. P. L. Calus. 2013. Validation of fatty acid predictions in milk using mid-infrared spectrometry across cattle breeds. Animal 7:348-354.
McDermott, A., G. Visentin, M. De Marchi, D. P. Berry, M. A. Fenelon, P. M. O'Connor, O. A. Kenny, and S. McParland. 2016. Prediction of individual milk proteins including free amino acids in bovine milk using mid-infrared spectroscopy and their correlations with milk processing characteristics. J. Dairy Sci. 99:3171-3182.
McParland, S., G. Banos, B. McCarthy, E. Lewis, M. P. Coffey, B. O'Neill, M. O'Donovan, E. Wall, and D. P. Berry. 2012. Validation of mid-infrared spectrometry in milk for predicting body energy status in Holstein-Friesian cows. J. Dairy Sci. 95:7225-7235.
McParland, S., G. Banos, E. Wall, M. P. Coffey, H. Soyeurt, R. F. Veerkamp, and D. P. Berry. 2011. The use of mid-infrared spectrometry to predict body energy status of Holstein cows. J. Dairy Sci. 94:3651-3661.
McParland, S., E. Lewis, E. Kennedy, S. Moore, B. McCarthy, M. O'Donovan, S. Butler, J. Pryce, and D. Berry. 2014. Mid-infrared spectrometry of milk as a predictor of energy intake and efficiency in lactating dairy cows. J. Dairy Sci. 97:5863-5871.
Mendez, K. M., S. N. Reinke, and D. I. Broadhurst. 2019. A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics 15:150.
Mevik, B.-H., and R. Wehrens. 2007. The pls package: Principal component and partial least squares regression in R. J. Stat. Softw. 18.
Meyer, H., C. Reudenbach, T. Hengl, M. Katurji, and T. Nauss. 2018. Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environ. Model. Softw. 101:1-9.
Morellos, A., X.-E. Pantazi, D. Moshou, T. Alexandridis, R. Whetton, G. Tziotzios, J. Wiebensohn, R. Bill, and A. M. Mouazen. 2016. Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosyst. Eng. 152:104-116.
Morota, G., R. V. Ventura, F. F. Silva, M. Koyama, and S. C. Fernando. 2018. Big Data Analytics and Precision Animal Agriculture Symposium: Machine learning and data mining advance predictive big data analysis in precision animal agriculture. J. Anim. Sci. 96:1540-1550.
Natekin, A., and A. Knoll. 2013. Gradient boosting machines, a tutorial. Front. Neurorobot. 7:21.
Neto, H. A., W. L. F. Tavares, D. C. S. Z. Ribeiro, R. C. O. Alves, L. M. Fonseca, and S. V. A. Campos. 2019. On the utilization of deep and ensemble learning to detect milk adulteration. BioData Min. 12:13.
Pralle, R. S., K. W. Weigel, and H. M. White. 2018. Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network. J. Dairy Sci. 101:4378-4387.
Roberts, D. R., V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, D. I. Warton, B. A. Wintle, F. Hartig, and C. F. Dormann. 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40:913-929.
Ross, R., and J. Kelleher. 2013. A Comparative Study of the Effect of Sensor Noise on Activity Recognition Models. Springer Verlag.
Rutten, M. J., H. Bovenhuis, J. M. L. Heck, and J. A. M. van Arendonk. 2011. Predicting bovine milk protein composition based on Fourier transform infrared spectra. J. Dairy Sci. 94:5683-5690.
Shah, N. K., and P. J. Gemperline. 1989. A program for calculating Mahalanobis distances using principal component analysis. Trends Analyt. Chem. 8:357-361.
Soyeurt, H., F. Dehareng, N. Gengler, S. McParland, E. Wall, D. P. Berry, M. Coffey, and P. Dardenne. 2011. Mid-infrared prediction of bovine milk fatty acids across multiple breeds, production systems, and countries. J. Dairy Sci. 94:1657-1667.
Soyeurt, H., C. Grelet, S. McParland, M. Calmels, M. Coffey, A. Tedde, P. Delhez, F. Dehareng, and N. Gengler. 2020. A comparison of 4 different machine learning algorithms to predict lactoferrin content in bovine milk from mid-infrared spectra. J. Dairy Sci. 103:11585-11596.
Stocco, G., C. Cipolat-Gotet, T. Bobbo, A. Cecchinato, and G. Bittante. 2017. Breed of cow and herd productivity affect milk composition and modeling of coagulation, curd firming, and syneresis. J. Dairy Sci. 100:129-145.
Toledo-Alvarado, H., A. I. Vazquez, G. de los Campos, R. J. Tempelman, G. Bittante, and A. Cecchinato. 2018. Diagnosing pregnancy status using infrared spectra and milk composition in dairy cows. J. Dairy Sci. 101:2496-2505.
Tomaschek, F., P. Hendrix, and R. H. Baayen. 2018. Strategies for addressing collinearity in multivariate linguistic data. J. Phonetics 71:249-267.
Vásquez, N., C. Magán, J. Oblitas, T. Chuquizuta, H. Avila-George, and W. Castro. 2018. Comparison between artificial neural network and partial least squares regression models for hardness modeling during the ripening process of Swiss-type cheese using spectral profiles. J. Food Eng. 219:8-15.
Wallén, S. E., E. Prestløkken, T. H. E. Meuwissen, S. McParland, and D. P. Berry. 2018. Milk mid-infrared spectral data as a tool to predict feed intake in lactating Norwegian Red dairy cows. J. Dairy Sci. 101:6232-6243.
Wang, Q., and H. Bovenhuis. 2019. Validation strategy can result in an overoptimistic view of the ability of milk infrared spectra to predict methane emission of dairy cattle. J. Dairy Sci. 102:6288-6295.
Xu, W., A. T. M. van Knegsel, J. J. M. Vervoort, R. M. Bruckmaier, R. J. van Hoeij, B. Kemp, and E. Saccenti. 2019. Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms. J. Dairy Sci. 102:10186-10201.
Zou, H., and T. Hastie. 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67:301-320.