Predicting dry matter intake in mid-lactation Holstein cows using point-in-time data streams available on dairy farms

Quantifying dry matter intake (DMI) in lactating dairy cows is important for determining feed efficiency; however, there are no methods for economically quantifying individual cow DMI on dairy farms where cows are group-fed. Attempts have been made to model DMI using cow factors, milk production, milk infrared spectra, and behavioral sensors with reasonable success. Other data streams are available on the farm that may contribute to DMI predictions. In this study, our objective was to model DMI with multiple linear regression using data from a single point-in-time that can easily be accessed on-farm. Candidate predictor variables included cow descriptors, milk yield and composition, milk fatty acid profile, and production and efficiency predicting transmitting abilities (PTA). Observations of DMI were obtained from 350 cows across 6 cohorts using individual feed bunks. The cow to bunk ratio was 2:1, with an overall bunk occupation rate of 32% throughout the day. The following models were developed sequentially with milk data obtained from a single morning milking and other data from the same day: model B (production, metabolic body weight, body condition score, lactation category, and week of lactation), model BC [model B + fatty acid (FA) content], model BY (model B + FA yield), model BPE (model B + production and efficiency PTA), model BYP (model BY + production PTA), model BYE (model BY + efficiency PTA), and model BYPE (model BY + production and efficiency PTA). Outcome variables predicted in these models were the DMI on the previous day or current day relative to the morning milk sample. The predictions for DMI on the previous day outperformed current day DMI in every model for which they were both determined. Addition of milk FA and PTA as candidate predictor variable types to the models resulted in enhanced predictive ability, with incremental enhancements when combined. The most robust model (BYPE) included cow descriptors, protein and FA yields, and PTA for milk and residual feed intake. Model BYPE described 21 to 32% more of the variation in DMI (based on concordance correlation coefficient) than when other common DMI models were applied to the same data set. Overall, reasonable performance of models including single point-in-time cow descriptors, milk and FA production, and production and efficiency PTA commonly available to dairy farmers through dairy herd improvement programs offer an opportunity for on-farm prediction of DMI, yet further improvement may be possible.


ABSTRACT
Quantifying dry matter intake (DMI) in lactating dairy cows is important for determining feed efficiency; however, there are no methods for economically quantifying individual cow DMI on dairy farms where cows are group-fed. Attempts have been made to model DMI using cow factors, milk production, milk infrared spectra, and behavioral sensors with reasonable success. Other data streams are available on the farm that may contribute to DMI predictions. In this study, our objective was to model DMI with multiple linear regression using data from a single point-in-time that can easily be accessed on-farm. Candidate predictor variables included cow descriptors, milk yield and composition, milk fatty acid profile, and production and efficiency predicting transmitting abilities (PTA). Observations of DMI were obtained from 350 cows across 6 cohorts using individual feed bunks. The cow to bunk ratio was 2:1, with an overall bunk occupation rate of 32% throughout the day. The following models were developed sequentially with milk data obtained from a single morning milking and other data from the same day: model B (production, metabolic body weight, body condition score, lactation category, and week of lactation), model BC [model B + fatty acid (FA) content], model BY (model B + FA yield), model BPE (model B + production and efficiency PTA), model BYP (model BY + production PTA), model BYE (model BY + efficiency PTA), and model BYPE (model BY + production and efficiency PTA). Outcome variables predicted in these models were the DMI on the previous day or current day relative to the morning milk sample. The predictions for DMI on the previous day outperformed current day DMI in every model for which they were both determined. Addition of milk FA and PTA as candidate predictor variable types to the models resulted in enhanced predictive ability, with incremental enhancements when combined. The most robust model (BYPE) included cow descriptors, protein and FA yields, and PTA for milk and residual feed intake. Model BYPE described 21 to 32% more of the variation in DMI (based on concordance correlation coefficient) than when other common DMI models were applied to the same data set. Overall, reasonable performance of models including single point-in-time cow descriptors, milk and FA production, and production and efficiency PTA commonly available to dairy farm-

INTRODUCTION
Feed is the largest input expense on dairy farms (USDA-ERS, 2021), and seeking methods to reduce feed cost through greater feed efficiency are warranted. There are many ways to calculate or assess feed efficiency for the purpose of farm-level evaluation (VandeHaar et al., 2016); however, identifying feed efficiency at the individual cow-level is often limited to research farms, where accurate recording of daily feed intake is conducted via electronic or manual methods, often requiring individual feeding stations. These laborintensive and costly feeding systems are not practical on farms employing group feeding strategies, which represents most cows in the United States. Therefore, a need exists to develop accurate methods of cow-level DMI prediction using readily available on-farm data streams.
Recent modeling work has demonstrated marginal to good ability to predict individual cow DMI from data readily available on dairy farms (Shetty et al., 2017;Dórea et al., 2018;Martin et al., 2021a). These models have been generated primarily using cow descriptive variables, milk production, milk mid-infrared spectra, and behavioral sensors. A recent publication added prediction of limited milk fatty acids (FA) from milk mid-infrared spectra as candidate predictor variables (Tedde et al., 2021). In many cases, data used to generate DMI prediction models were developed from data averaged over a study period (Shetty et al., 2017;Dórea et al., 2018;Martin et al., 2021a). Although reducing the number of available samples may  or may not (McParland et al., 2014) reduce prediction accuracy, collecting vast quantities of additional data are likely not feasible on dairy farms due to expense and labor. Farms that are enrolled in DHI programs have access to routine monthly milk sample analysis, which may include detailed reports of milk FA profile. Additional performance indicators, such as previous lactation performance and PTA, provide potential data for model development. Specifically, the recently released Feed Saved PTA (CDCB, 2020) provides the first estimate of the genetic predisposition for feed intake and feed efficiency of US Holstein cattle. Milk FA have been explored minimally (Tedde et al., 2021), whereas previous lactation performance and cow PTA have not yet been explored as potential predictor variables to enhance DMI predictions.
Efforts to model DMI have relied on multiple linear regression (Wallén et al., 2018;Martin et al., 2021a) and partial least squares regression (McParland et al., 2014;Shetty et al., 2017;Martin et al., 2021a). More recently, the machine-learning techniques of artificial neural networks Martin et al., 2021a;Tedde et al., 2021) and stacked ensemble (Martin et al., 2021a) have been tested. Artificial neural networks have an advantage in their ability to determine more intricate relationships between predictor and response variables, which could prove useful in situations where DMI cannot be modeled as a linear function. Nonetheless, similar results have been achieved in modeling DMI between artificial neural networks and linear regression approaches Martin et al., 2021a), and linear regression models are easier to implement computationally in real-world settings.
In this paper, we explore the possibility of multiple linear regression prediction of DMI using single pointin-time data collection from 1 milking (milk and component yields, FA profile), cow descriptors (lactation category, week of lactation, BW, and so on), previous lactation performance, and PTA for production and efficiency as candidate predictor variables. This research aimed to develop a suite of models for applied use on dairy farms to predict DMI, where specific models can be selected depending on the data streams available on each farm. Using data averaged over a period of time, other DMI prediction models have explained 70 to 80% of the variation in DMI (R 2 ), including up to 78% for cross-validation data (Shetty et al., 2017;Dórea et al., 2018;Martin et al., 2021a). Our objective was to develop a DMI prediction model using data from only a single time point, but including novel predictors (i.e., milk FA and PTA) to achieve model performance as previously realized.

MATERIALS AND METHODS
Animal handling and sampling procedures were approved by the Animal Care and Use Committees of the University of Wisconsin-Madison College of Agricultural and Life Sciences and Iowa State University for the respective data sets generated at each institution.
Mid-lactation Holstein cows [n = 315; 111 ± 23 DIM (± SD) at sampling; 2.4 ± 1.3 lactations] in 6 cohorts spanning from July 2018 to December 2019 were housed in sand-bedded freestalls at the University of Wisconsin-Madison Emmons Blaine Dairy Cattle Research Center (Arlington, WI) as part of a national feed efficiency project. In total, there were 350 individual cow observations, given that some cows were enrolled in multiple cohorts. No treatments of any kind were applied to the animals. Cows were provided ad libitum access to a common diet within each cohort (Table 1), formulated to meet or exceed requirements, and milked twice daily, as described previously (Martin et al., 2021a,b). Briefly, the diet was mixed once daily, and feed was administered over 2 separate feedings into roughage intake control electronic feeding bins (Hokofarm Group), with target refusals of 10%. Cows had unlimited access to all feeding bins, with an approximate ratio of cows to bins of 2:1. Average feed bunk occupancy was 32.1% across all cohorts on the day of DMI observation. A bunk occupancy rate ≥90% was only recorded for 33 min, corresponding to the time of feeding. Daily feed intake was monitored electronically throughout the cohort data collection period. Milk weight was recorded electronically for each cow at each milking, and milk samples were obtained weekly during 4 consecutive milkings. Samples were sent to a commercial laboratory (AgSource, Menominee, WI) for Fourier-transformed infrared spectrometric analysis of milk fat, true protein, lactose, and MUN using the FOSS MilkoScan FT6000 (FOSS Analytical). Somatic cell count was determined using a Fossomatic FC (FOSS Analytical). Body weight was measured on 3 consecutive days in the first, fourth, and seventh weeks of the data collection period, and BCS was recorded by 3 trained individuals on the same weeks, using a scale from 1 to 5, where 1 = thin and 5 = fat, with increments of 0.25 (Wildman et al., 1982). Previous lactation 305 mature-equivalent yield Brown et al.: POINT-IN-TIME DATA STREAMS TO PREDICT DMI of milk, fat, and protein, in addition to PTA for milk, fat, protein, and fluid merit, were obtained in May 2021 (AgSource). The genomic PTA for Feed Saved, residual feed intake (RFI), and BW composite were also obtained in May 2021 from the Council on Dairy Cattle Breeding, with data corresponding to the April 2021 official evaluations (CDCB, 2021).

Data Selection, Calculations, and Cleaning
For the purpose of this study, to model DMI on a single day using readily available data streams available on dairy farms, milk yield and composition data were used from 1 morning milking at the mid-point of the data collection period (approximately 23 d after initiation of data collection) to mimic monthly DHI testing procedures. Body weight and BCS data were also available for the same sample day. The dependent variable was DMI, which was modeled for the calendar day in which the milk sample was obtained (current day DMI; CDI), in addition to the day prior (previous day DMI; PDI). The candidate predictor variables (independent variables) and the models in which they were tested are outlined in Table 2, and they represent variables that can readily be obtained on-farm through routine management practices and DHI data streams.  (ICAR, 2017); therefore, the content of protein and lactose in the morning milking were used to estimate daily protein and lactose yields.
One cow was excluded from the data set due to treatment for respiratory disease and concomitant low levels of DMI and milk production. All cows in the final data set had complete records for each candidate predictor variable.

Model Fitting
Multiple linear regression was performed using a stepwise approach in JMP Pro, version 15 (SAS Institute Inc.). Both forward and backward stepwise approaches were fitted automatically for the dependent variable , and the model with the lowest Akaike information criterion was chosen for further evaluation. Next, the variance inflation factor was determined for each independent variable remaining in the model to assess collinearity, as previously described (Chandler et al., 2018;Pralle et al., 2018;Allen et al., 2019). Briefly, variables with variance inflation factor ≥5 were removed from the model one at a time, and the model was refitted until all variance inflation factors were <5. Removal of predictor variables can alter the significance of other predictor variables; therefore, resulting nonsignificant variables were removed, and freedom of multicollinearity in the final model was confirmed a final time. Models in which co-derived variables were tested within the same model (i.e., protein content and yield) were additionally screened before model fitting to ensure absence of collinearity. The correlation was −0.04 for protein content and yield, and 0.48 for fat content and yield (model B only).
Several groups of models were tested and are described further in Table 3. Models were used to predict PDI and CDI relative to the morning milking in which the milk sample was obtained. The prediction of CDI was eventually terminated for models with more candidate predictor variables due to considerably poorer performance compared with PDI in the initial models tested. Model B (basic) evaluated the prediction of PDI and CDI relative to the morning milking in which the milk sample was obtained using milk macro-component content and yield, lactation category (primiparous vs. multiparous), week of lactation, and MBW and BCS as candidate predictor variables. Model BC (basic + milk FA content; BC) evaluated the addition of content of milk FA (g/100 g of milk) to model B, as candidate predictor variables to predict both PDI and CDI relative to the morning milking, from which the sample was obtained. Model BY (basic + milk FA yield) evaluated the addition of milk FA yield to model B as candidate predictor variables to predict both PDI and CDI. Model BPE (basic + production and efficiency PTA) added to model B for predicting only PDI. Model BYP (basic + milk FA yield + production PTA) built upon BY for predicting PDI with the addition of PTA for milk, protein, and fat. Model BYE (basic + milk FA yield + efficiency PTA) added to BY for predicting PDI, with the addition of PTA for RFI and BW composite as candidate predictor variables to the model. Model BYPE (basic + milk FA yield + production PTA +   Table 4. The random effect of cohort was tested in each model, and subsequently removed when P > 0.10. The final models were evaluated using 5-fold crossvalidation R 2 , the concordance correlation coefficient (CCC; Lin, 1989), root mean squared error of prediction (RMSEP), and slope and mean biases as a percent of the mean square error of the prediction (Bibby and Toutenburg, 1977) using the Model Evaluation System (Mathematical Nutrition Models). To assess relative performance between models, the R 2 , CCC, and RM-SEP for each of the 5-fold cross-validation testing data sets were calculated (R Core Team, version 4.0.3). The resulting parameters were compared between models using a paired t-test (Microsoft Excel) with fold as the blocking factor. The overall best model was further evaluated by leaveone-cohort-out cross validation. In this case, 1 cohort of the 6 was left out of the stepwise model development process using the same candidate predictor variables in the BYPE. The predicted values for the remaining cohort were then compared with the observed DMI, and model precision and accuracy were evaluated. The process was repeated for each cohort in the data set. An external data set from Iowa State University (Ames, IA) was used for validation of model BYPE (Validation Set; Siberski-Cooper et al., 2022). Cows (n = 52; 32 primiparous and 20 multiparous) were fed individually using a Calan Broadbent Feeding System (1 cow per bunk; American Calan). Diet chemical composition and ingredients are presented in Table 1. Milk weights were recorded electronically (BouMatic, LLC), and milk samples analyzed in the same manner and facility as those in the training data set (AgSource). The PTA data were also obtained (CDCB, 2021) using the April 2021 genetic evaluation. Descriptive statistics for the external data set are in Table 5. Finally, the DMI models from the NRC (2001) and de Souza et al. (2019) were compared with PDI to compare model performance with BYPE from the current study.
A secondary validation data set was obtained internally from 3 cohorts in which data were collected in 2021 at the University of Wisconsin-Madison dairy herd. The purpose of this data set was to assess whether RFI PTA is artificially inflating predictive ability in our training data set, given that the PTA is generated using the RFI phenotypes from the training data set (among thousands of other observations). Data for this validation set was only used if a cow had not previously been enrolled in a University of Wisconsin-Madison feed efficiency data collection period, and RFI phenotypes were not yet added to the national database. Therefore, a subset of cows for each of the 3 cohorts was used. The RFI PTA was obtained from the April 2021 genetic evaluation. Cows were handled and samples collected in a similar manner as described above.

RESULTS AND DISCUSSION
In this study, we sought to develop feed intake prediction models leveraging existing and easily accessible data streams. We included candidate predictor variables such as cow descriptive factors and milk components that have traditionally been used for DMI model development (Holter et al., 1997;Martin et al., 2021a;NRC, 2001), but also incorporated novel predictors like milk FA composition and yield, and PTA for production and efficiency (Tables 2, 3, and 4). Also Brown et al.: POINT-IN-TIME DATA STREAMS TO PREDICT DMI unique to our approach is that we developed these models use data from only a single point-in-time, in a manner that could be compatible with DHI sampling methods. The incorporation of various types of data as candidate predictor variables resulted in variable accuracy of predictions (Table 6). In general, adding milk FA and PTA enhanced model accuracy compared with basic cow factors traditionally used in DMI prediction equations (Tables 6 and 7). The final variables included in each model are reported in Tables 8 and 9. Traditional cow descriptive predictive variables, such as lactation category, week of lactation, MBW, milk yield, and macro-components, were all retained in at least 1 model. In fact, lactation category, BCS, milk protein, and SCC were all retained as predictor variables in at least 4 of the models tested, and MBW was retained in every model. In large part, the inclusion of these variables in the final models is to be expected because of their general relationship with energy demands.
Adding milk FA and PTA for production and efficiency as novel candidate predictor variables resulted in several variables consistently being included in final prediction models. With the addition of milk FA yield and concentration added as candidate predictor variables, final models consistently included de novo,  preformed, and C18 and C18:1 FA (Tables 8 and 9). Furthermore, when PTA were included as candidate predictor variables, PTA for milk and RFI were consistently included in the final models (Tables 8 and 9).
To assist in comparing the performance of models, we conducted a paired t-test on the 5-fold R 2 , CCC, and RMSEP validation data set within each model, and the results are reported in Table 7. Overall, model BYPE had the most robust model performance based on R 2 , CCC, and RMSEP (Table 6). Model parameters for BYPE were also significantly different compared with the other models based on comparison of the 5-fold cross validation results (P ≤ 0.07; Table 7). Overarching themes highlighting predictor variables that were retained across multiple models and in model BYPE will be discussed in the appropriate sections in this article.

Current Versus PDI Predictions
The initial models BC and BY demonstrate an advantage for predicting PDI versus CDI when milk samples are obtained from the morning milking (Tables 6 and  7). In models BC and BY, the prediction of PDI compared with CDI significantly increased CCC (P ≤ 0.03; Tables 6 and 7) and tended to increase R 2 (P ≤ 0.07). Using milk samples to predict production outcomes likely suffers from a certain lag time (McParland and    (Lin, 1989). 4 RMSEP = root mean squared error of the prediction. 5 MSEP = mean squared error of the prediction. 6 Prediction of DMI on the day before which the morning milk sample was obtained. 7 Prediction of DMI on the day in which the morning milk sample was obtained.
Berry, 2016) as DMI drives feed available for digestion, and the resulting metabolites that will become available to the mammary gland for milk synthesis. Considering that the milk samples were obtained at 0400 h, the milk yield would be less related to alterations in DMI that would occur over the next 20 h, for which the DMI was measured. In grazing dairy cows, for which it is notoriously difficult to determine DMI, prediction of averaged weekly effective energy intake was similar when using predictor variables from either a single morning or evening milking (McParland et al., 2014), which may be an artifact of the method in which they quantified DMI. Furthermore, if evening milk samples had been used in our data set, they may have increased the accuracy of prediction, as has been noted in other studies with multiple milk samples (McParland et al., 2014), although this additional sampling adds practical limitations to implementation on most dairy farms. This was not done in the current research because our priority was to determine whether milk samples from a single milking could be used to predict DMI, as it is the most common method of DHI sampling and adding extra samples would add labor and expense for dairy farmers. Future work could evaluate covariate factors to be implemented in DMI prediction models to account for milk yield effects from diurnal variation and management factors (Lee and Wardrop, 1984;Rottman et al., 2014;Salfer and Harvatine, 2020), as has been  done for accurately determining daily production records from a single milk sample (Duplessis et al., 2019). Overall, because of the improvement in models predicting PDI versus CDI in models B, BC, and BY, the remaining models evaluated only PDI, as in previous studies .

Milk FA in Model Predictions
A main goal of this paper was to determine whether factors beyond basic animal descriptors, yield of milk, and macro-components could enhance the prediction of feed intake in lactating dairy cows. Because we were seeking to use candidate variables that are readily available to dairy farmers, we chose to add milk FA yield and concentrations that are available from DHI organizations through routine on-farm milk testing programs. For the prediction of PDI, including the concentration of FA as candidate predictor variables in model BC (Tables 2 and 3 Tables 6 and 7), but fewer instances of multicollinearity in the model development process for BY compared with BC led us to continue our efforts using only FA yield in the remaining models. It should be noted that FA yields were reported as a percentage of total milk yield, which may explain why we did not find greater differences between BC and BY. Adding milk FA as candidate variables in models BC and BY may have helped improve accuracy and precision of DMI prediction compared with model B because milk FA can serve as an indicator of the energy status of the cow. For example, preformed FA content or yield was included as a variable in nearly every model for which it was added as a candidate predictor variable (Tables 7 and 8), with a negative parameter estimate indicating a reduction in DMI as preformed FA in milk increases. The preformed FA C18:1 also had a negative parameter estimate (Table 8). Conversely, de novo FA had a positive parameter estimate when included in the final model (BC, BY; Table 8). Milk FA less than C16 are synthesized in the mammary gland (de novo), those greater than C16 are absorbed from the diet or mobilized from body fat stores (preformed), and C16 FA are a mixture of de novo and preformed (Lanier and Corl, 2015). Taken together, milk FA types can provide insight into cow biology. The ratio of certain FA in milk has been shown to be a reasonable predictor of energy status in cows under the assumption that longer chain FA, such as C18:1, are mobilized from body fat reserves during negative energy balance (Dórea et al., 2017), and cows predicted to have hyperketonemia at the first milk test have greater preformed milk FA content (Pralle et al., 2021). Milk FA content changes over the lactation, with preformed and de novo FA displaying inverse prevalence throughout lactation as energy balance changes (Craninx et al., 2008). The changes noted in the ratio of preformed to de novo FA in milk are most pronounced in early lactation when the cow is adapting to negative energy balance, and the data used for model development herein were from mid-lactation cows that were not experiencing extreme negative energy balance.
The parameter estimate for C18:0 was positive in model BYPE (Table 8), despite it being a preformed FA. Due to rumen biohydrogenation of unsaturated long-chain FA, C18:0 is the primary FA absorbed (Loften et al., 2014); therefore, as DMI increases, more C18:0 could be absorbed and made available for secretion by the mammary gland. In general, dietary ingredients have a profound effect on milk FA. For example, greater dietary starch levels in the diet alter milk FA through depression of milk fat yields (Jenkins and McGuire, 2006), and increasing the FA content of the diet increases milk FA content and yield (Boerman and Lock, 2014). Dietary FA profile also directly alters milk FA profile (Kelly et al., 1998). The FA profile of the dietary ingredients in this paper are not available; however, the lack of major dietary changes between cohorts and the similarity of feedstuffs used in each  diet would not lend to great variability in FA profile across cohorts. Future work including data from cows fed a variety of starch levels and fat sources could prove useful in DMI prediction models using milk FA as predictor variables. The use of milk FA in DMI predictions is a relatively new approach. Recently, Tedde et al. (2021) used concentration of predicted FA (g/100 g of fat) from milk mid-infrared spectra to predict DMI using partial least squares regression. They retained 6 FA as candidate variables [C8, C10, C12, C14, C18, and C18:1(9)], whereas we included 4 FA as candidate variables (C14, C16, C18, C18:1; Table 2), in addition to FA chain length and saturation. Adding FA candidate predictor variables in our study markedly improved DMI prediction accuracy and precision compared with the model including the basic animal factors (BY and BC vs. B; Table 6). Conversely, no improvement in outcomes was achieved when FA were added in Tedde et al. (2021), but this may have been because milk mid-infrared spectra were included in most of their basic models (Tedde et al., 2021). Using milk mid-infrared spectroscopy, several wavelengths associated with milk FA also contributed to DMI models generated by another laboratory group (Shetty et al., 2017).
A potential limitation of including FA data in future DMI prediction models is that there are likely differences between reported FA analysis due to a lack of industry standardization across laboratories and across equipment types (Pralle and White, 2020). Nonetheless, the fact that improved predictive ability was achieved by adding milk FA as candidate predictor variables is intriguing and merits further consideration, especially for early lactation cows.

PTA in Model Predictions
Additional insight could be garnered from genetic metrics such as PTA. Although models using PTA alone to predict DMI were poor (CCC = 0.08; data not shown), they could contribute novel information when added to the B and BY models. Compared with model B, BPE tended to increase CCC (P = 0.07; Tables 6  and 7). Compared with BY, the addition of PTA for milk, fat, and protein as candidate predictor variables in BYP tended to increase CCC (0.80 vs. 0.78; P = 0.10; Tables 6 and 7) and reduce RMSEP (2.90 vs. 3.00 kg/d; P = 0.10). We found similar improvements in model predictive ability by adding PTA for efficiency as candidate predictor variables (RFI and BW composite) in BYE (Tables 6 and 7). When all PTA were added as candidate predictor variables in model BYPE, the most robust model was produced (R 2 = 0.67; CCC = 0.81; RMSEP = 2.83 kg/d; mean bias = 0.00; slope bias = 0.00; Table 6; Figure 1). Based on the 5-fold cross-validation comparison of model parameters, model BYPE had a greater R 2 , CCC, and a lower RMSEP compared with all other models (P ≤ 0.07; Table 7).
In models BYPE, BYE, and BPE, PTA RFI was included in the final model with a positive relationship between PTA RFI and DMI (Table 8). Greater phenotypic and genetic RFI values represent greater DMI than predicted within a cohort, after accounting for major energy sinks (VandeHaar et al., 2016); thus, the relationship between PTA RFI and DMI in our models is expected. A secondary validation step was implemented to evaluate whether cows in the training data set, having an RFI phenotype that contributed to the calculation of the RFI PTA in the national database, inflated model accuracy. Using cows from the University of Wisconsin-Madison, for which the RFI phenotype had not yet been incorporated in the national genetic evaluation, DMI was predicted using model BYP and BYPE. The CCC was 0.07 greater for BYP than BYPE (data not shown), indicating no inflation of predictive ability by developing the model with data from cows that had already contributed to the RFI PTA. Furthermore, when comparing the original model performance built on the training data set, the addition of RFI PTA to models BPE and BYPE changed CCC negligibly compared with model BYP  Table 6). When considered together, using RFI PTA does not appear to artificially enhance model performance when determined from cows with an RFI phenotype in the national database. The inclusion of PTA RFI in the final models confirms the applicability of past research efforts to accurately identify genetic factors associated with feed intake in dairy cows. As additional RFI phenotypes continue to be recorded in research herds for use in national genetic evaluations, this will hopefully continue to enhance DMI prediction models.

Other Factors in Model Predictions
The fact that MBW was included as a positive predictor variable in all models in this paper (Tables 8  and 9) underscores the necessity to have capability to obtain BW measurements routinely on dairy farms. We recognize that many dairy farms do not have a method for weighing animals currently, and determining BCS requires a labor investment, so supplemental models without BW and BCS were developed (Supplemental Table S1; https: / / data .mendeley .com/ datasets/ 6f62gtxjt3/ 2; Brown, 2022). However, walk-over scales have been implemented successfully on dairy farms for routine monitoring of animal performance (van Straten et al., 2008;Alawneh et al., 2011), and walk-over scales are an added benefit of automated milking systems (Thorup et al., 2012;Pszczola et al., 2018). There is also interest in determining BW using artificial intelligence, such as a 3-D vision system (Kuzuhara et al., 2015;Song et al., 2018;Ferreira et al., 2021), which would add a new option to fit farm management style and facility design. Ultimately, accurate prediction of DMI on farms will likely necessitate BW measurement in some capacity, and reasonable resources are available to implement those measurements on dairy farms.
Protein yield was a significant positive predictor variable for DMI in almost every model in this study (Tables 8 and 9). Milk protein yield was a significant predictor variable in previous DMI models (Holter et al., 1997;Roseler et al., 1997), and had a higher correlation with DMI than milk or fat yield (Roseler et al., 1997). We conducted a forward stepwise procedure with BYPE candidate predictor variables to determine the relative importance of all possible variables. Protein yield was the primary variable explaining variation in DMI, accounting for 48% of the variation in DMI alone (Supplemental Table S2; https: / / data .mendeley .com/ datasets/ 6f62gtxjt3/ 2; Brown, 2022), and milk protein yield has been shown to be the predictor variable with the greatest influence on DMI in other studies (Holter et al., 1997). The biological relationship between milk protein yield and DMI is complex. Metabolizable pro-tein supply is the main driver of milk protein yield (NRC, 2001), and is altered by forage quality, ruminal starch availability, and ruminal and intestinal protein digestibility. Cows with greater DMI likely have a greater supply of MP to support milk protein synthesis. Considering the consistent nature of the diets across cohorts used in the training data set, future models including data from diets that would alter the MP supply (and in turn the milk protein yield) would be useful.
A final effort to add additional options for farmers to choose from while developing these models was to incorporate previous lactation information for multiparous cows. The previous lactation mature-equivalent milk, protein, and fat yields, coupled with previous lactation length and days dry, were added to models BY, BY, and BYPE. Adding previous lactation candidate variables was only successful in BY, where 305-d mature-equivalent milk was retained, and resulted in poorer predictive ability than the original model B (R 2 = 0.54; CCC = 0.71; RMSE = 3.06; mean bias = 0.03; slope bias = 0.07; data not shown). In general, this suggests previous lactation information adds little benefit to current DMI prediction models.
Cow dynamics within the pen and at the bunk may influence DMI or feeding behavior. Within this feeding system, the calculated ratio of cows to bunks in the training data set was 2:1; however, the average occupancy of bunks was 32.1%, and 90% of bunks were only occupied concurrently for 33 min/d. Increasing feeding bunk stocking density from 0.75 to 3 cows/bunk quadratically increases standing time at peak feeding activity and for the entire day, and also increases feed bunk displacements (Huzzey et al., 2006). In cows managed with greater feed bunk density, the combination of social interactions, feed sorting, and more intense feeding bouts can theoretically alter ruminal pH and milk components (DeVries, 2019). Studies evaluating milk production and performance solely based on feeding bunk stocking density are limited because of confounding effects of freestall stocking density inherent in dairy facility design; however, evidence from an electronic feed bunk system suggests that increasing feed bunk competition (from 1 to 3 cows/bunk), absent changes in freestall stocking density, may alter milk composition (Crossley et al., 2017). Increasing stocking density does not appear to alter total DMI as cows alter their behavior to eat feed at different times throughout the day (Crossley et al., 2017). It is unclear how stocking density in our training data set may have altered predictor variables in our final model, but any variance introduced would be accounted for in the cross-validation performance (discussed below), given that the naive data set represents a feeding system with dedicated individual animal feeding bunks. Ongo-ing research examining the effect of feeding behavior, competition, and social dynamics on feed efficiency will provide valuable insight and may identify behavioral traits or patterns that can contribute to future DMI prediction models (Baier et al., 2021).

Model Validation
An important step in ensuring that prediction models will be robust when applied to naive data sets is cross validation. Leave-one-cohort-out cross-validation resulted in R 2 = 0.60, CCC = 0.63, and RMSEP = 3.86 for model BYPE. A reduction of model performance in leave-one-out cross validation by randomly eliminating full cow records or entire trials from the training data set was observed by others (Shetty et al., 2017;Dórea et al., 2018). Model BYPE was also applied to an external data set obtained from a separate institution (validation set; Iowa State University; Siberski-Cooper et al., 2022). The advantage of this external data set was that they used the same data collection procedures for variables used in our data set, including the same source for milk analysis to eliminate issues with lack of standardization of milk FA characterization. The results of model BYPE applied to the external data set resulted in R 2 of 0.66, CCC of 0.74, and RMSEP of 3.46 kg/d (Figure 2). Applying model BYPE to the external data set had a substantial mean and slope bias (35.96 and 8.46, respectively), where DMI was underpredicted at lesser DMI and overpredicted at greater DMI.
The NRC (2001) and de Souza et al. (2019) DMI models were also applied to the training data set (Figures 3 and 4) and the external validation data set (Supplemental Figure S1; https: / / data .mendeley .com/ datasets/ 6f62gtxjt3/ 2; Brown, 2022). Compared with NRC (2001) for the training data set, BYPE had a greater R 2 (21%) and CCC (21%), and a smaller RM-SEP by 1.02 kg (Figure 3). The mean and slope bias for NRC (2001) were 9.42 and 0.24% greater, respectively, than for BYPE using the training data set. Furthermore, de Souza et al. (2019) had a 18% lesser R 2 , 32% lesser CCC, and a RMSEP 1.8 kg/d greater than BYPE in the training data set (Figure 4). The slope and mean bias for de Souza et al.  2019) DMI models were larger and likely from a more diverse population than the data set herein, and we are unsure of how BYPE would perform with a more heterogeneous data set. To account for variation between cohorts in our work, the random effect of cohort was tested in all models; however, the random effect of cohort was not significant in any model (P > 0.13) and was subsequently removed. Another factor that may have contributed to the model performance was the use of adjustment factors to determine daily milk and component yields. Although these adjustment factors represent an estimate of daily production and may not be a perfect quantification of the cow milk yield, it is the most pragmatic approach to predicting DMI with other models when only 1 milk sample is available. Nonetheless, the comparison of BYPE to existing models underscores the benefit of inclusion of milk FA and PTA variables in productive, Midwest free stall herds, and gives confidence in the ability of model BYPE to accurately predict DMI when these additional variables are available.

CONCLUSIONS
Models predicting DMI using single point-in-time data to mimic a monthly DHI testing regimen resulted in moderately robust results in freestall-housed dairy cows with ad libitum access to feed via electronic feeding gates (2:1 bunk density; overall bunk occupancy rate of 32%). Model validation on an external data set revealed similarly robust results for freestall-housed Comparison of model BYPE [model BY (production, metabolic BW, BCS, lactation category, and week of lactation + fatty acid yield) + production and efficiency PTA] predicted DMI versus observed previous day DMI for primiparous (red) and multiparous (blue) Holstein cows using an external data set consisting of 52 DMI observations. The regression line (dashed) and line of equivalence (solid) are included for reference. R 2 = 0.66; concordance correlation coefficient = 0.74; root mean squared error of prediction = 3.46 kg/d; mean bias = 35.96%; slope bias = 8.46%. cows with dedicated individual feeding gates (1:1 bunk density). Prediction of PDI was improved over CDI when the milk sample was obtained in the early morning. Adding milk FA data enhanced the predictive ability of the models, perhaps due to patterns in preformed FA changes concomitant to alterations in energy balance. Novel inclusion of production and efficiency PTA were critical to the most robust model (BYPE) and indicates that PTA RFI has an important role in predicting DMI. Model BYPE showed promise when compared with DMI predictions from the NRC (2001) and de Souza et al. (2019). Overall, model BYPE was close to achieving our objective of explaining at least 70% of the variation in DMI in the training and external validation data sets (R 2 = 0.67 and 0.66, respectively). Future research should focus on developing models in early and late lactation cows using similar candidate predictor variables, and on identifying other accessible variables that can further improve DMI predictions.   2019) predicted daily DMI versus observed previous day DMI for primiparous (red) and multiparous (blue) Holstein cows. The regression line (dashed) and line of equivalence (solid) are included for reference. R 2 = 0.50; concordance correlation coefficient = 0.49; root mean squared error of prediction = 4.63 kg/d; mean bias = 38.65%; slope bias = 1.69%.