Na-Comparison of feed evaluation models on predictions of milk protein yield on Québec commercial dairy farms

Feed evaluation models (FEM) are a core part in dairy cow feeding. As these models are developed us-ing different biological and mathematical approaches mainly tested in a research context, their abilities to predict production in commercial farms need to be validated, even more so when they are used outside the context of their development. Four FEM—National Research Council, 2001 (NRC_2001); Cornell Net Carbohydrate and Protein System, 2015 (CNCPS); NorFor, 2011; and INRA, 2018 (INRA_2018)—were evaluated on their abilities to predict daily milk protein yield (MPY) of 541 cows from 23 dairy herds in the province of Québec, Canada. The effects of cow and diet characteristics were tested on the residuals of MPY. Sensitivity and uncertainty analyses were then performed to evaluate the influence of the uncertainty of the main characteristics of cows and feed ingredi-ents measured on the farm and used in the 4 FEM on the predictions of metabolizable protein (MP) supply and MPY. The 4 models had acceptable predictions of MPY, with concordance correlation coefficients (CCC) ranging from 0.75 to 0.82 and total bias ranging from 12.8% to 19.3% of the observed mean. The Scandinavian model NorFor had the best predictions with a CCC of 0.82, whereas the 3 other models had similar CCC at 0.75 to 0.76. The INRA_2018 and NRC_2001 models presented strong central tendency biases. Re-moving herd effect put the 4 FEM at the same level of performance, with 11.9 to 12.4% error. Analyzing model behavior within a herd seems to partly negate the effect of using predicted dry matter intake (DMI) in the comparison of models. Diet energy density, days in milk, and MPY estimated breeding value were related to the residual in the 4 models, and Lys and Met (as percent of MP) only in NRC_2001 and NorFor. This suggests that inclusion of these factors in these models would improve MPY predictions. From the sensitivity analysis, for the 4 FEM, DMI and factors affecting its prediction had the greatest influence on the predictions of MP supply and MPY. Of the feed ingredients, forage composition had the greatest effect on these predictions, including a strong effect of legume proportion with NorFor. Diet acid detergent fiber concentration had a very strong effect on MP supply and MPY predictions only in INRA_2018, because of its effect on organic matter digestibility estimation. The range of predictions of MP supply and MPY when combining all these potential uncertainties varied depending on the models. The INRA_2018 model presented the lowest standard deviation (SD) and NorFor the highest SD for the predictions of both MP supply and MPY. Overall, despite the fact that FEM were developed in a research context, their use in a commercial context yields acceptable predictions, with NorFor yielding the best predictions overall, although within-herd responses varied similarly for the 4 tested models.


INTRODUCTION
Feed evaluation models (FEM) developed by researchers are tools used by dairy producers and their advisors to balance and evaluate dairy rations.They represent a simplification of nutrient flows into and their utilization by the dairy cow.Protein is one important nutrient targeted by FEM.The National Research Council (NRC, 2001) model is still the base FEM of the formulation model used by Lactanet (Canadian Network for Dairy Excellence, Sainte-Anne-de-Bellevue, QC, Canada), the main DHI group in Canada.However, since 2001, FEM have been developed or updated across the world, with new versions being proposed: the Cornell Net Carbohydrate and Protein System (Van Amburgh et al., 2015); INRA (2018), developed in France by l'Institut Na-tional de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAe; formerly INRA); and NorFor in Scandinavia (Volden, 2011).Each FEM was developed with specificities and different levels of application of the accumulated knowledge at its time of creation.One would expect that a more recent FEM using more recent knowledge would provide a better formulation and yield better predictions of milk protein yield (MPY).For instance, it has been shown that the efficiency of MP utilization varies with multiple factors and that a fixed value may be problematic in the prediction of MP utilization (Daniel et al., 2016;Lapierre et al., 2020).
Balancing dairy rations using FEM requires input of parameters into the software to describe the characteristics of the animal and the composition of the feed ingredients to be used.Currently, there is no variation attached to these parameters, although it is known that all these characteristics are determined with some level of uncertainty.For example, uncertainties related to cow characteristics, such as milk yield and composition and BW, are inherent to these measurements.Also, many factors influence feed ingredient analyses.The method and frequency of sampling and the number of samples affect the quality and precision of feed ingredient analyses (Undersander et al., 2005).Furthermore, variation also occurs from day to day due to the unavoidable heterogeneity of feeds harvested, especially forages, which is not captured by one sample.Day-today variation of forages can range from 5.0% to 9.5% for CP, NDF, and starch (Weiss et al., 2012;St-Pierre and Weiss, 2015).
Therefore, the validation of FEM on commercial dairy farms is important to evaluate their accuracy outside the research context.Thus, the objectives of this paper were, with the perspective of an advisor: (1) to compare MPY predictions from 4 FEM with observed productions of 541 dairy cows from 23 commercial dairy farms in Québec; (2) to evaluate the effects of uncertainty of the main characteristics of cows and feed ingredients on the predictions of MP supplies and MPY within the 4 FEM (sensitivity); and (3) to evaluate the range of predictions of MP supply and expenditures and MPY when combining all these potential uncertainties, with these 4 FEM (uncertainty), with an on-farm perspective of risk management.We hypothesized that (1) more recent models using a variable efficiency of utilization of MP would perform the best; (2) overall, prediction of DMI and factors affecting DMI would have the largest influence on the predictions of MP supply and MPY; and (3) within each FEM, individual factors would affect each model with a different amplitude, which would lead to different confidence zones of predictions of MP supply, MP expenditure, and MPY.

Data Collection and Preparation
Cows and Management.Data used for this analysis were collected on 23 commercial dairy farms in Québec, on the test day with the DHI group Lactanet (Sainte-Anne-de-Bellevue, QC, Canada) between October 2014 and June 2015.To correctly characterize nutrient intake, only data from cows fed simple feed ingredients were used, excluding from the original data set data from cows fed commercial concentrates (except minerals), yielding data for 541 cows.Briefly, during the p.m. and subsequent a.m.milking, milk yield from each cow was recorded, and milk was sampled using online milk meters.Samples were stored at 4°C with a preservative (2-bromo-2-nitropan-1,3-diol) and sent to the Lactanet laboratory (Sainte-Anne-de-Bellevue, QC, Canada) for analyses of CP (infrared analyses, Foss MilkoScan FT 6000).Animal BW were estimated from the thoracic circumference, applying the equation of Yan et al. (2009).Milk-and protein-related EBV were also collected from Canadian Dairy Network at the time of collect (Canadian Network for Dairy Excellence, Sainte-Anne-de-Bellevue, QC, Canada).Information on management, such as feeding frequencies, feeding systems, and temperature and humidity index, were also collected on the test day.More information can be obtained from Fadul- Pacheco et al. (2017) and Duplessis et al. (2021).Data of animals with major health issue were not collected.
The days of gestation were unknown, so no requirement for gestation was included in NRC_2001, CNCPS, and NorFor models.For INRA_2018, however, a value for the last breeding time was required and obligate in the program, and thus an estimation was calculated from actual DIM.Furthermore, information on body condition score was not available; therefore, when required, the average value for the stage of lactation estimated by each model was used.
Rations and Feed Ingredients.Information on diet composition was collected on the test day.Feed distribution system [TMR, automatic concentrate feeding (ACF), or manual feeding system] were also noted.Feed and TMR samples were also collected on the test day, and refusals were collected the day after.Concentrate feeds were analyzed using wet chemistry by a commercial laboratory (SGS Agrifood Laboratories, Guelph, ON, Canada), whereas forages were analyzed using infrared methods by Lactanet.Laboratory analyses included DM, NDF, ADF, NDIN, ADIN, lignin, ether extract, starch, ash, and minerals, and fermentable products for silages.
Because individual DMI was not available, as it is almost impossible to measure on commercial farms, DMI was estimated within each model.In fact, this estimation is somewhat included in the objective of evaluating general model performance in commercial conditions.Silages, TMR, and partial mixed rations were considered fed ad libitum, and concentrate from ACF and manual feeding were estimated from values entered in the feeding systems given by the producers.Intakes of feed ingredients from TMR were estimated as total predicted DMI minus top-dress concentrate, if present, multiplied by their relative proportion in TMR.
For chemical composition of feed ingredients needed in some FEM but not available, such as OM digestibility, crude fiber, protein degradability, digestibility coefficients, and AA concentrations, either the model values (when available) or the table values of the feed ingredient most closely related to that used on the farm were used.For example, crude fiber estimation, used in INRA_2018, was estimated from ADF content, as it has been established that this relationship yielded the best fit (INRA, 2018).To estimate all the feed values required in the INRA_2018 model, we used the software Prevalim version 1.3.3-5 (released Jul. 7, 2020)  Model Comparison.Models were compared based on the relationship between measured and predicted MPY.Model fit and correspondence were calculated using concordance correlation coefficients (CCC), as described by Lin (1989) using the epiR package (Stevenson and Sergeant, 2021), where a value near 1 indicates the best fit.The CCC were also decomposed into their precision and accuracy sub-indicators, Pearson correlation coefficient (r) and bias correction factor, respectively.The root mean square error (RMSE) and its bias partition, as described by Bibby and Toutenburg (1977), were used, where the smallest value of RMSE is most desirable, with a minimal proportion of central tendency and regression bias (RB).This was done with and without a herd effect correction.Herd correction was performed by making, within each herd, predicted MPY mean equal to measured MPY by adding herd mean residuals to each predicted MPY.Then, using the lm function from R version 4.0.3(R Core Team, 2017), the following model was created and applied to each herd: where Y ij = measured MPY of the ith animal of the jth herd, b 0j = the fixed intercept of each herd, b 1j = the fixed slope coefficient of the ith animal MPY corrected for jth herd effect, x i = predicted milk protein yield of the ith animal, and ε ij = the residual of the ith animal in the jth herd.Slopes were then averaged to observe general tendencies of prediction of each FEM.A slope around 1 would indicate a tendency to predict well within a herd.
Univariate Residuals Analysis.To evaluate the effects of variables that might be related to the residuals, the lm function from the R package Stats was used.Residuals are the difference between measured and predicted values of MPY.Univariate analysis was performed for each variable.The following variables were tested for linear effect: His, Lys, and Met, as percentage of MP supply, energy supply and energy concentration of diet, DIM, temperature and humidity index, concentrate feeding frequency, BW, and parity.Furthermore, linear effect of the genetic potential (i.e., EBV), as evaluated by the breed association, for milk protein deviation (%) and MPY (kg), and EBV for milk production (kg) were also tested.Then, the following model was used to evaluate the significance of the continuous variables on the residuals: where Y ij is the difference between measured and predicted milk protein of the ith cow, b 0 = the fixed intercept, b 1 = the fixed linear coefficients of the xth variable assigned to the ith cow, and ε i = the residual of the ith cow.
Variables were considered significant at P ≤ 0.01 with R 2 ≥ 0.05.
Sensitivity and Uncertainty Analyses.For sensitivity and uncertainty analyses, characteristics of the cows tested were DMI, BW, milk yield, and protein composition.Characteristics of feed ingredients tested were NDF and CP of forages, NDF and CP of concentrates, independently, starch and ADF of all feeds altogether (i.e., of the diet), and maturity stage and legume content of forages, when applicable.The mag-nitude of the variations for each of these characteristics is explained subsequently and detailed in Table 1.
Cows.Based on average, minimal, and maximal residual feed intake from Potts et al. (2017), ad libitum DMI was assumed to vary by ±6% of the predicted values within each model.Animal BW uncertainty from thoracic estimation was established at 8%, based on Davis et al. (1961).Milk protein concentration uncertainty was established from the near infrared uncertainty of Tsenkova et al. (1999) at 4%.For milk production, the International Committee for Animal Recording (ICAR, 2016) tolerates a 2% margin of accuracy for most milk meter systems.By extension, for INRA_2018, estimated peak milk production was also varied by 2%.
Feed Ingredients.For silage, CP and NDF variations were established at 7% and 8% of initial value, respectively, as Weiss et al. (2012) reported variations of 7.3% and 7.6% for CP and NDF for corn silage and legume silage, respectively.When more than one forage was fed, all forages were affected simultaneously.Concentrates included all feed ingredients except forages and vitamin and mineral supplements.For concentrates, CP and NDF variations were set to 5% and 10%, as Weiss et al. (2012) obtained 5.0% and 10.5% variations for 5 major concentrates (wet and dry corn grain, soybean meal, dried distiller grain, wet brewer grains), and those variations were applied to all concentrates.Starch concentration uncertainty was assumed to be 10%, as St-Pierre and Weiss (2015) obtained 9.5% variation for corn silage, and this value was extrapolated to all feeds.The concentration of ADF in forages varied by 5%, based on Undersander (2006), and was also extrapolated to all feeds, as most concentrates usually have very low ADF concentration.These variations were assumed to include both day-to-day variations and laboratory uncertainty.Forage legume content and maturity were estimated by farmers at harvest from visual observation.Although forage NDF and CP concentrations are usually strongly related to maturity level, differences can occur within the same species for different varieties of the same maturity (Yu et al., 2003).Furthermore, for forages, a variation of 1 "level" of legume content, equivalent to 25% of legume content, and a variation of 1 level of maturity out of 3 (immature, mid-mature, and mature), when applicable, were considered.

Sensitivity Analysis
The sensitivity to a factor (animal or feed characteristics in the current study) in a model is measured as the difference between the parameter (e.g., MP supply, MPY) predicted with the initial characteristics and the parameter predicted once one characteristic is changed, relative to the initial prediction, expressed as percentage of the initial prediction, as described in Equation [1].
where O XYi is the initial predicted parameter using the Xth diet of the Yth animal; C XYZi is the predicted parameter using the Xth diet of the Yth animal, with the Zth variation on animal or feed ingredient characteristics, with the ith iteration within each FEM.Each factor was tested with its respective positive and negative variation.As analyses were conducted from variation of initial values, responses to negative variations were inverted and combined with the positive results, under the first assumption of symmetry of effect.They were analyzed with the median and 25th and 75th quantiles to account for possible skewness that mean and standard deviation (SD) might not show.In the sensitivity analysis, a negative value indicates an opposite reaction of the prediction of the supply of MP or MPY to the variation of the component.Symmetry between positive and negative error was also established by mean differences of positive and negative results.Perfect symmetry is described as a similar but opposite distribution of the effect for the positive and negative variation of one factor.The symmetry is important to evaluate whether the response to variation is linear.

Uncertainty Analysis
For the uncertainty analysis, a Monte Carlo simulation approach was performed: 3,500 fully randomized iterations around each animal production were created, where each characteristic tested in the sensitivity analysis was randomly varied.Monte Carlo methods consist of mathematical and modeling approaches relying on randomness and probability of income to find probabilistic outcomes to deterministic problems (Kalos and Whitlock, 2009).Each variation of each characteristic resulted from a random selection from a normal distribution where the mean was set to 1 and the SD is set to the variation used for the sensitivity analysis (Table 1), as follows: where C Xi = the corrected value of feed or animal characteristic X at iteration i; N i = random normal distribution number generator function with mean µ and SD σ Xi , value at iteration i; µ = mean = 1; σ X = SD of animal or feed ingredient characteristic X; and O X = initial characteristic of feed ingredient or animal X.
We assumed that the measured values were the most likely true values of the samples.For forage maturity and legume content changes, as these can be considered ordinal factors, a random uniform function was applied, where probabilities were equal to either the same or ± 1 level of initial value, when applicable.For example, fully mature forage cannot be more mature.Component variation probabilities were assumed independent because for each component, an initial value was known, and we were not creating new feed from average as done by Higgs et al. (2015).Uncertainties were assumed to be due to laboratory analysis uncertainty and feed day-to-day variation.At each iteration i, the same variations of each characteristic were applied to all feed ingredients, diets, and animals.
As for sensitivity, uncertainty analysis results were examined primarily based on descriptive statistics of the variation for each animal from their initial values.Main statistics extracted were median, SD, and 25th and 75th quantiles, as quartiles 1 and 3, and 5th and 95th quantiles, using the Stats package from R (R Core Team, 2017).Median was preferred to the mean, and quartiles were added, as we expected non-normality in results.When normality is observed, the median is approximately equal to the mean.To those statistics were added skewness and kurtosis, both with function of the same name from the PerformanceAnalytics package from R (Peterson and Carl, 2020) to assess normality of distribution.Skewness was analyzed with the "moment" method, whereas kurtosis was analyzed under the "excess" method, which centers normality of kurtosis at 0. Skewness estimates whether the distribution is symmetrical, whereas kurtosis measures how the tails of the distribution behave.A high kurtosis value indicates long tails and thus risk of extreme values.A normal distribution is supposed to have a skewness of 0 and a kurtosis excess of 0 (Bai and Ng, 2005).Normality of the distribution was assumed when kurtosis was between −2 and 2 and when skewness was between −1 and 1 (George and Mallery, 2016).

RESULTS AND DISCUSSION
To simplify and unify terminology, the flow of digestible proteins is referred to as MP supply; MP supply is as reported in NRC ( 2001) and CNCPS but refers to AAT N (absorbed AA) in NorFor and to PDI (protein digestible in the intestine) in INRA ( 2018).For the NRC_2001, CNCPS, and NorFor models, predicted MPY is estimated as MP supply minus predicted maintenance (nonproductive) and growth MP expenditures, multiplied by the efficiency of lactation.For INRA_2018, this follows their milk/milk protein response and DMI interaction, which also considers Lys and Met, as % of MP, on the marginal milk protein response.As there are no requirements, per se, with INRA_2018, expenditures were estimated using net protein exportation divided by the final calculated protein efficiency, when applicable, except for endogenous urinary loss, which has a fixed efficiency of 1.Therefore, expenditures represent the total utilization of MP for all considered protein functions for the estimated MP supply from the diets and predicted MPY from each FEM.On commercial farms, diets were initially formulated and optimized using the Lactanet feeding model, which was based on NRC (2001) with some adjustments; thus, the current FEM comparison might have been advantageous for this model.

Animal and Feed Ingredient Descriptions
All cows were from the Holstein breed, and their descriptive statistics are detailed in Table 2. Briefly, they averaged 672 kg BW and 29.1 kg of milk per day at 3.31% CP and 202 DIM.Fifty-eight different forages were fed on the different farms, mainly legume, mixed and corn silages, and grass hay.Canola straw, fed in one herd, had high NDF concentration and low CP concentration (66.8% DM and 4.6% DM, respectively), and was managed as a straw.There were 39 different concentrates considered as energy source (e.g., corn, barley, wheat and oat grains, rumen-protected fat) and 18 considered as protein supplements (e.g., soybean meal, canola meal, distiller grain, full-fat soybeans, and peas).Rumen-protected AA were not included in any of the diets.Six different TMR were fed to 203 animals from 4 herds, whereas 338 cows were fed partial mixed rations with different concentrate supplies through either ACF or manual feeding (Table 2).

Predictions of DMI, Supply of MP and AA, and MP Expenditures
The NRC_2001 model yielded the highest predictions of DMI, with an average of 22.9 kg/d, whereas at the other end, INRA_2018 had the lowest estimations, with 20.5 kg/d (Table 3).The CNCPS and NorFor average predictions were intermediate, at 21.2 kg/d.Estimations of DMI by both North American FEM are influenced solely by animal BW, productivity, and stage of lactation, as opposed to both European models, also influenced by the animal capacity and feed fill values, which require an optimization process.In INRA_2018, the intake capacity is related to the potential of milk production at the peak of lactation, which is then used to evaluate the potential milk production at each week of lactation from a set of 2 lactation curves.It is also influenced by the ratio of MP: energy, where a lower ratio decreases intake.As only milk production on the test day was available in our data set, to estimate maximum potential milk production for this model, we first assumed that cows were producing at their full potential.Then, increments of 5% and 10% of the estimated peak milk production were tested, and minimal milk protein residual was the criteria of selection for the best estimation of the maximum potential milk production within a farm.This correction was performed to mimic what an advisor would do on a farm, should the lactation potential of each animal be unknown.Correcting for the whole herd equally assumes a general effect within the herd current feeding strategies and genetic potential, and was chosen to reduce the risk of overperformance of the INRA_2018 model by selecting individual best predictions.It was also used to partially account for a possible effect of misestimating feed fill value or diet effect within herd.However, we cannot assess which model had the best DMI estimation, but this estimation is somewhat embedded into the capacity of the whole model to predict performances.Indeed, the prediction of DMI directly affects the estimation of the supply of all nutrients, and thus directly affects MPY predictions.
We observed a strong positive relationship between MP supply and DMI in all models (r = 0.88-0.93;data not shown).However, CNCPS predicted the highest MP supply, even though it did not predict the highest DMI, followed by NRC_2001, NorFor, and INRA_2018 (Table 3).Comparing MP supply relative to DMI makes it possible to partially disregard the relationship between them.The CNCPS model had the highest esti- The high average MP supply predicted by CNCPS seems to be linked to a higher average microbial crude protein (MCP), both in absolute terms and relative to DMI, averaging 61 g of MP/kg of DMI, and the low MP supply estimated with INRA might also be linked to a lower estimation of MCP, even when calculated relative to DMI, averaging 47 g/kg of DMI, whereas NRC_2001 and NorFor values were intermediate, averaging 52 and 59 g/kg of DMI, respectively.Similarly, in a very limited evaluation, the ratio of MCP: DMI was lowest for INRA_2008 and highest for CNCPS (Lapierre et al., 2018).These estimations are also in agreement with observations from Pacheco et al. (2012), where a previous version of CNCPS (version 6.0.0)under the AMTS application predicted, on average, higher MCP duodenal flows compared with NRC_2001, predictions from CNCPS being 4% higher than observed values from 154 diets reported in the literature.In this latter comparison, NRC_2001 MCP predictions averaged 98% of observed values.Also, Tedeschi et al. (2015) reported that NRC prediction of MP from MCP was usually lower than CNCPS predictions.In addition to this, the lower ratio of MP supply: DMI from INRA_2018 can partly be explained by the absence of a direct contribution of the duodenal flow of endogenous protein to the MP supply, whereas NRC_2001 and NorFor include this contribution and CNCPS indirectly included it in their RUP fraction, representing nonammonia-nonmicrobial duodenal protein flow.
For MP expenditures, estimated as previously described, both European models yielded lower estimations, with averages of 1978 and 1940 g/d for INRA_2018 and NorFor, respectively, as opposed to North American estimates of 2,183 and 2,153 g/d for NRC_2001 and CNCPS, respectively.These differences can be due to either different estimations of the net expenditure or different efficiency of utilization, or both.For the 2 European models, efficiency of MP utilization can vary, as opposed to the fixed efficiency of 0.67 used by NRC_2001 and CNCPS.However, the NorFor model applies the variable efficiency solely to milk protein secretion, whereas INRA_2018 assigns it to all protein outputs except endogenous urinary loss, which has an efficiency of 1.The average efficiency cal- All models had the same small scurf net expenditure, with a mean of 10 g/d, as they are all based on Swanson (1977) equations.An equation from Swanson (1977) is also used to estimate the endogenous urinary losses in NRC_2001, CNCPS, and NorFor, predicting an average net expenditure of 70 g/d, which, associated with a 0.67 efficiency, yields an MP expenditure of 104 g/d.The INRA_2018 model developed a new equation resulting in estimation of expenditure twice as large as previous estimations, due to the change in the definition of endogenous urinary loss.The endogenous urinary loss calculated by INRA_2018 represents urinary N losses when apparent N digestibility is 0, whereas the Swanson equation represents unavoidable N losses when animals are fed very low-protein diets, when energy supply is sufficient.For primiparous cows, growth expenditures were included in all models, whereas for the second lactation, NRC_2001, CNCPS, and NorFor include growth expenditures for the whole lactation, and INRA_2018 stops considering growth expenditure at 40 mo of age.When averaging this expenditure only for cows still growing, NorFor had the highest expenditures for growth, averaging 117 g/cow per day.The NRC_2001, CNCPS, and INRA_2018 models had lower and similar average expenditures, varying from 76 to 85 g/cow per day.The INRA_2018 model, however, differed, with the highest coefficient of variation (CV) at 47%, whereas NRC_2001 and CNCPS had CV below 5%.Unlike INRA_2018 or NorFor, NRC and CNCPS expenditures for growth do not vary with age but depend on the ADG set.This ADG is not described as varying but is set by the user, and thus a fixed value of ADG was set for requirements.Daily gain was established assuming the BW at first calving represents 82% of mature BW and second calving BW represent 92% of mature BW, from NRC_2001 assumptions.With that information, ADG was calculated from measured BW, DIM, and an average calving interval of 420 d, from the province of Québec average from Lactanet (2018).Average daily gain was assumed constant throughout the lactation, averaging 0.15 kg/d for primiparous cows and 0.12 kg/d for second-lactation cows.Growth expenditures are affected by a specific efficiency of utilization for gain, estimated by the NorFor model at 30%, and at 28% for NRC and CNCPS.In the INRA model, the efficiency was the same as for the other protein synthesis functions, in line with their response system.For INRA 2018, gestation expenditure was relatively low and had its median at 6.1 g/d, with therefore a limited influence on MPY predictions.Individual His, Lys, and Met supplies were also compared, as they are increasingly considered when balancing dairy rations.To remove the effect of predicted MP supply, AA supplies are compared relative to MP supply.For His supply, CNCPS has the highest value, with 2.53% of MP supply, followed by NorFor with 2.28% of MP, then NRC_2001 with 2.17%, and INRA_2018 with 2.11%.In CNCPS, the contribution of MCP to the digestible flow of EAA is assumed to originate only from non-cell-wall MCP.The high predicted His supply is largely due to the higher His concentration of non-cell-wall MCP (2.69% of CP) used by CNCPS, whereas NorFor uses 2.4% of CP in MCP, and INRA_2018 expresses His concentration in another unit, as 1.8% of 16 AA.The 16 AA basis in INRA uses Glx for Glu+Gln and Asx for Asp+Asn, excludes Cys and Trp, and assumes that the sum of the 16 AA equals 100%.It is possible that NorFor and CNCPS overestimate His supply, as Sok et al. (2017) reported, from a literature review, that His concentration of MCP, including protozoa contribution, averages of 1.88%.It must be remembered, however, that INRA_2018 uses adjustment factors to better fit the predicted duodenal flow of AA from the sum of AA from RUP and MCP to the observed duodenal AA flow.The NRC_2001 model does not use a bacterial AA profile to predict the duodenal flow of EAA, as the subcommittee opted for a multivariate approach based on (1) a regression to determine the duodenal flow of EAA and (2) regressions for individual EAA to determine the percentage of each EAA in duodenal EAA flow based on its proportion in RUP.
Because most of the diets in our study included soybean products, with 344 and 158 diets including soybean meal and toasted soybeans, respectively, Met supply was on average relatively low.With NRC_2001, Met averaged 1.88% of MP, with Met concentration in soybean products averaging 1.40% of CP and most forages being below 1.60% of CP.Although most feeds with CNCPS have a similar AA profile to NRC_2001, the high MCP estimation by CNCPS (62% of MP vs. 57% of MP in NRC) increased Met supply to 2.18% of MP, with CNCPS using a value 2.68% of Met in non-cell-wall MCP (Fox et al., 2004).The INRA_2018 model uses 2.50% of 16 AA, and NorFor uses 2.5 g/100 g compared with a value of 2.29% of MCP reported in Sok et al. (2017).Therefore, values of Met supply, averaging 2.07 and 2.11% of MP for NorFor and INRA_2018, respectively, are closer to CNCPS than to NRC_2001.
As corn grain, wet or dry, was present in 427 of our 541 diets, Lys supply was also relatively low.With 6.20% of MP on average in diets, NorFor yielded the lowest proportion of Lys, compared with the 3 other Binggeli et al.: PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS models, averaging 6.51%, 6.82%, and 6.60% of MP for NRC_2001, CNCPS, and INRA_2018, respectively.This lower supply is probably caused by a lower value of Lys in MCP.The higher value of CNCPS for Lys supply is also explained by the high Lys concentration in non-cell-wall MCP, established at 8.2% of CP, as opposed to 7.4% in NorFor and 8.0% of 16 AA in INRA_2018 for MCP, respectively (Fox et al., 2004;Volden, 2011;INRA, 2018).In agreement with these results, His and Met duodenal flows predicted with CNCPS were higher than NRC (2001) predictions, whereas predictions of Lys, as a percent of CP, were similar (Pacheco et al., 2012).
For INRA_2018, suggested AA requirements are set at 2.4%, 7.0%, and 2.4% of MP for His, Lys, and Met, respectively (INRA, 2018).Optimal MPY production with NRC was initially established at 7.08% and 2.35% of MP for Lys and Met, respectively (NRC, 2001).Whitehouse et al. (2010) revised those recommendations to 6.95% and 2.38% of MP.However, those values for NRC_2001 may differ if the goal is maximum MPY or maximum milk protein content.The NorFor model indicates 2.2%, 6.4%, and 2.2% of MP for His, Lys, and Met, respectively, for optimal production.With CNCPS, however, assuming the factorial AA repartition of each expenditure and using AA efficiencies from Van Amburgh et al. ( 2015), AA requirements averaged, in our data set, 2.30%, 6.91%, and 2.41% of MP (SD: 0.02%, 0.05%, 0.08%) for His, Lys, and Met, respectively.Clearly, with variability of supply across models, recommendations to balance AA must be model-specific, as also demonstrated by Whitehouse et al. (2010).From those values, however, we could suggest that, with the diets fed on the farms of the current study, Met was the AA with the shortest supply, followed by Lys.

Model Performance Comparison
Visual appraisal of the relationship between measured and predicted MPY by the 4 FEM is given in Figure 1.The MPY measured on test day averaged 894 g/d, ranging from 211 to 1,601 g/d (Table 2).For results without herd correction, the NorFor model yielded the best fit, with the highest CCC at 0.82 and the lowest RMSE at 136.1 g/d, followed by CNCPS, NRC_2001, and INRA_2018 (Table 4).In their evaluation of the NorFor model, with 329 animal productions from literature, Volden (2011) observed a prediction error of 5.5%, smaller than in the current study, but with a 14% mean bias and 2% RB.Commercial context variations and precision in DMI values used are most likely the main differences in total bias difference, as analyses from literature used observed DMI.In a model comparison using 2 studies varying protein and energy supply, NorFor was also the model with the lowest RMSE, followed by INRA_2018, NRC_2001, and CNCPS (Lapierre et al., 2018).Also, in another comparison between NRC_2001 and NorFor (Broderick and Åkerlind, 2012), NorFor had a better prediction, with R 2 of 0.97 compared with 0.66 for NRC_2001, with 20 data points from literature.
The NRC_2001 model yielded a major central tendency bias and was the only model with RB.Both biases might be related to the fixed efficiency of utilization of MP used by NRC_2001.However, CNCPS, also using a fixed efficiency, did not yield central tendency or RB.The INRA_2018 underestimation is probably mainly driven by the low estimation of DMI, which directly yielded a lower total MP supply, or by the evaluation of the potential milk production.The DMI estimation from INRA_2018 was on average 4% lower than the average of the 4 models.Increasing each DMI in INRA_2018 by 10.6% to be set at NRC_2001 predicted DMI, predicted MPY would average 901 g/d, a value close to the observed MPY (894 g/d).On a validation data set, when observed DMI is used, no significant slope or intercept bias was observed between observed and predicted MPY, with a RMSE of 4.3% of measured mean (957 g/d) when including Lys and Met in their response (INRA, 2018).Similarly, as stated previously, when comparing the model using reported DMI values, INRA_2018 performed second, after NorFor but before NRC and CNCPS (Lapierre et al., 2018).The first hypothesis to explain this low DMI value may be the fill value of the forages, in particular grass silages; the French reference forages from the INRA_2018 table could have fill values too high for Québec grass silages.The iterative process of DMI and milk response is built to limit high intake of feed, leading to an overestimation of milk responses.This is developed under the principle that an animal will not produce more than its potential, by control of DMI prediction with the influence of protein and energy on intake capacity during the optimization process.The adjustment of the intake from energy and protein supply assumes that a lower ratio of MP: energy reduces intake capacity of animals and vice versa.The milk potential should represent the highest milk production if the cow is kept in an ideal environment to fully express its milk synthesis capacity (i.e., high ingestible, digestible diets with high NE L and MP density).In that scope it could be higher than the genetic potential of the cow, since genetic potential was not established in an ideal environment.Under commercial conditions in France, it has been suggested to increase peak potential by 10% to 15% in INRA_2018 (S.Lemosquet and L. Delaby, INRAe, St-Gilles, France, personal communication), instead of the 5% to 10% used in the present study from observed or expected milk production.From the stepwise process performed on peak milk production, every 5% change in milk peak value increased predicted DMI by, on average, 1.1% augmentation, which fits with the intake capacity equation prediction.However, a 5% change in peak milk production led to a 1.0% average augmentation in MPY, whereas 10% caused an average 1.5% augmentation, showing a nonlinear response of this value on MPY prediction.The lack of information on the whole lactation probably misled peak production estimation and, thus, MPY predictions.The NorFor MP variable calculated efficiency averaged 0.65, whereas INRA_2018 predicted a variable efficiency of 0.73 but underestimated MPY, still pointing toward too low an estimation of DMI to explain MPY underestimation or low MP supply concentration.
Herd-Corrected Analysis.Removing herd effect by centering both predicted and observed means, we could analyze whether the disturbance observed is generated by the variability of prediction across herds, or by inability of the model to predict MPY even within similar herd conditions.We observed a 30% average reduction in RMSE with herd correction.Analysis of the results where values were herd-corrected for intercept indicated that all models had similar results, with CCC ranging from 0.88 to 0.89 and RMSE from 106.8 to 111.4 (Table 4).The INRA_2018 model slightly underperformed compared with the other 3 models, indicating a higher within-herd variability or individual variation in prediction capability.These results reinforce the necessity of adapting diet formulation to each farm situation.This herd intercept correction removed mean bias, as we would expect.Within-herd slopes averaged 1.15 with NRC_2001 but were highly variable, ranging from 0.92 to 1.54, with only 2 herds below 1.This indicates, within a herd, a tendency to overestimate MPY at low production and slightly underestimate at high production.With INRA_2018, the average of the within-herd slopes was smaller at 1.08, but all slopes were positive (1.01 to 1.22).The CNCPS and NorFor within-herd slopes averaged 1.03 and 1.01, respectively, but NorFor had a smaller range, from 0.92 to 1.21, compared with 0.79 to 1.31 for CNCPS.
Univariate Residuals Analysis.Univariate analysis of residuals can serve 3 purposes: (1) to analyze whether a model underperforms in certain specific feeding conditions, such as diets with or without corn silage; (2) to determine herd-or animal-level factors affecting predictions; and (3) to extract missing information, which could be explored to help improve each model.Most factors tested in our study are linked to animals, but some are also directly or indirectly linked to herd or environment.Whereas animal factors tend to analyze individual data, herd and environmental factors affect analysis at the level of a group of animals and may be used to explain the herd-correction results presented previously.A positive slope indicates higher residuals at high values of the studied factor, whereas a negative slope indicates higher residual at low values and lower residual at high values.
Correlations between residuals and the variables tested were all low (between 0.01 and 0.17), and only those yielding R 2 higher than 0.05 for at least one model are presented in Table 5. Across all FEM, DIM and diet energy concentration yielded R 2 ≥ 0.05 with a negative slope.The similar relationship of the residuals with DIM across the 4 models might indicate a potential effect of DIM on the efficiency of MP utilization.Indeed, in a meta-analysis, DIM had a negative effect on the efficiency of utilization of MP, with cows in early lactation being more efficient than cows in late lactation (Lapierre et al., 2020), in line with the negative relationship observed in the current study.None of the 4 models currently includes DIM in MPY prediction, directly or indirectly through estimation of efficiency.
The consistent pattern between diet energy concentration and residuals is counterintuitive.Indeed, this relationship indicates that at low diet energy concentrations, there is underestimation of MPY, whereas at higher supplies, an overestimation occurs.This goes against general observation from Lapierre et al. ( 2020) and Daniel et al. (2016), where the efficiency of utilization of MP increases with energy supply.However, in those latter analyses, the results were found in a multivariate analysis with interactions with MP supply or quadratic term, or both, which makes it impossible to fully interpret.However, with the current data, for NRC and CNCPS, a significant negative relationship was observed between the residuals and the MP: energy supply ratio (R 2 ≥ 0.05; data not shown).This indicates the importance of taking into account the interaction  (Lin, 1989).
3 Cb = bias correction factor (Lin, 1989).and relationship between MP and energy supply when predicting the efficiency of utilization of MP.Other than the possible effect of energy on the efficiency of utilization of MP, higher energy concentration of diets has been shown to reduce total DMI in beef cattle (Anele et al., 2014;NASEM, 2016), and in INRA_2018, energy supply, in relation to MP supply, is used to predict DMI, where increased supply of energy reduces DMI at a similar MP intake.This is, however, not the case for NRC_2001, CNCPS, and NorFor.Thus, the effect of energy concentration may also be linked to bias DMI predictions, and thus to total MP supply.Residuals were positively correlated with both Lys and Met, as percentage of MP, in NRC_2001 and achieved an R 2 of 0.05 only for Lys in NorFor, in line with the recommendations for these AA.The counterintuitive result on His in NRC_2001 and CNCPS can possibly be explained by the negative correlation of His concentration with Met and Lys concentration with NRC_2001.For INRA_2018, which already considers the percentage of Lys and Met in its MPY prediction, residuals did not correlate well with these EAA.Finally, although the relationship between genetic potential for MPY (protein EBV, kg) and residuals is not strong (0.02 < R 2 < 0.05) but is significant for all models, its positive relationship indicates that MPY of cows with low genetic potential were overestimated and MPY of cows with high potential were underestimated by all models.This consistency suggests that cows with high EBV are more efficient.We could then hypothesize that cows with higher potential for milk and milk protein yields use nutrients more efficiently to produce milk protein.As genetic values of the cows are usually not reported in scientific literature, data from commercial farms could be used to define how to include this important parameter in FEM to improve their accuracy in commercial conditions.Recently, Richardson and Van Doormaal (2021), in the establishment of a feed efficiency genetic index in Canada, showed a slight cor-relation of feed efficiency with protein-production-yield index (r = 0.20).This recent observation goes in the same direction of the current study, where cows with high MPY EBV would be more efficient.

Sensitivity Analysis
Sensitivity analysis was performed to represent the general uncertainty occurring from the main variables collected on farms, as a risk management indicator.Figure 2 presents the results of the sensitivity analysis for the predictions of MP supply and MPY, with each characteristic tested with the variation indicated in Table 1.

Cow Characteristics
Dry matter intake was the characteristic with the largest effect on predictions of MP supply and MPY for the 4 FEM.A 6% variation in DMI changed predicted MP supply median by 4.4% to 7.2% and predicted MPY median by 3.9 to 8.4% in the same direction (Figure 2).Although DMI variation had the lowest influence on predictions of MP supply for INRA_2018 and NRC_2001, with a median of 4.4%, it had the largest range of effect, compared with the other models.
Dry matter intake affected maintenance requirements with the 4 models, because of its effect on the estimation of metabolic fecal protein, which, on average, represented 66%, 85%, 76%, and 67% of the so-called maintenance expenditure in NRC_2001, CNCPS, Nor-For, and INRA_2018, respectively.However, the positive effect of increased DMI on MP supply is larger than the negative effect on the increment of maintenance requirements, and increment of DMI overall increased predictions of MPY, whereas decreased DMI decreased predicted MPY.Thus, this sensitivity to DMI indicates the critical importance of correctly determining DMI on farms to improve MPY predictions.This also in- dicates that all factors affecting DMI estimation will have an influence on the predictions of MP supply and consequently on MPY.Body weight is the second most important factor in the predictions of MP supply and MPY for CNCPS and NRC_2001, mainly due to its effects on DMI estimation.Indeed, increasing BW by 8% increased predicted DMI by 4.7% and 2.6%, for CNCPS and NRC_2001, respectively, whereas the reduction in BW reduced DMI in a similar fashion (data not shown).Accordingly, increasing BW increased the median of predicted MPY by 3.6% and 2.2% for CNCPS and NRC_2001, respectively.It also affected INRA and NorFor predicted MP supplies by 2.6% and 1.6% and MPY predictions by 1.2% and 2.0%, where BW variation induced 3.0% and 3.2% change in DMI for both European systems.Body weight is acknowledged in the 4 models as directly or indirectly affecting feed and nutrient passage rates in the rumen, thus altering RDP and RUP proportion of feed ingredients and their respective rumen degradability, affecting the prediction of MP supply.The influence of a variation of the measurement of milk production (2% variation) for NRC_2001, CNCPS, and NorFor, or potential milk production with INRA_2018, is limited to 0.5% to 1% change in predicted MPY.It is linked to the higher predictions of MP and energy supplies, with an increased predicted DMI linked to the level of milk production.Variation of milk protein concentration analysis only significantly affected INRA_2018 MPY predictions, with a minor effect of 0.3% on predictions of MP supply and MPY for NorFor.For INRA_2018, the MPY response system is based on an average potential milk protein concentration, as a central or pivot value.The response varies from the potential milk protein value.Thus, a positive or negative change influenced the initial value of the response system, with little to no effect on the intake.

Feed Ingredient Characteristics
Forage CP concentration is the feed ingredient characteristic with the largest positive influence on predictions of MP supply and MPY with the NRC_2001, INRA_2018, and NorFor models.With these models, an 8% change of forage CP changed predicted MP supply median by 1.3% to 3.1% and predicted MPY medians by 1.8% to 3.0% (Figure 2).In the CNCPS model, these increments were still positive but with medians of only 0.5% and 0.4% for MP supply and MPY, respectively.Increasing the CP concentration in concentrate also increased predictions of MP supply and MPY, but to a lesser extent than in forages, except in the CNCPS model.Crude protein concentration of both forages and concentrates positively affects RUP supply, thus increasing MP supply for the 4 models.For NRC_2001, unless RDP is limiting MCP production, increasing RDP supply from increased CP concentration in forages does not alter MP supply.Microbial crude protein is mostly not affected.In fact, from NRC_2001 calculations, only 6 cows had RDP as the limiting factor on MCP production in the initial scenario.Also, changes in CP inversely affect NFC, when all other components do not change.As they both have similar energy concentrations, MCP is also not affected by that change Changing forage NDF concentration by 7% had no effect on the predictions of MP supply and MPY in INRA_2018, but reduced these predictions by 0.3% to 2.2% and by 0.6% to 3.6%, respectively, in the 3 other models (Figure 2).For NRC_2001, this occurred through a small reduction of TDN, reducing MCP production.With CNCPS and NorFor, the effect of changing forage NDF concentration is larger than with NRC_2001.The stronger influence of NDF variation for NorFor is suspected to be caused by the total NDF supply, which also affects the efficiency of production of MCP by reducing NFC calculated from other components, and was not measured.Also, because in both models metabolic fecal protein requirements are estimated from undigested OM, and NDF is usually less digestible than NFC, 8% more NDF in forage led to increments of maintenance requirements by 0.8% and 1.3%, decreasing MP available for MPY.For INRA_2018, the null effect in our analysis is suspected to be caused by the absence of factors directly affected by forage NDF.As aforementioned, for INRA_2018 crude fiber content was estimated from ADF and not NDF.The flow of digestible NDF (in g/kg of DM) is then calculated from ADF, and therefore NDF by itself does not affect MCP estimation.The effect of a 10% variation in NDF concentration of concentrate is overall very small for the 4 models, ranging from almost null to −0.3 for MP supply and from near 0 to −0.45 for MPY.As most concentrates usually have low NDF concentration, 14.2% on average (Table 2), low effects were expected.However, variability effects and global differences between forages and concentrates can be explained by the forage-to-concentrate ratio.
Forage maturity had, on average, a small effect with the 4 models.In NRC_2001 feed tables, the lower digestibility of RUP of more mature forages is the main factor explaining this effect, where changing maturity by 1 level induced, on average, a 5% variation of digestibility.Rumen degradability rate is also lowered at higher maturity in table values, which reduces RDP and favors RUP.As discussed previously, as long as RDP is sufficient and TDN does not change, under the NRC_2001 principle, MCP also does not change.Therefore, increasing forage maturity will result in more RUP but makes it less digestible, explaining why, for most predictions, changing maturity has a very small effect.Although forage maturity had a small effect, on average, on their chemical composition, in 25% of cases, a change of 1 level in forage maturity induced a change of more than 2.4% in MP supply, where more mature forages would decrease MP supply, inducing a decrease in MPY prediction by 4.2% in NRC_2001 (data not shown).In the CNCPS and NorFor models, there is no maturity factor per se, but forage selection is mainly made upon NDF and CP concentrations; thus, changing maturity of forages had no relevance for these models.In INRA_2018, by contrast, although in reference feed values maturity is important to select the right forage, protein degradability and digestibility are not affected by this category once the selection is made, but by chemical composition.In this model, maturity mainly affects AA concentrations of the forages but also fill value of forages.
In literature, experiments considering variations of NDF or CP in forage or concentrate, and maintaining the same diet, are not common.Usually, the whole diet NDF or CP concentrations, or both, are altered, through changes in feed ingredients, either silage type or concentrate type.Therefore, it is hard to experimentally isolate the effect of a variation in concentration only.In our sensitivity analysis, increasing NDF or CP concentration led to increased NDF or CP intake, as DMI was not altered when NDF or CP concentrations were changed in the simulations.However, it has been reported that an augmentation in diet NDF tended to lower milk production, as well as milk protein con- centration, reducing MPY (e.g., Kendall et al., 2009;Hammond et al., 2016).Kendall et al. (2009) observed a significant reduction in milk protein concentration by 7% for a 15% increment in diet NDF, whereas Hammond et al. (2016) observed a 2% reduction for an 11% increment of NDF concentration.Milk protein yield was reduced by 13% and 5%, respectively (Kendall et al., 2009;Hammond et al., 2016).A reduction in MCP can also be suspected, as a 15% augmentation of ruminal NH 3 concentration was observed by Kendall et al. (2009) when NDF was increased, indicating poorer use of degradable N to support MCP synthesis.In the literature, increasing CP concentration or intake of the diet is usually reported to have a positive effect on MPY, with a quadratic component, having a larger effect at lower CP concentrations but a smaller effect at higher CP concentrations (Wu and Satter, 2000;Law et al., 2009).These observations support the general tendencies observed in this sensitivity analysis.
The NorFor model was the most sensitive to possible error in estimation of legume content of forage, with 7.6% change in predicted MPY for 1 level change in legume content.Similarly to NRC and INRA_2018, more legumes tended to reduce MP supply for the same chemical composition.This parameter is not relevant for the CNCPS model, as the table values offer only one type of mixed forage, with no reference to the proportion of each type of forage.
The INRA_2018 model was the only model where ADF concentration of the diet affected predictions, on average 0.6% on MP supply and −3.1% for MPY.In INRA_2018, increasing ADF decreased estimated OM digestibility and gross energy concentration of feed ingredients, both having negative effects on predictions of MP supply and MPY.Indeed, estimations of 3 main parameters are directly affected by OM digestibility: digestible energy, forage filling potential (and thus DMI), and MCP.In the range of tested variation, only forage characteristics (i.e., NDF, CP, legume content, and maturity level) showed some asymmetry between positive-and negative-variation sources.Thus, beside forage characteristics, positive and negative variations have similar absolute effects.

Uncertainty Analysis
According to Loucks et al. (2005), an uncertainty analysis is used to attempt to describe the whole set of possible outcomes along with their probability of occurrence.Uncertainty analyses can provide 2 main types of information: ranges of potential outputs and probability of each output.They often accompanied sensitivity analysis for their complementary analyses and interpretation.Indeed, although all models predict a single MPY for a given diet fed to a given cow, the uncertainty attached to each input into the model should generate uncertainty on that prediction, which is usually ignored.The purpose of this analysis was to estimate the level of uncertainty of the predictions of MP supply, MPY, MP from RUP and MCP, and maintenance expenditure in response to the combined uncertainties of the measures taken on the farm with the 4 FEM.
Uncertainty analysis values were all reported as the percentage of difference from the value of the initial scenario.Results of uncertainty analyses are based on the proportion of change from initial value because of the large difference between animals.This method allows a certain generalization of the uncertainty.
Measurement of MPY is affected by 2 factors-milk production and milk protein concentration-and is not model dependent.The distribution of possible values of the differences from initial values is normally distributed with a median approximately equal to 0 and little to no skewness (0.07).The SD of MPY was calculated as 4.46%, which equates to the variance of the product of 2 independent variables (Goodman, 1960).Thus, a confidence zone for MPY can be created with procedures and equipment calibrations.
However, due to the multilevel factors in diet formulation, this variance for the product of 2 variables cannot be used to establish a confidence interval for the predictions of MP supply, MP for maintenance expenditure, and MPY.Uncertainty analysis results are depicted in Figure 3. Based on skewness, CNCPS, INRA_2018, and NorFor uncertainty variations can be considered normally distributed for total MP supply.Only NRC_2001 is slightly skewed and long tailed, indicated by high skewness and kurtosis.From the previous sensitivity analysis, we can hypothesize that this slight non-normality occurred because of the highly variable effect of forage maturity and from the abnormal uncertainty observed on microbial MP, as discussed subsequently.The NorFor model had the largest uncertainty, whereas INRA_2018 had the smallest SD in its variation.This smallest value is probably due to the nonlinearity of the response system of INRA_2018.The response system, as noted previously, is designed as a change from a central value, set as the potential, and the gain from increased quality of a diet or increased intake decreases as the production gets closer to this potential.As diets fed to the cows were initially balanced to meet requirements from NRC_2001, they were not designed to test a response to diet change and to evaluate the capabilities of such diet to adapt to diet changes.Based on INRA_2018 predictions, a certain proportion of MPY was underpredicted, and these points are most likely those most sensible to uncertainty.Both NRC_2001 and INRA_2018 had similar curves of probability for MP from RUP, with normal distribution with low skewness (0.4 and 0.2; graphs not shown), and respective SD of 9.1% and 9.7%.Although NorFor RUP variations can also be considered normally distributed, it had the largest SD, 17.2%.From the sensitivity analysis on MP supply, we observed that the NorFor model was more sensitive to many characteristics of the cows or feed ingredients than the other models, indicating that at least one or both of the contributors to MP, RUP and MCP, were also probably more sensitive.By contrast, CNCPS has a large kurtosis, at 3.5, but small skewness, making it slightly abnormal.In comparison, in their study on the sensitivity to CNCPS feed library changes, Higgs et al. (2015) obtained 5.2% SD variation on RUP supply.This variation is lower than the 10.6% in the present study, but DMI was fixed in their simulations, and, as shown in the sensitivity analysis, DMI plays a major role in the uncertainty of total supply.
For CNCPS, NorFor, and INRA_2018, MP from MCP can be considered normally distributed with low skewness, with SD ranging from 6.3% to 10.5%.On the other hand, NRC_2001 had its distribution skewed (−1.3), most likely related to the use of a limiting factor, either TDN or RDP, in MCP estimation.Higgs et al. (2015) calculated a 3.6% SD for microbial MP in CNCPS, but, similarly to RUP, they did not consider a variation in DMI, explaining their lower value compared with our observation of 8.5.
In the 4 models, maintenance MP expenditure variation presented little skewness.However, although NRC_2001, CNCPS, and NorFor were normally tailed, INRA_2018 had very long tails, indicated by high kurtosis.Although most of the 5th and 95th quantiles were the smallest of the 4 models, only a very few results created those long tails.When more precisely analyzed, the values outside the 0.001 and 0.999 quantiles, being ±20.8% of variation, were derived from 2 animals, in 2 different herds.Thus, the long tails were explained by 2 individual variations, which could be considered as outliers in INRA_2018, although they were not with the other models.Thus, with no skewness and no kurtosis, NRC_2001, CNCPS, and NorFor maintenance expenditure variations can be approximated as normal distributions with mean and median of 0 and SD of 5.7%, 7.2%, and 6.4% respectively.
Finally, only the distribution for MPY predictions from CNCPS and INRA_2018 can be considered normal.Reflecting observations for MP supply, INRA_2018 had the lowest variation with SD of 7.0%.The same reasoning would apply as for MP supply: DMI is affected by MP and energy intakes, to which we would add the effect of variable efficiency of protein from the response system.These 2 parameters combined seemed to induce some stability in MPY prediction.The CNCPS SD was also low, at 8.8%.The NorFor MPY prediction was slightly abnormal, with kurtosis of 2.9.In line with MP supply, the Scandinavian model had the widest repartition, with the 5th and 95th quantiles reaching −26% and 27% of initial values.The NorFor model includes a variable efficiency of utilization of MP, as indirectly done in INRA_2018, and small changes in the MP: energy ratio affected the efficiency of utilization of MP.Predicted efficiency changes also followed a normal distribution, with a mean of 0 and SD of 8% from initial values (data not shown).However, this was not the case in INRA_2018, because of limits of potential milk production and autoregulation of intake from energy and protein supply.Finally, NRC_2001 MPY prediction uncertainty was skewed to the right and also had the longest tail.In a similar fashion to MP supply, this skewness would be explained by the irregular effect of forage maturity observed in the sensitivity analysis of MP supply.Uncertainty analysis indicated that Nor-For had the highest uncertainty distributions for most outputs, and thus higher risk of error when data input in the model are of bad quality.On the other hand, INRA_2018, due to its milk/milk protein response system, had the lowest uncertainty.

Limits on Comparison
St-Pierre and Weiss (2012) have criticized the requirement-based feeding evaluation models such as NRC_2001 and CNCPS, claiming that the information they bring is often misinterpreted.As they stated, this type of model does not imply that a change in supply will induce direct changes in production, relative to requirements.Requirement-based feeding evaluation models are better at formulating diets for production objectives, as used on commercial farms, than at prediction.We should thus be cautious when we extrapolate values from supplies below or above the predicted requirements.In the current comparison, MPY predictions can be perceived as how much MPY diets fed could sustain on a general basis, ignoring animal genetic potential and farm-management or environment effects.The latter is a partial explanation of the herd effect, calculated in prediction.This is mostly true in a commercial context, where, usually, a core TMR is balanced based on the characteristics of a reference animal to cover the requirements of most animals, as suggested by Stallings and McGilliard (1984).This type of practice most likely misestimates MPY if nutrients are not "balanced" in relation to DMI and productivity, as opposed to ACF or traditional feeding, where concentrates are fed according to a target milk production for each animal.
In estimation of milk yield or MPY in requirementbased FEM, with unknown DMI or animal efficiency, using measured milk production creates a loop to estimate DMI, which in turn may bias results toward better prediction than reality.It creates a paradox, leading to the question of what drives what.Does milk yield, and by extension MPY, drive animals to eat more, or will an animal eating more produce more, or is it a mixture of both?The NRC_2001 model already addressed this issue when explaining the choice of their equation for DMI.The response approach used by INRA_2018, considering this paradox, seems promising; the integration of a more flexible effect of potential milk protein and a well-established protocol for potential production at the peak may help to improve response prediction.Finally, all results obtained in this study are only responses to mathematical equations created within each model.

CONCLUSIONS
Although milk protein prediction capabilities of the 4 models compared are fairly similar, based on concordance correlation coefficient, the Nordic model NorFor overall yielded the best predictions of MPY with data from commercial farms.The INRA_2018 and NRC_2001 models had major mean biases (under-prediction), with INRA_2018 being slightly more precise but less accurate.The underestimation with INRA_2018 seems to be linked to a systematic underestimation of DMI: although real DMI were unknown, INRA_2018 predictions were much lower than predictions from the 3 other models.The NRC_2001 and CNCPS models could benefit from inclusion of the effects of energy and AA supply on the efficiency of utilization of protein, as done by NorFor and INRA_2018.Although it has a major influence on MPY prediction, DMI estimation on commercial farms remains a challenge, to adequately evaluate model predictions.However, analyzing model behavior within a herd seems to partly negate the effect of using predicted DMI in the comparison of models; this indicates how important it is that advisors adapt the formulation program to each herd.

Figure 1 .
Figure 1.Relationship of predicted milk protein yield and measured milk protein yield from NRC_2001 (upper left; National Research Council, 2001), CNCPS (upper right; Cornell Net Carbohydrate and Protein System, 2015), NorFor (bottom left; Scandinavian model, 2011), and INRA_2018 (bottom right; INRA French model); each point represents an animal and production, and each shape and line represent a different herd.

Figure 2 .
Figure 2. Quartiles 1, 2, and 3 of predicted MP supply (top), and predicted milk protein yield (bottom) for the variations of the different characteristics for the 4 models.NRC_2001 = National Research Council, 2001; CNCPS = Cornell Net Carbohydrate and Protein System, 2015; NorFor = Scandinavian model, 2011; INRA_2018 = INRA French model.Quartile 2 (median) is represented by the colored bar, quartiles 1 and 3 by the error bars.
included in INRAtion V5 software, using measured DM, NDF, ADF, ether extract, starch, ash, and minerals, and a reference feed from INRA_2018 feed table in the Prevalim model.

Table 1 .
Binggeli et al.: PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS Variations applied on each characteristic in the sensitivity analysis

Table 2 .
Binggeli et al.: PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS Descriptive statistics of cows and farms used for analysis Lapierre et al. (2018) fed by hand as top-dress; ACF = automatic concentrate feeding as top-dress.3SixdifferentTMRin 4 herds, 17 herds with ACF, and 2 herds with manual feeding.mate,with99 g of MP/kg of DMI, followed by NorFor at 94 g/kg, NRC_2001 at 90 g/kg, and INRA_2018 at 87 g/kg.Therefore, compared with CNCPS, the lower prediction of MP supply with INRA_2018 was more due to the lower ratio of MP supply: DMI (10%) than to the lower prediction of DMI (3%).In a model comparison byLapierre et al. (2018)using 9 diets varying in protein and energy supply from literature, with reported DMI, NRC_2001 yielded the highest or secondhighest predictions, whereas INRA_2018 predicted the lowest or second-lowest predictions of these 4 FEM, in line with our on-farm results.

Table 3 .
Binggeli et al.: PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS Predictions of DMI and MP supply and expenditure (exp.) and AA supply by 4 models 1 2 MCP = microbial crude protein.3 Milk protein exp.= observed MPY/calculated efficiency.

Table 4 .
Binggeli et al.:PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS Performance of the 4 models for predictions of milk protein yield 1 2 CCC = concordance correlation coefficient

Table 5 .
Binggeli et al.:PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS Univariate residual analysis, R 2 , and sign of the regression for each model 1 Binggeli et al.: PREDICTIONS OF PROTEIN YIELD FROM FEEDING MODELS