Methodological guidelines: Cow milk mid-infrared spectra to predict GreenFeed enteric methane emissions

Various methodological protocols were tested on milk samples from cows fed diets affecting both methanogenesis and milk synthesis to identify the best approach for the prediction of GreenFeed system (GF) measured methane (CH 4 ) emissions by milk mid-infrared (MIR) spectroscopy. The models developed were also tested on a data set from cows fed chemical inhibitors of CH 4 emission [3-nitrooxypropanol (3NOP)] that only marginally affect milk composition. A total of 129 primiparous and multiparous Holstein cows fed diets with different methanogenic potential were considered. Individual milk yield (MY) and dry matter intake were recorded daily, whereas fat- and protein-corrected milk (FPCM) was recorded twice a week. The MIR spectra from 2 consecutive milkings were collected twice a week. Twenty CH 4 spot measurements with GF were taken as the basic measurement unit (BMU) of CH 4 . The equations were built using partial least squares regression by splitting the database into calibration and validation data sets (excluding 3NOP samples). Models were developed for milk MIR spectra by milking and on day spectra obtained by averaging spectra from 2 consecutive milkings. Models based on day spectra were calibrated by using CH 4 reference data for a measurement duration of 1, 2, 3, or 4 BMU. Models built from the average of the day spectra collected during the corresponding CH 4 measurement periods were also developed. Corrections of spectra by days in milk (DIM) and the inclusion of parity, MY, and FPCM as explanatory variables were tested as tools to improve model performance. Models built on day milk MIR spectra gave slightly better performances than those developed using spectra from a single milking. Long duration of CH 4 measurement by GF performed better than short duration: the coefficient of determination of validation (R 2 V) for CH 4 emissions expressed in grams per day was 0.60 vs. 0.52 for 4 and 1 BMU, respectively.
When CH 4 emissions were expressed as grams per kilogram of dry matter intake, grams per kilogram of MY, or grams per kilogram of FPCM, performance with a long duration also improved. Coupling GF reference data with the average of milk MIR spectra collected throughout the corresponding CH 4 measurement period gave better predictions than using day spectra (R 2 V = 0.70 vs. 0.60 for CH 4 as g/d on 4 BMU). Correcting the day spectra by DIM improved R 2 V compared with the equivalent DIM-uncorrected models (R 2 V = 0.67 vs. 0.60 for CH 4 as g/d on 4 BMU). Adding other phenotypic information as explanatory variables did not further improve the performance of models built on single day DIM-corrected spectra, whereas including MY (or FPCM) improved the performance of models built on the average of spectra (uncorrected by DIM) recorded during the CH 4 measurement period (R 2 V = 0.73 vs. 0.70 for CH 4 as g/d on 4 BMU). When validating the models on the 3NOP data set, predictions were poor without (R 2 V = 0.13 for CH 4 as g/d on 1 BMU) or with (R 2 V = 0.31 for CH 4 as g/d on 1 BMU) integration of 3NOP data in the models. Thus, specific models would be required for CH 4 prediction when cows receive chemical inhibitors of CH 4 emissions not affecting milk composition.

INTRODUCTION
Breeding and husbandry strategies to reduce enteric methane (CH 4 ) emissions by dairy cows require the estimation of individual CH 4 emissions from a large number of animals and farms to be successful. Measuring individual daily enteric CH 4 emissions using the classical reference techniques [i.e., respiration chamber (RC) or sulfur hexafluoride (SF 6 ) tracer gas technique] has proven difficult, expensive, and not feasible on a large scale, and proxies are required (Negussie et al., 2017). Among other proxies, mid-infrared (MIR) spectroscopy on milk has been identified as promising (Dehareng et al., 2012; Vanlierde et al., 2018, 2021), as it is rapid and low-cost and is currently routinely applied to the recording of cow milk. These authors published MIR prediction models using CH 4 reference data from RC and SF 6 methods. However, among the different CH 4 measurement techniques, the GreenFeed system (GF; C-Lock Inc.) appears to be the most appropriate for application on commercial farms: it allows a high throughput of animals and can measure over long periods. As MIR equations need to be consolidated routinely, it is thus expected that equations based on GF data will be the easiest to implement in the near future, but such equations have not yet been developed.
Both RC and SF 6 techniques allow us to measure the quantity of CH 4 produced continuously over 24 h, integrating the diurnal pattern of emission and allowing the cumulated daily CH 4 emission to be calculated. Thus, a daily CH 4 emission value is easy to compare with the corresponding milk MIR spectra to build prediction models. However, the GF allows estimation of average daily CH 4 emissions over a period of variable duration from spot samples of breath gas taken when animals visit the GF system at different time points spread over day and night. Accordingly, several studies have been conducted to understand how to manage GF data to achieve representative and repeatable CH 4 estimation (Arbre et al., 2016; Rischewski et al., 2017; Coppa et al., 2021). As repeatability of CH 4 measurement using GF increases with long measurement duration, we hypothesize that the reliability of CH 4 prediction by milk MIR may vary as well with the duration of CH 4 measurement. However, little is known about the best GF measurement duration to be coupled to milk MIR spectra to successfully build CH 4 prediction equations. Similarly, no research has identified which spectra best match a CH 4 measurement by GF covering several days (i.e., spectra by milking, average day spectra, a single day spectrum, or the average of spectra over the corresponding GF measurement period). Because GF is an estimation over an extended time period, we hypothesize that an average of several spectra collected over the period might better match the corresponding GF CH 4 measurement than a single day spectrum. Guidelines on this topic are still lacking in the literature. Furthermore, Vanlierde et al. (2016, 2021) have proven that the performance of MIR prediction models for CH 4 from SF 6 and RC reference data can be improved by including in the calibration phenotypic variables [such as lactation stage, parity, milk yield (MY), and so on] describing the physiological status of the cow on the day of CH 4 measurement. However, as MY and lactation stage vary over time, we hypothesize that including such phenotypic information as additional explanatory variables in CH 4 prediction models based on long period measurements as with GF might be less effective, but this aspect has yet to be proven.
Finally, the capacity to predict CH 4 from milk MIR spectra is based on the assumption that milk composition is related to CH 4 emissions. Indeed, existing models have included only data from cows fed diets affecting both methanogenesis and milk synthesis (Dehareng et al., 2012; Vanlierde et al., 2016, 2021). Thus, a specific response when predicting CH 4 from milk MIR spectra from dairy cows supplemented with additives affecting methanogenesis only [such as the commercially available 3-nitrooxypropanol (3NOP); Kim et al., 2020; Yanibada et al., 2020] is expected. Consequently, we hypothesized that diets affecting methanogenesis but not milk synthesis may represent a possible limit of application of predictive models in the field. This has yet to be studied.
The first aim of the present work was to test various methodological protocols, in terms of both GF data and milk MIR spectra, to identify the best approach to predict enteric CH 4 emissions from milk MIR spectra using the GF reference. A secondary aim was to test the application of the models on milk samples from cows fed chemical inhibitors of CH 4 emissions that do not affect milk composition.

MATERIALS AND METHODS
Animal procedures were carried out in accordance with the French Ministry of Agriculture guidelines for animal research and the applicable European Union guidelines and regulations on animal experiments.

Animals and Diets
The present study was carried out using spectra and reference data from 3 different experiments carried out between 2017 and 2020. The first experiment (described in detail by Coppa et al., 2021) was conducted at the experimental dairy farm of Les Trinottières (Chambre Agriculture, Montreuil sur Loir, France) with 45 lactating Holstein dairy cows of parity ranging from 1 to 7. Cows were enrolled within the first week of lactation and randomly distributed in 3 groups following a randomized block design balanced for parity and MY during the first week of lactation. The experiment was conducted as a continuous design from wk 2 to 27 of lactation with 3 successive periods: (1) pre-experimental period (wk 2 to 6); (2) dietary treatment transition period (wk 7 to 11); (3) experimental period on dietary treatments (wk 12 to 27). During the pre-experimental period, all the cows received a common diet based on corn silage, grass silage, and concentrates. Then, the 3 groups received different dietary treatments (% on a DM basis): (1) a diet based on grass silage with low starch (1%) and low lipid (3%) content; (2) a diet based on corn silage and concentrates, containing high starch (25%) and low lipid (3%) content; and (3) a diet based on corn silage and concentrates, containing high starch (24%) and high lipid (5%) content.
The other 2 experiments were carried out at the experimental farm of Herbipôle (INRAE, Marcenat, France, https://doi.org/10.15454/1.5572318050509348E12). In the second experiment, described by Pourazad et al. (2021), 56 mid-lactating multiparous Holstein cows (120 ± 46 DIM) were allocated to 4 equivalent groups, balanced for parity and MY. The experiment was conducted as a continuous design with 3 successive periods: (1) pre-experimental period (4 wk); (2) dietary treatment transition period (2 wk); (3) experimental period on dietary treatments (10 wk). During the pre-experimental period, all the cows received a common diet based on hay, haylage, and concentrates. Then, 3 groups were supplemented with different phytogenic feed additives (25 g/cow per d; based on cinnamaldehyde, condensed tannins, and garlic oil) acting both on methanogenesis and milk synthesis, and the fourth group, without feed additives, was used as a control.
In the third experiment, described by Saro et al. (2019), 28 lactating dairy cows were recruited within the first week of lactation and randomly distributed in 2 groups, balanced for parity and MY during the first week of lactation, in a randomized block design. Both groups received the same diet based on corn silage, hay, and concentrates, but one group was supplemented with 3NOP (60 mg/kg of DM basis) and the other with a placebo from the second lactation week. The trial lasted for 14 lactation wk (from wk 14 to 28).
All the experimental diets of the 3 experiments acted both on methanogenesis and milk synthesis (Coppa et al., 2021; Pourazad et al., 2021), except for the diet supplemented with 3NOP in experiment 3, which affected only methanogenesis and not milk synthesis (Saro et al., 2019).
In all the experiments, animals were housed in freestall barns and had free access to water throughout the experiment. The barn was open at the sides for good natural ventilation. Cows were fed individually using an electronic gate feeding system and ear-tag identification. Feedstuffs were distributed to cows ad libitum in the form of a mixed ration, once daily after the morning milking, except for the concentrate distributed by the automatic feeder of the GF. The ration offered and refusals were weighed and recorded daily throughout the experiment to estimate TMR intake. Total DMI was obtained by adding the daily amount of commercial concentrate from the GF to ration intake, corrected by their DM content, determined through oven drying (160°C for 48 h).

Methane Measurements Using GreenFeed
Methane emissions from dairy cows were individually estimated using the same 2 coupled GF units in the 3 experiments. The GF system allowed spot measurements of exhaled gases emitted by individual animals, identified by a radio frequency ear-tag, during visits to the system. Each GF was fitted with one hopper continuously filled with pelleted concentrate (4 mm diameter and 15 mm length) used as a bait to attract animals and keep the head correctly positioned (head sensor) for at least 3 min in the open-circuit head chamber. The GF instrument characteristics and settings are detailed in Coppa et al. (2021). The GF units were set (number of visits per day, duration and time interval between visits, number of concentrate drops per visit, and so on; details given in Supplemental Figure S1, https://figshare.com/articles/figure/Visiting_profile_over_24_h_of_GreenFeed_unit/21082537; Coppa et al., 2022) to achieve at least 20 spot measurements per cow in 1 wk (experiments 1 and 3) and 3 wk (experiment 2). The average over 7 to 14 d with a minimum of 20 spot samples with GF is considered the minimum period [basic measurement unit (BMU)] to produce repeatable and reliable averaged daily CH 4 emissions (Manafiazar et al., 2016). One BMU was therefore considered equal to a duration of 1 wk in experiments 1 and 3, and 3 wk in experiment 2, which ensured at least 20 spot measurements per cow.
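The BMU rule described above can be illustrated with a minimal Python sketch. It assumes that the spot measurements of one cow are already expressed in grams per day and grouped by BMU; the function names and the choice of averaging BMU means (rather than pooling all spots) for multi-BMU periods are hypothetical and are not taken from the original protocol.

```python
from statistics import mean

MIN_SPOTS = 20  # minimum number of spot measurements per BMU, as stated in the text


def bmu_average(spot_values, min_spots=MIN_SPOTS):
    """Mean CH4 emission (g/d) for one BMU, or None if fewer than min_spots spots."""
    if len(spot_values) < min_spots:
        return None
    return mean(spot_values)


def multi_bmu_average(spots_per_bmu, min_spots=MIN_SPOTS):
    """Mean CH4 emission over several consecutive BMU (e.g., 1 to 4).

    Assumption: the period value is the mean of the BMU means; the period is
    rejected (None) if any BMU lacks the minimum number of spots.
    """
    bmu_means = [bmu_average(spots, min_spots) for spots in spots_per_bmu]
    if not bmu_means or any(value is None for value in bmu_means):
        return None
    return mean(bmu_means)
```

For example, `multi_bmu_average([week1_spots, week2_spots])` would give the reference value for a measurement duration of 2 BMU in experiments 1 and 3, where 1 BMU corresponds to 1 wk.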

Milk Sampling, Analysis, and Spectra Collection
Cows were milked twice daily and MY was individually recorded at each milking. Twice a week, an individual sample (30 mL), obtained at each milking in experiments 2 and 3 or by mixing the milk of the evening milking (50% volume) and of the following morning milking (50%) in experiment 1, was stored at +4°C with Bronopol (2-bromo-2-nitropropane-1,3-diol) for determination of milk fat and protein concentrations by MIR, following the International Dairy Federation (2000) protocol. Fat- and protein-corrected milk (FPCM) was calculated according to Gerber et al. (2011). Milk MIR spectra were collected by 2 Milkoscan FT-Plus analyzers (Foss), working within the MIR region from 5,000 to 1,000 cm −1 , from Analis for experiment 1 and from Agrolab for experiments 2 and 3. The spectra were standardized among laboratories and over time through the "Optimir" standardization protocol (Grelet et al., 2017).
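As an illustration of the FPCM calculation, the following Python sketch uses the coefficients commonly attributed to this standardization (milk corrected to 4.0% fat and 3.3% protein). The coefficients and the function name are assumptions made for illustration and should be checked against Gerber et al. (2011) before use.

```python
def fpcm(milk_yield_kg, fat_pct, protein_pct):
    """Fat- and protein-corrected milk (kg/d).

    Assumed formula: FPCM = MY x (0.337 + 0.116 x fat% + 0.06 x protein%),
    which returns approximately MY for milk at 4.0% fat and 3.3% protein.
    """
    return milk_yield_kg * (0.337 + 0.116 * fat_pct + 0.06 * protein_pct)


# Example: 30 kg/d of milk at 4.2% fat and 3.1% protein
example_fpcm = fpcm(30.0, 4.2, 3.1)
```

With these coefficients, milk richer in fat or protein than the standard gives an FPCM above the recorded MY, and poorer milk gives an FPCM below it.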

Reference Data and Spectra Treatment
To avoid redundancy of reference data and spectral information within an individual cow, only spectra from the weeks of lactation representative of the most important variations in milk composition reflecting physiological status or dietary changes were included in the data set: weeks of lactation 2 to 6 (experiments 1 and 3), 11 (experiment 1), 15 (experiments 2 and 3), and 27 (experiments 1 and 2). Accordingly, a total of 280 morning and 280 evening milking individual milk spectra (experiments 2 and 3) and 315 individual day milk spectra (50% vol mix of morning and evening milkings from experiment 1) were used.
Because, in practice, spectra from both daily milkings may not always be available, a test was performed (using samples from experiments 2 and 3 only) to understand whether the predictive performance is affected when using the spectrum of a single milking instead of a day spectrum. In addition, for the further calibration steps, the single-milking spectra of the same day from experiments 2 and 3 were arithmetically averaged (50% vol of morning and 50% vol of evening milk) to make them homogeneous with the day milk spectra from experiment 1.
Assuming that the repeatability of CH 4 emissions with GF increases with the duration of measurement (Coppa et al., 2021), different CH 4 data, varying in the duration of the measurement period (the average CH 4 emissions on 1, 2, 3, or 4 BMU), were coupled to the last day spectrum of the measurement period. The aim was to identify the best GF measurement duration for setting the MIR prediction equations. Averaging data on 2, 3, or 4 BMU implied a progressive reduction in the number of samples as the number of averaged BMU increased, as is usual when evaluating the repeatability of CH 4 emission measurement through GF (Arbre et al., 2016; Manafiazar et al., 2016; Coppa et al., 2021). A drawback of this approach is that a different number of samples may limit the comparison of statistical performance of the models. However, the great advantage is the maintenance of the same biological variability in the reference data (CH 4 emission) within each model, without any bias due to a different number of individual cows or a different range of lactation stage.
Similarly, as GF estimates an average CH 4 emission during the measurement period, the day spectra available during 1, 2, 3, or 4 BMU were also arithmetically averaged (by averaging the absorbance at each wavelength), to understand whether an average spectrum allows better predictive performance when compared with the single day spectrum of the same measurement period.
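The arithmetic averaging of spectra can be sketched as follows in Python. The same helper covers both cases described above: averaging the morning and evening spectra into a day spectrum (equal weights) and averaging all day spectra of a measurement period. The function name and the list-based representation of a spectrum are assumptions made for illustration.

```python
def average_spectra(spectra):
    """Average absorbance point by point across spectra of equal length.

    Each spectrum is a sequence of absorbance values recorded at the same
    spectral points; all spectra receive the same weight.
    """
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_points = len(spectra[0])
    if any(len(spectrum) != n_points for spectrum in spectra):
        raise ValueError("all spectra must have the same number of points")
    return [sum(values) / len(spectra) for values in zip(*spectra)]


# Day spectrum from two milkings (toy values with 3 spectral points)
morning = [1.0, 2.0, 3.0]
evening = [3.0, 4.0, 5.0]
day_spectrum = average_spectra([morning, evening])
```

Calling `average_spectra` on the list of day spectra collected during 1, 2, 3, or 4 BMU gives the period spectrum that is compared with the single last day spectrum.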
The effect of including DIM spectral correction through a modified Legendre polynomial (Gengler et al., 1999), according to Vanlierde et al. (2015), and of incorporating parity, MY, and FPCM as explanatory variables in prediction models was also tested. The correlation between the residuals of a given model and a further phenotypic variable was tested. Such a variable was added to the model only if the correlation was significant, following the same procedure proposed by Vanlierde et al. (2021). Data referring to the day of milk spectra collection were used for the correction of day spectra, whereas the average over the corresponding BMU was used when day spectra were averaged. Predictive performances were compared for CH 4 emissions expressed as grams per day, grams per kilogram of DMI, grams per kilogram of milk, and grams per kilogram of FPCM. The MY and FPCM were not used as additional explanatory variables when CH 4 was expressed as grams per kilogram of milk and grams per kilogram of FPCM, respectively.
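A minimal Python sketch of a DIM correction based on modified Legendre polynomials is given below. It assumes the normalized constant, linear, and quadratic terms usually associated with Gengler et al. (1999), a DIM range of 1 to 365 for standardization, and a correction in which each spectrum is multiplied by each polynomial term and the resulting blocks are concatenated. These choices and all function names are illustrative assumptions; the exact procedure should be taken from Vanlierde et al. (2015).

```python
import math


def standardized_dim(dim, dim_min=1, dim_max=365):
    """Rescale DIM to the interval [-1, 1] (the DIM range is an assumption)."""
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)


def modified_legendre(x):
    """Constant, linear, and quadratic modified Legendre terms for x in [-1, 1]."""
    return (
        1.0,
        math.sqrt(3.0) * x,
        math.sqrt(5.0 / 4.0) * (3.0 * x * x - 1.0),
    )


def dim_corrected_spectrum(spectrum, dim, dim_min=1, dim_max=365):
    """Concatenate the spectrum multiplied by each polynomial term.

    The output is 3 times longer than the input: one block per term.
    """
    terms = modified_legendre(standardized_dim(dim, dim_min, dim_max))
    return [absorbance * term for term in terms for absorbance in spectrum]
```

The expanded vector can then be used in place of the raw spectrum as the predictor matrix of the regression, so that regression coefficients are allowed to vary with lactation stage.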

Statistical Analysis
For all the options tested to analyze prediction performance, the original data set was divided into a calibration and a validation set (the number of samples in each depending on the total number of available spectra). Samples were randomly assigned by cow within each experimental treatment to the calibration and validation sets, making them homogeneous. As the scientific bedrock for predicting CH 4 emissions by milk MIR is an effect of more or less methanogenic diets on milk composition, cows fed 3NOP, which reduced CH 4 without affecting milk composition, were segregated into a third independent data set and initially excluded from the calibration and validation. The reference data on CH 4 emissions and phenotypic variables corresponding to the samples included in each data set are given in Table 1. The 3NOP data set (80 d spectra) was, however, used to validate the model developed without 3NOP samples using day spectra and CH 4 from 1 BMU, to test the prediction capacity on milk exclusively from cows fed additives that clearly reduce CH 4 emissions with just minor changes in milk composition. Thereafter, 3NOP samples were included in the calibration and validation data sets, to test the effect of such sample inclusion on the predictive performance. This test was performed on the model developed on 1 BMU, as it was the sole scenario that retained enough 3NOP samples to perform both the external validation and the inclusion of samples in the calibration and validation data sets.
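The assignment of samples by cow within treatment can be sketched in Python as follows. The record keys (`cow`, `treatment`), the validation fraction of 0.25, and the fixed random seed are hypothetical choices made for illustration; the text does not state the proportion used. Records from 3NOP-supplemented cows are assumed to have been set aside beforehand.

```python
import random
from collections import defaultdict


def split_by_cow(records, validation_fraction=0.25, seed=0):
    """Split records into calibration and validation sets.

    All records of a given cow go to the same set, and cows are drawn at
    random within each treatment so that both sets cover every treatment.
    """
    cows_by_treatment = defaultdict(set)
    for record in records:
        cows_by_treatment[record["treatment"]].add(record["cow"])

    rng = random.Random(seed)
    validation_keys = set()
    for treatment in sorted(cows_by_treatment):
        cows = sorted(cows_by_treatment[treatment])
        rng.shuffle(cows)
        n_validation = max(1, round(len(cows) * validation_fraction))
        validation_keys.update((treatment, cow) for cow in cows[:n_validation])

    calibration, validation = [], []
    for record in records:
        key = (record["treatment"], record["cow"])
        (validation if key in validation_keys else calibration).append(record)
    return calibration, validation
```

Splitting by cow rather than by sample keeps repeated spectra of the same animal out of both sets at once, which would otherwise inflate the apparent validation performance.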
The WinISI II Project Manager software, version 1.50 (Infrasoft International), was used for the statistical models. The calibrations were calculated with modified partial least squares regressions (Shenk and Westerhaus, 1995). The models were built by using only the segments between 2,966 and 2,561 cm −1 , between 1,809 and 1,720 cm −1 , and between 1,577 and 968 cm −1 , according to Vanlierde et al. (2015). A maximum of 16 latent variables was set for each regression, and a critical value of T = 2.5 for Student's t-test was adopted to remove any calibration outliers; 2 elimination passes during the full cross-validation (randomly dividing the data set into 4 cross-validation groups) were performed, following the procedure of Coppa et al. (2017). Two different spectral correction procedures and mathematical treatments were tested (no correction or first derivative with standard normal variate and detrend mathematical treatment). The tables report the best correction procedures and mathematical treatments. The statistics used to evaluate the calibration models were as follows: the standard error of cross-validation, the coefficient of determination (R 2 ) for cross-validation, the ratio of the standard deviation of the reference data to the standard error of cross-validation, the R 2 in external validation, the standard error of prediction (SEP), the slope, the bias, and the standard error of the prediction corrected by the bias (SEPC) of the validation set.
Table 1 gives the descriptive statistics of enteric CH 4 emissions and phenotypic variables (parity, DIM, MY, FPCM, and DMI). Overall, the parity ranged from 1 to 7, the DIM from 8 to 228, MY from 10.1 to 53.8 kg/d, the FPCM from 11.1 to 50.0 kg/d, and the DMI from 9.2 to 34.6 kg/d, covering a wide range of physiological status due to individual cow variation, lactation stage, parity, and diet. Such variability was also reflected by the CH 4 emissions, ranging from 107 to 596 g/d, from 5.8 to 49.2 g/kg of DMI, from 3.0 to 33.8 g/kg of milk, and from 2.0 to 47.3 g/kg of FPCM, showing values quite similar to those presented by Niu et al. (2018) when referring to the average and range of a large European data set. The priority given to reaching at least 20 spot measurements to set 1 BMU led to a longer duration of BMU in experiment 2 (3 wk) compared with experiments 1 and 3 (1 wk; Supplemental Figure S1). The longer duration of the BMU in experiment 2 may have led to a less efficient detection of high and low emissions, smoothing the result by cow. However, when comparing the range of CH 4 values from the 3 experiments (183-531, 339-595, and 151-757 g/d, respectively, for experiments 1, 2, and 3), the maximums were similar. The minimum of experiment 2 was higher compared with the other experiments, but this was expected as cows were in mid lactation, and the lowest CH 4 emissions are registered in early lactation (Vanlierde et al., 2015). As expected, the average CH 4 emission was 275 g/d for the 3NOP data set, and 398 and 372 g/d for the calibration and validation data sets, respectively. Similarly, the minimum and maximum were 107 and 422 g/d for the 3NOP data set, and 151 and 596 g/d, and 183 and 586 g/d for the calibration and validation sets, respectively. Supplementing dairy cows with 3NOP reduced CH 4 emissions by 31% (Saro et al., 2019), in line with a similar reduction by 25 to 29% shown by Kim et al. (2020) and Yu et al. (2021).
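The external validation statistics listed in the Statistical Analysis section (R 2 , SEP, slope, bias, and SEPC) can be illustrated with the Python sketch below. The conventions used here are assumptions and may differ from those of the WinISI software: bias is the mean of predicted minus reference, SEP is the root mean square of the residuals, SEPC is the standard deviation of the residuals around the bias (n − 1 denominator), and the slope is that of the regression of reference on predicted values.

```python
import math
from statistics import mean


def validation_stats(reference, predicted):
    """Return R2, SEP, bias, SEPC, and slope for an external validation set."""
    n = len(reference)
    if n < 2 or n != len(predicted):
        raise ValueError("need two sequences of equal length (at least 2 values)")

    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = mean(residuals)
    sep = math.sqrt(sum(e * e for e in residuals) / n)
    sepc = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    ref_mean, pred_mean = mean(reference), mean(predicted)
    s_pp = sum((p - pred_mean) ** 2 for p in predicted)
    s_rr = sum((r - ref_mean) ** 2 for r in reference)
    s_pr = sum((p - pred_mean) * (r - ref_mean) for r, p in zip(reference, predicted))

    return {
        "r2": s_pr ** 2 / (s_pp * s_rr),  # squared Pearson correlation
        "sep": sep,
        "bias": bias,
        "sepc": sepc,
        "slope": s_pr / s_pp,  # reference regressed on predicted
    }
```

In this formulation a constant offset between predicted and reference values appears in the bias and in SEP but not in SEPC, which is why SEPC is reported alongside the bias.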

Effect of Using Spectra by Single Milking or an Average Day Milk MIR Spectrum on the Predictive Performance of Enteric Methane Emissions
During milk recording at an individual level, milk is most often collected and analyzed by single milking, with sampling during the morning and evening milkings. However, CH 4 reference data are expressed by day. To the best of our knowledge, the literature contains no evidence of the effect of predicting CH 4 from the milk MIR spectra of single milkings compared with an average day spectrum. Table 2 gives the calibration and validation statistics of the prediction equations for daily CH 4 emissions based on MIR spectra by single milkings or on an average day MIR spectrum from experiment 2. The CH 4 emissions in grams per day were predicted from milk MIR with R 2 V of 0.51, 0.48, and 0.54, and an SEPC of 55.7, 56.3, and 52.0 g/d for the morning milking, evening milking, and their average day milk spectra, respectively. Similarly, predictive performances were slightly better when using the average day milk MIR spectra instead of the spectra from a single milking when CH 4 emissions were expressed as grams per kilogram of MY and grams per kilogram of FPCM; when CH 4 was expressed as grams per kilogram of DMI, the models performed similarly. The composition of morning milk and evening milk can differ largely because of differences in the timing of, and intervals between, feeding and milking (Ferlay et al., 2010). Thus, the best performance obtained by the average day spectra is not surprising, as the spectrum is coupled to daily CH 4 emissions, even more so with GF techniques that estimate an average CH 4 emission over a long period. Accordingly, only models based on average day milk MIR spectra were used for further methodological exploration in the present research.

Effect of the Duration of Enteric Methane Emission Measurement on Its Prediction from Milk MIR Spectra
As a period of several days is needed to obtain a valid CH 4 emission value (g/d) with the GF technique (Manafiazar et al., 2016), the duration of CH 4 measurement considered for the reference value could affect the reliability of the prediction equations. Table 3 gives the calibration and validation statistics of the prediction equations for daily CH 4 emissions based on the last milk day spectrum of the BMU, according to the duration of CH 4 measurement by GF. The CH 4 emissions in grams per day were predicted by milk MIR spectra with R 2 V of 0.52, 0.49, 0.55, and 0.60, and an SEPC of 61.7, 62.8, 61.6, and 61.4 g/d for a duration of CH 4 measurement of 1, 2, 3, or 4 BMU, respectively. Similarly, validation performances slightly increased with the duration of CH 4 measurement when CH 4 was expressed as grams per kilogram of DMI, grams per kilogram of MY, and grams per kilogram of FPCM, showing the best performance with a duration of 4 BMU. Furthermore, validation performances for these CH 4 units were better than for CH 4 expressed as grams per day, whatever the measurement duration (R 2 V > 0.65). Differences in model performances may be partially due to the different number of samples included in calibration and validation according to the duration of CH 4 measurement. However, predictive performance by MIR usually increases with the number of samples used for model construction (Vanlierde et al., 2016; Coppa et al., 2017). We found the best performance for models with the lowest number of samples, suggesting that factors other than the sample size were at the origin of the differences in prediction performance. The improvement of model performances when increasing the duration of CH 4 measurement may be due to the parallel increase in the repeatability of CH 4 measurement by GF (Arbre et al., 2016; Manafiazar et al., 2016).
As GF is based on spot visits, day-to-day variations in the timing of sampling and in individual animal feeding behavior (e.g., many small meals vs. a few large meals) could increase measurement estimation error when using short measurement periods (Hammond et al., 2016). Increasing the number of spot measurements improves the reliability of CH 4 emission estimation by GF in dairy cows, especially over 80 visits (Arbre et al., 2016;Coppa et al., 2021).

Effect of Averaging Day Milk MIR Spectra During the Corresponding Period of Enteric Methane Emission Measurement on Its Prediction
According to the previous section, the best predictive performance emerged for a long duration of CH 4 measurement. During such a period, several day milk spectra can be available, but little is known of the predictive performance of models on milk MIR when using a single day spectrum (at the end of the period) or the average of all the spectra available during the CH 4 measurement period. Table 4 gives the calibration and validation statistics of the prediction equations for daily CH 4 emissions based on the average of milk spectra collected during the CH 4 emission measurement, according to the duration of CH 4 measurement by GF. The CH 4 emissions in grams per day were predicted by MIR with R 2 V of 0.53, 0.48, 0.66, and 0.70, and an SEPC of 61.0, 63.3, 52.9, and 53.1 g/d when averaging the day spectra for a duration of CH 4 measurement of 1, 2, 3, or 4 BMU, respectively. Validation performances were thus improved when averaging the spectra for a CH 4 measurement period instead of using a single day spectrum, especially for a long duration of CH 4 measurement (>3 BMU). Similar results were observed when CH 4 was expressed as grams per kilogram of DMI, grams per kilogram of MY, and grams per kilogram of FPCM. As the CH 4 measurement by GF expressed an estimation of average emissions during the measurement period, it is not surprising that using an average spectrum based on all the milk spectra available during the same period increased predictive performance, especially for long measurement periods. Changes in animal physiological status over time (i.e., pregnancy, heat events, ruminal diseases, and so on) could affect milk composition. 
Furthermore, external factors such as climatic changes (e.g., occurrence of a heat stress period) or barn management activities (e.g., barn cleaning, veterinary visits, insemination practices) could also affect the feeding behavior and digestive physiology of animals (Hammond et al., 2016), with effects on day milk composition and consequently on milk spectra. Thus, averaging milk spectra can reduce daily variation in milk composition and provide a better match with an average estimation of reference CH 4 data in the long term. However, as several day milk spectra over a CH 4 measurement period may not always be available, both models based on a single day spectrum and on the average of day spectra per measurement period were kept for the further methodological tests.

Effect of DIM Spectra Correction on the Predictive Performance of Enteric Methane Emissions
Vanlierde et al. (2015) showed that incorporating DIM in the milk MIR prediction model of CH 4 emissions measured by the SF 6 tracer technique increased the validation performances of the models, whereas the opposite finding was reported by Shetty et al. (2017) using "sniffer" as the reference method for CH 4 measurement. Little is known about the effectiveness of such an approach when the prediction model is based on GF data corresponding to average CH 4 emissions for a long period rather than on a specific day of lactation, or when day spectra are averaged over a period of CH 4 measurement. Table 5 gives the calibration and validation statistics of the prediction equations for CH 4 emissions, based on the correction by DIM of day milk MIR spectra or of the average of milk spectra collected during the different periods of CH 4 measurement. The CH 4 emissions in grams per day were predicted by MIR with R 2 V of 0.46, 0.59, 0.68, and 0.67, and an SEPC of 65.5, 56.6, 52.2, and 56.5 g/d with calibration on day spectra corrected by DIM and a duration of CH 4 measurement of 1, 2, 3, or 4 BMU, respectively.
In general, DIM correction increased the R 2 V and reduced the SEP relative to the equivalent models run on DIM-uncorrected spectra (Table 4), irrespective of the CH 4 measurement unit. However, the improvement was not systematic for all the models. The better performances when DIM correction is applied to day spectra may derive from a lesser influence of the lactation stage on the residuals (Vanlierde et al., 2015), but the lack of improvement of some models suggests that other factors may be related to the residuals, such as parity, MY, or FPCM.
On the other hand, when models developed using DIM-uncorrected average spectra of a CH 4 measurement period were compared with the equivalent DIM-corrected ones, the CH 4 emissions in grams per day were predicted by milk MIR with R 2 V of 0.53 versus 0.55 and 0.48 versus 0.56, and an SEPC of 61.0 versus 59.3 and 63.3 versus 58.3 g/d, for a duration of CH 4 measurement of 1 or 2 BMU, respectively (Tables 5 and 3). When spectra were averaged during a CH 4 measurement duration of 3 or 4 BMU, the R 2 V were higher and the SEP were lower for the model using DIM-uncorrected spectra instead of DIM-corrected spectra (R 2 V of 0.66 vs. 0.62 and 0.70 vs. 0.68, and an SEPC of 52.9 vs. 54.8 and 53.1 vs. 55.5 g/d, for 3 or 4 BMU, respectively). The DIM correction on the spectra averaged during a CH 4 measurement period thus improved model performances only for short durations. This is not surprising, as over a long duration several external factors and variations in animal physiological status may interfere, as discussed previously, reducing the importance of DIM as an explanatory factor for CH 4 emission. Furthermore, when CH 4 was expressed as grams per kilogram of DMI, grams per kilogram of MY, and grams per kilogram of FPCM, models based on spectra averaged during any duration of CH 4 measurement were marginally or negatively affected by DIM correction, showing similar or lower R 2 V and higher SEPC. However, the DIM-uncorrected and DIM-corrected MIR predictive models developed by Vanlierde et al. (2015) on RC and SF 6 data showed very similar statistical performance in cross-validation, although the DIM-corrected ones better reflected the biological processes that drive CH 4 emissions, particularly when externally validated on large independent data sets. The DIM-uncorrected model we developed was tested in external validation, so this problem should have been taken into account.
Based on our results, correcting spectra for lactation stage may in some cases remain relevant for improving model performance, even for long durations of CH 4 measurement, because each day spectrum still comes from a precise day; however, averaging spectra over a long duration of CH 4 measurement without DIM correction gave better predictive performances. Our findings appear to agree both with the significance of DIM correction in improving models shown by Vanlierde et al. (2015) when working on day spectra and with the lack of improvement with DIM inclusion found by Shetty et al. (2017), who included in the data set a large proportion of spectra averaged over 2 to 6 wk.
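The two levels of spectral averaging discussed in this and the previous section (milkings to a day spectrum, then day spectra to a period spectrum) can be illustrated with a minimal sketch. It assumes that every spectrum is stored as a list of absorbances on a shared wavenumber grid and that averaging is a plain unweighted arithmetic mean; the function names, the record layout, and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import fmean


def mean_spectrum(spectra):
    """Element-wise unweighted mean of equally long absorbance vectors."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("spectra must share the same wavenumber grid")
    return [fmean(values) for values in zip(*spectra)]


def period_spectra(records):
    """Average milkings into day spectra, then day spectra into period spectra.

    records: iterable of (cow_id, period_id, day, spectrum) tuples, one per
    milking (hypothetical layout). Returns {(cow_id, period_id): spectrum}.
    """
    by_day = defaultdict(list)
    for cow, period, day, spectrum in records:
        by_day[(cow, period, day)].append(spectrum)

    by_period = defaultdict(list)
    for (cow, period, _day), milkings in by_day.items():
        # Day spectrum = mean of the milkings sampled on that day.
        by_period[(cow, period)].append(mean_spectrum(milkings))

    # Period spectrum = mean of all day spectra in the measurement period.
    return {key: mean_spectrum(days) for key, days in by_period.items()}


# Toy example: one cow, one measurement period, two sampling days,
# two milkings per day, two spectral points per spectrum.
example = [
    ("A", 1, 1, [1.0, 2.0]),
    ("A", 1, 1, [3.0, 4.0]),
    ("A", 1, 4, [4.0, 5.0]),
    ("A", 1, 4, [6.0, 7.0]),
]
print(period_spectra(example))  # {('A', 1): [3.5, 4.5]}
```

Averaging day spectra first, rather than pooling all milkings directly, gives each sampling day the same weight even if one day has a missing milking; this is a design choice of the sketch, not a documented step of the protocol.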

Effect of Inclusion of Phenotypic Information as Explanatory Variables on the Predictive Performance of Daily Methane Emission Models
Some authors have highlighted the effectiveness of including further phenotypic variables (such as MY, parity, and breed) to improve the performance of milk MIR predictions of CH 4 emissions in dairy cows using "sniffer," RC, and SF 6 tracer techniques as reference methods (Shetty et al., 2017; Vanlierde et al., 2021). Taking into account the performance of the models developed on day spectra and on spectra averaged over a given duration of CH 4 measurement by GF, we tested the effect of the inclusion of parity, MY, and FPCM on the calibration performance of models conceived under 2 scenarios: having (1) several spectra or (2) just one day spectrum during a long duration of CH 4 measurement (Table 6). The CH 4 emissions in grams per day were predicted by MIR with R 2 V of 0.67, 0.58, 0.58, 0.60, and 0.59, and an SEPC of 58.4, 62.5, 62.6, 61.8, and 61.6 g/d when calibrating on DIM-corrected day spectra with a duration of CH 4 measurement of 4 BMU and including parity, MY, FPCM, parity + MY, and parity + FPCM, respectively. The R 2 V were similar or lower and the SEP were similar or higher when comparing such models to the equivalent models run without the inclusion of phenotypic information as explanatory variables (R 2 V = 0.67, SEPC = 56.5 g/d; Table 5). Similar considerations apply when CH 4 was expressed as grams per kilogram of DMI, grams per kilogram of milk, or grams per kilogram of FPCM. Shetty et al. (2017) also found no model improvement when adding lactation stage, parity, and MY to the model (based on the "sniffer" technique CH 4 reference data). The loss of performance of models based on day spectra when including MY or FPCM may suggest that DIM correction already captures information related to DIM-dependent changes in other phenotypic variables.
This should not be the case for parity, which is biologically complementary to DIM; nevertheless, parity did not affect model performance and did not seem to provide supplementary information in our study when models were calibrated on DIM-corrected day spectra. However, this was not observed by Vanlierde et al. (2021), who found minor model improvements when MY and parity were added to DIM-corrected spectra using RC and SF 6 reference data.
Table footnote: Models marked with a superscript were developed by applying the first derivative transformation and the standard normal variate and detrend to the spectra; models without a superscript were developed without any mathematical treatment.
On the other hand, the CH 4 emissions in grams per day were predicted by milk MIR with R 2 V of 0.73, 0.72, 0.72, 0.72, and 0.72, and an SEPC of 48.4, 50.5, 50.4, 49.5, and 49.6 g/d when averaging the day spectra for a duration of CH 4 measurement of 4 BMU and including parity, MY, FPCM, parity + MY, and parity + FPCM, respectively (Table 6). The R 2 V were increased and the SEP were reduced when comparing such models to the equivalent models run without the inclusion of phenotypic information as explanatory variables (R 2 V = 0.70, SEPC = 53.1 g/d; Table 5). These findings suggest that DIM would be highly informative when measuring CH 4 emissions day by day (as with RC or SF 6 techniques), especially at the beginning of lactation (Vanlierde et al., 2015), whereas parity, MY, or FPCM may be more informative than DIM when spectra are averaged over a long measurement duration.
Among phenotypic variables, the best performing model for CH 4 prediction as grams per day was obtained by including parity, both for models built on DIM-corrected day spectra and for models built on the average DIM-uncorrected spectra of 4 BMU. In the latter case, however, the differences in performance between models including parity or other phenotypic variables were very small. Parity in particular allows correction for the lower CH 4 emissions in grams per day of primiparous versus multiparous cows (Coppa et al., 2021). However, primiparous cows also had lower DMI, MY, and FPCM, suggesting that the additional information given may be partially redundant. Indeed, when expressing CH 4 as grams per kilogram of DMI, grams per kilogram of milk, or grams per kilogram of FPCM, including parity as an explanatory variable did not improve model performance, but including MY or FPCM increased R 2 V and reduced SEPC. The prediction improvement by adding MY or FPCM is not surprising, as the relationship between MY and CH 4 emissions expressed as grams per day is well known (Niu et al., 2018). This prediction improvement suggests that MY or FPCM is reflected by spectral data thanks to a dilution effect with increasing MY for the majority of milk components that share ruminal metabolism with CH 4 (Dehareng et al., 2012). The detection of changes in milk constituent concentration and composition related to MY also seems to be confirmed by the slightly better performance of models built on CH 4 expressed as grams per kilogram of FPCM rather than grams per kilogram of MY, and by the slightly better performance when including FPCM instead of MY as an explanatory variable. Including both parity and MY or FPCM did not further improve the models, once again suggesting the redundancy of the additional information given by such variables when considering average CH 4 emissions estimated by GF over a long duration.
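One simple way to offer phenotypic information to a regression on spectra is to append the phenotypic variables as extra predictor columns and to scale all columns so that variables measured in different units carry comparable weight. The sketch below illustrates that idea only. Whether the study handled parity, MY, and FPCM in this way is an assumption, as are the autoscaling step, the function names, and the toy values.

```python
from statistics import fmean, pstdev


def augment(spectra, phenotypes):
    """Append phenotypic columns (e.g., parity, MY) to each spectral row."""
    if len(spectra) != len(phenotypes):
        raise ValueError("one phenotype row is needed per spectrum")
    return [list(s) + list(p) for s, p in zip(spectra, phenotypes)]


def autoscale(rows):
    """Center each column to mean 0 and scale it to unit standard deviation.

    Constant columns are only centered (their divisor is set to 1.0).
    """
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Toy example: 2 samples, 2 spectral points, plus parity and MY (kg/d).
spectra = [[0.1, 0.2], [0.3, 0.4]]
phenotypes = [[1, 30.0], [2, 40.0]]
predictors = autoscale(augment(spectra, phenotypes))
print(len(predictors), len(predictors[0]))  # 2 4
```

The scaled matrix could then be passed to any partial least squares routine in place of the spectra alone. In practice the scaling parameters would be estimated on the calibration set and reused unchanged on the validation set.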

Effect of Including Samples From Cows Supplemented with 3NOP on the Predictive Performance of Enteric Methane Emissions
As milk MIR prediction capacity for CH 4 emissions is based on changes in milk composition related to CH 4 emissions thanks to actions targeted on common pathways in ruminal metabolism, little is known of the predictive response on spectra from cows supplemented with feed additives affecting methanogenesis without changes in milk composition, as already reported with 3NOP (Saro et al., 2019; Yanibada et al., 2020; Kim et al., 2020). To illustrate possible limits of in-field predictive model applications in the case of milk from a diet affecting methanogenesis but not milk synthesis, the data set of samples only from cows supplemented with 3NOP was used to validate the model developed on day milk spectra for a duration of CH 4 measurement of 1 BMU not including 3NOP samples. The R 2 V was substantially lower and the SEPC was higher when compared with the validation performance on the data set not including 3NOP samples (0.13 vs. 0.52, and 67.0 vs. 61.7 g/d, respectively; Tables 7 and 3). Similar poor performances in validation were observed for CH 4 expressed as grams per kilogram of DMI, grams per kilogram of milk, or grams per kilogram of FPCM. However, poor predictive performance in calibration and in validation (R 2 V = 0.31 and SEP = 78.5 g/d; Table 7) was also observed when a new model was built that included part of the 3NOP database in the calibration (Dehareng et al., 2012). Thus, based on current knowledge, CH 4 predictive models by MIR should not be applied to milk samples derived from diets affecting only CH 4 emissions but not milk composition.
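The validation statistics quoted throughout this section can be computed from paired reference and predicted values as sketched below. The sketch assumes common chemometric conventions that this section does not spell out: R 2 V is taken as the squared Pearson correlation between reference and predicted values, SEP as the root mean square of the prediction residuals, and SEPC as the bias-corrected standard error with n - 1 in the denominator. The function name and the toy values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def validation_stats(reference, predicted):
    """Return bias, SEP, SEPC, and R2V for paired reference/predicted values."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")

    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = fmean(residuals)
    # SEP: root mean square of the raw residuals (assumed definition).
    sep = sqrt(sum(e * e for e in residuals) / n)
    # SEPC: spread of the residuals once the mean bias is removed.
    sepc = sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    mean_ref, mean_pred = fmean(reference), fmean(predicted)
    s_rp = sum((r - mean_ref) * (p - mean_pred)
               for r, p in zip(reference, predicted))
    s_rr = sum((r - mean_ref) ** 2 for r in reference)
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    if s_rr == 0 or s_pp == 0:
        raise ValueError("R2V is undefined for a constant series")
    r2v = s_rp ** 2 / (s_rr * s_pp)

    return {"bias": bias, "SEP": sep, "SEPC": sepc, "R2V": r2v}


# Toy example (g/d): predictions are uniformly 10 g/d too high, so the
# bias is 10, SEP is 10, SEPC is 0, and R2V is 1.
print(validation_stats([400.0, 450.0, 500.0], [410.0, 460.0, 510.0]))
```

The toy example shows why SEPC and R 2 V alone can hide a systematic offset: a model that is consistently biased on a new population, such as cows receiving an additive, can still show a low SEPC, so reporting the bias alongside them is informative.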

CONCLUSIONS
Our study shows that the calibration of MIR predictive models on cow milk for CH 4 emission data from GF requires specific reference data and management of spectra. As GF techniques measure an average CH 4 emission over a period and not on a specific day, a long duration of CH 4 measurement by GF is required to optimize MIR predictive performances. Ideally, the spectra to be coupled to GF reference data should be obtained by averaging several spectra collected throughout the period of CH 4 measurement by GF. It would be preferable to use a day spectrum obtained by averaging the spectra of 2 consecutive milkings, instead of the spectrum of a single milking. If only one day spectrum is available during a CH 4 measurement period, correcting day spectra by the lactation stage brought predictive performance close to that obtained with the average spectra collected during the measurement period. However, based on the data sets available, adding phenotypic information as additional explanatory variables did not further improve the performance of models built on day DIM-corrected spectra. On the other hand, adding MY or FPCM improved the performance of models built on the average of spectra (uncorrected by DIM) recorded during the CH 4 measurement period, giving the best predictive performance. Specific models would be required to achieve reliable prediction on samples from cows receiving dietary treatments that decrease CH 4 emission without affecting milk composition.

ACKNOWLEDGMENTS
The dataset used in this work comes from 3 trials on dairy cows carried out in the framework of 3 different collaborative projects led by INRAE and co-funded by (1)