The use of milk Fourier-transform mid-infrared spectroscopy to diagnose pregnancy and determine spectral regional associations with pregnancy in US dairy cows

Accurate early diagnosis of pregnancy is important for timely reproductive management of dairy farms. Fourier-transform mid-infrared (FT-MIR) milk spectral data are routinely used for determining milk components such as fat and protein, whereas milk composition is known to change with advancing stages of pregnancy. The objectives of this study were to compare partial least squares discriminant analysis (PLS-DA) and a Bayesian variable selection regression model (BayesC) for the diagnosis of pregnancy status (PS) from milk FT-MIR data and to infer any spectral regions that might be highly associated with PS at various stages of pregnancy. Conception dates on confirmed pregnant cows were obtained from Holstein cows within 123 herds in Michigan, Ohio, and Indiana during 2018 and 2019. Milk samples from these pregnant cows at 7 different stages of pregnancy were case-control matched to open contemporary herd mates to be within the same stage (±10 d for days in milk) of lactation for the same milk sample test date. The FT-MIR data were obtained for all of these milk samples. Ten-fold herd-independent cross-validation was used to compare PLS-DA versus BayesC using the area under the receiver operating characteristic curve (AUC). The BayesC model demonstrated higher mean AUC compared with PLS-DA at all stages exceeding 60 d of pregnancy. The mean BayesC AUC at stage 1 (1–30 d) was 0.58 ± 0.02, which was superior to a random guess (AUC = 0.50) yet too low to be of practical use. The mean BayesC AUC at stage 7 (≥180 d) was 0.13 greater compared with that of stage 1 (1–30 d) and 0.07 to 0.10 greater compared with stages 2, 3, 4, 5, and 6 (31–180 d in 30-d increments). The mean AUC of stages 2 to 6 were 0.03 to 0.06 greater compared with stage 1 yet again too low to be of practical use. Because of high multicollinearity between many adjacent wavenumbers,


INTRODUCTION
Fourier-transform (FT) mid-infrared (MIR) spectroscopy is a relatively inexpensive, fast, and accurate method used to analyze milk samples for components such as fat, protein, and lactose (ICAR, 2017).The FT-MIR absorbance readings at individual wavenumbers within the MIR region are due to stretching and bending of specific chemical bonds present in milk, thereby helping characterize its chemical composition.There has been substantial research on MIR spectra for predicting important phenotypes including methane emissions (Vanlierde et al., 2018), ketone bodies (Grelet et al., 2016), fatty acids (Soyeurt et al., 2006;De Marchi et al., 2014), total protein, casein, β-LG, whey protein, and glycosylated κ-CN (Bonfatti et al., 2011).
An early and accurate diagnosis of pregnancy status (PS) is important for timely reproductive management The use of milk Fourier-transform mid-infrared spectroscopy to diagnose pregnancy and determine spectral regional associations with pregnancy in US dairy cows in livestock (Balhara et al., 2013) and is considered to be an ethical issue regarding the culling of pregnant livestock (More et al., 2017).Pregnancy diagnosis is routinely conducted using rectal palpation, ultrasonography, and analysis of pregnancy-associated glycoproteins in milk or blood between 26 and 40 d after insemination (Reese et al., 2018).During pregnancy, milk yield typically declines and milk composition changes with advancing stages of gestation due to altered nutrient partitioning between various physiological functions (Penasa et al., 2016).As this should also be reflected in changes in milk spectral absorbance at various wavenumbers, some researchers have recently focused their attention toward diagnosis of PS using FT-MIR data on milk since this does not require any extra phenotyping and animal handling costs (Toledo-Alvarado et al., 2018;Delhez et al., 2020;Brand et al., 2021).Toledo-Alvarado et al. (2018) investigated the diagnosis accuracy of PS from raw milk spectral data in Italian dairy cattle and determined, via a random 10-fold cross-validation, a mean area under the receiver operating characteristic curve (AUC) of 60%, albeit across a broad expanse of pregnancy stages.Delhez et al. (2020) highlighted the use of FT-MIR data to diagnose PS after 150 d of gestation in Australian Holsteins with an average AUC of 78% obtained through random cow independent cross-validation; nevertheless, diagnosing pregnancies this late might be far too late to be useful for management decisions.Wang and Bovenhuis (2019) demonstrated with a dramatic illustration how naive random cross-validation strategies could result in overly optimistic predictions from MIR spectra to predict methane emissions.They recommended block cross-validation strategies whereby all animals within a contemporary group or herd are kept together within the same training or testing data sets and which we refer to as herd-independent crossvalidation.Naïve random cross-validation strategies can be suboptimal because they often fail to recognize the dependence structures (i.e., within herds) that persist as dependencies in the residuals between cows in training sets with their contemporary herd mates in testing sets.A good general discussion of this problem within other contexts is also provided in Roberts et al. (2017).Herd-independent cross-validation seems far more appropriate for assessing the potential of a new diagnostic for dairy industry applications because it should more closely reflect out-of-sample predictive ability of milk spectral data for PS on cows within herds that are not part of the research study in question.
Relative to, say, the tens or hundreds of thousands of SNP markers used currently in genomic evaluations (Wiggans et al., 2017), milk infrared spectra are low dimensional (~1,000 wavenumbers).Furthermore, ab-sorbances on neighboring wavenumbers are often highly correlated such that the effective number of wavenumbers is substantially less (Rovere et al., 2021).The vast majority of FT-MIR studies predict phenotypes using partial least squares (PLS) for continuous responses and PLS discriminant analysis (PLS-DA) for binary outcomes (Bresolin and Dórea, 2020).Bayesian regression models originally intended for high dimensional regression of traits on genotypes (Meuwissen et al., 2001) have also been used to predict responses on various traits using FT-MIR spectra (Toledo-Alvarado et al., 2018).In a recent review, Bresolin and Dórea (2020) reported that, compared with PLS, Bayesian variable selection models such as BayesB (Meuwissen et al., 2001) or BayesC (Habier et al., 2011) can result in improved prediction accuracies when using milk spectral data to predict complex phenotypes.Ferragina et al. (2015) reported that BayesB outperformed PLS methods for predicting various fatty acids, whereas El Jabri et al. ( 2019) demonstrated that PLS generally outperformed Bayesian regression models for cheese-making properties.Furthermore, Bonfatti et al. (2017) reported that the differences in prediction accuracies between PLS and Bayesian methods depended upon the trait.However, formal tests of difference in performance between these 2 classes of methods are lacking in most of these studies.There may be considerable merit to using Bayesian over PLS methods as many Bayesian methods involve both shrinkage estimation and variable selection, whereas PLS is a dimension reduction procedure that often requires judicious deletion of wavenumbers to avoid suboptimal predictions on the trait of interest (Bresolin and Dórea, 2020).The comparison of PLS-DA and Bayesian methods for the diagnosis of PS using MIR spectra has not been extensively studied.
Joint analyses of the effects of all wavenumbers together can improve precision and account for population structure in a manner similar to that advocated for GWAS (Kang et al., 2008).That is, associations between individual wavenumbers and phenotypes may be biased by background management strategies and dietary rations if associations between wavenumber absorbances and the trait of interest are estimated one wavenumber at a time as is often done (e.g., Toledo-Alvarado et al., 2018).On the other hand, it has also been demonstrated in GWAS that SNP-based associations that simultaneously fit all other SNP markers can be badly maligned by a high degree of local multicollinearity involving adjacent SNP markers (Fernando et al., 2017).That is, in a joint association study involving FT-MIR wavenumbers, any single wavenumber may not display a strong association with the trait after adjusting for the effects of highly correlated neighboring wavenumbers.Hence, it seems then not surprising that various  (Lainé et al., 2017;Toledo-Alvarado et al., 2018) have reported single wavenumber associations with PS based on one wavenumber at a time analyses.Lainé et al. (2017) determined that the standardized solutions of coefficients are greater for days pregnant compared with milk production traits (milk yield, fat and protein contents) within some spectral regions when using spectral absorbance measures of individual wavenumbers as the response variables.
One strategy to mitigate this multicollinearity problem in variable selection models and adaptively reduce the number of hypotheses to be tested is to aggregate or cluster wavenumbers into windows based on correlation.Ghosh and Ghattas (2015) demonstrated the utility of determining joint inclusion probabilities for windows of correlated covariates, rather than the marginal inclusion probabilities of individual covariates, in a Bayesian model application that also involved infrared spectral data.Windows-based strategies have been similarly advocated for GWAS (Chen et al., 2017) and, most recently, for associating MIR regions with fatty acid composition in Michigan Holsteins (Rovere et al., 2021).
The objectives of this study were (1) to compare PLS-DA versus Bayesian methods, specifically BayesC, for diagnosing pregnancy using milk MIR spectral data at different various of gestation in large Midwest US dairy herds using herd-independent cross-validation, and (2) to identify specific wavenumber regions that are potentially important for the diagnosis of pregnancy at various stages after conception.

Field Data Retrieval and Editing
Milk spectral data and records of insemination were obtained from the Dairy Records Management Systems (www .drms.org) on Holstein cows from herds located in Michigan, Ohio, and Indiana.Data were collected in 2018 and 2019 from 123 herds milking at least 100 cows on a DHI test-day.Milk samples were sent to Central Star Cooperative Inc. (Grand Ledge, MI) for analysis using 2 FT-MIR spectrometers (Bentley Instruments NexGen Series FTS Combi machine).The milk FT-MIR absorbances were collected on 899 spectral wavenumbers ranging from 649 to 3,999 cm −1 .
The absorbance data for each wavenumber was scaled to a mean of zero and variance of one.A separate principal component analysis was performed on spectral absorbance data within each herd.The Mahalanobis distance was calculated from the first 15 principal components as these scores generally explained more than 95% of the spectral absorbance variance.
Outlier determination was based on comparing these Mahalanobis distances to a chi-squared distribution with degrees of freedom 15 using a type I error rate of 5% and Bonferroni-adjusted for the number of samples within each herd.Potential spectral outliers (11,936 out of 1,600,760 records) were removed because of this edit.
All cows that became pregnant during the study period were first retrieved from the database.Pregnancy was confirmed by requiring a subsequent calving record to be within 3 standard deviations (265-295 d) of the typical Holstein gestation length (Vieira-Neto et al., 2017) after the matching insemination date, thereby leading to confirmed pregnancies on 28,242 cows.We subsequently matched 94,728 milk samples from conception up until 275 d of pregnancy on these 28,242 cows.Each of these 94,728 milk samples were subsequently matched with milk samples from an open cow within the same contemporary group (herd-test-day) and such that the open cow's corresponding DIM was specified to be within a range of ±10 d of the milk sample from the pregnant cow.This case-control matching was used to avoid potentially confounding biases due to the effect of stage of lactation, i.e., energy balance, when inferring spectral data associations with pregnancy.The PS, defined as pregnant (PS = 1) or open (PS = 0), was determined for each test date by matching insemination records with subsequent calving dates.
As milk components change with advancing pregnancy (Roche, 2003;Penasa et al., 2016), a stronger FT-MIR based signal should be expected with later stages of pregnancy (Lainé et al., 2017;Delhez et al., 2020).Hence, the data set was further partitioned into 7 different stages of pregnancy (i.e., days after conception) in a manner similar to that considered by Delhez et al. (2020).These 7 different stages of pregnancy were 1-30 d, 31-60 d, 61-90 d, 91-120 d, 121-150 d, 151-180 d, and ≥181 d as further summarized in Table 1.Separate analyses were conducted within each stage, again recognizing that the pregnancy signals, and hence associations with milk spectral data, could vary over gestation.

Spectral Data Preprocessing Strategies
We considered preprocessing of spectral data using several different strategies advocated previously in the literature.One such strategy is piecewise direct standardization (PDS) based on the use of principal component regression as illustrated by Grelet et al. (2015).We conducted principal component regression of wavenumber absorbances from one spectrophotometer ("slave") onto the other ("master") using 42 different sets of calibration samples from Eastern Laboratory Services (Medina, OH) over the course of the study period, each set based on 24 raw milk samples.We were able to identify these calibration samples and their FT-MIR spectra from the spectrophotometers at various intervals ranging from one week to 3 mo.Using the raw absorbance data as provided, we adapted the approach of Grelet et al. (2015) based on sliding windows of 5 adjacent wavenumbers to do a 4-component principal component regression with the exception of the first 2 (649 and 652 cm −1 ) and the last 2 wavenumbers (3,995  and 3,999 cm −1 ) for which a simple linear regression was conducted.Spectral data from commercial milk samples were then calibrated using the principal component regression-based prediction equations to the respective calibration samples to which they were closest in time.This standardization is intended to reduce systematic bias caused by the use of different machines and potential machine drift over time (Grelet et al., 2015).
Derivatives have also been used to remove baseline variability (Rinnan et al., 2009) as an alternative to the use of PDS in dairy science research (Delhez et al., 2020).Hence, we also considered the use of first-derivative and second-derivative processing of raw spectra based on the smoothing method of Savitzky and Golay (1964).We specifically used the prospectr package (Stevens and Ramirez-Lopez, 2020) in R (2020; https: / / www .r-project .org/ ) based on a window size of 7 and polynomial degree of 3. Our final choice of data preprocessing (i.e., leaving spectral absorbances as is, PDS, or the use of first or second derivatives) was based on a 10-fold herd-independent cross-validation strategy separately within each stage using the computationally expedient PLS-DA model.In all cases, spectral data for each wavenumber was rescaled to a mean 0 and variance 1 before PLS-DA.
To formally compare these different preprocessing strategies, a separate ANOVA for each stage was fitted to AUC (defined later) with data preprocessing strategy as a fixed effects factor and fold as a random effects factor.The best fitting data preprocessing strategy was based on a formal post-ANOVA mean AUC comparison using Tukey's test.

PLS Analyses
The PLS-DA method within the caret package (Kuhn, 2008) in R was used to classify PS with milk spectral absorbance records after insemination being the independent variables.Herd-independent cross-validation was performed, which is described later in detail.The AUC was used to fine tune the model with respect to the number of components with the maximum number of PLS-DA components set to 30 to prevent overfitting.

BayesC
Our analysis was based on a Bayesian generalized linear model with a probit link function.The probability Pr(y i = 1|X i ) of a confirmed pregnancy (y i = 1) on cow i was defined as where Φ(.) represents the cumulative distribution function of the standard normal distribution.Also, β 0 is an intercept, X ij are the scaled (i.e., to a mean of 0 and variance 1 by wavenumber j = 1, 2, ..., 899) absorbances for the milk sample from cow i, β j is the effect of wavenumber j such that β ability π and β j = 0 with probability 1 − π with π being the prior probability of a nonzero wavenumber effect.Furthermore, σ β 2 is the variance for nonzero wavenumber effects.This model can be implemented as the binary BayesC model using the BGLR package (Pérez and de los Campos (2014) in R. We chose BayesC as opposed to the more popular BayesB method since BayesC is not likely to be sensitively different from BayesB for binary data yet is nearly half as computationally costly for each Markov chain Monte Carlo (MCMC) cycle.This additional cost for BayesB is because of the additional data augmentation required to sample wavenumber-specific variances at each cycle (Meuwissen et al., 2001) leading to Student t distributed, rather than normally distributed, nonzero wavenumber effects.
In BayesC, the prior proportion of nonzero effect (π) is assigned a Beta prior such that π ~Beta (p 0 , π 0 ) with p 0 = 10 representing the prior count and π 0 = 0.2 representing the prior mean using the parameterization of Pérez and de los Campos (2014).The prior density of the variance component σ β 2 is specified to be a scaled inverse chi-squared density with hyperparameters df0 and s0.We specified df0 as 15.6 and s0 as 0.33 based on prior experiences with the data to facilitate reasonable mixing with the MCMC algorithm invoked with the BGLR R package yet allow a fairly diffuse prior.A total of 240,000 MCMC iterations were conducted with the first 60,000 iterations discarded as burn-in and a subsequent thinning interval of 10 iterations such that 18,000 MCMC samples were used to estimate posterior means.Convergence of the models was checked using visual inspection of trace plots and by calculating the effective sample size of the slowest mixing parameters, π and σ β 2 , post burn-in using the CODA package (Plummer et al., 2006) in R. The number of MCMC iterations were chosen to ensure that effective sample size of these parameters was greater than 100.

Cross-Validation
To evaluate the predictive ability of the models, a stratified 10-fold herd-independent cross-validation scheme was used to split the data into training (~90% of herds) and testing sets (~10% of herds) separately within each pregnancy stage such that the distribution of herd sizes was similar between all 10 testing sets.The cross-validation assessment of model performance was evaluated using the AUC.A predicted probability for pregnancy (PPP) of 0.5 is typically used as the threshold for classification (i.e., those animals with PPP ≥0.5 are classified as pregnant whereas those animals with PPP <0.5 are classified as open).The ROC curve plots the true positive rate against the false positive rate as the PPP threshold changes from 0 to 1, whereas the AUC represents the area underneath this ROC curve.An AUC of 0.5 implies a random guess [i.e., the method is not effective (Zou et al., 2007)].The AUC was estimated using training data-based predictions to predict testing set phenotypes using the pROC package (Robin et al., 2011) in R.
Paired t-tests of PLS-DA-based assessments of AUC between noncalibrated, PDS-calibrated, first derivative and second derivative processed spectral data, blocking on cross-validation folds, was performed separately for each of the 7 pregnancy stages.The best performing calibration was then used to compare PLS-DA and Bayesian regression model predictions based on a paired t-test of AUC for each of the 7 pregnancy stages.

Posterior Probability of Association
Marginal Associations.The BayesC model is a Bayesian variable selection model such that one can readily estimate the probability of association (PPA) of any wavenumber j with PS by simply determining the proportion of all MCMC samples that are nonzero for that particular wavenumber effect β j .In other words, the PPA for each wavenumber j (i.e., PPA j ) was estimated as , where N denotes the number of MCMC cycles saved for posterior inference, and β j(l) is a draw from the full conditional distribution of β j at MCMC cycle l.Also, I(.) is the indicator function which is equal to 1 if the condition within the parentheses is true; otherwise it is equal to 0. These PPA were determined based on separate analyses of the data sets for each stage of gestation.Joint Associations.Recent statistical research (Ghosh and Ghattas, 2015) has indicated that estimated PPA for individual covariates (i.e., wavenumbers) can be badly misleading when these covariates are highly correlated with each other, as is generally true for MIR data.They advocated that any Bayesian variable selection involving associations of responses with highly correlated covariates be based on windows of joint (i.e., multiple covariate) rather than marginal (i.e., single covariate) inferences.
Windows of highly correlated wavenumbers were created using a spatially constrained hierarchical agglomerative clustering (HAC) technique whereby highly correlated and adjacent wavenumbers are adaptively clustered.This HAC was implemented using the adjclust package in R with details provided in Ambroise et al. (2019).The number of windows were determined using the slope heuristic method described in Ambroise For each pregnancy stage, we also determined the posterior mean of the proportion (s 2 ) of the total variance on the underlying latent scale due to spectral data associations in a manner not different from that determined for heritabilities in genomic studies except that spectral data absorbances rather than SNP genotypes are used as markers (Lehermeier et al., 2017).Denote the spectral absorbance merit of animal or record i sampled at MCMC cycle l as s where X j denotes the average spectral absorbance for wavenumber j in the corresponding data set.This further simplifies to because absorbances were standardized to a mean of 0. Noting that the residual variance is equal to 1 on the underlying latent variable scale for a probit link model, we define the "spectral repeatability" or s 2 sampled in MCMC cycle l to be s l ( ) 1 such the mean and standard deviation of 2 across the N MCMC cycles are the corresponding posterior mean and standard deviation estimates for s 2 .

Data Characterization and Summary Statistics
Descriptive statistics for pregnant and open records are presented in Table 1.Here, milk samples at various test dates are obtained from multiparous cows at various stages of lactation (35-420 d).Mean DIM at the milk test naturally increased from early (i.e., stage 1) to later (i.e., stage 7) stages of pregnancy.Nevertheless, as indicated previously, mean DIM of pregnant and matching open cows within each pregnancy stage were in the range of 10 DIM because of the case-control matching strategy described previously, although our sorting and matching algorithm did lead to a slightly higher mean DIM for pregnant animals.This DIM pairing within each pregnancy stage was used to avoid the partial confounding of lactation stage with pregnancy stage as milk components change across lactation stages (Mayeres et al., 2004;Delhez et al., 2020).Roughly 70% of pregnant cows had repeated records across the pregnancy stages, whereas a small percentage (5-7%) of cows had multiple records within pregnancy stages.

Assessment of Spectral Data Processing Strategy
Before formally comparing PLS-DA and Bayesian methods for their prediction of PS, we set out to determine the most appropriate preprocessed form for the spectral data for use in that comparison using just PLS-DA for computational reasons.
We determined that the mean AUC using raw spectral data did not typically differ (P > 0.05) from mean AUC using either first or second derivative processed spectra but was often greater (P < 0.05) than the mean AUC of PDS-calibrated spectral data at several different stages (2, 3, 4, 6, and 7) of pregnancy (Supplemental Web File S1, https: / / doi .org/ 10 .6084/m9 .figshare.15017046;Khanal and Tempelman, 2021).Hence, to simplify and for ease of interpretation, all subsequent results reported in this paper are based on raw unprocessed spectral data, albeit, again, standardized to a mean 0 and variance 1.In extensive literature reviews, De Marchi et al. ( 2014) and Bresolin and Dórea (2020) reported mixed results in assessing the prediction performance of various forms of spectral data preprocessing from a large number of different studies, suggesting no clear evidence of spectral data preprocessing affecting prediction performance.

Comparison of Models for Diagnosing Pregnancy
We compared the cross-validation performance of PLS-DA and BayesC based on whole unprocessed spectra after rescaling absorbances for each wavenumber to a mean 0 and variance 1.The cross-validation prediction accuracies (as measured by AUC) at different stages for PLS-DA and BayesC are presented in Table 2.The BayesC method demonstrated higher mean AUC (P < 0.05) compared with PLS-DA at all stages except for nonsignificant differences at the first 2 stages.Estimated AUC (~0.58 ± 0.02) for both methods at stage 1 (1-30 d) were particularly low.Nevertheless, there was evidence that the mean AUC were greater than 0.50 (i.e., a random guess) for every stage and for both methods (P < 0.0001).In a recent review paper, Bresolin and Dórea (2020) reported that Bayesian models often have better prediction properties compared with PLS-DA methods for some complex phenotypes.Some previous comparisons (Ferragina et al., 2015;Bonfatti et al., 2017;El Jabri et al., 2019;Grelet et al., 2020) have determined that Bayesian methods that incorporate variable selection outperform PLS-DA if whole spectra are used, whereas comparable prediction properties have been noted if various regions known to be highly sensitive to water absorption are removed.Our results demonstrated better predictive performance of BayesC for a categorical trait (PS).Bayesian variable selection methods, such as BayesB and BayesC, discount covariates that are clearly unimportant (de los Campos et al., 2013), whereas the dimension reduction strategy used in PLS-DA can be suboptimal when all spectral data are included (Visentin et al., 2015;Bonfatti et al., 2017).However, as recently noted by Rovere et al. (2021) andTiplady et al. (2019), strong signals of association are often determined in nominal water regions.
For comparisons between different stages of pregnancy for predictive performance and for wavelength associations, we focused on estimates from BayesC because it had superior performance compared with that of PLS-DA as previously noted.Mean AUC were unsurprisingly larger for later stages of gestation.From Table 2, estimated AUC (0.71 ± 0.02) at stage 7 (≥180 d) was 0.13 greater (P < 0.0001) compared with that of stage 1 (1-30 d) and 0.07 to 0.10 greater (P < 0.0001) compared with stages 2, 3, 4, 5, and 6 (31-180 d).The mean AUC of stages 2 to 6 did not vary substantially from each other but were 0.03 to 0.06 greater (P < 0.001) compared with stage 1.
The increasing trend of mean AUC across advanced stages of pregnancy in our study agreed with Delhez et al. (2020), although the mean AUC for all stages were lower in our study.This difference between the 2 studies may be due to greater number of commercial herds (123) varying widely in their management conditions as well as the herd-independent cross-validation strategy used in our study, whereas a random cow independent cross-validation was conducted using a much smaller number (19) of herds by Delhez et al. (2020).Wang and Bovenhuis (2019) reported large differences in prediction accuracies between random versus herdindependent cross-validation and suggested that random cross-validation underestimates the error rate of the prediction equation when cows from the same batch (i.e., herd) are included in both training and validation sets and there are systematic differences between batches.Herd-independent cross-validation recognizes that each herd has its own independent management and feeding systems which influence milk components (Chilliard et al., 2007) and potentially then the association between milk spectral data and pregnancy.This cross-validation strategy would be most pertinent for inferring the effectiveness of using FT-MIR data to diagnose pregnancy within herds not included in this study.Herd-independent cross-validation strategies have not been seemingly considered in previous studies involving the use of spectral data to diagnose pregnancies in dairy cattle (Toledo-Alvarado et al., 2018;Delhez et al., 2020;Brand et al., 2021).Brand et al. (2021) recently reported impressive prediction sensitivities and specificities exceeding 85% for pregnancy diagnosis using deep learning methods; however, they also reported sensitivities and specificities that were almost as impressive (82%) using PLS-DA.Conversely, we observed sensitivities ranging from 54% to 65% and specificities ranging from 58 to 64% from stages 1 to 7 using PLS-DA (results not reported), whereas we determined sensitivities and specificities for BayesC to be only marginally better.The difference in PLS-DA performance between the Brand et al. (2021) study and our study could be attributed to several factors.First, as with many other studies, the Brand et al. cross-validation scheme was based on purely random partitioning of training and validation data sets, whereas we conducted block or herd-independent cross-validation.Second, spectral data collected for nonpregnant or open records in their study were all based on milk samples collected before the first insemination, thereby partially confounding lactation stage with PS.Third, milk records for pregnant cows from a broad expanse of gestational stages in their study were combined together instead of partitioning them into different stages of pregnancy as we did such that it was not possible to parse out the stronger late gestation stage signals from the weaker early stage lactation signals.
Milk production decreases, whereas fat, lactose, and protein content all increase as pregnancy advances Within a method (i.e., column), stages not sharing common superscript differ (P < 0.05) from each other.
x,y Within a stage (i.e., row), methods not sharing same letter differ (P < 0.05) from each other.(Olori et al., 1997;Penasa et al., 2016).In later stages of gestation, the growth and nutrient requirements of conceptus increases compared with earlier stages (Loker et al., 2009).Furthermore, estrogen levels in maternal blood gradually increase from 3 mo with no meaningful differences in levels between 3 to 5 mo and 5 to 7 mo of pregnancy, followed by a sudden increase thereafter (Parkhie et al., 1966), thereby potentially affecting nutrient partitioning (Olori et al., 1997).This could be a reason for observing a higher mean AUC at later stages compared with earlier stages.
It is generally recognized that diagnostic procedures should have AUC >0.70 before they are considered to be useful (Mandrekar, 2010).Hence our study implies that milk FT-MIR data has limited use for early diagnosis of PS in dairy cows even though the AUC values generally increased over the stages of gestation.Detection of PS at early ages with higher accuracies is important to replace more conventional methods of pregnancy diagnosis in order for FT-MIR-based predictions to be useful.
We do recognize that some cows classified as open in our study may have actually conceived and lost their embryo after their corresponding milk sample collection.This could be a contributing source of attenuating bias, particularly in stage 1 where such losses are known to approach 35% of conceptions (Santos et al., 2004).Even within the first half of stage 2 (30-45 d), embryo losses can occur in as high as 13% of pregnancies (Santos et al., 2004).Hence, this may be a contributing factor to the increasing signals associated with later stages of pregnancy in our analyses.

Wavenumber Associations With Pregnancy
We inferred the marginal associations of individual wavenumbers with PS as well as the joint associations of adaptively clustered windows of correlated wavenumbers with PS.In our study, we inferred 68 different windows based on the HAC algorithm with the number of wavenumbers per window ranging from 1 to 79 with a median of 7.These window ranges, along with the number of wavenumbers per window, are presented in Table S2 in Supplemental Web File S2 (https: / / doi .org/ 10 .6084/m9 .figshare.15017046;Khanal and Tempelman, 2021).The marginal PPA of individual wavenumbers for each pregnancy state are presented in the left panels of Figure 1, whereas the joint PPA of windows of clustered wavenumbers for each pregnancy stage are presented in the right panels of Figure 1.These estimated PPA are also tabulated in detail in Supplemental Web File S2 (Tables S3 and S4; https: / / doi .org/ 10 .6084/m9 .figshare.15017046;Khanal and Tempelman, 2021).For all stages, the marginal PPA of individual wavenumbers were generally lower compared with the corresponding joint PPA of windowsbased associations within the same regions.These differences may be attributed to the multicollinearity problem characterized by Ghosh and Ghattas (2015) and further illustrated by Fernando et al. (2017) in the context of GWAS.Furthermore, there seemed to be far greater consistency with inferences on peaks (i.e., high PPA) associated with windows as opposed to individual wavenumbers across the various stages of pregnancy as can be observed from comparing the left versus right panels of Figure 1.
We determined that pregnancy was consistently highly (PPA = 1) associated with wavenumber regions 1,063 to 1,134 cm −1 , 1,201 to 1,257 cm −1 , and 1,260 to 1,432 cm −1 for all pregnancy stages (Supplemental Table S4).As an additional check (results not reported), we noted that the joint PPA for each training set within each cross-validation fold for each stage ranged from 0.85 to 1 for both regions 1,063 to 1,134 cm −1 and 1,260 to 1,432 cm −1 .The corresponding joint PPA of wavenumbers within the region 1,201 to 1,257 cm −1 ranged from 0.3 to 1.0 across folds and stages; nevertheless, the median joint PPA of this region was 0.75 across the different stages and folds.Our results agreed with those reported by Toledo-Alvarado et al. (2018) who noted that wavenumbers 1,245 and 1,399 cm −1 were associated with pregnancy.Bittante and Cecchinato (2013) reported that the range of wavenumbers from 930 to 1,582 cm −1 correspond to peaks of absorbance pertaining to the bonds C-H, CH 2 , aromatic C=C, C=O, and N-O all contained within the "fingerprint region" (Karoui et al., 2010), which has been used to distinguish between a large number of molecular compounds based on absorbance band patterns.Absorbances at wavenumbers near 1,375 cm −1 have been previously associated with pregnancy in Holstein cows (Lainé et al., 2017) and also with the stretching mode of N-O and NO 2 (Socrates, 2004), which are chemical bonds believed to constitute pregnancy-associated proteins (Lainé et al., 2017).
We determined that the signals for pregnancy in range of wavenumbers from 1,477 to 1,507 cm −1 and from 1,510 to 1,574 cm −1 were lower for stages 1 and 2 compared with later stages.These regions are known to be associated with protein and amines where chemical bonds that contain nitrogen molecules (C-N, N-N) are specific to the region from 1,492 to 1,546 cm −1 (Kaylegian et al., 2006;Lynch et al., 2006).Zaalberg et al. (2020) also reported that the bonds (CO-N and N-H) associated with protein molecules are associated with wavenumbers that range from 1,500 to 1,550 cm −1 and the region around 1,600 cm −1 .The wavenumber 1,485 cm −1 is associated with (CH 3 ) 3 N+ asymmetric bending   Khanal and Tempelman, 2021) in column B. (Stuart, 2004).Lainé et al. (2017) and Toledo-Alvarado et al. (2018) also found important associations with pregnancy at around 1,550 cm −1 , although, again, their definitions of pregnancy were much broader than the 7 different stages in our study.The higher PPA in later stages compared with earlier stages for these regions could be due to the higher protein content of milk in later stages of gestation (Roche, 2003) compared with earlier stages.The wavenumber in this region is known to be associated with the ANKH gene (Zaalberg et al., 2020), which is strongly associated with α-LA and β-LG (Sanchez et al., 2017).Bleck et al. (2009) reported an increase in α-LA and β-1,4-galactosyltransferase in milk of cows as gestation advances and both enzymes are associated with lactose concentration in milk.Lower lactose levels in earlier stages of gestation (Olori et al., 1997) may explain the lower PPA for stage 1 to 30 d in the wavenumber regions of 1,477 to 1,507 cm −1 and 1,510 to 1,574 cm −1 .
Our results showed that windows from 1,730 to 1,764 cm −1 , 1,775 to 1,992 cm −1 , 1,995 to 2,163 cm −1 , and 2,167 to 2,316 cm −1 had widely varying PPA (from 0.20 to 1.00) with PS with no obvious trends across different stages of pregnancy.The wavenumber 1,747 cm −1 within the first window is known to be associated with fat A (C=O stretch; Stuart, 2004;Kaylegian et al., 2006).
The windows of range from 2,693 to 2,839 cm −1 and 2,842 to 2,962 cm −1 had medium to high PPA (0.47 to 1.00) across different stages.We did not find any obvious trend in PPA across different stages but a higher PPA was determined for the latter 2 stages (stage 6 and stage 7).The vibrations of ester linkage and C-H stretch groups related to fatty acids are located between 2,800 to 3,000 cm −1 (Sivakesava and Irudayaraj, 2002).Bittante and Cecchinato (2013) and Grelet et al. (2016) reported the C-H 3 (methyl) group is associated with 2,862 cm −1 .An important bandwidth for PS was previously inferred between 2,872 and 2,973 cm −1 (Toledo-Alvarado et al., 2018) but again for a much broader definition of pregnancy.In this region, fat B (absorbance by carbon-hydrogen) is filtered (Stuart, 2004;Kaylegian et al., 2006).Between 2,862 cm −1 (CH 3 ) and 2,927 cm −1 (CH 2 ), a significant association of pregnancy was determined by Lainé et al. (2017).The 2 bands at 2,850 and 2,920 cm −1 are attributable to asymmetrical and symmetrical methylene (CH 2 ) stretching vibration respectively (Sinclair et al., 1952).
Our results showed that wavenumber regions ranging from 906 to 973 cm −1 , 1,577 to 1,626 cm −1 , 1,667 to 1,693 cm −1 , 2,984 to 3,077 cm −1 , 3,081 to 3,133 cm −1 , 3,662 to 3,703 cm −1 , and 3,707 to 3,998 cm −1 had highly varying PPA (0.06 to 0.95) across various stages.The wavenumber region 1,577 to 1,627 cm −1 had greater PPA at stages 5, 6, and 7 compared with stage 1 and 2, whereas the wavenumber regions 2,984 to 3,077 cm −1 and 3,081 to 3,133 cm −1 had greater PPA at stage 1 compared with later stages.All of these regions overlap substantially with "water" regions as reported by Tiplady et al. (2019), Toledo-Alvarado et al. (2018), Bittante andCecchinato (2013), andGrelet et al. (2015) and are believed to have low signal-to-noise ratios.We observed that the region 649 to 900 cm −1 was universally associated with near-zero PPA across all stages, whether for marginal or joint associations (Figure 1).These results were somewhat anticipated as this region has been associated with an area of the spectrum where the calcium fluoride cuvette of the spectrophotometer should render it opaque to infrared light (Rovere et al., 2021).Toledo-Alvarado et al. ( 2018) determined a peak at 3,683 cm −1 for pregnancy which is close to those known for characterizing the bonds: C=CH 2 and O-H (alcohols, phenols, and carboxyl) group.Wang et al. (2016) and Wang and Bovenhuis (2018) identified wavenumbers (1,619 to 1,674 cm −1 ; 3,073 to 3,667 cm −1 ) associated with fat, protein, or lactose percentage and associated with polymorphisms in the DGAT1 gene.Zaalberg et al. (2020) also reported that 1,604 cm −1 is associated with DGAT1 gene and 1,696 cm −1 is associated with the PAEP gene.Contrary to what has been often practiced for FT-MIR predictions of key traits in other studies (Bonfatti et al., 2011;Delhez et al., 2020), we, like Bittante and Cecchinato (2013), recommend using wavenumbers in nominal water regions, at least for those >900 cm −1 , for FT-MIR predictions.That is, these regions include absorbance peaks that are associated with nonwater components of milk as we have previously observed for fatty acids (Rovere et al., 2021) and as we seem to observe for PS in our study.

Proportion of Variance Explained by Milk Infrared Spectra
The proportion of variance explained by milk infrared spectrum, which we denote as spectral repeatability (s 2 ), was calculated in analogy to a genomic heritability but using spectral data instead of genomic markers.Spectral repeatability estimates are presented on Table 3.As anticipated, spectral repeatability estimates generally increased with increasing stages of pregnancy, ranging from 0.05 ± 0.02 (posterior mean ± posterior standard deviation) for stage 1 to 0.38 ± 0.12 for stage 7. Hence milk infrared spectral data at later stages seem more effective for diagnosing pregnancy relative to earlier stages as the spectral repeatability estimates were higher at later stages, thereby confirming our earlier reported cross-validation assessments.However, as also previously noted, the low estimates of s 2 at earlier We believe that the low early stage estimates of s 2 are somewhat counterbalanced by the strong associations (i.e., PPA approaching 100%) inferred between some spectral regions with PS in stage 1 as previously reported in this paper.By clear analogy, promising GWAS analyses have led to major QTL regions being discovered for pregnancy rates in spite of heritability estimates being lower than 10% (Ma et al., 2019).

CONCLUSIONS
Our study suggests that the potential for MIR spectroscopy to diagnose early the PS of dairy cows is limited.We compared PLS-DA regression and a Bayesian variable selection model BayesC and determined that BayesC had better out-of-sample prediction accuracies in diagnosing PS compared with PLS-DA.Crossvalidation prediction accuracies were higher at later stages of gestation (greater than 150 d); however, those stages are too late to be of practical use for management decisions.Nevertheless, this study demonstrated some associations between pregnancy at all stages with various wavenumber regions in the mid-infrared spectrum that include plausible candidates for chemical bonds in molecules believed to be associated with pregnancy.The nature of these inferences is not much unlike what is typically reported for genomic studies or GWAS involving fertility in dairy cattle (i.e., low heritability estimates yet some genomic regions known to be strongly associated with various measures of fertility).Further research is needed to improve prediction accuracies in earlier pregnancy stages by combining the use of FT-MIR spectra with other omics data sources such as metabolomics, and genomic and molecular data.Window-based associations require greater consideration as associations of individual wavenumbers with PS can be muted by multicollinearity with individual wavenumbers, whereas one wavenumber at a time regression may not fully account for population structures induced by management clusters.
et al.(2019)  and implemented using the capushe function in the adjclust package.The joint PPA of all n k wavenumbers within a window k was computed using PPA k = further detail but within a genomic context byChen et al. (2017).Here β kj(l) is the random draw from the full conditional distribution of β j for wavenumber j located within window k drawn during MCMC cycle l.Note that I of the draws of β kj(l) within window k are sampled to be nonzero.
Khanal and Tempelman: FOURIER-TRANSFORM MID-INFRARED SPECTROSCOPY TO DIAGNOSE PREGNANCY Khanal and Tempelman: FOURIER-TRANSFORM MID-INFRARED SPECTROSCOPY TO DIAGNOSE PREGNANCY Table2.Estimated receiver operating characteristic areas under the curve (AUC) for partial least squares discriminant analysis (PLS-DA) and BayesC based on 10-fold cross-validation at various stages of
Khanal and Tempelman: FOURIER-TRANSFORM MID-INFRARED SPECTROSCOPY TO DIAGNOSE PREGNANCY stages limits the practical use of FT-MIR data for early detection of pregnancy.
Khanal and Tempelman: FOURIER-TRANSFORM MID-INFRARED SPECTROSCOPY TO DIAGNOSE PREGNANCY researchers

Table 1 .
Khanal and Tempelman: FOURIER-TRANSFORM MID-INFRARED SPECTROSCOPY TO DIAGNOSE PREGNANCY Number of pregnant and open records and cows, and DIM (mean, with SD in parentheses) for different stages of pregnancy Table S2 in Supplemental Web File S2; https: / / doi .org/ 10 .6084/m9 .figshare.15017046;

Table 3 .
Spectral repeatability (s 2 ) or estimated proportion of variance explained by the milk infrared spectrum at different stages of pregnancy using BayesC 1Posterior mean (posterior SD).