Predictive models for disease detection in group-housed preweaned dairy calves using data collected from automated milk feeders

In the US, dairy calves are typically housed individually due to the perception of reduced risk of spreading infectious diseases between calves and the ability to monitor health on an individual calf basis. However, automated milk feeders (AMF) can provide individual monitoring of group-housed calves while allowing them to express more natural feeding behaviors and interact with each other. Research has shown that feeding behaviors recorded by AMF can be a helpful screening tool for detecting disease in dairy calves. Altogether, there is an opportunity to utilize the data from AMF to create a more robust and efficient model to predict disease, reducing the need for visual observation. Therefore, the objective of this observational study was to predict disease in preweaned dairy calves using AMF feeding behavior data and machine learning (ML) algorithms. This study was conducted on a dairy farm in the Upper Midwest US that was visited weekly from July 2018 to May 2019. During farm visits, AMF data and calves’ treatment records were collected, and calves were visually health scored for attitude, ear position, ocular discharge, nasal discharge, hide dirtiness, and cough score. The final data sets used for the analyses consisted of 740 and 741 calves, with 1,007 (healthy = 594 and sick = 413) and 1,044 (healthy = 560 and sick = 484) observations (health events) for Data 1 and Data 2, respectively. Data 1 included only the weekly calf health scores observed by research personnel, whereas Data 2 included primarily daily calf treatment records by farm staff in addition to weekly health scores. Calf visit-level feeding behaviors from AMF data included milk intake (mL/d), drinking speed (mL/min), visit duration (min), rewarded (with milk being offered) and unrewarded (without milk) visits (number per d), and the interval between visits (min). Three approaches were used to predict health status: Generalized Linear Model, Random Forest, and Gradient Boosting Machine.
A total of 16 models were built using different combinations of predictor variables: the number of rewarded visits, number of unrewarded visits, visit duration, the interval between visits, intake, intake divided by rewarded visits, drinking speed, drinking speed divided by rewarded visits, and calf age on the sick day. Of all algorithms, Random Forest and Gradient Boosting had the best performance predicting the health status of dairy calves. The results indicated that weekly health scores were not enough to predict calf health status. However, daily treatment records and AMF data were sufficient for creating predictive algorithms (e.g., F1-scores of 0.784 and 0.775 for Random Forest and Gradient Boosting, respectively, Data 2). This study suggests that ML was effective in determining the specific visit-level feeding behaviors that can be used to predict disease in group-housed preweaned dairy calves. Implementing these ML algorithms could reduce the need for visual calf observation on farms, minimizing labor time and improving calf health.


INTRODUCTION
Preweaned dairy calves are individually housed in most of the dairy operations in the United States. Producers typically house calves individually due to the perception of reduced spread of infectious diseases between calves (von Keyserlingk and Weary, 2011) and limitations in infrastructure to house calves in groups. A survey conducted by the USDA concluded that the most prevalent causes of morbidity and mortality in preweaned dairy calves are diarrhea (also referred to as calf scours) and respiratory disease (Urie et al., 2018). Disease during the preweaning period has been reported to create problems for the calf later in life. Presence of diarrhea during the preweaning period was associated with lower milk production in the first lactation, lower average daily gain, and increased services per conception (Abuelo et al., 2021).
R. K. Perttu,* M. Peiter,* T. Bresolin,# J. R. R. Dórea,# and M. I. Endres*†
Calves are herd animals that live in groups in natural conditions (Chua et al., 2002); therefore, individual housing could be detrimental to calves' affective state. Recent studies suggest that individual housing is associated with higher stress (Liu et al., 2019), impaired behavioral development, and compromised cognitive ability (Costa et al., 2019). Other research suggests that housing calves in groups may be beneficial to both the producer and the calves. Particularly, group rearing grants greater space allowance for calves to express natural behaviors and partake in social interactions with other calves (Whalin et al., 2021), which is necessary for their behavioral development (Meagher et al., 2015). Also, housing calves socially benefits the dairy industry's public image because the public desires housing options that promote socialization (Perttu et al., 2020, 2021). Producers are interested in exploring housing options that let calves experience a more fulfilling social environment while remaining healthy. Therefore, proactive and more automated disease detection strategies are required to identify sick calves.
For producers interested in pursuing management options that monitor calves individually and offer greater behavioral freedom for calves, an automated milk feeder (AMF) is an attractive option. The AMF allows calves to be housed in a group setting, thereby promoting social interactions. Housing calves in groups was associated with increased time spent at an AMF and decreased time before the first visit to the feeder (De Paula Vieira et al., 2010; Duve et al., 2012) due to a process known as social learning, which increases exploration by calves provided the company of older calves (De Paula Vieira et al., 2012). The AMF accommodates key natural behaviors, such as teat feeding of milk and gradual weaning, thereby improving the welfare of the calf (Whalin et al., 2021). Additionally, AMF gives farmers the opportunity to have a flexible labor schedule and more efficiently feed group-housed calves (Hepola, 2003).
The AMF identifies calves individually upon entry to the feeder using their unique radio-frequency ear tag. If the AMF determines the calf is eligible to receive milk, the feeder will dispense milk (whole milk or milk replacer) in a predetermined allotment, and the event is recorded as a rewarded visit by the software. If milk is not offered to the calf at a visit (i.e., the calf received its full allotment for the day or the time interval between meals has not been reached), the event is recorded as an unrewarded visit. During a rewarded visit, the AMF records feeding behaviors, such as each calf's milk intake (mL), drinking speed (mL per min), and duration of the visit (min). The software can summarize the data, which the producer can access via an on-farm computer or remotely. The software program uses an algorithm to alert the producer of potentially sick calves by comparing the calf's daily intake to its rolling average. The producer can use this information to physically assess the flagged calves and decide whether treatment is necessary. Previous research suggests that AMF data are a useful screening tool for detecting disease in dairy calves (Knauer et al., 2017; Conboy et al., 2021; Perttu et al., 2023); however, further research is needed to create a more powerful and efficient tool to predict disease.
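The intake-versus-rolling-average alert described above can be illustrated with a minimal sketch. The feeder software's actual rule is not given in the text, so the window length (`window`), the threshold (`ratio`), and the function name are hypothetical choices made only for illustration.

```python
from statistics import mean


def flag_low_intake(daily_intake_ml, window=3, ratio=0.8):
    """Return one boolean per day: True when that day's intake falls below
    `ratio` times the mean of the previous `window` days.

    Both `window` and `ratio` are hypothetical; the vendor's rule is not
    described in the text.
    """
    flags = []
    for i, today in enumerate(daily_intake_ml):
        previous = daily_intake_ml[max(0, i - window):i]
        # No alert is raised until at least one earlier day is available.
        flags.append(bool(previous) and today < ratio * mean(previous))
    return flags


intakes = [6000, 6000, 6000, 4000, 6000]  # made-up mL/d values
print(flag_low_intake(intakes))
```

A rule of this kind only screens on intake, which is one reason the study examines additional feeding behaviors.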
Multiple disciplines have utilized machine learning (ML) as a powerful analytical tool for predictive modeling (Yang and Guo, 2017). Combining ML algorithms with data generated from sensing technologies such as wearable sensors, cameras, and spectrometers has allowed the development of important farm management tools for early disease and estrus detection (Cairo et al., 2021; Teixeira et al., 2022), animal identification (Ferreira et al., 2022), and dry matter intake prediction (Dorea et al., 2018). Various fields of study are moving toward the ML trend due to the explosion of the big data era (Qiu et al., 2016). Specifically in dairy science, ML has been used to detect behavior anomalies and predict conception and metabolic status in dairy cows (Hempstalk et al., 2015; Xu et al., 2019; Wagner et al., 2020). Similar to other sensors, AMF generate large temporal datasets that can potentially be analyzed using ML techniques to build accurate and precise predictive algorithms. To our knowledge, research specifically on applying ML algorithms to predict the health status of preweaned calves using AMF data from a commercial dairy farm has not yet been published. Therefore, the objective of this study was to predict disease in group-housed preweaned dairy calves using AMF data and ML algorithms in a field setting.

MATERIALS AND METHODS
The University of Minnesota's Institutional Animal Care and Use Committee approved the use of animals in this study (protocol no. 1806-36043A).

Animals, Management, and Feeding
Data collection for this study was performed from July 2018 to May 2019 at a 2500-cow dairy farm in the Upper Midwest US. Calves (female Holsteins) were given heat-treated colostrum within 6 h of birth and moved to individual indoor hutches. Calves remained in the indoor, climate-controlled hutches until approximately 7 d of age and were manually fed 4 L of milk replacer daily divided into 2 meals. Once calves were approximately 7 d of age, they were moved to a different barn into groups until 56 d of age. The farm had 12 groups, and calves remained in the same group throughout the study. All groups included 15 calves each, and the ages between the youngest and oldest calves in the group did not surpass 1 wk. Groups were housed in 11 × 11 m pens with a 3.7 m wide, nonbedded alley that contained the AMF station, water trough, and concentrate starter. Each calf had 5.4 m² of average area allowance for resting that was bedded with sawdust during the summer and straw during the winter. The barn ventilation system consisted of a combination of tubes, exhaust fans, and side curtains.
Calves were fed a combination of whole milk (50%) and milk replacer (50%) with allowance peaking at 9 L/calf/d. The farm had 6 AMFs (DeLaval Calf Feeder CF1000S, DeLaval, Tumba, Sweden) with 2 feeding stations per AMF, equating to 1 feeding station per group. Once on the AMFs, allowance started at 5 L/d and increased to 6.5 L/d over a 7-d period. Then, milk allowance gradually increased from 6.5 L to 7.5 L/d over a 14-d period. Allowance peaked at 9 L/calf/d at d 35 and stayed at 9 L for 18 d. The weaning process began at 46 d of age, with milk allocation stepped down by 0.90 L/d (over a 10-d period), reaching zero at 56 d of age. Calves were offered concentrate starter and water ad libitum.

Feeding Behavior
Daily feeding data, recorded by the software Institute Program (Förster-Technik, Engen, Germany), were downloaded on a weekly basis by research personnel. The data included the calf identification number, beginning and end of each visit (timestamp), whether the visit was rewarded or unrewarded, milk allowance per visit (mL), cumulative milk consumption for each d (mL), drinking speed (mL per min), and sudden withdrawal from the rationed meal (breakaway). Data were downloaded from the AMFs (n = 6) when no calves were in the AMF consuming a meal. Occasionally the AMF receiver and the transmitter on the calf's ear would disconnect, leading the software to generate multiple instances for the same visit. The following procedure was adopted to create a single visit if multiple visits were incorrectly recorded: i) if the daily cumulative consumption was the same, and the interval between the multiple instances was lower than 30 s; ii) if milk allowances differed between instances of the same visit, that visit was excluded from the data set; iii) if the drinking speed was greater than 0 and cumulative consumption equal to 0, the data points were excluded from the data set; iv) if the interval between instances was lower than 2 min, then it was considered a single visit to the AMF based on manufacturer's instructions.
After preprocessing the data, the following feeding behaviors were derived: i) Milk intake: total daily milk intake (mL/d); ii) Average milk intake: total milk intake divided by the number of visits to the AMF with reward (mL/visit); iii) Drinking speed: average daily drinking speed (mL/min); iv) Interval between visits: the average daily interval between visits (both rewarded and unrewarded) to the AMF (min); v) Visit duration: average daily length of visits to the AMF (min); vi) Number of rewarded visits: daily visits to the AMF in which there was consumption; vii) Number of unrewarded visits: daily visits to the AMF in which there was no consumption. In addition, viii) Age: the number of days from birth to the day of the health event (d 0) was included in the data set. We did not analyze milk allowance per visit and sudden withdrawal from rationed meal. The first day of AMF data collection was removed since animals were placed into group pens at varying times of the day (morning or evening). In addition, the last day of AMF data collection was removed due to differences in the time of the last meal of the weaning program. Finally, records with drinking speed >2000 mL/min, number of rewarded visits >11, number of unrewarded visits >47, interval between visits >1000 min, or visit duration <0.18 or >11 min were removed from the raw data set. These cutoffs were defined based on visual inspection of the data distribution.
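As an illustration of how the daily feeding behaviors and the cutoffs above might be computed from cleaned visit records, consider the following minimal sketch. The record layout (dictionary keys such as `start_min` and `speed`), the function names, and the choice to measure the interval from the end of one visit to the start of the next are assumptions; the numeric cutoffs are the ones quoted in the text.

```python
from statistics import mean


def daily_features(visits):
    """Summarize one calf-day. `visits` is a list of dicts sorted by start
    time with hypothetical keys: start_min, end_min, rewarded, intake_ml, speed."""
    if not visits:
        return None
    rewarded = [v for v in visits if v["rewarded"]]
    n_rv = len(rewarded)
    intake = sum(v["intake_ml"] for v in rewarded)
    # Assumed definition: gap between the end of a visit and the next start.
    gaps = [b["start_min"] - a["end_min"] for a, b in zip(visits, visits[1:])]
    return {
        "IT": intake,                                   # milk intake
        "ITR": intake / n_rv if n_rv else 0.0,          # intake per rewarded visit
        "DS": mean(v["speed"] for v in rewarded) if rewarded else 0.0,
        "IBV": mean(gaps) if gaps else 0.0,             # interval between visits
        "VD": mean(v["end_min"] - v["start_min"] for v in visits),
        "RV": n_rv,                                     # rewarded visits
        "URV": len(visits) - n_rv,                      # unrewarded visits
    }


def passes_cutoffs(f):
    """Apply the cutoffs quoted in the text to one calf-day summary."""
    return (f["DS"] <= 2000 and f["RV"] <= 11 and f["URV"] <= 47
            and f["IBV"] <= 1000 and 0.18 <= f["VD"] <= 11)


day = [
    {"start_min": 0, "end_min": 5, "rewarded": True, "intake_ml": 2000, "speed": 500},
    {"start_min": 65, "end_min": 67, "rewarded": False, "intake_ml": 0, "speed": 0},
    {"start_min": 127, "end_min": 131, "rewarded": True, "intake_ml": 1000, "speed": 300},
]
features = daily_features(day)
print(features, passes_cutoffs(features))
```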

Health Events
Calf health scores (Data 1) were collected on a weekly basis using a visual methodology adapted from McGuirk (2008) and Jorgensen et al. (2017) by a single trained observer for consistency. Inter-observer reliability was assessed by having a veterinarian co-author accompany the single trained observer to the farm approximately every 2 mo throughout the 12-mo study. Each calf was scored by both the trained observer and the veterinarian co-author, and the health scores were compared. Throughout evaluations, the agreement was greater than 95%. Health scores of interest included ocular discharge, nasal discharge, ear position, attitude, and hide dirtiness. Ocular discharge (eye score) was assessed on a 0 to 3 scale (0 = no discharge; 1 = small amount of ocular discharge; 2 = moderate amount of discharge; and 3 = heavy amount of discharge). Nasal discharge (nasal score) was assessed on a 0 to 3 scale (0 = normal serous discharge; 1 = small amount of unilateral cloudy discharge; 2 = bilateral, cloudy, or excessive discharge; and 3 = copious bilateral mucopurulent discharge). Ear position (ear score) was assessed on a 0 to 4 scale (0 = no ear droop; 1 = unilateral ear droop; 2 = slight bilateral ear droop; 3 = severe bilateral ear droop; and 4 = head tilt). Attitude score was assessed on a 0 to 4 scale (0 = active; 1 = quiet/dull; 2 = depressed; 3 = nonresponsive; and 4 = dead). Hide dirtiness was assessed on a 0 to 2 scale by observing the perianal region, the underside of the tail, and tailhead of the calves (0 = clean; 1 = evidence of loose or abnormal fecal consistency; and 2 = significant evidence of watery diarrhea). Assessing only this limited area on the calf's rear for signs of diarrhea increased the likelihood that the fecal material observed was from the calf, making it a more feasible evaluation for group-housed calves. However, we acknowledge that some fecal material could potentially be from the calf's environment. Rectal temperatures were recorded for calves scoring ≥2 on any health category since previous literature defined this score as abnormal, indicating illness (McGuirk, 2008). Calves with a cumulative score of ≥3, the sum of all health category scores (ocular discharge, nasal discharge, ear position, attitude, hide dirtiness), were categorized as sick, whereas calves with a cumulative score below 3 were categorized as healthy.
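The scoring rule above (sick when the summed score is ≥3; rectal temperature taken when any single category is ≥2) can be written as a short sketch. The dictionary keys and function names are hypothetical; the scale maxima and thresholds come from the text.

```python
# Maximum value of each scale as described in the text; key names are hypothetical.
SCALE_MAX = {"ocular": 3, "nasal": 3, "ear": 4, "attitude": 4, "hide": 2}


def validate(scores):
    """Raise ValueError if a category is missing or outside its scale."""
    for name, top in SCALE_MAX.items():
        if not 0 <= scores[name] <= top:
            raise ValueError(f"{name} score {scores[name]} outside 0..{top}")


def classify(scores):
    """Return 'sick' when the cumulative score is >= 3, else 'healthy'."""
    validate(scores)
    total = sum(scores[name] for name in SCALE_MAX)
    return "sick" if total >= 3 else "healthy"


def needs_rectal_temperature(scores):
    """True when any single category scores >= 2."""
    validate(scores)
    return any(scores[name] >= 2 for name in SCALE_MAX)


calf = {"ocular": 1, "nasal": 1, "ear": 1, "attitude": 0, "hide": 0}
print(classify(calf), needs_rectal_temperature(calf))
```

Note that the two rules are independent: a calf can be classified as sick from three mild scores without triggering a temperature check, as in the example.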
In addition to the health score visually evaluated on a weekly basis, the farm's daily treatment records were added as an indication of calf health status to create another data set (Data 2). These records were written by hand by farm staff, photographed by the research personnel at each farm visit, and then entered into an Excel spreadsheet. The intention was to more accurately assign sick d 0 (the day the calf had first symptoms) after initial analyses indicated that weekly scores were not sufficiently accurate for the algorithms (results of Data 1 analyses are still presented herein for comparison). The treatment records included calf identification, feeder number, treatment date, product used to treat the calf, category of product (e.g., antibiotic, anti-inflammatory), symptoms, and potential diagnosis (e.g., bovine respiratory disease, neonatal calf diarrhea). The farm staff observed and treated calves on a daily basis as needed and only treated calves that showed symptoms and needed treatment for any disease (they did not treat all calves in the group as a preventative measure as some farms do). Research personnel scored calves once weekly and shared those scores with farm staff. Since research personnel recorded calf health scores before the farm staff treated the calves that day, Data 2 still included the sick classification from health scoring for that day only and that day was considered sick d 0. For other days of the week, treatment day was considered sick d 0 for each calf that was treated that day. If a calf was not treated, then it was considered healthy.
Day 0 was the day when a calf was first observed as sick (with symptoms), either by research personnel (weekly health scores) or by farm staff (daily treatment records) as described previously. For each calf, AMF data were selected from 6 d before sick d 0, considering every single day as an independent predictor to train the algorithms. The experimental unit for the current study was the disease category observation within calf (healthy or sick). No data after a calf's sick event were considered in this study due to uncertainty about whether the calf recovered from a disease or not, which could create bias in the analyses.
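One way to picture the 6-d window is as a flat row in which each behavior on each preceding day becomes its own predictor. The sketch below assumes the window covers d -6 through d -1 (the text does not state whether d 0 itself is included) and uses hypothetical names for the function and column labels.

```python
def lookback_features(daily, day0, behaviors, n_days=6):
    """Build one predictor row from the `n_days` days before `day0`.

    `daily` maps a day index to a dict of behavior values. Returns None when
    any day in the window is missing. Window d-6..d-1 is an assumption.
    """
    row = {}
    for offset in range(n_days, 0, -1):
        day = day0 - offset
        if day not in daily:
            return None
        for name in behaviors:
            # Each (behavior, day) pair is treated as an independent predictor.
            row[f"{name}_d-{offset}"] = daily[day][name]
    return row


# Made-up intake values for calf ages 10..19 d.
daily = {d: {"IT": 1000 * d} for d in range(10, 20)}
print(lookback_features(daily, day0=17, behaviors=["IT"]))
```

With 8 behaviors and 6 days, a row built this way would hold 48 behavior predictors plus age.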

Final Data sets
In summary, a total of 951 calves were enrolled in the experiment from July 2018 to May 2019. Health scores and AMF data were merged to create Data 1, and health scores, the farm's daily treatment records, and AMF data were merged to create Data 2, where both data sets had a total of 760 out of 951 calves with health events and feeding behaviors (see "Health Events" section for more details). To have an approximately equal number of observations in each class (i.e., healthy and sick), the data were randomly downsampled. The decision to use downsampling rather than oversampling was based on the nature of the data and the specific research question being addressed. There was a large number of instances of the majority class (healthy calves) and a smaller number of instances of the minority class (sick calves) in the data set. In this scenario, oversampling the minority class could lead to overfitting and bias in the results, as it could artificially increase the representation of the minority class. To address this issue, downsampling was used. Instances of the majority class were randomly selected to be removed from the data set until the number of instances in the majority and minority classes was similar. This process was repeated multiple times to obtain different random samples, and then the algorithm was trained on each of these samples to ensure the robustness of the results. After applying the cutoffs described in the "Feeding Behavior" section and the downsampling strategy, the data sets used in the analyses consisted of 740 and 741 calves, with 1,007 (healthy = 594 and sick = 413) and 1,044 (healthy = 560 and sick = 484) observations (health events) for Data 1 and Data 2, respectively.
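Random downsampling of the majority class can be sketched as follows. This version balances the classes exactly, whereas the study reports only approximately equal class sizes, so the stopping rule here is an assumption; the record layout and function name are hypothetical.

```python
import random


def downsample(rows, label_key="sick", seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Exact balancing is an assumption; the study describes the final classes
    as "similar" in size rather than identical.
    """
    rng = random.Random(seed)
    sick = [r for r in rows if r[label_key]]
    healthy = [r for r in rows if not r[label_key]]
    majority, minority = (healthy, sick) if len(healthy) >= len(sick) else (sick, healthy)
    kept = rng.sample(majority, len(minority))
    balanced = kept + minority
    rng.shuffle(balanced)
    return balanced


# Made-up data: 30 sick and 100 healthy observations.
rows = [{"id": i, "sick": i < 30} for i in range(130)]
# Repeating with different seeds yields different random subsets to train on.
samples = [downsample(rows, seed=s) for s in range(3)]
print([len(s) for s in samples])
```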

Data Analysis
Generalized Linear Model (GLM), Random Forest (RF), and Gradient Boosting Machine (GBM) were used to train classifier algorithms to predict whether the calf would be healthy (= 0) or sick (= 1). The H2O package (The H2O.ai Team, 2020) implemented in R (R Core Team, 2020) was used in the current study to fit all models and ML algorithms. The GLM proposed by Nelder and Wedderburn (1972) is an extension of the linear regression models, which allows the response variable to have an error distribution other than the Gaussian distribution. The GLM consists of 3 elements: 1) a conditional distribution for modeling the response variable given the predictors; 2) a linear predictor to build the linear relationship between the response and predictor variables, even though their underlying relationship is not linear; and 3) a link function to provide the relationship between the linear predictor and the mean of the distribution function. Bernoulli was used as the conditional distribution and logit as the link function. Elastic net regularization was used to fit GLM, which requires defining lambda and α hyperparameters, both ranging from 0 to 1. Lambda controls the amount of regularization (larger lambdas shrink the coefficients toward zero), while α controls the distribution between Lasso (L1) and ridge regression (L2).
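For reference, the elastic net objective for a Bernoulli GLM with logit link can be written as below. This is the common parameterization (as documented for glmnet and H2O) and is assumed here rather than taken from the text; the symbols N, x_i, y_i, β0, β, and η_i are introduced only for illustration.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
\;+\; \lambda \Big( \alpha \,\lVert \beta \rVert_1
\;+\; \tfrac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \Big),
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
```

Under this form, α = 1 gives the Lasso penalty, α = 0 gives the ridge penalty, and λ = 0 recovers the unpenalized logistic regression.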
The GBM gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees (Friedman, 2001). The statistical framework casts boosting as a numerical optimization problem where the objective is to minimize the loss function by adding weak learners using a gradient descent procedure. This class of algorithms is described as a stage-wise additive model since one new weak learner is added at a time, and the existing weak learners are frozen and left unchanged. Therefore, GBM trains many models in a gradual, additive, and sequential manner where the loss function measures how good the models are at fitting the underlying data. Similar to the RF, GBM has several hyperparameters that must be defined, including the number of trees (10, 20, 40, 80, 100, 200, and 300), maximum tree depth (1, 10, 20, 40, and 80), minimum number of observations per leaf (1, 2, 10, 20, and 30), number of variables in the subset (2, 3, 4, 5, and 6), sampling rate (0.1 to 1.0), column sampling rate (0.1 to 1.0), and histogram type (UniformAdaptive, Random, QuantilesGlobal, and RoundRobin).
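The stage-wise additive procedure can be summarized in the notation of Friedman (2001). The symbols below (loss L, current model F_{m-1}, weak learner h_m, step size ρ_m) are introduced here for illustration and are not defined in the text.

```latex
r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}},
\qquad
h_m \approx \text{weak learner fitted to } \{(x_i, r_{im})\}_{i=1}^{N},
\qquad
F_m(x) = F_{m-1}(x) + \rho_m\, h_m(x)
```

Each stage fits a new weak learner to the negative gradient of the loss at the current model and adds it, leaving the earlier learners unchanged.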
The best sets of hyperparameters for all ML algorithms were defined through a discrete random grid search by using a combination of a maximum number of 400 models and/or maximum runtime of 860 s, with the misclassification metric as the early stopping criterion. Hyperparameter tuning was performed on the training set using a 5-fold cross-validation strategy. The best set of hyperparameters was chosen based on the prediction accuracy from the 5-fold cross-validation.
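A discrete random grid search can be sketched in a few lines. The grid below reuses three of the GBM value lists from the text; the key names, the `cv_error` callback (standing in for a 5-fold cross-validated misclassification rate), and the toy error function are hypothetical, and the runtime limit is omitted.

```python
import itertools
import random

# Value lists taken from the text; key names are illustrative only.
GBM_GRID = {
    "ntrees": [10, 20, 40, 80, 100, 200, 300],
    "max_depth": [1, 10, 20, 40, 80],
    "min_rows": [1, 2, 10, 20, 30],
}


def random_grid(grid, max_models, seed=0):
    """Sample up to `max_models` distinct combinations from a discrete grid."""
    keys = list(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(max_models, len(combos)))
    return [dict(zip(keys, combo)) for combo in chosen]


def best_by_cv_error(candidates, cv_error):
    """Pick the candidate with the lowest cross-validated error."""
    return min(candidates, key=cv_error)


def toy_cv_error(params):
    # Placeholder for a real 5-fold cross-validated misclassification rate.
    return (abs(params["ntrees"] - 100) + abs(params["max_depth"] - 10)
            + abs(params["min_rows"] - 10))


candidates = random_grid(GBM_GRID, max_models=400)
print(len(candidates), best_by_cv_error(candidates, toy_cv_error))
```

Because this reduced grid has only 175 combinations, a cap of 400 models evaluates all of them; the cap matters once more hyperparameters are added.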
The ML algorithms used in this study were trained using only Data 1 or only Data 2. As a reminder, Data 2 combined daily treatment records and weekly health scores for a more accurate classification of sick d 0. A total of 16 models were trained for each ML algorithm, considering combinations of the following predictors: number of rewarded visits (RV), number of unrewarded visits (URV), visit duration (VD), the interval between visits (IBV), intake divided by rewarded visits (ITR), intake (IT), drinking speed divided by rewarded visits (DSR), drinking speed (DS), and number of days from birth to sick d 0 (Age). All ML algorithms were trained using the 16 models, and the hold-out cross-validation strategy was used to split the data into training and testing sets. In the hold-out strategy, 20% of samples in the data were kept as the testing data set while the remaining 80% were used as the training data set. No calf contributed records to both the training and testing data sets.
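A hold-out split that keeps each calf entirely in one set can be sketched as below. Splitting on calves yields roughly, not exactly, 20% of observations in the test set; how the study reconciled the two is not stated, so this is an assumption, as are the record layout and function name.

```python
import random


def split_by_calf(rows, test_fraction=0.2, seed=0):
    """Assign whole calves to the test set so no calf appears in both sets."""
    calves = sorted({r["calf_id"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(calves)
    n_test = max(1, round(len(calves) * test_fraction))
    test_ids = set(calves[:n_test])
    train = [r for r in rows if r["calf_id"] not in test_ids]
    test = [r for r in rows if r["calf_id"] in test_ids]
    return train, test


# Made-up data: 120 observations spread over 50 calves.
rows = [{"calf_id": i % 50, "obs": i} for i in range(120)]
train, test = split_by_calf(rows)
print(len(train), len(test))
```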
The model performance was evaluated using precision, recall, and F1-score metrics as precision = TP / (TP + FP); recall = TP / (TP + FN); and F1-score = 2 × (precision × recall) / (precision + recall). The TP represents the number of times a sick event was correctly classified as sick, FP represents the number of times an event was predicted as sick, but it was a healthy event, and FN represents the number of times an event was predicted as healthy, but it was a sick event.
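The three metrics can be computed directly from the confusion counts, as in the minimal sketch below. The function names and the zero-denominator convention (returning 0.0) are choices made here for illustration. As a check on the arithmetic, a precision of 0.885 and recall of 0.704 (the RF "M6" test-set values reported in the Results) give an F1-score of about 0.784.

```python
def precision(tp, fp):
    """Share of events predicted sick that were truly sick."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of truly sick events that were predicted sick."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


print(round(f1_score(0.885, 0.704), 3))
```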

RESULTS
The description of the covariate sets used to create ML algorithms is presented in Table 1. The final data sets consisted of 740 calves (1,007 disease category observations, with healthy = 594 and sick = 413 observations) and 741 calves (1,044 observations, with healthy = 560 and sick = 484 observations), for Data 1 and Data 2, respectively. As a reminder, for each calf, AMF data were selected from 6 d before sick d 0, considering every single day as an independent predictor in the model. In other words, 6 d were used as predictors in the models for each of the disease category observations. A summary of the best sets of hyperparameters by model and ML algorithm is presented in supplementary Tables S1, S2, and S3.

Generalized Linear Model - Data 1
Results for predicting disease with the GLM approach, using weekly health scores as the response variable, are presented in Table 2 for the corresponding training and test sets. The highest F1-score (30.3%) was achieved by the "M11" model (DS + IT + IBV + VD) when predicting the health status on the testing set. The best-performing execution of the "M11" model resulted in 75.7% precision and 18.9% recall on the test set. The "M11" model's train-validation resulted in a 44.3% F1-score, 76.0% precision, and 31.3% recall.

Generalized Linear Model - Data 2
Results for predicting disease with the GLM approach, using treatment records in addition to the weekly health scores as the response variable, are presented in Table 3 for the corresponding training and test sets. The highest F1-score (74.8%) was achieved by the "M16" model (DS + IT + IBV + VD + RV + URV + DSR + ITR + Age) when predicting the health status on the testing set. The best-performing execution of the "M16" model resulted in 78.5% precision and 71.4% recall on the test set.

Random Forest - Data 1
Results for predicting disease with the RF approach, using weekly health scores as the response variable, are presented in Table 4 for the corresponding training and test sets. The highest F1-score (29.2%) was achieved by the "M14" model (DS + IT + IBV + VD + RV + URV + DSR) when predicting the health status on the testing set. The best-performing execution of the "M14" model resulted in 100% precision and 17.1% recall on the test set. The "M14" model's train-validation resulted in a 44.5% F1-score, 81.5% precision, and 30.6% recall.

Random Forest - Data 2
Results for predicting disease with the RF approach, using treatment records in addition to the weekly health scores as the response variable, are presented in Table 5 for the corresponding training and test sets. The highest F1-score (78.4%) was achieved by the "M6" model (IT) when predicting health status on the testing set. The best-performing execution of the "M6" model resulted in 88.5% precision and 70.4% recall on the test set. The "M6" model's train-validation resulted in a 74.4% F1-score, 79.7% precision, and 69.7% recall.

Gradient Boosting Machine - Data 1
Results for predicting disease with the GBM approach, using weekly health scores as the response variable, are presented in Table 6 for the corresponding training and test sets. The highest F1-score (30.1%) was achieved by the "M4" model (IBV) when predicting health status on the testing set. The best-performing execution of the "M4" model resulted in 48.4% precision and 21.9% recall on the test set. The "M4" model's train-validation resulted in a 44.0% F1-score, 93.9% precision, and 28.7% recall.

Gradient Boosting Machine - Data 2
Results for predicting disease with the GBM approach, using treatment records in addition to the weekly health scores as the response variable, are presented in Table 7 for the corresponding training and test sets. The highest F1-score (77.5%) was achieved by the "M16" model (DS + IT + IBV + VD + RV + URV + DSR + ITR + Age) when predicting the health status on the testing set. The best-performing execution of the "M16" model resulted in 81.4% precision and 74.0% recall on the test set. The "M16" model's train-validation resulted in a 75.3% F1-score, 78.2% precision, and 72.6% recall.

Predictive Performance of Machine Learning Algorithms
The RF and GBM models had the most consistent performance across all 16 models compared with GLM. Consistently, "M16," which was trained using all predictor features (DS + IT + IBV + VD + RV + URV + DSR + ITR + Age), had the best performance based on the parsimonious point, meaning it provided the simplest explanation that fits the best results when more than one option is available. The results indicate that weekly health scores alone were insufficient to predict calf health status. A Recurrent Neural Network (RNN) algorithm that considers the temporal effects of predictors on the response variables was also tested (results not shown) in addition to the 3 ML algorithms presented above. However, the RNN trained using Data 1 or Data 2 could not predict any sick event regardless of the predictor variables used to train the algorithm. One possible explanation for the low performance observed for the RNN compared with the other ML algorithms is the short time series used in this study (6 d).

Feeding Behavior and Disease Detection
Research has shown that calf feeding behaviors collected from AMFs were associated with disease in group-housed preweaned dairy calves (Cramer and Ollivett, 2020; Conboy et al., 2021; Perttu et al., 2023). Duthie et al. (2020) found that rewarded visits, intake, and visit duration were associated with calf sickness. Specifically, studies have found reductions in intake when calves given a high milk allowance are diagnosed with respiratory disease (Duthie et al., 2020) and diarrhea (Sutherland et al., 2018), indicating a reduction in the motivation to feed. In calves given a restricted milk allowance (<6 L/d), a reduction in feeding behavior is expressed as fewer unrewarded visits compared with their healthy counterparts rather than a reduction in consumption (Svensson and Jensen, 2007; Knauer et al., 2017). This indicates that sick calves are motivated to consume their full milk allotment on restricted diets. Therefore, assessment of intake alone is not enough to detect disease and other feeding behaviors need to be considered.
The results of the current study indicate that multiple factors, such as DS, IT, IBV, VD, RV, URV, DSR, ITR, and age, were needed to predict health status in dairy calves, and that calf disease could be predicted by ML algorithms. The ability to detect disease automatically using ML algorithms could result in reduced labor and allow for more timely delivery of supportive therapy such as prebiotics and probiotics, electrolytes, vitamins, or other treatments as needed. Timely disease detection is vital to improve animal welfare and to reduce economic losses for the producer.

Predicting Disease with Machine Learning Algorithms
The superior performance of RF and GBM is consistent with other studies that predicted metabolic status (Xu et al., 2019) and forecasted chronic mastitis (Bonestroo et al., 2022) of dairy cows. The improved performance of these algorithms is usually attributed to the fact that ensemble methods can generate high-performance classifiers by combining individually trained classifiers (Xu et al., 2019). In the current study, these algorithms were effective when daily treatment records were used and not effective when considering only weekly health scores. Therefore, weekly health scores alone were not enough to predict calf sickness, most likely because the trained observer could be evaluating the calf days after onset of the illness. The performance of all algorithms (e.g., GLM, RF, GBM) depends on various factors, such as features and parameters. To determine the most robust algorithm, it is imperative to evaluate algorithms on a case-by-case basis and use a multi-parallel comparison approach. The RF approach predicted disease in dairy calves best when considering only IT as a predictor. Among previous research, reduced intake has frequently been an indicator of disease in preweaned dairy calves fed a higher milk allowance (Borderas et al., 2009; Knauer et al., 2017; Duthie et al., 2020). In contrast, the GLM and GBM predicted dairy calf sickness best when all feeding behaviors and age at health events were used to train the algorithm. Previous research has suggested that multiple feeding behaviors (Borderas et al., 2009; Knauer et al., 2017), in addition to age, need to be considered to define important health indicators such as disease (Santman-Berends et al., 2019).
The precision, recall, and F1-scores that algorithms achieve on the training and testing sets determine their application in practice. The relatively higher precision of RF and GBM could be helpful to automatically detect sick dairy calves. To make the algorithms more robust, further research could investigate ML performance at different sensitivities and specificities using management factors (e.g., bedding type, group size). Data sets that include an even larger number of sick observations would allow a more precise prediction of disease and potentially of the recovery timeframe from the illness. In addition, other on-farm calf data could contribute to predicting calf disease, such as lying behavior and activity coupled with feeding behavior (Sutherland et al., 2018; Duthie et al., 2020; Cantor et al., 2022) or starter intake (Cantor and Costa, 2022). Lastly, ML algorithms could potentially be used not only to predict health status, as in the current study, but also to identify calves more susceptible to specific diseases, such as bovine respiratory disease (Cantor et al., 2022), or to provide a framework for the therapeutic efficacy of treatment for calf diarrhea (Islam et al., 2022).
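As an illustration of the metrics discussed above, the following is a minimal sketch of how precision, recall, and F1-score for the "sick" class can be computed from paired observed and predicted health statuses. It is written in Python for illustration only, whereas the study itself fitted all models in R; the function name `classification_metrics`, the `Metrics` container, and the label values are hypothetical, and the example labels are invented, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def classification_metrics(y_true, y_pred, positive="sick"):
    """Precision, recall, and F1 for the positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    # True positives: observed sick and predicted sick.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: observed healthy but predicted sick.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: observed sick but predicted healthy.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical labels for illustration only; not data from the study.
y_true = ["sick", "sick", "sick", "healthy", "healthy", "healthy", "healthy", "sick"]
y_pred = ["sick", "sick", "healthy", "healthy", "sick", "healthy", "healthy", "sick"]

m = classification_metrics(y_true, y_pred)
print(m)
```

In this framing, high precision means that few healthy calves are flagged as sick, whereas high recall (sensitivity) means that few sick calves are missed; which of the two matters more in practice depends on the cost of unnecessary checks relative to the cost of delayed treatment.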

CONCLUSION
Automated milk feeder data were useful to predict health status in preweaned dairy calves using ML algorithms. Of all algorithms, Random Forest and Gradient Boosting Machine performed best in predicting the health status of dairy calves. These ML algorithms could be used to correctly identify disease in preweaned dairy calves, thus opening the possibility for more timely or automated delivery of supportive therapy and a reduced need for visual calf observations by caretakers.
Perttu et al.: PREDICTING DISEASE IN CALVES USING AUTOMATED MILK FEEDER DATA
R (R Core Team, 2020) was used in the current study to fit all models and ML algorithms.

Table 1.
Model descriptions for the machine learning algorithms on predictor features used to predict health status observed on 740 (Data 1) and 741 (Data 2) preweaned dairy calves (from d 7 to d 56 of age) representing 1,007 and 1,044 observations, respectively

Table 2.
Generalized Linear Model using weekly visual health scores as the response variable to predict health

Table 3.
Generalized Linear Model using daily treatment records as the response variable to predict health status observed on 741 preweaned dairy calves (from d 7 to d 56 of age) representing 1,044 observations (Data 2)

Table 4.
Random forest using weekly visual health scores as the response variable to predict health status observed on 740 preweaned dairy calves (from d 7 to d 56 of age) representing 1,007 observations (Data 1)

Table 5.
Random forest using daily treatment records as the response variable to predict health status observed on 741 preweaned dairy calves (from d 7 to d 56 of age) representing 1,044 observations (Data 2)

Table 6.
Gradient Boosting Machine using weekly visual health scores as the response variable to predict

Table 7.
Gradient Boosting Machine using daily treatment records as the response variable to predict health status observed on 741 preweaned dairy calves (from d 7 to d 56 of age) representing 1,044 observations (Data 2)