ABSTRACT
In this study, we developed a machine learning framework to detect clinical mastitis (CM) at the current milking (i.e., the same milking) and predict CM at the next milking (i.e., one milking before CM occurrence) at the quarter level. Time series quarter-level milking data were extracted from an automated milking system (AMS). For both CM detection and prediction, the best classification performance was obtained from the decision tree–based ensemble models. Moreover, applying models to a data set containing data from the current milking and the past 9 milkings showed the best accuracy for detecting CM; modeling with a data set containing data from the current milking and the past 7 milkings yielded the best results for predicting CM. The models combined with oversampling methods resulted in specificity of 95 and 93% for CM detection and prediction, respectively, with the same sensitivity (82%) for both scenarios; when specificity was lowered to 80 to 83%, undersampling techniques enabled the models to increase sensitivity to 95%. We propose a feasible machine learning framework to identify CM in a timely manner using imbalanced data from an AMS, which could provide useful information for farmers to manage the negative effects of CM.
Key words
INTRODUCTION
Clinical mastitis (CM) is common in the dairy sector. The USDA's National Animal Health Monitoring System reported that about 25% of US dairy cows suffered from CM in 2013 (
USDA, 2016
). The economic costs of CM include treatment costs, milk yield reduction, and increased culling rate and mortality (Bar et al., 2007; Mostert et al., 2019
). Reported average costs per CM case ranged from $179 to $444 and were dominated by indirect costs, such as milk loss in the current and subsequent lactations and reproductive loss (Bar et al., 2008
; Rollin et al., 2015
). Furthermore, CM symptoms can cause pain and discomfort, which negatively affect animal welfare (Petersson-Wolfe et al., 2018
).

On dairy farms in the United States, forestripping cows before milking is a common practice to detect most cases of CM (
Sepúlveda-Varas et al., 2016
; USDA, 2016
). This is because the abnormalities in milk (visual appearance of clots, flakes, changes in color) are often the first clinical signs of inflammation (USDA, 2016;
Rasmussen, 2005
). Additionally, monitoring changes in electrical conductivity (EC; Milner et al., 1996
) and SCC in milk (Lund et al., 1994
) can provide information for CM detection. Accurate and timely detection of CM, which has been explored in several studies (Kamphuis et al., 2010b
; Miekley et al., 2013
; Fadul-Pacheco et al., 2021
), would potentially trigger proper treatment or earlier intervention, reduce economic losses, maintain milk quality, and improve the welfare of cows (Milner et al., 1997
; Leslie and Petersson-Wolfe, 2012
; Michie et al., 2020
).

In recent years, with increases in labor costs, automated milking systems (AMS) have experienced rapid development and are being adopted by dairy farms. The AMS technology generates a relatively large amount of consistently recorded data, which has led to progress in developing algorithms that can identify CM accurately and promptly (
Hogeveen et al., 2010
; Sun et al., 2010
; Rutten et al., 2013
). Many of the measurements captured by the AMS software during the milking process, such as milk yield, EC, color of the milk, occurrence of incompletely milked quarters, and kicked-off milk cups have been recognized as potential predictors for CM detection (Norberg et al., 2004
; Jamali et al., 2018
; Khatun et al., 2018
). Furthermore, the large amount of automatically recorded data by the AMS technology has facilitated the use of machine learning (ML) methods to predict and diagnose diseases in dairy cows (Dhoble et al., 2019
; Hyde et al., 2020
). Relative to conventional statistical models, ML methods offer several advantages, such as not assuming that data samples follow a certain probability distribution (Shin et al., 2021), enabling rapid examination of large numbers of observations (Fatima and Pasha, 2017), and identifying trends and patterns that cannot easily be visualized (Deo, 2015
).

Some studies have focused on developing ML models to detect cows with CM using AMS data [e.g., a neural network model by
Cavero et al., 2008
, a decision tree model by Kamphuis et al., 2010b
, and a recurrent neural network model by Naqvi et al., 2022a
]. Previous works often relied on metrics of specificity (Sp) and sensitivity (Se) to evaluate model performance. Hogeveen et al., 2021
suggested that a feasible ML scheme for CM detection should have both high Se and high Sp, and that the performance requirement should be stricter for more severe CM situations. Although many researchers have tried to improve CM detection with AMS data, current models still need improvement, as they often show wide ranges in prediction Se (Kamphuis et al., 2010b
; Steeneveld et al., 2010
) or a high error rate (Miekley et al., 2012
).

Hence, our study objective was to train and validate ML models to detect CM at the current milking (i.e., the same milking) and to predict the occurrence of CM at the next milking (i.e., 1 milking before CM occurrence). Quarter-level data from multiple milking periods used in this analysis were recorded by AMS software. More importantly, by detecting and predicting CM with high Se and Sp, our modeling analysis could list cow quarters that are likely infected at the current and next milking and would provide proper alerts for farmers to reduce the negative effects of CM. To our knowledge, no study has predicted CM 1 milking prior, although a few studies have worked on detection of the onset of CM (
Kamphuis et al., 2010
; Fadul-Pacheco et al., 2021
), and Anglart et al., 2021
used AMS data to detect the presence of milk clots at the milking of identification as well as to predict clots in advance.

MATERIALS AND METHODS
Data and Variables
The quarter-level milking time data used to train and build the ML framework for detecting and predicting the occurrence of CM in this study were collected from a commercial dairy farm located in New York State during the summers (June to August) of 2020 and 2021. The quarter-level data set contained 427,596 observations from 373 cows being milked with an AMS; that is, a voluntary milking system (VMS Classic; DeLaval International AB). Ethics approval was not needed for this study because only routine animal procedures (e.g., milking) were performed. The cows enrolled in our study were Holsteins with average milk production of 13,302 kg/cow per year. During the data collection period, in 2020, 24% of the cows were in the first lactation, 45% in the second lactation, and 31% in their third or later lactation; in 2021, 12, 56, and 32% of the cows were in their first, second, and third or later lactation, respectively. All data computation and cleaning procedures were conducted in Python 3.6.0 (
VanRossum and Drake, 2010
).

Nineteen measurements (variables) related to the individual milking event for each cow quarter were evaluated as potential CM predictors. These included 13 measurements directly reported from the herd management system DelPro (DeLaval International AB): milk yield (kg), mean milk flow rate (kg/min), peak milk flow rate (kg/min), EC (mS/cm), box time (min), milking interval between 2 successive milking events (h; milking-interval data received from the AMS were in the time format hh:mm:ss and converted to hours), lactation number, DIM, teats not found (yes or no), presence of blood in milk (yes or no), kick-offs (denoting milk cups being kicked off during milking; yes or no), occurrence of incompletely milked quarter (yes or no), and smart pulsation ratio (4 levels denoted by integers, where a higher number indicated a higher level). For cow-level variables (i.e., box time, milking interval, lactation number, DIM, and smart pulsation ratio), the same values were attached to the corresponding quarter-level records of each cow. Other attributes considered were milk yield ratio, calculated as the ratio of a quarter's yield to the total yield of a cow for each milking event; EC ratio, equal to a quarter's EC divided by the EC of its corresponding cow by milking; quarter index [denoting the 4 quarters of a cow as the left front (LFQ), left hind (LHQ), right front (RFQ), and right hind (RHQ) quarter, respectively]; cow identification number (cow ID); previous occurrence of CM before the current milking by quarter, defined during the data collection period and denoted by 1 or 0 to show whether a cow quarter had suffered from CM before the current milking event; the order of the milking event within a day (denoted by integers from 0 to n, where a higher number indicated a later milking); and milking frequency per day.
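As a concrete illustration, the interval conversion and the two ratio variables described above can be computed as follows. This is a minimal sketch: the field names are invented for the example, and it assumes the cow-level EC is the within-milking mean of the 4 quarter EC values, which the text does not state explicitly.

```python
# Minimal sketch of the derived quarter-level variables. Field names and the
# use of the within-cow mean EC as the cow-level EC are assumptions for
# illustration only.

def interval_to_hours(hhmmss):
    """Convert an AMS milking interval from hh:mm:ss format to hours."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h + m / 60 + s / 3600

def quarter_ratios(quarters):
    """Milk yield ratio and EC ratio for each quarter of one milking event.

    quarters maps a quarter index (LFQ, LHQ, RFQ, RHQ) to that quarter's
    yield (kg) and EC (mS/cm) for the milking.
    """
    total_yield = sum(q["yield_kg"] for q in quarters.values())
    cow_ec = sum(q["ec"] for q in quarters.values()) / len(quarters)  # assumed cow-level EC
    return {idx: {"milk_yield_ratio": q["yield_kg"] / total_yield,
                  "ec_ratio": q["ec"] / cow_ec}
            for idx, q in quarters.items()}
```

For example, an 08:30:00 interval becomes 8.5 h, and a quarter yielding 2 kg of an 8-kg milking gets a milk yield ratio of 0.25.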
In addition, each quarter-level AMS milking record was uniquely identified by cow ID, quarter index, and the start time of the milking. Trained study technicians (n = 8) identified positive CM cases by inspecting milk samples collected (3 times per week in year 1 and 7 times per week in year 2). A positive case of CM was considered present if milk from one or more quarters was abnormal (watery, contained flakes or clots, or changes in color) with or without signs of local inflammation of the affected quarter as previously described (
Erskine et al., 2003
). Cows identified as having CM were either monitored or treated based upon treatment protocols set up by the herd veterinarian. Cows were not forestripped before milking, with or without milk sample collection.

Data Processing
First, data records under 2 conditions were removed: milking records with missing values, in which the milk yield data for one or more quarters of cows were not captured by the AMS (i.e., blank values for milk yield), and records with improper values that resulted from cows who revisited the AMS several times without yield. Furthermore, 14 d of milking records after the detection of a positive CM incidence for a certain quarter of a cow were removed. In this way, any CM-positive case can be considered as a new event (
Barkema et al., 1998
; Cha et al., 2013
; Hertl et al., 2014
). The cleaned data set used for the modeling analysis after data removal consisted of 389,345 milking observations with 80 CM (positive) cases. Statistics computed over the cleaned data set for key variables are shown in Appendix Table A1, and the occurrences of CM against DIM are depicted in Appendix Figure A1.

Other data processing procedures applied in this analysis are as follows:
Feature Encoding.
Because most ML models only accept numerical variables, encoding the categorical variables is a necessary step so that the model can extract valuable information from them. All categorical variables were converted into binary numbers (0/1) using 3 types of encoding methods, as follows. For binary variables labeled yes or no (i.e., teats not found, presence of blood in milk, kick-offs, and occurrence of incompletely milked quarter), labels were converted into 1 or 0. For quarter index with 4 categories (LFQ, LHQ, RFQ, RHQ), where no ordinal relationship existed among the categories, one-hot encoding (
Seger, 2018
) was used and 4 new binary variables were created. A value of “1” was placed in the binary variable for the quarter that a milking record belonged to, and “0” values in the other 3 quarter variables. For cow ID with high cardinality (more than 100 distinct categories; Moeyersoms and Martens, 2015
), binary encoding was applied to first assign a numerical value to each category of this variable, transfer those integers into the binary code, and, finally, split the digits from the binary string into separate columns (Seger, 2018
; Jackson and Agrawal, 2019
). Nine new binary variables were generated with 1s or 0s, and each cow ID was uniquely represented by a combination of the values from these 9 variables. The encoding results are shown in Appendix Table A2.

The dependent variable (y) was a time series variable denoting whether a cow quarter was identified as negative or positive for CM at a certain milking event, correspondingly equal to 0 or 1.
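The 3 encoding steps above can be sketched as follows; the helper names are illustrative, and any standard implementation of these encodings would serve equally well.

```python
# Sketch of the 3 encoding steps: yes/no -> 0/1, one-hot for quarter index,
# and binary encoding for the high-cardinality cow ID.
QUARTERS = ["LFQ", "LHQ", "RFQ", "RHQ"]

def encode_yes_no(label):
    """Yes/no variables (teats not found, blood in milk, ...) become 1/0."""
    return 1 if label == "yes" else 0

def one_hot_quarter(quarter):
    """One-hot encoding: a 1 in the position of the record's quarter, 0 elsewhere."""
    return [int(quarter == q) for q in QUARTERS]

def binary_encode_cows(cow_ids):
    """Binary encoding: assign each distinct cow ID an integer, then split its
    binary digits into separate columns. Nine bits cover up to 512 IDs,
    enough for the study's 373 cows."""
    n_bits = 9
    return {cow: [int(b) for b in format(i, "0{}b".format(n_bits))]
            for i, cow in enumerate(sorted(set(cow_ids)))}
```

For example, `one_hot_quarter("RFQ")` yields a 1 only in the RFQ position, and each cow ID maps to a unique 9-column bit pattern.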
Format Transformation.
We transformed each time series variable (e.g., milk yield, EC) in the data set to an autoregressive form of order p, AR(p) (Hamilton, 2020). Hence, in our classification model fitting, the current milking (t) denoted the milking event at which a quarter was identified as negative or positive for CM; for each predictor variable, the lagged p data set contained values observed from the past p milkings before the current milking and observations from the current milking t, denoted as follows:

x_k(t – p), x_k(t – p + 1), …, x_k(t – 1), x_k(t), for k = 1, …, K, [1]

where K is the total number of numerical input variables, and p is the total number of milkings before the current milking (here, p = 3, 5, 7, or 9). Specifically, the variable “previous occurrence of CM before the current milking by quarter” was not lagged; only its value at the current milking was included in the modeling analysis, showing whether a cow quarter had suffered from CM before the current milking during the data collection period.
After data transformation, each row of the final data set contained values of the dependent variable and 3 types of predictor variables. For CM detection, the dependent variable was y(t), which denoted a cow quarter with or without CM at the current milking (t); for CM prediction, the dependent variable was y(t + 1), which represented a cow quarter with or without CM at the next milking (t + 1). The predictor variables included numerical variables from the past p milkings and that from the current milking t as shown in [1]; encoded categorical variables used to represent the categorical predictors, such as cow ID and quarter index; and a set of time interval variables, denoting the time intervals between the past milkings and the milking event at which the dependent variable was defined. In particular, for the dependent variable at the current milking y(t), delta (t – 1) denoted the time interval between the past 1 milking (t – 1) and the current milking (t), …, delta (t – p) denoted the time interval between the past pth milking (t – p) and the current milking (t). Then, for the dependent variable at the next milking y(t + 1), delta (t) denoted the time interval between the current milking (t) and the next milking (t + 1), …, delta (t – p) for the time interval between the past pth milking (t – p) and the next milking (t + 1).
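The AR(p) row construction described above can be sketched as follows. This minimal illustration covers only the lagged numerical columns built from one quarter's time-ordered records; the alignment of the dependent variable with t or t + 1 and the delta time-interval variables described above would be constructed analogously.

```python
# Sketch: build the lagged-p predictor columns of Eq. [1] from one quarter's
# time-ordered milking records. Each output row holds x(t-p), ..., x(t-1), x(t)
# for every numerical variable; the first p milkings cannot form a full row.

def lagged_rows(series, p):
    """series: time-ordered list of {variable: value} dicts for one quarter.
    Returns one feature row per milking from index p onward."""
    rows = []
    for t in range(p, len(series)):
        row = {}
        for k in series[t]:
            for lag in range(p, -1, -1):
                name = "{}(t)".format(k) if lag == 0 else "{}(t-{})".format(k, lag)
                row[name] = series[t - lag][k]
        rows.append(row)
    return rows
```

With 4 milkings and p = 2, only the last 2 milkings yield complete rows, mirroring how early records in each quarter's history are unusable for the lagged data sets.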
Data Splitting, Cross-Validation, and Resampling Techniques.
In ML model development, it is common practice to split the data into training and testing data sets. In this study, the data were split by date, with the purpose of avoiding information leakage from the future to the past (
Zheng and Casari, 2018
), and resulted in a training set containing 80% of the data (from June 1, 2020, to July 13, 2021; 311,476 observations) to train and create the models. The remaining 20% (from July 14, 2021, to August 15, 2021; 77,869 observations) were reserved as the testing set for evaluating model accuracy. This train-test splitting was stratified by 2 classes of the dependent variable to make sure that both the training set and testing set preserved similar proportions of observations in each class as observed in the original data set (Yadav and Shukla, 2016
).

Next, we applied repeated stratified 5-fold cross-validation (
Berrar, 2019
) on the training set. The splitting of data into folds is governed by the criteria of “stratified by dependent variable y” to ensure that each fold has the same proportion of observations across different classes of the dependent variable (Zeng and Martinez, 2000
). One fold was used for validation (evaluating model performance), and the remaining 4 folds were merged as a subset of the training data (i.e., training subset) for fitting a model. This 5-fold cross-validation procedure was repeated 3 times.

Furthermore, the cleaned data set for modeling analysis had only 80 CM (positive) records, accounting for 0.02% of total records; this data imbalance could negatively affect model prediction. Classification models perform better on the majority class and are prone to categorize unseen observations into the majority class, and classifiers might be more likely to misclassify rare instances or even ignore them (
Ali et al., 2019a
). To address the issues induced by data imbalance, after splitting the training data set into 5 folds, 4 of those 5 folds were merged as the training subset and resampled for training the ML models. Additionally, no resampling method was applied to the remaining validation fold. Either the small number of positive CM records was oversampled or the large number of negative CM records was undersampled. Several resampling methods were tested: random oversampling, synthetic minority oversampling technique (SMOTE; Cateni et al., 2014
), adaptive synthetic (ADASYN) oversampling (Zhang et al., 2018
), random undersampling, one-sided selection (Bach et al., 2019
), near-miss undersampling (Zhang and Mani, 2003
), and edited nearest neighbors undersampling (Bach et al., 2017
).

Algorithms Evaluated
Logistic Regression.
Logistic regression (LR) is a fundamental algorithm for disease detection (
Green et al., 2002
). It computes the probability that a set of input variables belongs to a discrete outcome. With the assumption that all input features are independent, the LR algorithm typically uses maximum likelihood estimation to estimate parameters (Khanna et al., 2015
). Because the dependent variable in this study was binary, a binary LR method was utilized for classification.

Support Vector Machine.
The support vector machine (SVM) algorithm generates a decision boundary to optimally separate the 2 classes of the dependent variable. It can effectively reduce the computational complexity arising from the high dimensionality of the input variables (
Yu et al., 2010
; Miekley et al., 2013
; Schölkopf and Smola, 2018
). A linear SVM method that assumes that the 2 classes are linearly separable and a nonlinear SVM method that can classify nonlinearly separable data were both applied to investigate the effective separation of CM positive and negative classes (Olson and Delen, 2008
).

Naïve Bayes.
The naïve Bayes (NB) algorithm is a probabilistic classification technique based on Bayes' theorem. It assumes that the input features are conditionally independent given the target class. Although this strong assumption is rarely met in real-life data, this algorithm often outperforms even highly sophisticated classification methods (
Vembandasamy et al., 2015
). Three NB methods with different assumptions for the input data were explored in this analysis: Gaussian NB (Ali et al., 2019b
), multinomial NB (Rennie et al., 2003
), and Bernoulli NB (Metsis et al., 2006
).

Decision Tree–Based Ensemble Algorithm.
The decision tree (DT)–based ensemble algorithm includes a set of algorithms that have the potential of achieving better performance than a single algorithm (
Doupe et al., 2019
) and are invariant to feature scaling as they are robust to monotonic transformations of variables (Xiao et al., 2017
). Hence, relative to the other algorithms mentioned above, more methods under the DT-based ensemble algorithm were applied in this study: random forest (RF), adaptive boosting (AdaBoost), gradient boosting (GB), and bagging with GB classification trees.

The RF method usually shows high performance due to its randomness: it creates an uncorrelated forest of decision trees by training each tree with a bootstrap sample from the training data and a randomly selected subset of input features (
Breiman, 2001
). In this study, the balanced RF method was implemented, as recommended by Chen et al. (2004), Kobyliński and Przepiórkowski (2008), and Agusta and Adiwijaya (2019), to better address data imbalance. The AdaBoost and GB methods are both additive and combine several DT methods to create a strong predictive method (Dev and Eden, 2019
; Schapire, 2013
). In particular, 3 GB methods were evaluated: histogram-based GB (HGB; Hossain and Deb, 2021
), light GB (LGB; Ke et al., 2017
), and eXtreme GB (XGB; Chen et al., 2018
). Other methods used were those that combine the bagging technique (Breiman, 1996
) with different GB classification trees; that is, bagging with histogram-based GB classification tree (BHGB), bagging with light GB (BLGB), and bagging with eXtreme GB (BXGB). The balanced bagging approach that can better deal with class imbalance (Zareapoor and Shamsolmoali, 2015
; Blaszczyński and Stefanowski, 2017
) was applied. All ML methods used in this study are listed in Table 1.

Table 1. Machine learning algorithms, specific methods, and possible hyperparameter settings evaluated
Algorithm | Method | Hyperparameter and space of possible values |
---|---|---|
Logistic regression (LR) | Binary LR | Norm of the regularization: [“none,” “l1,” “l2,” “elasticnet”] |
Support vector machine (SVM) | Linear SVM | Regularization parameter: [100, 10, 1.0, 0.1, 0.001] |
Nonlinear SVM | ||
Naïve Bayes (NB) | Gaussian NB | Variance smoothing, which is the portion of the largest variance of all features that is added to variances for calculation stability, from 0.0001 to 1 with the increment value equal to 0.005 |
Multinomial NB | Additive (Laplace/Lidstone) smoothing parameter: [0.01, 0.1, 0.5, 1.0, 10.0] | |
Bernoulli NB | ||
Decision tree (DT)–based ensemble algorithm | Random forest (RF) | Maximum number of DT classifiers: from 50 to 1,500 with the increment value equal to 50; maximum depth of each tree: “None” or from 1 to 20 with the increment value equal to 1; Minimum number of samples used to split an internal DT node: from 1 to 20 with the increment value equal to 1; Number of features used when looking for the best split: [“sqrt,” “log2,” “None”]; Weights associated with classes of the dependent variable: [“none,” “balanced,” “balanced_subsample”] |
Adaptive boosting (AdaBoost) | ||
Gradient boosting (GB); Histogram-based GB (HGB); Light GB (LGB); eXtreme GB (XGB) | ||
Bagging with gradient boosting classification trees (BGB); Bagging with histogram-based GB (BHGB); Bagging with light GB (BLGB); Bagging with eXtreme GB (BXGB) |
1 Details described in the Scikit-learn library (https://scikit-learn.org/stable/).
Hyperparameter Tuning
Hyperparameters are the internal coefficients or weights for a method that need to be defined before the training process. Table 1 lists the hyperparameters that were tuned and the space of the possible hyperparameter values that were searched for all ML methods evaluated in this study. The detailed definitions of the hyperparameters were described in Scikit-learn (
Pedregosa et al., 2011
). To achieve optimal hyperparameters, the grid search technique was used (Shekar and Dagnew, 2019
).

Modeling Framework
The modeling framework developed in this study consisted of 2 phases to detect CM at the same milking and predict CM 1 milking before occurrence using the lagged 3, 5, 7, and 9 data sets. Phase 1 selected the models with the best performance across different ML methods and the corresponding resampling methods used on the training sets. Phase 2 used the finalized classification models on the milking records reserved in the testing sets to assess the performance of the different models on new data. Se, Sp, and the area under the receiver operating characteristic curve (AUC-ROC) were used to evaluate model performance.
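The two headline metrics can be computed directly from the classified labels; a minimal sketch is below (weighted AUC-ROC would instead come from predicted probabilities via a library routine such as scikit-learn's `roc_auc_score`).

```python
# Sketch: Se and Sp from binary labels, with 1 = CM positive.

def sensitivity_specificity(y_true, y_pred):
    """Se = TP / (TP + FN); Sp = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

With CM so rare (80 of 389,345 records), Se and Sp are far more informative here than raw accuracy, which a trivial all-negative classifier would maximize.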
In phase 1, all models across the different ML methods described earlier were evaluated by stratified 5-fold cross-validation on the training sets of the lagged 3, 5, 7, and 9 data sets. Every model was trained on both the non-resampled (i.e., original) and the resampled versions of 4 of the 5 folds of each training set. Then, the model that produced the best results on the remaining validation fold of that training set was selected. At the end, the models with the best performance and the corresponding resampling methods used for the 4 types of lagged data sets were obtained. In addition, to take both Sp and Se into account, Sp was fixed within a certain range and then the combination of ML method and resampling technique that led to the highest Se was selected. More specifically, 4 Sp intervals based on the results from the validation fold were defined in our analysis: Sp ≥99% (99% Sp interval), 95% ≤ Sp <99% (95% Sp interval), 90% ≤ Sp <95% (90% Sp interval), and 85% ≤ Sp <90% (85% Sp interval).
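The phase 1 selection rule can be sketched as below, assuming each candidate is one ML method paired with one resampling technique together with its validation-fold Sp and Se; the numeric values in the example are illustrative.

```python
# Sketch of the phase 1 rule: within each Sp interval, keep the
# (model, resampling) combination with the highest validation-fold Se.
SP_INTERVALS = {
    "99%": (0.99, 1.00 + 1e-9),  # [99%, 100%], closed on the right
    "95%": (0.95, 0.99),
    "90%": (0.90, 0.95),
    "85%": (0.85, 0.90),
}

def select_by_sp_interval(candidates):
    """candidates: list of dicts with keys model, resampler, sp, se.
    Returns the best candidate per Sp interval (intervals with no
    qualifying candidate are simply absent)."""
    best = {}
    for c in candidates:
        for name, (lo, hi) in SP_INTERVALS.items():
            if lo <= c["sp"] < hi and (name not in best or c["se"] > best[name]["se"]):
                best[name] = c
    return best
```

With validation results like those in Table 2, this rule would retain, for example, the balanced BHGB model with random oversampling in the 99% Sp interval for the lagged 7 data set.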
In phase 2, the combinations of model and resampling method selected from phase 1 for the lagged 3, 5, 7, and 9 data sets by each Sp interval were used to classify quarters with or without CM on their corresponding testing sets. For every Sp interval, the selected model was fit on the entire data of the training set, which were resampled by the corresponding resampling method selected, and then used to classify the data of the testing set. This “fitting and classifying” loop was repeated N times. Because N > 30 is expected to result in relatively stable model outcomes (
Hogg et al., 2015
; Machin et al., 2018
), N = 51 was used in this study. For the testing set, each time the model reported a value (0/1) classifying the quarter-level milking event as negative or positive for CM, and the class receiving the majority of votes across the N repetitions was the final outcome for an observation (Rojarath et al., 2016
).

RESULTS
CM Prediction One Milking Before CM Occurrence
The balanced RF and balanced BGB models were preferred over the other models, yielding relatively higher Se at various Sp intervals for CM prediction at the next milking (t + 1). When Sp was in the 95% and 99% Sp intervals, the selected models paired with an oversampling method produced the highest Se; at lower Sp levels, undersampling methods resulted in models with the best Se. Models with the lagged 7 data set performed slightly better than those with the lagged 3 and 5 data sets, and no improvement was obtained by modeling with the lagged 9 data set. In particular, comparing the results on the validation folds across the 4 lagged data sets (Table 2 and Appendix Table A3) at Sp equal to 99%, the combination of the balanced BHGB model, random oversampling, and the lagged 7 data set performed best, with 37% Se and 68% weighted AUC-ROC. When lowering Sp to the 95% Sp interval, Se increased by 31 percentage points with a weighted AUC-ROC of 82%, and the balanced RF model resulted in the best performance. Further increases in Se could be achieved by decreasing Sp to a lower Sp interval.
Table 2. Model evaluation results for predicting clinical mastitis at the next milking (t + 1)
Evaluation data set | Item | Lagged 3 | Lagged 5 | Lagged 7 | Lagged 9
---|---|---|---|---|---
99% Sp interval on validation fold: [99%, 100%] | |||||
Validation fold | Sp | 99% | 99% | 99% | 99% |
Se | 35% | 34% | 37% | 34% | |
Weighted AUC-ROC | 67% | 66% | 68% | 66% | |
Resampling method | SMOTE | SMOTE | Random oversampling | Random oversampling | |
Model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB | |
Testing set | Sp | 98% | 98% | 98% | 98% |
Se | 45% | 45% | 55% | 41% | |
Weighted AUC-ROC | 72% | 72% | 76% | 70% | |
95% Sp interval on validation fold: [95%, 99%) | |||||
Validation fold | Sp | 95% | 95% | 95% | 95% |
Se | 67% | 63% | 68% | 64% | |
Weighted AUC-ROC | 81% | 79% | 82% | 79% | |
Resampling method | SMOTE | Random oversampling | Random oversampling | Random oversampling | |
Model | Balanced BHGB | Balanced RF | Balanced RF | Balanced RF | |
Testing set | Sp | 94% | 93% | 93% | 93% |
Se | 77% | 77% | 82% | 82% | |
Weighted AUC-ROC | 85% | 85% | 87% | 87% | |
90% Sp interval on validation fold: [90%, 95%) | |||||
Validation fold | Sp | 91% | 91% | 91% | 91% |
Se | 77% | 72% | 76% | 74% | |
Weighted AUC-ROC | 84% | 82% | 83% | 82% | |
Resampling method | Random undersampling | Random undersampling | Random undersampling | Random undersampling | |
Model | Balanced BHGB | Balanced BXGB | Balanced BLGB | Balanced BHGB | |
Testing set | Sp | 89% | 89% | 89% | 89% |
Se | 77% | 82% | 82% | 82% | |
Weighted AUC-ROC | 83% | 85% | 85% | 85% | |
85% Sp interval on validation fold: [85%, 90%) | |||||
Validation fold | Sp | 86% | 85% | 86% | 85% |
Se | 83% | 79% | 69% | 62% | |
Weighted AUC-ROC | 84% | 82% | 77% | 73% | |
Resampling method | One-sided selection | Random undersampling | Random undersampling | Random undersampling | |
Model | Balanced RF | Balanced RF | Balanced RF | Balanced RF | |
Testing set | Sp | 81% | 80% | 80% | 81% |
Se | 91% | 95% | 95% | 77% | |
Weighted AUC-ROC | 86% | 88% | 88% | 79% |
1 The data were split into training and testing data sets. Within the training data set, the validation fold was used to assess model performance as part of the model selection process.
2 Sp = specificity; Se = sensitivity; AUC-ROC = area under the receiver operating characteristic curve.
3 The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
4 Please refer to the Appendix for the detailed settings of resampling methods and models.
5 SMOTE = synthetic minority oversampling technique.
6 BHGB = bagging ensemble of histogram-based gradient boosting.
7 RF = random forest.
8 BXGB = bagging ensemble of extreme gradient boosting.
9 BLGB = bagging ensemble of light gradient boosting.
Moreover, relative to the model results on the validation folds used for model selection, the prediction results on the testing sets of 4 different lagged data sets showed that the Se increased (by up to 26 percentage points in some instances) with only a slight decline in the Sp (1 to 6 percentage points). For example, in the 99% Sp interval, the Se obtained from the testing set of the lagged 7 data set was 18 percentage points higher than that on the validation fold, whereas the Sp decreased to 98% and the weighted AUC-ROC increased by 8 percentage points (Table 2).
CM Detection at the Same Milking
The model results for detecting CM at the current milking (t) across the 4 lagged data sets showed that the balanced RF and balanced BGB models on the lagged 9 data set tended to perform better than other model and data set combinations, generating relatively higher Se values across the Sp intervals. In the 99% Sp interval, the balanced BHGB model with the ADASYN oversampling method led to the highest Se (57%) on the validation fold of the lagged 9 data set, in contrast to the other lagged data sets (Table 3 and Appendix Table A4). Applying this model to the testing set of the lagged 9 data set, Se increased to 73% with an Sp of 98%. For Sp levels lower than 95%, the DT-based models paired with undersampling methods yielded the highest Se.
Table 3. Model evaluation results for detecting clinical mastitis at the current milking (t)

| Evaluation data set | Item | Lagged 3 | Lagged 5 | Lagged 7 | Lagged 9 |
|---|---|---|---|---|---|
| 99% Sp interval on validation fold: [99%, 100%] | | | | | |
| Validation fold | Sp | 99% | 99% | 99% | 99% |
| | Se | 54% | 54% | 53% | 57% |
| | Weighted AUC-ROC | 77% | 76% | 76% | 78% |
| | Resampling method | Random oversampling | Random oversampling | Random oversampling | ADASYN |
| | Model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB |
| Testing set | Sp | 98% | 98% | 98% | 98% |
| | Se | 68% | 64% | 68% | 73% |
| | Weighted AUC-ROC | 83% | 81% | 83% | 85% |
| 95% Sp interval on validation fold: [95%, 99%) | | | | | |
| Validation fold | Sp | 96% | 96% | 96% | 97% |
| | Se | 75% | 76% | 79% | 79% |
| | Weighted AUC-ROC | 85% | 86% | 88% | 88% |
| | Resampling method | SMOTE | SMOTE | SMOTE | ADASYN |
| | Model | Balanced BXGB | Balanced BHGB | Balanced BHGB | Balanced BLGB |
| Testing set | Sp | 94% | 94% | 95% | 95% |
| | Se | 77% | 77% | 77% | 82% |
| | Weighted AUC-ROC | 86% | 86% | 86% | 88% |
| 90% Sp interval on validation fold: [90%, 95%) | | | | | |
| Validation fold | Sp | 90% | 93% | 93% | 93% |
| | Se | 83% | 81% | 84% | 84% |
| | Weighted AUC-ROC | 86% | 87% | 89% | 89% |
| | Resampling method | NearMiss (version 1) | Random undersampling | Random undersampling | Random undersampling |
| | Model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB |
| Testing set | Sp | 86% | 91% | 91% | 91% |
| | Se | 91% | 86% | 82% | 86% |
| | Weighted AUC-ROC | 88% | 89% | 87% | 89% |
| 85% Sp interval on validation fold: [85%, 90%) | | | | | |
| Validation fold | Sp | 85% | 88% | 88% | 87% |
| | Se | 87% | 86% | 87% | 88% |
| | Weighted AUC-ROC | 86% | 87% | 88% | 88% |
| | Resampling method | NearMiss (version 1) | Random undersampling | No resampling | Random undersampling |
| | Model | Balanced RF | Balanced RF | Balanced RF | Balanced RF |
| Testing set | Sp | 78% | 85% | 84% | 83% |
| | Se | 95% | 95% | 91% | 95% |
| | Weighted AUC-ROC | 87% | 90% | 88% | 89% |
1 The data were split into training and testing data sets. Within the training data set, the validation fold was used to assess model performance as part of the model selection process.
2 Sp = specificity; Se = sensitivity; AUC-ROC = area under the receiver operating characteristic curve.
3 The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
4 Please refer to the Appendix for the detailed settings of resampling methods and models.
5 ADASYN = the adaptive synthetic oversampling method.
6 BHGB = bagging ensemble of histogram-based gradient boosting.
7 SMOTE = synthetic minority oversampling technique.
8 BXGB = bagging ensemble of extreme gradient boosting.
9 BLGB = bagging ensemble of light gradient boosting.
10 RF = random forest.
In comparing model performance for detecting CM at the current milking (t) with that for predicting CM at the next milking (t + 1) by Sp interval, the results showed that on the validation folds of the different lagged data sets, Se and Sp were generally higher under the CM detection scenario than under the CM prediction scenario, whereas the results from the testing sets under these 2 scenarios were close, except for the 99% interval. For example, in the 95% Sp interval, for the lagged 9 data set, the Se on the validation fold for CM prediction was 15 percentage points lower than that for CM detection, whereas on the testing set, the Se was the same (82%) for detecting and predicting CM.
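The interval-based selection described above (choosing, within each Sp interval on the validation fold, the combination with the highest Se) can be mimicked by sweeping the decision threshold; a simplified sketch, not the study's implementation:

```python
import numpy as np

def best_threshold_in_sp_interval(y_val, p_val, sp_low, sp_high):
    """Return (threshold, Se, Sp) maximizing validation Se among thresholds
    whose validation Sp falls in [sp_low, sp_high); None if no threshold
    lands in the interval. Illustrative only."""
    y_val, p_val = np.asarray(y_val), np.asarray(p_val)
    best = None
    for t in np.unique(p_val):
        pred = p_val >= t
        sp = np.mean(~pred[y_val == 0])  # true-negative rate
        se = np.mean(pred[y_val == 1])   # true-positive rate
        if sp_low <= sp < sp_high and (best is None or se > best[1]):
            best = (t, se, sp)
    return best
```

The chosen threshold would then be fixed and applied unchanged to the testing set, which is one reason testing-set Sp can drift slightly outside the validation interval (e.g., 98% instead of 99%).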
DISCUSSION
The ML framework developed in this study demonstrated a promising ability to detect CM at the same milking and to predict CM 1 milking before occurrence at the quarter level by integrating DT-based ensemble models with resampling techniques. Moreover, the input variables used in the modeling analysis were based on AMS data (except for the variable indicating a previous occurrence of CM in the quarter before the current milking). Hence, the ML framework developed in this study contributes to the literature by offering insights into applications of ML with AMS data to achieve accurate CM diagnosis.
The targeted Se and Sp levels that detection models should achieve for practical utilization are still under discussion. Some studies have suggested that for detecting cows with CM, models should have an Se of ≥80% (Hillerton, 2000; Hogeveen et al., 2010, 2021). The International Organization for Standardization (ISO, 2007) recommends an Se of >70% at an Sp level of 99%. However, these Se and Sp requirements have rarely been met to date (Khatun et al., 2018; Anglart et al., 2021; Hogeveen et al., 2021). Enforcing an Sp of 99% using the current ML framework resulted in a maximum Se of 73%, which is close to but still below 80%. To achieve the recommended Se and Sp levels, Kamphuis et al. (2010b) widened the time window of detection to, for example, 4 d before and 3 d after a CM observation. However, the utility of alerts for farmers to reduce the negative effects of CM is diminished if the alert is not generated until days after the onset of CM. A narrower window of detection that does not extend beyond the onset of CM is likely more useful from a management perspective.

When comparing model performance between studies, it is important to consider the protocol used to define CM because it often differs by study. For example, CM definitions from recent studies included various abnormalities in milk (Kamphuis et al., 2010), a focus on the presence of clots in milk (Anglart et al., 2021), and determination through SCC thresholds (Cavero et al., 2008). Accordingly, compared with previous studies in which the definition of CM was close to that in the current study, this study showed similar detection performance in terms of Se and Sp but represents an improvement in automated CM detection for 2 reasons: first, our methodology focused on incorporating AMS data to train the ML models; second, it assigned a CM classification at the time point of each milking, rather than applying a longer time window for CM detection. Khatun et al. (2018) used similar AMS data and showed a relatively high Se (90%) and Sp (91%) for CM detection. Naqvi et al. (2022a) fed time series data from AMS farms into recurrent neural networks for CM detection and reported 90% Se with an Sp of 86%. Furthermore, good detection performance can be achieved when a short time window is defined, such as the narrow window of less than 24 h before CM observation considered by Kamphuis et al. (2010b), which led to an Se of 75% with the Sp fixed at 93%. DT-based ensemble models were also applied in a recent study by Fadul-Pacheco et al. (2021) to detect the onset of CM on a daily basis; they reported an Se of 85%, but Sp varied from 31 to 62%. Relative to these studies, the ML framework in this study effectively fit DT-based ensemble models on data mainly from an AMS to detect CM in a single milking and resulted in a higher Sp of 95% with a similar Se (82%). The relatively good detection performance found in the current study is likely because DT-based ensemble models (e.g., the RF model) can increase overall classification accuracy by aggregating a group of base ML models (i.e., base classifiers) that together outperform a single classifier (Yang et al., 2010; Sagi and Rokach, 2018). Additionally, the inclusion of some predictor variables, such as EC from past milkings or previous occurrences of CM, may contribute to the improvement in CM detection, which aligns with findings in the literature (Norberg et al., 2004; Khatun et al., 2018).

The short period of detection, which reduces the probability of a cow having a new case of CM, and the low incidence of CM resulted in an observed data set that was highly imbalanced. Past studies have addressed this imbalance by randomly undersampling the data of healthy cows to balance the output classes (Ankinakatte et al., 2013; Khatun et al., 2018). In this study, both oversampling and undersampling methods were applied to the training set during the model selection phase, so that, in addition to the model fitting method, the resampling technique that enabled each model to generate the best performance was also identified. For both CM detection at the same milking and CM prediction 1 milking prior, we found that oversampling methods enabled the DT-based ensemble models to achieve the highest Se at Sp levels of >90%. One explanation for the improved performance of oversampling methods is that the challenges of misclassifying or ignoring rare instances induced by imbalanced data (Ali et al., 2019a) can be alleviated through oversampling while all information in the original data is retained (Park and Park, 2021). Interestingly, when the Sp level was lowered to <90%, combining models with undersampling methods produced the best performance. Although there is a risk of eliminating or ignoring important information when using undersampling methods, these methods have also been shown to increase detection performance. For example, near-miss undersampling can optimize the allocation of data samples by removing instances from the majority class to provide a stable data distribution boundary between the 2 classes (Gunturi and Sarkar, 2021; Budianto et al., 2022).

Time windows have been widely used in extant studies: if an alarm is generated by the model at any point during the window, an instance is considered correctly detected (de Mol et al., 1997; Cavero et al., 2008; Kramer et al., 2009; Miekley et al., 2013). However, there are drawbacks to the application of time windows, and one can argue that Se calculated over a time window does not properly reflect a model's capability to detect CM incidents. For example, the levels of Se and Sp will be higher in detection models that use wider time windows for alerts than in models with narrower time windows, but the increases in Se and Sp are mainly due to the length of time in which the disease can occur rather than to improvements in model detection (Sherlock et al., 2008; Kamphuis et al., 2010b). Models that use large time windows and show high Se might not create an alert in a short enough time interval, or at the time required, for a management response (Hogeveen and Ouweltjes, 2003). Furthermore, the definition of time windows varies by study, which, to some extent, increases the difficulty of comparing detection performance among analyses (Kamphuis et al., 2010a). In the current study, the ML framework showed good performance in classifying single milkings with or without CM and thus could generate CM alerts for farmers at the milking when CM occurs. Moreover, its ability to predict CM 1 milking before occurrence would flag cow quarters that are likely to have CM at the next milking, warning farmers in advance and helping reduce the negative effects of CM.

In terms of practical applications, a high number of false alerts is undesirable because it increases the number of labor hours required for mastitis monitoring. A low level of false-positive alerts is essential for farmers (Steeneveld et al., 2010) and, under some circumstances, even more important than correctly identifying all CM-positive cases (Claycomb et al., 2009; Mollenhorst et al., 2012; Anglart et al., 2021). Our ML framework was able to decrease false-positive alerts compared with previously reported models while keeping Se at a similar or higher level. To our knowledge, no existing work has reported a method for CM prediction that can produce an alert before CM occurs. Anglart et al. (2021) used multilayer perceptron models to predict whether single milkings contained clots, and the highest Se obtained was 25% with an Sp of 98%. When predicting per-milking CM 1 milking before occurrence, the ML framework in this study increased Se by 57 percentage points (to 82%) with only a small decline (5 percentage points) in Sp, and thus it could be an effective mechanism on AMS farms to forecast CM.

The results from the developed ML framework are promising for both detecting and predicting CM. There are, however, several methodologies and data features that should be explored to further improve model accuracy and thereby increase the feasibility of adopting this modeling framework on dairy farms. Future work should include expanding the ML methods evaluated to consider techniques such as the recurrent neural networks reported by Naqvi et al. (2022a,b) and developing more robust methods to deal with milking records with missing values, such as imputation methods (Emmanuel et al., 2021). To overcome the limited generalizability of the proposed methods and increase the number of CM observations in the imbalanced data, future work should expand the data set by collecting data from more AMS farms and in different seasons. Further, although quarters are often considered physiologically distinct and were treated independently in this study, exploring markers of inflammation and the onset of CM across all 4 quarters, rather than in individual quarters, could be an alternative approach to identifying cows at risk for CM. As AMS sensor technology expands, opportunities to integrate data from other dairy management software exist, and the benefit of additional input variables should also be explored.

CONCLUSIONS
The ML framework developed in this study showed the possibility of using imbalanced data recorded by an AMS to detect CM at the same milking and to predict CM 1 milking before occurrence. Combining the DT-based ensemble models with oversampling techniques achieved a relatively high Se (82%) and Sp (95% for CM detection and 93% for CM prediction). The Se could be increased from 82 to 95% when the Sp level decreased to 80 to 83%, which could be achieved by applying the DT-based ensemble models with undersampling methods. In addition, creating models with AMS data from the past 7 to 9 milkings (approximately 3 d) is recommended to identify positive CM cases for farmers.
ACKNOWLEDGMENTS
This project was funded by Cornell Institute for Digital Agriculture (Ithaca, NY). The authors acknowledge and thank the AMS dairy farm, veterinarians, and students who participated and assisted with milk sample and data collection. The authors have not stated any conflicts of interest.
APPENDIX

Figure A1. Occurrence (y-axis) of clinical mastitis (CM) by DIM in the data. The data were cleaned by removing records with missing values (i.e., values not captured by the automatic milking system) and the 14 d of milking records after detection of a positive CM case for a given quarter of a cow.
Table A1. Statistical summary for key input variables of the automatic milking system (AMS) data

| Variable | Class | No. of observations | Mean | SD | Minimum | 1st quartile | Median | 3rd quartile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| Milk yield (kg) | Negative | 389,265 | 3.40 | 1.45 | 0.00 | 2.42 | 3.26 | 4.27 | 12.60 |
| | Positive | 80 | 2.80 | 1.76 | 0.00 | 1.37 | 2.71 | 3.87 | 8.10 |
| Mean milk flow rate (kg/min) | Negative | 389,265 | 1.15 | 0.37 | 0.00 | 0.90 | 1.14 | 1.38 | 4.50 |
| | Positive | 80 | 1.10 | 0.45 | 0.00 | 0.78 | 1.08 | 1.44 | 2.16 |
| Peak milk flow rate (kg/min) | Negative | 389,265 | 1.64 | 0.47 | 0.00 | 1.38 | 1.62 | 1.92 | 5.10 |
| | Positive | 80 | 1.68 | 0.62 | 0.00 | 1.31 | 1.62 | 2.04 | 3.48 |
| Electrical conductivity (mS/cm) | Negative | 389,265 | 4.73 | 0.70 | 0.00 | 4.50 | 4.74 | 5.02 | 12.06 |
| | Positive | 80 | 5.63 | 1.39 | 0.00 | 5.15 | 5.48 | 6.01 | 8.84 |
| Box time (min) | Negative | 389,265 | 7.13 | 2.06 | 0.60 | 5.70 | 6.77 | 8.13 | 12.55 |
| | Positive | 80 | 11.00 | 4.67 | 4.32 | 8.03 | 10.32 | 51.87 | 34.92 |
| Milking interval (h) | Negative | 389,265 | 10.32 | 50.40 | 0.00 | 6.19 | 7.98 | 10.26 | 2,476.14 |
| | Positive | 80 | 11.16 | 4.52 | 0.01 | 7.98 | 10.14 | 13.44 | 24.51 |
| DIM (d) | Negative | 389,265 | 169.01 | 111.50 | 0.00 | 73.00 | 161.00 | 248.00 | 556.00 |
| | Positive | 80 | 142.00 | 98.22 | 5.00 | 71.50 | 116.50 | 204.00 | 460.00 |
| Milking frequency per day | Negative | 389,265 | 2.75 | 0.84 | 1.00 | 2.00 | 3.00 | 3.00 | 5.00 |
| | Positive | 80 | 1.58 | 0.57 | 1.00 | 1.00 | 2.00 | 2.00 | 3.00 |
1 The data were cleaned by removing records with missing values (i.e., values not captured by the AMS) and the 14 d of milking records after detection of a positive clinical mastitis case for a given quarter of a cow.
Table A2. Binary encoding results of representing the cow ID variable by 9 new binary variables
Cow ID | Cow ID 1 | Cow ID 2 | Cow ID 3 | Cow ID 4 | Cow ID 5 | Cow ID 6 | Cow ID 7 | Cow ID 8 | Cow ID 9 |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
399 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
450 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
568 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
674 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
710 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
771 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
792 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
795 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
796 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
825 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
828 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
859 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
12491 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
1 Because the data set contains many unique cow IDs (about 400), only a subset is listed in this table.
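The encoding in Table A2 appears to map each cow ID to an ordinal index over the sorted unique IDs and then expand that index into 9 binary flags (9 bits cover up to 511 cows). A minimal sketch of this scheme, with a hypothetical function name:

```python
def binary_encode_ids(ids, n_bits=9):
    """Binary-encode categorical IDs: each unique ID gets a 1-based ordinal
    index, written out as n_bits 0/1 flags (the scheme shown in Table A2)."""
    index = {v: i for i, v in enumerate(sorted(set(ids)), start=1)}
    return {v: [int(b) for b in format(i, f"0{n_bits}b")] for v, i in index.items()}
```

Compared with one-hot encoding, this keeps the added column count logarithmic in the number of categories (9 columns instead of ~400).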
Table A3. Models selected for predicting clinical mastitis at the next milking (t + 1)

| Method | Lagged 3 | Lagged 5 | Lagged 7 | Lagged 9 |
|---|---|---|---|---|
| 99% specificity (Sp) interval on validation fold: [99%, 100%] | | | | |
| Selected resampling method | SMOTE | SMOTE | Random oversampling | Random oversampling |
| Resampling method setting | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 |
| Selected model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB |
| 95% Sp interval on validation fold: [95%, 99%) | | | | |
| Selected resampling method | SMOTE | Random oversampling | Random oversampling | Random oversampling |
| Resampling method setting | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 |
| Selected model | Balanced BHGB | Balanced RF | Balanced RF | Balanced RF |
| 90% Sp interval on validation fold: [90%, 95%) | | | | |
| Selected resampling method | Random undersampling | Random undersampling | Random undersampling | Random undersampling |
| Resampling method setting | Ratio = 0.005 | Ratio = 0.005 | Ratio = 0.0007 | Ratio = 0.001 |
| Selected model | Balanced BHGB | Balanced BXGB | Balanced BLGB | Balanced BHGB |
| 85% Sp interval on validation fold: [85%, 90%) | | | | |
| Selected resampling method | One-sided selection | Random undersampling | Random undersampling | Random undersampling |
| Resampling method setting | n_neighbors = 3, n_seeds_S = 5,000 | Ratio = 0.005 | Ratio = 0.1 | Ratio = 0.1 |
| Selected model | Balanced RF | Balanced RF | Balanced RF | Balanced RF |
1 The current milking event was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
2 SMOTE = synthetic minority oversampling technique.
3 The ratio of the number of samples in the minority class to the number of samples in the majority class.
4 Other model settings used the defaults of the Scikit-learn package (https://scikit-learn.org/stable/index.html).
5 BHGB = bagging with histogram-based gradient boosting (the number of base estimators is equal to 100).
6 RF = random forest (the strategy used to assign weights to majority and minority classes is “balanced subsample,” and the number of base estimators is equal to 100).
7 BXGB = bagging with extreme gradient boosting (the number of base estimators is equal to 100).
8 BLGB = bagging with light gradient boosting (the number of base estimators is equal to 100).
9 The size of the neighborhood being considered to compute the nearest neighbors.
10 The number of samples to extract to build the set.
11 RF = random forest (the strategy used to assign weights to majority and minority classes is “balanced,” and the number of base estimators is equal to 100).
Table A4. Models selected for detecting clinical mastitis at the current milking (t)

| Method | Lagged 3 | Lagged 5 | Lagged 7 | Lagged 9 |
|---|---|---|---|---|
| 99% specificity (Sp) interval on validation fold: [99%, 100%] | | | | |
| Selected resampling method | Random oversampling | Random oversampling | Random oversampling | ADASYN |
| Resampling method setting | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 |
| Selected model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB |
| 95% Sp interval on validation fold: [95%, 99%) | | | | |
| Selected resampling method | SMOTE | SMOTE | SMOTE | ADASYN |
| Resampling method setting | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 |
| Selected model | Balanced BXGB | Balanced BHGB | Balanced BHGB | Balanced BLGB |
| 90% Sp interval on validation fold: [90%, 95%) | | | | |
| Selected resampling method | NearMiss (version 1) | Random undersampling | Random undersampling | Random undersampling |
| Resampling method setting | Ratio = 0.0007 | Ratio = 0.0007 | Ratio = 0.01 | Ratio = 0.05 |
| Selected model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB |
| 85% Sp interval on validation fold: [85%, 90%) | | | | |
| Selected resampling method | NearMiss (version 1) | Random undersampling | No resampling | Random undersampling |
| Resampling method setting | Ratio = 0.0007 | Ratio = 0.01 | Not applicable | Ratio = 0.001 |
| Selected model | Balanced RF | Balanced RF | Balanced RF | Balanced RF |
1 The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
2 ADASYN = adaptive synthetic oversampling method.
3 The ratio of the number of samples in the minority class to the number of samples in the majority class.
4 Other model settings used the defaults of the Scikit-learn package (https://scikit-learn.org/stable/index.html).
5 BHGB = bagging ensemble of histogram-based gradient boosting (the number of base estimators is equal to 100).
6 SMOTE = synthetic minority oversampling technique.
7 BXGB = bagging ensemble of extreme gradient boosting (the number of base estimators is equal to 100).
8 BLGB = bagging ensemble of light gradient boosting (the number of base estimators is equal to 100).
9 NearMiss (version 1) is an undersampling method that selects examples from the majority class that have the smallest average distance to the 3 closest examples from the minority class.
10 RF = random forest (the strategy used to assign weights to majority and minority classes is “balanced subsample” and the number of base estimators is equal to 100).
11 RF = random forest (the strategy used to assign weights to majority and minority classes is “none” and the number of base estimators is equal to 100).
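The Ratio settings in Tables A3 and A4 are the minority:majority ratio after resampling (footnote 3). As an illustration of how such a target ratio drives undersampling, a self-contained sketch (the study used library implementations of these methods, not this code):

```python
import numpy as np

def random_undersample(X, y, ratio, seed=0):
    """Randomly drop majority-class (y == 0) rows until the minority:majority
    ratio reaches `ratio`. Simplified stand-in for library implementations."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # Number of majority rows to keep so that n_minority / n_majority == ratio.
    n_keep = min(int(round(len(minority) / ratio)), len(majority))
    keep = rng.choice(majority, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([minority, keep]))
    return X[idx], y[idx]
```

Oversampling methods (random oversampling, SMOTE, ADASYN) interpret the same ratio in reverse, adding minority samples instead of discarding majority ones, so no original information is lost.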
REFERENCES
- Modified balanced random forest for improving imbalanced data prediction.Int. J. Adv. Intell. Informatics. 2019; 5: 58-65
- Imbalance class problems in data mining: A review.Indones. J. Electr. Eng. Comput. Sci. 2019; 14: 1560-1571
- A feature-driven decision support system for heart failure prediction based on statistical model and Gaussian naive Bayes. Comput. Math. Methods Med. 2019; 2019 (31885684): 6314328
- Detecting and predicting changes in milk homogeneity using data from automatic milking systems.J. Dairy Sci. 2021; 104 (34218914): 11009-11017
- Predicting mastitis in dairy cows using neural networks and generalized additive models: A comparison.Comput. Electron. Agric. 2013; 99: 1-6
- The proposal of undersampling method for learning from imbalanced datasets.Procedia Comput. Sci. 2019; 159: 125-134
- The study of under- and over-sampling methods' utility in analysis of highly imbalanced data on osteoporosis.Inf. Sci. 2017; 384: 174-190
- Effect of repeated episodes of generic clinical mastitis on milk yield in dairy cows.J. Dairy Sci. 2007; 90 (17881685): 4643-4653
- The cost of generic clinical mastitis in dairy cows as estimated by using dynamic programming.J. Dairy Sci. 2008; 91 (18487643): 2205-2214
- Incidence of clinical mastitis in dairy herds grouped in three categories by bulk milk somatic cell counts.J. Dairy Sci. 1998; 81 (9532494): 411-419
- Cross-validation.Encycl. Bioinform. Comput. Biol. 2019; 1: 542-545
- Actively balanced bagging for imbalanced data.in: Proc. Int. Symp. Methodologies for Intelligent Systems. Springer, 2017: 271-281
- Bagging predictors.Mach. Learn. 1996; 24: 123-140
- Random forests.Mach. Learn. 2001; 45: 5-32
- Machine learning-based approach on dealing with binary classification problem in imbalanced financial data.in: 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE). IEEE, 2022: 152-156
- A method for resampling imbalanced datasets in binary classification tasks for real-world problems.Neurocomputing. 2014; 135: 32-41
- Mastitis detection in dairy cows by application of neural networks.Livest. Sci. 2008; 114: 280-286
- The effect of repeated episodes of bacteria-specific clinical mastitis on mortality and culling in Holstein dairy cows.J. Dairy Sci. 2013; 96 (23769361): 4993-5007
- Using random forest to learn imbalanced data. Tech. Rep. 666. Department of Statistics, University of California, 2004
- EGBMMDA: Extreme gradient boosting machine for MiRNA-disease association prediction.Cell Death Dis. 2018; 9 (29305594): 3
- An automated in-line clinical mastitis detection system using measurement of conductivity from foremilk of individual udder quarters.N. Z. Vet. J. 2009; 57 (19649014): 208-214
- Results of a multivariate approach to automated oestrus and mastitis detection.Livest. Prod. Sci. 1997; 48: 219-227
- Machine learning in medicine.Circulation. 2015; 132 (26572668): 1920-1930
- Gradient boosted decision trees for lithology classification.Computer-Aided Chem. Eng. 2019; 47: 113-118
- Cytometric fingerprinting and machine learning (CFML): A novel label-free, objective method for routine mastitis screening.Comput. Electron. Agric. 2019; 162: 505-513
- Machine learning for health services researchers.Value Health. 2019; 22 (31277828): 808-815
- A survey on missing data in machine learning.J. Big Data. 2021; 8 (34722113): 140
- Mastitis therapy and pharmacology.Vet. Clin. North Am. Food Anim. Pract. 2003; 19 (12682938): 109-138
- Exploring machine learning algorithms for early prediction of clinical mastitis. Int. Dairy J. 2021; 119: 105051
- Survey of machine learning algorithms for disease diagnostic.J. Intell. Learn. Syst. Appl. 2017; 9: 1-16
- Influence of dry period bacterial intramammary infection on clinical mastitis in dairy cows.J. Dairy Sci. 2002; 85 (12416812): 2589-2599
- Ensemble machine learning models for the detection of energy theft. Electr. Power Syst. Res. 2021; 192: 106904
- Mastering Machine Learning with Scikit-Learn.Packt Publishing Ltd, 2017
- Autoregressive processes.in: Mickey R. Time Series Analysis. Princeton University Press, 2020: 53-59
- Pathogen-specific effects on milk yield in repeated clinical mastitis episodes in Holstein dairy cows. J. Dairy Sci. 2014; 97 (24418269): 1465-1480
- Detecting mastitis cow-side.in: National Mastitis Council 39th Annual Meeting, Atlanta, GA. Natl. Mastitis Counc, 2000: 48-53
- Sensors and clinical mastitis—The quest for the perfect alert.Sensors (Basel). 2010; 10 (22163637): 7991-8009
- Novel ways to use sensor data to improve mastitis management.J. Dairy Sci. 2021; 104 (34304877): 11317-11332
- Sensors and management support in high-technology milking.J. Anim. Sci. 2003; 81 (15000401): 1-10
- Distributions of functions of random variables.in: Probability and Statistical Inference. 9th ed. Pearson Education, 2015: 163-217
- Plant leaf disease recognition using histogram based gradient boosting classifier.in: International Conference on Intelligent Computing & Optimization 2020. Springer, 2021: 530-545
- Automated prediction of mastitis infection patterns in dairy herds using machine learning. Sci. Rep. 2020; 10 (32152401): 4289
- ISO 20966:2007: Automatic milking systems—requirements and testing. Annex C: Example of methods of evaluating detection systems for milk deemed as abnormal due to blood or to changes in homogeneity.International Organization for Standardization (ISO), 2007
- Performance evaluation of different feature encoding schemes on cybersecurity logs.in: 2019 SoutheastCon. IEEE, 2019: 1-9
- Invited review: Incidence, risk factors, and effects of clinical mastitis recurrence in dairy cows.J. Dairy Sci. 2018; 101 (29525302): 4729-4746
- Decision-tree induction to detect clinical mastitis with automatic milking.Comput. Electron. Agric. 2010; 70: 60-68
- Data mining to detect clinical mastitis with automatic milking.in: Proc. 5th IDF Mastitis Conf.: Mastitis Research into Practice, Christchurch, New Zealand. New Zealand Veterinary Association For Continuing Education Inc, 2010: 568-573
- Detection of clinical mastitis with sensor data from automatic milking systems is improved by using decision-tree induction.J. Dairy Sci. 2010; 93 (20655431): 3616-3627
- Lightgbm: A highly efficient gradient boosting decision tree.in: Proc. 31st Int. Conf. Neural Inf. Process. Syst. Curran Associates Inc, 2017: 3146-3154
- Comparative study of classification techniques (SVM, logistic regression and neural networks) to predict the prevalence of heart disease.Int. J. Mach. Learn. Comput. 2015; 5: 414-419
- Development of a new clinical mastitis detection method for automatic milking systems.J. Dairy Sci. 2018; 101 (30055925): 9385-9395
- Definition extraction with balanced random forests.in: Int. Conf. Natural Language Processing. Springer, 2008: 237-247
- Mastitis and lameness detection in dairy cows by application of fuzzy logic.Livest. Sci. 2009; 125: 92-96
- Assessment and management of pain in dairy cows with clinical mastitis.Vet. Clin. North Am. Food Anim. Pract. 2012; 28 (22664209): 289-305
- Genetic relationships between clinical mastitis, somatic cell count, and udder conformation in Danish Holsteins.Livest. Prod. Sci. 1994; 39: 243-251
- Sample Sizes for Clinical, Laboratory and Epidemiology Studies.4th ed. John Wiley & Sons, 2018
- Spam filtering with naive Bayes-which naive Bayes?.in: Third Conf. Email and Anti-Spam. CEAS, 2006: 28-69
- The Internet of Things enhancing animal welfare and farm operational efficiency.J. Dairy Res. 2020; 87 (33213573): 20-27
- Detection of mastitis and lameness in dairy cows using wavelet analysis.Livest. Sci. 2012; 148: 227-236
- Mastitis detection in dairy cows: The application of support vector machines.J. Agric. Sci. 2013; 151: 889-897
- The effects of early antibiotic treatment following diagnosis of mastitis detected by a change in the electrical conductivity of milk.J. Dairy Sci. 1997; 80 (9178126): 859-863
- Detection of clinical mastitis by changes in electrical conductivity of foremilk before visible changes in milk.J. Dairy Sci. 1996; 79 (8675786): 83-86
- Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector.Decis. Support Syst. 2015; 72: 72-81
- Mastitis alert preferences of farmers milking with automatic milking systems.J. Dairy Sci. 2012; 95 (22541479): 2523-2530
- Estimating the impact of clinical mastitis in dairy cows on greenhouse gas emissions using a dynamic stochastic simulation model: A case study.Animal. 2019; 13 (31210122): 2913-2921
- Data considerations for developing deep learning models for dairy applications: A simulation study on mastitis detection. Comput. Electron. Agric. 2022; 196: 106895
- Mastitis detection with recurrent neural networks in farms using automated milking systems. Comput. Electron. Agric. 2022; 192: 106618
- Electrical conductivity of milk: Ability to predict mastitis status.J. Dairy Sci. 2004; 87 (15259246): 1099-1107
- Support vector machines.in: Advanced Data Mining Techniques. Springer Science & Business Media, 2008: 111-122
- Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic.Computing. 2021; 103: 401-424
- Scikit-Learn: Machine learning in Python.J. Mach. Learn. Res. 2011; 12: 2825-2830
- An update on the effect of clinical mastitis on the welfare of dairy cows and potential therapies.Vet. Clin. North Am. Food Anim. Pract. 2018; 34 (30316508): 525-535
- Visual scoring of clots in foremilk.J. Dairy Res. 2005; 72 (16223455): 406-414
- Tackling the poor assumptions of naive Bayes text classifiers.in: Proc. 20th Int. Conf. Machine Learning (ICML-03). ICML, 2003: 616-623
- Improved ensemble learning for classification techniques based on majority voting.in: 7th IEEE Int. Conf. Software Engineering and Service Science (ICSESS). IEEE, 2016: 107-110
- The cost of clinical mastitis in the first 30 days of lactation: An economic modeling tool.Prev. Vet. Med. 2015; 122 (26596651): 257-264
- Invited review: Sensors to support health management on dairy farms.J. Dairy Sci. 2013; 96 (23462176): 1928-1952
- Ensemble learning: A survey.Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018; 8e1249
- Explaining Adaboost.in: Empirical Inference. Springer, 2013: 37-52
- Learning with kernels: Support vector machines, regularization, optimization, and beyond. Adaptive Computation and Machine Learning Series.MIT Press, 2018
- An investigation of categorical variable encoding techniques in machine learning: Binary versus one-hot and feature hashing.(MS Thesis) School of Electrical Engineering and Computer Science, KTH-Roy. Inst. Technol., Stockholm, Sweden2018
- Changes in behaviour of dairy cows with clinical mastitis.Appl. Anim. Behav. Sci. 2016; 175: 8-13
- Grid search-based hyperparameter tuning and classification of microarray cancer data.in: 2019 Second International Conf. on Advanced Comput. and Commun. Paradigms (ICACCP). IEEE, 2019: 1-8
- Performance evaluation of systems for automated monitoring of udder health: Analytical issues and guidelines.in: Lam T.J.G.M. Mastitis Control—From Science to Practice. Wageningen Academic Publishers, 2008: 275-282
- Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality.ESC Heart Fail. 2021; 8 (33205591): 106-115
- Discriminating between true-positive and false-positive clinical mastitis alerts from automatic milking systems.J. Dairy Sci. 2010; 93 (20494164): 2559-2568
- Detection of mastitis and its stage of progression by automatic milking systems using artificial neural networks.J. Dairy Res. 2010; 77 (20030900): 168-175
- Dairy 2014, Milk Quality, Milking Procedures, and Mastitis in the United States, 2014.https://www.aphis.usda.gov/animal_health/nahms/dairy/downloads/dairy14/Dairy14_dr_Mastitis.pdfDate: 2016Date accessed: July 6, 2022
- The Python Language Reference.Python Software Foundation, 2010
- Heart diseases detection using naive Bayes algorithm.Int. J. Innov. Sci. Eng. Technol. 2015; 2: 441-444
- Identifying different transportation modes from trajectory data using tree-based ensemble classifiers.ISPRS Int. J. Geoinf. 2017; 6: 57
- Analysis of K-fold cross-validation over hold-out validation on colossal datasets for quality classification. in: 2016 IEEE 6th Int. Conf. Adv. Comput. (IACC). IEEE, 2016: 78-83
- A review of ensemble methods in bioinformatics. Curr. Bioinform. 2010; 5: 296-308
- Application of support vector machine modeling for prediction of common diseases: The case of diabetes and pre-diabetes. BMC Med. Inform. Decis. Mak. 2010; 10 (20307319): 16
- Application of credit card fraud detection: based on bagging ensemble classifier. Procedia Comput. Sci. 2015; 48: 679-685
- Distribution-balanced stratified cross-validation for accuracy estimation. J. Exp. Theor. Artif. Intell. 2000; 12: 1-12
- KNN approach to unbalanced data distributions: A case study involving information extraction. in: Proceedings of Workshop on Learning from Imbalanced Datasets II. ICML, 2003: 1-7
- Imbalanced data fault diagnosis of rotating machinery using synthetic oversampling and feature learning. J. Manuf. Syst. 2018; 48: 34-50
- Categorical variables: Counting eggs in the age of robotic chickens. in: Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O'Reilly Media Inc, 2018: 77-97
Article info
Publication history
Published online: March 17, 2023
Accepted: November 16, 2022
Received: May 31, 2022
Publication stage
In Press, Corrected Proof
Copyright
© 2023 The Author(s).
User license
Creative Commons Attribution (CC BY 4.0)
Permitted
- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article
- Reuse portions or extracts from the article in other works
- Sell or re-use for commercial purposes