Research Article | Articles in Press

Multivariable time series classification for clinical mastitis detection and prediction in automated milking systems

Open Access | Published: March 17, 2023 | DOI: https://doi.org/10.3168/jds.2022-22355

      ABSTRACT

In this study, we developed a machine learning framework to detect clinical mastitis (CM) at the current milking (i.e., the same milking) and predict CM at the next milking (i.e., one milking before CM occurrence) at the quarter level. Time series quarter-level milking data were extracted from an automated milking system (AMS). For both CM detection and prediction, the best classification performance was obtained from the decision tree–based ensemble models. Moreover, applying models on a data set containing data from the current milking and the past 9 milkings before the current milking showed the best accuracy for detecting CM; modeling with a data set containing data from the current milking and the past 7 milkings before the current milking yielded the best results for predicting CM. The models combined with oversampling methods resulted in specificity of 95 and 93% for CM detection and prediction, respectively, with the same sensitivity (82%) for both scenarios; when lowering specificity to 80 to 83%, undersampling techniques enabled the models to increase sensitivity to 95%. We propose a feasible machine learning framework to identify CM in a timely manner using imbalanced data from an AMS, which could provide useful information for farmers to manage the negative effects of CM.


      INTRODUCTION

Clinical mastitis (CM) is common in the dairy sector. The USDA's National Animal Health Monitoring System reported that about 25% of US dairy cows suffered from CM in 2013 (USDA, 2014). The economic costs of CM include treatment costs, milk yield reduction, and increased culling rate and mortality (Bar et al.; Mostert et al.). Reports of the average cost of a CM case ranged from $179 to $444, dominated by indirect costs such as milk loss in the current and subsequent lactations and reproductive loss (Bar et al.; Rollin et al.). Furthermore, CM symptoms can cause pain and discomfort, which negatively affect animal welfare (Petersson-Wolfe et al.).
On dairy farms in the United States, forestripping cows before milking is a common practice to detect most cases of CM (Sepúlveda-Varas et al.; USDA, 2014). This is because abnormalities in milk (visual appearance of clots, flakes, or changes in color) are often the first clinical signs of inflammation (Rasmussen). Additionally, monitoring changes in electrical conductivity (EC; Milner et al.) and SCC (Lund et al.) in milk can provide information for CM detection. Accurate and timely detection of CM, which has been explored in several studies (Kamphuis et al.; Miekley et al.; Fadul-Pacheco et al.), would potentially trigger proper treatment or earlier intervention, reduce economic losses, maintain milk quality, and improve the welfare of cows (Milner et al.; Leslie and Petersson-Wolfe; Michie et al.).
In recent years, with increases in labor costs, automated milking systems (AMS) have developed rapidly and are being adopted by dairy farms. The AMS technology generates a relatively large amount of consistently recorded data, which has led to progress in developing algorithms that can identify CM accurately and promptly (Hogeveen et al.; Sun et al.; Rutten et al.). Many of the measurements captured by the AMS software during the milking process, such as milk yield, EC, color of the milk, occurrence of incompletely milked quarters, and kicked-off milk cups, have been recognized as potential predictors for CM detection (Norberg et al.; Jamali et al.; Khatun et al.). Furthermore, the large amount of data recorded automatically by the AMS technology has facilitated the use of machine learning (ML) methods to predict and diagnose diseases in dairy cows (Dhoble et al.; Hyde et al.). Relative to conventional statistical models, ML methods offer several advantages: they require no assumption that the data follow a particular probability distribution (Shin et al.), they can rapidly examine large numbers of observations (Fatima and Pasha), and they can identify trends and patterns that cannot easily be visualized (Deo).
Some studies have focused on developing ML models to detect cows with CM using AMS data [e.g., a neural network model by Cavero et al., a decision tree model by Kamphuis et al., and a recurrent neural network model by Naqvi et al.]. Previous works often relied on the metrics of specificity (Sp) and sensitivity (Se) to evaluate model performance. Hogeveen et al. suggested that a feasible ML scheme for CM detection should have high Se and Sp, and that the required detection performance should be higher for more severe CM cases. Although many researchers have tried to improve CM detection with data from AMS, current models still need improvement: they often show wide ranges in prediction Se (Kamphuis et al.; Steeneveld et al.) or a high error rate (Miekley et al.).
Hence, our study objective was to train and validate ML models to detect CM at the current milking (i.e., the same milking) and to predict the occurrence of CM at the next milking (i.e., 1 milking before CM occurrence). Quarter-level data from multiple milking periods used in this analysis were recorded by AMS software. More importantly, by detecting and predicting CM with high Se and Sp, our modeling analysis could list cow quarters that are likely infected at the current and next milking and provide timely alerts for farmers to reduce the negative effects of CM. To the best of our knowledge, no previous study has predicted CM 1 milking in advance, although a few studies have worked on detection of the onset of CM (Kamphuis et al.; Fadul-Pacheco et al.), and Anglart et al. used AMS data to detect the presence of milk clots at the milking of identification as well as to predict clots in advance.

      MATERIALS AND METHODS

      Data and Variables

The quarter-level milking time data used to train and build the ML framework for detecting and predicting the occurrence of CM in this study were collected from a commercial dairy farm located in New York State during the summers (June to August) of 2020 and 2021. The quarter-level data set contained 427,596 observations from 373 cows milked with an AMS, namely a voluntary milking system (VMS Classic; DeLaval International AB). Ethics approval was not needed for this study because only routine animal procedures (e.g., milking) were performed. The cows enrolled in our study were Holsteins with average milk production of 13,302 kg/cow per year. During the data collection period, in 2020, 24% of the cows were in their first lactation, 45% in their second, and 31% in their third or later lactation; in 2021, 12, 56, and 32% of the cows were in their first, second, and third or later lactation, respectively. All data computation and cleaning procedures were conducted in Python 3.6.0 (VanRossum and Drake).
Nineteen measurements (variables) related to the individual milking event for each cow quarter were evaluated as potential CM predictors. These included 13 measurements reported directly from the herd management system DelPro (DeLaval International AB): milk yield (kg), mean milk flow rate (kg/min), peak milk flow rate (kg/min), EC (mS/cm), box time (min), milking interval between 2 successive milking events (h; milking interval data received from the AMS were in hh:mm:ss format and converted to hours), lactation number, DIM, teats not found (yes or no), presence of blood in milk (yes or no), kick-offs (denoting milk cups being kicked off during milking; yes or no), occurrence of an incompletely milked quarter (yes or no), and smart pulsation ratio (4 levels denoted by integers, where a higher number indicated a higher level). For cow-level variables (i.e., box time, milking interval, lactation number, DIM, and smart pulsation ratio), the same values were attached to the corresponding quarter-level records of each cow. Other attributes considered were milk yield ratio, calculated as a quarter's yield divided by the total yield of the cow for each milking event; EC ratio, equal to a quarter's EC divided by the EC of its corresponding cow by milking; quarter index [denoting the 4 quarters of a cow as the left front (LFQ), left hind (LHQ), right front (RFQ), and right hind (RHQ) quarter]; cow identification number (cow ID); previous occurrence of CM before the current milking by quarter, defined during the data collection period and denoted by 1 or 0 to show whether a cow quarter had suffered from CM before the current milking event; the order of the milking event within a day (denoted by integers from 0 to n, where a higher number indicated a later milking event); and milking frequency per day. In addition, each quarter-level AMS milking record was uniquely identified by cow ID, quarter index, and the start time of the milking. Trained study technicians (n = 8) identified positive CM cases by inspecting collected milk samples (3 times per week in year 1 and 7 times per week in year 2). A positive case of CM was considered present if milk from one or more quarters was abnormal (watery, contained flakes or clots, or changed in color), with or without signs of local inflammation of the affected quarter, as previously described (Erskine et al.). Cows identified as having CM were either monitored or treated based on treatment protocols set up by the herd veterinarian. Cows were not forestripped before milking, with or without milk sample collection.
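To make the derived ratio variables concrete, the following sketch computes the milk yield ratio and EC ratio per cow and milking in Python with pandas. The DataFrame and column names are hypothetical, and the cow-level EC in the denominator is assumed here to be the mean over the 4 quarters; the AMS may report it differently.

import pandas as pd

# Hypothetical quarter-level records: one row per cow, milking, and quarter.
milkings = pd.DataFrame({
    "cow_id": [101, 101, 101, 101],
    "milking_start": pd.to_datetime(["2020-06-01 06:00"] * 4),
    "quarter": ["LFQ", "LHQ", "RFQ", "RHQ"],
    "yield_kg": [2.9, 3.1, 2.7, 3.3],
    "ec_ms_cm": [4.6, 4.8, 4.5, 5.9],
})

grp = milkings.groupby(["cow_id", "milking_start"])
# Milk yield ratio: a quarter's yield divided by the cow's total yield
# at that milking event.
milkings["yield_ratio"] = milkings["yield_kg"] / grp["yield_kg"].transform("sum")
# EC ratio: a quarter's EC divided by the cow-level EC at that milking
# (assumed here to be the mean over the cow's quarters).
milkings["ec_ratio"] = milkings["ec_ms_cm"] / grp["ec_ms_cm"].transform("mean")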

      Data Processing

First, data records under 2 conditions were removed: milking records with missing values, in which the milk yield data for one or more quarters were not captured by the AMS (i.e., blank values for milk yield), and records with improper values that resulted from cows that revisited the AMS several times without yield. Furthermore, 14 d of milking records after the detection of a positive CM incidence for a given quarter of a cow were removed. In this way, any CM-positive case can be considered a new event (Barkema et al.; Cha et al.; Hertl et al.). The cleaned data set used for modeling analysis after data removal consisted of 389,345 milking observations with 80 CM (positive) cases. Statistics computed over the cleaned data set for key variables are shown in Appendix Table A1, and the occurrences of CM against DIM are depicted in Appendix Figure A1.
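A minimal sketch of these 2 removal steps, reusing the hypothetical milkings frame from above plus an assumed cm_cases table (columns cow_id, quarter, detected_at) of identified positives:

import pandas as pd

# Drop records where the AMS did not capture milk yield, and revisit
# records without yield.
milkings = milkings.dropna(subset=["yield_kg"])
milkings = milkings[milkings["yield_kg"] > 0]

# Remove 14 d of records after each detected CM case for that quarter,
# so that any later positive record counts as a new CM event.
for case in cm_cases.itertuples():
    within_14d = (
        (milkings["cow_id"] == case.cow_id)
        & (milkings["quarter"] == case.quarter)
        & (milkings["milking_start"] > case.detected_at)
        & (milkings["milking_start"] <= case.detected_at + pd.Timedelta(days=14))
    )
    milkings = milkings[~within_14d]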
      Other data processing procedures applied in this analysis are as follows:

      Feature Encoding.

Because most ML models only accept numerical variables, encoding the categorical variables is a necessary step so that the model is able to understand and extract valuable information. All categorical variables were converted into binary numbers (0/1) using 3 types of encoding methods, as follows. For binary variables labeled yes or no (i.e., teats not found, presence of blood in milk, kick-offs, and occurrence of incompletely milked quarter), labels were converted into 1 or 0. For the quarter index with 4 categories (LFQ, LHQ, RFQ, RHQ), where no ordinal relationship existed among categories, one-hot encoding (Seger) was used and 4 new binary variables were created; a value of 1 was placed in the binary variable for the quarter that a milking record belonged to, and 0 in the other 3. For cow ID, which has high cardinality (more than 100 distinct categories; Moeyersoms and Martens), binary encoding was applied: a numerical value was first assigned to each category of this variable, those integers were transformed into binary code, and, finally, the digits of the binary string were split into separate columns (Seger; Jackson and Agrawal). Nine new binary variables were generated with 1s or 0s, and each cow ID was uniquely represented by a combination of the values of these 9 variables. The encoding results are shown in Appendix Table A2.
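The 3 encoding steps could be reproduced along these lines; this is a sketch in which the yes/no column names are hypothetical and the binary encoder comes from the third-party category_encoders package (one assumed implementation):

import pandas as pd
import category_encoders as ce

# Yes/no flags become 1/0.
for col in ["teats_not_found", "blood_in_milk", "kickoff", "incomplete_quarter"]:
    milkings[col] = (milkings[col] == "yes").astype(int)

# One-hot encode the quarter index (no ordinal relationship among the
# 4 quarters): 4 new 0/1 columns, one per quarter.
milkings = pd.get_dummies(milkings, columns=["quarter"], prefix="q")

# Binary-encode the high-cardinality cow ID: each ID maps to an integer
# whose binary digits are split into separate 0/1 columns; 9 columns
# suffice for 373 cows because 2**9 = 512 > 373.
milkings = ce.BinaryEncoder(cols=["cow_id"]).fit_transform(milkings)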
      The dependent variable (y) was a time series variable and denoted whether a cow quarter was identified as negative or positive for CM at a certain milking event, correspondingly equal to 0 or 1.

      Format Transformation.

We transformed each time series variable (e.g., milk yield, EC) in the data set to an autoregressive form of order p, AR(p) (Hamilton). Hence, in our classification model fitting, the current milking (t) denoted the milking event at which a quarter was identified as negative or positive for CM; for each predictor variable, the lagged p data set contained values observed from the past p milkings before the current milking and observations from the current milking t, denoted as follows:

x_1(t), x_1(t – 1), x_1(t – 2), ..., x_1(t – p);
x_2(t), x_2(t – 1), x_2(t – 2), ..., x_2(t – p);
...
x_K(t), x_K(t – 1), x_K(t – 2), ..., x_K(t – p);   [1]

where K is the total number of numerical input variables, and p is the total number of milkings before the current milking (here, p = 3, 5, 7, or 9). Specifically, the variable "previous occurrence of CM before the current milking by quarter" was not lagged; only its value at the current milking was included in the modeling analysis, showing whether a cow quarter had suffered from CM before the current milking during the data collection period.
      After data transformation, each row of the final data set contained values of the dependent variable and 3 types of predictor variables. For CM detection, the dependent variable was y(t), which denoted a cow quarter with or without CM at the current milking (t); for CM prediction, the dependent variable was y(t + 1), which represented a cow quarter with or without CM at the next milking (t + 1). The predictor variables included numerical variables from the past p milkings and that from the current milking t as shown in [1]; encoded categorical variables used to represent the categorical predictors, such as cow ID and quarter index; and a set of time interval variables, denoting the time intervals between the past milkings and the milking event at which the dependent variable was defined. In particular, for the dependent variable at the current milking y(t), delta (t – 1) denoted the time interval between the past 1 milking (t – 1) and the current milking (t), …, delta (tp) denoted the time interval between the past pth milking (tp) and the current milking (t). Then, for the dependent variable at the next milking y(t + 1), delta (t) denoted the time interval between the current milking (t) and the next milking (t + 1), …, delta (tp) for the time interval between the past pth milking (tp) and the next milking (t + 1).
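Assuming records sorted by milking start time within each cow quarter (and applied before the quarter index is one-hot encoded), the AR(p)-style lagging and the delta time-interval variables can be sketched with pandas groupby/shift. Column names, including the 0/1 label column cm_positive, are hypothetical:

p = 7  # lag order; 3, 5, and 9 were evaluated as well
lag_cols = ["yield_kg", "ec_ms_cm", "yield_ratio", "ec_ratio"]  # numerical predictors

milkings = milkings.sort_values(["cow_id", "quarter", "milking_start"])
grp = milkings.groupby(["cow_id", "quarter"], sort=False)

for col in lag_cols:
    for k in range(1, p + 1):
        # Value of the predictor k milkings before the current milking t.
        milkings[f"{col}_lag{k}"] = grp[col].shift(k)

for k in range(1, p + 1):
    # delta(t - k): hours between the past kth milking and the current one.
    milkings[f"delta_{k}"] = (
        milkings["milking_start"] - grp["milking_start"].shift(k)
    ).dt.total_seconds() / 3600.0

# Dependent variables: y(t), the current label, for detection, and
# y(t + 1), the next milking's label, for prediction 1 milking ahead.
milkings["y_next"] = grp["cm_positive"].shift(-1)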

      Data Splitting, Cross-Validation, and Resampling Techniques.

In ML model development, it is common practice to split the data into training and testing data sets. In this study, the data were split by date to avoid information leakage from the future to the past (Zheng and Casari), resulting in a training set containing 80% of the data (from June 1, 2020, to July 13, 2021; 311,476 observations) used to train and create the models. The remaining 20% (from July 14, 2021, to August 15, 2021; 77,869 observations) were reserved as the testing set for evaluating model accuracy. This train-test split was stratified by the 2 classes of the dependent variable to ensure that both the training and testing sets preserved proportions of observations in each class similar to those in the original data set (Yadav and Shukla).
Next, we applied repeated stratified 5-fold cross-validation (Berrar) on the training set. The splitting of data into folds was governed by the criterion "stratified by dependent variable y" to ensure that each fold had the same proportion of observations across the classes of the dependent variable (Zeng and Martinez). One fold was used for validation (evaluating model performance), and the remaining 4 folds were merged as a subset of the training data (i.e., the training subset) for fitting a model. This 5-fold cross-validation procedure was repeated 3 times.
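A sketch of the date-based split and the repeated stratified 5-fold cross-validation with scikit-learn follows; the split dates are those reported above, while the frame and column names are hypothetical:

from sklearn.model_selection import RepeatedStratifiedKFold

# Date-based split: earlier milkings train the models, later milkings
# test them, so no information leaks from the future into the past.
train = data[data["milking_start"] <= "2021-07-13"]
test = data[data["milking_start"] >= "2021-07-14"]
X_train, y_train = train.drop(columns=["y"]), train["y"]

# Repeated stratified 5-fold CV: every fold keeps the CM-positive
# proportion of the training set; the whole procedure runs 3 times.
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
for tr_idx, val_idx in rskf.split(X_train, y_train):
    X_tr, y_tr = X_train.iloc[tr_idx], y_train.iloc[tr_idx]      # 4 merged folds
    X_val, y_val = X_train.iloc[val_idx], y_train.iloc[val_idx]  # validation fold
    pass  # resample X_tr/y_tr and fit candidate models here (next sketch)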
Furthermore, the cleaned data set for modeling analysis had only 80 CM (positive) records, accounting for 0.02% of total records; this data imbalance could negatively affect model prediction. Classification models perform better on the majority class and are prone to categorize unseen observations into the majority class, and classifiers might be more likely to misclassify rare instances or even ignore them (Ali et al.). To address the issues induced by data imbalance, after splitting the training data set into 5 folds, 4 of those 5 folds were merged as the training subset and resampled for training the ML models; no resampling method was applied to the remaining validation fold. Either the small number of positive CM records was oversampled or the large number of negative CM records was undersampled. Several resampling methods were tested: random oversampling, synthetic minority oversampling technique (SMOTE; Cateni et al.), adaptive synthetic (ADASYN) oversampling (Zhang et al.), random undersampling, one-sided selection (Bach et al.), near-miss undersampling (Zhang and Mani), and edited nearest neighbors undersampling (Bach et al.).
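All of the resampling methods tested are available in the imbalanced-learn package; a sketch of applying one of them to the merged training folds only (the validation fold is left untouched):

from imblearn.over_sampling import ADASYN, SMOTE, RandomOverSampler
from imblearn.under_sampling import (
    EditedNearestNeighbours,
    NearMiss,
    OneSidedSelection,
    RandomUnderSampler,
)

samplers = {
    "random_over": RandomOverSampler(random_state=0),
    "smote": SMOTE(random_state=0),
    "adasyn": ADASYN(random_state=0),
    "random_under": RandomUnderSampler(random_state=0),
    "one_sided_selection": OneSidedSelection(random_state=0),
    "near_miss_v1": NearMiss(version=1),
    "enn": EditedNearestNeighbours(),
}

# Resample only the merged 4 training folds; evaluate on the original
# (non-resampled) validation fold.
X_res, y_res = samplers["smote"].fit_resample(X_tr, y_tr)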

      Algorithms Evaluated

      Logistic Regression.

Logistic regression (LR) is a fundamental algorithm for disease detection (Green et al.). It computes the probability that a set of input variables belongs to a discrete outcome. Under the assumption that all input features are independent, the LR algorithm typically uses maximum likelihood estimation to estimate its parameters (Khanna et al.). Because the dependent variable in this study was binary, a binary LR method was used for classification.

      Support Vector Machine.

The support vector machine (SVM) algorithm generates a decision boundary to optimally separate the 2 classes of the dependent variable. It can effectively reduce the computational complexity arising from the high dimensionality of the input variables (Yu et al.; Miekley et al.; Schölkopf and Smola). Both a linear SVM method, which assumes that the 2 classes are linearly separable, and a nonlinear SVM method, which can classify nonlinearly separable data, were applied to investigate the effective separation of the CM-positive and CM-negative classes (Olson and Delen).

      Naïve Bayes.

The naïve Bayes (NB) algorithm is a probabilistic classification technique based on Bayes' theorem. It assumes that the input features are conditionally independent given the target class. Although this strong assumption is rarely met in real-life data, the algorithm often outperforms even highly sophisticated classification methods (Vembandasamy et al.). Three NB methods with different assumptions about the input data were explored in this analysis: Gaussian NB (Ali et al.), multinomial NB (Rennie et al.), and Bernoulli NB (Metsis et al.).

      Decision Tree–Based Ensemble Algorithm.

The decision tree (DT)–based ensemble algorithm comprises a set of algorithms that have the potential to achieve better performance than a single algorithm (Doupe et al.) and are invariant to feature scaling, as they are robust to monotonic transformations of variables (Xiao et al.). Hence, relative to the other algorithms mentioned above, more methods under the DT-based ensemble algorithm were applied in this study: random forest (RF), adaptive boosting (AdaBoost), gradient boosting (GB), and bagging with GB classification trees.
The RF method usually shows high performance because of its randomness: it creates an uncorrelated forest of decision trees by training each tree with a bootstrap sample from the training data and a randomly selected subset of input features (Breiman). In this study, the balanced RF method was implemented, as recommended by Chen et al., Kobyliński and Przepiórkowski, and Agusta and Adiwijaya, to better address data imbalance. The AdaBoost and GB methods are both additive and combine several DT methods to create a strong predictive method (Dev and Eden; Schapire). In particular, 3 GB methods were evaluated: histogram-based GB (HGB; Hossain and Deb), light GB (LGB; Ke et al.), and eXtreme GB (XGB; Chen et al.). Other methods combined the bagging technique (Breiman) with different GB classification trees; that is, bagging with histogram-based GB classification trees (BHGB), bagging with light GB (BLGB), and bagging with eXtreme GB (BXGB). The balanced bagging approach, which can better deal with class imbalance (Zareapoor and Shamsolmoali; Blaszczyński and Stefanowski), was applied. All ML methods used in this study are listed in Table 1.
Table 1. Machine learning algorithms, specific methods, and possible hyperparameter settings evaluated(1)

Algorithm | Method | Hyperparameter and space of possible values
Logistic regression (LR) | Binary LR | Norm of the regularization: ["none", "l1", "l2", "elasticnet"]
Support vector machine (SVM) | Linear SVM; Nonlinear SVM | Regularization parameter: [100, 10, 1.0, 0.1, 0.001]
Naïve Bayes (NB) | Gaussian NB | Variance smoothing (the portion of the largest variance of all features that is added to variances for calculation stability): from 0.0001 to 1 in increments of 0.005
 | Multinomial NB; Bernoulli NB | Additive (Laplace/Lidstone) smoothing parameter: [0.01, 0.1, 0.5, 1.0, 10.0]
Decision tree (DT)–based ensemble algorithm | Random forest (RF); Adaptive boosting (AdaBoost); Gradient boosting (GB): histogram-based GB (HGB), light GB (LGB), eXtreme GB (XGB); Bagging with gradient boosting classification trees (BGB): bagging with histogram-based GB (BHGB), bagging with light GB (BLGB), bagging with eXtreme GB (BXGB) | Maximum number of DT classifiers: from 50 to 1,500 in increments of 50; maximum depth of each tree: "None" or from 1 to 20 in increments of 1; minimum number of samples used to split an internal DT node: from 1 to 20 in increments of 1; number of features used when looking for the best split: ["sqrt", "log2", "None"]; weights associated with classes of the dependent variable: ["none", "balanced", "balanced_subsample"]

(1) Details described in the Scikit-learn library (https://scikit-learn.org/stable/).
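As a sketch of how the class-balance-aware ensembles in Table 1 can be instantiated, the following uses imbalanced-learn's BalancedRandomForestClassifier and BalancedBaggingClassifier with a histogram-based GB base learner. Hyperparameter values are illustrative rather than the tuned ones, and the keyword is estimator in recent imbalanced-learn releases (formerly base_estimator):

from imblearn.ensemble import BalancedBaggingClassifier, BalancedRandomForestClassifier
from sklearn.ensemble import HistGradientBoostingClassifier

# Balanced RF: each tree grows on a bootstrap sample rebalanced by
# undersampling the majority (CM-negative) class.
balanced_rf = BalancedRandomForestClassifier(n_estimators=500, random_state=0)

# Balanced bagging of histogram-based GB trees (BHGB): every bagging
# member is fit on a rebalanced bootstrap sample.
balanced_bhgb = BalancedBaggingClassifier(
    estimator=HistGradientBoostingClassifier(),
    n_estimators=10,
    random_state=0,
)

balanced_bhgb.fit(X_res, y_res)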

      Hyperparameter Tuning

Hyperparameters are the internal coefficients or weights of a method that need to be defined before the training process. Table 1 lists the hyperparameters that were tuned and the space of possible hyperparameter values searched for all ML methods evaluated in this study. Detailed definitions of the hyperparameters are given in Scikit-learn (Pedregosa et al.; Hackeling). To find optimal hyperparameters, the grid search technique was used (Shekar and Dagnew).
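A grid search sketch for the balanced RF over a slice of the Table 1 space; scikit-learn scores here by AUC-ROC, whereas the study's final model selection fixed Sp intervals and maximized Se, as described in the next section:

from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 500, 1000, 1500],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10, 20],
    "max_features": ["sqrt", "log2", None],
}

search = GridSearchCV(
    BalancedRandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
)
search.fit(X_res, y_res)
print(search.best_params_)  # best hyperparameter combination found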

      Modeling Framework

The modeling framework developed in this study consisted of 2 phases to detect CM at the same milking and predict CM 1 milking before occurrence using the lagged 3, 5, 7, and 9 data sets. Phase 1 selected the models with the best performance across the different ML methods and, correspondingly, the resampling methods used on the training sets. Phase 2 used the finalized classification models on the milking records reserved in the testing sets to assess the performance of the different models on new data. Se, Sp, and the area under the receiver operating characteristic curve (AUC-ROC) were used to evaluate model performance.
In phase 1, all models across the different ML methods described earlier were evaluated by the stratified 5-fold cross-validation on the training sets of the lagged 3, 5, 7, and 9 data sets. Every model was trained both on the non-resampled (i.e., original) and on the resampled versions of the merged 4 of 5 folds of each training set, and the model that produced the best results on the remaining validation fold of that training set was selected. In the end, the best-performing models and the corresponding resampling methods were obtained for the 4 types of lagged data sets. In addition, to take both Sp and Se into account, Sp was fixed within a certain range and then the model resulting from the ML method and resampling technique that led to the highest Se was selected. More specifically, 4 Sp intervals based on the results from the validation fold were defined in our analysis: Sp ≥99% (99% Sp interval), 95% ≤ Sp <99% (95% Sp interval), 90% ≤ Sp <95% (90% Sp interval), and 85% ≤ Sp <90% (85% Sp interval).
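The phase 1 selection rule amounts to: within each Sp interval, keep the (model, resampling method) pair with the highest validation-fold Se. A minimal sketch, with results as a hypothetical container of validation-fold metrics:

# One dict per (model, resampling method) pair; values are hypothetical.
results = [
    {"model": "balanced RF", "sampler": "SMOTE", "sp": 0.95, "se": 0.67},
    {"model": "balanced BHGB", "sampler": "random oversampling", "sp": 0.99, "se": 0.37},
    # ...
]

sp_intervals = {
    "99% Sp": (0.99, 1.0001),  # Sp >= 99%
    "95% Sp": (0.95, 0.99),    # 95% <= Sp < 99%
    "90% Sp": (0.90, 0.95),
    "85% Sp": (0.85, 0.90),
}

best = {}
for name, (lo, hi) in sp_intervals.items():
    in_interval = [r for r in results if lo <= r["sp"] < hi]
    if in_interval:
        # Within the fixed Sp interval, maximize Se.
        best[name] = max(in_interval, key=lambda r: r["se"])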
In phase 2, the combinations of model and resampling method selected from phase 1 for the lagged 3, 5, 7, and 9 data sets by each Sp interval were used to classify quarters with or without CM on their corresponding testing sets. For every Sp interval, the selected model was fit on the entire training set, resampled by the corresponding selected resampling method, and then used to classify the data of the testing set. This "fitting and classifying" loop was repeated N times. Because N > 30 is expected to result in relatively stable model outcomes (Hogg et al.; Machin et al.), N = 51 was used in this study. Each time, the model reported a value (0/1) classifying each quarter-level milking event in the testing set as negative or positive for CM, and the class receiving the majority of votes after N reported values was the final outcome for an observation (Rojarath et al.).
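A sketch of the phase 2 loop: refit the selected combination N = 51 times on the resampled full training set, classify the testing set each time, and take the majority vote per observation. The resampler and model choices here are illustrative:

import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier
from imblearn.over_sampling import RandomOverSampler

N = 51  # odd, so a majority vote over N runs cannot tie
votes = np.zeros((N, len(X_test)), dtype=int)

for i in range(N):
    sampler = RandomOverSampler(random_state=i)  # fresh resample each run
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    model = BalancedRandomForestClassifier(n_estimators=500, random_state=i)
    votes[i] = model.fit(X_res, y_res).predict(X_test)

# Final class per quarter-level milking event: the majority over N votes.
y_pred = (votes.sum(axis=0) > N // 2).astype(int)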

      RESULTS

      CM Prediction One Milking Before CM Occurrence

The balanced RF and balanced BGB models were preferred over other models because they produced relatively higher Se at the various Sp intervals for CM prediction at the next milking (t + 1). Note that when Sp was in the 95% and 99% Sp intervals, the selected models paired with an oversampling method produced the highest Se; for lower levels of Sp, undersampling methods resulted in models with the best Se. Models with the lagged 7 data set performed slightly better than those with the lagged 3 and 5 data sets, and no improvement was obtained by modeling with the lagged 9 data set. In particular, comparing the results on the validation folds across the 4 lagged data sets (Table 2 and Appendix Table A3) with Sp equal to 99%, the combination of the balanced BHGB model, the random oversampling method, and the lagged 7 data set performed best, with 37% Se and 68% weighted AUC-ROC. When lowering Sp to the 95% Sp interval, Se increased by 31 percentage points with a weighted AUC-ROC of 82%, and the balanced RF model resulted in the best performance. Further increases in Se could be achieved by decreasing Sp to a lower Sp interval.
Table 2. Model evaluation results for predicting clinical mastitis at the next milking (t + 1)

Evaluation data set(1) | Item(2) | Lagged 3(3) | Lagged 5 | Lagged 7 | Lagged 9

99% Sp interval on validation fold: [99%, 100%]
Validation fold | Sp | 99% | 99% | 99% | 99%
 | Se | 35% | 34% | 37% | 34%
 | Weighted AUC-ROC | 67% | 66% | 68% | 66%
 | Resampling method(4) | SMOTE(5) | SMOTE | Random oversampling | Random oversampling
 | Model(4) | Balanced BHGB(6) | Balanced BHGB | Balanced BHGB | Balanced BHGB
Testing set | Sp | 98% | 98% | 98% | 98%
 | Se | 45% | 45% | 55% | 41%
 | Weighted AUC-ROC | 72% | 72% | 76% | 70%

95% Sp interval on validation fold: [95%, 99%)
Validation fold | Sp | 95% | 95% | 95% | 95%
 | Se | 67% | 63% | 68% | 64%
 | Weighted AUC-ROC | 81% | 79% | 82% | 79%
 | Resampling method | SMOTE | Random oversampling | Random oversampling | Random oversampling
 | Model | Balanced BHGB | Balanced RF(7) | Balanced RF | Balanced RF
Testing set | Sp | 94% | 93% | 93% | 93%
 | Se | 77% | 77% | 82% | 82%
 | Weighted AUC-ROC | 85% | 85% | 87% | 87%

90% Sp interval on validation fold: [90%, 95%)
Validation fold | Sp | 91% | 91% | 91% | 91%
 | Se | 77% | 72% | 76% | 74%
 | Weighted AUC-ROC | 84% | 82% | 83% | 82%
 | Resampling method | Random undersampling | Random undersampling | Random undersampling | Random undersampling
 | Model | Balanced BHGB | Balanced BXGB(8) | Balanced BLGB(9) | Balanced BHGB
Testing set | Sp | 89% | 89% | 89% | 89%
 | Se | 77% | 82% | 82% | 82%
 | Weighted AUC-ROC | 83% | 85% | 85% | 85%

85% Sp interval on validation fold: [85%, 90%)
Validation fold | Sp | 86% | 85% | 86% | 85%
 | Se | 83% | 79% | 69% | 62%
 | Weighted AUC-ROC | 84% | 82% | 77% | 73%
 | Resampling method | One-sided selection | Random undersampling | Random undersampling | Random undersampling
 | Model | Balanced RF | Balanced RF | Balanced RF | Balanced RF
Testing set | Sp | 81% | 80% | 80% | 81%
 | Se | 91% | 95% | 95% | 77%
 | Weighted AUC-ROC | 86% | 88% | 88% | 79%

(1) The data were split into training and testing data sets. Within the training data set, the validation fold was used to assess model performance as part of the model selection process.
(2) Sp = specificity; Se = sensitivity; AUC-ROC = area under the receiver operating characteristic curve.
(3) The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
(4) Please refer to the Appendix for the detailed settings of resampling methods and models.
(5) SMOTE = synthetic minority oversampling technique.
(6) BHGB = bagging ensemble of histogram-based gradient boosting.
(7) RF = random forest.
(8) BXGB = bagging ensemble of extreme gradient boosting.
(9) BLGB = bagging ensemble of light gradient boosting.
      Moreover, relative to the model results on the validation folds used for model selection, the prediction results on the testing sets of 4 different lagged data sets showed that the Se increased (by up to 26 percentage points in some instances) with only a slight decline in the Sp (1 to 6 percentage points). For example, in the 99% Sp interval, the Se obtained from the testing set of the lagged 7 data set was 18 percentage points higher than that on the validation fold, whereas the Sp decreased to 98% and the weighted AUC-ROC increased by 8 percentage points (Table 2).

      CM Detection at the Same Milking

The model results for detecting CM at the current milking (t) across the 4 lagged data sets showed that the balanced RF and balanced BGB models on the lagged 9 data set tended to perform better than other model and data set combinations, generating relatively higher Se values across the Sp intervals. In the 99% Sp interval, the balanced BHGB model with the ADASYN oversampling method led to the highest Se (57%) on the validation fold of the lagged 9 data set, in contrast to the other lagged data sets (Table 3 and Appendix Table A4). Applying this model to the testing set of the lagged 9 data set, Se increased to 73% with an Sp of 98%. For Sp levels lower than 95%, the DT-based models facilitated by undersampling methods yielded the highest Se.
Table 3. Model evaluation results for detecting clinical mastitis at the current milking (t)

Evaluation data set(1) | Item(2) | Lagged 3(3) | Lagged 5 | Lagged 7 | Lagged 9

99% Sp interval on validation fold: [99%, 100%]
Validation fold | Sp | 99% | 99% | 99% | 99%
 | Se | 54% | 54% | 53% | 57%
 | Weighted AUC-ROC | 77% | 76% | 76% | 78%
 | Resampling method(4) | Random oversampling | Random oversampling | Random oversampling | ADASYN(5)
 | Model(4) | Balanced BHGB(6) | Balanced BHGB | Balanced BHGB | Balanced BHGB
Testing set | Sp | 98% | 98% | 98% | 98%
 | Se | 68% | 64% | 68% | 73%
 | Weighted AUC-ROC | 83% | 81% | 83% | 85%

95% Sp interval on validation fold: [95%, 99%)
Validation fold | Sp | 96% | 96% | 96% | 97%
 | Se | 75% | 76% | 79% | 79%
 | Weighted AUC-ROC | 85% | 86% | 88% | 88%
 | Resampling method | SMOTE(7) | SMOTE | SMOTE | ADASYN
 | Model | Balanced BXGB(8) | Balanced BHGB | Balanced BHGB | Balanced BLGB(9)
Testing set | Sp | 94% | 94% | 95% | 95%
 | Se | 77% | 77% | 77% | 82%
 | Weighted AUC-ROC | 86% | 86% | 86% | 88%

90% Sp interval on validation fold: [90%, 95%)
Validation fold | Sp | 90% | 93% | 93% | 93%
 | Se | 83% | 81% | 84% | 84%
 | Weighted AUC-ROC | 86% | 87% | 89% | 89%
 | Resampling method | NearMiss (version 1) | Random undersampling | Random undersampling | Random undersampling
 | Model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB
Testing set | Sp | 86% | 91% | 91% | 91%
 | Se | 91% | 86% | 82% | 86%
 | Weighted AUC-ROC | 88% | 89% | 87% | 89%

85% Sp interval on validation fold: [85%, 90%)
Validation fold | Sp | 85% | 88% | 88% | 87%
 | Se | 87% | 86% | 87% | 88%
 | Weighted AUC-ROC | 86% | 87% | 88% | 88%
 | Resampling method | NearMiss (version 1) | Random undersampling | No resampling | Random undersampling
 | Model | Balanced RF(10) | Balanced RF | Balanced RF | Balanced RF
Testing set | Sp | 78% | 85% | 84% | 83%
 | Se | 95% | 95% | 91% | 95%
 | Weighted AUC-ROC | 87% | 90% | 88% | 89%

(1) The data were split into training and testing data sets. Within the training data set, the validation fold was used to assess model performance as part of the model selection process.
(2) Sp = specificity; Se = sensitivity; AUC-ROC = area under the receiver operating characteristic curve.
(3) The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and past 3 (5, 7, 9) milkings before the current milking.
(4) Please refer to the Appendix for the detailed settings of resampling methods and models.
(5) ADASYN = the adaptive synthetic oversampling method.
(6) BHGB = bagging ensemble of histogram-based gradient boosting.
(7) SMOTE = synthetic minority oversampling technique.
(8) BXGB = bagging ensemble of extreme gradient boosting.
(9) BLGB = bagging ensemble of light gradient boosting.
(10) RF = random forest.
      In comparing model performance for detecting CM at the current milking (t) with that for predicting CM at the next milking (t + 1) by Sp interval, the results showed that on the validation folds of the different lagged data sets, Se and Sp were generally higher under the CM detection scenario than under the CM prediction scenario, whereas the results from the testing sets under these 2 scenarios were close, except for the 99% interval. For example, in the 95% Sp interval, for the lagged 9 data set, the Se on the validation fold for CM prediction was 15 percentage points lower than that for CM detection, whereas on the testing set, the Se was the same (82%) for detecting and predicting CM.

      DISCUSSION

      The ML framework developed in this study demonstrated a promising ability to detect CM at the same milking and predict CM 1 milking before occurrence at the quarter level by integrating DT-based ensemble models with resampling techniques. Moreover, the input variables used in the modeling analysis were based on AMS data (except the variable of previous occurrence of CM before the current milking by quarter). Hence, the ML framework developed in this study could contribute to the literature by offering insights into the applications of ML with AMS data to achieve accurate CM diagnosis.
The targeted Se and Sp levels that detection models should achieve for practical utilization are still under discussion. Some studies have suggested that models for detecting cows with CM should have an Se of ≥80% (Hillerton; Hogeveen et al.). The International Organization for Standardization (ISO, 2007) recommends that Se be >70% at an Sp level of 99%. However, these Se and Sp requirements have rarely been met to date (Khatun et al.; Anglart et al.; Hogeveen et al.). Enforcing an Sp of 99% using the current ML framework resulted in a maximum Se of 73%, which is close to but still below 80%. To achieve the recommended Se and Sp levels, Kamphuis et al. widened the time window of detection to, for example, 4 d before and 3 d after a CM observation. However, the utility of alerts for farmers to reduce the negative effects of CM is diminished if the alert is not generated until days after the onset of CM. A narrower window of detection that does not extend beyond the onset of CM is likely more useful from a management perspective.
      When comparing model performance between studies, it is important to consider the protocol used to define CM because it often differs by study. For example, CM definitions from recent studies included various abnormalities in milk (
      • Kamphuis C.
      • Mollenhorst H.
      • Feelders A.
      • Pietersma D.
      • Hogeveen H.
      Decision-tree induction to detect clinical mastitis with automatic milking.
      ), a focus on the presence of clots in milk (
      • Anglart D.
      • Emanuelson U.
      • Rönnegård L.
      • Hallén Sandgren C.
      Detecting and predicting changes in milk homogeneity using data from automatic milking systems.
      ), and determination through SCC thresholds (
      • Cavero D.
      • Tölle K.H.
      • Henze C.
      • Buxadé C.
      • Krieter J.
      Mastitis detection in dairy cows by application of neural networks.
      ). Accordingly, compared with previous studies in which the definition of CM was close to that in the current study, this study showed similar detection performance in terms of Se and Sp but represents an improvement in automated CM detection for 2 reasons. First, our methodology focused on incorporating AMS data to train the ML models; second, it assigned a CM classification at the time point of each milking, in contrast to applying a longer time window for CM detection.
      Khatun et al. (2018) used similar AMS data and showed a relatively high Se (90%) and Sp (91%) for CM detection. Naqvi et al. (2022b) fed time series data from AMS farms into recurrent neural networks for CM detection and reported 90% Se with an Sp of 86%. Furthermore, good detection performance can be achieved when a short time window is defined, such as the narrow window of less than 24 h before CM observation considered by Kamphuis et al. (2010c), which led to an Se of 75% when Sp was fixed at 93%. The DT-based ensemble models were also applied in a recent study by Fadul-Pacheco et al. (2021) to detect the onset of CM on a daily basis; they reported an Se of 85%, but Sp varied from 31 to 62%. Relative to these studies, the ML framework in this study effectively fit the DT-based ensemble models on data mainly from AMS to detect CM in a single milking and resulted in a higher Sp of 95% with a similar Se (82%). The relatively good detection performance found in the current study is likely because the DT-based ensemble models (e.g., the RF model) can increase overall classification accuracy by aggregating a group of base ML models (i.e., base classifiers) whose combined vote outperforms any single classifier (Yang et al., 2010; Sagi and Rokach, 2018). Additionally, the inclusion of some predictor variables, such as EC from past milkings or previous occurrences of CM, may have contributed to the improvement in CM detection, which aligns with findings in the literature (Norberg et al., 2004; Khatun et al., 2018).
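      To make the ensemble argument concrete, the following is a minimal sketch, assuming scikit-learn and purely synthetic data (not the study's inputs), of a random forest with the "balanced subsample" class weighting that the appendix tables list for the Balanced RF models:

```python
# Minimal sketch: a DT-based ensemble for a rare-positive (CM-like) class.
# The data are synthetic and illustrative only; the 100 trees and the
# "balanced subsample" weighting follow the appendix table footnotes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ~0.2% positive class, loosely mimicking the imbalance of per-milking CM labels.
X, y = make_classification(n_samples=20000, n_features=8, weights=[0.998],
                           flip_y=0, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,                   # 100 base classifiers are aggregated
    class_weight="balanced_subsample",  # reweight classes within each bootstrap
    random_state=0,
).fit(X, y)

scores = rf.predict_proba(X)[:, 1]  # thresholding these scores trades Se for Sp
```

      Aggregating the votes of the 100 trees is what stabilizes the classification relative to any single tree, which is the mechanism the cited ensemble surveys describe.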
      The short period of detection, which reduces the probability that a cow will have a new case of CM, and the low incidence of CM resulted in an observed data set that was highly imbalanced. Past studies have addressed this imbalance by randomly undersampling the data of healthy cows to balance the output classes (Ankinakatte et al., 2013; Khatun et al., 2018). In this study, both oversampling and undersampling methods were applied to the training set during the model selection phase. In addition to the model fitting method, the corresponding resampling techniques that enabled the models to generate the best performance were identified. For both CM detection at the same milking and CM prediction 1 milking prior, we found that oversampling methods enabled the DT-based ensemble models to achieve the highest Se at an Sp level of >90%. One explanation for the improved performance of oversampling methods is that the challenges of misclassifying or ignoring rare instances induced by imbalanced data (Ali et al., 2019) can be alleviated through oversampling while all information in the original data is kept (Park and Park, 2021). Interestingly, when the Sp level was lowered to <90%, combining models with undersampling methods produced the best performance. Although there is a risk of eliminating or ignoring important information when using undersampling methods, these methods have also been shown to increase detection performance. For example, near-miss undersampling can optimize the allocation of data samples by selectively removing instances from the majority class to provide a stable data distribution boundary between the 2 classes (Gunturi and Sarkar, 2021; Budianto et al., 2022).
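      The resampling step itself can be sketched as follows, assuming the imbalanced-learn package; "ratio" follows the appendix footnotes (minority-class size divided by majority-class size after resampling), although the values below are chosen to suit the synthetic data rather than to reproduce the study's settings:

```python
# Minimal sketch: oversampling vs. undersampling an imbalanced training set.
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss, RandomUnderSampler

X, y = make_classification(n_samples=20000, n_features=8, weights=[0.998],
                           flip_y=0, random_state=0)  # rare positive (CM) class

# Oversampling: synthesize minority samples; every original record is kept.
X_sm, y_sm = SMOTE(sampling_strategy=0.01, random_state=0).fit_resample(X, y)

# Undersampling: discard majority (healthy) records instead. NearMiss-1 keeps
# the majority samples closest on average to their 3 nearest minority neighbors.
X_ru, y_ru = RandomUnderSampler(sampling_strategy=0.05,
                                random_state=0).fit_resample(X, y)
X_nm, y_nm = NearMiss(version=1, n_neighbors=3,
                      sampling_strategy=0.05).fit_resample(X, y)
```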
      Time windows have been widely used in extant studies; within such a window, if the model generates at least one alarm, the instance is considered correctly detected (de Mol et al., 1997; Cavero et al., 2008; Kramer et al., 2009; Miekley et al., 2013). However, there are some drawbacks to the application of time windows, and one can argue that Se calculated over a time window does not properly reflect a model's capability to detect CM incidents. For example, the levels of Se and Sp will be higher in detection models that use wider time windows for alerts than in models with narrower time windows, but the apparent increases in Se and Sp are mainly due to the length of time in which the disease can occur rather than to improvements in model detection (Sherlock et al., 2008; Kamphuis et al., 2010c). Models that use large time windows and show high Se might not create an alert in a short enough time interval, or at the time required, for a management response (Hogeveen and Ouweltjes, 2003). Furthermore, the definition of time windows varies by study, which, to some extent, increases the difficulty of comparing detection performance among analyses (Kamphuis et al., 2010b). In the current study, the ML framework showed good performance in classifying single milkings with or without CM, which could generate CM alerts for farmers at the milking when CM occurs. Moreover, its potential for predicting CM 1 milking before occurrence would flag cow quarters that are likely to have CM at the next milking, with the purpose of warning farmers in advance and helping reduce the negative effects of CM.
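      The per-milking evaluation used here, in contrast to time-window scoring, can be expressed in a few lines; this is a toy illustration with made-up labels in which an alert counts as a true positive only at the exact milking with CM:

```python
# Minimal sketch: per-milking Se and Sp from single-milking labels (toy data).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])  # CM status of each milking
y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 0])  # model alert at each milking

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
se = tp / (tp + fn)  # flagged CM milkings / all CM milkings
sp = tn / (tn + fp)  # non-flagged healthy milkings / all healthy milkings
print(f"Se = {se:.0%}, Sp = {sp:.0%}")  # Se = 50%, Sp = 83%
```

      Under a time-window rule, the false alert at the second milking, which happens to precede the CM case at the third, could instead be credited as a detection, which illustrates how widening the window inflates apparent performance without changing the model.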
      In terms of practical applications, a high number of false alerts is undesirable because it increases the number of labor hours required for mastitis monitoring. A low level of false-positive alerts is essential for farmers (Steeneveld et al., 2010) and, under some circumstances, even more important than correctly identifying all CM-positive cases (Claycomb et al., 2009; Mollenhorst et al., 2012; Anglart et al., 2021). Our ML framework was able to decrease false-positive alerts compared with previously reported models while keeping Se at a similar or higher level. To our knowledge, no existing work has reported a method for CM prediction that can produce an alert for CM before it occurs. The study by Anglart et al. (2021) used multilayer perceptron models to predict whether single milkings contained clots, and the highest Se obtained was 25% with an Sp of 98%. The ML framework in this study exhibited a substantially higher Se (an increase of 57 percentage points, from 25 to 82%) with only a small decline (5 percentage points) in Sp when predicting per-milking CM 1 milking before CM occurrence, and thus it could be an effective mechanism on AMS farms to forecast CM.
      The results from the developed ML framework are promising for both detecting and predicting CM. There are, however, several methodologies and data features that should be explored to further improve model accuracy and thereby increase the feasibility of adopting this modeling framework on dairy farms. Future work should include expanding the ML methods evaluated to consider techniques such as the recurrent neural networks reported by Naqvi et al. (2022a,b) and developing more robust methods to deal with milking records with missing values, such as imputation methods (Emmanuel et al., 2021). To overcome the limited generalizability of the proposed methods and increase the number of CM observations in the imbalanced data, future work should expand the data set by collecting data from more AMS farms and in different seasons. Further, although quarters are often considered physiologically distinct and were treated independently, as in this study, exploring markers of inflammation and the onset of CM in all 4 quarters, rather than in individual quarters, could be an alternative approach to identify cows that are at risk for CM. As AMS sensor technology expands, opportunities exist to integrate data from other dairy management software, and the benefit of additional input variables should also be explored.
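      As one concrete baseline for the missing-value work mentioned above, the following is a minimal sketch assuming scikit-learn's SimpleImputer; the two-variable layout and values are hypothetical:

```python
# Minimal sketch: median imputation of missing readings in milking records.
import numpy as np
from sklearn.impute import SimpleImputer

# Columns (hypothetical): milk yield (kg), electrical conductivity (mS/cm).
X = np.array([[3.4, 4.7],
              [np.nan, 5.1],
              [2.9, np.nan]])

X_filled = SimpleImputer(strategy="median").fit_transform(X)
# Each NaN is replaced by its column median (3.15 and 4.90 here).
```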

      CONCLUSIONS

      The ML framework developed in this study demonstrated the possibility of using imbalanced data recorded by AMS to detect CM at the same milking and to predict CM 1 milking before occurrence. Combining the DT-based ensemble models with oversampling techniques achieved a relatively high Se (82%) and Sp (95% for CM detection and 93% for CM prediction). Se could be increased from 82 to 95% when the Sp level was allowed to decrease to 80 to 83%; this trade-off was achieved by applying the DT-based ensemble models with undersampling methods. In addition, building models with AMS data from the current milking and the past 7 to 9 milkings (approximately 3 d) is recommended to identify positive CM cases for farmers.

      ACKNOWLEDGMENTS

      This project was funded by Cornell Institute for Digital Agriculture (Ithaca, NY). The authors acknowledge and thank the AMS dairy farm, veterinarians, and students who participated and assisted with milk sample and data collection. The authors have not stated any conflicts of interest.

      APPENDIX

      Figure A1. Occurrence (y-axis) of clinical mastitis (CM) by DIM observed in the data. The data were cleaned by removing records with values that were not captured by the automatic milking system (missing values) and the 14 d of milking records after the detection of a positive CM case for a given quarter of a cow.
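      The cleaning rule described in the caption can be sketched as follows, assuming pandas and hypothetical column names (cow_id, quarter, milking_time, cm); this illustrates the rule rather than reproducing the study's code:

```python
# Minimal sketch: drop the 14 d of records that follow each positive CM case
# within the same quarter of the same cow.
import pandas as pd

def drop_post_cm_window(df: pd.DataFrame, days: int = 14) -> pd.DataFrame:
    keep = pd.Series(True, index=df.index)
    for _, grp in df.groupby(["cow_id", "quarter"]):
        for t in grp.loc[grp["cm"] == 1, "milking_time"]:
            # Records strictly after the CM milking, up to `days` later.
            window = (grp["milking_time"] > t) & (
                grp["milking_time"] <= t + pd.Timedelta(days=days))
            keep[window[window].index] = False
    return df[keep]
```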
      Table A1. Statistical summary for key input variables of the automatic milking system (AMS) data¹

      Variable | No. of observations | Mean | SD | Minimum | 1st quartile | 2nd quartile (median) | 3rd quartile | Maximum
      Milk yield (kg)
       Negative | 389,265 | 3.40 | 1.45 | 0.00 | 2.42 | 3.26 | 4.27 | 12.60
       Positive | 80 | 2.80 | 1.76 | 0.00 | 1.37 | 2.71 | 3.87 | 8.10
      Mean milk flow rate (kg/min)
       Negative | 389,265 | 1.15 | 0.37 | 0.00 | 0.90 | 1.14 | 1.38 | 4.50
       Positive | 80 | 1.10 | 0.45 | 0.00 | 0.78 | 1.08 | 1.44 | 2.16
      Peak milk flow rate (kg/min)
       Negative | 389,265 | 1.64 | 0.47 | 0.00 | 1.38 | 1.62 | 1.92 | 5.10
       Positive | 80 | 1.68 | 0.62 | 0.00 | 1.31 | 1.62 | 2.04 | 3.48
      Electrical conductivity (mS/cm)
       Negative | 389,265 | 4.73 | 0.70 | 0.00 | 4.50 | 4.74 | 5.02 | 12.06
       Positive | 80 | 5.63 | 1.39 | 0.00 | 5.15 | 5.48 | 6.01 | 8.84
      Box time (min)
       Negative | 389,265 | 7.13 | 2.06 | 0.60 | 5.70 | 6.77 | 8.13 | 12.55
       Positive | 80 | 11.00 | 4.67 | 4.32 | 8.03 | 10.32 | 51.87 | 34.92
      Milking interval (h)
       Negative | 389,265 | 10.32 | 50.40 | 0.00 | 6.19 | 7.98 | 10.26 | 2,476.14
       Positive | 80 | 11.16 | 4.52 | 0.01 | 7.98 | 10.14 | 13.44 | 24.51
      DIM (d)
       Negative | 389,265 | 169.01 | 111.50 | 0.00 | 73.00 | 161.00 | 248.00 | 556.00
       Positive | 80 | 142.00 | 98.22 | 5.00 | 71.50 | 116.50 | 204.00 | 460.00
      Milking frequency per day
       Negative | 389,265 | 2.75 | 0.84 | 1.00 | 2.00 | 3.00 | 3.00 | 5.00
       Positive | 80 | 1.58 | 0.57 | 1.00 | 1.00 | 2.00 | 2.00 | 3.00

      1 The data were cleaned by removing records with values that were not captured by the AMS (missing values) and the 14 d of milking records after the detection of a positive clinical mastitis incidence for a given quarter of a cow.
      Table A2. Binary encoding results of representing the cow ID variable by 9 new binary variables

      Cow ID¹ | Cow ID 1 | Cow ID 2 | Cow ID 3 | Cow ID 4 | Cow ID 5 | Cow ID 6 | Cow ID 7 | Cow ID 8 | Cow ID 9
      1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
      2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
      399 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
      450 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
      568 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
      674 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0
      710 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1
      771 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
      792 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
      795 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
      796 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1
      825 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
      828 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1
      859 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0
      1249 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0

      1 Because the data set has too many unique cow IDs (about 400), only some are listed in this table.
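      A minimal sketch of the encoding in Table A2, in plain Python and assuming IDs are assigned ordinal codes in order of first appearance, is:

```python
# Minimal sketch: binary-encode cow IDs into 9 binary variables (2^9 = 512
# codes, enough for the ~400 unique cows in the data set).
def binary_encode(ids, width=9):
    # Assign each unique ID an ordinal code (1, 2, 3, ...).
    ordinal = {cid: i + 1 for i, cid in enumerate(dict.fromkeys(ids))}
    # Write each code as `width` binary digits, one digit per new variable.
    return {cid: [int(b) for b in format(code, f"0{width}b")]
            for cid, code in ordinal.items()}

codes = binary_encode([1, 2, 399, 450, 568])
# codes[399] == [0, 0, 0, 0, 0, 0, 0, 1, 1], matching the table's row for cow 399
```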
      Table A3. Model selected for predicting clinical mastitis at the next milking (t + 1)

      Method | Lagged 3¹ | Lagged 5 | Lagged 7 | Lagged 9

      99% specificity (Sp) interval on validation fold: [99%, 100%]
       Selected resampling method | SMOTE² | SMOTE | Random oversampling | Random oversampling
       Resampling method setting | Ratio³ = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025
       Selected model⁴ | Balanced BHGB⁵ | Balanced BHGB | Balanced BHGB | Balanced BHGB
      95% Sp interval on validation fold: [95%, 99%)
       Selected resampling method | SMOTE | Random oversampling | Random oversampling | Random oversampling
       Resampling method setting | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005
       Selected model | Balanced BHGB | Balanced RF⁶ | Balanced RF⁶ | Balanced RF⁶
      90% Sp interval on validation fold: [90%, 95%)
       Selected resampling method | Random undersampling | Random undersampling | Random undersampling | Random undersampling
       Resampling method setting | Ratio = 0.005 | Ratio = 0.005 | Ratio = 0.0007 | Ratio = 0.001
       Selected model | Balanced BHGB | Balanced BXGB⁷ | Balanced BLGB⁸ | Balanced BHGB
      85% Sp interval on validation fold: [85%, 90%)
       Selected resampling method | One-sided selection | Random undersampling | Random undersampling | Random undersampling
       Resampling method setting | n_neighbors⁹ = 3, n_seeds_S¹⁰ = 5,000 | Ratio = 0.005 | Ratio = 0.1 | Ratio = 0.1
       Selected model | Balanced RF⁶ | Balanced RF⁶ | Balanced RF⁶ | Balanced RF¹¹

      1 The current milking event was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and the past 3 (5, 7, 9) milkings before the current milking.
      2 SMOTE = synthetic minority oversampling technique.
      3 Ratio = the number of samples in the minority class divided by the number of samples in the majority class.
      4 Refer to the Scikit-learn package (https://scikit-learn.org/stable/index.html) for other model settings, which were left at their defaults.
      5 BHGB = bagging with histogram-based gradient boosting (100 base estimators).
      6 RF = random forest (the strategy used to assign weights to the majority and minority classes is "balanced subsample"; 100 base estimators).
      7 BXGB = bagging with extreme gradient boosting (100 base estimators).
      8 BLGB = bagging with light gradient boosting (100 base estimators).
      9 n_neighbors = the size of the neighborhood considered when computing the nearest neighbors.
      10 n_seeds_S = the number of samples to extract to build the set.
      11 RF = random forest (the strategy used to assign weights to the majority and minority classes is "balanced"; 100 base estimators).
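      The "Lagged k" input structures in Tables A3 and A4 can be built with a routine like the following sketch, assuming pandas and hypothetical column names:

```python
# Minimal sketch: append each quarter's past-k milking values as new columns.
import pandas as pd

def add_lags(df: pd.DataFrame, cols=("milk_yield", "ec"), k: int = 7) -> pd.DataFrame:
    # Sort so that shift() moves within one quarter's own milking history.
    df = df.sort_values(["cow_id", "quarter", "milking_time"]).copy()
    grouped = df.groupby(["cow_id", "quarter"])
    for col in cols:
        for lag in range(1, k + 1):
            # Value of `col` at the milking `lag` events before the current one.
            df[f"{col}_lag{lag}"] = grouped[col].shift(lag)
    return df
```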
      Table A4. Model selected for detecting clinical mastitis at the current milking (t)

      Method | Lagged 3¹ | Lagged 5 | Lagged 7 | Lagged 9

      99% specificity (Sp) interval on validation fold: [99%, 100%]
       Selected resampling method | Random oversampling | Random oversampling | Random oversampling | ADASYN²
       Resampling method setting | Ratio³ = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025 | Ratio = 0.0025
       Selected model⁴ | Balanced BHGB⁵ | Balanced BHGB | Balanced BHGB | Balanced BHGB
      95% Sp interval on validation fold: [95%, 99%)
       Selected resampling method | SMOTE⁶ | SMOTE | SMOTE | ADASYN
       Resampling method setting | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005 | Ratio = 0.0005
       Selected model | Balanced BXGB⁷ | Balanced BHGB | Balanced BHGB | Balanced BLGB⁸
      90% Sp interval on validation fold: [90%, 95%)
       Selected resampling method | NearMiss (version 1)⁹ | Random undersampling | Random undersampling | Random undersampling
       Resampling method setting | Ratio = 0.0007 | Ratio = 0.0007 | Ratio = 0.01 | Ratio = 0.05
       Selected model | Balanced BHGB | Balanced BHGB | Balanced BHGB | Balanced BHGB
      85% Sp interval on validation fold: [85%, 90%)
       Selected resampling method | NearMiss (version 1) | Random undersampling | No resampling | Random undersampling
       Resampling method setting | Ratio = 0.0007 | Ratio = 0.01 | Not applicable | Ratio = 0.001
       Selected model | Balanced RF¹⁰ | Balanced RF¹⁰ | Balanced RF¹¹ | Balanced RF¹⁰

      1 The current milking was the milking event at time t. Lagged 3 (5, 7, 9) denotes data from the current milking and the past 3 (5, 7, 9) milkings before the current milking.
      2 ADASYN = adaptive synthetic oversampling method.
      3 Ratio = the number of samples in the minority class divided by the number of samples in the majority class.
      4 Refer to the Scikit-learn package (https://scikit-learn.org/stable/index.html) for other model settings, which were left at their defaults.
      5 BHGB = bagging ensemble of histogram-based gradient boosting (100 base estimators).
      6 SMOTE = synthetic minority oversampling technique.
      7 BXGB = bagging ensemble of extreme gradient boosting (100 base estimators).
      8 BLGB = bagging ensemble of light gradient boosting (100 base estimators).
      9 NearMiss (version 1) is an undersampling method that selects examples from the majority class that have the smallest average distance to the 3 closest examples from the minority class.
      10 RF = random forest (the strategy used to assign weights to the majority and minority classes is "balanced subsample"; 100 base estimators).
      11 RF = random forest (the strategy used to assign weights to the majority and minority classes is "none"; 100 base estimators).
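      One plausible construction of the "Balanced BHGB" entries, assuming a recent imbalanced-learn release (where the base learner is passed as `estimator`) together with scikit-learn, is sketched below; everything other than the 100 base estimators is illustrative:

```python
# Minimal sketch: bagging ensemble of histogram-based gradient boosting, with
# each bootstrap sample rebalanced by undersampling the majority class.
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=5000, weights=[0.99], flip_y=0,
                           random_state=0)  # synthetic imbalanced data

bhgb = BalancedBaggingClassifier(
    estimator=HistGradientBoostingClassifier(),  # histogram-based GB base learner
    n_estimators=100,  # 100 base estimators, per the table footnotes
    random_state=0,
).fit(X, y)

scores = bhgb.predict_proba(X)[:, 1]  # threshold to land in a target Sp interval
```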

      REFERENCES

        Agusta, Z. P., and A. Adiwijaya. 2019. Modified balanced random forest for improving imbalanced data prediction. Int. J. Adv. Intell. Informatics 5:58–65.
        Ali, H., M. N. M. Salleh, R. Saedudin, K. Hussain, and M. F. Mushtaq. 2019. Imbalance class problems in data mining: A review. Indones. J. Electr. Eng. Comput. Sci. 14:1560–1571.
        Ali, L., S. U. Khan, N. A. Golilarz, I. Yakubu, I. Qasim, A. Noor, and R. Nour. 2019. A feature-driven decision support system for heart failure prediction based on statistical model and Gaussian naive Bayes. Comput. Math. Methods Med. 2019:6314328.
        Anglart, D., U. Emanuelson, L. Rönnegård, and C. Hallén Sandgren. 2021. Detecting and predicting changes in milk homogeneity using data from automatic milking systems. J. Dairy Sci. 104:11009–11017.
        Ankinakatte, S., E. Norberg, P. Løvendahl, D. Edwards, and S. Højsgaard. 2013. Predicting mastitis in dairy cows using neural networks and generalized additive models: A comparison. Comput. Electron. Agric. 99:1–6.
        Bach, M., A. Werner, and M. Palt. 2019. The proposal of undersampling method for learning from imbalanced datasets. Procedia Comput. Sci. 159:125–134.
        Bach, M., A. Werner, J. Żywiec, and W. Pluskiewicz. 2017. The study of under- and over-sampling methods' utility in analysis of highly imbalanced data on osteoporosis. Inf. Sci. 384:174–190.
        Bar, D., Y. T. Gröhn, G. Bennett, R. N. González, J. A. Hertl, H. F. Schulte, L. W. Tauer, F. L. Welcome, and Y. H. Schukken. 2007. Effect of repeated episodes of generic clinical mastitis on milk yield in dairy cows. J. Dairy Sci. 90:4643–4653.
        Bar, D., L. W. Tauer, G. Bennett, R. N. González, J. A. Hertl, Y. H. Schukken, H. F. Schulte, F. L. Welcome, and Y. T. Gröhn. 2008. The cost of generic clinical mastitis in dairy cows as estimated by using dynamic programming. J. Dairy Sci. 91:2205–2214.
        Barkema, H. W., Y. H. Schukken, T. J. G. M. Lam, M. L. Beiboer, H. Wilmink, G. Benedictus, and A. Brand. 1998. Incidence of clinical mastitis in dairy herds grouped in three categories by bulk milk somatic cell counts. J. Dairy Sci. 81:411–419.
        Berrar, D. 2019. Cross-validation. Encycl. Bioinform. Comput. Biol. 1:542–545.
        Blaszczyński, J., and J. Stefanowski. 2017. Actively balanced bagging for imbalanced data. Pages 271–281 in Proc. Int. Symp. Methodologies for Intelligent Systems. Springer.
        Breiman, L. 1996. Bagging predictors. Mach. Learn. 24:123–140.
        Breiman, L. 2001. Random forests. Mach. Learn. 45:5–32.
        Budianto, I. R., R. K. Azaria, and A. A. S. Gunawan. 2022. Machine learning-based approach on dealing with binary classification problem in imbalanced financial data. Pages 152–156 in 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE). IEEE.
        Cateni, S., V. Colla, and M. Vannucci. 2014. A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 135:32–41.
        Cavero, D., K. H. Tölle, C. Henze, C. Buxadé, and J. Krieter. 2008. Mastitis detection in dairy cows by application of neural networks. Livest. Sci. 114:280–286.
        Cha, E., J. A. Hertl, Y. H. Schukken, L. W. Tauer, F. L. Welcome, and Y. T. Gröhn. 2013. The effect of repeated episodes of bacteria-specific clinical mastitis on mortality and culling in Holstein dairy cows. J. Dairy Sci. 96:4993–5007.
        Chen, C., A. Liaw, and L. Breiman. 2004. Using random forest to learn imbalanced data. Tech. Rep. 666. Department of Statistics, University of California.
        Chen, X., L. Huang, D. Xie, and Q. Zhao. 2018. EGBMMDA: Extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Dis. 9:3.
        Claycomb, R. W., P. T. Johnstone, G. A. Mein, and R. A. Sherlock. 2009. An automated in-line clinical mastitis detection system using measurement of conductivity from foremilk of individual udder quarters. N. Z. Vet. J. 57:208–214.
        de Mol, R. M., G. H. Kroeze, J. M. F. H. Achten, K. Maatje, and W. Rossing. 1997. Results of a multivariate approach to automated oestrus and mastitis detection. Livest. Prod. Sci. 48:219–227.
        Deo, R. C. 2015. Machine learning in medicine. Circulation 132:1920–1930.
        Dev, V. A., and M. R. Eden. 2019. Gradient boosted decision trees for lithology classification. Computer-Aided Chem. Eng. 47:113–118.
        Dhoble, A. S., K. T. Ryan, P. Lahiri, M. Chen, X. Pang, F. C. Cardoso, and K. D. Bhalerao. 2019. Cytometric fingerprinting and machine learning (CFML): A novel label-free, objective method for routine mastitis screening. Comput. Electron. Agric. 162:505–513.
        Doupe, P., J. Faghmous, and S. Basu. 2019. Machine learning for health services researchers. Value Health 22:808–815.
        Emmanuel, T., T. Maupong, D. Mpoeleng, T. Semong, B. Mphago, and O. Tabona. 2021. A survey on missing data in machine learning. J. Big Data 8:140.
        Erskine, R. J., S. Wagner, and F. J. DeGraves. 2003. Mastitis therapy and pharmacology. Vet. Clin. North Am. Food Anim. Pract. 19:109–138.
        Fadul-Pacheco, L., H. Delgado, and V. E. Cabrera. 2021. Exploring machine learning algorithms for early prediction of clinical mastitis. Int. Dairy J. 119:105051.
        Fatima, M., and M. Pasha. 2017. Survey of machine learning algorithms for disease diagnostic. J. Intell. Learn. Syst. Appl. 9:1–16.
        Green, M. J., L. E. Green, G. F. Medley, Y. H. Schukken, and A. J. Bradley. 2002. Influence of dry period bacterial intramammary infection on clinical mastitis in dairy cows. J. Dairy Sci. 85:2589–2599.
        Gunturi, S. K., and D. Sarkar. 2021. Ensemble machine learning models for the detection of energy theft. Electr. Power Syst. Res. 192:106904.
        Hackeling, G. 2017. Mastering Machine Learning with Scikit-Learn. Packt Publishing Ltd.
        Hamilton, J. D. 2020. Autoregressive processes. Pages 53–59 in Time Series Analysis. R. Mickey, ed. Princeton University Press.
        Hertl, J. A., Y. H. Schukken, F. L. Welcome, L. W. Tauer, and Y. T. Gröhn. 2014. Pathogen-specific effects on milk yield in repeated clinical mastitis episodes in Holstein dairy cows. J. Dairy Sci. 97:1465–1480.
        Hillerton, J. E. 2000. Detecting mastitis cow-side. Pages 48–53 in National Mastitis Council 39th Annual Meeting, Atlanta, GA. Natl. Mastitis Counc.
        Hogeveen, H., C. Kamphuis, W. Steeneveld, and H. Mollenhorst. 2010. Sensors and clinical mastitis—The quest for the perfect alert. Sensors (Basel) 10:7991–8009.
        Hogeveen, H., I. C. Klaas, G. Dalen, H. Honig, A. Zecconi, D. F. Kelton, and M. S. Mainar. 2021. Novel ways to use sensor data to improve mastitis management. J. Dairy Sci. 104:11317–11332.
        Hogeveen, H., and W. Ouweltjes. 2003. Sensors and management support in high-technology milking. J. Anim. Sci. 81:1–10.
        Hogg, R. V., E. A. Tanis, and D. L. Zimmerman. 2015. Distributions of functions of random variables. Pages 163–217 in Probability and Statistical Inference. 9th ed. Pearson Education.
        Hossain, S. M. M., and K. Deb. 2021. Plant leaf disease recognition using histogram based gradient boosting classifier. Pages 530–545 in International Conference on Intelligent Computing & Optimization 2020. Springer.
        Hyde, R. M., P. M. Down, A. J. Bradley, J. E. Breen, C. Hudson, K. A. Leach, and M. J. Green. 2020. Automated prediction of mastitis infection patterns in dairy herds using machine learning. Sci. Rep. 10:4289.
        ISO. 2007. ISO 20966:2007: Automatic milking systems—Requirements and testing. Annex C: Example of methods of evaluating detection systems for milk deemed as abnormal due to blood or to changes in homogeneity. International Organization for Standardization (ISO).
        Jackson, E., and R. Agrawal. 2019. Performance evaluation of different feature encoding schemes on cybersecurity logs. Pages 1–9 in 2019 SoutheastCon. IEEE.
        Jamali, H., H. W. Barkema, M. Jacques, E. M. Lavallée-Bourget, F. Malouin, V. Saini, H. Stryhn, and S. Dufour. 2018. Invited review: Incidence, risk factors, and effects of clinical mastitis recurrence in dairy cows. J. Dairy Sci. 101:4729–4746.
        Kamphuis, C., H. Mollenhorst, A. Feelders, D. Pietersma, and H. Hogeveen. 2010a. Decision-tree induction to detect clinical mastitis with automatic milking. Comput. Electron. Agric. 70:60–68.
        Kamphuis, C., H. Mollenhorst, J. A. P. Heesterbeek, and H. Hogeveen. 2010b. Data mining to detect clinical mastitis with automatic milking. Pages 568–573 in Proc. 5th IDF Mastitis Conf.: Mastitis Research into Practice, Christchurch, New Zealand. New Zealand Veterinary Association for Continuing Education Inc.
        Kamphuis, C., H. Mollenhorst, J. A. P. Heesterbeek, and H. Hogeveen. 2010c. Detection of clinical mastitis with sensor data from automatic milking systems is improved by using decision-tree induction. J. Dairy Sci. 93:3616–3627.
        Ke, G., Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T. Y. Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. Pages 3146–3154 in Proc. 31st Int. Conf. Neural Inf. Process. Syst. Curran Associates Inc.
        Khanna, D., R. Sahu, V. Baths, and B. Deshpande. 2015. Comparative study of classification techniques (SVM, logistic regression and neural networks) to predict the prevalence of heart disease. Int. J. Mach. Learn. Comput. 5:414–419.
        Khatun, M., P. C. Thomson, K. L. Kerrisk, N. A. Lyons, C. E. F. Clark, J. Molfino, and S. C. García. 2018. Development of a new clinical mastitis detection method for automatic milking systems. J. Dairy Sci. 101:9385–9395.
        Kobyliński, L., and A. Przepiórkowski. 2008. Definition extraction with balanced random forests. Pages 237–247 in Int. Conf. Natural Language Processing. Springer.
        Kramer, E., D. Cavero, E. Stamer, and J. Krieter. 2009. Mastitis and lameness detection in dairy cows by application of fuzzy logic. Livest. Sci. 125:92–96.
        Leslie, K. E., and C. S. Petersson-Wolfe. 2012. Assessment and management of pain in dairy cows with clinical mastitis. Vet. Clin. North Am. Food Anim. Pract. 28:289–305.
        Lund, T., F. Miglior, J. C. M. Dekkers, and E. B. Burnside. 1994. Genetic relationships between clinical mastitis, somatic cell count, and udder conformation in Danish Holsteins. Livest. Prod. Sci. 39:243–251.
        Machin, D., M. J. Campbell, S. B. Tan, and S. H. Tan. 2018. Sample Sizes for Clinical, Laboratory and Epidemiology Studies. 4th ed. John Wiley & Sons.
        Metsis, V., I. Androutsopoulos, and G. Paliouras. 2006. Spam filtering with naive Bayes – which naive Bayes? Pages 28–69 in Third Conf. Email and Anti-Spam. CEAS.
        Michie, C., I. Andonovic, C. Davison, A. Hamilton, C. Tachtatzis, N. Jonsson, C. A. Duthie, J. Bowen, and M. Gilroy. 2020. The Internet of Things enhancing animal welfare and farm operational efficiency. J. Dairy Res. 87:20–27.
        Miekley, B., I. Traulsen, and J. Krieter. 2012. Detection of mastitis and lameness in dairy cows using wavelet analysis. Livest. Sci. 148:227–236.
        Miekley, B., I. Traulsen, and J. Krieter. 2013. Mastitis detection in dairy cows: The application of support vector machines. J. Agric. Sci. 151:889–897.
        Milner, P., K. L. Page, and J. E. Hillerton. 1997. The effects of early antibiotic treatment following diagnosis of mastitis detected by a change in the electrical conductivity of milk. J. Dairy Sci. 80:859–863.
        Milner, P., K. L. Page, A. W. Walton, and J. E. Hillerton. 1996. Detection of clinical mastitis by changes in electrical conductivity of foremilk before visible changes in milk. J. Dairy Sci. 79:83–86.
        Moeyersoms, J., and D. Martens. 2015. Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector. Decis. Support Syst. 72:72–81.
        Mollenhorst, H., L. J. Rijkaart, and H. Hogeveen. 2012. Mastitis alert preferences of farmers milking with automatic milking systems. J. Dairy Sci. 95:2523–2530.
        Mostert, P. F., E. A. M. Bokkers, I. J. M. de Boer, and C. E. van Middelaar. 2019. Estimating the impact of clinical mastitis in dairy cows on greenhouse gas emissions using a dynamic stochastic simulation model: A case study. Animal 13:2913–2921.
        Naqvi, S. A., M. T. M. King, T. J. DeVries, H. W. Barkema, and R. Deardon. 2022a. Data considerations for developing deep learning models for dairy applications: A simulation study on mastitis detection. Comput. Electron. Agric. 196:106895.
        Naqvi, S. A., M. T. M. King, R. D. Matson, T. J. DeVries, R. Deardon, and H. W. Barkema. 2022b. Mastitis detection with recurrent neural networks in farms using automated milking systems. Comput. Electron. Agric. 192:106618.
        Norberg, E., H. Hogeveen, I. R. Korsgaard, N. C. Friggens, K. H. M. N. Sloth, and P. Løvendahl. 2004. Electrical conductivity of milk: Ability to predict mastitis status. J. Dairy Sci. 87:1099–1107.
        Olson, D. L., and D. Delen. 2008. Support vector machines. Pages 111–122 in Advanced Data Mining Techniques. Springer Science & Business Media.
        Park, S., and H. Park. 2021. Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic. Computing 103:401–424.
        Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12:2825–2830.
        Petersson-Wolfe, C. S., K. E. Leslie, and T. H. Swartz. 2018. An update on the effect of clinical mastitis on the welfare of dairy cows and potential therapies. Vet. Clin. North Am. Food Anim. Pract. 34:525–535.
        Rasmussen, M. D. 2005. Visual scoring of clots in foremilk. J. Dairy Res. 72:406–414.
        Rennie, J. D., L. Shih, J. Teevan, and D. R. Karger. 2003. Tackling the poor assumptions of naive Bayes text classifiers. Pages 616–623 in Proc. 20th Int. Conf. Machine Learning (ICML-03). ICML.
        Rojarath, A., W. Songpan, and C. Pong-inwong. 2016. Improved ensemble learning for classification techniques based on majority voting. Pages 107–110 in 7th IEEE Int. Conf. Software Engineering and Service Science (ICSESS). IEEE.
        Rollin, E., K. C. Dhuyvetter, and M. W. Overton. 2015. The cost of clinical mastitis in the first 30 days of lactation: An economic modeling tool. Prev. Vet. Med. 122:257–264.
        Rutten, C. J., A. G. J. Velthuis, W. Steeneveld, and H. Hogeveen. 2013. Invited review: Sensors to support health management on dairy farms. J. Dairy Sci. 96:1928–1952.
        Sagi, O., and L. Rokach. 2018. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8:e1249.
        Schapire, R. E. 2013. Explaining AdaBoost. Pages 37–52 in Empirical Inference. Springer.
        Schölkopf, B., and A. J. Smola. 2018. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning Series. MIT Press.
        Seger, C. 2018. An investigation of categorical variable encoding techniques in machine learning: Binary versus one-hot and feature hashing. MS Thesis. School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
        Sepúlveda-Varas, P., K. L. Proudfoot, D. M. Weary, and M. A. G. von Keyserlingk. 2016. Changes in behaviour of dairy cows with clinical mastitis. Appl. Anim. Behav. Sci. 175:8–13.
        Shekar, B. H., and G. Dagnew. 2019. Grid search-based hyperparameter tuning and classification of microarray cancer data. Pages 1–8 in 2019 Second International Conf. on Advanced Comput. and Commun. Paradigms (ICACCP). IEEE.
        Sherlock, R., H. Hogeveen, G. Mein, and M. D. Rasmussen. 2008. Performance evaluation of systems for automated monitoring of udder health: Analytical issues and guidelines. Pages 275–282 in Mastitis Control—From Science to Practice. T. J. G. M. Lam, ed. Wageningen Academic Publishers.
        Shin, S., P. C. Austin, H. J. Ross, H. Abdel-Qadir, C. Freitas, G. Tomlinson, D. Chicco, M. Mahendiran, P. R. Lawler, F. Billia, A. Gramolini, S. Epelman, B. Wang, and D. S. Lee. 2021. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Heart Fail. 8:106–115.
        Steeneveld, W., L. C. van der Gaag, W. Ouweltjes, H. Mollenhorst, and H. Hogeveen. 2010. Discriminating between true-positive and false-positive clinical mastitis alerts from automatic milking systems. J. Dairy Sci. 93:2559–2568.
        Sun, Z., S. Samarasinghe, and J. Jago. 2010. Detection of mastitis and its stage of progression by automatic milking systems using artificial neural networks. J. Dairy Res. 77:168–175.
        USDA. 2014. Dairy 2014, Milk Quality, Milking Procedures, and Mastitis in the United States, 2014.
        VanRossum, G., and F. L. Drake. 2010. The Python Language Reference. Python Software Foundation.
        Vembandasamy, K., R. Sasipriya, and E. Deepa. 2015. Heart diseases detection using naive Bayes algorithm. Int. J. Innov. Sci. Eng. Technol. 2:441–444.
        Xiao, Z., Y. Wang, K. Fu, and F. Wu. 2017. Identifying different transportation modes from trajectory data using tree-based ensemble classifiers. ISPRS Int. J. Geoinf. 6:57.
        Yadav, S., and S. Shukla. 2016. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. Pages 78–83 in 2016 IEEE 6th Int. Conf. Adv. Comput. (IACC). IEEE.
        Yang, P., Y. H. Yang, B. B. Zhou, and A. Y. Zomaya. 2010. A review of ensemble methods in bioinformatics. Curr. Bioinform. 5:296–308.
        Yu, W., T. Liu, R. Valdez, M. Gwinn, and M. J. Khoury. 2010. Application of support vector machine modeling for prediction of common diseases: The case of diabetes and pre-diabetes. BMC Med. Inform. Decis. Mak. 10:16.
        Zareapoor, M., and P. Shamsolmoali. 2015. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Comput. Sci. 48:679–685.
        Zeng, X., and T. R. Martinez. 2000. Distribution-balanced stratified cross-validation for accuracy estimation. J. Exp. Theor. Artif. Intell. 12:1–12.
        Zhang, J., and I. Mani. 2003. kNN approach to unbalanced data distributions: A case study involving information extraction. Pages 1–7 in Proceedings of Workshop on Learning from Imbalanced Datasets II. ICML.
        Zhang, Y., X. Li, L. Gao, L. Wang, and L. Wen. 2018. Imbalanced data fault diagnosis of rotating machinery using synthetic oversampling and feature learning. J. Manuf. Syst. 48:34–50.
        Zheng, A., and A. Casari. 2018. Categorical variables: Counting eggs in the age of robotic chickens. Pages 77–97 in Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O'Reilly Media Inc.