Data driven prediction of dairy cattle lifetime production and its use as a guideline to select surplus youngstock

The lifetime production of dairy cows is a complex trait influenced not only by genetics, but also by the environment in which a cow lives and the management practices of the farmer. Moreover, these influential factors show complex interactions with each other, making it difficult to reliably predict the lifetime production of individual animals at birth. However, since well managed dairy farms often have a surplus of youngstock, reliable lifetime production predictions would offer the opportunity to make more substantiated decisions when selecting calves or heifers to sell. Therefore, using data from Dutch herds, we constructed a data set capturing information on genetics, environment and management practices to develop multiple machine learning models capable of predicting the lifetime production of dairy cattle soon after birth. We found that a coupling of trends observed at the country level with farm-specific models largely outperforms off-the-shelf approaches. At birth, our best model could explain up to 47% of the variance in lifetime production, a considerable improvement in comparison with linear regression on the breeding values supplemented with the average lifetime production at farm level, which could only explain 21.7% of the variance in lifetime production. Moreover, we demonstrated surplus youngstock selection according to our model could more than double the surplus animal selection effect in comparison with the benchmark methodology, offering opportunities to increase the average (future) potential lifetime production of the retained heifers significantly. Assuming a static 20% surplus liveborn heifer scenario and random surplus animal selection as the default, our best model for surplus animal selection resulted in a 9.4% greater lifetime production in the retained animals compared with the current Dutch average lifetime production.


INTRODUCTION
The lifetime production of dairy cows is one of the key performance indicators (KPI) used to evaluate management quality within the dairy sector (Adamczyk et al., 2017).Similar to other KPIs in agriculture, a large variability is observed for this KPI, both at the inter-farm-level and at the inter-animal level, e.g., phenotypic observations for lifetime production range from values lower than 1000 kg fat and protein corrected milk (FPCM) to more than 100,000 kg FPCM.In this prospect, reliable models that take into account both genomic and (future) environmental conditions to predict the expected lifetime production of dairy cows at young age, could offer huge opportunities to improve the sustainability of the dairy sector.This could be realized by exploiting these predictive models to select surplus heifer calves, retaining only those animals which are expected to realize a reasonably high lifetime production.In the past, only a limited number of studies explored the potential of using genomic breeding values to select surplus youngstock (Weigel et al., 2012), but none of them was validated with real data.Moreover, while various environmental influences on the lifetime production of dairy cows were identified in the past (Alvåsen et al., 2018), the expected environmental conditions were never taken into account in these models.In this paper, we present a data-driven approach to predict the lifetime production of dairy cattle using both genomic and environmental data.Moreover, we explore the added value of lifetime production prediction regarding the selective removal of inferior surplus youngstock, with a goal of improving potential lifetime production of Dutch dairy cattle at farm-level.
The lifetime production of dairy cattle is largely influenced by the animal's production level and its longevity.For both of these traits, extensive research is available, disentangling the genetic and non-genetic component of observed phenotypes.With regard to the milk production level, 305d-milk production is the most commonly used trait.One of the important factors influencing milk production is the parity, since dairy cows are not yet fully mature when they calve for the first time at approximately 2 years of age (Hansen et al., 2006;Lee and Kim, 2006).Hence, during genetic evaluations, milk production levels of different lactations are mostly considered as being different traits.Moreover, the milk production level is also influenced by the milking regimen (Campos et al., 1994;Klei et al., 1997;Smith et al., 2002), age at calving (Berry and Cromie, 2009;Nor et al., 2013;Pirlo et al., 2000) and pregnancy state (Loker et al., 2009;Olori et al., 1997).When breeding values are calculated for 305d-milk production, the former factors are incorporated into a statistical model as fixed effects and supplemented with a herd-year-season effect (CRV, 2023).Breeding values for milk production resulting from these models have reasonable heritabilities, which are slightly higher during first lactation (h 2 = 0.43-0.48),compared with later lactations (h 2 = 0.29-0.41)(CRV, 2023;NAV, 2013).Longevity, the other important trait with regard to the lifetime production, has a much lower heritability (h 2 = 0.06-0.14)(CRV, 2020a;GenEval, 2021;Pedersen et al., 2010;Pritchard et al., 2013;vit, 2023) indicating a large influence of environmental factors, some of which are difficult to consider during genetic evaluations.Part of the nongenetic variance in longevity finds its origin in differences in management practices between farms, but also the age (and mindset) of farmers and geographical region in which the farm is located have been found to be influential (Alvåsen et al., 2018).Still, the vast majority of the variance in longevity of dairy cows remains unexplained.While intuitively appealing, multiplying the breeding values of 305d-milk production and longevity does not allow farmers to evaluate the genetic potential of an animal to achieve a high lifetime production, because breeding values are mostly centralized, i.e., the population mean is equal to zero (Henderson, 1975).Hence, multiplying the breeding value for milk production with the one for longevity does not make any sense from a mathematical point of view.However, a direct breeding value for lifetime production based on actual lifetime production data is not yet available, hence there are also no heritability estimates.As an alternative, some breeding organizations publish an indirect breeding value for lifetime production, based on the breeding values for production (milk, fat and protein), maturity rate, longevity and calving interval (CRV, 2020b;De Jong, 2014).
Part of the unexplained variance for longevity can be attributed to unforeseeable events and coincidence, making it difficult to predict the lifetime production of dairy cattle.However, complex interactions between the animal's genome and the environment should not be neglected when studying dairy cow (lifetime) milk production (Gerber et al., 2008;Windig et al., 2006).In contrast to breeding value estimation, where models are constructed to distinguish the genetic effect from environmental effects, we aim to generate predictions of the lifetime production that are of direct use for farmers to select surplus youngstock at their own farm.Hence, we integrate environmental effects, as well as their interactions with the breeding values within our models.Therefore, we applied a profound data preprocessing to reveal relevant farm-related environmental influences and genotype by environment interactions.The resulting data set is used to develop several machine learning models, able to handle the high-dimensional data and consider complex interactions between variables within the data set.
Since only a limited number of studies have examined predictive models with regard to the lifetime production of dairy cattle (Kelleher et al., 2015), the possibilities to benchmark our models with previous research are limited.However, our study is, to the best of our knowledge, the first that allows to examine how much of the variability in lifetime production can be explained by data-driven models based on data available soon after birth.Moreover, we quantify the effect of using these predictive models during selection of surplus youngstock on the average realized lifetime production of the retained animals.Therefore, we evaluate the obtained machine learning models not only by means of their predictive performance, but also based on a simulation study during which their practical applicability to select surplus youngstock is examined.

DATA
The data used in this study was provided by CRV and was collected in the context of breeding value estimations.Data was extracted from the database of CRV in January 2020.The provided raw data set contained pedigree information, genomic breeding values (BV), birth-and culling dates, DHI records (DHI = dairy herd improvement), 305d productions, fertility events and calving data for various numbers of animals, born between 1912 and 2019 (Table 1).All animals in the data set for which genomic breeding values were available, were born in Dutch herds and had at least 75% Holstein Friesian genes.Genomic breeding values were available for 74 distinct traits related to both (re)production and conformation.All milk recording data was collected in the period 1990 -2019 and contained 10 variables per record: parity, calving date, days in milk (DIM), kg milk, %fat, %protein, somatic cell count (SCC), urea concentration, number of milkings per day (2, 3 or 9 = robot) and state of the cow (normal, triple teat, in heat, mastitis, sampling impossible, ill, just calved, calved too early).

DATA SELECTION AND PREPROCESSING
DHI records/305d productions The DHI records were filtered as follows: first, all records at which a cow was assigned a state other than normal or triple teat were discarded.Second, lactations where the first DHI record was collected after 75 DIM were removed.Third, only DHI records of animals who spend their whole life on a single farm were retained in the data set.Based on the measured milk production, the percentages fat and protein were converted to daily productions expressed in kg fat and kg protein.Finally, 7 variables were retained for each DHI record: parity, DIM, kg milk (kgM), kg fat (kgF), kg protein (kgP), SCC and number of milkings per day.
To be able to determine the realized milk production at various moments in the life of a cow, the mixed log model in Eq. ( 1) proposed by Guo and Swalve (1995) was fitted to the production records (expressed in kgM, kgF and kgP) using least squares regression for each lactation and the resulting parameters were stored (data set A).In Eq. ( 1), time t is expressed as DIM, Y(t) is the production record at time t, and β 0 , β 1 , and β 2 are model coefficients estimated using least squares regression.
When the 305d-productions provided by CRV (kgM, kgF and kgP) were compared with those obtained by integration of the mixed log model, slight deviations were observed (R 2 = 0.98, 0.97 and 0.97 for kgM, kgF and kgP respectively).Therefore, the 305d productions provided by CRV were retained for use later on (data set B).

Epigenetic effects
Two epigenetic effects were considered: the weight at birth and the parity of the mother.Parities exceeding 3 were grouped, resulting in 4 parity groups.Animals for which the parity of the mother was unknown, were assigned to a fifth parity group.Animals for which the birth weight was unknown, were assigned the average birth weight of the female calves with known weight within the data set (40.34 kg) (data set C).

Farm parameters
Variables to characterize the farm environment were derived at 3 levels: i) production, ii) reproduction and iii) replacement management.The production-related farm-level variables concern the SCC, the 305d production, expressed in kgM, kgF, kgP, %F and %P.For these parameters, 5 descriptive statistics were computed: mean, standard deviation, 25% percentile, median and 75% percentile.Additionally, the milking regimen was quantified by 3 variables indicating the fraction of DHI recordings according to a specific milking regimen (twice daily, trice daily or robot).The farm specific production-related variables were recalculated for each first day of the month, based on DHI data of the previous 2 years.Data of the previous 2 years was used to mitigate random yearly fluctuations in the production parameters.All statistics were determined on the herd level, as well as on the parity level.For the parity level, the animals were subdivided in 3 groups according to their parity (1, 2, ≥ 3).If one of the parity groups contained less than 10 animals, the record for that month was discarded (data set D).
Similarly, reproduction-related parameters were calculated on the farm-level for each first day of the month based on all finished lactations of the past 2 years.Again, the animals were grouped into 3 parity groups (0, 1 and ≥ 2).If applicable for a specific parity group, descriptive statistics were calculated for the age at first insemination, interval calving-first insemination, number of inseminations per pregnancy and calving interval.As for the production-related farm-level variables, the following descriptive statistics were computed: mean, standard deviation, 25%-percentile, median and 75% percentile.If for one of the parity groups less than 10 animals were available, the record for that month was discarded.Additionally, 4 variables were added to indicate the usage of natural mating for heifers/cows (factor variables) and the fraction of natural matings on the total number of inseminations/matings for heifers/cows (data set E).
In contrast to the production and reproduction-related parameters, the farm specific parameters describing the replacement management were computed only once for each farm within the data set, based on the culled animals within the 5 year period between 2014 and 2018.Both the replacement rate and the parity-distribution of the culled animals were computed.The replacement rate (RR) was computed using Eqn.(2), in which c i indicates the number of animals culled during year i and n i indicates the number of animals present at the first of January of year i.When computing the parity distributions, animals which had a parity higher than 5 at culling were pooled, resulting in 6 parity groups.Farms which culled less than 50 animals in the period between 2014 and 2018 were not retained (data set F).

Key performance indicators
Based on the preprocessed data, 16 key performance indicators were calculated for each animal.For the first parity, the 305d-production was calculated, expressed in kgM, kgF, kgP, kg F+P, %F, %P, kg fat corrected milk (FCM) using Eqn.(3) and kg FPCM using Eqn.(4) (CVB, 2016).To calculate these performance indicators, the 305dproductions were used (data set B). Furthermore, the calving interval (CI) between the first and second calving was determined.In addition to the variables related to the first parity, the production, expressed in kg FPCM at 3, 4, 5, 6, 7 and 8 years of age was computed by integrating the previously determined lactation curves described by the parameters in data set A. If an animal did not reach the age under consideration, the corresponding KPI was omitted.Similarly, the lifetime production, expressed in kg FPCM was calculated, again based on the previously determined lactation curves.To avoid an under-representation of long living animals within the series of calculated lifetime productions, the lifetime production was truncated at 9 years of age and only animals which were born before 2011 and thus could reach the age of 9 years at 31 December 2019 were considered during the calculation of lifetime productions.For animals which became older than 9 years (<2% of all animals) this means the milk produc-tion they realized after reaching the age of 9, was not accounted for during the calculation of their lifetime production.The mean and standard deviation of the 16 computed KPIs are listed in Table 2 (data set G).

Interaction breeding value × farm
To model the interaction between the farm and the breeding values, a 2-phased approach was used.First, after standardizing (mean centering and rescaling to a standard deviation of one) all breeding values and KPIs (data set G) at the national level, for each combination of a } the simple linear regression model given by Eq. ( 5) was fitted to predict the key performance indicator y ijkl based on the breeding value 5), a ijk , b ijk , and ∈ ijkl are the intercept, slope and residual of the model, respectively.
For each KPI, this resulted in 74 slopes b ijk per farm, quantifying the interaction between breeding values and the farm environment.During the second step, for each KPI, principal component analysis was applied on these 74 regression coefficient variables to reduce the number of variables to 10 per KPI, representing between 62% (kg FPCM at 8 years of age) and 79% (305d kg P first parity) of total variance in the original 74 variables per KPI.This resulted in a total of 160 variables quantifying the possible interaction between farms and genetics/breeding values (data set H).

Data set assembly
The previous sections describe how the raw data records (Table 1) were preprocessed and transformed into a set of data-tables (A -H).To fit machine learning models and evaluate their predictive performance with regard to lifetime production prediction at birth, an overall data set was assembled by joining all data available at birth: breeding values, month of birth, milking regimens during life, epigenetic effects (data set C), farm specific variables (data sets D -F) and breeding value × farm interaction effects (data set H).The information on the milking regimens was transformed into a discrete distribution describing the number of DHI-measurements that would be carried out under a twice, trice or robot milking regimen, this to avoid leakage of data concerning the real culling age into the data set.After all, the culling age is unknown at birth of the animals, and therefore, any indication of the culling age in the data set has to be removed.Concerning the farm specific variables (data sets D -F), the most recent record before the birthdate of the animal was added to the data set.Animals for which the most recent farm specific variables were calculated more than a month before the birthdate, were not retained in the data set.The month of birth was added to allow the models to consider possible seasonal effects.Animals for which the record contained missing data were not retained.After adding all this information to the data set, all categorical variables were one hot encoded.Finally, farms with less than 25 animals in the data set were removed.The resulting data set contained 5493 animals distributed over 104 farms and consisted out of 432 variables.

Machine learning techniques and model evaluation
After standardizing the data set, we explored which machine learning technique delivers the best baseline to predict the lifetime production (kg FPCM) of dairy cows based on the variables present in the data set.Therefore, 10 different machine learning techniques were applied: multiple linear regression (linreg), ridge regression (ridge), lasso regression (lasso), principal component analysis followed by linear regression on the first 10 principal components (PCA_10), principal component analysis followed by linear regression on the first 100 principal components (PCA_100), principal component analysis followed by lasso regression (PCA + lasso), random forest regression (random forest), boosted regression trees (boosting tree), support vector regression with a radial kernel (SVR_radial) and support vector regression with a polynomial kernel (SVR_poly).See Hastie et al. (2008) for an overview of these methods.Moreover, we examined 3 different modeling strategies to get insight into the most suited data-scale to model kg FPCM lifetime productions and possible added value of stacking as a hybrid modeling technique.For the first strategy, a single model was fitted to the data of all farms simultaneously (One For All farms, OFA).For the second strategy, machine learning models were constructed for each farm individually and the ensemble of all farm-specific models was considered as the resulting model (One For Each farm, OFE).Finally, the third strategy is based on stacking (STACK): the original data set was extended with the model predictions of all OFA and OFE models and a single model was fitted on the extended data set.
To obtain unbiased estimates for R 2 and avoid information-leakage considering the actual lifetime productions into the training data sets of the STACK models, nested 5-fold cross validation was used (Hastie et al., 2008), leaving out each time 20% of the data (whole farms in OFA, animals in OFE).Model predictions for the test split from the outer cross validation loop were used to evaluate the model and were added to the training data set of the STACK models, which was in turn also fitted and evaluated by means of nested 5-folded cross validation.Within the inner cross validation loop, hyperparameters were selected in a generic way.By consequence, the procedure of optimizing hyperparameters was performed 5 times independently for each machine learning technique, once for each fold of the outer cross validation loop.
If variables within the training data set contained no variance and by consequence also no information, these variables were removed before fitting any machine learning model.This was rarely the case for the data sets used with the OFA and STACK strategies, but occurred always within the OFE strategy, in which training data sets contained only data of a single farm.If a model could not be fitted because the number of variables exceeded the number of records, the machine learning technique was not evaluated, nor included in the training data set for STACK.In practice, only 2 machine learning techniques were struggling with the large number of variables, and this only within the OFE strategy: linreg and PCA_100.By consequence, we opted to omit linreg and PCA_100 within the OFE strategy.To provide a reasonable alternative, machine learning PCA_10 was introduced and applied within the OFE strategy as an alternative for PCA_100.

Surplus animal selection
To complement the former performance analysis, the usability of our approach to select surplus youngstock shortly after birth was explored.Therefore, simulations were performed in which animals were ranked per farm according to their predicted kg FPCM lifetime production and the x ∈ 0 100 %, % [ ] lowest ranked animals were selected as surplus animals.x is the surplus selection rate and is assumed as a given.Our method is able to handle any chosen surplus selection rate, but does not optimize the surplus selection rate itself.To assess the selection performance of our model predictions in a way that is consistent with a real world scenario, surplus animal selection was performed at farm-level and not at population level during the simulations.Simulations were made for the best performing model at birth for each strategy and were benchmarked with ordinary linear regression on all available breeding values.Next to the benchmark, also perfect surplus animal selection was simulated, based on the actual (observed) lifetime productions of the animals, computed as described in §5.2.4.To evaluate the surplus animal selection strategies resulting from the model predictions, 3 tailored metrics were computed.The first metric considers the percentage correctly sold animals.A surplus animal selection decision was considered to be correct if a selected animal would also have been selected under the previously described perfect surplus selection conditions.Complementary, for surplus selection rates between 0 and 50%, the retention rate of the top half of the youngstock population (on farm-level) was determined.
The top half of the youngstock population was hereby defined as the 50% of the animals (per farm) realizing the highest lifetime production.Finally, the surplus animal selection effect was computed, i.e., the amount that the average national lifetime production, expressed in kg FPCM, would increase when surplus animal selection performed at farm level would be based on our model predictions instead of random selection of surplus animals.During the computation of the surplus animal selection effect, we assume surplus animal selection happened at random before, which is justified since farmers do not have a breeding value for lifetime production.This assumption implies that the distribution of the lifetime productions in our data set can be assumed to be equal to those of the female calves born during the period of data collection.Furthermore, we assumed a static insemination strategy and constant culling thresholds for older cows.As a consequence, the obtained results are conditional on the current farm management practices.
During evaluation of surplus animal selection, all possible surplus selection rates x ∈ 0 100 % % ,

[
] were con- sidered to enable a thorough evaluation of the performance of youngstock selection based on lifetime production predictions generated by our models.To quantify the overall performance of the models, 2 metrics were used: (i) the percentage of correctly sold animals and (ii) the retention rate of the top half of the youngstock population.Both metrics are plotted as a function of the surplus selection rate.To obtain a single measure that does not depend on the selected surplus selection rate, we use the Area Under the Curve (AUC), a common metric in machine learning (Hastie et al., 2008) that integrates the respective metric values of (i) and (ii) over their relevant domains which are [0%, 100%] and [0%, 50%], respectively.It is important to consider that surplus selection rates close to 100% can be inverted to 'top selection rates'.For example, a surplus selection rate of 95% corresponds with a top selection rate of 5%.These top selection rates may be of limited interest to the majority of farmers, but they are possibly of interest to nucleus breeding programs or nucleus farms, in which the surplus selection rates for born calves can be easily higher than 90%.Moreover, examining surplus selection rates close to 100% also allows for the examination of performance and possible bias of the model across the range of realized lifetime productions.
We opted to simulate surplus animal selection soon after birth, from the moment the genomic breeding values are available.This choice is motivated by the European Union regulation 1/2005/EC (annex I, chapter VI, item 1.9), which requires that calves cannot be transported over long distances before reaching a minimum age of 14 d.Because it takes approximately the same time to determine the genomic breeding values of an animal, surplus animal selection "at birth" is practically equivalent to surplus animal selection as soon as the genomic breeding values are available.Therefore, in the remainder of the text, we will refer to the moment of surplus animal selection as "at birth."We opted to simulate surplus animal selection at birth for 2 reasons.First, one can expect the selection performance will improve with increasing age of the animals at the moment of surplus animal selection, since the older the animals become, the more actual performance data of (related) animals will become available.Therefore, the model predictions at birth give an indication of the minimal selection performance that can be expected.Second, rearing youngstock is a significant cost in dairy farming, therefore, the earlier a surplus animal can be selected, the lower the negative impact on the profitability of the dairy cooperation.By consequence, in practice, most farmers want to select surplus youngstock as soon as possible.

Prediction of lifetime production at birth
The R 2 values obtained by the machine learning models fitted on the available data at birth are shown in Table 3. Concerning OFA, where one model was fitted at the national level, the highest R 2 is obtained with PCA + lasso, with different other machine learning techniques realizing an R 2 which is only slightly lower.It is, however, remarkable that machine learning techniques which are able to model more interactions between the variables in the data set (random forest, boosting tree, SVR_radial and SVR_poly) show a significantly lower performance compared with linear machine learning techniques.Although the latter observation is uncommon (Becker et al., 2021;Van der Heide et al., 2019), other studies have shown that treebased techniques and support vector regression do not by definition result in a higher predictive ability (Sarini and Dharmawan, 2023;Shine et al., 2018).Even though the cause of the lower performance of random forest, boosting tree, SVR_radial and SVR_poly in comparison to linear machine learning techniques cannot be elucidated, it can be argued that our approach to include the BV × farm interaction as a variable into the data set, obviates the need to use nonlinear techniques to model BV × farm interactions.It should be noted that reaction norm models (Hayes et al., 2016) take a similar approach and include (nonlinear) interaction effects as auxiliary variables in a statistical model.When all interaction effects and other nonlinear patterns can be described by these auxiliary variables, machine learning techniques specialized in fitting linear models will perform equal to or even better compared with nonlinear techniques (Hastie et al., 2008).Hence, our results suggest that nonlinear machine learning techniques, which were developed to grasp complex interactions between the variables within the data set, cannot consolidate this ability on our data set Although the performance differences between the machine learning techniques are non-negligible, the influence of data-availability and data-preprocessing is considerably larger.Simple linear regression on all available breeding values (the benchmark approach) obtained an R 2 of only 0.085.When an alternative benchmark is constructed by using not only the breeding values, but also the average realized lifetime production at farm level, R 2 rises to 0.217.By adding more farm-specific parameters and performing an elaborate data-preprocessing, the amount of variance in lifetime production that is explained increased even more, illustrated by the R 2 of 0.274 from linear regression applied according to OFA (Table 3).
Important to note is that the obtained R 2 -values are probably slightly biased upwards because all data was extracted simultaneously and hence, next to genomic information, also phenotypic information of the animals under consideration was used during breeding value calculations.As a result, the breeding values we used had a slightly higher reliability in comparison with breeding values that would be obtained using only genomic information of the animals.However, current genomic breeding values obtained by CRV at birth of the animal have already a reliability of 70%.When additional information becomes available due to an increased amount of phenotypic data from the animals and their relatives, the reliability of genomic breeding values for cows increases only slightly, from 70% at birth up to on average 85% for animals in the tenth lactation.Based on the mentioned breeding value reliabilities, the magnitude of this bias is expected be no more than 0.015 for the R 2 .Moreover, since all of our models use the same breeding values, it can be expected that the R 2 bias will be similar for all of our models, and hence, the comparisons between our models and the benchmark models can be assumed to be unbiased.
In the OFE strategy, models were fitted on farm-specific data and the ensemble of all farm-specific models was considered as the final model.For a given animal, a prediction of the lifetime production is obtained by selecting the appropriate model from the ensemble and generate thereafter a prediction based on this farmspecific model.The idea behind this approach is that by constructing farm-specific models, the model parameters can be adapted optimally to the specificity of an individual farm.Each individual farm-specific model thus accounts for the variation within farms, while the variation between farms will be grasped by constructing the ensemble of farm-specific models.However, a tradeoff could be expected with data availability, which is much lower for models fitted on farm-level compared  with models fitted on national level.By consequence there is a risk that some relationships between the variables within the data set remain hidden for the OFE models, while the OFA models can disentangle these relationships due to the higher data availability at the national level.From the R 2 values reported in Table 3, it can be concluded that the trade-off is in favor of the higher data availability at the national level: all evaluated models, irrespective of the used machine learning technique, report a lower R 2 compared with their OFA equivalent, which was fitted on a national level.Thus, with our data set, the capability of the OFE models to focus on the within farm variability does not compensate for the lower data availability on a single farm compared with the national level.
The decline in predictive performance of OFE versus OFA models differs for the various applied machine learning techniques.For most machine learning techniques fitting a linear model, R 2 in the OFE strategy is approximately 0.10 (0.09 -0.12) lower compared with the OFA strategy.A similar decline in predictive performance can be seen for the support vector regressions with a radial or polynomial kernel, where R 2 drops with respectively 0.091 and 0.104.This is in contrast with the machine learning techniques fitting regression tree based models (random forest and boosting tree), which show only a minor decrease in R 2 of 0.026 and 0.029 for random forest and boosting tree respectively.These regression tree-based machine learning techniques thus seem to be impacted only to a minor degree by the lower data availability when fitting a model on farmlevel, as could be expected from previous research (Luan et al., 2020;Negussie et al., 2022).Moreover, the regression tree based methods even deliver the best predictive performance when data availability appears to be a limiting factor for the performance of other machine learning techniques.
The third and last strategy (STACK) builds further on all fitted OFA and OFE models.Therefore, it is not surprising that the performance of the STACK models is approximately equal or higher than the performance of their OFA and OFE equivalent, irrespective of the considered machine learning technique.However, the order of magnitude of the performance increment is for some machine learning techniques highly remarkable.Multiple machine learning techniques obtain an R 2 of 0.46 -0.47, which is more than 7-fold of the R 2 value realized by the benchmark technique and is an increment of 17% explained variance compared with the best OFA model.This high performance increment indicates stacking offers a large added value during the prediction of KPIs which are influenced by complex genotype by environment interactions.The differences in performance between linreg, ridge, lasso, PCA + ridge and PCA + lasso are very small, making them perform equivalent, despite different loss functions used during the determination of the model parameters and the application of principal component analysis in the case of PCA + lasso.As with the OFA models, the STACK models which are able to consider nonlinear patterns in the data set (random forest, boosting tree, SVR_radial and SVR_poly) achieve significantly lower R 2 values compared with the techniques that fit a linear model and thus seem less appropriate to consolidate the added value of stacking.
The high predictive performance of the best STACK model offers opportunities in the perspective of previous research with regard to replacement management on dairy farms, such as Heikkilä et al. (2008) and Kelleher et al. (2015).Both of these studies deployed models used in breeding value estimation to predict the milk production of animals in future lactations.However, they use generic factors to model the survival of cows and neglect interactions between the breeding values and the farm environment.Our strategy is able to take these interaction effects into account and moreover implicitly considers an individualized survival curve, which could result in more reliable estimates of the current economic value of a cow.Moreover, past research with regard to replacement management mainly focused on selection of adult cows to be culled (Kelleher et al., 2015), resulting in models which are impractical to select surplus animals at young age.This is in contrast to our model, which is developed to make lifetime production predictions at birth, but can easily be extended to make predictions for older heifers or even lactating cows.

Surplus animal selection
Figures 1 -4 show the effect of using the trained models to select surplus youngstock.For clarity, only the best model per strategy is visualized and discussed: PCA + lasso for OFA/STACK and random forest for OFE.In Figure 1, the percentage of correctly selected surplus animals is shown in function of the surplus animal selection rate.Random surplus selection results in a curve coinciding with the main diagonal, while perfect surplus selection would result in a horizontal line, intersecting the vertical axis of the figure at 100%.To quantify the performance of the models with regard to the percentage of correctly sold animals, the Area Under the Curve (AUC) is used, which is frequently used to evaluate classification models in the dairy sector (Fenlon et al., 2017;Van der Heide et al., 2019), but can be generalized to ranking problems such as surplus youngstock selection.As described in §5.5, the AUC is obtained by integration of the curve over its domain Perneel et al.: Prediction of dairy cattle lifetime production (0 -100%).Perfect selection results in an AUC equal to 1, while random selection corresponds with an AUC of 0.5.The benchmark technique, linear regression on the breeding values, results only in an AUC of 0.57.Hence, taking into account only breeding values is not well suited to make substantiated decisions to select surplus youngstock for higher lifetime productions.However, machine learning models trained at the individual farm-level (OFE) perform even worse compared with the benchmark technique (AUC = 0.55), while machine learning models fitted at national level (OFA) result in only a slightly more accurate selection of surplus youngstock (AUC = 0.59).However, these poor selection performance results of machine learning models fitted according to the OFA and OFE strategy are in contrast with the selection performance realized by STACK, which realizes an AUC of 0.67, a considerable improvement compared with the benchmark technique.
Since farmers want to avoid at all times to sell their best heifers by accident, it is important to assess a surplus animal selection methodology not only by its percentage of correctly selected surplus animals, but also based on the retention rate of the animals which are expected to perform the best.Therefore, we investigated the retention rate of the top half of the population for kg FPCM lifetime production (at farm-level) for surplus animal selection rates between 0 and 50% and visualized the results in Figure 2. Perfect surplus animal selection would again result in a horizontal line, intersecting the vertical axis of the figure at 100%.Random surplus animal selection, however, corresponds with a line running from the top left of the figure toward the bottom right of the figure.As in Figure 1, the AUC can be used to evaluate the selection performance of the lifetime prediction models.However, in the case of the retention rate of the top half animals, the curve is only integrated over the interval 0 -50%.Hence, perfect surplus animal selection corresponds with an AUC of 0.5, while random surplus animal selection results in an AUC of 0.375.
The conclusions that can be drawn from Figure 2 are in line with those from Figure 1.However, as Figure 2 assesses the models based on the retention rate for the top half of the animals, even if only 10% of the youngstock should be sold, there is some more tolerance toward incorrectly ranking of animals in comparison with the metric visualized in Figure 1 an in depth analysis of the partition of the animals into sold and retained animals at a surplus animal selection rate of 20%, and this for different percentiles of the population (determined at farm level).The results are shown in Table 4.Note that a surplus selection rate of 20% is a realistic surplus selection rate for well managed dairy farms.To prevent a shortage of replacement heifers, these farms often take into account the real risk that in a given year, only about 40% of the calves is female.Therefore, farmers will typically inseminate slightly too much cows with dairy semen.As a result, in the case the farmer tailored the insemination strategy toward 40% female calves, the farm will have on average 20% surplus female calves.
As shown earlier (Figure 1), the number of correctly selected surplus animals (0 -20% population percentile, correctly sold) is lower for the OFE model compared with the benchmark, while the OFA model realizes a slight improvement and the STACK model results in the most correct surplus selection of all studied models.Analogously, Table 4 also confirms the conclusions drawn from Figure 2 with regard to the retention rate of the top half of the population (50 -100% population percentile, retained).The STACK model realizes the best performance, while the OFA and OFE model are slightly better and slightly worse respectively compared with the benchmark technique.On the other hand, the partition of the 20 -50% percentile over sold and retained animals was not yet described previously and differs relatively little across the different machine learning techniques presented in Table 4.Moreover, none of the analyzed models shows a significant improvement when compared with random surplus animal selection.This indicates, when compared with both random surplus  animal selection and the benchmark technique, that the performance increment realized by our models is mainly due to an improved classification of the outer quantiles of the lifetime production distribution, rather than by an improvement of the overall ranking accuracy.
More than the number of animals that was correctly retained/sold, the surplus animal selection effect is of primary economic interest of the farmers.Therefore, the surplus animal selection effect was determined for the various studied machine learning models at surplus animal selection rates between 0 and 99% and is visualized in Figures 3 and 4. Figure 3 shows the absolute values for the surplus animal selection effect, while in Figure 4, the surplus animal selection effect is expressed relative to the surplus animal selection effect resulting from perfect surplus animal selection.As stated previously, in perfect surplus animal selection, the surplus animal selection is based on the real lifetime productions and therefore, the curves for perfect surplus animal selection in Figures 3 and 4 correspond with the maximum achievable surplus animal selection effect.By consequence, these curves allow to evaluate objectively the surplus animal selection effects realized when surplus animal selection would be performed based on our models.
We quantified the overall performance of a surplus animal selection strategy by averaging the relative surplus animal selection effects in Figure 4 for surplus animal selection rates between 1 and 99%.Based on these average relative surplus animal selection effects, we conclude that the benchmark technique (linear regression on the breeding values) realizes on average a surplus animal selection effect of 27% of the theoretically maximum effect.As could be expected, OFE results in a lower average relative surplus animal selection effect (17%), while OFA allows for a slightly improved average relative surplus animal selection effect (34%).STACK however, results in an average relative surplus animal selection effect of 58%, more than twice the (relative) surplus animal selection effect that could be realized with linear regression on the breeding values.Remarkably, for all studied models, the relative surplus animal selection effect is quite stable over the whole interval of possible surplus animal selection rates.This indicates the performance with which the best, respectively the worst animals can be selected is quite similar.Stated differently, no technique can select the best animals with a higher accuracy than it can select the worst animals.
In Figure 3, the ratios between the different curves are less visible, however still respected since both Figure 3 and 4 represent the same data, up to a normalization constant given by the selection effect under perfect surplus animal selection conditions.The curve corresponding to perfect surplus animal selection, behaves approximately linear for surplus animal selection rates below 50%, but gradually starts increasing exponentially in the second half of the graph.Part of this behavior find its origin in the normal distribution along which the lifetime productions are approximately distributed.If this distribution is divided in bins containing 1% of the population, the bins in the lower and upper end of the distribution will be broader compared with the bins near the center of the distribution.This effect leads to an increased slope of the surplus animal selection effect curve at surplus animal selection rates close to 0 or 100%, while it decreases the slope for surplus animal selection rates around 50%.A second reason for the behavior of the curve corresponding with perfect surplus animal selection in Figure 3 is given by the size of the remaining population.If a population of 100 animals is considered, selling an animal corresponds with selling 1% of the population and therefore, the effect on the average lifetime production of the remaining population will be limited.If however a population of 10 animals is considered, selling an animal corresponds with selling 10% of the population, therefore, the effect on the average lifetime production will be much larger compared with the previously described scenario.This effect tempers the slope of the perfect surplus animal selection effect curve at low surplus animal selection rates, while it increases the slope when surplus animal selection rates are increasing.These 2 effects, applying surplus animal selection on a normal distribution and a decreasing size of the remaining population at higher surplus animal selection rates, counteract each other for surplus animal selection rates lower than 50%, leading to the approximate linear behavior of the perfect surplus animal selection effect curve in Figure 3. On the other hand, for surplus animal selection rates over 50%, these 2 effects add up, resulting in the observed exponential behavior of the perfect surplus animal selection effect curve.
As mentioned above, we analyzed how different percentiles of the animal population were partitioned over sold and retained animals at a surplus animal selection rate of 20% (Table 4).When the absolute surplus animal selection effect at this surplus animal selection rate is evaluated in Figure 3 the dairy cows, the average expected genetic level of the offspring will decrease.Since genetics have an indisputable but complex influence on the lifetime production, this will result in a decrease of the (predicted) average lifetime production of the newborn animals.Therefore, the realized extra surplus animal selection effect will be lower compared with the extra surplus animal selection effect that could be derived from Figure 3. Similarly, when the quality of the replacement heifers is improved due to better selection of surplus animals, this could result in an increased selection pressure against the older animals of the herd.Earlier culling of older cows can in the short term decrease the currently realized lifetime production, but can in the long run possibly result in a higher and faster increment of the lifetime production.In our study we were not able to quantify the effect of an increased/decreased usage of (sexed) Holstein semen on the surplus animal selection effect, nor the impact of an improved quality of the replacement heifers on culling rates.Therefore, further research on these aspects seems necessary to optimize the insemination and surplus animal selection strategies to optimize the realized lifetime production of dairy cows.

CONCLUSIONS
We combined a comprehensive data preprocessing with machine learning to predict the lifetime production of Dutch dairy cows soon after birth.Various machine learning techniques were applied at the national level as well as at farm-level.Moreover, stacking was used to improve the predictive performance even further.By using this approach, we could explain 47% of the variance in lifetime production, only using data available at birth of the animal.This is a considerable improvement in comparison to linear regression on the breeding values supplemented with the average lifetime production at farm level, which could only explain 21.7% of the variance in lifetime production.Moreover, we demonstrated the usability of our models to select surplus youngstock, aiming toward an increased lifetime production of the remaining population.While using predictions of the lifetime production generated by the benchmark model resulted in a surplus animal selection effect of only 27% of the theoretical maximum, our best machine learning model could more than double this surplus animal selection effect, up to 58% of the theoretical maximum surplus animal selection effect.Assuming a surplus of 20% youngstock beyond actual replacement needs, selective removal of heifers using our best model could result in an increase of the national average potential lifetime production by 3369 kg FPCM, which equates to an increase of 9.4% relative to the current Dutch national lifetime production of 35,886 kg FPCM.
Figure 2. Retention rate for the top half of the animals (per farm) when selection of surplus youngstock is based on lifetime production predictions generated according to different modeling strategies.The top half of the youngstock population was defined as the 50% of the animals (per farm) realizing the highest lifetime production.
Figure 3. Absolute surplus animal selection effect of the lifetime production, expressed as average kg FPCM per animal, when surplus youngstock is selected based on model predictions for lifetime production.Perfect selection is defined as selection according to the realized (observed) lifetime production of the animals.

Figure 4 .
Figure 4. Relative surplus animal selection effect of the lifetime production (in % relative to perfect selection) when surplus youngstock is selected based on model predictions for lifetime production.Perfect selection is defined as selection according to the realized (observed) lifetime production of the animals.
Perneel et al.: Prediction of dairy cattle lifetime production

Table 1 .
Perneel et al.: Prediction of dairy cattle lifetime production Composition of the data set delivered by CRV

Table 3 .
R 2 realized by different machine learning techniques, applied at birth according to three different strategies: OFA (One For All farms), OFE (One For Each farm) and STACK (stacking).For each strategy, R2 values which deviate less than 0.01 from the highest obtained R 2 are indicated with *

Table 4 .
Partitioning of the animals into sold and retained animals at a surplus animal selection rate of 20% for different observed lifetime production percentiles (at farm level).Surplus animal selection happens at birth, based on predicted kg FPCM lifetime productions.The observed percentiles are determined at farm level and based on the kg FPCM lifetime production actually realized by the animals