cou-Effects of a mastitis J5 bacterin vaccination on the productive performance of dairy cows: An observational study using propensity score matching techniques

Inferring causal effects between variables when utilizing observational data is challenging due to confounding factors not controlled through a randomized experiment. Propensity score matching can decrease confounding in observational studies and offers insights about potential causal effects of prophylactic management interventions such as vaccinations. The objec-tive of this study was to determine potential causality and impact of vaccination with an Escherichia coli J5 bacterin on the productive performance of dairy cows applying propensity score matching techniques to farm-recorded (e.g., observational) data. Traits of interest included 305-d milk yield (MY305), 305-d fat yield (FY305), 305-d protein yield (PY305), and somatic cell score (SCS). Records from 6,418 lactations generated by 5,121 animals were available for the analysis. Vaccination status of each animal was obtained from producer-recorded information. Confounding variables considered were herd-year-season groups (56 levels), parity (5 levels: 1, 2, 3, 4, and ≥5), and genetic quartile groups (4 levels: top 25% through bottom 25%) derived from genetic predictions for MY305, FY305, PY305, and SCS, as well as for the genetic susceptibility to mastitis. A logistic regression model was applied to estimate the propensity score (PS) for each cow. Subse-quently, PS values were used to form pairs of animals (1 vaccinated with 1 unvaccinated control), depending on their PS similarities (difference in PS values of cows within a match required to be <20% of 1 standard deviation of the logit of PS). After the matching process, 2,091 pairs of animals (4,182 records) remained available to infer the causal effects of vaccinating dairy cows with the E. coli J5 bacterin. Causal effects estimation was performed using 2 approaches: simple matching and a bias-corrected matching. According to the PS methodology, causal effects of vaccinating dairy cows with a J5 bacterin on their productive performance were identified for MY305. The simple matched estimator suggested that vaccinated cows produced 163.89 kg more milk over an entire lactation when compared with nonvaccinated counterparts, whereas the bias-corrected estimator suggested that such increment in milk production was of 150.48 kg. Conversely, no causal effects of immunizing dairy cows with a J5 bacterin were identified for FY305, PY305, or SCS. In conclusion, the utilization of PS matching techniques applied to farm-recorded data was feasible and allowed us to identify that vaccination with an E. coli J5 bacterin relates to an overall milk production increment without compromising milk quality.


INTRODUCTION
Mastitis is one of the most prevalent and costly diseases affecting the dairy industry worldwide (Seegers et al., 2003;Piepers and de Vliegher, 2018).According to Jones and Bailey (2009), mastitis causing pathogens are typically bacteria residing in the udder (contagious agents) or in the cow surroundings (environmental agents).Although the establishment of sanitizing processes after milking, the application of dry cow therapies and antibiotics use have markedly reduced the number of cases associated with agents naturally inhabiting the udder, risks of infection remain high for pathogens located in the environment (Jones and Swisher, 2009).Normally found in bedding materials, soil, manure and other organic matter, coliforms represent the main environmental agents isolated from acute clinical cases of mastitis occurring in the period within 2 wk before calving and the early lactation (Smith et al., 1985;Hogan and Smith, 2003).
The ideal approach to control environmentally induced mastitis involves management strategies focused on keeping cows clean and dry between milkings, cou-pled with preventive actions intended to increase the resistance of cows to infections (Jones and Swisher, 2009;Vangroenweghe et al., 2020).In this sense, it has been shown that the administration of bacterins based on an Escherichia coli mutant strain commonly denominated as "J5" is effective in reducing the incidence of clinical mastitis during the first 90 d of lactation (Hogan et al., 1992).Some of the J5 bacterins available on the US market include J.VAC (Merial Limited, Athens, GA), Bovilis J-5 (Intervet/Merck Animal Health, Omaha, NE) and Enviracor J-5 (Zoetis Inc., Kalamazoo, MI).For its part, genetic selection of cows less susceptible to mastitis is also an interesting option given genetic gains in resistance are expected to be permanent and cumulative through generations (Weigel and Shook, 2018).In 2016, Zoetis pioneered the generation of genomic predictions in the United States for common health disorders (including mastitis) in dairy cattle, followed by the Council on Dairy Cattle Breeding in April of 2018 (Vukasinovic et al., 2017;Nicolazzi et al., 2018;Gonzalez-Peña et al., 2020).Within the large amounts of producer-recorded data used to produce genomic predictions in Zoetis, some of the cooperating dairies also include vaccination records of animals administered particularly with Enviracor J-5.
The abundancy of observational field data directly recorded by participant dairies not only offers the opportunity to use it for quantitative genetic analyses, but also to explore potential causal effects of prophylactic management interventions (e.g., vaccinations) over relevant production outcomes (Rosa and Valente, 2013).Nonetheless, inferring causal effects between variables utilizing observational data is challenging because the potential association of a causal variable with an outcome of interest, may not be exclusively due to the effects of the former on the latter; but can also originate from uncontrolled sources of variation known as confounding factors (Pearl, 2009).In this regard, statistical techniques used in other research fields to decrease confounding in observational data sets can also be applied to the routinely collected information within livestock production systems (Rosa and Valente, 2013;Bello et al., 2018).Among those techniques, propensity score (PS) matching has been successfully used to estimate the causal effect of prolificacy on milk yield in dairy sheep (Ferreira et al., 2017), as well as the effect of bovine somatotropin usage on the cost of producing milk in US dairy cattle (Tauer, 2016).Briefly, PS provide the probability of a treatment assignment conditional on measured baseline characteristics within the individuals of the population study (e.g., confounding factors).By knowing the distribution of the measured baseline covariates between treated and untreated subjects, a matching process based on the similarities of PS among individuals can be performed to reduce the level of confounding.Ultimately, the previous process allows to estimate the average treatment effect of the potential causal variable over the outcome of interest (Abaasa et al., 2021).Although it has been demonstrated in controlled experiments that Enviracor J-5 reduces the clinical severity of induced E. coli mastitis (Gurjar et al., 2013); no studies intended to elucidate the potential causal effects of this product over the productive performance of dairy cows have been performed using observational data.Thus, the objective of this study was to determine if a causal effect of vaccination with an E. coli J5 bacterin exist over the productive performance of dairy cows through the usage of PS matching techniques applied to producer-recorded data.

Data Sources and Editing
Data for this study were obtained directly from dairy producers upon obtaining their signed permission.Specifically, 7 commercial dairy farms located at New York (n = 1), Wisconsin (n = 3), Minnesota (n = 2), and South Dakota (n = 1) contributed information for traits such as milk, fat, protein, and SCS.This information was subsequently used to estimate the 305-d mature equivalent version of milk yield (MY305), fat yield (FY305), protein yield (PY305), and SCS using the BESTPRED program (Cole and VanRaden, 2007).To ensure accurate estimates of the 305-d adjusted phenotype of all traits, only lactation records that had a data collection rating (DCR) of at least 0.70 within BESTPRED were retained.The DCR represents the squared correlation of estimated and true yields; therefore, serves as an indicator of the accuracy of the estimated lactation yields based on the number and frequency of test-day recordings available.Retaining lactations having a DCR ≥ 0.70, ensured that lactation records considered in the study belonged to animals that have reached a minimum opportunity period of approximately 120 DIM (~4 monthly tests), which has been reported previously to be enough to obtain precise estimates of adjusted 305-d observations (Quist et al., 2007;Ospina et al., 2010).These dairies were considered because they provided their historical vaccination records (list of animals treated, product applied and date of application); among which, those associated with Enviracor J-5 bacterin were retained for this study.The label of this bacterin indicates an administration in a 3-injection regimen, with the first dose given at 7 mo of gestation, the second dose approximately 1 mo later, and the third dose within 2 wk postpartum.Alternatively, a nontraditional (e.g.,

Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS
off-label) dosing regimen intended to finalize its application before parturition has been documented on literature (Gurjar et al., 2013).For this study, the producer-recorded data shown that rather than per label indications, doses were administered to animals as per farm vaccination protocols, determined by herd owners and veterinarians of record.Considering the observational nature of this data, validity of immunizations was determined by exploring if they were given within a reasonable time window around a calving event.Such time window included vaccination events that occurred between 110 d prepartum (e.g., 195 d of gestation) and 60 d postpartum.
Contemporary groups were formed as the combination of herd, year, and season (HYS) of calving.Calving seasons were defined as follows: January to March = 1; April to June = 2; July to September = 3, October to December = 4.Each HYS was required to have vaccinated and nonvaccinated animals, whereas those containing less than 40 individuals and with vaccination rates outside of the 10 to 90% range were removed from the data.This was done to ensure an optimal utilization of PS because results obtained with this methodology could be biased when working with small sample sizes and rare treatment exposures (Pirracchio et al., 2012;Hajage et al., 2016).There were 5 parity groups considered in the study (1, 2, 3, 4, and ≥5).For cows with multiple lactations, the vaccination status was considered independent across parities because J5 bacterins provide only a short-term immunity against coliforms (e.g., 90-100 d after vaccination).After the target period of protection, the immunity induced by these bacterins wanes quickly, and it is required to revaccinate animals in following lactations (Erskine et al., 2007;Statham, 2014;Steele et al., 2019).Consequently, every cow was allocated to the vaccinated or control (nonvaccinated) group depending on her vaccination history before the calving event that initiated each of her lactations.At the end of the editing process, a total of 6,418 lactation records (vaccinated = 3,267 and control = 3,151) belonging to 5,121 animals remained available.
Preferential treatment has been recognized as a longstanding issue in genetic evaluations of dairy cattle and, generally, manifests itself mainly in productive traits (Dassonneville et al., 2012).However, although genetic merit plays an important role in the performance of animals, to the best of our knowledge, no previous PS study in livestock has included this source of information as a potential confounding factor.Ferreira et al. (2017) suggested that the inclusion of genetic merit information of each animal in PS analyses could be beneficial when attempting to elucidate causal effects.For the animals on the present study, genetic predictions for the traits of interest (MY305, FY305, PY305, and SCS) as well as for the genetic susceptibility to mastitis (MAST) were available, because the participating dairies were routinely evaluated by Zoetis (Vukasinovic et al., 2017;Gonzalez-Peña et al., 2020).The PTA of the animals were obtained using the single-step genomic best linear unbiased predictor, a method that combines pedigree, phenotypic, and genotypic information (e.g., SNP) into a single analysis to estimate the genetic merit of animals (Legarra et al., 2009;Misztal et al., 2009;Aguilar et al., 2010).Considering that at a farm level normally the population of cows to be bred is usually segmented on the basis of their genetic merit to make a more efficient usage of the semen resources (Fetrow et al., 2007); cows within this study were ranked according to their PTA for MY305, FY305, PY305, SCS, and MAST, and subsequently, within each trait, cows were assigned to one of 4 possible genetic quartile groups (Q1 = best 25%, Q2 = 26-50%, Q3 = 51-75%, and Q4 = 76-100%).Once the genetic quartile groups for all traits were defined, such information was incorporated into the final data file for subsequent analyses.

Statistical Analysis
Ferreira et al. ( 2017) described that a PS analysis requires the performance of 5 steps that include declaration of confounder variables (known as covariates within the PS framework), estimation of PS values for individuals on the treated and untreated groups (e.g., vaccinated and nonvaccinated), bias correction through a matching process of subjects with different causal variable levels but similar PS values, balance diagnosis after the matching process, and finally, the estimation of the causal effect of the treatment of interest over the response variables.In accordance with such procedure, details pertaining to the present study for all steps are described below.
Step 1: Declaration of Confounder Variables.Our hypothesized causal network connecting confounder variables with vaccination status and trait's performance is shown in Figure 1.The selection of confounder variables to include in the analysis was based on biological reasoning and data structure complexities associated with the available information.For instance, considering that the 7 dairies included on this study do not represent a random sample and that the usage of Enviracor J-5 is voluntary and not random, selection bias was undoubtedly present on the data.Moreover, the hierarchical structure of our data in where units at the first level (i.e., cows) were clustered into groups (i.e., farms) represented another layer of complexity to directly analyze it.These challenges highly resemble to those found in data obtained through the applica-tion of voluntary surveys (Lee, 2006;Tauer, 2016) and even to human-medicine related data sets generated in multiple sites or hospitals (Griswold et al., 2010;Lalani et al., 2010).In this regard, the PS methodology is especially suited to counteract the selection bias inherent to multisite observational studies.According to Griswold et al. (2010), it is possible to deal with the multilevel nature of observational data sets by including site indicator variables into the logistic regression model employed to estimate the PS.Therefore, the inclusion of HYS classes as a confounder variable within the logistic regression model represented the site indicator variable that required to be considered to decrease the bias related to the different management practices among the farms where data were originated.Moreover, and from a biological point of view, HYS represented groups of animals that experienced similar environmental challenges in terms of microbial loads (e.g., E. coli exposure), within their surroundings (bedding materials, soil, manure), at the riskier moment of infection (the period within 2 wk before calving and the early lactation).In the case of parity, this confounder was included because the risk of clinical mastitis has been associated with increasing parities and this can influence vaccination decisions (Breen et al., 2009).Lastly, considering the possibility of a potential preferential treatment when deciding which cows would be vaccinated based on each animal perceived genetic merit, the inclusion of genetic quartile groups for MY305, FY305, PY305, SCS, and MAST as confounders in the PS analysis was performed to equalize the average genetic potential of both vaccinated and unvaccinated animals.As an important note, when estimating MAST PTA, no minimum or maximum values were imposed for the number of DIM to consider a MAST event (Vukasinovic et al., 2017).Hence, even if a particular lactation record of an animal was not considered for not reaching the required opportunity period (~120 DIM); if such animal had a MAST event at any point during its lactation (at any of its lactations), such information was accounted for in the generation of its MAST PTA.In this way, including MAST genetic quartile groups helped to reduce the potential bias associated with not including observations of animals culled before 120 DIM.Evidently, all variables (HYS classes, parities, and genetic quartile groups for all traits) were also suspected to influence response variables and, therefore, capable of cause confounding.
Step 2: Estimation of PS.In this study, PS values represented the probability that a cow got vaccinated conditional upon the values of all the previously defined confounder variables for that cow.To estimate PS for all animals, the R package nonrandom was utilized (Stampf, 2014).A multivariable logistic regression model was implemented where the vaccination status of individuals was coded as 1 if they received the J5 bacterin or as 0 if they were not treated at all (no vaccination with any J5 bacterin was given, even outside of the acceptable time window for the application regimens).Covariates included in the model were those defined in step 1.Given a logistic regression model assumes that the logit function (natural logarithm of the odds) is linearly related to the independent variables, the PS for each cow was estimated as a logit function of the confounder variables using the following model: where n i was the linear predictor that mapped the binary vaccination status space of the ith parity record within a cow through the logit function (with i = 1 to 6,418), β 0 was the intercept, X ik were the confounder variables representing HYS (56 levels), parity (5 levels), and genetic quartile group (4 levels) for MY305, FY305, PY305, SCS, and MAST; respectively, whereas β k represented the regression coefficients for the confounder variables.Additionally, an alternative analysis was also performed by including PTA for all traits as continuous variables within the logistic regression model instead of grouping them into genetic quartile groups.
Step 3: Bias Correction Via a Matching Process.An adjustment method capable of minimizing bias by reducing confounding was required to be applied before estimating causal effects.In this regard, the most applied methodology is a matched sampling process that choses untreated individuals in such a way that they are similar to the treated subjects with respect to the confounder variables (Rosenbaum and Rubin, 1985).Consequently, the matching process of vaccinated and nonvaccinated animals was done based on the similarity of their calculated PS.The criterion of similarity was defined by a caliper size, which represented the maximum distance allowed between PS of individuals of differing vaccinal groups.The caliper size was determined based on previous reports and corresponded to a 20% of 1 standard deviation of the logit of the PS (Austin and Mamdani, 2006;Austin, 2009;Ferreira et al., 2017).The resulting caliper size in this study was 0.039 within the link function scale.For those cases in where multiple matching options were available within the allowed caliper size, individuals with the closest PS were paired up.In practical terms, the previous process mimics the typically employed procedure of random placement of experimental units  (Tauer, 2016).Importantly, due to the nature of the matching process; individuals whose PS value was not close enough to form an acceptable match with a corresponding counterpart from a different vaccinal group were discarded and the original sample was reduced.
Step 4: Balance Diagnosis.When performing PS analyses, it is crucial to employ appropriate diagnostics methods to demonstrate if the PS were able to establish balance with regards to the confounders variables; otherwise, no basis for trusting the results exist (Granger et al., 2020).Therefore, once the matching process was performed, its effectiveness to correct for confounding was verified by inspecting if the distributions of PS between vaccinal groups became more symmetrical after the matching process.Also, the frequency distributions of all confounder variables on each vaccinal group were evaluated before and after matching.Finally, standardized differences for each variable in the confounder set before and after matching were estimated to formally assess if balance was achieved (Ferreira et al., 2017).Standardized differences were estimated as follows: denoted the sample variance of the kth confounder variable in each vaccination group.Although no unified consent exists in literature regarding which should be the value for a standardized difference to be considered negligible, it has been indicated that standardized differences greater than 20% are concerning and indicative of lack of balance (Rosenbaum and Rubin, 1985;Lee et al., 2010;Ferreira et al., 2017).Consequently, a threshold of 20% for the standardized differences to be declared as null was used in this study.Additionally, is important to mention that no interactions among confounder variables were included in the logistic regression model specified in step 2, because no evidence of improvement in the resulting balance between treatment groups was found when they were tested (Shlipak et al., 2001).
Step 5: Estimation of Causal Effects.Estimation of causal effects was performed using 2 methods: the simple matching approach and the bias-corrected approach.For the simple matching approach, the causal effect (τ) of the vaccination on the variables of interest (MY305, FY305, PY305, and SCS) was calculated applying the following formula: where j represented each of the n matched pairs and y was the value of the variable of interest of cows in the vaccinated group (vac) and control group (con), respectively.
The simple matching approach represents the simplest way to calculate the average treatment effects between groups after the matching process.Even when matching pairs with similar PS is done precisely to reduce bias, it is often impossible in practice to form pairs of individuals from different treatment groups having the exact same PS value.Therefore, the estimation of causal effects as the mean difference between the outcome variables within matched pairs could still contain bias.In fact, the amount of bias contained in the results of the simple matching process is proportional to the magnitude of residual differences on PS values within pairs of animals.Due to such drawback, an alternative method known as bias-corrected matching was also implemented.The bias-corrected matching procedure is based on the utilization of least-squares regression to adjust for the remaining discrepancies in the PS values within members of matched pairs (Abadie and Imbens, 2011).The correction was done by modifying the value of one member in the matched pair as a function of ) where Y t was the value for the trait of interest given the causal variable (e.g., vaccination status) and x k represented the confounder variables included in the model (e.g., HYS, parity, and genetic quartile groups for all traits).According to Ferreira et al. (2017), the utilization of a bias-corrected estimator allows for a more precise estimation of the true effects of the causal variable over the response variables of interest.

RESULTS AND DISCUSSION
The distribution of the estimated PS for each vaccination group before and after matching is shown in Figures 2A and 2B, respectively.Before matching, is important to stress that if there were no confounding between the set of covariates included in the logistic regression model used to estimate the PS for all cows, the distribution of PS values of each vaccinal group should totally overlap.Nonetheless, we observed only a partial overlapping between the PS distributions of vaccinal groups which confirmed the presence of confounding in the original data set (Figure 2A).Although the distribution peak of each vaccinal group occurred at different points within the PS scale, none of such peaks occurred at the extreme of the probability scale.Furthermore, there was a common support area in the mid-range with overlapping between the PS distributions for vaccinated and control animals.These features were important to verify because they are key requirements to utilize PS matching techniques to estimate causal effects from observational data.Particularly, it has been indicated that the estimated probabilities of treatment for differing groups must at least partially overlap, and that the probability mass must not occur at either zero or one in either distribution (Abadie and Imbens, 2009;Tauer, 2016).
After the matching process, distributions of PS among vaccinal groups became more similar (Figure 2B).Although it can be argued that distributions were still not completely equal, this was expected, because the matching technique implemented allowed for small differences of PS values between individuals from different vaccinal groups, as long as such differences were within the limit established by the caliper size.Although performing an exact matching could yield a greater similarity between distributions, such approach often produces dramatic reductions in sample size that can result in a larger bias than if the matches are inexact, but more individuals remain in the analysis (Rosenbaum and Rubin, 1985;Stuart, 2010).Another option to increase the similarity of PS distributions could be the reduction of the caliper size; however, this also leads to greater loss of information affecting the final number of matched pairs and consequently the validity of the inferences (Ferreira et al., 2017).Therefore, small differences in PS distributions should not be considered as a prohibiting factor to proceed with the estimation of causal effects, unless the balance diagnosis indicates that not enough confounding has been removed from the data.
A total of 2,091 pairs of vaccinated and control animals with similar PS values were formed with the matching process.Hence, 65.16% of the originally available observations remained available to estimate the causal effect of vaccination with Enviracor J-5 on the productive performance of dairy cows.A slightly smaller percentage of observations (e.g., 54%) was retained by Ferreira et al. (2017) when applying the  1 shows the frequencies of different levels of confounder variables for each vaccination group, before (n = 6,418) and after (n = 4,182) matching.The graphical assessment of the effectiveness of the balancing process for all confounders is presented in Figures 3A to 3N.In general, the variables that had the highest degree of unbalance before matching were HYS and parity, which can be attributed to the heterogeneity in management practices and age structures of the participant dairies.However, after the matching procedure, both HYS and parity presented balanced frequencies of observations per vaccinal group on their respective sublevels.Distributions of observations within all genetic quartile groups were balanced before matching.This could be due to the way of how genetic quartile groups were built in this study, but it is possible that other herds exhibiting greater degrees of preferential treatment display different results.Table 2 shows descriptive statistics for all traits of interest by vaccination group before and after matching.The most noticeable change observed in the average performance of vaccinal groups before and after matching was for MY305, because before the formation of pairs based on PS values, the average of the vaccinated group was 173 kg smaller than the average of the control group.Conversely, after the matching procedure, the result-ing averages swapped, and the vaccinated group had an average 163.89kg higher than the control group.Similar but smaller in magnitude results were obtained for protein yield.Conversely, in the case of fat and SCS, the average performance of the vaccinated group was slightly higher before matching, but smaller after matching.
The standardized differences before and after matching for all the confounders contemplated in the PS model are shown in Table 3.Before the application of the matching process, 2 of the 7 confounder variables in our study were unbalanced (HYS and parity); nonetheless, after matching, all covariates were considered balanced because the standardized differences between vaccinal groups were smaller than the established threshold value of 20%.The greatest reduction in the level of unbalance was observed for HYS because before matching; the absolute standardized difference for this confounder was 75.21%, but after matching, this difference decreased to 10.08%.Similarly, parity groups had a standardized difference of 29.22% before matching, but such difference got reduced to 0.28% after paring animals with similar PS values.Similar results were reported by Ferreira et al. (2017) when balancing the confounder variables of lactation and breed composition in dairy sheep via PS matching.In the case of the genetic quartile groups for MY305, FY305, PY305, SCS, and MAST, all these covariates were balanced be- fore and after the matching process.Although grouping animals in genetic quartile groups based on the value of a continuous variable (PTA) may have caused some loss of variation, the alternative analysis (including PTA as a continuous variable) resulted in the same pattern of balance before and after the matching process (Supplemental Table S1; https: / / doi .org/ 10 .6084/m9 .figshare .22662496 .v1;Sánchez-Castro, 2023).Despite of being already balanced before the matching process, it was decided to keep PTA information in the model because it has been indicated that every covariate related to an outcome of interest should be maintained in a PS model to reduce bias and control for the inflation of the treatment effects estimates (Brookhart et al., 2006;Heinze and Jüni, 2011).In this context, it is well known that all the traits for which the genetic merit was considered in this study are low or moderately heritable (Vukasinovic et al., 2017;Gonzalez-Peña et al., 2020;Oliveira Junior et al., 2021), meaning that an important portion of the phenotypic variability observed in the performance of the animals is attributable to the additive genetic effects represented by their PTA.Furthermore, it is important to consider that the main purpose of estimating standardized differences is to verify if the relevant differences (e.g., confounding) in the covariate means between the 2 treatment groups in the matched sample have been eliminated.When the matching process generates standardized differences below the established threshold value (20% in our study), an increase in the likelihood of an unbiased estimation of treatment effects occurs.Conversely, when standardized differences after matching process remain above the threshold value, not enough confounding has been removed from the data and refinements to the PS model specification should be made to improve the resulting balance (Heinrich et al., 2010;Lee, 2013).In our case, all covariates were appropriately balanced after the matching process (all standardized differences were lower than 20%), meaning that enough confounding was removed and the probabilities of estimating unbiased treatment effects were high.
Conversely, considering that genetic correlations exist among the traits for which the genetic merit of the animals was considered in this investigation, it may be Due to the elevated number of herd-year-season classes, the categories of this confounder variable were lumped in equally sized groups on the table just to summarize the information.
thought that including variables exhibiting collinearity could be counterproductive for the purposes of investigating causal effects.Nevertheless, Elze et al. (2017) indicated that within a PS framework, all observed variables should be included in the logistic regression model, regardless of their statistical significance or collinearity with other variables.Similarly, Jennings et al.
(2014) explained that PS matching permits the inclusion of any number of confounders and is not limited by the number of variables that can be used or multicollinearity concerns as opposed to traditional regression methods.Moreover, the author indicated that for PS calculations, the more covariates included in the model, the better, because accounting for as many confound- because it leads to inflation of the treatment effects estimates (Brookhart et al., 2006;Heinze and Jüni, 2011).
Table 4 contains estimates of the causal effects of vaccinating dairy cows with Enviracor J-5 over their performance in MY305, FY305, PY305, and SCS.Two results are presented for each trait according to the matching method implemented (simple matching, the bias-corrected matching).In the case of MY305, the simple matching estimated effect was 163.89 kg (SE = 79.01 kg., 95% CI = 9.03-318.75kg), whereas the biascorrected matching estimate was 150.48 kg (SE = 61.68kg, 95% CI = 29.59-271.37kg).For both estimates, their respective 95% CI did not include 0, which can be interpreted as the differences in the average MY305 between vaccinated and control animals was significant (P < 0.05).Similar results were reported by Ferreira et al. (2017) when using PS analysis attempting to determine the causal effect of prolificacy on milk yield of dairy sheep.Given the simple matching approach contains more bias in comparison to the bias-corrected method, authors of such study suggested that the biascorrected result was a better representation of the true effects of the causal variable.Consequently, in our case the most reliable estimate of the causal effect of vaccinating dairy cows with the E. coli J5 bacterin over their milk production was the bias-corrected one, which suggested that vaccinated animals would be expected to produce 150.48 kg more of milk over an entire lactation compared with nonvaccinated cows.
The increments in milk production found in this study align with previous reports that have indicated that vaccinating dairy cows with J5 bacterins has benefits in economic returns related to less milk discarded due to mastitis.For instance, DeGraves and Fetrow (1991) reported increments in profitability associated with the vaccination of dairy cows with a J5 bacterin of $57/cow per lactation.Additionally, authors indicated that herd vaccination programs were expected to be profitable at all herd milk production levels.Vargas et al. (2016) reported that vaccinating dairy heifers with the same brand of the E. coli J5 bacterin as investigated in this study significantly reduced the number of cases of clinical mastitis, which in turn, resulted in lower volumes of discarded milk and higher economic returns during the first 190 d of lactation.Similarly, studies performed in  Brazilian dairy cattle also attributed the higher milk production of J5-vaccinated cows to the significant reductions in the occurrence of clinical mastitis cases by E. coli during the first 100 d of lactation (Gentilini et al., 2012;Molina et al., 2013).Gurjar et al. (2013) indicated that vaccination with Enviracor J-5 reduced the clinical severity of experimentally induced E. coli mastitis and that vaccinated cows had an increased milk production in comparison with control cows during the first week of lactation.Despite the majority of studies related to vaccination with J5 bacterins have been restricted to early lactation stages, our results based on entire lactation performances align well with such reports.
No causal effects were detected for fat or protein yields on J5 vaccinated cows since their respective 95% confidence interval included 0. These results agree to those reported by Steele et al. (2019) who indicated that no significant differences in fat, protein, and lactose existed among groups of cows vaccinated with 2 different J5 bacterins when compared with unvaccinated cows that were experimentally challenged with E. coli in the rear quarter at approximately 100 d in milk.Similarly, no significant differences between vaccinated and unvaccinated cows with Enviracor J-5 were found for lactose, fat, protein, and TS content in naturally occurring mastitis episodes (Vargas et al., 2016).Likewise, no causal effects of vaccinating cows with the J5 bacterin were identified either for SCS (the 95% CI for this variable also included zero).In this regard, a plethora of reports exist in literature indicating that cows vaccinated with the E. coli J5 strain do not exhibit significant differences in SCS compared with nonvaccinated or placebo-injected groups of cows (Hogan et al., 1992;Smith et al., 1999;Vangroenweghe et al., 2020).A plausible explanation for this may be the fact that the elevation in SCS is a response to insults to the mammary gland such as infections, but J5 bacterins do not prevent intramammary infections.Rather, these bacterins enhance the bacterial clearance from infected quarters helping the animals recover sooner from bacterial challenges in comparison to nonprotected cows (Tomita et al., 2000).Lastly, results obtained with the alternative analysis that included the PTA for all traits as continuous variables were not different to those obtained when forming genetic quartile groups (Supplemental Table S2; https: / / doi .org/ 10 .6084/m9 .figshare .22662496 .v1;Sánchez-Castro, 2023).In general, only small changes on the estimates were found for all variables, but the only significant causal effects identified were for MY305.

CONCLUSIONS
The usage of PS matching techniques applied to farm-recorded data was feasible and allowed to identify the causal effects of vaccinating dairy cows with an E. coli J5 bacterin over their productive performance.On average, vaccinated cows produced 150.48 kg more of milk over a 305-d lactation than nonvaccinated counterparts.Conversely, no causal effects of immunizing dairy cows with the E. coli J5 bacterin were identified for FY305, PY305, or SCS.In conclusion, vaccination with the Enviracor J-5 bacterin was related to significant increments in milk production without compromising milk quality.

ACKNOWLEDGMENTS
The first author acknowledges Zoetis LLC for providing the postdoctoral fellowship that allowed him to conduct this investigation.Additionally, authors are grateful to Gonzalo Rincon for proposing and facilitating this study, as well as to Dianelys González-Peña for the Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS into treatments under the research scenario of a preplanned experiment corresponded to the standardized difference of the kth confounder variable, x a v c k , and x con k , represented the sample mean of the kth confounder variable in each vaccinal group (vac = vaccinated; con = control), and
Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS same PS matching technique in a study intended to investigate the causal effect of prolificacy on milk yield of dairy sheep.Table

Figure 2 .
Figure 2. Distribution of propensity score values for each vaccination group (vaccinated or unvaccinated control) before matching (A), conditionally on the confounder variables herd-year-season, parity, and genetic quartile groups for milk yield, fat yield, protein yield, SCS adjusted to 305 d, and mastitis.Distribution of propensity score values for each vaccination group after matching (B).

Figure 3 .
Figure 3. Distribution of individuals within each vaccination group (0 = unvaccinated control, 1 = vaccinated) for different levels of the confounder variables contemplated in the study.Panels A and B show herd-year-season classes for original and matched data, respectively.Panels C and D show parity for original and matched data, respectively.Panels E and F show milk yield genetic quartile groups for original and matched data, respectively.Panels G and H show fat yield genetic quartile groups for original and matched data, respectively.Panels I and J show protein yield genetic quartile groups for original and matched data, respectively.Panels K and L show SCS genetic quartile groups for original and matched data, respectively.Panels M and N show mastitis genetic quartile groups for original and matched data, respectively.
Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS

Table 1 .
Sánchez-Castro et al.: CAUSAL EFFECTS OF A J5 BACTERIN IN DAIRY COWS Number of observations of different levels of confounder variables for each vaccination group (vaccinated and control) before and after matching

Table 2 .
Summary statistics for all dependent variables for each vaccination group (vaccinated vs. unvaccinated control) before and after matching

Table 3 .
Standardized differences (%) for each confounder before and after matching

Table 4 .
Estimated causal effect of vaccination with a mastitis J5 bacterin on the productive performance of dairy cows using propensity scores with matched samples discussions and the generous support provided.Authors declare that this research was conducted as a part of all authors' employment with Zoetis LLC.The authors have not stated any other conflicts of interest. valuable