Journal of Dairy Science
Volume 89, Issue 8, Pages 2833-2845, August 2006

Modified Versus Producer Milk Calibration: Mid-Infrared Analyzer Performance Validation1

  • K.E. Kaylegian, Northeast Dairy Foods Research Center, Department of Food Science, Cornell University, Ithaca, NY 14853
  • J.M. Lynch, Northeast Dairy Foods Research Center, Department of Food Science, Cornell University, Ithaca, NY 14853
  • G.E. Houghton, Kestrel Software Consulting, Berkshire, NY 13736
  • J.R. Fleming, USDA, Agricultural Marketing Service, Texas Milk Marketing Area, P.O. Box 110939, Carrollton, TX 75011
  • D.M. Barbano (corresponding author), Northeast Dairy Foods Research Center, Department of Food Science, Cornell University, Ithaca, NY 14853

Received 3 January 2006; accepted 16 March 2006.


Abstract 

Our objective was to determine the validation performance of mid-infrared (MIR) milk analyzers, using the traditional fixed-filter approach, when the instruments were calibrated with producer milk calibration samples vs. modified milk calibration samples. Ten MIR analyzers were calibrated using producer milk calibration sample sets, and 9 MIR milk analyzers were calibrated using modified milk sample sets. Three sets of 12 validation milk samples with all-laboratory mean chemistry reference values were tested during a 3-mo period. Calibration of MIR milk analyzers using modified milk increased the accuracy (i.e., better agreement with chemistry) and improved agreement between laboratories on validation milk samples compared with MIR analyzers calibrated with producer milk samples. Calibration of MIR analyzers using modified milk samples reduced overall mean Euclidian distance for all components for all 3 validation sets by at least 24% compared with MIR analyzers calibrated with producer milk sets. Calibration with modified milk sets reduced the average Euclidian distance from all-laboratory mean reference chemistry on validation samples by 40, 25, 36, and 27%, respectively for fat, anhydrous lactose, true protein, and total solids. Between-laboratory agreement was evaluated using reproducibility standard deviation (sR). The number of single Grubbs statistical outliers in the validation data was much higher (53 vs. 7) for the instruments calibrated with producer milk than for instruments calibrated with modified milk sets. The sR for instruments calibrated with producer milks (with statistical outliers removed) was similar to data collected in recent proficiency studies, whereas the sR for instruments calibrated with modified milks was lower than those calibrated with producer milks by 46, 52, 61, and 55%, respectively for fat, anhydrous lactose, true protein, and total solids.

Key words: infrared milk analysis, validation, calibration

 


Introduction 

Mid-infrared (MIR) milk analysis is an indirect method that requires instrument calibration with milk samples that have reference values established by reference chemistry methods. Traditional milk analysis has been done using fixed-filter instruments, with specific wavelengths for fat, protein, and lactose determination. Fourier transform infrared (FTIR) instruments for milk analysis have options for using a fixed-filter wavelength mode or a full spectral calibration mode. The principles underlying the MIR analysis of milk are presented elsewhere (Biggs et al., 1987). In the present study, all instruments were operated in the fixed-filter wavelength mode. Use of partial least squares calibration was outside the scope and objectives of the present study.

Accuracy of MIR analysis of milk is affected by instrument factors, quality of reference chemistry, characteristics of the calibration sample set, and individual milk sample composition factors (Biggs et al., 1987; Barbano and Clark, 1989; Kaylegian et al., 2006). Characteristics of the calibration sample set that affect calibration performance include the number of samples, the range of component concentration, and distribution within the range (presence of high leverage samples), correlation of fat and protein concentrations, and changes in these characteristics between consecutive sample sets (Kaylegian et al., 2006).

Traditional calibration sample sets are made from preserved raw individual producer milk samples (n = 8 to 12) obtained locally and generally analyzed by a single laboratory using reference chemistry methods. Producer milk calibration sets often have a narrow range of component concentrations, high leverage samples, and a positive correlation between fat and protein concentrations. These factors can cause reduced accuracy of the calibration, which is indicated by a larger confidence interval around the calibration linear regression line (Kaylegian et al., 2006). An approach to overcoming these limitations is the use of preserved pasteurized modified milk calibration samples. The modified milk calibration approach eliminated high leverage samples and substantially decreased uncertainty of instrument calibration by reducing the size of the 95% confidence interval around the linear regression calibration line for each component (Kaylegian et al., 2006). The objective of this research was to compare the accuracy of testing (using independent sets of validation samples) of 2 groups of MIR analyzers: one group using traditional individual producer milk samples vs. another group using modified milk samples for calibration.


Materials and Methods 

Experimental Design 

A total of 19 MIR analyzers in 13 laboratories (Cornell University, USDA Federal Milk Market, and commercial) were used. One group of MIR analyzers (n = 10) was calibrated using producer milk calibration samples and calibration procedures used in those laboratories for milk payment testing. Another group of MIR analyzers (n = 9) was calibrated using modified milk samples manufactured at Cornell University as previously described (Kaylegian et al., 2006).

The validation sample sets were assembled from local individual producer milks by a USDA Federal Milk Market laboratory known to have calibration samples with a wide component range and a good distribution of samples within the component ranges (Kaylegian et al., 2006). The validation milk samples used to evaluate calibration performance were not part of the calibration samples in either group of instruments being evaluated in this study.

Manufacture of the modified milk calibration samples began on Monday of the first week. These pasteurized, dichromate-preserved samples were shipped on wet ice by overnight delivery and arrived at each laboratory on Thursday morning. Each laboratory was sent several sets of samples so that a fresh set of unopened samples could be used for each chemical analysis method and each MIR milk analyzer. All chemical analyses of the modified milk samples were completed by Tuesday of the second week and the data were sent to Cornell for calculation of the all-laboratory mean reference values. The 9 instruments in the study using modified milk calibration were calibrated (i.e., adjustment of slope and intercept) with these samples. The remaining 10 instruments in this study were calibrated using producer milk calibration samples each laboratory normally used as the calibration for their payment testing.

On Monday of the second week, the validation samples (n = 12 from individual farms) were assembled and shipped by overnight delivery to all laboratories. On Wednesday of the second week, the validation milk samples were tested on all 19 instruments in the study. Each laboratory was instructed to calibrate their MIR analyzer with the appropriate calibration sample set (either modified milk or their own producer milk calibration set), make any necessary adjustments to the slope and intercept of their calibration, and then immediately test the validation samples. The MIR results were immediately returned to Cornell. After testing the validation samples by MIR, additional sets of the validation samples were analyzed by all laboratories using reference chemistry methods and those results were returned to Cornell by Tuesday of the third week for calculation of the all-laboratory mean chemical reference values for the validation samples. This was repeated 3 times.

The MIR-predicted value for each of the validation samples was compared with the all-laboratory mean reference chemistry value. The mean difference (MD) and standard deviation of the difference (SDD) for each milk component of each validation set were determined for each instrument. Validation performance was evaluated by plotting the SDD as a function of MD for each component for the modified milk and producer milk calibrated instruments and calculating the Euclidian distance (ED) for each component, on each instrument, as an index of the impact of type of calibration set on testing accuracy. Reproducibility standard deviation (sR) was used as an index to evaluate the closeness of agreement of results between laboratories within each type of calibration set.

Producer Calibration Samples Used to Calibrate MIR Analyzers 

Each laboratory that calibrated their MIR analyzer with producer milk calibration samples used the procedures that they routinely used for milk payment testing. The general process used to assemble producer milk calibration samples (USDA Federal Milk Market and commercial) was described by Kaylegian et al. (2006). Producer calibration sample sets consisted of 10 to 12 milk samples, depending on the laboratory. The raw milk samples were preserved, split into vials and refrigerated (4°C). The samples were analyzed using reference chemistry methods by 1 to 4 laboratories; mean reference chemistry values were used when more than one laboratory determined the chemistry.

Modified Milk Calibration Samples Used to Calibrate MIR Analyzers 

The manufacture of the 14 sample modified milk calibration set was done in the Cornell University pilot plant described by Kaylegian et al. (2006). Pasteurized milk was gravity separated overnight at 4°C. The gravity skim layer (90% by weight) was drained from the bottom of the tank and the cream was removed in several layers. The cream layers were analyzed for fat content and selected layers were blended to create a cream ingredient with a fat content of 22 to 27%. The gravity skim layer was further separated by centrifugal separation to reduce the fat content to <0.07%. The centrifugally separated skim milk was ultrafiltered (2×) to obtain retentate and permeate. The cream ingredient, skim UF permeate, skim UF retentate, reagent grade α-lactose monohydrate (MultiPharm, EM Science, Gibbstown, NJ), and laboratory-grade water were blended to create 14 calibration samples with a broad range and an orthogonal matrix of component concentrations (Kaylegian et al., 2006). Samples were preserved with potassium dichromate, split into vials, and refrigerated (4°C).

The modified milk calibration sets used in this study were produced at Cornell University and shipped with wet ice overnight to each of the participating laboratories for analysis. Samples were analyzed using reference chemistry methods by at least 7 laboratories for fat, true protein, and total solids, and by at least 4 laboratories for lactose using an enzymatic method. The all-laboratory mean chemistry values were used as the reference chemistry values for calibration of 9 MIR instruments. Over the course of the study, 3 different batches of modified milk calibration samples were produced and used with all 9 MIR analyzers.

Validation Samples 

Validation sets were obtained from a USDA Federal Milk Market Laboratory and consisted of 12 individual farm raw milk samples. The chemical analyses of the validation samples were performed by the same group of laboratories that determined the reference chemistry values for the modified milk calibration sets. The all-laboratory mean chemistry values (Table 1) were used as the reference chemistry values for validation. All laboratories analyzed validation samples from the same batch regardless of the type of samples they used to calibrate an MIR analyzer.

Table 1. All-laboratory mean chemistry values for 3 sets of validation milks
        Validation set 1                              Validation set 2                              Validation set 3
Sample  Fat     Lactose¹  True protein  Total solids  Fat     Lactose   True protein  Total solids  Fat     Lactose   True protein  Total solids
(all values in %)
1       2.7827  4.5601    3.3409        11.8970       3.2152  4.6703    2.9610        11.9421       3.2999  4.6825    2.9554        12.0415
2       3.7680  4.2849    3.1807        12.3185       3.6620  4.5131    3.0624        12.2923       2.6683  4.5654    3.1446        11.6225
3       3.5057  4.6206    3.0099        12.2729       3.4490  4.6489    3.0186        12.2922       3.2968  4.4886    3.0243        11.8994
4       3.5671  4.6365    2.9534        12.2985       3.8902  4.6437    3.0798        12.7287       3.5755  4.6555    2.9816        12.3534
5       4.1978  4.6016    3.3222        13.2548       2.5159  4.5538    3.0830        11.3836       3.9482  4.5992    2.9382        12.5763
6       4.2975  4.4904    3.3992        13.3195       4.0685  4.6194    3.2549        13.0449       4.1563  4.5102    3.2377        13.0156
7       4.5256  4.2959    3.5040        13.5983       4.1455  4.4887    3.2926        13.0442       4.0325  4.6045    3.1713        12.9259
8       3.4107  4.2943    3.0657        11.8862       3.3256  4.3606    2.9599        11.7588       3.5144  4.3180    2.9623        11.9281
9       4.3811  4.6201    3.1670        13.3056       4.7299  4.6159    3.3962        13.8816       4.7223  4.4765    3.5244        13.9109
10      4.1987  4.5438    3.3722        13.2720       4.2841  4.7024    3.1197        13.1940       5.5352  4.5482    3.8211        15.0876
11      4.5451  4.5585    3.1284        13.3694       4.8575  4.4765    3.5925        14.1034       4.3867  4.7030    3.0185        13.2292
12      5.8269  4.4873    3.8973        15.4178       5.6056  4.5365    3.9394        15.2520       4.3141  4.6001    3.5178        13.5742
Mean    4.0839  4.5103    3.2784        13.0175       3.9799  4.5692    3.2300        12.9098       3.9542  4.5626    3.1914        12.8470

1Lactose reported as anhydrous lactose in all validation sets.

Chemical Analyses 

Chemical analyses of all samples were conducted using the following AOACI (2000) methods: fat by modified Mojonnier ether extraction (method 989.05; 33.2.26), true protein by Kjeldahl analysis (method 991.22; 33.2.13), total solids by oven drying (method 990.20; 33.2.44), and lactose determined by enzyme analysis (method 984.15; 33.2.24) modified to measure lactose by weight instead of volume with the results expressed as anhydrous lactose. For instruments calibrated with producer milks, lactose by difference [lactose = total solids −(fat + true protein + ash + 0.19)] was used as reference. Ash was estimated using an updated version of the equation described by Lynch et al. (1990): ash = (0.0596×true protein) + 0.5379.
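The lactose-by-difference calculation above can be illustrated with a short sketch. The two equations are taken directly from the text; the function names and the input values are hypothetical and chosen only for illustration.

```python
def estimate_ash(true_protein: float) -> float:
    """Estimate ash (g/100 g) from true protein using the equation quoted in the text."""
    return 0.0596 * true_protein + 0.5379


def lactose_by_difference(total_solids: float, fat: float, true_protein: float) -> float:
    """Lactose = total solids - (fat + true protein + ash + 0.19), all in g/100 g."""
    ash = estimate_ash(true_protein)
    return total_solids - (fat + true_protein + ash + 0.19)


# Hypothetical composition, not taken from any calibration set in the study.
example_lactose = lactose_by_difference(total_solids=12.50, fat=3.60, true_protein=3.10)
print(f"lactose by difference: {example_lactose:.4f} g/100 g")
```

Because ash is itself estimated from true protein, any error in the protein reference value carries into the lactose-by-difference value twice: once directly and once through the ash term.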

MIR Analysis 

A total of 19 instruments in 13 laboratories were used. No laboratory had more than 2 instruments. Both fixed-filter and FTIR instruments were included in this study (Table 2). All of the laboratories participated in the monthly precalibration of instruments according to the procedures of the USDA Federal Milk Markets. Precalibration procedures ensure that the instruments perform within the specified mechanical and electronic tolerances as described by Lynch et al. (2006). All instruments, including FTIR instruments, used fixed fat B, lactose, protein, and fat A filter wavelengths with corresponding reference wavelengths, as described by Kaylegian et al. (2006). All instruments in this study were operated in the traditional fixed-filter mode. The non-FTIR instruments (e.g., Milkoscan 134, 255, 300, and 605) used the classical sample filter wavelengths of 3.48, 9.61, 6.46, and 5.73μm and reference filter wavelengths of 3.60, 7.70, 6.70, and 5.60μm for fat B, lactose, protein, and fat A, respectively. The traditional fixed-filter wavelengths used in the Foss FT 6000 were not disclosed by the manufacturer (Foss Electric, Hillerød, Denmark) but presumably are similar to those recommended in their model FT 120. Several FTIR instruments and a Foss FT 120 participated in the study and were operated in a fixed-filter mode. The sample filter wavelengths used on the Delta FTIR instruments were 3.51, 9.54, 6.60, and 5.79μm and the reference wavelengths were 3.56, 7.79, 6.77, 5.62μm for fat B, lactose, protein, and fat A, respectively. These wavelengths are slightly different than for the fixed-filter instruments because it is necessary to use a narrower bandwidth than the fixed-filter mode on the FTIR instruments. Instruments used fixed intercorrection factors that were established as part of the precalibration procedures, except in the case of FT 6000 instruments, in which the fixed intercorrection factors were established by Foss Electric (Hillerød, Denmark). 
Instruments calibrated with producer milk samples followed the procedures that each laboratory normally used to calibrate (i.e., adjust slope and bias) their instrument for milk payment testing. Instruments calibrated with modified milk samples followed the calibration procedures described by Kaylegian et al. (2006). The validation samples were analyzed for fat, true protein, anhydrous lactose, and total solids on each instrument in duplicate. Data were analyzed as the mean of duplicates. The results were sent electronically to Cornell University for evaluation of validation performance.

Table 2. Type of calibration, makes, and models of mid-infrared analyzers used in this study
Calibration set type  No. of instruments  Instrument type  Instrument manufacturer¹  Instrument model
Modified milk         5                   Filter           Foss                      MS 134, MS 255, MS 303, MS 605
                      4                   FTIR²            Foss                      FT 120
                                                           Delta                     LactoScope FTIR
Producer milk         4                   Filter           Foss                      MS 134, MS 4000
                                                           Bentley                   2000
                      6                   FTIR             Foss                      FT 6000
                                                           Delta                     LactoScope FTIR

1Foss, Hillerød, Denmark; Delta, Drachten, The Netherlands; and Bentley, Chaska, MN.

2FTIR = Fourier transform infrared.

Validation Performance of Modified and Producer Milk Calibrations 

MD and SDD 

The MD and SDD were determined for fat, anhydrous lactose, true protein, and total solids for each instrument for each validation set. The difference value was calculated for each sample in the set by subtracting the reference chemistry value from the MIR predicted value. The mean of the individual sample differences (MD) and the standard deviation of these differences (SDD) were calculated for the entire validation set (12 samples) for each instrument. The MD and SDD values for all instruments were compared using a Euclidian distance plot by calibration set type (modified milk or producer milk) and component (fat, true protein, anhydrous lactose, and total solids).
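A minimal sketch of the MD and SDD calculation described above follows. It assumes that SDD is the sample standard deviation (n − 1 denominator) of the per-sample differences, which the text does not state explicitly. The function name is hypothetical, the reference values are the first 4 fat values of validation set 1 in Table 1, and the MIR readings are invented for illustration.

```python
from statistics import mean, stdev


def md_sdd(mir_values, reference_values):
    """Return (MD, SDD) for one instrument, one component, and one validation set.

    Each difference is MIR predicted value minus reference chemistry value.
    SDD uses the sample standard deviation (n - 1), which is an assumption here.
    """
    if len(mir_values) != len(reference_values):
        raise ValueError("MIR and reference lists must have the same length")
    differences = [m - r for m, r in zip(mir_values, reference_values)]
    return mean(differences), stdev(differences)


reference_fat = [2.7827, 3.7680, 3.5057, 3.5671]  # from Table 1, validation set 1
mir_fat = [2.80, 3.75, 3.52, 3.58]                # hypothetical instrument readings

md, sdd = md_sdd(mir_fat, reference_fat)
print(f"MD = {md:.4f}, SDD = {sdd:.4f}")
```

In the study the calculation would cover all 12 samples of a validation set; 4 samples are used here only to keep the example short.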

ED 

The ED is a statistical measure of similarity that is the distance from an individual data point to the center point of a cluster of similar data (Massart et al., 1988). In this study, ED was used as a measure of the distance of each instrument's MD and SDD for each milk component from the mean reference chemical value for the validation samples and reflects the accuracy of the MIR method. The center point (mean chemistry value for the validation set) in this study was set at (0, 0). The ED was calculated as follows:

$$ED = \sqrt{MD^{2} + SDD^{2}}$$

The ED values were determined for each instrument for each validation set for fat, true protein, anhydrous lactose, and total solids on MIR instruments calibrated with modified milk sets or producer milk sets. A mean ED for each milk component for the group of instruments for each calibration method for each of the 3 validation sets was calculated. The data were analyzed using the GLM procedure in SAS (Version 8e, 2001; SAS Institute, Cary, NC) to determine if the mean ED for the modified milk calibration vs. producer milk calibration were different. The ANOVA model was as follows: calibration set type and validation set were defined as class variables, and the model was ED = calibration set type + validation set + calibration set type × validation set + error. If the model was significant (P<0.05), the mean ED values were compared using a t-test at P<0.05.
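The ED and group-mean calculations can be sketched as follows. The sketch assumes that ED is the straight-line distance of the point (MD, SDD) from the center point (0, 0), as the definition above implies. The function names and the (MD, SDD) pairs are hypothetical.

```python
from math import hypot
from statistics import mean


def euclidian_distance(md: float, sdd: float) -> float:
    """Distance of the point (MD, SDD) from the center point (0, 0)."""
    return hypot(md, sdd)


def mean_ed(md_sdd_pairs):
    """Mean ED for a group of instruments, given one (MD, SDD) pair per instrument."""
    return mean([euclidian_distance(md, sdd) for md, sdd in md_sdd_pairs])


# Hypothetical (MD, SDD) pairs for 3 instruments on one component.
group = [(0.010, 0.020), (-0.015, 0.025), (0.005, 0.018)]
print(f"mean ED = {mean_ed(group):.4f}")
```

Because MD enters as a square, a negative bias and a positive bias of the same size give the same ED; the index measures distance from reference chemistry, not the direction of the error.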

Repeatability and Reproducibility Standard Deviation 

The statistical metric for within-laboratory variation of a method is the repeatability standard deviation (sr) and the metric for between-laboratory variation is the reproducibility standard deviation (sR). These values are commonly calculated as part of the process of method performance validation. The calculation and practical use of these metrics in the laboratory were described by Lynch (1998). In the present study, the impact of the type of calibration set on agreement between laboratories was determined by comparison of the sR on the validation samples.

Our validation data were analyzed by the statistical procedures of the AOACI (2000, Appendix D: Guidelines for collaborative study procedures to validate characteristics of a method of analysis) to determine the sR for instruments calibrated with producer milk samples vs. modified milk samples. The outlier identification procedures of the AOACI were used to remove individual laboratory outlier data points (α= 0.025).
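The outlier screening and sR calculation can be sketched under two assumptions that are not stated in the text. First, with one result per instrument per sample (the mean of duplicates), sR is taken as the sample standard deviation across instruments. Second, the single Grubbs statistic is taken in its percentage-decrease form: the larger percentage drop in standard deviation obtained by removing either the lowest or the highest value. The critical value must come from the published table for the relevant number of laboratories, so it is passed in as a parameter rather than hard-coded. All names and data are hypothetical.

```python
from statistics import stdev


def reproducibility_sd(values):
    """sR for one sample, assuming one result per instrument (see assumptions above)."""
    return stdev(values)


def single_grubbs_statistic(values):
    """Largest percentage decrease in SD from removing the lowest or highest value."""
    if len(values) < 3:
        raise ValueError("need at least 3 values")
    ordered = sorted(values)
    s_all = stdev(ordered)
    if s_all == 0:
        return 0.0
    s_without_low = stdev(ordered[1:])
    s_without_high = stdev(ordered[:-1])
    return 100 * (1 - min(s_without_low, s_without_high) / s_all)


def is_single_grubbs_outlier(values, critical_value):
    """critical_value is a percentage looked up in the published table (not supplied here)."""
    return single_grubbs_statistic(values) > critical_value


# Hypothetical fat results (g/100 g) from 5 instruments on one validation sample.
results = [3.50, 3.51, 3.49, 3.50, 3.80]
print(f"sR = {reproducibility_sd(results):.4f}")
print(f"single Grubbs statistic = {single_grubbs_statistic(results):.1f}%")
```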


Results and Discussion 

Validation of MIR Analyzers Calibrated with Modified and Producer Milk Samples 

Comparison of ED Plots 

An instrument was considered an outlier when removal of its MD or SDD data reduced the range of MD or SDD across instruments within a calibration set type by at least 50%. There was no more than 1 outlier instrument in any given validation set for either the modified milk or the producer milk calibration sets. The data presented in Figures 1 to 4 and Table 3 are shown with these outliers removed.
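The range-based outlier rule for instruments can be sketched as follows. The 50% threshold comes from the text; the function name, the handling of ties, and the SDD values are assumptions made for illustration.

```python
def range_outlier_index(values, threshold=0.5):
    """Return the index of the single value whose removal shrinks the range
    by at least `threshold` (as a fraction), or None if no value qualifies."""
    if len(values) < 3:
        return None
    full_range = max(values) - min(values)
    if full_range == 0:
        return None
    best = None  # (index, fractional reduction) of the strongest candidate
    for i in range(len(values)):
        remaining = values[:i] + values[i + 1:]
        reduction = 1 - (max(remaining) - min(remaining)) / full_range
        if reduction >= threshold and (best is None or reduction > best[1]):
            best = (i, reduction)
    return None if best is None else best[0]


# Hypothetical SDD values for 4 instruments within one calibration set type.
sdd_values = [0.010, 0.020, 0.015, 0.090]
print(range_outlier_index(sdd_values))
```

Only an extreme value (the minimum or the maximum) can shrink the range, so the loop over all instruments is written for clarity rather than efficiency.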

Figure 1. Plot of mean difference (MD) and standard deviation of the difference (SDD) for fat (g/100g) from mid-infrared analyzers calibrated with (a) modified milk sets (n = 26) and (b) producer milk sets (n = 30). Outlier point removed (1 point, Figure 1b).

Figure 2. Plot of mean difference (MD) and standard deviation of the difference (SDD) for lactose (g/100g) from mid-infrared analyzers calibrated with (a) modified milk sets (n = 26) and (b) producer milk sets (n = 30). Outlier points removed (1 point, Figure 2a; 3 points, Figure 2b).

Figure 3. Plot of mean difference (MD) and standard deviation of the difference (SDD) for protein (g/100g) from mid-infrared analyzers calibrated with (a) modified milk sets (n = 26) and (b) producer milk sets (n = 30). Outlier point removed (1 point, Figure 3a).

Figure 4. Plot of mean difference (MD) and standard deviation of the difference (SDD) for total solids (g/100g) from mid-infrared analyzers calibrated with (a) modified milk sets (n = 26) and (b) producer milk sets (n = 30). Outlier points removed (2 points, Figure 4b).

Table 3. Mean Euclidian distance (ED) of validation samples (g/100g) using mid-infrared (MIR) analyzers calibrated with modified milk and producer milk samples1
Component          Validation set  Modified milk  Producer milk  Percent reduction²  LSD³
Fat                1               0.0263         0.0376         30
                   2               0.0279         0.0504⁴        45
                   3               0.0233         0.0416         44
                   Mean            0.0257ᵇ        0.0430ᵃ        40                  0.0061
Anhydrous lactose  1               0.0182         0.0278⁴        34
                   2               0.0273⁴        0.0317⁴        14
                   3               0.0279         0.0371⁴        25
                   Mean            0.0242ᵇ        0.0322ᵃ        25                  0.0057
True protein       1               0.0208         0.0347         40
                   2               0.0270⁴        0.0415         35
                   3               0.0267         0.0388         31
                   Mean            0.0246ᵇ        0.0383ᵃ        36                  0.0052
Total solids       1               0.0472         0.0555⁴        15
                   2               0.0397         0.0566         30
                   3               0.0437         0.0675⁴        35
                   Mean            0.0437ᵇ        0.0597ᵃ        27                  0.0089

a,bValues in the same row with different superscripts are different (P<0.05).

1Number of MIR analyzers calibrated with modified milk samples n = 9; with producer milk samples n = 10. Data with outliers were removed.

2Percent reduction = [(mean ED producer milk − mean ED modified milk)/mean ED producer milk] × 100.

3LSD = Least significant difference (P<0.05).

4Outlier value removed.
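As a worked check of the percent reduction column, the sketch below applies the footnote formula (producer minus modified, divided by producer, times 100) to the mean ED values reported in Table 3. The dictionary and function names are hypothetical.

```python
# Mean ED values from Table 3, as (modified milk, producer milk) pairs.
TABLE3_MEAN_ED = {
    "fat": (0.0257, 0.0430),
    "anhydrous lactose": (0.0242, 0.0322),
    "true protein": (0.0246, 0.0383),
    "total solids": (0.0437, 0.0597),
}


def percent_reduction(modified: float, producer: float) -> float:
    """Reduction in mean ED relative to the producer milk calibration, in percent."""
    return (producer - modified) / producer * 100


reductions = {
    component: round(percent_reduction(modified, producer))
    for component, (modified, producer) in TABLE3_MEAN_ED.items()
}
print(reductions)
```

The rounded results match the 40, 25, 36, and 27% reported for fat, anhydrous lactose, true protein, and total solids.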

The scatter of MD and SDD for fat, lactose, true protein, and total solids across all 3 validation sets (shown in Euclidian distance plots) was reduced by MIR calibration with modified milk sets (Figures 1a, 2a, 3a, and 4a) compared with calibration using producer milk sets (Figures 1b, 2b, 3b, and 4b). Generally, the modified milk calibration sets produced smaller MD and SDD from the all-laboratory mean reference chemistry than producer milk calibration sets. Ginn and Packard (1989) conducted a validation study of 50 fixed-filter MIR milk analyzers calibrated with producer samples over a 6-mo period and reported the MD and SDD for the group of instruments. In general, the MD and SDD values for the validation of modified milk calibration in our study (Figures 1a, 2a, 3a, and 4a) are smaller than those reported by Ginn and Packard (1989).

Comparison of Mean ED 

The mean ED values for all 3 validation sets were consistently lower for all milk components for instruments calibrated with modified milk sets compared with instruments calibrated with producer milk sets (Table 3). There was a reduction (P<0.05) of overall mean ED values (fat, anhydrous lactose, true protein, and total solids were reduced by 40, 25, 36, and 27%, respectively) for instruments calibrated with modified milk sets compared with producer milk sets for all components (Table 3).

We used the average MD and SDD values reported by Ginn and Packard (1989) to calculate an ED for their data. The ED for fat and protein were 0.049 and 0.034% for data from Ginn and Packard. These values are similar to the ED values of 0.043 and 0.038% for fat and protein, respectively, for validation samples analyzed using MIR calibrated with producer milk samples (Table 3). The mean validation ED for fat and protein for the instruments calibrated with modified milks were 0.0257 and 0.0246%, respectively (Table 3), which demonstrates better accuracy of testing than observed by Ginn and Packard (1989). It is clear, based on the comparison of the data of Ginn and Packard (1989) to the data presented in Table 3, that in spite of improvements in the quality of the hardware and software used for infrared milk analysis, the validation performance of instruments when using producer calibration samples has not changed since the report in 1989. The data in Table 3 demonstrate that calibration of fixed-filter wavelength MIR milk analyzers with modified milk calibration samples produced significantly lower ED (i.e., better agreement with reference chemistry) on validation for all milk components than calibration with producer milks.

Current Industry Practice for MIR Milk Analysis 

AOACI Method 

The official first action method for fat, lactose, protein, and solids in milk by an MIR spectroscopic method using fixed-filter wavelengths (AOACI, 2000; method 972.16, 33.2.31) indicates that the MD between instruments and reference method values should be ≤0.05% for fat, protein, and lactose, and ≤0.09% for total solids, but gives no estimates of method performance for comparison of results among different instruments that are calibrated with different reference samples. The information on method performance for AOACI method 972.16 was collected before the current guidelines for collaborative studies. The methodology for evaluation of method performance has evolved over the years, particularly in the late 1980s and early 1990s to its current status, where there are specific guidelines that are accepted internationally for conducting collaborative studies and calculation of within- and between-laboratory metrics of method performance (AOACI 2000, Appendix D: Guidelines for collaborative study procedures to validate characteristics of a method of analysis).

State Regulations 

The states of New York and Wisconsin have specified procedures for electronic testing of milk components that give method performance limits for MD between instrument and reference chemistry. The New York state regulations (New York State Department of Agriculture and Markets, 2005) specify that 20 samples be used for the calibration of fat over the range of 3.0 to ≥4.5%, and have no specifications for calibration of other milk components. The Wisconsin regulations (Wisconsin Administrative Code, 2005) specify a calibration set of 12 individual herd samples with a fat range of at least 2.5 to 5.0%, a protein range of at least 2.7 to 3.4%, and a total solids range of at least 11 to 13%. The New York regulations specify the calibration performance limits at an MD (between reference values and instrument values) of ≤0.02 and an SDD of ≤0.04 for fat and true protein. The Wisconsin regulations set the performance limits for the average MD (between reference value and instrument values) of triplicate analyses at ±0.044% for fat and protein, and ±0.084% for total solids. Neither set of regulations provided data to support these performance limits.

USDA Federal Milk Market Laboratories 

The laboratories of the USDA Federal Milk Marketing Orders and their affiliated laboratories have been engaged in a long-term research program to improve the accuracy of milk component measurement for use in producer payment testing. The program initially focused on systematic improvement of the accuracy of chemical reference methods that are used as the basis for calibration of MIR milk analyzers. The Babcock (Barbano et al., 1988; Lynch et al., 1995, 1996, 1997a, 2003), ether extraction (Barbano et al., 1988; Lynch et al., 1996, 1997a, 2003), Kjeldahl (Barbano et al., 1990, 1991; Lynch et al., 1998; Lynch and Barbano, 1999), and total solids methods (Clark et al., 1989a,b) for milk analysis have been optimized to improve their within- (sr) and between- (sR) laboratory performance, and the method performance values are included in the AOACI (2000) method descriptions (Babcock method 989.04, 33.2.27; ether extraction method 989.05, 33.2.26; Kjeldahl CP method 991.20, 33.2.11; Kjeldahl NPN method 992.21, 33.2.12; Kjeldahl true protein method 991.22, 33.2.13; Kjeldahl casein nitrogen method 998.06, 33.2.65; total solids method 990.20, 33.2.44; solids-not-fat method 990.21, 33.2.45). Periodically, the USDA Federal Milk Markets have published the results of proficiency testing for some of these methods (Lynch et al., 1994, 1997b).

For several years, the USDA Federal Milk Market Laboratories have conducted proficiency tests of MIR milk analyzer performance in their laboratories using 7 unknown milk samples in blind duplicate 6 times each year. The values for sr and sR for MIR milk analyzers have been calculated and are summarized in Table 4 for the month of November for the period 1999 through 2003. In general, the within-laboratory repeatability is excellent, as indicated by an sr that is routinely below 0.01% for fat and protein. However, the between-laboratory agreement (sR) is usually 2 to 3 times larger than the within-laboratory agreement (sr). In contrast, for the chemical reference methods (AOACI, 2000) for fat (ether extraction method 989.05, 33.2.26) and protein (Kjeldahl method 991.20, 33.2.11), the between-laboratory agreement (sR) is consistently less than 2 times the within-laboratory agreement (sr).

Table 4. Mean repeatability standard deviation (sr) and reproducibility standard deviation (sR) of unknown milk samples (n = 7 sample materials in blind duplicate) used in bimonthly laboratory proficiency testing of mid-infrared milk analyzers for selected months between 1999 and 20031
                           Fat             Protein         Total solids
Year²  No. of instruments  sr      sR      sr      sR      sr      sR
1999   15                  0.0080  0.0216  0.0068  0.0155  0.0132  0.0400
2000   14                  0.0095  0.0244  0.0076  0.0166  0.0157  0.0411
2001   15                  0.0075  0.0259  0.0067  0.0207  0.0123  0.0285
2002   15                  0.0090  0.0228  0.0086  0.0248  0.0168  0.0504
2003   17                  0.0081  0.0209  0.0063  0.0232  0.0130  0.0351

1Statistical outliers removed using Cochran and Grubbs tests at α= 0.01.

2Data are for the month of November in each year.
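The ratio of sR to sr for the proficiency data can be checked directly from Table 4. The sketch below computes the ratio for each year and component; the data structure and names are hypothetical, and the values are copied from the table.

```python
# Table 4 values as (sr, sR) pairs per component, keyed by year.
TABLE4 = {
    1999: {"fat": (0.0080, 0.0216), "protein": (0.0068, 0.0155), "total solids": (0.0132, 0.0400)},
    2000: {"fat": (0.0095, 0.0244), "protein": (0.0076, 0.0166), "total solids": (0.0157, 0.0411)},
    2001: {"fat": (0.0075, 0.0259), "protein": (0.0067, 0.0207), "total solids": (0.0123, 0.0285)},
    2002: {"fat": (0.0090, 0.0228), "protein": (0.0086, 0.0248), "total solids": (0.0168, 0.0504)},
    2003: {"fat": (0.0081, 0.0209), "protein": (0.0063, 0.0232), "total solids": (0.0130, 0.0351)},
}

# Ratio of between-laboratory to within-laboratory standard deviation.
ratios = {
    year: {component: s_R / s_r for component, (s_r, s_R) in components.items()}
    for year, components in TABLE4.items()
}

for year, by_component in ratios.items():
    formatted = ", ".join(f"{name}: {value:.2f}" for name, value in by_component.items())
    print(year, formatted)
```

Every ratio in the table lies between 2 and 4, whereas the text reports ratios below 2 for the ether extraction and Kjeldahl reference methods.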

The larger ratio of sR to sr for MIR analysis compared with reference chemical analysis methods would indicate that the between-laboratory performance of MIR milk analysis can be improved. The larger difference between within-laboratory repeatability standard deviation (sr) and between laboratory reproducibility standard deviation (sR) can originate from laboratory-to-laboratory differences in the reference chemistry for calibration of the MIR milk analyzers and from the characteristics of the calibration sample sets (Kaylegian et al., 2006). However, it is not uncommon for the chemistry results from 2 laboratories to agree very well for fat and protein on a set of unknown samples and at the same time, their MIR instruments do not agree very well on the same samples. In this case, the cause for disagreement in MIR results would appear to be due to differences in the characteristics of the calibration sample sets between the 2 laboratories or other aspects of performance control of the individual MIR milk analyzers.

Between-Laboratory Performance: Producer vs. Modified Milk Calibration 

sR. The sR values (after outlier removal) for each sample in each of 3 validation sample sets and a grand mean for fat, anhydrous lactose, true protein, and total solids for the modified milk calibration and producer milk calibration approaches are reported in Tables 5 and 6, respectively. A total of 7 single Grubbs outliers were identified and removed from the modified milk validation data (Table 5) and 53 single Grubbs outliers were removed from the producer milk validation data (Table 6). No double Grubbs outliers were detected. The large difference in the number of statistical outliers between the 2 types of calibration sets underscores the susceptibility of instruments calibrated with producer milk samples to produce large variations on individual milk samples. The differences between modified milk and producer milk calibration sample sets in the number of high-leverage samples and the size of calibration regression confidence intervals were presented previously (Kaylegian et al., 2006), and are the likely cause of the high number of validation outlier values for instruments using producer milk calibration.
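To illustrate the kind of screening referred to above, the following Python sketch computes the classical single-value Grubbs statistic, G = max|x_i − mean| / s, for one sample measured on several instruments. This is a sketch under stated assumptions, not the authors' procedure: this section does not describe the exact Grubbs variant used, the readings are hypothetical, and the critical value must be taken from a published table for the chosen number of results and significance level.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, index): G = max|x_i - mean| / s and the index of that value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return 0.0, None  # all values identical, so nothing can be flagged
    index = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return abs(values[index] - m) / s, index


def single_grubbs_outlier(values, critical_value):
    """Return the index of the flagged value, or None if G <= critical_value.

    critical_value is supplied by the caller from a published Grubbs table;
    it is deliberately not hard-coded here.
    """
    g, index = grubbs_statistic(values)
    return index if g > critical_value else None


if __name__ == "__main__":
    # Hypothetical fat results (g/100 g) from 10 instruments on one sample.
    results = [4.08, 4.09, 4.07, 4.08, 4.10, 4.08, 4.09, 4.07, 4.08, 4.30]
    g, idx = grubbs_statistic(results)
    print(f"G = {g:.3f} for instrument index {idx}")
    # 2.29 is an illustrative critical value only; look up the tabulated
    # value for the actual n and alpha before using this in practice.
    print("flagged index:", single_grubbs_outlier(results, critical_value=2.29))
```

Passing the critical value in as a parameter keeps the sketch independent of any particular table or significance level.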

Table 5. Reproducibility standard deviations (sR) for validation samples (g/100 g) analyzed using mid-infrared instruments calibrated with modified milk samples (statistical outliers removed)

        Validation set 1                                        Validation set 2                                        Validation set 3
Sample  Fat           Lactose(1)    True protein  Total solids  Fat           Lactose       True protein  Total solids  Fat           Lactose       True protein  Total solids
1       0.0187        0.0090        0.0146        0.0277        0.0164        0.0169        0.0086        0.0241        0.0152        0.0078        0.0106        0.0235
2       0.0118        0.0129        0.0117        0.0287        0.0177        0.0096        0.0055        0.0209        0.0129        0.0128        0.0170        0.0227
3       0.0219        0.0082        0.0083        0.0306        0.0063a       0.0206a       0.0064a       0.0233        0.0083        0.0102        0.0125        0.0212
4       0.0110        0.0127        0.0084        0.0201        0.0167        0.0107        0.0064        0.0164        0.0092        0.0097        0.0062        0.0153
5       0.0147        0.0118        0.0182        0.0345        0.0145        0.0106        0.0229        0.0109a       0.0157        0.0097        0.0118        0.0238
6       0.0138        0.0087        0.0070        0.0098        0.0187        0.0127        0.0078        0.0264        0.0126        0.0054        0.0113        0.0164
7       0.0064        0.0075        0.0171        0.0224a       0.0143        0.0095        0.0069        0.0183        0.0132        0.0097        0.0066        0.0148
8       0.0142        0.0075        0.0056        0.0217        0.0114        0.0256        0.0044        0.0247        0.0121        0.0066        0.0095        0.0119
9       0.0227        0.0086        0.0148        0.0345        0.0145        0.0130        0.0093        0.0120        0.0155        0.0081        0.0102        0.0198
10      0.0186        0.0139        0.0134        0.0293        0.0139        0.0120        0.0087        0.0265        0.0280        0.0179        0.0165        0.0131a
11      0.0108        0.0120        0.0083        0.0266        0.0153        0.0131        0.0077        0.0206        0.0284        0.0119        0.0078        0.0289
12      0.0232        0.0180        0.0192        0.0288        0.0260        0.0211        0.0163        0.0356        0.0138        0.0061        0.0127        0.0217
Mean    0.0157        0.0109        0.0122        0.0262        0.0155        0.0146        0.0093        0.0216        0.0154        0.0097        0.0111        0.0194

a = Single Grubbs outliers removed.

(1) Lactose reported as anhydrous lactose in all validation sets.

Table 6. Reproducibility standard deviation (sR) for validation samples (g/100 g) analyzed using mid-infrared instruments calibrated with producer milk samples (statistical outliers removed)

        Validation set 1                                        Validation set 2                                        Validation set 3
Sample  Fat           Lactose(1)    True protein  Total solids  Fat           Lactose       True protein  Total solids  Fat           Lactose       True protein  Total solids
1       0.0275        0.0263a       0.0476        0.0354a       0.0422a       0.0289a       0.0214        0.0463        0.0326        0.0299a       0.0276        0.0500
2       0.0229        0.0165a       0.0249        0.0931        0.0286a       0.0164a       0.0251        0.0383        0.0535        0.0763a       0.0839        0.0498a
3       0.0142        0.0173a       0.0187        0.0335a       0.0350        0.0178a       0.0238        0.0396        0.0314        0.0276a       0.0164        0.0334
4       0.0238        0.0157a       0.0178        0.0396a       0.0311a       0.0206a       0.0238        0.0223a       0.0298        0.0253a       0.0254        0.0470
5       0.0220        0.0175a       0.0271        0.0374a       0.0398        0.0250a       0.0493        0.0438        0.0265a       0.0302a       0.0106        0.0433a
6       0.0258        0.0200a       0.0282        0.0460a       0.0289        0.0237a       0.0315        0.0501        0.0180        0.0200a       0.0152        0.0345
7       0.0216        0.0238a       0.0318        0.0401a       0.0287        0.0170a       0.0334        0.0474        0.0265        0.0217a       0.0116        0.0453
8       0.0209        0.0224a       0.0208        0.0967        0.0320        0.0237a       0.0313        0.0517        0.0336        0.0286a       0.0267        0.0495
9       0.0196        0.0156a       0.0193        0.0365a       0.0284        0.0242a       0.0351        0.0553        0.0319        0.0246a       0.0242        0.0482
10      0.0219        0.0206a       0.0281        0.0386a       0.0267        0.0199a       0.0281        0.0487        0.0301        0.0377a       0.0331        0.0692
11      0.0295        0.0164a       0.0165        0.0414a       0.0278        0.0277a       0.0393        0.0601        0.0218        0.0227a       0.0175        0.0454
12      0.0537        0.0482a       0.0404        0.0951        0.0393        0.0397a       0.0419        0.0688        0.0250        0.0168a       0.0208        0.0514
Mean    0.0253        0.0217        0.0268        0.0528        0.0324        0.0237        0.0320        0.0477        0.0301        0.0301        0.0261        0.0473

a = Single Grubbs outliers removed.

(1) Lactose reported as anhydrous lactose in all validation sets.

Calibration of MIR analyzers with modified milk reduced the mean sR by 46, 52, 61, and 55% for fat, lactose, true protein, and total solids, respectively, compared with calibration with producer milk (Table 7). Compared with the long-term between-laboratory performance (i.e., sR) of MIR analyzers calibrated with producer milks (Table 4), the modified milk calibration (Table 7) improved performance by reducing sR from an average of 0.0231, 0.0202, and 0.0390 to 0.0155, 0.0109, and 0.0224 for fat, protein, and total solids, respectively. A useful way to express the sR value for the analyst is to convert it to the reproducibility value (R-value), which is calculated as sR × 2.8. The R-value indicates that 95% of the time the analysis of an unknown milk sample by 2 laboratories using the method (in this case, MIR) will not differ by more than the R-value, assuming the sample was at the correct temperature and properly mixed at the time of analysis. The overall mean R-values for all validation sets for fat, anhydrous lactose, true protein, and total solids for validation samples analyzed using the modified milk calibration were 0.043, 0.033, 0.030, and 0.063%, respectively, and for producer milk calibrated instruments, the mean R-values were 0.082, 0.071, 0.079, and 0.138%, respectively. These between-laboratory statistical performance values for a method are useful in setting practical guidelines for use in the verification of accuracy on individual validation samples.
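The conversion from sR to the R-value described above is a single multiplication. The Python sketch below applies it to the rounded mean sR values in the "Mean" row of Table 7; the function names and the dictionary layout are assumptions made for this example. Because the tabulated means are rounded, a recomputed R-value can differ from the figures quoted in the text by about 0.001.

```python
# Mean sR values (g/100 g) copied from the "Mean" row of Table 7.
# The dictionary layout is an assumption made for this sketch.
MEAN_SR = {
    "modified": {"fat": 0.0155, "lactose": 0.0119,
                 "true protein": 0.0109, "total solids": 0.0226},
    "producer": {"fat": 0.0294, "lactose": 0.0254,
                 "true protein": 0.0284, "total solids": 0.0493},
}

R_FACTOR = 2.8  # multiplier stated in the text: R = sR x 2.8


def r_value(s_R):
    """Return the reproducibility value R for a given sR."""
    return R_FACTOR * s_R


def within_r(result_a, result_b, s_R):
    """Return True if two laboratories' results differ by no more than R."""
    return abs(result_a - result_b) <= r_value(s_R)


if __name__ == "__main__":
    for calibration, components in MEAN_SR.items():
        for component, s_R in components.items():
            print(f"{calibration:9s} {component:13s} R = {r_value(s_R):.4f}")
    # Hypothetical fat results (%) from two laboratories on the same sample.
    print(within_r(3.95, 3.98, MEAN_SR["modified"]["fat"]))
```

A check such as `within_r` is one way the R-value could be used as a practical guideline for comparing two laboratories' results on a single validation sample.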

Table 7. Comparison of reproducibility standard deviations (sR) for validation samples (g/100 g) analyzed using mid-infrared instruments calibrated with modified milk and producer milk samples

           sR for modified milk calibration                        sR for producer milk calibration
Replicate  Fat           Lactose(1)    True protein  Total solids  Fat           Lactose       True protein  Total solids
1          0.0157        0.0109        0.0122        0.0262        0.0253        0.0217        0.0268        0.0528
2          0.0155        0.0146        0.0093        0.0216        0.0324        0.0237        0.0320        0.0477
3          0.0154        0.0097        0.0111        0.0194        0.0301        0.0301        0.0261        0.0473
Mean       0.0155        0.0119        0.0109        0.0226        0.0294        0.0254        0.0284        0.0493

           Reduction of sR(2)                                      Reduction of sR, %
Replicate  Fat           Lactose(1)    True protein  Total solids  Fat           Lactose       True protein  Total solids
1          −0.0096       −0.0108       −0.0146       −0.0266       38.1          49.8          54.4          50.3
2          −0.0169       −0.0091       −0.0228       −0.0261       52.2          38.3          71.1          54.6
3          −0.0147       −0.0205       −0.0150       −0.0278       48.8          67.9          57.5          58.9
Mean       −0.0137       −0.0135       −0.0175       −0.0268       46.4          52.0          61.0          54.6

(1) Lactose reported as anhydrous lactose in all validation sets.

(2) Reduction = sR for modified milk calibration − sR for producer milk calibration; negative values indicate a lower sR with modified milk calibration.

Regulatory Verification of the Accuracy of Instrument Performance 

The goal of regulatory verification of instrument performance is to detect when a milk payment testing instrument is out of compliance with a standard for accuracy on a set of unknown samples. When is the difference sufficiently large to warrant a required adjustment in instrument calibration? Small differences in milk component tests can have significant economic impacts in payment testing (Lynch et al., 2004).

The current practice for both state regulatory agencies and USDA Federal Milk Market Laboratories is to have a laboratory test a set of verification samples with their MIR milk analyzer. The set of verification samples has been analyzed, usually by one regulatory laboratory, using chemistry methods. It is clear from the results in Table 8 that there is some degree of uncertainty when only one laboratory's chemistry results are used for reference. The regulatory agency calculates the MD and SDD between their reference chemistry values for the samples and the instrument values. If the MD exceeds a specified tolerance limit, then the laboratory may be required to adjust its instrument and possibly some of the past test results. Are there practical approaches to MIR milk analyzer calibration and verification that would allow the industry to achieve improved accuracy without excessive cost?
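The MD and SDD check described above can be sketched in Python as follows. This is a minimal illustration, reading MD as the mean of the paired differences and SDD as the standard deviation of those differences, with the difference taken as instrument minus reference (an assumed sign convention). The sample values and the tolerance are hypothetical and do not represent any regulatory limit.

```python
from statistics import mean, stdev


def md_and_sdd(instrument, reference):
    """Return (MD, SDD) for paired instrument and reference chemistry values.

    Differences are instrument - reference (assumed sign convention).
    """
    if len(instrument) != len(reference):
        raise ValueError("instrument and reference values must be paired")
    differences = [i - r for i, r in zip(instrument, reference)]
    return mean(differences), stdev(differences)


def exceeds_tolerance(md, tolerance):
    """Return True if the absolute mean difference is above the tolerance."""
    return abs(md) > tolerance


if __name__ == "__main__":
    # Hypothetical fat values (%) for four verification samples.
    reference = [3.50, 3.80, 4.10, 4.40]
    instrument = [3.52, 3.81, 4.13, 4.42]
    md, sdd = md_and_sdd(instrument, reference)
    print(f"MD = {md:.4f}, SDD = {sdd:.4f}")
    # 0.015 is a made-up tolerance used only to exercise the function.
    print("adjustment indicated:", exceeds_tolerance(md, tolerance=0.015))
```

Note that any bias in the single laboratory's reference chemistry passes directly into MD, which is the source of uncertainty discussed in this section.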

Table 8. Comparison of all-laboratory mean chemistry and single laboratory reference chemistry values for the validation sets (all values in %)

              Validation set 1                                        Validation set 2                                        Validation set 3
              Fat           Lactose(1)    True protein  Total solids  Fat           Lactose       True protein  Total solids  Fat           Lactose       True protein  Total solids
All-lab mean  4.0839        4.5103        3.2784        13.0175       3.9799        4.5692        3.2300        12.9098       3.9542        4.5626        3.1914        12.8470
Single lab with outliers removed(2)
Highest       4.0949        4.5502        3.2982        13.0339       3.9954        (3)           3.2494        12.9320       3.9690        4.5794        3.2096        12.8547
Lowest        4.0782        4.4727        3.2661        13.0007       3.9732        (3)           3.2228        12.8954       3.9409        4.5632        3.1845        12.8316
Range         0.0167        0.0775        0.0321        0.0332        0.0222        (3)           0.0266        0.0366        0.0281        0.0162        0.0251        0.0231
Single lab with no outliers removed
Highest       4.0949        4.5502        3.2982        13.0339       3.9954        4.6072        3.2494        12.9343       3.9690        4.6104        3.2096        12.8709
Lowest        4.0769        4.4727        3.2661        12.9977       3.9732        4.5363        3.2010        12.8835       3.8802        4.5337        3.1557        12.7981
Range         0.0180        0.0775        0.0321        0.0362        0.0222        0.0709        0.0484        0.0507        0.0888        0.0767        0.0539        0.0728

(1) Lactose reported as anhydrous lactose in all validation sets.

(2) Single Grubbs outlier removed.

(3) Only 1 out of 7 laboratories had no outlier values.

One approach to minimize the impact of laboratory-to-laboratory variation in reference chemistry is to have a set of validation samples with all-laboratory mean reference chemistry for each sample (Table 1). However, if every USDA Federal Milk Market Laboratory or state regulatory laboratory had to produce milk sample sets for instrument accuracy verification that required a network of laboratories to run chemistry on all samples, the cost would be high. The USDA Federal Milk Market Administrator Laboratories maintain and calibrate MIR milk analyzers in their own laboratories for testing. The data in Kaylegian et al. (2006) and the validation results presented in Figures 1 to 4 and Table 3 indicate that a single common set of modified milk calibration samples with all-laboratory mean chemistry would allow this group of laboratories to achieve closer agreement with each other on MIR milk analysis, and better agreement of their MIR milk analysis with reference chemistry methods, than the same laboratories achieve with each other's chemistry on these samples. However, these laboratories still have a need to use producer milk samples (with established reference values) in the field to verify the accuracy of performance of instruments in industry laboratories. Currently, most regulatory laboratories would test the validation samples that they make in their laboratory using their own chemistry. The level of laboratory-to-laboratory uncertainty (with and without outliers removed) in the mean reference chemistry values for 3 different sets of 12 verification samples, if each laboratory were running its own chemistry, is shown in Table 8. If only one laboratory were running chemistry on these validation samples, then the values without outliers removed are the values of interest.
For example, in validation set 3 for fat (Table 8), a single laboratory running reference chemistry on these samples and using them for validation of instrument performance in the field could have had a mean fat test for the set that was 0.074% lower than the all-laboratory mean (3.8802 vs. 3.9542) without realizing that its results were low. One cost-effective strategy to reduce the risk of this happening in the production and use of validation samples is as follows: use a group (i.e., 8 or more) of MIR milk analyzers that have been calibrated with modified milk samples with all-laboratory mean chemistry reference values to test each validation sample, and establish an all-laboratory MIR instrument mean reference value (with statistical outliers removed) for each validation sample instead of using chemistry. This approach eliminates the uncertainty of using the chemistry from a single laboratory for validation samples without having all laboratories run chemistry tests (and incurring the chemical analysis cost) on these validation samples. Next, we will explore the feasibility of this unconventional approach using the data we have collected.
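The strategy outlined above, in which each validation sample receives an all-laboratory instrument mean as its reference value, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the data layout, the function name, and the readings are hypothetical, and outlier detection is assumed to have been done separately, with flagged readings passed in as (sample, instrument index) pairs.

```python
from statistics import mean

MIN_INSTRUMENTS = 8  # the text suggests a group of 8 or more analyzers


def instrument_reference_values(readings, flagged=frozenset()):
    """Return {sample_id: mean of the retained instrument readings}.

    readings: {sample_id: [reading from instrument 0, 1, 2, ...]}
    flagged:  set of (sample_id, instrument_index) pairs to exclude,
              e.g. readings identified by a separate outlier test.
    """
    reference = {}
    for sample_id, values in readings.items():
        kept = [v for i, v in enumerate(values) if (sample_id, i) not in flagged]
        if len(kept) < MIN_INSTRUMENTS:
            raise ValueError(f"sample {sample_id}: fewer than "
                             f"{MIN_INSTRUMENTS} readings retained")
        reference[sample_id] = mean(kept)
    return reference


if __name__ == "__main__":
    # Hypothetical fat readings (%) for one sample on 9 instruments,
    # with the last reading assumed to have been flagged as an outlier.
    readings = {"S1": [4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.08, 4.30]}
    print(instrument_reference_values(readings, flagged={("S1", 8)}))
    print(instrument_reference_values(readings))
```

Requiring a minimum number of retained readings guards against a reference value that rests on too few instruments once outliers have been dropped.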

All-Laboratory Mean Reference Chemistry vs. Instrument-Predicted Reference Values 

The all-laboratory mean reference chemistry values and the all-laboratory instrument mean values (calculated from instruments calibrated with modified milk samples) for the validation sets used in this study are shown in Table 9. The mean difference between all-laboratory mean instrument and reference chemistry values across the 3 validation sets was approximately ±0.0060 for fat, anhydrous lactose, and true protein, and −0.0174 for total solids (Table 9). The instrument all-laboratory mean was a much better predictor of all-laboratory mean reference chemistry on the validation samples (Table 9) than many individual laboratory reference chemistry values (Table 8). There were statistically significant differences (P < 0.05) between the all-laboratory mean instrument value and the all-laboratory mean reference chemistry value for some components and validation sets (Table 9). However, the least significant difference values were small (<0.0068%), and this is better agreement with the all-laboratory mean chemistry than that achieved by comparison of most individual laboratories' chemistry with the all-laboratory mean chemistry (Table 8). From a practical point of view, the substitution of the all-laboratory instrument mean value for individual laboratory reference chemistry on producer milk validation samples could improve regulatory verification of instrument accuracy for fat, lactose, protein, and total solids testing in a more cost-effective manner than obtaining all-laboratory mean chemistry values on producer milk validation sample sets used in various regions of the country. A more costly, but more conventional and acceptable, approach from a regulatory perspective would be to have a common set of calibration samples and a common set of validation samples used by all regulatory laboratories, with all-laboratory mean chemistry values for both sample sets.

Table 9. All-laboratory mean values (g/100 g) for chemistry and instrument (modified milk calibration) prediction of fat, true protein, anhydrous lactose, and total solids of validation samples(1)

Component          Validation set  All-lab mean, chemistry  All-lab mean, instrument  Difference(2)  LSD(3)
Fat                1               4.0839a                  4.0760b                   −0.0079        0.0038
                   2               3.9799a                  3.9634b                   −0.0165        0.0042
                   3               3.9542b                  3.9607a                   0.0065         0.0040
                   Mean            4.0060                   4.0000                    −0.0060
Anhydrous lactose  1               4.5103                   4.5159                    0.0056         0.0061
                   2               4.5692                   4.5649                    −0.0043        0.0068
                   3               4.5626b                  4.5797a                   0.0171         0.0044
                   Mean            4.5474                   4.5535                    0.0061
True protein       1               3.2784a                  3.2684b                   −0.0100        0.0038
                   2               3.2300                   3.2280                    −0.0020        0.0048
                   3               3.1914                   3.1874                    −0.0040        0.0045
                   Mean            3.2333                   3.2279                    −0.0053
Total solids       1               13.0175a                 12.9936b                  −0.0239        0.0062
                   2               12.9098a                 12.8893b                  −0.0205        0.0064
                   3               12.8470a                 12.8393b                  −0.0077        0.0051
                   Mean            12.9248                  12.9074                   −0.0174

a,b Chemistry and instrument means in the same row with different superscripts are different (P < 0.05).

(1) Instrument mean values determined for all components by 9 instruments.

(2) Difference = instrument mean value − chemistry mean value.

(3) LSD = Least significant difference (P < 0.05).
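The comparison summarized in Table 9 can be checked with a short Python sketch that treats a difference as significant when its absolute value exceeds the tabulated LSD. The data layout and function names are assumptions made for this example; the numbers are copied from Table 9, with the difference taken as instrument mean minus chemistry mean.

```python
from statistics import mean

# (chemistry mean, instrument mean, LSD) for validation sets 1 to 3,
# copied from Table 9. The layout is an assumption made for this sketch.
TABLE_9 = {
    "fat": [(4.0839, 4.0760, 0.0038), (3.9799, 3.9634, 0.0042),
            (3.9542, 3.9607, 0.0040)],
    "anhydrous lactose": [(4.5103, 4.5159, 0.0061), (4.5692, 4.5649, 0.0068),
                          (4.5626, 4.5797, 0.0044)],
    "true protein": [(3.2784, 3.2684, 0.0038), (3.2300, 3.2280, 0.0048),
                     (3.1914, 3.1874, 0.0045)],
    "total solids": [(13.0175, 12.9936, 0.0062), (12.9098, 12.8893, 0.0064),
                     (12.8470, 12.8393, 0.0051)],
}


def differences(component):
    """Return instrument mean - chemistry mean for each validation set."""
    return [instrument - chemistry
            for chemistry, instrument, _ in TABLE_9[component]]


def significant(component):
    """Return, per validation set, whether |difference| exceeds the LSD."""
    return [abs(instrument - chemistry) > lsd
            for chemistry, instrument, lsd in TABLE_9[component]]


if __name__ == "__main__":
    for component in TABLE_9:
        print(component,
              "mean difference =", round(mean(differences(component)), 4),
              "significant by set =", significant(component))
```

Applied to the tabulated values, this flags the same validation sets that carry different superscripts in Table 9.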


Conclusions 

Calibration of MIR milk analyzers using modified milk samples increased the accuracy of testing and improved agreement between laboratories on independent sets of validation milk samples compared with MIR analyzers calibrated with producer milk samples. Calibration with modified milk samples reduced the mean ED from all-laboratory mean reference chemistry on validation samples by 40, 25, 36, and 27%, respectively, for fat, anhydrous lactose, true protein, and total solids. The number of single Grubbs statistical outliers in the validation data was much higher for instruments calibrated with producer milk samples (53) than for instruments calibrated with modified milk samples (7). The sR for instruments calibrated with producer milk samples (with statistical outliers removed) was similar to data collected in recent proficiency studies conducted by the USDA Federal Milk Market Laboratories, as expected, whereas the sR for instruments calibrated with modified milk samples was lower than that for instruments calibrated with producer milk samples by 46, 52, 61, and 55%, respectively, for fat, anhydrous lactose, true protein, and total solids.


Acknowledgments 

The authors would like to thank the staff of all the USDA Federal Milk Market laboratories and affiliated laboratories for their collaboration and sample analysis in this work. The technical assistance in sample preparation by Maureen Chapman, Laura Landolf, Bob Kaltaler, Mark Schweisthal, and Pat Wood was important for the success of this project. The authors thank the Test Procedures Committee of the USDA, Dairy Programs, Federal Milk Markets for their financial support of this research.



References 

  1. Association of Official Analytical Chemists International (AOACI). Official Methods for Analysis. Gaithersburg, MD: AOACI; 2000
  2. Barbano DM, Clark JL. Infrared milk analysis – Challenges for the future. J. Dairy Sci. 1989;72:1627–1636
  3. Barbano DM, Clark JL, Dunham CE. Comparison of the Babcock and ether extraction methods for determination of fat content in milk: Collaborative study. J. AOACI. 1988;71:898–914
  4. Barbano DM, Clark JL, Dunham CE, Fleming JR. Kjeldahl method for determination of total nitrogen content of milk: Collaborative study. J. AOACI. 1990;73:849–859
  5. Barbano DM, Lynch JL, Fleming JR. Direct and indirect determination of true protein content of milk by Kjeldahl analysis: Collaborative study. J. AOACI. 1991;74:281–288
  6. Biggs DA, Johnsson G, Sjaunja L-O. Analysis of fat, protein, lactose, and total solids by infrared absorption. Monograph on Rapid Indirect Methods for Measurement of the Major Components of Milk. Bull. Int. Dairy Fed. No. 208. Brussels, Belgium: International Dairy Federation; 1987;Pages 21–30
  7. Clark JL, Barbano DM, Dunham CE. Comparison of two methods for determining total solids content of raw milk: Collaborative study. J. AOACI. 1989;72:712–718
  8. Clark JL, Barbano DM, Dunham CE. Combination of total solids determined by oven drying and fat determined by Mojonnier extraction for measurement of solids-not-fat content of raw milk: Collaborative study. J. AOACI. 1989;72:719–724
  9. Ginn RE, Packard VS. A study of the accuracy of infrared milk component analysis in DHIA laboratories. Dairy Food Environ. Sanit. 1989;9:61–64
  10. Kaylegian KE, Houghton GE, Lynch JM, Fleming JR, Barbano DM. Calibration of infrared milk analyzers: Modified milk versus producer milk. J. Dairy Sci. 2006;89:2817–2832
  11. Lynch JM. Use of AOAC International method performance statistics in the laboratory. J. AOACI. 1998;81:679–684
  12. Lynch JM, Barbano DM. Kjeldahl nitrogen analysis as a reference method for protein determination in dairy products. J. AOACI. 1999;82:1389–1398
  13. Lynch JM, Barbano DM, Fleming JR. Variation in the ash and nonprotein nitrogen content of milk, and use of milk protein content to predict ash content. J. Dairy Sci. 1990;73(Suppl.1):92;(Abstr.)
  14. Lynch JM, Barbano DM, Fleming JR. Comparison of Babcock and ether extraction methods for determination of fat content of cream: Collaborative study. J. AOACI. 1996;79:907–916
  15. Lynch JM, Barbano DM, Fleming JR. Modification of Babcock method to eliminate fat testing bias between the Babcock and ether extraction methods (modification of AOAC official methods 989.04 and 995.18): Collaborative study. J. AOACI. 1997;80:845–859
  16. Lynch JM, Barbano DM, Fleming JR. Indirect and direct determination of the casein content of milk by Kjeldahl nitrogen analysis: Collaborative study. J. AOACI. 1998;81:763–774
  17. Lynch JM, Barbano DM, Fleming JR, Nicholson D. Component testing, the dairy industry, and AOAC International. Inside Laboratory Management (publication of AOACI). 2004;July/Aug:25–28
  18. Lynch JM, Barbano DM, Healy PA, Fleming JR. Performance evaluation of the Babcock and ether extraction methods: 1989 through 1992. J. AOACI. 1994;77:976–981
  19. Lynch JM, Barbano DM, Healy PA, Fleming JR. Performance evaluation of direct forced-air total solids and Kjeldahl total nitrogen methods: 1990 through 1995. J. AOACI. 1997;80:1038–1043
  20. Lynch JM, Barbano DM, Healy PA, Fleming JR. Effectiveness of temperature modification in decreasing the bias in milk fat test results between the Babcock and ether extraction methods. J. AOACI. 2003;86:768–774
  21. Lynch JM, Barbano DM, Houghton GE, Fleming JR. Babcock bottle certification apparatus: Performance evaluation. J. AOACI. 1995;78:463–471
  22. Lynch JM, Barbano DM, Schweisthal M, Fleming JR. Precalibration evaluation procedures for mid-infrared milk analyzers. J. Dairy Sci. 2006;89:2761–2774
  23. Massart DL, Vandeginste BGM, Deming SM, Michotee Y, Kaufman L. Clustering techniques. Chemometrics: A textbook, Data Handling in Science and Technology. Vol. 2. New York, NY: Elsevier; 1988;Pages 371–375
  24. New York State Department of Agriculture and Markets. 2005. Pages 1 to 5 in Bulletin: Electronic and Other Methods of Testing for Component Content. NYS Department of Agriculture and Markets. Division of Milk Control, 1 Winners Circle. Albany, NY.
  25. Wisconsin Administrative Code. Dairy Plants. Agric., Trade & Consumer Protection, Chapter 80. Wisconsin State Register; 2005. http://www.legis.state.wi.us/rsb/code/atcp/atcp080.pdf

PII: S0022-0302(06)72556-5

doi:10.3168/jds.S0022-0302(06)72556-5
