Simultaneous detection for adulterations of maltodextrin, sodium carbonate, and whey in raw milk using Raman spectroscopy and chemometrics

To achieve rapid on-site identification of raw milk adulteration and simultaneously quantify the levels of various adulterants, we combined Raman spectroscopy with chemometrics to detect 3 of the most common adulterants. Raw milk was artificially adulterated with maltodextrin (0.5–15.0%; wt/wt), sodium carbonate (10–100 mg/kg), or whey (1.0–20.0%; wt/wt). Partial least square discriminant analysis (PLS-DA) classification and partial least square (PLS) regression models were established using Raman spectra of 144 samples, of which 108 were used for training and 36 for validation. A model with excellent performance was obtained by spectral preprocessing with the first derivative and variable selection optimization using variable importance in the projection. The classification accuracy of the PLS-DA model was 95.83% for maltodextrin, 100% for sodium carbonate, 95.84% for whey, and 92.25% for pure raw milk. The PLS model had a detection limit of 1.46% for maltodextrin, 4.38 mg/kg for sodium carbonate, and 2.64% for whey. These results suggest that Raman spectroscopy combined with PLS-DA and PLS models can rapidly and efficiently detect maltodextrin, sodium carbonate, and whey adulterants in raw milk.


INTRODUCTION
In light of the growing range of milk products (such as milk powder, yogurt, and cheese) and the rising consumption of dairy products, the quality and safety of raw milk are regarded as the cornerstone for the development of milk-based products (He et al., 2019). Milk adulteration with various chemicals such as melamine, caustic soda, formalin, and hydrogen peroxide is still very common, even though these chemicals can cause serious health problems (Kamal and Karoui, 2015; Abdallah Musa Salih and Yang, 2017; Poonia et al., 2017; He et al., 2019). The purposes of adulteration are to increase product volume, to compensate for factors that make the product undesirable for consumption, to enhance the content of protein and fat, and to improve economic benefits (Moore et al., 2012; Cattaneo and Holroyd, 2013; Santos et al., 2013). The addition of water to raw milk can be masked by adding a thickener, such as maltodextrin (Tronco, 2010; Capuano et al., 2015; de Souza Gondim et al., 2015; Bergana et al., 2019). The lactic acid created during long-term storage can significantly affect the quality of raw milk; therefore, neutralizers such as sodium carbonate or sodium bicarbonate are added to reduce acidity (de Souza Gondim et al., 2015; Chakraborty and Biswas, 2018). In Brazil, the addition of cheese whey to milk has been widely reported to increase the volume and fat content without significantly altering the sensory characteristics (Aquino et al., 2014; Farah et al., 2021). Usually, different adulterants are not added to raw milk separately; for instance, an addition of whey decreases the density of raw milk, which requires maltodextrin to restore the density, and sodium carbonate may be added at the same time to extend shelf life.
However, most of the current literature on the detection of raw milk adulteration by Raman spectroscopy (RS) focuses on a single adulterant or on multiple adulterants of the same type. Confocal Raman microscopy and an artificial neural network have been used to quantify whey in milk (Alves da Rocha et al., 2015). Several studies have reported an effective screening method for detecting melamine, dicyandiamide, ammonium sulfate, and urea in milk (Nieuwoudt et al., 2016, 2017). Our work is the first in which 3 types of adulterants have been simultaneously detected by RS. Detecting only a single adulterant at a time is not efficient. In addition, a good detection method is key to improving the efficiency of adulteration detection. Liquid chromatography (MacMahon et al., 2012), infrared spectroscopy (Kene Ejeahalaka and On, 2020), front-face fluorescence spectroscopy (Ullah et al., 2020), and RS (Xu et al., 2020b) have been used to detect milk adulteration, among which RS is more popular because it is nondestructive, rapid, and does not require sample pretreatment.
Specifically, chemometric methods can provide data processing, variable selection, feature extraction, and pattern recognition. In contrast to other chemometric methods, support-vector machines and partial least square (PLS) discriminant analysis (PLS-DA) classification models are more accurate (Jiménez-Carvelo et al., 2017). However, support-vector machines are highly time-consuming algorithms and are prone to overfitting (Xu et al., 2020a). Conversely, PLS-DA is a model with simple operation and excellent performance; specifically, near-infrared spectroscopy combined with PLS-DA presented 100% sensitivity and specificity in calibration, cross-validation, and prediction when detecting water, urea, bovine whey, and cow milk in goat milk samples (Teixeira et al., 2020). Additionally, PLS is one of the methods most widely used with vibrational spectroscopy to quantify various components. Pereira et al. (2020) used near-infrared spectroscopy and PLS algorithms to quantify goat milk adulteration with cow milk, reaching a linear correlation coefficient of the cross-validation set (R_CV) of 0.9996 and a linear correlation coefficient of the prediction set (R_Pred) of 0.9955. Nevertheless, no studies have combined RS with PLS-DA and PLS to simultaneously identify and quantify adulterations of whey, maltodextrin, and sodium carbonate in milk samples.
Herein, we establish for the first time a method for the simultaneous identification of 3 types of adulterants (whey, maltodextrin, and sodium carbonate) in raw milk using RS with PLS-DA. Additionally, the concentration of adulterants is determined by a PLS quantitative model that offers efficient processing with low error. This strategy represents a rapid and promising analytical method to identify the type of substance used in the adulteration process and to predict different levels of adulteration.

Sample Collection and Preparation
Raw milk samples were collected from Shanghai No. 4 Dairy Product Factory Co. Ltd. over a 3-mo period (from September to December of 2020). The overall process was under standard quality control. The whey was prepared in the laboratory. First, 2 mL of a milk coagulating enzyme (MT2200) solution was added to 1.5 L of mixed raw milk, incubated at 40°C for 40 min, and then stirred slowly and continuously for 30 min. Finally, we filtered the mixture and collected the whey for later use. Maltodextrin was purchased from Adamas Reagent Co. Ltd., and sodium carbonate was purchased from Sigma-Aldrich Trading Co. Ltd. No animals were used in this study, and ethical approval for the use of animals was thus deemed unnecessary.

Raman Spectroscopy
For confocal Raman microscopic observations, 100 µL of each sample was pipetted onto a silicon chip and air dried for 10 min. Raman spectra were collected by a DXR laser micro-Raman spectrometer (Thermo Fisher Scientific) with a diode-pumped solid-state laser source (532 nm), which produced Raman scattering of irradiated molecules. The sample had to be brought into focus under the microscope before spectral collection. Because microscopic focusing was difficult to achieve with liquid milk, the milk droplets were dried first. The spectral region from 50 to 3,000 cm⁻¹ was recorded at a resolution of 0.2 nm, with 5 s of acquisition time and 100 mW of laser power. For each sample, 3 spectra were collected at 3 different points, and the average values were calculated. All measurements were carried out at room temperature.

Data Analysis
The data analysis workflow is shown in Figure 1, and principal component analysis (PCA) was carried out for outlier detection. Multivariate analysis of RS data was carried out using PLS-DA and PLS. Before building a classification model, we investigated the effects of various preprocessing methods on the classification model (Berzins et al., 2021). The first transformation applied to the Raman spectra was smoothing by median filtering, using a gap size of 3, because the spectra were noisy and exhibited systematic variations in the baseline. The first derivative (1st DER) was among the preprocessing methods then evaluated. The PLS-DA was performed to identify the adulterating substances added to raw milk, including whey, maltodextrin, or sodium carbonate. The PLS-DA was calibrated with combined regression and discriminant analyses. To do this, the whey, maltodextrin, sodium carbonate, and pure raw milk spectra were pooled to create a total of 144 spectra, which were then divided into sets of 108 training and 36 validation spectra according to the Kennard-Stone data set partitioning method (in both the training and validation sets, the ratio of adulterated to nonadulterated samples was 1:1; Monzón et al., 2019; Sun et al., 2021). A PLS regression model was then established based on the same data set.
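As an illustration of the Kennard-Stone partitioning mentioned above, the following is a minimal sketch in pure Python. It assumes Euclidean distance between spectra and the classic max-min selection rule; the function name `kennard_stone` and the toy data are hypothetical, and the study itself used MATLAB and SIMCA rather than this code.

```python
import math


def kennard_stone(samples, n_train):
    """Split sample indices into (train, test) with the Kennard-Stone rule.

    samples: list of equal-length numeric sequences (one per spectrum).
    n_train: number of samples to place in the training set (>= 2).
    """
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    # Pairwise Euclidean distances (assumed metric for this sketch).
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Start from the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]

    # Repeatedly add the candidate whose nearest selected sample is farthest.
    while len(selected) < n_train:
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)

    return selected, remaining


# Toy one-dimensional "spectra" (hypothetical values).
train_idx, test_idx = kennard_stone([(0.0,), (1.0,), (2.0,), (10.0,)], n_train=3)
```

With 144 spectra, calling the function with `n_train=108` would reproduce the 108/36 split size; keeping the 1:1 class ratio described in the text would require applying it within each class, which this sketch does not do.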

Model Performance Evaluation Method
The total classification accuracy (ACC, %), sensitivity or true positive rate (TPR), specificity or true negative rate (TNR), root mean square error (RMSE), and receiver operating characteristic (ROC) were used as the model performance evaluation indexes. ACC_T and ACC_P represent the classification accuracy of the training set and prediction set, respectively, and ACC is the arithmetic mean of ACC_T and ACC_P. TPR_T and TNR_T represent the sensitivity and specificity of the training set, respectively, and TPR_P and TNR_P represent the sensitivity and specificity of the prediction set, respectively. RMSE_CV is the root mean square error of cross-validation, which is used to evaluate the discriminant error of the training set, whereas RMSE_P is the root mean square error of prediction, which is used to evaluate the discriminant error of the validation set. The above parameters were calculated according to methods reported in the literature (Bassbasi et al., 2014).

$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
Receiver operating characteristic curves were drawn with the TPR (or sensitivity) as the ordinate and the false positive rate (1 − specificity) as the abscissa. The area under the curve (AUC) lies between 0.5 and 1.0. When the AUC was above 0.9, the model was considered to have high accuracy. Selecting different thresholds yields different sensitivity and specificity values for the model. The ideal classification method would produce a point in the upper left corner of the ROC space, where the sensitivity and specificity of the model are at a maximum; this is the best threshold point (Ballabio and Consonni, 2013).
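The evaluation indexes above can be sketched in a few lines of Python. This is a minimal illustration for a 2-class problem with labels 1 (positive) and 0 (negative); the function names are hypothetical, and the AUC is computed through the rank-based (Mann-Whitney) formulation, which is an assumption about how one might compute it rather than the procedure used in the study.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (TPR), and specificity (TNR) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": tp / (tp + fn),  # sensitivity
        "tnr": tn / (tn + fp),  # specificity
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and continuous model outputs.
metrics = confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```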

Software
Raman spectra were acquired using OMNIC-Atlµs image software from Thermo Fisher Scientific. Chemometric statistical analyses were carried out using MATLAB 2019b (MathWorks) and SIMCA version 14.1 (Umetrics).

Raman Spectra of Milk Samples and Adulterations
Figure 2(a) shows spectra of milk and the pure adulterants whey, sodium carbonate, and maltodextrin. The strongest band for milk appeared at 2,888 cm⁻¹, which was caused by superposition of vibrations at 2,900 cm⁻¹ (H-C asymmetric stretching) and 2,854 cm⁻¹ (H-C symmetric stretching) of lactose molecules (Pijls et al., 2016). The weaker peak near 1,646 cm⁻¹ was attributed to C=O stretching from amide I, associated with the COOH and COC deformation modes of phenylalanine (Almeida et al., 2011). The peak at 1,442 cm⁻¹ was due to C-H scissoring from lipid molecules (Pijls et al., 2016); meanwhile, at 1,300 cm⁻¹, there was a peak corresponding to SFA (Ullah et al., 2020). However, no significant differences were observed between the Raman spectra of milk and whey because of their similar chemical compositions [Figure 2(c)]. Sodium carbonate had a strong peak at 1,080 cm⁻¹, which was due to the C-O stretching vibration in CO₃²⁻. For maltodextrin, 2 bands were seen at 2,900 and 1,122 cm⁻¹ due to C-H and C-O stretching (Rodrigues Júnior et al., 2016).
The 1,122 and 2,900 cm⁻¹ bands seen in maltodextrin samples were near the 1,150 and 2,888 cm⁻¹ bands seen for milk. As a result, the bands overlapped in mixtures of maltodextrin and milk, resulting in imprecise peak height measurements of the spectra [Figure 2] ... its calculation process is complicated, particularly for maltodextrin and whey. However, peak heights at 1,080 and 2,900 cm⁻¹ can be used to analyze sodium carbonate content qualitatively and quantitatively in raw milk.

Spectral Preprocessing and Variable Selection
Two different classification models were considered: (1) four 2-class PLS-DA models, one for each of the 3 adulterated and 1 nonadulterated milk samples, and (2) one multiclass PLS-DA model for all sample types. First, the influence of different spectral preprocessing methods on the quality and discrimination ability of the multiclass PLS-DA model was examined. We found that these procedures significantly influenced the classification models, as reflected in variations in sensitivity and specificity (Table 1). The 1st DER, multiplicative signal correction, adjacent-averaging, and 1st DSG improved classification ability, whereas the orthogonal signal correction and 2nd DSG reduced classification ability, and the standard normal variate had no obvious effect on classification ability. It should be emphasized that after selection of the optimal signal preprocessing method, the whole set of spectra was processed in the same way. Detection conditions were optimized to ensure that the relative standard deviation of parallel samples remained below 10%. Finally, the data with 1st DER preprocessing were determined to provide the best sensitivity and specificity for the training sets and validation sets, compared with unprocessed spectra (Table 1). Therefore, all spectra (108 spectra of the training set and 36 spectra of the validation set) were preprocessed using the median filter together with the 1st DER method before being used for PLS analysis and PLS-DA (Figure 3).
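The selected preprocessing chain (median filter followed by first derivative) can be sketched as follows. This is a minimal illustration that assumes the "gap size of 3" corresponds to a 3-point sliding window and that the derivative is taken by finite differences along the Raman shift axis; the function names and the truncated-window edge handling are choices made for this sketch, not details given in the text.

```python
from statistics import median


def median_filter(intensity, window=3):
    """Sliding-window median; the window is truncated at the spectrum edges."""
    half = window // 2
    smoothed = []
    for i in range(len(intensity)):
        lo = max(0, i - half)
        hi = min(len(intensity), i + half + 1)
        smoothed.append(median(intensity[lo:hi]))
    return smoothed


def first_derivative(intensity, shift):
    """Finite-difference first derivative with respect to Raman shift.

    Central differences are used inside the spectrum and one-sided
    differences at the two end points.
    """
    n = len(intensity)
    deriv = [0.0] * n
    deriv[0] = (intensity[1] - intensity[0]) / (shift[1] - shift[0])
    deriv[-1] = (intensity[-1] - intensity[-2]) / (shift[-1] - shift[-2])
    for i in range(1, n - 1):
        deriv[i] = (intensity[i + 1] - intensity[i - 1]) / (shift[i + 1] - shift[i - 1])
    return deriv


# Hypothetical spectrum with a single-point spike that the filter removes.
shift = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0]
raw = [1.0, 1.0, 9.0, 1.0, 1.0]
preprocessed = first_derivative(median_filter(raw), shift)
```

A median filter suppresses isolated spikes without smearing them into neighboring points, and the first derivative removes constant baseline offsets, which matches the motivation given for these steps.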
Using a plot of variable importance in the projection (VIP), the determinant regions of the Raman spectra used by the PLS-DA classification model were identified. This curve is shown in Figure 4. In this study, VIP scores were used for data reduction, and the model established using variables with VIP >1 was found to have the highest classification accuracy. The RMSE_CV value was significantly lower for this model than for PLS-DA models lacking VIP variable selection. Therefore, subsequent studies used the 1st DER method to preprocess the spectral data, and spectral points with VIP >1 were selected for PLS-DA and PLS modeling.
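To make the VIP >1 selection concrete, the sketch below computes VIP scores for a single-response PLS model fitted with the NIPALS algorithm, using NumPy. It is an illustration under stated assumptions only: the study used SIMCA and MATLAB, the function name `pls1_vip` is hypothetical, and the commonly used VIP definition (weights squared, weighted by the response variance explained per latent variable) is assumed.

```python
import numpy as np


def pls1_vip(X, y, n_components):
    """VIP scores from a NIPALS PLS1 fit (single response variable)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)  # mean-centered predictors (residual matrix)
    yr = y - y.mean()        # mean-centered response (residual vector)
    n_vars = X.shape[1]

    W = np.zeros((n_vars, n_components))  # unit-norm weight vectors
    ss = np.zeros(n_components)           # response sum of squares explained

    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                 # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt          # X loadings
        q = (yr @ t) / tt          # y loading
        Xr = Xr - np.outer(t, p)   # deflate X
        yr = yr - q * t            # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt

    # VIP_j = sqrt(n_vars * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(n_vars * (W ** 2 @ ss) / ss.sum())


# Hypothetical data: only the first of 5 variables drives the response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=50)
vip = pls1_vip(X_demo, y_demo, n_components=2)
selected = np.flatnonzero(vip > 1.0)  # indices kept by the VIP > 1 rule
```

Because the squared VIP scores average to 1 across variables, a cutoff of 1 keeps the variables that contribute more than the average, which is the usual rationale for that threshold.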

PCA of the Raman Spectra Data of Samples
Principal component analysis is used in exploratory data analysis and for making predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the variation in the data as possible. The first principal component (PC1) can equivalently be defined as the direction that maximizes the variance of the projected data. The nth principal component can be taken as a direction orthogonal to the preceding n − 1 principal components that maximizes the variance of the projected data. Therefore, in spectral analysis, PCA projects the data points of all spectra into a low-dimensional representation space and calculates new variables called principal components, which are linear combinations of the original spectral data. First, PCA was used to process the spectra. Principal component analysis can determine the main characteristics of the spectra and highlight the relationship between descriptive variables. Seven principal components were extracted in a PCA of the 144 spectra. The contribution rates of PC1 and the second principal component (PC2) were 57.9 and 14.7%. The total contribution rate of the first 2 principal components was 71.6%. This indicated that the first 2 principal components contain most of the information about the variables, so they were selected for data visualization. Figure 5 shows the results on a 2-dimensional scatter plot, in which PC1 is on the x-axis and PC2 is on the y-axis. The adulterated raw milk samples were dispersed along the PC1 axis according to adulterant concentration, showing an obvious classification trend. However, the scores of many samples with low adulterant concentrations were close to those of the unadulterated raw milk samples. To better distinguish adulterated and unadulterated raw milk samples, a PLS-DA model was required. The PLS-DA model was trained with 108 samples, including 54 spectra without adulteration and 54 spectra from milk samples containing maltodextrin, whey, or sodium carbonate.
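The projection onto the first 2 principal components and the per-component contribution rates can be sketched with a singular value decomposition. This is a minimal NumPy illustration assuming mean-centered spectra and no further scaling; the function name `pca_scores` and the toy matrix are hypothetical and do not reproduce the study's data.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Return PCA scores and contribution rates via SVD of centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each spectral variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2
    contribution = variance / variance.sum()  # fraction of total variance
    scores = U[:, :n_components] * s[:n_components]
    return scores, contribution[:n_components]


# Hypothetical rank-1 data: all variance lies along one direction.
X_demo = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, contribution = pca_scores(X_demo, n_components=2)
```

Plotting the first column of `scores` against the second gives a scatter plot of the kind described for Figure 5, and summing the entries of `contribution` gives the total contribution rate of the retained components.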

PLS-DA for Screening of Adulterated Milk Samples
Partial least square discriminant analysis is a supervised statistical method of discriminant analysis that classifies research objects according to observed or measured variable values. In spectral analysis, PLS-DA can establish a model relating the signal intensity at each spectral point to the sample category, so as to predict the category of a sample. First, four 2-class PLS-DA models were established to investigate the effect of pairwise classification of the 4 sample types (one sample type as one category and the remaining 3 as another category).
The modeling process includes the selection of the optimal number of latent variables (LV). When the LV number is too large, the model is over-fitted; when the LV number is too small, the spectral information of the sample cannot be fully expressed. Both cases decrease the prediction ability of the model (Luo et al., 2019). Therefore, through cross-validation misclassification, the optimal number of LV (4 to 5) was selected for each PLS-DA adulteration discriminant model. We examined variations, TPR (or sensitivity), and TNR (or specificity) for the training set, as shown in Table 2. Among the four 2-class PLS-DA models, the TPR and TNR of the prediction set were 1; additionally, the sodium carbonate model had the highest classification accuracy, reaching 100% and indicating that the model had good prediction ability. The classification accuracies of the maltodextrin and whey models were 97.92 and 97.83%, respectively, indicating that these 2 classification models could classify effectively. For the classification model of pure raw milk samples, the accuracy was 97.02%, slightly lower than that of the other 3 classification models. Therefore, RS combined with PLS-DA can identify and accurately distinguish the 4 sample types using 2-class models.
Furthermore, we investigated whether RS combined with chemometric data analysis can successfully identify multiple adulterants in an unknown raw milk sample. To answer this question, a multiclass PLS-DA model was established to simultaneously detect maltodextrin, sodium carbonate, and whey adulteration in raw milk. Using cross-validation misclassification, the optimal number of LV was 8 for the multiclass PLS-DA adulteration discrimination model, as shown in Table 3. The multiclass PLS-DA model can incorporate up to 4 milk samples. The ROC curve of the model is shown in Figure 6. The AUC values of maltodextrin-, whey-, and sodium carbonate-adulterated milk samples were greater than 0.99, indicating that the model was very effective. The upper left corner of the ROC represents maximum sensitivity and specificity, corresponding to the intersection point of the sensitivity-specificity curve. The corresponding threshold at this point is the optimal threshold point of the model. A deviation from the optimal threshold point was seen for adulterated raw milk and pure raw milk. Considering the specific classification process, it was hypothesized that the misjudgment rates of the adulterated raw milk samples were the lowest. Therefore, the ROC curve threshold of adulterated raw milk was selected as the discriminant threshold of the PLS-DA model. When the predicted value of the discriminant model was greater than or equal to the threshold, the sample was identified as a positive sample; conversely, when the predicted value was less than the threshold, the sample was identified as a negative sample.
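The threshold logic described above can be illustrated with a short Python sketch. It assumes that the "upper left corner" criterion is implemented as the smallest Euclidean distance from the ROC point (FPR, TPR) to (0, 1), which is one common choice and not necessarily the one used in the study; the function names and scores are hypothetical.

```python
import math


def best_threshold(y_true, scores):
    """Pick the score threshold whose ROC point is closest to (FPR=0, TPR=1)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_t, best_dist = None, math.inf
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        tpr = tp / n_pos
        fpr = fp / n_neg
        dist = math.hypot(fpr, 1.0 - tpr)  # distance to the upper left corner
        if dist < best_dist:
            best_t, best_dist = t, dist
    return best_t


def classify(score, threshold):
    """Positive (1) if the predicted value reaches the threshold, else negative (0)."""
    return 1 if score >= threshold else 0


# Hypothetical PLS-DA predicted values for 2 negative and 2 positive samples.
threshold = best_threshold([0, 0, 1, 1], [0.1, 0.2, 0.6, 0.8])
labels = [classify(s, threshold) for s in (0.7, 0.5)]
```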
The results of multiclass model discrimination are shown in Figure 7. Four distinct areas corresponding to maltodextrin-, whey-, and sodium carbonate-adulterated milk and pure raw milk samples were observed. The classification performance of the PLS-DA model for all samples is shown in Table 3. The TPR and TNR values are equal to 1, showing excellent specificity and sensitivity. Among the classification accuracies, that of sodium carbonate is the best, at 100%. The maltodextrin and whey models are slightly poorer, with classification accuracies of 95.83 and 95.84%, respectively, indicating that the Raman spectra of these 2 adulterated samples are not significantly different from those of the other 3 samples. For the classification of pure raw milk samples, the Raman spectra of raw milk were compared with those of the 3 adulterated samples at the same time. Because of the small spectral difference between maltodextrin- and whey-adulterated samples and raw milk samples, pure raw milk may be classified as adulterated, or adulterated samples may be classified as pure raw milk. When setting the threshold, adulterated samples should be classified as pure raw milk as seldom as possible. Finally, the PLS-DA model accuracy for pure raw milk was 92.25%. de Souza Gondim et al. (2015) showed that a multiclass model established using midinfrared spectroscopy and soft independent modeling of class analogy techniques could provide 82% correct classifications of unadulterated and formaldehyde-, hydrogen peroxide-, citrate-, hydroxide-, and starch-adulterated milk samples. By contrast, the prediction accuracy of the model established in our research was higher. Moreover, our strategy was shown to be highly efficient, especially when a large number of samples required analysis, and it greatly reduced the required experimental time with very low error rates. This method can also be applied to other signals, samples, or adulterants.

Quantification of Adulterant in Milk
The PLS models were developed to quantify the level of the adulterants in the milk samples. A PLS scatter plot (Figure 8) shows the correlation between the reference values and the values predicted by the PLS Raman method. Models were developed using 5 to 6 LV and could explain more than 90% of the variance in the multispectral data set. The correlation coefficients of the training sets and validation sets for each of the 3 adulterated samples were all greater than 0.95, showing that the predictive accuracy of the quantitative model is satisfactory. According to Table 4, the RMSE and RMSE_P values of maltodextrin were 0.53 and 0.49%, those of sodium carbonate were 1.46 and 1.09 mg/kg, and those of whey were 0.67 and 0.86%. Our PLS model showed better performance statistics than the RMSE_P value for milk adulterated with whey (2.33%) reported by Santos et al. (2013). Low RMSE or RMSE_P and high R² confirmed that RS was a suitable method for milk adulteration detection. However, the accuracy and reliability of the PLS model used for quantitative detection need to be assessed according to relative error of prediction (REP), limit of detection (LOD), and limit of quantification. The REP was calculated according to sensitivity, and LOD and limit of quantification values were calculated according to the previous method (Cattaneo and Holroyd, 2013; Allegrini and Olivieri, 2014). The REP of maltodextrin, sodium carbonate, and whey were 9.39, 11.57, and 10.31, whereas the LOD was 1.46%, 4.86 mg/kg, and 2.64%, respectively. The 4.86 mg/kg LOD obtained here for sodium carbonate is much lower than the 2 g/L obtained in a recent study using electrochemical sensors for sodium bicarbonate in milk (Chakraborty and Biswas, 2018).
Considering that adulteration is typically carried out to obtain economic benefits, adulterations in fractions below 5% are not typically found (Rocha et al., 2015). Therefore, the performance of the method is considered to be satisfactory and sufficient for qualitative and quantitative detection of adulterated substances. Residual prediction deviation (RPD) estimations revealed that PLS combined with RS is excellent for all analysis tasks when RPD >8 (Nieuwoudt et al., 2016, 2017; de Oliveira Mendes et al., 2020). For sodium carbonate and whey, the PLS model is suitable for quantitative detection, whereas for maltodextrin, the RPD of 5.61 qualifies it for use in qualitative detection alone.
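For orientation, the regression statistics discussed in this section can be computed as in the sketch below. The definitions used here are common chemometric conventions and are assumptions for this illustration: RPD is taken as the sample standard deviation of the reference values divided by the RMSE, and REP as 100 × RMSE divided by the mean reference value. The source states only that REP was calculated "according to sensitivity", so these formulas may differ from the authors' calculation; all function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(mean((r - p) ** 2 for r, p in zip(y_ref, y_pred)))


def rpd(y_ref, y_pred):
    """Residual prediction deviation: SD of reference values over RMSE (assumed definition)."""
    return stdev(y_ref) / rmse(y_ref, y_pred)


def rep(y_ref, y_pred):
    """Relative error of prediction in percent: 100 * RMSE / mean reference (assumed definition)."""
    return 100.0 * rmse(y_ref, y_pred) / mean(y_ref)


# Hypothetical adulterant levels (% wt/wt) and model predictions.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
stats = (rmse(reference, predicted), rpd(reference, predicted), rep(reference, predicted))
```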

CONCLUSIONS
The present study demonstrated a simultaneous and effective strategy based on RS and chemometrics for the rapid screening of adulterants in raw milk. Maltodextrin, whey, and sodium carbonate could be identified in raw milk using RS and PLS-DA, and the concentration of each adulterant was quantified by PLS. The classification accuracy of pure raw milk, maltodextrin, sodium carbonate, and whey in the PLS-DA model was 92.25, 95.83, 100, and 95.84%, respectively, whereas the LOD of maltodextrin, sodium carbonate, and whey in the PLS model was 1.46%, 4.38 mg/kg, and 2.64%, respectively. The low RMSE or RMSE_P and high R² verified that RS is a suitable technique for qualitative and quantitative detection of adulterated substances in raw milk. Raman fingerprints were recorded from dry milk drops without sample preparation or addition of chemicals, making the technique suitable for rapid on-site screening. Simultaneous analysis of the 3 adulterated substances in raw milk was possible at low levels as a consequence of the high sensitivity of RS and the use of multivariate analysis. With the increasing popularity of competitively priced portable mini-Raman systems and microchip applications, the technique shows potential for deployment on-site to achieve rapid and reliable detection of adulterated milk.

Figure 2. (a) Raman spectra of maltodextrin, sodium carbonate, whey, and raw milk. (b-d) Raman spectra of adulterated raw milk with different concentrations of (b) maltodextrin, (c) sodium carbonate, and (d) whey.

Figure 3. All spectral data used for partial least square discriminant analysis and partial least square analyses.

Figure 4. Variable importance in the projection (VIP) scores (>0.8) obtained from the variable selection procedures performed on the transformed spectra.

Figure 6. Receiver operating characteristic curves of the multiclass partial least square discriminant model. AUC = area under the curve; TPR = true positive rate; FPR = false positive rate.

Figure 7. A multiclass partial least square discriminant analysis model for identifying (a) maltodextrin, (b) sodium carbonate, (c) whey, and (d) pure raw milk.

Figure 8. Correlation between the values predicted by the partial least square Raman method and the reference values of (a) maltodextrin, (b) sodium carbonate, and (c) whey.

Tian et al.: MALTODEXTRIN, SODIUM CARBONATE, AND WHEY IN RAW MILK

Table 1. Results of spectral pretreatment and variable selection on the performance of the multiclass partial least square discriminant analysis model

Table 2. Parameters of the 2-class partial least square discriminant analysis models

Table 3. Parameters of the multiclass partial least square discriminant analysis model

Table 4. Statistical parameters of the partial least square models. LOD = limit of detection; LOQ = limit of quantification; RSD = relative error of standard deviation; REP = relative error of prediction; RMSE = root mean square error of training set; RMSE_P = root mean square error of prediction; RPD = residual prediction deviation.