Identification of rare genetic variants of the α S -caseins in milk from native Norwegian dairy breeds and comparison of protein composition with milk from high-yielding Norwegian Red cows

Several factors influence the composition of milk. Among these, genetic variation within and between cattle breeds influences milk protein composition, protein heterogeneity, and posttranslational modifications. Such variations may further influence technological properties, which are important for the processing of milk into dairy products. Furthermore, these variations may also facilitate the production of differentiated products (e.g., related to specific breeds or specific genetic variants). The objective of this study was to investigate the genetic variation and relative protein composition of the major proteins in milk from 6 native Norwegian dairy breeds representing heterogeneity in geographical origin, using the modern Norwegian breed, Norwegian Red, as reference. In total, milk samples from 144 individual cows were collected and subjected to liquid chromatography-electrospray ionization/mass spectrometry–based proteomics for identification of genetic and posttranslational modification isoforms of the 4 caseins (α S1 -CN, α S2 -CN, β-CN, κ-CN) and the 2 most abundant whey proteins (α-lactalbumin and β-lactoglobulin). Relative quantification of these proteins and their major isoforms, including phosphorylations of α S1 -CN and glycosylation of κ-CN, was determined based on UV absorbance. The presence and frequency of genetic variants were found to be very diverse among the breeds, and it was possible to identify rare variants of the CN, which, to our knowledge, have not been identified in these breeds before. Thus, α S1 -CN variant D was identified in low frequency in 3 of the 6 native Norwegian breeds. In general, α S1 -CN was found to be quite diverse between the native breeds, and the even less frequent A and C variants were furthermore detected in 1 and 5 of the native breeds, respectively. The α S1 -CN variant C was also identified in samples from the Norwegian Red cattle. The variant E of κ-CN was identified in 2 of the native Norwegian breeds. Another interesting finding was the identification of α S2 -CN variant D, which was found in relatively high frequencies in the native breeds. Diversity in more common protein genetic variants was furthermore observed in the protein profiles of the native breeds compared with milk from the high-yielding Norwegian Reds, probably reflecting the more diverse genetic background between the native breeds.


INTRODUCTION
Conservation of native dairy breeds is important to minimize erosion of genetic diversity. The Nordic countries have committed to conserving these animal genetic resources, of which many of the native breeds have been classified as endangered (Kierkegaard et al., 2020). Compared with the high number of studies focusing on genetic diversity within and among these breeds, only a limited number of studies have been dedicated to phenotypic characterization, and even fewer have studied milk compositional traits (Kierkegaard et al., 2020), which are nevertheless important characteristics in relation to eventual exploitation that could contribute to their preservation.
The major proteins in bovine milk consist of the 4 caseins, α S1 -CN, α S2 -CN, β-CN, and κ-CN, together with the 2 most abundant whey proteins, α-LA and β-LG. Generally, the genes encoding the 6 major proteins are polymorphic to different degrees, and more than 47 different milk protein variants have been identified within the Bos genus (Caroli et al., 2009).
Genetic polymorphisms within the coding regions for the bovine milk proteins originate from SNP or deletions/insertions of one or more nucleotides, resulting in synonymous or nonsynonymous mutations or even longer sequences of the coding regions being deleted or inserted (Caroli et al., 2009; Ketto et al., 2017). This results in a wide range of protein variants influencing the overall composition of milk, which can potentially affect both technological and nutritional properties of milk and milk-derived products (Hallén et al., 2008; Caroli et al., 2009). Total milk yield and especially cheese production are highly influenced by variations caused by genetic polymorphisms of the major milk proteins (Wedholm et al., 2006; Glantz et al., 2010; Frederiksen et al., 2011; Jensen et al., 2012b; Poulsen et al., 2013, 2017a; Nilsson et al., 2020).
Some of the native Nordic dairy breeds have been the subject of in-depth studies (Kierkegaard et al., 2020). Using DNA extracts from blood samples, Lien et al. (1999) reported large frequency differences in casein genetic variants among 17 native Nordic dairy breeds and 5 modern breeds. Such variation may be reflected in different characteristics of the milk and in its suitability for processing into specific dairy products.
The present investigation of genetic variants of the major milk proteins and the resulting protein profile of native Norwegian dairy breeds aims to provide knowledge of the differences between these breeds and to compare them with a modern high-yielding dairy breed. A better understanding of the unique characteristics of the protein composition of native Norwegian breeds may provide a basis for potential conservation programs and may lead to better utilization and development of niche dairy products with specific properties. Moreover, this study might also shed light on how breeding for specific traits, which has produced today's modern breeds, has resulted in a loss of heterogeneity among the most abundant milk proteins. Hence, the objective of this study is to investigate the genetic variation among native and more common Norwegian dairy breeds and how these differences are reflected in the protein profiles.

MATERIALS AND METHODS
Animals and Milk Samples
Milk samples were collected from Telemark (TF, n = 15), Blacksided Trønder and Nordland (STN, n = 30), Doela (DF, n = 17), Western Fjord (VFF, n = 35), Western Red Polled (VR, n = 13), Eastern Red Polled (ORA, n = 18), and Norwegian Red (NRF, n = 16) cattle from a total of 30 different farms, where both the native and modern cows were milked routinely. Some farms had more than one of the breeds represented in their herd. The number of herds represented by each breed is noted in Table 1. A representative milk sample of 50 mL was collected from each cow during one morning and one evening milking and preserved with one tablet of bronopol (2-bromo-2-nitropropane-1,3-diol; D&F Control Systems Inc.). The morning and evening milk samples from each cow were pooled, based on volume, into one sample upon arrival at the laboratory. The milk samples were skimmed (Inglingstad et al., 2014) and stored at −20°C until further analysis. The samples were transported frozen to Aarhus University, Department of Food Science, and stored at −40°C until further analysis.

Sample Preparation and Proteomic Profiling by Liquid Chromatography-Electrospray Ionization/MS
The frozen skim milk samples were thawed at room temperature and 150 µL of each skim milk sample was mixed with 450 µL of working solution (5.37 mM sodium citrate in 100 mM Bis-Tris buffer, 6 M guanidine hydrochloride, pH 6.8) and 12 µL of 1 M dithioerythritol. The samples were incubated at room temperature for 1 h and centrifuged at 20,817 × g (Eppendorf centrifuge 5417 R) for 10 min at 7°C. Four hundred microliters of the sample solution was filtered through a 0.45-µm polytetrafluoroethylene filter (Mini-UniPrep, Whatman). The experimental setup for liquid chromatography (LC)-electrospray ionization (ESI)/MS analysis and the subsequent data analysis are based on previous studies (Bobe et al., 1998; Bonfatti et al., 2008; Bonizzi et al., 2009) with minor modifications (Frederiksen et al., 2011). The major proteins from the skim milk samples were separated according to their hydrophobic properties using a reverse-phase LC-based method. The separation was performed on an Agilent 1260 Infinity II series instrument (Agilent Technologies), equipped with a binary pump including degasser (G7112B), a vial sampler (G7129A), and a column thermostat (G7116A). The system was coupled to a diode array detector (G7117C) for protein identification and relative quantification. The column used for separation was Jupiter C4 (250 × 2.00 mm, 5 µm particle size, 300 Å pores; Phenomenex). The running temperature was 40°C and UV absorbance was detected at 214 nm. An attached Agilent LC/MSD XT (G6135B) was used for MS. The different genetic variants and their posttranslational modifications (PTM) were identified by ESI/MS. The running flow was 0.3 mL/min and the injection volume was 6 µL. The elution gradient started at 33% solvent B [0.05% trifluoroacetic acid (TFA) in acetonitrile] mixed with solvent A (0.05% TFA in MilliQ water) and increased linearly to 50% solvent B over 25 min, followed by a decrease in solvent B from 50 to 33% over 1 min. Each milk sample was analyzed once.

Identification and Quantification of Variants of the Major Milk Proteins
Data from HPLC were analyzed using the ChemStation software (Rev. C.01.10, Agilent Technologies). The major milk proteins were eluted in the following order: κ-CN, α S2 -CN, α S1 -CN, β-CN, α-LA, and β-LG, where retention times of eluted proteins varied depending on genotype and degree of PTM (Frederiksen et al., 2011; Jensen et al., 2012b). The deconvolution tool in ChemStation was used for identification of genetic variants of the major isoforms of κ-CN (A, B, E), α S2 -CN (A), α S1 -CN (B, C), β-CN (A 1 , A 2 , B, I), β-LG (A, B), and α-LA (B), according to Jensen et al. (2012b). An in-house custom-made database of known masses and retention times was used for identification of the possible genetic variants and isoforms of the 6 major milk proteins. In addition, low-abundance genetic variants, not previously identified in these Norwegian breeds, were identified. These included α S2 -CN (D) and α S1 -CN (A, D), which have been reported previously, though in other breeds (Grosclaude et al., 1979; Eberhardt, 1993; Mohr et al., 1994; Jann et al., 2002; Ibeagha-Awemu et al., 2007).
Relative quantification of the major proteins was calculated based on UV absorbance at 214 nm of the 4 caseins (α S1 -CN, α S2 -CN, β-CN, κ-CN) and the 2 major whey proteins (β-LG, α-LA) to ascertain relative protein composition per breed. The relative quantification was based on integrated individual peak areas relative to the total integrated area, calculated as the sum of peaks representing all genetic variants of α S1 -CN, α S2 -CN, β-CN, κ-CN, α-LA, and β-LG and their major PTM isoforms. More specifically, κ-CN was identified in both glycosylated (κ-CN G) and unglycosylated (κ-CN UG) forms, including separation into its minor molecular glycosylation forms, as outlined by Visser et al. (1991). It was further possible to distinguish and separately quantify the 2 major α S1 -CN phosphorylation isoforms, α S1 -CN 8P and 9P (for B and C variants), by integrating their separate peaks, as outlined by Jensen et al. (2012b). For the D variant of α S1 -CN it was possible to distinguish the 9P and 10P forms (Visser et al., 1991). Based on this, the phosphorylation degree (PD) of α S1 -CN was calculated as α S1 -CN 9P/total α S1 -CN for both B and C variants. For the D variant of α S1 -CN, PD was calculated as 10P/total α S1 -CN. The peak representing the A variant contained masses corresponding to both 8P and 9P isoforms of this variant of α S1 -CN. Hence, the phosphorylation of the A variant did not contribute to the PD in the sample containing this variant of α S1 -CN. When defining the peaks for integration in the assessment of the protein compositions, proteolytic degradation products were not included, which means that, for example, the γ 3 -CN fragment of β-CN was not included in integration of the β-CN peaks or in determining the total protein, even though it was identified by its mass (11,557 Da). The peak representing the γ 3 -CN fragment was, however, used as a quality parameter for the milk samples. It was verified that none of the included milk samples contained more than 4% of γ 3 -CN, as a higher proportion could indicate increased proteolytic activity.
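The relative quantification and PD calculation described above amount to simple ratios of integrated peak areas. The following is a minimal sketch, not the authors' ChemStation workflow; the function names and all peak areas are hypothetical values chosen for illustration.

```python
def relative_composition(peak_areas):
    """Return each peak's share (%) of the total integrated area."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def phosphorylation_degree(higher_p_area, total_as1_area):
    """PD (%) = area of the higher-phosphorylated alphaS1-CN peak / total alphaS1-CN area.

    For B and C variants the higher-phosphorylated peak is 9P; for the D variant it is 10P.
    """
    return 100.0 * higher_p_area / total_as1_area


# Hypothetical peak areas (arbitrary units) for one alphaS1-CN BB sample.
# Degradation products such as gamma3-CN are deliberately left out of the total.
areas = {
    "kappa-CN": 90.0,
    "alphaS2-CN": 80.0,
    "alphaS1-CN 8P": 250.0,
    "alphaS1-CN 9P": 80.0,
    "beta-CN": 400.0,
    "alpha-LA": 30.0,
    "beta-LG": 70.0,
}

shares = relative_composition(areas)
total_as1 = areas["alphaS1-CN 8P"] + areas["alphaS1-CN 9P"]
pd_percent = phosphorylation_degree(areas["alphaS1-CN 9P"], total_as1)
```

With these made-up areas the shares sum to 100% by construction, and the PD is the 9P area divided by the combined 8P and 9P area.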

Statistical Analysis
One-way ANOVA was used to compare mean values of the relative abundance of each protein, as well as PD and glycosylation degree (GD), among breeds. Pairwise comparisons for all tests were done using Tukey's pairwise post hoc test. The statistical analyses were all conducted in R (version 4.0.0; http://www.r-project.org).
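The analyses in this study were run in R. Purely to illustrate what the one-way ANOVA computes, the sketch below derives the F statistic from between-group and within-group sums of squares in plain Python. The function name and the data are hypothetical, and Tukey's post hoc test is not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative abundances (%) of one protein in three breeds.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
```

The F statistic is then compared with the F distribution with (df_between, df_within) degrees of freedom to obtain a P-value, which statistical software such as R reports directly.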

RESULTS
The sample set contained 144 milk samples from individual cows collected from 30 different farms in Norway analyzed as individual skim milk samples by LC-ESI/MS Single Q.

Identification of Rare α S1 -and β-CN Genetic Variants
Figure 1 represents a full UV chromatogram of the elution profile of one of the milk samples. The figure shows a clear separation of the major milk proteins and some of their genetic variants. This separation is used to identify the protein profile and the presence of the genetic variants that can be identified solely by LC.
Figure 2 displays representative UV chromatograms of milk samples from individual cows representing genetic variations in α S1 -CN. The chromatograms are enlarged in the interval from 12 to 19 min, representing α S1 -CN eluting first followed by β-CN (included as a reference). From this figure it is clear that different genetic variants of α S1 -CN show different elution profiles based on the LC only. Mass spectrometry is then included to identify and confirm masses of variants that were not identified by LC only. Based on the obtained deconvoluted masses, the identified variants of α S1 -CN are presented in Table 2 along with the detected major phosphorylation isoforms, their elution times, and their detected and theoretical molecular masses. The α S1 -CN variant B 8P serves as the reference variant. The difference in peak height, and hence peak area, of the 8P and 9P isoforms of the α S1 -CN B variant indicates the relative proportion of each isoform (Figure 2). The α S1 -CN C variant elutes at the same retention time as the B variant and displays a chromatogram in which the 8P and 9P isoforms are also shown as a double peak, like the B variant. The difference between the B and the C variants is a substitution of only one AA residue (Glu 207 → Gly), which apparently does not noticeably affect the retention time of the protein isoforms in this system. Thus, milk samples representing BB, BC, and CC show similar elution profiles (Figure 2), and the 2 variants are distinguished from each other only by their deconvoluted masses (Table 2).
Furthermore, α S1 -CN variant D was also identified, which represents a substitution of Ala 68 → Thr 68 P, introducing a potential extra phosphorylation site. This results in a slightly higher elution time for the α S1 -CN variant D isoforms, as the major isoform of the α S1 -CN D 9P variant elutes at approximately the same position as the α S1 -CN B and C 8P isoforms, whereas the α S1 -CN D 10P isoform elutes a bit later. This results in a triple peak in Figure 2 for BD genotypes of α S1 -CN. In contrast, the triple peak in α S1 -CN DD milk is due to the 9P isoforms splitting into a double peak. These findings illustrate the degree of information obtained by coupling LC with MS, as the genetic variants and their differences in phosphorylation sites are very hard to identify using LC alone.
Using the experimental setup with the LC-ESI/MS, we were able to detect 6 different genotypes of α S1 -CN across the analyzed samples, including genotypes with the rare A variant and the low-abundance variant D (represented by the genotypes AB, BB, BC, BD, CC, and DD). The α S1 -CN peaks are followed by the peaks representing different β-CN isoforms. These are represented either by a single peak or by double peaks, depending on the retention times of the actual genetic variants present in each of the analyzed individual milk samples. The well-known variants A 1 , A 2 , B, and I of β-CN were identified in the sample set of 144 samples. The most common β-CN variants, A 1 and A 2 , were identified in all of the breeds, whereas the I and B variants were only found in 2 of the native breeds each.

Identification of Rare α S2 -and κ-CN Variants
Figure 3 displays UV chromatograms from the individual milk samples as an enlargement of the area between 7 and 17 min, including peaks of κ-CN (B variant) and α S2 -CN. The broad peak of α S2 -CN is due to different phosphorylation isoforms as outlined in Table 3. The detected variants, their phosphorylation isoforms, AA substitutions, elution times, and detected and theoretical masses are presented in Table 3. The α S2 -CN variant A was identified in the phosphorylation forms 11 (reference variant), 10, 12, and 13P, all eluting at approximately the same retention time, resulting in one rather broad peak. The α S2 -CN variant D is the result of a deletion of AA residues 51 to 59, which comprise 3 phosphorylation sites (Caroli et al., 2009), and the major isoform therefore contains only 8 phosphorylations (Table 3). Despite this, the elution times of the α S2 -CN A and D variants are almost identical, with both variants being represented by only one peak, with slightly different peak shapes. Apart from these known variants, we also identified some unknown masses in some of the milk samples. These unknown masses were identified in the peak representing the α S2 -CN variants but could not be assigned to any of the other known α S2 -CN variants. Hence, the elution time corresponded to the known α S2 -CN variants but the masses did not. One of these samples is included in Figure 3, where the chromatogram displays a changed elution profile compared with the other samples. In this particular sample, there was a minor peak (molecular weight = 24,279 Da) just before the peak usually representing α S2 -CN. The masses identified in the peak usually representing α S2 -CN were also different in these particular samples (molecular weight = 25,198 and 25,278 Da). The κ-CN was found in 3 variants: A, B, and E, where E was only detected in 2 of the native Norwegian breeds (STN and ORA cattle).

Variation in Genetic Variant Frequencies Between Breeds
The frequencies of the genetic variants of the 4 caseins and β-LG among the 7 Norwegian breeds are given in Table 4. As stated above, for α S1 -CN the A, B, C, and D variants were detected, and for α S2 -CN variants A and D were detected, as well as an unknown variant. For β-CN, A 1 , A 2 , B, and I were present, and for κ-CN the A, B, and E variants were present. Of the whey proteins, both A and B variants of β-LG were present, whereas α-LA was found exclusively as the B variant (100% of all identified α-LA) and is therefore not shown in the table. The majority of the less frequent alleles were found as heterozygotes, which was expected due to the limited sample size included in the study.
For α S1 -CN, the predominant form is the B variant, with frequencies varying from 68.6% in VFF to 96.2% in VR (Table 4). The predominant genotype of α S1 -CN is BB (Table 5). The α S1 -CN C variant was found in all breeds except TF cattle, and had a relatively high frequency in both NRF (15.6%) and VFF cattle (28.6%). Only one sample from VFF cattle represented a cow homozygous for α S1 -CN variant C (Table 5). The A variant of α S1 -CN was identified only in one sample belonging to the TF cattle breed. The D variant of α S1 -CN was detected in only 3 of the 7 breeds, all in relatively low frequencies, namely VFF (2.9%), TF (3.3%), and DF cattle (14.7%), and was thus mainly found as BD heterozygotes (Tables 4 and 5).
For α S2 -CN, the predominant variant was the A variant, which was found in frequencies ranging from 73.1% (VR cattle) to 100% in both the reference breed, NRF, and in ORA cattle. The α S2 -CN D variant was found in low frequencies in STN (1.7%) and in VFF cattle (7.1%), but in relatively high frequencies in TF (16.7%) and in VR cattle (19.2%). As outlined in Tables 4 and 5 and in Figure 3, in some (n = 3) of the samples we were unable to identify masses corresponding to any known variants of α S2 -CN. These samples with no identified α S2 -CN masses belonged to DF (n = 2) and to VR (n = 1) cattle.
The frequencies of the β-CN variants indicate a very diverse distribution among the different breeds (Table 4). For NRF, DF, and VFF cattle, A 2 is the predominant allele, but is found at quite varying frequencies (87.5%, 52.9%, and 52.9%, respectively). On the other hand, for TF, STN, and VR cattle, the most abundant allele is A 1 , found at frequencies ranging from 53.9% to 73.3%. The ORA cattle showed a 50/50 distribution of the β-CN A 1 and A 2 variants. The B variant was found only in STN and VFF cattle at relatively low frequencies (10% and 4.3%, respectively), whereas the I variant was found only in DF and VFF cattle (14.7% and 1.4%, respectively). From Table 5 it is clear that these differences in the presence and frequencies of genetic variants mean that genotype frequencies vary considerably among these breeds. The frequency of the homozygous genotype A 1 A 1 is highest in TF cattle, where it accounts for nearly half (46.7%) of the β-CN genotypes (Table 5). The remaining part is constituted by heterozygous A 1 A 2 . The reference breed, NRF, showed the opposite tendency, comprising 75% A 2 A 2 and 25% A 1 A 2 . For the remaining breeds the homozygous forms A 1 A 1 and A 2 A 2 are also present, but the majority of the samples were characterized by the genotype A 1 A 2 . The homozygous form II was identified in only one sample of the DF cattle breed. Doela, STN, and VFF cattle were the 3 breeds expressing the β-CN I and B variants. The samples representing the other 4 breeds expressed only A 1 and A 2 variants of β-CN. The β-CN A 3 and F variants were not identified in any of the 144 samples representing the native Norwegian cattle breeds and common NRF. For κ-CN, most breeds showed a slightly higher frequency of the A compared with the B variant, but for NRF cattle the A variant was found to be predominant (94%), with a correspondingly very low frequency of the B variant. Western Red Polled cattle showed an opposite tendency compared with the other breeds, with a higher proportion of the B variant. The reference breed, NRF, was mainly composed of AA homozygotes (87.5%), whereas most of the native breeds had higher frequencies of the heterozygous genotype AB. Western Red Polled showed a relatively high proportion of the homozygous genotype BB (38.5%) and a low frequency of genotype AA (15.4%). The variant E of κ-CN was found only in STN (n = 1) and in ORA cattle (n = 2).
1 The variants were identified using liquid chromatography-electrospray ionization/mass spectrometry in milks from individual cows. 2 Information about AA substitutions is based on Caroli et al. (2009) and UniProt (www.uniprot.org). 3 Elution time (ET) of each of the peaks representing the given protein variants detected during the separation of the milk proteins.
In the majority of the breeds, the most abundant variant of β-LG was the B variant, with the highest frequency found in TF (80%; Table 4). Only VR cattle showed a higher frequency of β-LG variant A over B (69.2%).
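The allele frequencies reported in this section follow from genotype counts by simple allele counting, with each cow contributing 2 alleles. The sketch below is illustrative only; the function name is hypothetical, and the cow counts are derived here from the reported NRF β-CN genotype shares (75% A 2 A 2 and 25% A 1 A 2 ) under the assumption that all 16 NRF cows were genotyped.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Map genotype counts, keyed by allele pairs, to allele frequencies in percent."""
    alleles = Counter()
    for (first, second), n_cows in genotype_counts.items():
        # Each cow carries two alleles; homozygotes count the same allele twice.
        alleles[first] += n_cows
        alleles[second] += n_cows
    total = sum(alleles.values())
    return {allele: 100.0 * count / total for allele, count in alleles.items()}


# Assumed counts: 75% of 16 cows = 12 (A2A2) and 25% of 16 cows = 4 (A1A2).
nrf_beta_cn = {("A2", "A2"): 12, ("A1", "A2"): 4}
freqs = allele_frequencies(nrf_beta_cn)
```

Under this assumption the A 2 allele accounts for 28 of 32 alleles, which agrees with the 87.5% reported for NRF cattle.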

Variations in the Relative Abundance of the Major Milk Proteins and Their Isoforms Between Breeds
Relative protein concentrations of major milk proteins and their specific isoforms as determined by LC-ESI/MS UV profiles are given in Table 6 for each breed. The proportion of α S1 -CN is seen to be relatively constant between breeds, but with samples from STN (33.8%) cattle containing a significantly higher proportion of α S1 -CN than VR (31.9%) cattle (Table 6). The PD was found to be significantly different among the breeds, ranging from 20.7% to 24.4% (Table 6). The lowest PD was found in ORA (20.7%) and the highest in DF (24.4%), STN (24.4%), and VFF (23.7%) cattle (Table 6). The PD could not be calculated for α S1 -CN BD samples and these samples were thus excluded from the PD calculations (4 DF samples, 1 TF sample, and 2 VFF samples).
4 The detected molecular weight (M w ) of each of the identified peaks corresponding to the protein variants, measured in daltons (Da).
For α S2 -CN, the lowest proportions were found in DF (6.9%) and STN (7.3%) cattle, both significantly lower than the highest proportion, found in NRF (8.5%) cattle. For β-CN, TF (41.4%) expressed a significantly higher proportion compared with ORA (39.3%) and STN (38.9%) cattle (Table 6). The proportion of κ-CN varied significantly among breeds (Table 6), with the highest percentage found in VR (9.3%), being significantly higher than the proportion found in NRF cattle, which was the lowest (7.4%). Also, VFF (8.9%), TF (9.1%), and STN (9.2%) had a significantly higher κ-CN percentage than NRF cattle. There was no significant difference in GD among the breeds. The GD was around 30% for all breeds. The proportion of α-LA did not vary significantly among the breeds, with all containing between 3.1 and 3.5% α-LA. For β-LG, the proportion was found to be significantly higher in ORA cattle (7.7%) compared with the proportions found in TF (6.6%) and VFF cattle (6.4%; Table 6).
Relative protein proportions in relation to specific genotypes of α S1 -and α S2 -CN are displayed in Tables 7 and 8, respectively. The genotypes AB and DD of α S1 -CN were both identified only in one sample, as the variants A and D are rare isoforms in this sample set. Hence, the values representing these genotypes are based on only one sample, and these were accordingly not included in the statistical analysis.
The PD of the homozygote DD is based on only one sample, and is calculated as the area of the 10P peak divided by the total area of α S1 -CN. The proportion of α S2 -CN was significantly different between α S1 -CN genotypes, with the highest proportion found in BD (8.7%) and the lowest in BC (7.3%). The proportions of the other traits did not vary significantly between α S1 -CN genotypes. Table 8 displays proportions of the major proteins, including PD and GD, relative to α S2 -CN genotypes. The proportion of α S1 -CN varied significantly between genotype AA (33.4%) and AD (30.8%) of α S2 -CN. The α S1 -CN PD did not vary significantly among genotypes, though the highest PD was, interestingly, found in the sample identified as the DD homozygote (28.6%). The proportion of α S2 -CN was found to vary significantly between genotypes of α S2 -CN, with the highest proportion found in the α S2 -CN heterozygous AD (9.1%) and the lowest proportion in α S2 -CN homozygous AA (7.5%). The relative concentrations of β-CN did not vary significantly among genotypes. The proportion of κ-CN was highest in the samples of α S2 -CN AD genotype (10.0%) and lowest in the "unknown" (7.4%) and AA (8.7%) α S2 -CN type samples. The GD of κ-CN, and the proportions of α-LA and β-LG, did not vary significantly among different genotypes of α S2 -CN.

DISCUSSION
Presence of Rare α S1 -and α S2 -CN Genetic Variants and PTM Isoforms in Native Breeds
This study documents great variation, both in the variants present and in their frequencies, in the major milk proteins from native Norwegian dairy breeds, both among the breeds and in comparison with the current high-yielding NRF cattle. It was possible to identify the presence of rare genetic variants in some of the native breeds, especially of α S1 -and α S2 -CN in TF, VFF, DF, and VR cattle, as well as some unidentified potential new variants of α S2 -CN. These variants had in some instances large implications for their PTM patterns. Within α S1 -CN, a total of 4 variants were demonstrated, representing A, B, C, and D variants. The A (present in TF) and D (present in TF, VFF, and DF cattle) variants of α S1 -CN have not been identified previously in these native breeds, and were furthermore not represented in the current NRF samples analyzed here. This study also illustrates how the use of LC coupled to MS provides a more detailed protein profile compared with using LC alone for protein profiling. In particular, the phosphorylations of the genetic variants of α S1 -CN were identified by a combination of LC and MS. Lien et al. (1999) carried out a comparable study of protein allele frequencies in Nordic cattle breeds, though at the DNA level using blood samples. Among others, they included the same native Norwegian breeds as here, but their approach, which relied on primer selection, did not allow identification of the A and D variants of α S1 -CN. Interestingly, we found the A variant of α S1 -CN to elute earlier than the other α S1 -CN variants during the HPLC separation, probably caused by the deletion of AA and buried phosphorylations (Table 2). The α S1 -CN A variant has been identified earlier in various breeds, including Holstein Friesians, Red Holsteins, and German Red cattle (Ng-Kwai-Hang et al., 1984; Farrell et al., 2004; Caroli et al., 2009), but is usually found in relatively low frequencies in comparison with the more common α S1 -CN B and C variants (Ng-Kwai-Hang et al., 1984). The finding of the rare α S1 -CN A variant among the native Norwegian breeds, but only in TF, is interesting from a conservation perspective. Additionally, we identified the, also rare, α S1 -CN D variant in 3 of the native breeds (TF, VFF, and DF). It was, not surprisingly, detected primarily in heterozygotes, but is interesting due to its extra phosphorylation site at the Thr 68 position, resulting in 9P and 10P isoforms. Our data revealed one sample from DF that was homozygous α S1 -CN DD. It would be interesting to further study the CN micelle size and structural features in this sample, and furthermore to isolate the α S1 -CN D variant for comparison of its characteristics, including the position and degree of use of exact phosphorylation sites, with other variants of α S1 -CN. The chromatogram of this sample (Figure 2) also shows an interesting triple-peak shape, probably due to the difference in phosphorylation sites, as the extra potential phosphorylation site also creates possibilities for heterogeneity.
For α S2 -CN, the experimental setup permitted the identification of the rare D deletion variant (Table 3). As the elution time of α S2 -CN D is the same as for the more common variant A, the D variant was identified based solely on its deconvoluted phosphorylation isoform masses (Table 3) of 24,045 and 23,965 Da. The α S2 -CN variant D has previously been identified in other breeds. It was first reported in the French Vosgienne and Montbéliarde cattle breeds, by gel electrophoresis (Grosclaude et al., 1979), and has also been found in German breeds by Eberhardt (1993). Ibeagha-Awemu et al. (2007) further identified the α S2 -CN D variant in several of the African Bos indicus cattle breeds by DNA sequencing of blood samples. They identified the α S2 -CN D variant in relatively low allele frequencies (1.1-8.8%), compared with the more common A and B variants (Ibeagha-Awemu et al., 2007).
Values with different superscripts within each trait differ significantly from each other (P < 0.05). Unknown and DD genotypes are not included in the statistical test due to low sample size. 1 The values represent the mean ± SD as relative proportions within each genotype of α S2 -CN.
In this study, both the A and D variants of α S2 -CN were found in 4 different phosphorylation forms. The A variant was found as 10P, 11P, 12P, and 13P. Jensen et al. (2012b) were able to detect the varying degrees of phosphorylation of the A variant of α S2 -CN by the elution of 2 partly separated peaks. This was not the case in the current study, as the elution of α S2 -CN A did not result in 2 separated peaks. We were only able to identify one peak representing the α S2 -CN A variant (Figure 3) and were not able to separate α S2 -CN A on the basis of phosphorylation forms. The deletion characterizing the α S2 -CN D variant (Table 3) implies the loss of 3 phosphorylation sites (Glu 51 -Glu 59 ), and in accordance with this the D variant was found as 7P, 8P, 9P, and 10P forms. The finding of this variant in homozygous form in the VR breed would permit the potential isolation of this variant and its isoforms for further studies of their particular molecular features. Furthermore, it would also be interesting to study the micelle size and other characteristics of this variant in fresh milk (without cooling or freezing history) from α S2 -CN DD VR animals. Apart from identifying the A and D variants of α S2 -CN in the sample set, we also found some samples with masses not corresponding to any of the known variants of α S2 -CN. These masses (25,198 and 25,278 Da) were identified in a peak with the same elution time and shape as the known variants of α S2 -CN. The difference between the 2 masses is 80 Da, corresponding to the mass of a phosphate unit, which is very interesting and could suggest a homozygous variant. It would be interesting to analyze these findings in fresh milk samples and to isolate the corresponding proteins using α S2 -CN protocols.
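The reasoning that two deconvoluted masses differing by about 80 Da are phosphorylation isoforms of the same protein can be expressed as a small check. This is a sketch under stated assumptions: the function name, the per-phosphate shift of 79.98 Da (average mass of HPO3), and the 1 Da tolerance are illustrative choices, not parameters taken from the study.

```python
PHOSPHATE_SHIFT_DA = 79.98  # assumed average mass added per phosphorylation (HPO3)


def phosphate_steps(mass_a, mass_b, tolerance_da=1.0):
    """Return how many phosphate groups separate two masses, or None if no match."""
    diff = abs(mass_a - mass_b)
    steps = round(diff / PHOSPHATE_SHIFT_DA)
    # Accept only a positive whole number of phosphate shifts within the tolerance.
    if steps > 0 and abs(diff - steps * PHOSPHATE_SHIFT_DA) <= tolerance_da:
        return steps
    return None


# Masses (Da) quoted in the text.
unknown_pair = phosphate_steps(25278, 25198)   # unknown alphaS2-CN masses
variant_d_pair = phosphate_steps(24045, 23965)  # alphaS2-CN D isoform masses
minor_peak = phosphate_steps(25198, 24279)      # unknown mass vs. the minor peak
```

Both quoted pairs differ by a single phosphate shift, whereas the 919 Da gap between the unknown mass and the minor peak is not a whole number of phosphate shifts within the assumed tolerance.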

Differences in Frequencies of the Genetic Variants Between Breeds
Our results documented large differences in the genotype frequencies of the major milk proteins among the breeds included in this study. In line with Lien et al. (1999), α S1 -CN B is still the predominant variant in all breeds studied here, but VFF cattle had a relatively high frequency of α S1 -CN variant C (28.6%). The frequency of this variant was also relatively high in NRF cattle, suggesting a similarity between these breeds on this point. The C variant of α S1 -CN is reported here in DF cattle for the first time. Likewise, α S1 -CN variants A and D, together with α S2 -CN variant D, have not been reported before in these breeds. The D variant of α S1 -CN was mainly detected at relatively low frequencies, but reached 14.7% in DF cattle. Neither the A nor the D variant of α S1 -CN was identified in the samples representing the high-yielding NRF cattle, as only the more common α S1 -CN B and C variants were found within this breed. In all breeds, however, α S1 -CN B was the most common variant, as it is in European cattle breeds in general (Lien et al., 1999).
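Allele frequencies of the kind discussed here follow from genotype counts by simple allele counting, where each cow contributes 2 alleles. The following Python sketch illustrates the calculation; the genotype counts are hypothetical and are not data from this study, and the function name is illustrative.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Compute allele frequencies by allele counting.

    `genotype_counts` maps 2-letter genotypes (e.g., "BC") to the number
    of cows carrying them. Only single-letter allele names are handled;
    alleles such as A1/A2 of beta-CN would need another encoding.
    """
    alleles = Counter()
    for genotype, n_cows in genotype_counts.items():
        for allele in genotype:  # each cow contributes 2 alleles
            alleles[allele] += n_cows
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


# Hypothetical alpha-S1-CN genotype counts for one breed (14 cows).
hypothetical_counts = {"BB": 10, "BC": 3, "CC": 1}
freqs = allele_frequencies(hypothetical_counts)
```

With these hypothetical counts, 23 of the 28 alleles are B and 5 are C, so a homozygote counts twice toward its allele and a heterozygote once toward each.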
The current study identified the rare D variant of α S2 -CN in several breeds. Telemark and VR cattle expressed relatively high proportions of the D variant, 16.7% and 19.2%, respectively. One sample from VR cattle even represented the α S2 -CN homozygote DD. Phosphorylations of the caseins are very important for the technological properties of milk, as they are directly related to the binding capacity of calcium phosphate in the micelles and may therefore have implications for coagulation properties (Frederiksen et al., 2011; Le et al., 2017). Hence, the degree of phosphorylation of especially α S1 -CN and α S2 -CN, the loss of phosphorylation sites in α S2 -CN D, and the extra phosphorylation site in α S1 -CN D may affect milk coagulation properties, owing to the importance of casein phosphorylation in casein micelle formation and stability.
With respect to β-CN, variant B had a similarly high frequency in STN in the study of Lien et al. (1999), whereas this variant has not been reported previously for VFF cattle. Likewise, identification of β-CN variant I in DF and VFF is novel. Compared with the study of Lien et al. (1999), a much higher allele frequency of β-CN variant A 2 was found in our study (87.5% vs. 48.7%), whereas STN was found to have a higher frequency of β-CN variant A 1 (56.7%) than reported earlier (28.1%), in contrast to the slight shifts in A 1 and A 2 frequencies observed for the other breeds. In this study, we found a very high frequency of β-CN A 2 A 2 in NRF compared with the native breeds. This is interesting from a technological perspective, as A 2 has been found to be associated with higher milk yield but also with a lower κ-CN concentration and thereby potentially poorer rennet coagulation (Hallén et al., 2007, 2008; Frederiksen et al., 2011; Jensen et al., 2012a). However, it is argued that it is hard to distinguish a real effect of an isolated protein variant, such as the A 2 variant of β-CN, as it is often associated with specific genetic variants of the other caseins due to linkage disequilibrium within the casein gene cluster. Sanchez et al. (2020) studied the haplotype frequencies in 12 French cattle breeds and found them to be very diverse across breeds. In Holstein, they found the A 2 variant of β-CN to be mainly associated with the B variant of α S1 -CN and the A variant of κ-CN. In Jersey, by contrast, the A 2 variant of β-CN was mainly associated with the C variant of α S1 -CN and the B variant of κ-CN (Sanchez et al., 2020).
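The concept of linkage disequilibrium can be made concrete with the standard coefficients D = p12 - p1 × p2 and r² = D² / [p1(1 - p1)p2(1 - p2)], computed for one focal allele at each of 2 loci. The Python sketch below is only an illustration: the haplotype frequencies are hypothetical and are not taken from this study or from Sanchez et al. (2020), and the function name is illustrative.

```python
def linkage_disequilibrium(hap_freqs, allele_1, allele_2):
    """Return (D, r2) for a focal allele at each of 2 loci.

    `hap_freqs` maps (allele at locus 1, allele at locus 2) to haplotype
    frequency; the frequencies are assumed to sum to 1. Both focal
    alleles must be polymorphic (frequency strictly between 0 and 1),
    otherwise r2 is undefined.
    """
    p1 = sum(f for (a, _), f in hap_freqs.items() if a == allele_1)
    p2 = sum(f for (_, b), f in hap_freqs.items() if b == allele_2)
    p12 = hap_freqs.get((allele_1, allele_2), 0.0)
    d = p12 - p1 * p2  # deviation from independent association
    r2 = d ** 2 / (p1 * (1 - p1) * p2 * (1 - p2))
    return d, r2


# Hypothetical haplotypes: (beta-CN allele, alpha-S1-CN allele).
hypothetical_haplotypes = {
    ("A2", "B"): 0.5,
    ("A2", "C"): 0.1,
    ("A1", "B"): 0.1,
    ("A1", "C"): 0.3,
}
d, r2 = linkage_disequilibrium(hypothetical_haplotypes, "A2", "B")
```

A positive D in this hypothetical case means that A 2 and B occur together on the same haplotype more often than expected from their individual frequencies, which is why an apparent effect of one variant may partly reflect the other.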
For κ-CN, variant E has not previously been reported in STN cattle, and generally higher frequencies of κ-CN B than reported earlier were found for all breeds except NRF cattle, where the frequency was slightly lower and no κ-CN variant E was detected (Lien et al., 1999). For β-LG, variant A was also reported as predominant in VR cattle earlier, but at a lower frequency, whereas β-LG variant B was predominant in the other breeds. Geographically, STN cattle are located in the northern part of Norway, whereas the other native breeds have generally been kept in the mid and southern parts of Norway, apart from VFF and Red Polled cattle, which are breeds originating from the Norwegian Westland. Based on the Lien et al. (1999) study on protein variants, STN and VFF cattle were classified into a group of Nordic breeds with northern location, whereas the other native breeds were assigned to a group of breeds with southern location, and NRF cattle were grouped together with modern breeds. Blacksided Trønder and Nordland cattle and VFF cattle also displayed very similar variant frequencies here, supporting this grouping, although other genetic studies have suggested VFF to be more closely related to VR (Kantanen et al., 2000; Tapio et al., 2006).

Implications of Variation in Protein Composition Between the Breeds
In relation to detailed protein composition, the relative amount of α S1 -CN varied significantly among breeds. The highest proportion was found in STN (33.8%) and the lowest in VR (31.9%). Other studies have found different associations between the β-CN variants (A 1 and A 2 ) and the compositional traits. Heck et al. (2009) and Bonfatti et al. (2010) both reported the A 1 variant of β-CN to be associated with higher relative proportions of both α S1 -CN and κ-CN, as well as lower concentrations of β-CN and α S2 -CN. A more recent study found the A 1 variant of β-CN to be associated with a higher relative concentration of β-CN (Poulsen et al., 2017b). In the current study, TF cattle showed the highest frequency of A 1 β-CN (73.3%) and had a significantly higher relative concentration of β-CN than STN and ORA cattle (Table 6). In line with this, Nilsson et al. (2020) found a significantly higher relative concentration of β-CN in poor and noncoagulating milk.
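Relative proportions of this kind express each protein as a percentage of the summed signal. The Python sketch below illustrates the normalization under the assumption that each protein is represented by one integrated UV peak area; the peak areas and names are hypothetical, and the integration procedure actually used in the study is not reproduced here.

```python
def relative_proportions(peak_areas):
    """Express each integrated peak area as a percentage of the total."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated UV peak areas (arbitrary units) for one sample.
hypothetical_areas = {
    "alpha_S1_CN": 330.0,
    "alpha_S2_CN": 80.0,
    "beta_CN": 360.0,
    "kappa_CN": 90.0,
    "alpha_LA": 30.0,
    "beta_LG": 110.0,
}
profile = relative_proportions(hypothetical_areas)
```

Because the values are normalized to the total, an increase in the relative proportion of one protein necessarily lowers the proportions of the others, which should be kept in mind when comparing breeds.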
The κ-CN proportion varied between the native Norwegian breeds and NRF cattle. NRF cattle had the lowest frequency of κ-CN variant B (Table 6) and a significantly lower proportion of κ-CN than VR, which had the highest frequency of κ-CN variant B (Table 6). This suggests that the previously reported association between κ-CN variant B and the proportion of κ-CN in milk is also present here, whereas no difference in κ-CN GD was observed among the breeds. Compared with the native breeds, NRF cattle could therefore have larger casein micelles, which in turn would be reflected in poorer coagulation properties. Ketto et al. (2017) studied the coagulation properties of milk from NRF cows in relation to the genetic variants and found that κ-CN genotype BB was associated with improved coagulation properties.

CONCLUSIONS
The genetic variants and important PTM of the major milk proteins, α S1 -CN, α S2 -CN, β-CN, κ-CN, α-LA, and β-LG, can be identified by LC/ESI-MS in milk from individual cows. This study identified variants of α S1 -CN and α S2 -CN that have not previously been identified in the native Norwegian breeds included in this study. The very rare A variant of α S1 -CN was identified in TF cattle only, whereas the rare D variant was identified in TF, VFF, and DF cattle; neither variant was found in NRF cattle. Additionally, we were able to detect an α S1 -CN DD homozygote in one of the samples from DF cattle. In general, the native Norwegian breeds showed more diversity in the genetic variants and the protein profile of the major proteins than the high-yielding NRF cattle. For α S2 -CN, we identified the rarer D variant as well as the common A variant. The D variant was identified in 4 of the native breeds and not in NRF cattle. It was identified in homozygous form in one sample from VR cattle. The protein profiles based on relative peak areas also showed differences among breeds, as did the relative protein distribution according to α S1 -CN and α S2 -CN genotypes. The relative proportion of α S1 -CN varied significantly between breeds, being highest in STN and lowest in VR cattle. Norwegian Red cattle showed significantly lower relative amounts of κ-CN and also a very low frequency of the κ-CN B variant compared with the native breeds, which may have implications for the technological properties of the milk.

Figure 1. Full chromatogram of the elution profile of one sample from the sample set. The figure shows the entire elution profile with identified proteins and genetic variants indicated by arrows.
Figure 2. Elution profile of samples with detection of different masses of α S1 -CN, corresponding to different genetic variants of α S1 -CN. The figure represents milk from AB, BB, BC, BD, CC, and DD cows. The figure is an enlargement of the chromatogram in the interval of 12 to 19 min showing the peaks of both α S1 -CN and β-CN. The genotypes of α S1 -CN are noted at the right side of each chromatogram.

Figure 3. Elution profile of samples with detection of different masses of α S2 -CN. The figure represents milk from AA, AD, and DD cows and a sample with masses not corresponding to any theoretical known masses (noted "not identified"). The figure shows a cutout of the elution time 5 to 15 min to visualize the difference between the peaks of α S2 -CN. The first peak at approximately 6 min elution time and the peak just before α S2 -CN represent κ-CN and serve as a reference.
Roin et al.: COMPARISON OF MILK FROM NORWEGIAN DAIRY BREEDS

Table 1. Abbreviations and local and English names of the 6 native Norwegian breeds and 1 modern breed (Norwegian Red) subjected to analysis in the current study. Number of samples per breed and herd represented are listed.

Table 2. Identified genetic variants of α S1 -CN including their phosphorylation isoforms, AA substitution, elution time, and detected and theoretical molecular weights

Table 3. Identified genetic variants of α S2 -CN including the different phosphorylation forms, AA substitution, elution time, and detected and theoretical molecular weights

Table 6. Protein profile based on the relative proportions of each trait extracted from the UV signal using liquid chromatography-electrospray ionization/MS

Table 7. Protein profile based on the proportions relative to total protein of each trait extracted from the UV signal when using liquid chromatography-electrospray ionization/MS relative to α S1 -CN genotypes (mean ± SD)

Table 8. Protein profile based on the relative proportions of each trait extracted from the UV signal when using liquid chromatography-electrospray ionization/MS relative to α S2 -CN genotype