Genetic diversity of Lactobacillus delbrueckii isolated from raw milk in Hokkaido, Japan

Lactic acid bacteria (LAB) play important roles in acid production and flavor formation in fermented dairy products. Lactic acid bacteria strains with distinct characteristics confer unique features to products. Diverse LAB have been identified in raw milk and traditional fermented milk prepared from raw milk. However, little is known about LAB in raw milk in Japan. To preserve diverse LAB as potential starters or probiotics for future use, we have isolated and identified various kinds of LAB from raw milk produced in Japan. In this study, we focused on Lactobacillus delbrueckii, one of the most important species in the dairy industry. We identified L. delbrueckii subspecies isolated from raw milk in Hokkaido, Japan, by analyzing intraspecific diversity using 4 distinct methods, hsp60 cluster analysis, multilocus sequence analysis, core-genome analysis, and whole-genome analysis based on average nucleotide identity. The subspecies distribution and a new dominant subset of L. delbrueckii from raw milk in Japan were revealed. The discovery of new strains with different genotypes is important for understanding the geographic distribution and characteristics of the bacteria and further their use as a microbial resource with the potential to express unconventional flavors and functionalities. The strains identified in this study may have practical applications in the development of fermented dairy products.


INTRODUCTION
Lactic acid bacteria (LAB) are used in the production of various fermented foods, especially fermented dairy products. To meet the diverse needs of consumers, LAB strains with various characteristics have been developed as starters and are used in the production of dairy products worldwide. Traditional dairy products such as artisan cheese contain various LAB originating from not only natural starter cultures but also raw milk used as a base material (Callon et al., 2004;Poznanski et al., 2004;Coppola et al., 2006). In contrast, industrially fermented dairy products are manufactured using a starter consisting of a small number of strains and pasteurized milk. Although this is the most effective strategy to control quality and ensure safety, it tends to result in poor or weak flavors compared with traditional fermented dairy products (Wijesundera et al., 1997;Fox et al., 1998). This mainly reflects the low diversity of LAB in pasteurized milk and starters, which contain limited species or strains (Wouters et al., 2002;Van Hoorde et al., 2010a). There are several reports on the composition and diversity of LAB derived from raw milk (Giannino et al., 2009;Van Hoorde et al., 2010b). Japan has various traditional fermented foods, but no traditional fermented dairy products; hence, data on indigenous LAB in raw milk in Japan are limited (Hagi et al., 2010(Hagi et al., , 2013. The intraspecific diversity of LAB isolated from raw milk has been evaluated using random amplified polymorphic DNA (RAPD; Moschetti et al., 1998;Mora et al., 2002;Giraffa et al., 2004). Random amplified polymorphic DNA is commonly used for bacterial typing owing to its convenience; however, it is limited by low reproducibility. Recently, analyses of housekeeping genes have been used as a complementary method to RAPD. The hsp60 (groEL) gene is useful for identifying bacterial subspecies that cannot be identified using 16S rDNA and for elucidating diversity within species (Zeaiter et al., 2002;Masco et al., 2004;Avendaño-Herrera et al., 2014). Multilocus sequence typing or multilocus sequence analysis (MLSA), based on 7 or 8 housekeeping genes, is an alternative approach to identify subspecies (Tanigawa and Watanabe, 2011) and characterize intraspecific diversity (Yu et al., 2015; Genetic diversity of Lactobacillus delbrueckii isolated from raw milk in Hokkaido, Japan H. Tsuchihashi, 1 * A. Ichikawa, 1 M. Takeda, 1 A. Koizumi, 1 C. Mizoguchi, 2 T. Ishida, 1 and K. Kimura 1 Delorme et al., 2017), intrasubspecific diversity Song et al., 2016), and regional diversity of intersubspecies . Moreover, nextgeneration sequencing techniques have been used to sequence whole bacterial genomes for comparative analyses of closely related strains (Fontana et al., 2018;Frantzen et al., 2018;Alexandraki et al., 2019). Average nucleotide identity (ANI) is an index developed to distinguish species instead of DNA-DNA hybridization, which is complicated to operate. Average nucleotide identity is widely used in species identification as a standard approach (Chun et al., 2018). Although an ANI of 95 to 96% corresponds to DNA-DNA hybridization 70% (Konstantinidis and Tiedje, 2005;Goris et al., 2007;Richter and Rosselló-Móra, 2009), there is no ANI threshold to distinguish subspecies. However, recently, there have been reports on the use of ANI to assess diversity within species (Tanizawa et al., 2016) and to distinguish subspecies .
Lactobacillus delbrueckii is one of the most important species in the dairy industry; it has 6 subspecies (Weiss et al., 1983;Dellaglio et al., 2005;Kudo et al., 2012;Adimpong et al., 2013), including Lactobacillus delbrueckii ssp. bulgaricus, a yogurt starter. As the subspecies are indistinguishable by 16S rDNA sequences (Stackebrandt, 2006), subspeciation of L. delbrueckii has been performed by multilocus sequence typing (Tanigawa and Watanabe, 2011). Despite the frequent isolation of this species from various fermented foods and analyses of diversity (Giraffa et al., 2004;Mohammed et al., 2009;Song et al., 2016), only a few studies have focused on intraspecific genetic diversity based on isolates from raw milk.
Recently, various fermented dairy products have been developed at both local small-scale and industrial scale in Japan. To expand the dairy industry in Japan, it is necessary to preserve diverse LAB in raw milk as potential starters or probiotics for future production of unique dairy products. Therefore, we have isolated and identified various kinds of LAB, including L. delbrueckii, from raw milk in Japan since 2009.
The objective of this study is to identify the subspecies of 226 L. delbrueckii strains isolated from raw milk in Hokkaido, the largest dairy farming region in Japan, from 2009 to 2015, and estimate their distribution. For species classification, polyphasic analysis with multiple indicators is an effective and reliable method, since the monophasic analysis always has potential errors ( Vandamme et al., 1996). In this study, the intraspecific diversity of L. delbrueckii derived from raw milk was evaluated using 4 distinct methods, hsp60 cluster analysis, MLSA, core-genome analysis, and whole-genome analysis based on ANI.

Bacterial Strains and Growth Conditions
A total of 228 L. delbrueckii strains including 226 strains derived from raw milk produced in Hokkaido were used in this study. The 226 strains were obtained through 6 different studies on the isolation of LAB from raw milk conducted from 2009 to 2015. The raw milk from which these strains were isolated was sampled from bulk tanks at several plants that belong to Meiji Company Ltd. (formerly Meiji Dairies Corporation) in Hokkaido. The strains were isolated using multiple types of media preferred by LAB such as de Man, Rogosa, and Sharpe, M17, and Lactobacillus Selection (LBS) agar and identified as L. delbrueckii by partial 16S ribosomal DNA sequencing, as described previously (Kimura et al., 2010). These 226 strains were confirmed to be free of duplicates by RAPD. Lactobacillus delbrueckii ssp. sunkii NRIC 0766 and NRIC 0767 were obtained from the Center for Microorganism Resources at the Tokyo University of Agriculture. All strains were grown in de Man, Rogosa, and Sharpe broth (BD Difco) at 37°C for 18 h.
First, hsp60 (511 bp) of the 226 L. delbrueckii strains isolated from raw milk from the Hokkaido region was sequenced for a cluster analysis. Thereafter, 26 representative strains including all novel alleles were selected. Seven housekeeping genes were sequenced in 30 strains, and the NODAI Culture Collection (NRIC) and ME numbers are shown in Table 1.

MLSA
All 7 housekeeping genes were concatenated into single sequences. A total of 86 concatenated sequences, consisting of 45 sequences obtained from strains shown in Table 1 and 41 previously reported sequences (Tanigawa and Watanabe, 2011), were used for MLSA. Sequence alignment and phylogenetic analyses of 80 concatenated sequences (length: 3,155 bp) were performed using MEGA X version 10.0.5 (Kumar et al., 2018). Evolutionary distances were computed using the Tamura 3-parameter method (Tamura, 1992). A phylogenetic tree with 1,000 bootstrap replicates (Felsenstein, 1985) was constructed using the unweighted pair group method with arithmetic mean (UPGMA) method (Sneath and Sokal, 1973).

Extraction and Purification of Genomic DNA from L. delbrueckii
For the 28 strains with NRIC and ME numbers, excluding ME-781 and ME-794, listed in Table 1, genomic DNA was extracted for genome sequencing.
One milliliter of each culture was used for genomic DNA extraction. Extraction and purification of genomic DNA were performed using 2 column-based commercial kits, the Wizard Genomic DNA Purification Kit (A1220; Promega) and Genomic DNA Clean and Concentrator Kit (#D4011, Zymo Research). Before purification, the cells were collected by centrifugation [at 13,000 × g for 2 min at room temperature (24°C)], washed once with an equal volume of 5 mM Tris-HCl (pH 8.0), and suspended in 100 μL of 20 mM Tris-HCl (pH 7.0) buffer containing 0.3 mM raffinose, 5 mM CaCl 2 , and 5 mM MgCl 2 . Cell wall digestion was performed by adding 20 μL of 10 mg/mL lysozyme (L6876, chicken egg white; Sigma-Aldrich) and 5 μL of 10 U/μL mutanolysin (catalog number 1017-10; A&A Biotechnology) and incubating at 37°C for 10 min. Pellets collected by centrifugation after digestion were suspended in 600 μL of Nuclei Lysis solution provided in the Wizard Genomic DNA Purification Kit and mixed gently. The first purification was performed according to the manufacturer's instructions. Thereafter, 100 μL of genomic DNA was suspended in 200 μL of Chip DNA Binding Buffer (Genomic DNA Clean and Concentrator). The second purification was performed according to the manufacturer's instructions. Finally, genomic DNA was eluted with 100 μL of sterilized water and quantified using the Qubit dsDNA HS Assay Kit (Q32851; Thermo Fisher Scientific).

DNA Library Preparation and Genome Sequencing
The DNA library was prepared using the QIAseq FX DNA Library Kit (#180473; Qiagen) according to the manufacturer's instructions. Briefly, 100 ng of DNA was used as the input and fragmented at 32°C for 10 min. After adapter ligation, the sample was purified using Agencourt AMPure XP (Beckman Coulter, Inc.). The quality and quantity of DNA were assessed using the Agilent BioAnalyzer and QIAseq Library Quant Assay Kit (#333314; Qiagen), respectively. The library (0.5 nM) was denatured and diluted to 10 pM (phiX: 1% spike in) following the NextSeq System Denature and Dilute Libraries Guide (Illumina Inc.). Genomes were sequenced using the Illumina MiSeq platform.
The nucleotide identity percentage among 28 hsp60 sequences is shown in Supplemental Table S1 (https: / / doi .org/ 10 .5281/ zenodo .5650124). All 28 hsp60 sequences were clustered into 4 groups according to the subspecies (Figure 1). Group 1 included 2 alleles: number 1 was carried by the type strain of L. delbrueckii ssp. delbrueckii and number 3 was shared by 2 type strains of L. delbrueckii ssp. lactis and L. delbrueckii ssp. jakobsenii. Group 2 consisted of only alleles car-ried by L. delbrueckii ssp. bulgaricus. Groups 3 and 4 included alleles of the type strain of L. delbrueckii ssp. sunkii and L. delbrueckii ssp. indicus, respectively.
Alleles belonging to groups 1 and 3 were carried by 199 (88.1%) and 26 strains (11.5%), respectively. Only one strain carrying allele number 21 belonged to group 4. No strains carried the allele belonging to group 2 among the 226 strains tested.
These results suggest that strains carrying the hsp60 allele belonging to group 1, especially allele number 18, were dominant among L. delbrueckii strains derived from raw milk in Japan. We also examined the hsp60 allele in nearly 200 isolates from raw milk procured outside Hokkaido and confirmed that the distribution of the hsp60 allele is not limited to raw milk from Hokkaido (data not shown). We are interested in determining whether this distribution of hsp60 alleles and the existence of allele number 18 are unique to the L. delbrueckii strain derived from raw milk in Japan.

MLSA of L. delbrueckii Isolated from Raw Milk in Japan
To estimate the distribution of subspecies among the 226 strains isolated from raw milk in Japan, 26 representative strains including all novel alleles were selected for subspecies identification by MLSA (Tanigawa and Watanabe, 2011). All clusters or subclusters generated by MLSA were newly named because they were reorganized in the context of newly isolated strains.
Five main clusters (I to V, Figure 2) corresponding to 5 subspecies (lactis, delbrueckii, sunkii, bulgaricus, and indicus) were identified in the phylogenetic analysis of 7 concatenated housekeeping genes. Lactobacillus delbrueckii ssp. jakobsenii belonged to cluster I. These results were consistent with those of previous studies (Tanigawa and Watanabe, 2011;Kudo et al., 2012;Adimpong et al., 2013) and suggested that strains assigned to cluster I cannot be identified as lactis or jakobsenii species. Therefore, we used cluster, subcluster, or subset names in this study.
Of the 26 strains derived from raw milk, 25 were classified into cluster I or III. Cluster I was divided into 7 subclusters (from I-A to I-G), and subcluster I-B consisted of 2 subsets (I-B1 and I-B2). The subclusters I-B2 and I-C included type strains of L. delbrueckii ssp. lactis and L. delbrueckii ssp. jakobsenii, respectively. Strains derived from raw milk were classified into 5 subclusters (from I-A to I-E), and subcluster I-B1 consisted of only strains derived from raw milk. Interestingly, hsp60 allele number 18, the most dominant allele, was found only in subcluster I-B1.
Cluster III was divided into 2 subclusters (III-A and III-B). Subcluster III-A consisted only L. delbrueckii Tsuchihashi et  ssp. sunkii, whereas III-B consisted of strains isolated from raw milk and YIT 0373. Strains belonging to III-B were not named L. delbrueckii ssp. sunkii in this study because these strains could utilize lactose except for ME-800 (data not shown) contrary to the description of L. delbrueckii ssp. sunkii, and YIT 0373 was not included in L. delbrueckii ssp. sunkii in a previous study (Kudo et al., 2012). The relationship between the lactose utilization ability and subclusters found in III-A and III-B is consistent with previous findings (Tanigawa and Watanabe, 2011). Strains belonging to III-A and ME-800 (III-B) showed very low homology against both lactose permease gene (lacS) and β-galactosidase gene (lacZ) of L. delbrueckii ssp. bulgaricus ATCC 11842 T , suggesting mutation or deletion of the lactose operon (data not shown). As the sugar utilization ability reflects the origin of isolates, differences in subclusters within the same cluster may therefore reflect the origin rather than differences in subspecies.
Only one strain (ME-792) derived from raw milk was classified into cluster V and was named L. delbrueckii ssp. indicus. To the best of our knowledge, this is the first report of L. delbrueckii ssp. indicus isolated from raw milk in Japan. The discovery of this strain, belonging to L. delbrueckii ssp. indicus, previously reported in fermented milk in India (Dellaglio et al., 2005) and also found by us in fermented milk in Nepal (ME-781), in Japanese raw milk may have important implications for the geographical comparison of distribution and genetic diversity.
The hsp60 cluster analysis and MLSA showed that MLSA has a higher resolution than cluster analysis using a single hsp60 gene in our study, as described by Tanigawa and Watanabe (2011). Although a single gene analysis using hsp60 is inferior to MLSA in terms of its ability to identify strains, the results suggest that hsp60 cluster analysis may be useful as a simple method to estimate the subspecies of L. delbrueckii.

Core-Genome Analysis of L. delbrueckii
The hsp60 cluster analysis and MLSA showed that subcluster I-B1 is the dominant subset in L. delbrueckii strains derived from raw milk in Hokkaido. However, as evidenced by the lower bootstrap values, subcluster I-B1 and cluster I were not well supported compared with other clusters, such as IV and V (Figure 2). These results suggest that the 7 genes were not sufficient to clearly distinguish L. delbrueckii belonging to cluster I. We performed a core-genome analysis using the BPGA tool with 43 strains, including 17 strains as a reference.
The topology of the tree based on 529 core genes ( Figure 3) was very similar to that based on 7 housekeeping genes ( Figure 2); the bootstrap values of the subclusters within cluster I improved substantially. The core-genome analysis demonstrated that there were subsets within cluster I, and strains belonging to subcluster I-B1 consistently formed an independent subcluster supported by a high bootstrap value (99.5%).

Whole-Genome Analysis Based on ANI
The ANI is widely used in species identification as a standard approach, although not in subspecies identification (Chun et al., 2018). However, Tanizawa et al. (2016) suggested, using the genome data of L. delbrueckii, that ANI is reliable for the evaluation of intraspecific diversity; therefore, we evaluated our strains using ANI.
The ANI ranged from 97.00% to 99.96%, suggesting that the strains were closely related. The dendrogram based on the ANI (Figure 4) also presented a topology similar to that based on 7 housekeeping genes ( Figure  2) and core-genome analysis (Figure 3). The reason for the similarity in the results of the core-genome analysis and those of the ANI-based analysis is that both  analyses were conducted on orthologous genes. The consistency of the results of genome-based analysis and MLSA indicates that the 7 housekeeping genes are sufficient for the subspeciation of L. delbrueckii.
Of the 4 different analyses, MLSA was the most reliable and simplest to operate in identifying the subspecies of L. delbrueckii.

Distribution of MLSA Clusters (Subspecies) Among the 226 Strains
We focused on the relationship between the hsp60 alleles and MLSA clusters to estimate the distribution of subspecies in 226 strains derived from raw milk. All strains carrying alleles belonging to group 1 (except for allele number 1) based on hsp60 sequences were classified by MLSA into cluster I, which included the type strains of L. delbrueckii ssp. lactis and jakobsenii. In addition, other groups in the hsp60 cluster corresponded to the MLSA cluster one-on-one without crossover among different MLSA clusters (Tables 1 and  2). Accordingly, the distribution of the MLSA cluster (nearly equal to subspecies) was estimated as I: 88.1%, II: 0.0%, III: 11.5%, IV: 0.0%, and V: 0.4% (Table 2).
Although there have been reports of L. delbrueckii in fermented foods (Giraffa et al., 2004;Mohammed et al., 2009) and L. delbrueckii ssp. bulgaricus , there have been no detailed analyses of L.
delbrueckii in raw milk in other areas. Therefore, the intraspecific diversity observed in this study cannot be compared with previous findings. However, the bacterial flora in raw milk collected in tanks on farms after milking is affected by the season, farm environment, cow hygiene and rearing environment, and storage conditions of raw milk. Lactobacillus delbrueckii evaluated in this study was isolated from milk sampled from a tank in a milk plant, and its flora may be influenced by various factors (Hagi et al., 2010;Hagi et al., 2013;Cao et al., 2021;Hornik et al., 2021;Nikoloudaki et al., 2021).
Three of the 5 strains carrying hsp60 allele number 4 and all 8 strains carrying number 18, 26, or 27 were assigned to subcluster I-B1 by MLSA of 26 strains. It was estimated that among the 226 strains, 199 strains (69.9%) belonged to subcluster I-B1. Although subcluster I-B1 is not an independent subspecies, it was the largest subset, accounting for 69.9% of L. delbrueckii strains derived from raw milk. Finally, these results suggest that strains assigned to cluster I, specifically subcluster I-B1, were dominant among L. delbrueckii isolated from raw milk in Hokkaido.
The strains belonging to subcluster I-B1 (I-B1 strain) can potentially impart different flavor characteristics compared with the commercial yogurt starter L. delbrueckii ssp. bulgaricus strain. In the future, the characterization of I-B1 strain will help to develop fermented  . Dendrogram based on the average nucleotide identity (ANI). The scale bar represents 1 − ANI (%)/100 as distance. The distance from 0.000 to 0.030 means from ANI 100% to 97%, respectively. dairy products with different flavors from the yogurt starter. However, the ability of the I-B1 strain to ferment milk is slightly weaker than that of yogurt starter (data not shown). Therefore, it would be more effective to use it in combination with a strain having a superior ability to ferment milk, such as S. thermophilus strains possessing prtS, which encodes the cell wall protease (Shahbal et al., 1991), when producing yogurt industrially (Yamamoto et al., 2020).

CONCLUSIONS
To the best of our knowledge, this study was the first to demonstrate the intraspecific genetic diversity of L. delbrueckii derived from raw milk in Hokkaido. We identified the subspecies and estimated their distribution within the 226 strains isolated from raw milk produced in Hokkaido using 4 distinct approaches (hsp60 cluster analysis, MLSA, core-genome analysis, and whole-genome analysis based on ANI). The distribution of the MLSA cluster (nearly equal to subspecies) was estimated as I: 88.1%, II: 0.0%, III: 11.5%, IV: 0.0%, and V: 0.4%. One strain of L. delbrueckii ssp. indicus (cluster V) was discovered for the first time in raw milk in Japan. Moreover, we found that subcluster I-B1, which comprised 66.9% of the total strains, was the most dominant subset of L. delbrueckii derived from raw milk in Hokkaido. The discovery of new strains with different genotypes is important for understanding the behavior and geographic distribution of the bacteria and using them as a microbial resource with potential to impart unconventional flavors and functionalities. In the future, characterizing the strains derived from raw milk including I-B1 subsets and elucidating their differences from the yogurt starter L. delbrueckii ssp. bulgaricus (cluster IV) may have practical applications in the development of new dairy products such as yogurt.