Signatures of positive selection after the introduction of genomic selection in the Finnish Ayrshire population

The Finnish Ayrshire ( FAY ) belongs to the Nordic Red breeds and is characterized by high milk yield, high milk components, good fertility, and functional conformation. The FAY breeding program is based on genomic selection. Despite the benefits of selection on breeding values, autozygosity in the genome may increase due to selection, and increased autozygosity may cause inbreeding depression in selected traits. However, there is lack of studies concerning selection signatures in the FAY after genomic selection introduction. The aim of this study was to identify signatures of selection in FAY after the introduction of genomic selection. Genomic data included 45,834 SNPs. The genotyped animals were divided into 2 groups: animals born before genomic selection introduction (6,108 cows) and animals born after genomic selection introduction (47,361 cows). We identified the selection signatures using 3 complementary methods: 2 based on identification of selection signatures from runs of homozygosity ( ROH ) islands and one based on the decay of site-specific extended haplotype between populations at SNP sites ( Rsb ). In

GS has been applied in dairy cattle breeding for over a decade.To date, genomic breeding values are generally estimated for all genotyped RDC females and artificial insemination bulls (NAV, 2023A).Despite benefits of selection on breeding values, both traditional BLUP -based selection and GS cause increased autozygosity in the genome (Forutan et al., 2018).As a consequence of autozygosity, inbreeding depression in fertility and production traits in FAY has been detected (Martikainen et al., 2018;Martikainen et al., 2020).
In addition to the increased local frequency of ROHs, positive selection may change allele frequencies so that homozygosity is extended to exceptionally long distances (Sabeti et al., 2002).Extended homozygosity and strong selection sweeps that have occurred in one of the 2 populations can be identified from genomic data using the Rsb approach (Tang et al., 2007;Weigand and Leese, 2018;Klassmann and Gautier, 2022).The Rsb approach is based on a standardized log-ratio of integrated site-specific extended haplotype homozygosity between pairs of populations (Klassmann and Gautier, 2022).The Rsb approach is robust to departures from neutrality that are due to demographic parameters other than selection (Tang et al., 2007) and does not require phased or polarized data (the ancestral versus derived state of each allele known) or recombination maps (Tang et al., 2007;Weigand and Leese 2018;Klassmann and Gautier, 2022).The Rsb approach has commonly been applied to detect signatures of selection in indigenous or local breeds and in crossbred or composite cattle breeds (Goszczynski et al., 2018;Alshawi et al., 2019;Mekonnen et al., 2019;van der Nest et al., 2021;Seo et al., 2022).However, only a few studies have applied information on temporal groups within the same cattle breed with the Rsb approach (e.g., Mastrangelo et al., 2020;Seo et al., 2022).
Increased homozygosity due to GS are expected near QTL affecting desired traits (Sonesson et al., 2010).Favorable haplotypes are selected to advance e.g., production, health, and fertility traits, but both desired and undesired haplotypes may simultaneously be 'hitchhiked' near by the selected regions (Ma et al., 2019;Xu et al., 2020).Signatures of selection and the hitchhiking effect may be identified by studying genomic regions with increased homozygosity.However, a lack of studies exists concerning selection signatures in FAY after GS introduction.
In this study, we aim to detect signatures of selection in the Finnish Ayrshire population.Using complementary methods -ROH island and ∆ROH islands detection and the Rsb method -we will identify genomic regions where selection has caused increased levels of homozygosity during the introduction of GS.

MATERIALS AND METHODS
No animal experiments were carried out in this study.Genotype data were obtained from the Nordic Cattle Genetic Evaluation (NAV).The data included genotypes of 53,469 FAY cows with 46,914 imputed autosomal SNP.Animals were considered FAY if they were registered as RDC with the country code FIN (Finland).Information on birth years of the genotyped animals were obtained from the Finnish Animal Breeding Association (Faba).
Genotyped animals were divided into 2 groups: group BGS; animals born before the introduction of GS (between 1980 and 2011, 6,108 cows) and group AGS; animals born after GS introduction (between 2015 and 2020, 47,361 cows).Most of the animals were genotyped with Illumina BovineLD v.1 Beadchip (Boichard et al., 2012).Before analysis, genotypes from low-density chips were imputed to 50K density using the FImpute software (Sargolzaei et al., 2014).Because genotypes were based on the UMD3.1 assembly, the positions of the SNPs were updated according to the ARS-UCD1.2assembly (NCBI, 2023).This resulted in the exclusion of 1,080 SNP variants that were either located on the sex chromosome or the mitochondrial chromosome in the ARS-UCD1.2assembly, lacked chromosomal location, were missing from the ARS-UCD1.2assembly, or shared a position in the assembly.In total, 45,834 SNP variants remained for further analyses.

ROH identification
ROHs were identified using PLINK version 1.90b6.20 (Chang et al., 2015;Purcell and Chang, 2021).Based on the average heterozygosity (32% in both groups), the minimum number of SNPs in a ROH was set to 61 and the sliding window length to 61 SNPs following Meyermans et al. (2020).Threshold t is defined as the hit rate of homozygous sliding windows that must contain the given SNP to define the SNP as part of a ROH.Threshold t was calculated as where N out is the desired number of outer SNPs excluded from a final ROH at both ends of a segment and L is the sliding window size (Meyermans et al., 2020).Threshold t was rounded down to 2 decimals.Thus, the sliding window size of 61 and N out = 4 resulted in threshold t = 0.08.Additionally, to control false positive ROH (Meyermans et al., 2020), the minimum ROH length was set to 500 kb.The minimum density of SNPs on a ROH was set to 1 SNP/65 kb, and the maximum gap size between adjacent SNPs in a ROH was limited to 500 kb to obtain sufficient genome coverage.Finally, as imputed genotypes were used, one heterozygote was allowed per sliding window to account for genotyping and imputation errors.The data were not pruned for minor allele frequency or deviations from the Hardy-Weinberg equilibrium (HWE).Individual genomic inbreeding (F ROH ) was estimated as the number of SNPs in ROHs divided by the total number of SNPs (Pryce et al., 2014).

ROH islands and ∆ROH islands.
Selection signature detection was based on ROH islands and ∆ROH islands, where ROH frequency has increased at certain genomic regions.ROH islands (representing selection signatures detected within a population) were identified i) in the BGS group and ii) in the AGS group based on SNP-wise H-scores (the proportion of SNPs belonging to ROH) (Purfield et al., 2017).The ∆ROH islands (representing selection signatures in one population relative to the other) were identified based on SNPwise ∆H-scores, i.e., the difference between SNP-wise H-scores of the BGS and AGS groups.
The SNPs in the top 1% of H-score and ∆H-score -distributions, with p-value < 0.01, formed the bases of the ROH islands and ∆ROH islands.The SNPs belonging to the top 1% SNP set and separated by a maximum of 2 SNPs were considered a part of the same ROH island or ∆ROH island.A ROH island or ∆ROH island had to include a minimum of 2 SNPs.

Rsb approach
To complement the ∆ROH island approach, we identified selective sweeps due to strong positive selection also with the Rsb method using the "rehh" package in R (Klassmann et al., 2021;Klassmann and Gautier, 2022).Before analysis, SNPs significantly deviating from the HWE (P < 1x10 −6 ) in either group were excluded (648 markers in total).A method by Klassmann and Gautier (2022) was applied to obtain a ratio between the groups (Rsb), briefly, as follows.First, using a SNP (s) as a reference, a SNP (t), where a site-specific extended haplotype homozygosity score (EHHS) decreased below a chosen cut-off limit, was chosen.Then, the number of homozygous individuals for a region that extended from SNP s to SNP t was calculated I s,t .The EHHS between SNPs s and t (EHHS s,t ) was determined independently in both directions starting from SNP s.The EHHS s,t was calculated for both groups by dividing the I s,t by the number of homozygous individuals for SNP s (I s,s ): This was followed by integration over the physical distance and normalization of EHHS s,t to its value at marker t = s, producing integrated normalized EHHS scores Because unphased data were used, the cut-off limit for EHHS was increased from the default 0.05 to 0.1 (Klassmann et al., 2021).The higher cut-off limit was recommended by Klassmann and Gautier (2022) to reduce statistical noise, i.e., to prevent high EHHS scores due to one or a few extremely long, shared haplotypes.In addition, EHHS was only considered for those SNP s with at least 5 homozygous individuals (sample-wise), and the integration was stopped if less than 2 individuals remained homozygous at SNP t (Klassmann et al., 2021).The unstandardized log-ratios of inES scores between populations were obtained by where inES BGS and inES AGS refer to groups BGS and AGS, respectively.Finally, ln(Rsb (s)) scores were obtained by standardization of the unRsb (s) ratio by the median; ) where median(Rsb) and sd(Rsb) are the median and standard deviation of unRsb computed over all analyzed markers.
The ln(Rsb) (s) scores were assumed to be normally distributed under neutrality (Gautier and Naves 2011).Hence, the desired significance threshold of the ln(Rsb Sarviaho et al.: Signatures selection in the Finnish Ayrshire (s) scores was calculated as the log 10 -transformed reciprocal of the p-value, where p-value is associated with the neutral hypothesis (no selection) (Gautier and Naves, 2011;Weigand and Leese, 2018).To control the false-discovery rate, q-values associated with the p-values were obtained from the "qvalue" package in R (Storey et al., 2023).The SNPs with ln(Rsb (s)) ≤ −3.88 (SNPs with very high EHHS within the AGS group compared with the BGS group) had very low false-discovery rates (q < 0.01) and can be considered to be under positive selection.Consecutive significant SNPs were joined to form regions under positive selection (hereafter Rsb regions).Because imputed genotypes were used, only Rsb regions with more than one significant SNP were considered.

ROH and ∆ROH islands
We identified a total of 34 ROH islands on chromosomes 1, 3, 6, 8, 12-15, 17, 19, 22, and 26 in the BGS group (Figure 1).The number of ROH islands within these chromosomes varied from one to 9.After concatenating the ROH islands, the total number of SNPs within the ROH islands was 1,401, with an average of 41.2 SNPs, varying from 2 to 192.H-scores of SNPs within the ROH islands varied from 0.110 to 0.170 (the average H-score was 0.055 across all 45,834 SNPs).ROH island length varied from 0.03 Mb to 10.6 Mb, with a mean of 2.06 Mb (Supplemental Table S1; Sarviaho 2024).All SNPs within each ROH island were less than 500 Mb apart.
Similarly, in the AGS group, we identified a total 30 ROH islands on chromosomes 1-3, 13-17, 22, and 25-26 (Figure 2).The number of ROH islands within these chromosomes varied from one to 7. After concatenating the ROH islands, the total number of SNPs within the ROH islands was 1,396, with an average of 46.5 SNPs, varying from 2 to 203.H-scores of SNPs within the ROH islands varied from 0.116 to 0.227 (the average H-score was 0.057 across all 45,834 SNPs).ROH island length varied from 0.03 Mb to 11.5 Mb with a mean of 2.33 Mb (Supplemental Table S2; Sarviaho, 2024).All SNPs within each ROH island were less than 500 Mb apart.Nine ROH islands (ROH_AGS_8,9,11,14,20,21,27,28,and 29) were identified in the AGS group that did not overlap the ROH islands detected in the BGS group.
We detected a total of 22 ∆ROH islands on chromosomes 2-3, 11, 13, 14, 16, 18, 20, and 25-26 (Figure 3, Supplemental Table S3;Sarviaho, 2024).The number of ∆ROH islands found varied from one to 7 across the chromosomes.After concatenating the ∆ROH islands, the total number of SNPs within the ∆ROH islands was 1,237, with an average of 56.2 SNPs, varying from 2 to 164.On average, the ∆ROH islands were 2.79 Mb in length, varying from 0.55 Mb to 9.23 Mb (Supplemental Table S3; Sarviaho, 2024).Two ∆ROH islands on chromosomes 3 and 14, overlapped the ROH islands in both groups, and 8 ∆ROH islands, on chromosomes 2, 13, 14, 16, 25, and 26 overlapped ROH islands observed only in the AGS group (Table 1).Eight ∆ROH islands are considered as novel ∆ROH islands given they overlap ROH islands observed only in the AGS group and do not locate to a ROH island observed in the BGS group.The number of genes within the ∆ROH islands was 424 (Supplemental Table S4; Sarviaho, 2024).

Rsb approach
Out of 159 significant SNPs, 60 were single SNPs without surrounding significant SNPs, and were discarded.A total of 99 SNPs formed 31 Rsb regions on chromosomes 2, 3, 14, 18, 20, and 25 (Supplemental Table S5, Supplemental Figure S1; Sarviaho, 2024).All SNPs within these regions were less than 500 kb apart from each other.Rsb region length ranged from 0.02 to 0.44 Mb.The total area covered by the observed Rsb regions (2.89 Mb) and their mean length (0.09 Mb) were relatively small compared with the coverage (61.4 Mb) and mean length (2.79 Mb) of the ∆ROH islands.Out of 31 observed Rsb regions, 24 overlapped 7 ∆ROH islands completely or partially on chromosomes 2, 3, 14, 18, and 25 (Table 1).One of these ∆ROH islands was a novel ∆ROH island (∆ROH_21).

DISCUSSION
A limited number of methods are available for detecting selection signatures between 2 or more populations using unphased SNP genotype data.The variety is even narrower with methods that do not require the polarization of alleles or recombination maps.In our study, we applied 3 complementary methods to detect signatures of selection using unphased and unpolarized genotype data in the FAY population: one based on ROH islands and one based on the standardized logratios of integrated site-specific extended haplotype homozygosity between pairs of populations (Rsb).

Pruning the data
As UMD3 bovine genome assembly has been used in designing the Illumina BovineLD v.1 Beadchip (Boichard et al., 2012), we updated the SNP positions according to the ARS-UCD 1.2 assembly.This resulted in the exclusion of 1,080 SNPs out of 46,914.Because of this, Sarviaho et al.: Signatures selection in the Finnish Ayrshire it is possible that certain genotypes have been imputed erroneously.Moreover, some ROHs have possibly been spliced or slightly elongated due to updated positions of the SNPs.Nevertheless, updating the SNP positions should have minor effects on the final results of this study because i) we allowed one heterozygote within the sliding window in ROH detection, ii) we allowed 2 nonsignificant SNPs between SNPs within ROH island and ∆ROH islands, iii) we did not consider regions with only one SNP as a ROH island or ∆ROH island, or a Rsb region, and iv) it is unlikely that long homozygous regions contained single heterozygous SNPs between many homozygous SNPs.Due to the latter, it is unlikely that ROHs were considerably elongated due to updated SNP positions, i.e., a homozygous SNP added between 2 ROHs would unlikely result in one false ROH segment.However, it is possible that updating the SNP positions causes bias in the Rsb approach in relation to imputed genotypes.More specifically, if many individuals have SNPs imputed in such positions that the adapted frequency of alleles deviates notably from the true frequency, it is possible that the integration of EHHS scores is prematurely stopped.Despite this, as genotype imputation accuracy from low density to medium is relatively high (Boichard et al., 2012), imputation errors should have negligible effects on the Rsb regions detected in our study.
We did not prune the data used in the ROH island or ∆ROH island studies for deviations from HWE.In addition to being genotyping errors, deviations from HWE can also be due to selection (Cox and Kraft, 2006).Because we were interested in selection, we allowed one heterozygote in a sliding window in ROH detection to account for genotyping errors.On the contrary, we pruned the data used with the Rsb approach for significant deviations (P < 1x10 −6 ) from HWE due to the requirement of the HWE when unphased data were used (Klassmann and Gautier, 2022).This excluded 648 SNPs from the genotype data and resulted in 45,186 markers left for further analyses.
We did not prune the genotype data for MAF either in the ROH island or ∆ROH island studies or with the Rsb approach because: i) pruning for MAF could lead to fewer detected ROHs (Meyermans et al., 2020), ii) notable changes in the frequency of SNPs with MAF < 0.05 were unlikely, and iii) pruning for MAF could cause considerable loss of information when the Rsb approach was used (Klassmann et al., 2021).Finally, Meyermans et al. (2020) do not recommend pruning genotype data for linkage disequilibrium (LD) in ROH studies if medium-density genotypes are used.Because of this, we avoided false-ROHs arising from LD by setting the minimum ROH length i) in SNPs and ii) in kbs.

ROH identification
The amount of heterozygosity in the 2 groups varied little.Average heterozygosity in the BGS group was 32.7%, which, in terms of recommended parameters in PLINK, would mean a sliding window of 56 SNPs and a minimum length of ROHs 56 SNPs (Meyermans et al., 2020).In contrast, average heterozygosity in the AGS group was 32.8%, and the optimal sliding window size and the minimum ROH length was 61 SNPs.Despite this, we assumed similar average heterozygosity, even if sample size was increased, in both groups, and used equal sliding window size and a minimum ROH length of 61 SNPs for both groups.
Although ROHs resulting from linkage disequilibrium and low recombination rates in cattle are less than 1 Mb long (Ferenčaković et al., 2012), short ROHs may also result from LD that is due to selection (Sabeti et al., 2002).In addition, ROHs, whether emerged from selection or inbreeding, are spliced into shorter segments by recombination (Kirin et al., 2010;Howrigan et al., 2011).We therefore additionally limited the minimum ROH length to 500 kb.This may have led to some ROHs less than 500 kb or 61 SNPs in length to go undetected and some homozygous segments 0.5-1 Mb in length, arising from low recombination rates, considered as ROHs.However, we expect that the parameters limiting ROH length had negligible effects on the final results of this study.

Identification of ROH-and ∆ROH islands and Rsb approach
A high frequency of ROHs at certain genomic regions in a population form 'ROH islands' (Nothnagel et al., 2010), which are considered selection signatures (Purfield et al., 2017;Gorssen et al., 2019).Even though, signatures of selection have been detected in various cattle breeds, ROH-patterns vary between breeds (Szmatoła et al., 2016;Mastrangelo et al., 2018) which complicates the results comparison between studies.Moreover, there is no consensus in the definition of ROH islands between previous studies.
The appropriate significance threshold for considering an SNP as part of a ROH island depends on the population inbreeding level: the higher the inbreeding level, the less SNPs surpass a strict significance threshold (Gorssen et al., 2021).In previous studies, the threshold for the top significant SNPs has generally varied between 0.1% (Moscarelli et al., 2020;Gorssen et al., 2021) and 1% (Szmatoła et al., 2016;Szmatoła et al., 2019;Cesarani et al., 2022).In our study, the mean F ROH was moderate -0.055 (SD 0.025) in the BGS group and 0.057 (SD 0.022) in the AGS group.There- fore, 1% was used as the threshold for significance.For continuity of the ROH islands, we allowed the significant SNPs to be separated by a maximum of 2 nonsignificant SNPs to be called in a ROH island and considered the in-between SNPs to be significant.Despite this, the maximum inter-marker distances between consecutive SNPs within ROH islands and ∆ROH islands were less than 0.5 Mb, which falls within the extent of LD in cattle (MacKay et al., 2007).Nevertheless, it is pos-sible that selection pressure is still shared by certain distinct ROH islands or ∆ROH islands.Finally, as bovine autosomes are acrocentric, with a centromere near the chromosome start, ROH islands emerging due to the centromeric location of ROHs are expected to locate within the few first megabases of a chromosome (Nandolo et al., 2018).
Due to the small number of studies on ∆ROH islands, the requisites for calling a ∆ROH island were  the same as for the ROH islands.However, it is likely that not all observed ∆ROH islands are regions truly under positive selection due to the limited sample size of animals born before 2012.Similar to ROH island studies, there is no consensus of the appropriate thresholds to use for detecting the regions under selection with the Rsb approach.Commonly, selection is inferred using a standard normal distribution (Weigand and Leese, 2018), and the threshold of significance for SNPs under selection has generally varied between ln(Rsb) ≤ ± 2 and ln(Rsb) ≤ ± 4, corresponding to p-values associated with the neutral hypothesis (no selection) between 0.01 and 0.0001, respectively (e.g., Goszczynski et al., 2018;Mastrangelo et al., 2020).To control the false-discovery rate, we calculated q-values associated with the p-values, and only the SNPs with very low false-discovery rates (q < 0.01) were considered positive selection.

Observed selection signatures
FAY belong to the Nordic Red breeds that are selected based on the NTM (NAV, 2023A).Production traits (milk, protein, and fat) have the highest weights in the NTM (NAV, 2023A).As expected, we observed ∆ROH islands overlapping genes related to milk yield and composition.For example, MECR, CHUK, and COX15, candidate genes for milk fatty acid traits (Ibeagha-Awemu et al., 2016;Palombo et al., 2018;Shi et al., 2019), overlap the novel selection signatures ∆ROH_3 and ∆ROH_22.Milk fatty acids are correlated with milk fat and milk protein contents (Mele et al., 2009).Additionally, PLA2G4A, a gene associated with milk yield in Holstein (Lu et al., 2022), is located in the novel selection signature region ∆ROH_16.Furthermore, QTL regions related to milk fat yield in the RDC breed (Iso-Touru et al., 2016;Fang et al., 2018) overlap the novel selection signature ∆ROH_21.Selection signatures have also been identified in these regions in Holstein (Szmatoła et al., 2016;Mastrangelo et al., 2018;Szmatoła et al., 2019) and in Jersey (Lozada-Soto et al., 2022).
Although FAY is most importantly a dairy breed, a growth index is also included in the NTM (NAV, 2023A).ANGPTL3, a gene associated with growth and meat quality traits in cattle (Chen et al., 2015), is located in region ∆ROH_7.This ∆ROH island was located within ROH islands observed in the BGS and AGS groups.In addition, SHISA9, a candidate gene for maturity rate in beef cattle (Duan et al., 2021) is located in the novel ∆ROH_21.Additionally, SLC44A5, associated with birth weight in Holstein (Sugimoto et al., 2012) overlaps ∆ROH_5, and HSPB6, influencing the intramuscular fat and meat and carcass traits in dual-purpose cattle (Picard et al., 2023), overlaps ∆ROH_19.

CONCLUSIONS
In conclusion, we detected selection signatures that were related to milk production, fertility, growth, and feed efficiency in FAY.The results of this study indicate that, after genomic selection has been introduced, selection has favored genomic regions related to traits relevant in the FAY breeding program.Because homozygosity within these genomic regions has increased, the effect of genotypes in a homozygous state within the selected regions should be studied.In addition, we recommend monitoring the level of homozygosity within these regions that are undergoing selection in the FAY population.

Figure 1 .
Figure 1.Distribution of H-scores along the genome in the BGS group.SNPs within ROH islands are highlighted in green.The blue horizontal line indicates the cut-off threshold (0.116, corresponding to a p-value of < 0.01) of the H-scores.

Figure 2 .
Figure 2. Distribution of H-scores along the genome in the AGS group.SNPs within ROH islands are highlighted in green.The blue horizontal line indicates the cut-off threshold (0.120, corresponding to a p-value of < 0.01) of the H-scores.

Figure 3 .
Figure 3. Distribution of ∆H-scores along the genome.SNPs within ∆ROH islands are highlighted in green.The blue horizontal line indicates the cut-off threshold (0.031, corresponding to a p-value of < 0.01) for ∆H.

Table 1 .
Selection signatures that emerged after the introduction of genomic selection.Positions are based on ARS-UCD1.2assembly