Genomic inbreeding coefficients using imputation genotypes: assessing the effect of ancestral genotyping in Holstein-Friesian dairy cows

The objective of this study was to assess the effect of including or excluding the genotypes of a cow's parents when imputing single nucleotide polymorphisms (SNP) on the estimation of genomic inbreeding coefficients of cows. Imputation (i.e., genotyped plus imputed) genotypes from 68,127 Italian Holstein dairy cows registered in the Italian National Association of Holstein, Brown and Jersey Breeders (ANAFIBJ) were analyzed. Cows were genotyped with the HD Illumina Infinium BovineHD BeadChip and GeneSeek Genomic Profiler HD-150K, and the MD GeneSeek Genomic Profiler 3, GeneSeek Genomic Profiler 4, GeneSeek MD and Labogena MD. To assess differences among estimators, genomic inbreeding coefficients were estimated with 4 PLINK v1.9 estimators (F, F hat1, F hat2 and F hat3), 2 genomic relationship matrix (grm) based estimators (F grm and F grm2; the latter also including pedigree information) and one estimator based on runs of homozygosity (ROH; F ROH). Assuming that the correct genomic inbreeding coefficients are those estimated from genotyped SNP, the genomic inbreeding coefficients estimated with the genotyped SNP were compared with those estimated with the SNP after imputation. The presence or absence of genotypic information from the sire, dam and maternal grandsire during imputation was investigated. Genomic inbreeding coefficients estimated with genotyped SNP or SNP after imputation were consistent for F, F hat3, F grm2 and F ROH when at least one of the parents was genotyped. Biased (mainly higher) genomic inbreeding coefficients from imputation SNP, compared with what was expected based on actual genotype data, were observed in cows that were genotyped with MD SNP panels whose SNP were poorly represented in the selected imputation SNP data set and that also did not have their parents genotyped.


INTRODUCTION
In recent years, whole genome single nucleotide polymorphisms (SNPs) have been routinely used in many livestock breeding programs worldwide, mainly for the estimation of genomic breeding values. Additionally, SNP data are widely used in genome-wide association studies (GWAS), population genetics, estimation of realized homozygosity and inbreeding, signatures of selection, parental assignment, etc. Especially in dairy cattle, a plethora of SNP panels has been and is available on the market, with a greatly varying number of SNPs, from low density (LD; few hundreds to ~20K) to medium (MD; 20K to ~70K) and high density (HD; 70K to ~700K). As a result, dairy cattle breeding companies had to genotype cattle groups using different SNP panels over time, depending upon the breeding objective, research focus and budget availability. Due to the overlap of SNPs among SNP panels, procedures that allow animals to be genotyped in LD/MD and their genotypes to be imputed from LD/MD to HD are widely used (Scheet and Stephens, 2006; Browning and Browning, 2007; Daetwyler et al., 2011). Various algorithms exist for SNP imputation (Hickey et al., 2013; Whalen et al., 2018, 2019). With imputation strategies, breeding companies can reduce genotyping costs. This can be done by genotyping a few core animals (i.e., characterized by genomic information shared in the entire population) with HD SNP panels and many animals with LD/MD, and projecting the latter to HD genotypes or to a set of preselected SNP. At the end of the imputation process, the genotypes to be used in further analysis (e.g., estimation of genomic breeding values, GWAS, etc.) consist of the imputation SNP, i.e., a mixture of genotyped and imputed SNPs.
The imputation success depends on 3 major factors: i) the relationship between the core animals genotyped in HD and those to be imputed from LD/MD to HD (e.g., parent-offspring), ii) the distribution along the genome and the number of SNPs in the LD/MD panels and iii) the linkage disequilibrium between SNPs in the LD/MD and SNPs in the HD (Zhang and Druet, 2010; Hickey et al., 2012; Huang et al., 2012). In practice, various strategies exist to achieve high imputation accuracy, and they vary in cost and in the available genomic information. For example, all parents, grandparents and great-grandparents of the animals to be imputed could be genotyped in HD. Alternatively, grandparents could be excluded from HD genotyping, and sires could be genotyped in HD while dams could be genotyped either in HD or LD/MD or not genotyped at all. These factors and strategies are specific to each breeding program. However, as a consequence of the imputation accuracy, variation in genomic estimates based on imputed SNP data is expected, yet not always quantified.
The present work builds on previous analyses of whole genome imputation SNP based genomic inbreeding coefficients (F SNP) in dairy cattle (Dadousis et al., 2022, 2023). A coefficient of inbreeding (traditionally estimated via pedigree and nowadays also via SNP data) represents the inbreeding level of an individual. In previous works we showed that extreme genomic inbreeding coefficients might also be a result of imputation. This effect was observed in cows genotyped with MD SNP panels and was more pronounced in SNP panels that had few of their SNPs included in the final imputation SNP data (meaning that the majority of the SNPs for these cows were imputed rather than genotyped). The still open question was whether exceedingly high genomic inbreeding coefficients might also be caused by the absence of genotyped parents in the imputation process. Thus, the objective of this study was to further investigate whether high genomic inbreeding coefficients of cows genotyped with MD SNP panels that had few of their SNPs included in the final imputation SNP data might also be a result of not having their parents genotyped in the imputation process. In the scientific literature both "F" and "f" are used interchangeably to abbreviate the inbreeding coefficient. Herein, F denotes one of the 4 genomic inbreeding estimators available in PLINK v1.9 software (Purcell et al., 2007; https://www.cog-genomics.org/plink/) and F SNP was adopted to denote any whole genome SNP based inbreeding coefficient.

Animals, Genotypes and Imputation
In total, 68,127 dairy cows from the Italian Holstein population were analyzed. Of those, 66,808 cows were genotyped with 4 MD SNP panels: GeneSeek Genomic Profiler 3 (10,679 cows; 26,151 SNPs), GeneSeek Genomic Profiler 4 (33,394 cows; 30,113 SNPs), GeneSeek MD (12,030 cows; 47,850 SNPs) and Labogena MD (10,705 cows; 41,911 SNPs). Moreover, there were 678 cows genotyped with the Illumina Infinium BovineHD BeadChip containing 777,962 SNPs and 641 cows genotyped with the GeneSeek Genomic Profiler HD-150K containing 139,914 SNPs. The number of SNPs per SNP panel refers here to the total number of SNPs included in each SNP panel by the manufacturer. For each cow the genotype status (G: genotyped, N: not genotyped, M: missing information) of the parents and the maternal grandsire (MGS) was available. In the present study we used a 3-letter code to indicate whether Sire-Dam-MGS are genotyped (G), not genotyped (N) or missing (M). Specifically, N means not genotyped but with known pedigree of ancestors, and M means not genotyped and pedigree missing (the parent or the maternal grandsire is a founder animal).
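For illustration only, the 3-letter Sire-Dam-MGS code described above could be assembled and queried as in the following Python sketch. The function names are hypothetical and are not part of the ANAFIBJ evaluation pipeline; the only assumption taken from the text is the G/N/M status of each ancestor.

```python
VALID_STATUS = {"G", "N", "M"}  # G: genotyped, N: not genotyped, M: missing


def ancestral_code(sire, dam, mgs):
    """Return the 3-letter Sire-Dam-MGS code, e.g. 'GNG' (hypothetical helper)."""
    for status in (sire, dam, mgs):
        if status not in VALID_STATUS:
            raise ValueError(f"unknown status: {status!r}")
    return f"{sire}{dam}{mgs}"


def n_parents_genotyped(code):
    """Count genotyped parents (sire and dam only; the MGS is not a parent)."""
    return sum(letter == "G" for letter in code[:2])


code = ancestral_code("G", "N", "G")
print(code, n_parents_genotyped(code))
```

Grouping cows by this code (and by SNP panel) yields the strata used in the comparisons that follow.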
All cows genotyped with MD SNP panels were imputed to 84,445 SNPs (preselected by ANAFIBJ), while genotypes of cows in HD were downgraded to the 84,445 SNPs. The imputation is part of the routine genomic evaluation procedure of ANAFIBJ and runs via a modified version of the PedImpute software (Nicolazzi et al., 2013). The SNP quality control was completed during this procedure and excluded SNPs with: i) call rate <95%, ii) parent-offspring SNP mismatch >0.01, iii) low minor allele and genotype frequencies (<0.02 and <0.001, respectively), iv) extreme deviation from Hardy-Weinberg equilibrium (P < 0.005) and v) location on sex chromosomes. No other filtering was applied in the present study. All SNP positions were mapped to the UCSC Bos taurus ARS-UCD1.2 (bosTau9, Apr. 2018) assembly.
Two measures of imputation success were analyzed, namely the fraction of non-missing, i.e., filled SNPs after imputation (hereafter referred to as imputation fill rate) and the fraction of completely (both alleles) imputed SNPs out of the non-missing SNPs after imputation (hereafter referred to as imputation rate). Both metrics were on a 0-1 scale. An imputation fill rate of one implies that all of the 84,445 SNP were available (i.e., not missing) for the cow, while a high value of imputation rate indicates that the majority of the 84,445 SNPs were not genotyped but imputed SNPs for the given cow.
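The two metrics can be made concrete with a minimal Python sketch. It assumes that a per-SNP status is available for each cow; the status codes (GENOTYPED, IMPUTED, MISSING) and the function name are hypothetical and do not reflect the output format of PedImpute.

```python
import numpy as np

# Hypothetical per-SNP status codes for one cow after imputation.
GENOTYPED, IMPUTED, MISSING = 0, 1, 2


def imputation_metrics(status):
    """Return (imputation fill rate, imputation rate) for one cow.

    fill rate       = non-missing SNPs / all SNPs
    imputation rate = completely imputed SNPs / non-missing SNPs
    """
    status = np.asarray(status)
    n_filled = np.count_nonzero(status != MISSING)
    n_imputed = np.count_nonzero(status == IMPUTED)
    fill_rate = n_filled / status.size
    imputation_rate = n_imputed / n_filled if n_filled else float("nan")
    return fill_rate, imputation_rate


# Toy cow: 2 genotyped, 6 imputed and 2 missing SNPs.
toy = [GENOTYPED] * 2 + [IMPUTED] * 6 + [MISSING] * 2
print(imputation_metrics(toy))
```

In this toy example 8 of 10 SNPs are filled and 6 of those 8 are imputed, so a cow can combine a high fill rate with a high imputation rate, which is the situation of interest for the sparser MD panels.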

Genomic Inbreeding Coefficients
PLINK software. F [flag --het; Li and Horvitz (1953)] and F hat1, F hat2 and F hat3 [flag --ibc; Yang et al. (2011)] estimators were used as implemented in PLINK v1.9 (Purcell et al., 2007; https://www.cog-genomics.org/plink/). The F estimator, $F = (O - E)/(m - E)$, measures excess homozygosity, where m is the total number of SNPs, and O and E are the observed and expected numbers of homozygous SNPs. With X (n cows and m SNP) being the matrix of genotypes (coded using the number of copies of the reference allele) and $x_i$ the genotype of a cow at the ith SNP,

$$F_{hat1} = \frac{1}{m}\sum_{i=1}^{m}\frac{(x_i - 2p_i)^2}{2p_iq_i} - 1,$$

where p is the frequency of the reference and q of the alternative allele in the ith SNP. Further,

$$F_{hat2} = 1 - \frac{1}{m}\sum_{i=1}^{m}\frac{x_i(2 - x_i)}{2p_iq_i} \quad \text{and} \quad F_{hat3} = \frac{1}{m}\sum_{i=1}^{m}\frac{x_i^2 - (1 + 2p_i)x_i + 2p_i^2}{2p_iq_i}.$$

The latter estimator represents Wright's definition of inbreeding as the correlation between uniting gametes (Wright, 1921, 1922; Yang et al., 2010).

Diagonal of genomic relationship matrix. The first method (F grm) (Leutenegger et al., 2003; Amin et al., 2007; VanRaden, 2008) was estimated as follows:

$$grm = \frac{ZZ'}{2\sum_{i=1}^{m} q_i(1 - q_i)},$$

where $Z = X - 2q_i$ and q is the frequency of the second allele, and where F grm = diag(grm) − 1. Alternatively, the diagonal of XX′ was regressed on pedigree inbreeding coefficients (VanRaden, 2008; third method). F SNP (F grm2) was then obtained from the diagonal of XX′ using the mean and slope derived from this regression. For F grm observed allele frequencies were used, while for F grm2 no extra information [e.g., sires from semen center, as reported in Rolf et al. (2010)] was used to calibrate the diagonal of XX′.

Runs of homozygosity. F ROH estimates the proportion of SNP along the genome being in ROH. The consecutive runs method was adopted, available in the R (v. 3.6.3) package detectRUNS v. 0.9.5 (R Core Team, 2013; Marras et al., 2015; Biscarini et al., 2019). The minimum length of a ROH was set to 1 Mbp. Moreover, a minimum of 15 SNPs/ROH and at most one heterozygous SNP within a ROH (the latter to account for possible genotyping errors) were permitted.
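The SNP-by-SNP estimators can be illustrated with a short NumPy sketch. This is not the PLINK code: it assumes a complete genotype matrix with no missing calls, uses allele frequencies observed in the supplied sample, and centres the genotypes by the frequency of the counted allele; PLINK's own implementation may differ in details such as small-sample corrections. The function name is hypothetical.

```python
import numpy as np


def inbreeding_estimators(X):
    """Per-cow F, F_hat1, F_hat2, F_hat3 and F_grm from an (n cows x m SNPs)
    matrix of allele counts (0, 1, 2) without missing values."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    p = X.mean(axis=0) / 2.0   # observed frequency of the counted allele
    h = 2.0 * p * (1.0 - p)    # expected heterozygosity, 2pq
    if np.any(h == 0):
        raise ValueError("monomorphic SNPs must be removed first")

    # Excess homozygosity: (O - E) / (m - E)
    O = np.sum(X != 1, axis=1)  # observed homozygous SNPs per cow
    E = np.sum(1.0 - h)         # expected homozygous SNPs
    F = (O - E) / (m - E)

    F_hat1 = np.mean((X - 2 * p) ** 2 / h, axis=1) - 1.0
    F_hat2 = 1.0 - np.mean(X * (2 - X) / h, axis=1)
    F_hat3 = np.mean((X ** 2 - (1 + 2 * p) * X + 2 * p ** 2) / h, axis=1)

    # Diagonal of ZZ' / (2 * sum(pq)) minus 1
    Z = X - 2 * p
    F_grm = np.sum(Z ** 2, axis=1) / np.sum(h) - 1.0

    return {"F": F, "F_hat1": F_hat1, "F_hat2": F_hat2,
            "F_hat3": F_hat3, "F_grm": F_grm}


toy = np.array([[0, 2, 0, 2], [1, 1, 1, 1], [2, 0, 2, 0], [1, 1, 1, 1]])
for name, values in inbreeding_estimators(toy).items():
    print(name, values)
```

In the toy matrix every SNP has an allele frequency of 0.5, so a fully homozygous cow scores 1 and a fully heterozygous cow scores −1 under every estimator. A useful check on the formulas is that F hat3 equals the average of F hat1 and F hat2 for every cow, which follows directly from expanding the three numerators.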

Relationship between Genotyped and Imputation Genomic Data
Correlations (Pearson and Spearman; denoted as r and ρ hereafter) between genotyped and imputation F SNP were investigated within each SNP panel and ancestral genotype information. For each SNP panel, only those SNPs that were included in the final imputation data (i.e., being part of the 84,445 imputation SNP) were considered as genotyped (i.e., not imputed) SNP. Hence, the final numbers of SNPs for each SNP panel were 13,870, 16,862, 27,331, 40,218, 77,085 and 79,900 for the GeneSeek Genomic Profiler 3, GeneSeek Genomic Profiler 4, GeneSeek MD, Labogena MD, Genomic Profiler HD-150K and Illumina Infinium BovineHD BeadChip, respectively. The regression coefficient ($\hat{\beta}$) of the regression of imputation F SNP on genotyped F SNP was used as a measure to assess the bias of F SNP after SNP imputation. For a more reliable interpretation of the results we reported and discussed only correlations and regression coefficients estimated with at least ten data points. Although such cases are limited in data points and have to be interpreted with caution, they refer to unique groups of cows of particular interest (e.g., cows with none of the parents being genotyped). To further validate the impact of imputation on genomic inbreeding coefficients we selected 329 cows (25%) from the two HD SNP panels. Two downgraded MD data sets were considered, each containing the full data except that the 329 cows were downgraded to the GeneSeek Genomic Profiler 3 (set-2) or to the Labogena MD (set-3) and re-imputed to the 84,445 SNPs, for comparison with the initial 84,445 SNP data (set-1). Moreover, to evaluate the effect of ancestral genotyping, 266 of the 329 selected cows had the sire and either the dam or the maternal grandsire genotyped, and the other 63 cows had at most one ancestor genotyped. Pearson correlations of F grm and F ROH between each of the downgraded MD data sets (GeneSeek Genomic Profiler 3 and Labogena MD) and the initial 84,445 SNP data were estimated.

Similarities Among SNP Panels and the Imputed SNP Data
From the 68,127 cows, 31,539 were genotyped with the GeneSeek Genomic Profiler 4 (20,498 GNG and 11,041 GGG). Moreover, 11,308 were genotyped with the GeneSeek MD (7,727 GNG and 3,581 GGG) and 10,148 with the Labogena MD (7,535 GNG and 2,613 GGG). Regarding the GeneSeek Genomic Profiler 3, there were 10,097 cows (7,191 and 2,906 for the GNG and GGG, respectively). All the remaining SNP panel by parental genotyping combinations had fewer than 1,000 cows each. Not every ancestral genotype status was present within each SNP panel. For instance, there were 30 cows genotyped with the GeneSeek Genomic Profiler 4 and without information on the sire (genotyped or not) but with information on the dam and maternal grandsire: MGG (3), MNG (19), MNM (5) and MNN (3); while in GeneSeek Genomic Profiler 3 there was only one MNG cow. In total, 67,296 out of the 68,127 cows had the sire genotyped (GGG: 20,256; GGM: 10; GGN: 165; GMM: 102; GNG: 43,908; GNM: 741; GNN: 2,105). Figure 1 shows the average of r and ρ (genotyped vs. imputation F SNP; considering at least 10 data points for each correlation) within each SNP panel, ancestral genotype status and genomic inbreeding estimator, while correlations tested on the complete data set are reported in Supplemental Table S1.
In general, r of genotyped vs. imputation F SNP was higher in cows genotyped with HD SNP panels. However, even in HD SNP panels there were cases of r < 0.8. For instance, the r of the Illumina Infinium BovineHD BeadChip in the group of cows that did not have any ancestor genotyped (case NNN) was 0.79, 0.97, 0.95, 0.56, 0.57, 0.59 and 0.98 for F, F hat1, F hat2, F hat3, F grm, F grm2 and F ROH, respectively (Figure 1; Supplemental Table S1; Supplemental Figure S1). Overall, r and ρ showed similar results for each tested estimator depending on the available ancestor information. However, there were cases where ρ was much higher than r (depending upon estimator). For instance, for the 2,906 cows genotyped with the GeneSeek Genomic Profiler 3 having all ancestors genotyped (GGG case; Supplemental Table S1), r for F hat1 and F hat2 was 0.63 and 0.58, respectively, while ρ was ~0.80 for both estimators.
Overall, similar patterns were found for F, F hat3, F grm2 and F ROH. These estimators provided, in general, higher r compared with the rest of the estimators (Figure 1) and, when the cow had the sire genotyped (GGG, GGN, GMM, GNG, GNM and GNN) or the dam and maternal grandsire genotyped (NGG), resulted in r > 0.82, independently of the SNP panel used to genotype the cow. However, in the absence of parental genotyping (MNG, NNG and NNN), results varied with the SNP panel used to genotype the cow, and correlations dropped (even below 0.5), approaching zero for the MD SNP panels, for all the estimators.
Interestingly, in the MNG ancestral genotype status, different r between genotyped and imputation F SNP was found for cows genotyped with the GeneSeek Genomic Profiler 4 (n = 19) and the GeneSeek MD (n = 16), with variability of r among estimators. More precisely, the F estimator had weak negative correlations for both GeneSeek Genomic Profiler 4 (−0.10) and GeneSeek MD (−0.12), F ROH had correlations of 0.39 and 0.13 (for the GeneSeek MD and GeneSeek Genomic Profiler 4, respectively), while F hat3 had correlations of 0.78 for the GeneSeek MD and 0.10 for the GeneSeek Genomic Profiler 4. In contrast, F grm showed the opposite pattern to F hat3 regarding the SNP panels (i.e., 0.16 for the GeneSeek MD and 0.70 for the GeneSeek Genomic Profiler 4), while high correlations (0.78 and 0.98 for the GeneSeek Genomic Profiler 4 and GeneSeek MD, respectively) were observed for F hat1. A large difference between GeneSeek Genomic Profiler 4 and GeneSeek MD was observed for F grm2 (0.94 and −0.42, respectively). This is evidence that the consistency between genotyped and imputation F SNP depends on both the ancestral genotype status and the SNP panel with which the cow was genotyped.

Bias of imputation whole genome SNP inbreeding coefficients
Bias of SNP imputation F SNP when parents were genotyped. The median values of the distributions of β for all estimators ranged between 0.92 and 1.04. However, biased imputation F SNP ($0.28 \le \hat{\beta} \le 0.80$) were identified in the case of GGG and/or GGN cows for F, F hat1, F hat2, F hat3 and F grm in all MD SNP panels, except the Labogena MD (Table 1, Figure 2). Bias was more pronounced for F hat1, F hat2 and F grm. More precisely, for the 11,041 GGG cows genotyped with the GeneSeek Genomic Profiler 4, the r between genotyped vs. imputation F SNP for F hat1 and F hat2 was 0.58 and the β was 0.28. Similarly, for F hat1 and F hat2 and the GeneSeek MD (3,581 cows), r was 0.56 and 0.63 and β 0.64 and 0.66, respectively. Regarding the GeneSeek Genomic Profiler 3 (2,906 cows), estimates of r and β for F hat2 were 0.58 and 0.68, while for F hat1 they were 0.63 and 0.77. Moreover, for the GeneSeek Genomic Profiler 3 and 4 (81 cows), biased imputation F SNP was found for F hat1 ($\hat{\beta} = 1.30$) for the GGN cows (29 cows). Similarly, for the GeneSeek Genomic Profiler 4 in the GGN cows (n = 81) the β was 1.83, 1.47 and 1.21 for F hat1, F hat2 and F hat3, respectively.
Bias of imputation F SNP when parents were not genotyped.
The bias was observed across all SNP panels and estimators. The lowest β was observed for the NNN cows genotyped with GeneSeek Genomic Profiler 3 (25 cows) with values of −2.15, −2.06, −1.72 and −1.73 for F ROH, F hat2, F grm2 and F, respectively. Cases with $\hat{\beta} \ge 2.00$ were also found. This was observed for all MD SNP panels and estimators. Interestingly, bias was observed also in HD SNP panels. More precisely, there were 13 NNN cows geno-

Imputation Metrics per Ancestral Genotype Information
The potential effect of imputation success on the estimation of genomic inbreeding coefficients derived from whole genome imputed SNP data was further investigated. The imputation fill rate and imputation rate for each cow are reported in Supplemental Table S2 across the groups of ancestral genotype information and are summarized in Figure 3. In practice, a cow genotyped with an HD SNP panel is expected to have a high imputation fill rate (e.g., > 98%) and a low imputation rate (e.g., < 0.5%). The imputation fill rate had low variability (values between 0.92 and 1.00), while the imputation rate was highly affected by the SNP panels and the available ancestral information, showing values between 0.04 and 0.80. Specifically, within each SNP panel the average imputation rate was 0.80, 0.76, 0.62, 0.41, 0.08 and 0.06 for the GeneSeek Genomic Profiler 3, GeneSeek Genomic Profiler 4, GeneSeek MD, Labogena MD, GeneSeek Genomic Profiler HD-150K and the Illumina Infinium BovineHD BeadChip, respectively. As expected, cows genotyped with the 2 HD SNP panels had an imputation fill rate close to 1 (Figure 3a) and an imputation rate <0.10 (Figure 3b). The GeneSeek Genomic Profiler 3 always had the highest imputation rate, followed by the GeneSeek Genomic Profiler 4, the GeneSeek MD and the Labogena MD.
Regarding the effect of the ancestral information, when both parents, or the sire and maternal grandsire, were genotyped, consistent results were found for imputation fill rate and imputation rate across all SNP panels. In the cases where only one or none of the 3 ancestors was genotyped (3,637 cows in total), more variation was observed for both metrics in all of the SNP panels. This effect was more pronounced in the GMM, MNG, NNG and NNN groups of cows.

Case study of F ROH
We further investigated the F ROH estimator, which has been suggested to be the most robust estimator for whole genome imputed SNP genomic inbreeding coefficients (Caballero et al., 2022; Dadousis et al., 2022, 2023). As shown in Figure 4a, r between genotyped vs. imputation F SNP increased with increasing SNP panel density (ranging from 0.79 for cows genotyped with the GeneSeek Genomic Profiler 3 to 1.00 for the 2 HD SNP panels). We further split the groups of cows based on ancestral genotyping status (Figure 4b). Overall, in the cases where at least one parent of the cow was genotyped (67,351 cows) the correlation between genotyped vs. imputed F SNP was ≥ 0.94 for all SNP panels, with the lowest (0.94) found in the GNG case and GeneSeek Genomic Profiler 3 (Figure 3). In contrast, when none of the parents was genotyped (657 cows), discrepancies were found between genotyped and imputation F SNP estimates for the MD SNP panels. More precisely, correlations varied from 0.03 (GeneSeek Genomic Profiler 3; NNG; 92 cows) to 0.83 (GeneSeek MD; NNN; 15 cows). Negative correlations (−0.26; 15 cows) were also found for the GeneSeek Genomic Profiler 3 when none of the parental information was available. In the latter case, also for GeneSeek Genomic Profiler 4 (46 cows) and Labogena MD (10 cows) correlations were low (0.28 and 0.62, respectively). There were 548 cows with only the maternal grandsire genotyped (NNG). Of those, 39 were genotyped in HD and 509 in MD (92, 255, 112 and 50 cows genotyped with GeneSeek Genomic Profiler 3, GeneSeek Genomic Profiler 4, GeneSeek MD and Labogena MD, respectively). Correlations were 0.03 and 0.05 (for GeneSeek Genomic Profiler 3 and GeneSeek Genomic Profiler 4, respectively), 0.50 and 0.60 for the GeneSeek MD and Labogena MD, and close to one for the two HD SNP panels. In addition to the low F SNP correlation for the MD SNP panels, β was biased, varying between 0.14 and 0.30 for the three GeneSeek SNP panels (Figure 4). In contrast, $\hat{\beta}$ was 0.92 for the Labogena MD and 1.00 for the two HD SNP panels. These results show the effect of parental genotyping during SNP imputation for MD SNP panels that have low representation of SNPs in the imputation set, which can create discrepancies between estimates of F SNP based on genotyped or imputation SNPs.
As shown in Figure 2, in the case of MNG, although for the GeneSeek MD and Labogena MD the F SNP estimates from genotyped and imputation SNP were fairly consistent, there was a bias of the imputation F SNP for cows genotyped with the GeneSeek Genomic Profilers 3 and 4 and for some cows genotyped with the GeneSeek MD. More precisely, for the subgroup of cows genotyped with the GeneSeek Genomic Profilers 3 and 4 having imputation F SNP > 0.45, the genotyped F SNP varied between 0.09 and 0.14 (pedigree f ranged between 0.00 and 0.06), while the imputation F SNP ranged between 0.47 and 0.60. Similar patterns were found in the cases of MNM, MNN, NNG and NNN for cows genotyped with the GeneSeek Genomic Profilers 3 and 4.
We further focused on ROH analysis for the cows genotyped with the GeneSeek Genomic Profiler 3 as a case study. It is common in ROH analysis to identify common ROHs within a population (given a predefined arbitrary threshold). In the R package detectRUNS this can be achieved with the function tableRuns. The overarching hypothesis was that imputation might create artificial large homozygous blocks for the GeneSeek Genomic Profiler 3, which has a low representation of its SNPs in the imputation SNP set. More precisely, cows genotyped with the GeneSeek Genomic Profiler 3 have, after imputation, about 16% genotyped SNPs (13,870 SNPs) and about 84% imputed SNPs (70,575 SNPs). We set a threshold of 70% for a ROH identification within a group, and as group ("breed" in terms of detectRUNS language) we considered the groups of ancestral genotype information. Large homozygous segments, between 2 and 75 Mbp, [...] S3). Indeed, there was a cow with F SNP = 0.60 and MNG ancestral genotype information with a ROH of ~75.4 Mb length on Bos taurus autosome (BTA) 2. In this region, between ~29.5 and 104.8 Mb and containing 2,217 SNPs, only one heterozygous SNP was present, with 697 SNPs homozygous for the reference allele and 1,519 SNPs homozygous for the alternative allele. Apart from BTA 2, long ROH segments were found on other BTAs, such as BTA 16 and 17 (~23 Mb long) and BTA 14 (~15 Mb long).
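To make the ROH criteria concrete (at least 15 SNPs, at least 1 Mbp and at most one heterozygous SNP per run), a simplified consecutive scan of one chromosome is sketched below in Python. It is not the detectRUNS implementation: the greedy restart rule after a second heterozygous SNP, the function names and the SNP-count version of F ROH are assumptions made for illustration.

```python
def find_roh(positions, genotypes, min_snps=15,
             min_length_bp=1_000_000, max_het=1):
    """Greedy scan of one chromosome; returns (start_bp, end_bp, n_snps) runs.

    positions: SNP positions in bp, sorted; genotypes: 0/1/2 allele counts,
    where 1 is heterozygous. Runs start and end on homozygous SNPs.
    """
    runs = []
    start = last_hom = None
    n_het = 0

    def close_run():
        n_snps = last_hom - start + 1
        length = positions[last_hom] - positions[start]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[start], positions[last_hom], n_snps))

    for k, g in enumerate(genotypes):
        if g != 1:                    # homozygous SNP extends or opens a run
            if start is None:
                start, n_het = k, 0
            last_hom = k
        elif start is not None:       # heterozygous SNP inside a candidate run
            n_het += 1
            if n_het > max_het:       # too many heterozygotes: close the run
                close_run()
                start = None
    if start is not None:
        close_run()
    return runs


def f_roh_snp(runs, n_snps_total):
    """Proportion of SNPs that fall inside the detected runs."""
    return sum(n for _, _, n in runs) / n_snps_total


positions = [i * 100_000 for i in range(30)]
genotypes = [0] * 5 + [1] + [2] * 14 + [1, 1] + [0] * 8
runs = find_roh(positions, genotypes)
print(runs, f_roh_snp(runs, len(genotypes)))
```

In the toy chromosome the first 20 SNPs form one run of 1.9 Mbp that tolerates a single heterozygous SNP, while the final 8 homozygous SNPs are too few to qualify. A long block of imputed homozygous SNPs, such as the BTA 2 segment described above, would be returned by such a scan as a single run.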
To further evaluate the impact of imputation on genomic inbreeding coefficients, Pearson correlations of F grm and F ROH between each of the downgraded MD data sets (GeneSeek Genomic Profiler 3 and Labogena MD) and the initial 84,445 SNP data were estimated (Table 3). Overall, F ROH correlations were 0.48 (set-1 vs. set-2) and 0.97 (set-1 vs. set-3). When further considering the effect of having the ancestors of the cow genotyped, correlations were still 0.96 and 0.99 for the two comparisons for the 266 cows that had the sire and either the dam or the maternal grandsire genotyped, while they dropped considerably for the 63 cows that did not have a genotyped (grand)parent on one side. More precisely, for set-1 vs. set-2 the correlation was 0.34, while for set-1 vs. set-3 it was high (0.82), yet much lower than for the cows that had the sire and either the dam or the maternal grandsire genotyped. When both parental sides had genotypes, F ROH was more robust than F grm. Nevertheless, if one parental side is not genotyped, F ROH suffers more than F grm when panel density is low, while the opposite holds if density is medium. Nowadays, nearly all animals have sires and maternal grandsires genotyped, which supports the use of F ROH as the preferred measure. These results again support the interaction between the SNP panel chosen to genotype the cows (and the final number of those SNPs included in the imputation) and having the cows' parents and/or their maternal grandsire genotyped during imputation in providing consistent F ROH estimates when imputation SNPs are used in genomic analysis.

CONCLUSIONS
Imputation of whole genome SNP data is routinely applied in dairy cattle breeding and can drastically reduce genotyping costs. Although imputation accuracy is, in general, very high (close to one), there might be cows with inaccurate imputation, e.g., because their parents were not genotyped. Our results showed that whole genome SNP inbreeding coefficients derived from imputation SNPs could be biased for cows that do not have parents and the maternal grandsire genotyped and that were also genotyped with a SNP panel with low representation of its SNPs in the selected imputation SNP data. This bias might be attributed to the imputation procedure rather than to the estimator used to calculate inbreeding coefficients or the SNP panel with which the cow was genotyped. Moreover, this can be observed even for cows that have been genotyped with high-density SNP panels, but is more pronounced in medium-density SNP panels that have few SNP included in the final imputation data. Genotyping of at least one parent of the cow, especially the sire, alleviates this effect independently of the SNP panel used to genotype the cow. The F hat1 and F hat2 estimators in PLINK and F grm (as applied in our study) were influenced by the ancestral genotype status, even when both parents and the maternal grandsire were genotyped, for the cows genotyped with MD SNP panels. The choice of SNP panel for genotyping cows and parental genotyping should be considered when designing imputation strategies for genomic analysis in dairy cattle. For computing genomic inbreeding coefficients, it is recommended (a) to have at least 20K SNPs in common between the SNP panel and the imputation set, (b) to have at least one parent genotyped and (c) to use an ROH-based estimator.
Dadousis et al.: GENOMIC INBREEDING WITH IMPUTATION SNP

Figure 1. Average Pearson (a) and Spearman (b) correlation coefficients between genotyped vs. imputed whole genome SNP inbreeding coefficients for each estimator across group of ancestral genotype information (sire-dam-maternal grandsire; G: Genotyped, N: Not genotyped, M: Missing information) and SNP panel. Red horizontal line was set to zero. Dashed grey horizontal line was set to 0.8.

Figure 2. Regression coefficients obtained by regressing imputation on genotyped whole genome SNP inbreeding coefficients for each estimator across group of ancestral genotype information (sire-dam-maternal grandsire; G: Genotyped, N: Not genotyped, M: Missing information) and SNP panel. Blue horizontal line was set to one.

Figure 3. Distribution of a) imputation fill rate (the fraction of non-missing SNPs after imputation) and b) imputation rate [fraction of completely (both alleles) imputed SNPs out of the non-missing SNPs after imputation] per group of ancestral genotype information (sire-dam-maternal grandsire; G: Genotyped, N: Not genotyped, M: Missing information) and SNP panel, estimated following the runs of homozygosity method.

Figure 4. Genotyped vs. imputation whole genome SNP inbreeding coefficients estimated with runs of homozygosity (F ROH) per a) SNP panel and b) group of ancestral genotype information (sire-dam-maternal grandsire; G: Genotyped, N: Not genotyped, M: Missing information) and SNP panel. Figure 4 was generated with the function "ggscatter" of the R package ggpubr (Kassambara, 2020). Pearson correlation coefficient (R) and regression line within each SNP panel are presented. Wherever correlations are not presented there was a computation failure due to not enough observations. In the text, only correlations estimated with at least ten data points were discussed.

Table 1. Pearson (r), Spearman (ρ) and regression coefficient ($\hat{\beta}$) in cows having both parents genotyped and [...]. SNP panel = SNP panel used to genotype the cow; Ancestral genotypes = ancestral genotype information (sire-dam-maternal grandsire; G: Genotyped, N: Not genotyped); Estimator = estimator used for whole genome SNP inbreeding coefficients; N = number of cows; Results are sorted by ascending $\hat{\beta}$.

Table 2. Pearson (r), Spearman (ρ) and regression coefficient ($\hat{\beta}$) when comparing genotyped vs. imputation whole genome SNP inbreeding coefficients in cows having none of the parents genotyped and [...].

Table 3. Assessment of the impact of imputation on genomic inbreeding coefficients using imputation of partly downgraded sets of samples. set-1: HD samples original results; set-2: Part of HD samples downgraded to GeneSeek Genomic Profiler 3 before imputation; set-3: Part of HD samples downgraded to Labogena MD before imputation.