In cattle, the X chromosome accounts for approximately 3 and 6% of the genome in bulls and cows, respectively. In spite of the large size of this chromosome, very few studies report analysis of the X chromosome in genome-wide association studies and genomic selection. This lack of genetic interrogation is likely due to the complexities of undertaking these studies given the hemizygous state of some, but not all, of the X chromosome in males. The first step in facilitating analysis of this gene-rich chromosome is to accurately identify coordinates for the pseudoautosomal boundary (PAB) to split the chromosome into a region that may be treated as autosomal sequence (pseudoautosomal region) and a region that requires more complex statistical models. With the recent release of ARS-UCD1.2, a more complete and accurate assembly of the cattle genome than was previously available, it is timely to fine map the PAB for the first time. Here we report the use of SNP chip genotypes, short-read sequences, and long-read sequences to fine map the PAB (X chromosome:133,300,518) and simultaneously determine the neighboring regions of reduced homology and true pseudoautosomal region. These results greatly facilitate the inclusion of the X chromosome in genome-wide association studies, genomic selection, and other genetic analysis undertaken on this reference genome.
In mammals, sex determination is based on heteromorphic sex chromosomes designated X and Y. Females have 2 of the large, gene-rich X chromosomes, whereas males have a single X chromosome and a small, gene-poor Y chromosome. Although the X and Y chromosomes differ significantly in gene content, they do share a relatively small region of sequence homology known as the pseudoautosomal region (PAR). In males, recombination between the X and Y chromosome-specific regions of the sex chromosomes is not possible. In contrast, pairing and recombination within the PAR of the X and Y chromosomes is a critical step for faithful segregation of the sex chromosomes during male meiosis. The physical domain of the PAR is demarcated by the pseudoautosomal boundary (PAB), where the sequence similarity between X and Y chromosomes decreases from ∼100% to between 50 and 80% in a region referred to as the region of reduced homology, which is a transition to the purely sex-specific sequence. Despite the vital role of PAR recombination during male meiosis, the PAR is evolving, in terms of gene structure and DNA sequence variation, at a considerably faster rate than both adjacent chromosomal regions and autosomes. The rapid rate of PAR evolution has presented challenges in between-species comparison of molecular organization, and only a few PAR comparisons between species other than human and mouse have been reported, with these comparisons limited to rough mapping.
The past 10 yr has seen the rise of genome-wide association studies (GWAS) in a wide range of species, including cattle, and has resulted in remarkable advances in the interrogation of complex traits. However, to date, researchers have almost exclusively used additive models in GWAS. Additive models are often of limited use for analysis of the X chromosome unless the population under investigation is entirely female (so that all individuals carry 2 copies of the X chromosome). Even in cases where all individuals studied are female, the use of additive models may not be appropriate given the inactivation of 1 X chromosome in each cell. Although X inactivation is largely thought to be random in nature (
). The first step required for the X chromosome to be included in GWAS in a meaningful manner is fine mapping of the PAR and, more importantly, accurate identification of the PAB. Identification of the PAB then allows the PAR to be analyzed as an autosome separately from the sex-specific region that will require specialized statistical models. Similarly, identification of the PAB is required to undertake other common or important steps in genetic analysis, including accurate imputation and use of X chromosome genotypes to identify or confirm the sex of a sample (
). However, positioning of this sequence onto the subsequent genome assembly (UMD3.1; https://www.ncbi.nlm.nih.gov/assembly/GCA_000003055.5) has not been published, perhaps due to the complicated structure of the assembled PAR in UMD3.1 (represented in multiple fragments).
The objective of our study was to fine map the PAR and identify the exact location of the PAB in the newly released bovine reference genome assembly (ARS-UCD1.2, https://www.ncbi.nlm.nih.gov/assembly/GCA_002263795.2; GenBank accession NKLS00000000.2). Data sets used to fine map the PAB were (1) Illumina (San Diego, CA) 150-bp paired-end whole-genome sequence reads from 2 bulls and 2 cows (∼30× coverage/animal), representative of the New Zealand dairy population (Holstein Friesian and Jersey); (2) Illumina BovineHD SNP chip genotypes from 578 bulls and 3,306 cows; and (3) whole-genome long-read sequences (Pacific Biosciences, Menlo Park, CA; ∼60× coverage) from 1 New Zealand Holstein Friesian bull and 1 New Zealand Jersey bull.
Illumina sequence and SNP chip positions were mapped using BWA-MEM to ARS-UCD1.2 both in the presence and absence of a Y chromosome assembly. Given that the ARS-UCD1.2 reference genome was assembled from a cow, no Y chromosome sequence is available from this animal. The best available Y chromosome assembly, generated from the reference cow's sire (Btau_5.0.1; GenBank accession number AAFC00000000), was therefore used, with the small PAR sequence present in this assembly masked. These alignments were used to identify the region of the X chromosome where all males appeared homozygous (in reality, hemizygous) whereas females included heterozygotes (sex-specific region) and to identify a sudden change in read depth in bulls (but not cows) indicating the end of the PAR. However, as previously reported, use of SNP chip genotypes allows only a rough estimation of the PAB. Similarly, looking for a 2-fold difference in sequence coverage provided only a rough guide to the PAB location, due in part to the presence of repetitive elements on both the X and Y chromosomes (data not shown).
Given the challenges in accurately identifying the PAB using short-read sequences and SNP chip genotypes, we used long DNA sequence reads generated on both Pacific Biosciences (PacBio) RS II (Holstein Friesian; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY) and PacBio Sequel (Jersey; USDA, Clay Center, NE) instruments. The PacBio sequences were mapped to the ARS-UCD1.2 assembly of the bovine genome plus Btau_5.0.1 chromosome Y (PAR masked) using minimap2.
Analysis of the region identified as most likely to contain the PAB (ChrX:133,280,000–133,310,000) based on genotypes and short-read sequences clearly revealed the PAR, PAB, and the region of reduced homology (Figure 1; region of reduced homology sequence in Supplemental Figure S1, https://doi.org/10.3168/jds.2018-15638). The presence of numerous soft-clipped reads (indicated by sequential colored nucleotides in mapped sequences; Figure 1), with clipping occurring within a narrow window, clearly indicated the presence of DNA sequence reads originating from the Y chromosome. The Y chromosome origin of these reads was confirmed by determining that all of the soft-clipped segments mapped to a Y chromosome location. Furthermore, soft-clipped reads carried a specific haplotype distinguishing them from reads that were not soft clipped. This haplotype represents the region of reduced homology, where sequence similarity between the X and Y chromosomes is sufficiently high for mapping algorithms to align Y-derived reads to the X chromosome, albeit with a considerable number of variants. The 2 bulls for which PacBio sequence was available displayed 40 identical variants, with the Holstein Friesian bull carrying an additional 4 variants in this region of reduced homology. The sequence similarity observed in this region, and the resulting mapping of Y chromosome sequences to the X chromosome, are confounding factors when attempting to use short- and long-read sequence data to identify a 2-fold change in coverage at the PAB. Because of this sequence similarity, the expected 2-fold drop in sequence depth beginning at the PAB was not observed in PacBio data. Rather, a gradual tapering of read depth was seen across the length of this region, with a more marked drop where the true sex-specific sequence begins (where coverage was approximately half that of the PAR).
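The soft-clipping signal described above can be illustrated with a minimal sketch. It assumes alignments are available as (POS, CIGAR) pairs taken from SAM records, and it simply tallies where reads begin or end a soft clip on the reference. The function names, the 10-bp bin width, and the toy coordinates are hypothetical and are not taken from the study; this is not the authors' pipeline.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def soft_clip_breakpoints(pos, cigar):
    """Return 1-based reference positions adjacent to a soft clip.

    pos is the SAM POS field (1-based leftmost aligned reference base).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no sequence
    breakpoints = []
    if core and core[0][1] == "S":
        # Clipped bases lie to the left of the first aligned base.
        breakpoints.append(pos)
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if len(core) > 1 and core[-1][1] == "S":
        # Clipped bases lie to the right of the last aligned base.
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def clip_pileup(alignments, window=10):
    """Count soft-clip breakpoints per reference bin of `window` bp."""
    counts = Counter()
    for pos, cigar in alignments:
        for bp in soft_clip_breakpoints(pos, cigar):
            counts[bp // window * window] += 1
    return counts


# Toy alignments (hypothetical coordinates, not real data).
toy_reads = [
    (1000, "50S200M"),
    (1003, "40S150M"),
    (800, "300M"),
    (1001, "2000S500M"),
    (700, "100M5D100M30S"),
]
pileup = clip_pileup(toy_reads)
for start, n in sorted(pileup.items()):
    print(f"bin {start}-{start + 9}: {n} soft-clip breakpoint(s)")
```

A bin that collects many breakpoints, as the bin starting at 1000 does in the toy data, is the kind of narrow window that would warrant closer inspection; confirming that the clipped segments map to the Y chromosome is a separate step.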
The region of reduced homology is 394 bp and, in the 2 bulls analyzed in our study, the similarity between X and Y chromosomes was higher (89–90%) than reported for many species (50–85%). The final position in the region of reduced homology is at 133,300,517 on the X chromosome (ChrX), and the PAB was therefore assigned position ChrX:133,300,518. A comparison of sequences around this chromosomal location with sequences surrounding the PAB described before the release of the first bovine reference genome indicated a perfect match between the PAB (and region of reduced homology) identified in our study using PacBio sequence and that previously reported from bacterial artificial chromosome sequencing. Taken together, these results present strong evidence that this position does represent the true PAB.
The size of the PAR on the ARS-UCD1.2 reference genome assembly was 5,708,626 bp. This is notably smaller than the PAR in the previous bovine reference genome (UMD3.1), which consisted of 2 closely located regions with a combined size of approximately 8 Mbp. A comparative analysis of the PAR in UMD3.1 and ARS-UCD1.2 using map positions of Illumina BovineHD SNP chip markers (Figure 2) indicated that approximately one-third of the difference in size can be attributed to SNP previously positioned on the first UMD3.1 PAR region now mapping to the 900-kb ARS-UCD1.2 “contig_X_unplaced.” The remainder of the first UMD3.1 PAR region now resides within the single PAR in ARS-UCD1.2. Illumina BovineHD SNP chip genotypes for SNP mapping to “contig_X_unplaced” were reported almost exclusively as homozygous (in reality, hemizygous) in the 578 bulls, whereas genotypes from females were a mix of homozygous and heterozygous (data not shown), thereby providing strong evidence that the sequence represented in this contig is not a part of the PAR.
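The genotype-based reasoning used for “contig_X_unplaced” can be sketched as a simple heterozygosity comparison between sexes. The sketch below assumes genotypes are given as allele pairs, with None for a missing call. The function names and both thresholds are hypothetical illustrations, not values used in the study.

```python
def het_rate(genotypes):
    """Fraction of called genotypes that are heterozygous (None if no calls)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def looks_sex_specific(bull_genotypes, cow_genotypes,
                       max_bull_het=0.02, min_cow_het=0.05):
    """Flag a SNP as behaving like X-specific sequence.

    Bulls carry one copy of X-specific sequence, so their genotypes should be
    reported as homozygous apart from occasional genotyping errors, whereas
    cows should include heterozygotes. Thresholds are hypothetical.
    """
    bull = het_rate(bull_genotypes)
    cow = het_rate(cow_genotypes)
    if bull is None or cow is None:
        return False
    return bull <= max_bull_het and cow >= min_cow_het


# Toy genotypes (hypothetical, not real data).
bulls_x = [("A", "A")] * 49 + [("B", "B")] * 50 + [("A", "B")]    # 1 error call
cows_x = [("A", "A")] * 40 + [("A", "B")] * 45 + [("B", "B")] * 15
bulls_par = [("A", "A")] * 30 + [("A", "B")] * 50 + [("B", "B")] * 20

print(looks_sex_specific(bulls_x, cows_x))    # X-specific-like SNP
print(looks_sex_specific(bulls_par, cows_x))  # PAR-like SNP
```

A tolerance above zero for bulls is used because the text reports genotypes as "almost exclusively" homozygous, which leaves room for a small number of miscalls.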
In conclusion, we used long-read PacBio sequences from 2 bulls to fine map the PAR and accurately identify the PAB (ChrX:133,300,518) and region of reduced homology on the X chromosome. These data are important for utilizing the X chromosome in all genetic analyses that involve imputation and GWAS, as they allow the necessary splitting of this chromosome into a region that can be analyzed in a manner similar to autosomes and the larger, sex-specific region that requires different statistical models for interrogation.
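As an illustration of the splitting described above, the following minimal sketch assigns ARS-UCD1.2 X chromosome positions to a region using the coordinates reported here. Two points are assumptions rather than statements from the text: that the PAR occupies positions at and above the PAB (inferred from the region of reduced homology ending at 133,300,517), and that the 394-bp region of reduced homology is contiguous, which places its start at 133,300,124. The function and label names are hypothetical.

```python
PAB = 133_300_518         # PAB position reported in the text (ARS-UCD1.2, ChrX)
RRH_LENGTH = 394          # length of the region of reduced homology (from the text)
RRH_END = PAB - 1         # final position of the region of reduced homology
RRH_START = RRH_END - RRH_LENGTH + 1  # assumption: the 394 bp are contiguous


def classify_x_position(pos):
    """Assign a 1-based ChrX position to a region.

    Assumes the PAR lies at coordinates >= PAB (inferred, see lead-in).
    """
    if pos >= PAB:
        return "PAR"
    if pos >= RRH_START:
        return "reduced_homology"
    return "sex_specific"


# Toy marker positions (hypothetical, chosen only to exercise each branch).
for marker_pos in (50_000_000, 133_300_200, 133_300_518, 135_000_000):
    print(marker_pos, classify_x_position(marker_pos))
```

Markers labeled "PAR" could then be passed to an autosomal-style model, whereas "sex_specific" markers would need a model that accounts for hemizygosity in males. How to treat the short region of reduced homology is left to the analyst.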
The authors thank Vivienne Bennett (Livestock Improvement Corporation, Hamilton, New Zealand) for critical reading of the manuscript. Funding for this project was provided by the New Zealand Ministry for Primary Industries (Wellington, New Zealand) as a Primary Growth Partnership.