Discovery of large genomic inversions using pooled clone sequencing

Mapping Intimacies ◽

10.1101/015156 ◽

2015 ◽

Author(s):

Marzieh Eslami Rasekh ◽

Giorgia Chiatante ◽

Mattia Miroballo ◽

Joyce Tang ◽

Mario Ventura ◽

...

Keyword(s):

High Throughput Sequencing ◽

Simulated Data ◽

Segmental Duplications ◽

Genomic Structural Variation ◽

Sequencing Technologies ◽

Clone Sequencing ◽

Sequencing Method ◽

Number Variation ◽

Using Data

There are many different forms of genomic structural variation, which can be broadly classified as copy number variation (CNV) and balanced rearrangements. Although many algorithms that aim to characterize CNVs are now available in the literature, discovery of balanced rearrangements (inversions and translocations) remains an open problem. This is mainly because the breakpoints of such events typically lie within segmental duplications and common repeats, which reduce the mappability of short reads. The 1000 Genomes Project spearheaded the development of several methods to identify inversions; however, they are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high-throughput sequencing (HTS) technologies. Here we propose to use a sequencing method (Kitzman et al., 2011), originally developed to improve haplotype resolution, to characterize large genomic inversions. This method, called pooled clone sequencing, merges the advantages of the clone-based sequencing approach with the speed and cost efficiency of HTS technologies. Using data generated with the pooled clone sequencing method, we developed a novel algorithm, dipSeq, to discover large inversions (>500 Kbp). We show the power of dipSeq first on simulated data, and then apply it to the genome of a HapMap individual (NA12878). We were able to accurately discover all previously known and experimentally validated large inversions in the same genome. We also identified a novel inversion and confirmed it using fluorescent in situ hybridization. Availability: Implementation of the dipSeq algorithm is available at https://github.com/BilkentCompGen/dipseq

Characterization of segmental duplications and large inversions using Linked-Reads

10.1101/394528 ◽

2018 ◽

Cited By ~ 4

Author(s):

Fatih Karaoglanoglu ◽

Camir Ricketts ◽

Marzieh Eslami Rasekh ◽

Ezgi Ebren ◽

Iman Hajirasouliha ◽

...

Keyword(s):

High Throughput Sequencing ◽

Segmental Duplications ◽

Sequencing Data ◽

Full Spectrum ◽

Genomic Structural Variation ◽

Split Read ◽

Long Read ◽

Novel Algorithms ◽

Insertion Locus

Abstract Many algorithms aimed at characterizing genomic structural variation (SV) have been developed since the inception of high-throughput sequencing. However, the full spectrum of SVs in the human genome has not yet been assessed. Most of the existing methods focus on discovery and genotyping of deletions, insertions, and mobile elements. Detection of balanced SVs with no gain or loss of genomic segments (e.g., inversions) is a particularly challenging task. Long read sequencing has been leveraged to find short inversions, but there is still a need to develop methods to detect large genomic inversions. Furthermore, there are currently no algorithms to predict the insertion locus of large interspersed segmental duplications. Here we propose novel algorithms to characterize large (>40 Kbp) interspersed segmental duplications and (>80 Kbp) inversions using Linked-Read sequencing data. Linked-Read sequencing provides long range information, where Illumina reads are tagged with barcodes that can be used to assign short reads to pools of larger (30-50 Kbp) molecules. Our methods rely on the split molecule sequence signature that we have previously described [11]. Similar to split reads, split molecules refer to large segments of DNA that span an SV breakpoint. Therefore, when mapped to the reference genome, the mapping of these segments would be discontinuous. We redesign our earlier algorithm, VALOR, to specifically leverage Linked-Read sequencing data to discover large inversions and characterize interspersed segmental duplications. We implement our new algorithms in a new software package, called VALOR2. Availability VALOR2 is available at https://github.com/BilkentCompGen/valor.
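The split molecule idea described above can be illustrated with a minimal Python sketch. It is not the VALOR2 implementation: the function names, the input layout of `(barcode, chromosome, position)` tuples, and the `max_gap` clustering threshold are all hypothetical choices made for illustration. Only the 80 Kbp minimum separation and the 50 Kbp upper bound on molecule size are taken from the figures quoted in the abstract.

```python
from collections import defaultdict


def build_molecules(reads, max_gap=10_000):
    """Cluster read hits into inferred molecules, per barcode.

    reads: iterable of (barcode, chrom, pos) tuples (hypothetical layout).
    Hits sharing a barcode and chromosome are merged into one segment as
    long as consecutive positions are at most max_gap apart (assumed value).
    Returns {barcode: [(chrom, start, end), ...]}.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = defaultdict(list)
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current segment, start a new one.
                molecules[barcode].append((chrom, start, prev))
                start = pos
            prev = pos
        molecules[barcode].append((chrom, start, prev))
    return molecules


def split_molecule_candidates(molecules, min_distance=80_000, max_total_len=50_000):
    """Report barcodes whose segments look like one molecule mapped in two pieces.

    A pair of same-chromosome segments is a candidate when the pieces lie at
    least min_distance apart and their combined length still fits within a
    single molecule (max_total_len).
    """
    candidates = []
    for barcode, segments in molecules.items():
        segments = sorted(segments)
        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                a, b = segments[i], segments[j]
                if a[0] != b[0]:
                    continue  # this sketch only considers intrachromosomal pairs
                total_len = (a[2] - a[1]) + (b[2] - b[1])
                if b[1] - a[2] >= min_distance and total_len <= max_total_len:
                    candidates.append((barcode, a, b))
    return candidates


reads = [
    ("BX1", "chr1", 1_000), ("BX1", "chr1", 3_000), ("BX1", "chr1", 6_000),
    ("BX1", "chr1", 200_000), ("BX1", "chr1", 204_000),
    ("BX2", "chr1", 5_000), ("BX2", "chr1", 9_000), ("BX2", "chr1", 12_000),
]
molecules = build_molecules(reads)
candidates = split_molecule_candidates(molecules)
```

In real Linked-Read data a single barcode typically tags several unrelated molecules, so a distant pair of segments on its own is weak evidence. A practical caller would presumably require many barcodes supporting the same pair of breakpoints before reporting an event.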

Copy number variation in female infertility and candidate gene screening for common infertility-related diseases

Bulletin of the Karaganda University. “Biology, medicine, geography Series” ◽

10.31489/2021bmg3/73-79 ◽

2021 ◽

Vol 103 (3) ◽

pp. 73-79

Author(s):

Zhainagul Kozhabek ◽

Min Pang ◽

Qiongzhen Zhao ◽

Jiangyan Yi ◽

...

Keyword(s):

Copy Number Variation ◽

High Frequency ◽

Copy Number ◽

High Throughput Sequencing ◽

Female Infertility ◽

Functional Genes ◽

Genome Copy ◽

Sequencing Technologies ◽

Number Variation ◽

Genome Copy Number

To investigate the correlation between genome copy number variation and female infertility, we collected 3962 female infertility samples and analyzed copy number variation (CNV) using high-throughput sequencing technologies. In this study, 269 CNVs were found in 246 samples, 17 of which were new CNVs. CNVs occurred mostly on the X chromosome, and some candidate genes related to female infertility were screened. We also found some high-frequency CNVs, which contain important functional genes. This study fills a gap in CNV research on female infertility and describes the characteristics of these CNVs (CNV preference, recurrent CNV), providing a genetic reference for female infertility.

Tools and best practices for retrotransposon analysis using high-throughput sequencing data

Mobile DNA ◽

10.1186/s13100-019-0192-1 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 4

Author(s):

Aurélie Teissandier ◽

Nicolas Servant ◽

Emmanuel Barillot ◽

Deborah Bourc’his

Keyword(s):

Transposable Elements ◽

Transposable Element ◽

Molecular Mechanisms ◽

High Throughput Sequencing ◽

Reference Genome ◽

Repetitive Sequences ◽

Simulated Data ◽

Sequencing Data ◽

Sequencing Technologies ◽

Human Genomes

Abstract Background Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements that occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, applying dedicated parameters and algorithms has to be taken into consideration when transposable elements regulation is investigated with sequencing datasets. Results Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads on a reference genome. The efficiency of the most commonly used aligners was compared and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and the human genomes was calculated giving an overview into their evolution. Conclusions Based on simulated data, we provided recommendations on the alignment and the quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.

Next Generation Sequencing: Potential and Application in Drug Discovery

The Scientific World JOURNAL ◽

10.1155/2014/802437 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 7

Author(s):

Navneet Kumar Yadav ◽

Pooja Shukla ◽

Ankur Omer ◽

Shruti Pareek ◽

R. K. Singh

Keyword(s):

Drug Discovery ◽

High Throughput Sequencing ◽

Next Generation ◽

Drug Discovery Process ◽

Animal Kingdom ◽

New Era ◽

Sequencing Technologies ◽

Oligonucleotide Detection ◽

Generation Sequencing

The world has now entered a new era of genomics because of continued advances in next generation high-throughput sequencing technologies, which include sequencing by synthesis-fluorescent in situ sequencing (FISSEQ), pyrosequencing, sequencing by ligation using polony amplification, supported oligonucleotide detection (SOLiD), sequencing by hybridization along with sequencing by ligation, and nanopore technology. These methods are having a great impact on solving genome-related problems in the plant and animal kingdoms, opening the door to a new era of genomics. They may ultimately supersede Sanger sequencing, which ruled for 30 years. NGS is expected to advance and make the drug discovery process more rapid.

Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

Rt Pcr ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription Pcr ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.

HP1 drives de novo 3D genome reorganization in early Drosophila embryos

Nature ◽

10.1038/s41586-021-03460-z ◽

2021 ◽

Author(s):

Fides Zenk ◽

Yinxiu Zhan ◽

Pavel Kos ◽

Eva Löser ◽

Nazerke Atinbayeva ◽

...

Keyword(s):

Genome Organization ◽

Molecular Mechanisms ◽

High Throughput Sequencing ◽

De Novo ◽

Early Embryo ◽

Heterochromatin Protein ◽

Chromosome Conformation ◽

3D Genome ◽

Genome Reorganization

Abstract Fundamental features of 3D genome organization are established de novo in the early embryo, including clustering of pericentromeric regions, the folding of chromosome arms and the segregation of chromosomes into active (A-) and inactive (B-) compartments. However, the molecular mechanisms that drive de novo organization remain unknown [1,2]. Here, by combining chromosome conformation capture (Hi-C), chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq), 3D DNA fluorescence in situ hybridization (3D DNA FISH) and polymer simulations, we show that heterochromatin protein 1a (HP1a) is essential for de novo 3D genome organization during Drosophila early development. The binding of HP1a at pericentromeric heterochromatin is required to establish clustering of pericentromeric regions. Moreover, HP1a binding within chromosome arms is responsible for overall chromosome folding and has an important role in the formation of B-compartment regions. However, depletion of HP1a does not affect the A-compartment, which suggests that a different molecular mechanism segregates active chromosome regions. Our work identifies HP1a as an epigenetic regulator that is involved in establishing the global structure of the genome in the early embryo.

Application of Oxford Nanopore Technology to Plant Virus Detection

Viruses ◽

10.3390/v13081424 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1424

Author(s):

Lia W. Liefting ◽

David W. Waite ◽

Jeremy R. Thompson

Keyword(s):

Plant Virus ◽

High Throughput Sequencing ◽

Virus Detection ◽

Diagnostic Methods ◽

Plant Virus Detection ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Virus Diagnostics ◽

Post Entry ◽

Read Accuracy

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.

Metagenome-assembled genomes infer potential microbial metabolism in alkaline sulphidic tailings

Environmental Microbiome ◽

10.1186/s40793-021-00380-3 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Wenjun Li ◽

Xiaofang Li

Keyword(s):

Community Structure ◽

Mine Tailings ◽

Fluorescent In Situ Hybridization ◽

High Throughput Sequencing ◽

Microbial Consortia ◽

Metabolic Reconstruction ◽

Metal Release ◽

Community Members ◽

Oxidative Stresses

Abstract Background Mine tailings are a hostile environment. It has been well documented that several microbes can inhabit such environments, and metagenomic reconstruction has successfully pinpointed their activities and community structure in acidic tailings environments. We still know little about the microbial metabolic capacities of alkaline sulphidic environments, where microbial processes are critically important for revegetation. Microbial communities therein may not only provide soil functions, but also ameliorate the environmental stresses that limit plant survival. Results In this study, we detected a considerable number of viable bacterial and archaeal cells using fluorescent in situ hybridization in alkaline sulphidic tailings from Mt Isa, Queensland. By taking advantage of high-throughput sequencing and up-to-date metagenomic binning technology, we reconstructed the microbial community structure and potential coupled iron and nitrogen metabolism pathways in the tailings. Assembly of 10 metagenome-assembled genomes (MAGs), with 5 nearly complete, was achieved. From this, detailed insights into the community metabolic capabilities were derived. Dominant microbial species were seen to possess powerful resistance systems for osmotic, metal and oxidative stresses. Additionally, these community members had metabolic capabilities for sulphide oxidation, for causing increased salinity and metal release, and for leading to N depletion. Conclusions Our results show that a considerable number of microbial cells inhabit the mine tailings and that they possess a variety of genes for stress response. Metabolic reconstruction infers that the microbial consortia may actively accelerate sulphide weathering and N depletion therein.

Nebula: ultra-efficient mapping-free structural variant genotyper

Nucleic Acids Research ◽

10.1093/nar/gkab025 ◽

2021 ◽

Author(s):

Parsoa Khorsand ◽

Fereydoun Hormozdiari

Keyword(s):

Large Scale ◽

Structural Variants ◽

Sequencing Technologies ◽

Generic Framework ◽

Common Genetic Variants ◽

Order Of Magnitude ◽

Complex Events ◽

Comparable Accuracy ◽

Using Data ◽

Computational Resources

Abstract Large scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second and third generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, utilizes the changes in the count of k-mers to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has comparable accuracy to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
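The general idea of genotyping from k-mer counts, without mapping reads, can be sketched in a few lines of Python. This is a toy illustration and not Nebula's actual algorithm: the function names, the notion of precomputed "reference-only" and "variant-only" k-mer sets, the `min_fraction` threshold, and the genotype labels are all assumptions introduced here. Reverse complements and sequencing errors are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def genotype(counts, ref_kmers, alt_kmers, min_fraction=0.2):
    """Call a genotype from the balance of allele-specific k-mer counts.

    ref_kmers / alt_kmers: k-mers assumed to occur only with the reference
    or only with the variant allele (hypothetical, precomputed inputs).
    min_fraction: assumed cut-off separating homozygous from heterozygous calls.
    """
    ref = sum(counts[kmer] for kmer in ref_kmers)
    alt = sum(counts[kmer] for kmer in alt_kmers)
    total = ref + alt
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt / total
    if alt_fraction < min_fraction:
        return "0/0"
    if alt_fraction > 1 - min_fraction:
        return "1/1"
    return "0/1"


ref_kmers = {"ACGT"}
alt_kmers = {"TTGA"}
hom_ref = genotype(count_kmers(["ACGTA", "ACGTC"], 4), ref_kmers, alt_kmers)
het = genotype(count_kmers(["ACGTA", "CTTGA"], 4), ref_kmers, alt_kmers)
hom_alt = genotype(count_kmers(["CTTGA"], 4), ref_kmers, alt_kmers)
```

The attraction of this style of approach is that counting k-mers needs only a single pass over the reads, which is consistent with the speed advantage over mapping-based genotyping that the abstract reports.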

Application of Copy Number Variation Sequencing in Genetic Analysis of Miscarriages in Early and Middle Pregnancy

Cytogenetic and Genome Research ◽

10.1159/000512801 ◽

2020 ◽

Vol 160 (11-12) ◽

pp. 634-642

Author(s):

Shiqiang Luo ◽

Xingyuan Chen ◽

Tizhen Yan ◽

Jiaolian Ya ◽

Zehui Xu ◽

...

Keyword(s):

Copy Number Variation ◽

High Throughput ◽

Copy Number ◽

High Throughput Sequencing ◽

Chromosomal Abnormalities ◽

Pregnancy Termination ◽

Mendelian Inheritance ◽

Copy Number Variations ◽

Abnormal Chromosome ◽

Number Variation

High-throughput sequencing based on copy number variation (CNV-seq) is commonly used to detect chromosomal abnormalities. This study identifies chromosomal abnormalities in aborted embryos/fetuses in early and middle pregnancy and explores the application value of CNV-seq in determining the causes of pregnancy termination. High-throughput sequencing was used to detect chromosome copy number variations (CNVs) in 116 aborted embryos in early and middle pregnancy. The detection data were compared with the Database of Genomic Variants (DGV), the Database of Chromosomal Imbalance and Phenotype in Humans using Ensemble Resources (DECIPHER), and the Online Mendelian Inheritance in Man (OMIM) database to determine the CNV type and the clinical significance. High-throughput sequencing results were successfully obtained in 109 out of 116 specimens, with a detection success rate of 93.97%. In brief, there were 64 cases with abnormal chromosome numbers and 23 cases with CNVs, in which 10 were pathogenic mutations and 13 were variants of uncertain significance. An abnormal chromosome number is the most important reason for embryo termination in early and middle pregnancy, followed by pathogenic chromosome CNVs. CNV-seq can quickly and accurately detect chromosome abnormalities and identify microdeletion and microduplication CNVs that cannot be detected by conventional chromosome analysis, which is convenient and efficient for genetic etiology diagnosis in miscarriage.
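The counts reported in this abstract can be cross-checked with a short Python calculation. The variable names are illustrative. The combined share of abnormal specimens is not stated in the abstract; it is derived here under the assumption that the abnormal chromosome number cases and the CNV cases do not overlap.

```python
total_specimens = 116   # aborted embryos submitted for sequencing
successful = 109        # specimens with usable sequencing results
abnormal_number = 64    # cases with abnormal chromosome numbers
cnv_cases = 23          # cases with CNVs
pathogenic = 10         # CNV cases classified as pathogenic
uncertain = 13          # CNV cases of uncertain significance

# Detection success rate, as a percentage rounded to two decimals.
success_rate = round(100 * successful / total_specimens, 2)

# The two CNV subclasses should add up to the reported number of CNV cases.
cnv_breakdown_consistent = (pathogenic + uncertain == cnv_cases)

# Share of successfully sequenced specimens with any reported abnormality,
# assuming the two categories are disjoint (an assumption, not stated in the source).
abnormal_share = round(100 * (abnormal_number + cnv_cases) / successful, 2)

print(success_rate, cnv_breakdown_consistent, abnormal_share)
```

The computed success rate matches the 93.97% quoted in the abstract, and the pathogenic and uncertain CNV counts sum to the 23 reported CNV cases.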
