Long-read only assembly of Drechmeria coniospora genomes reveals widespread chromosome plasticity and illustrates the limitations of current nanopore methods

Mapping Intimacies ◽

10.1101/866020 ◽

2019 ◽

Author(s):

Damien Courtine ◽

Jan Provaznik ◽

Jerome Reboul ◽

Guillaume Blanc ◽

Vladimir Benes ◽

...

Keyword(s):

State Of The Art ◽

Genome Structure ◽

Nematophagous Fungus ◽

Genome Sequences ◽

Depth Analysis ◽

Long Read ◽

Different Strains ◽

Sequence Errors ◽

Eukaryotic Genomes ◽

Chromosome Level

Abstract: Long-read sequencing is increasingly being used to determine eukaryotic genomes. We used nanopore technology to generate chromosome-level assemblies for 3 different strains of Drechmeria coniospora, a nematophagous fungus used extensively in the study of innate immunity in Caenorhabditis elegans. One natural geographical isolate demonstrated high stability over decades, whereas a second isolate not only had a profoundly altered genome structure but also exhibited extensive instability. We conducted an in-depth analysis of sequence errors within the 3 genomes and established that even with state-of-the-art tools, nanopore methods alone are insufficient to generate eukaryotic genome sequences of sufficient accuracy to merit inclusion in public databases.


Long-read only assembly of Drechmeria coniospora genomes reveals widespread chromosome plasticity and illustrates the limitations of current nanopore methods

GigaScience ◽

10.1093/gigascience/giaa099 ◽

2020 ◽

Vol 9 (9) ◽

Author(s):

Damien Courtine ◽

Jan Provaznik ◽

Jerome Reboul ◽

Guillaume Blanc ◽

Vladimir Benes ◽

...

Keyword(s):

Gene Evolution ◽

Genome Structure ◽

Nematophagous Fungus ◽

Genome Plasticity ◽

New Information ◽

Depth Analysis ◽

Long Read ◽

Different Strains ◽

Sequence Errors ◽

Eukaryotic Genomes

Abstract: Background. Long-read sequencing is increasingly being used to determine eukaryotic genomes. We used nanopore technology to generate chromosome-level assemblies for 3 different strains of Drechmeria coniospora, a nematophagous fungus used extensively in the study of innate immunity in Caenorhabditis elegans. Results. One natural geographical isolate demonstrated high stability over decades, whereas a second isolate not only had a profoundly altered genome structure but exhibited extensive instability. We conducted an in-depth analysis of sequence errors within the 3 genomes and established that even with state-of-the-art tools, nanopore methods alone are insufficient to generate eukaryotic genome sequences of sufficient accuracy to merit inclusion in public databases. Conclusions. Although nanopore long-read sequencing is not accurate enough to produce publishable eukaryotic genomes, in our case, it has revealed new information about genome plasticity in D. coniospora and provided a backbone that will permit future detailed study to characterize gene evolution in this important model fungal pathogen.


Chromosome-level genome assembly of Japanese chestnut (Castanea crenata Sieb. et Zucc.) reveals conserved chromosomal segments in woody rosids

10.1101/2021.07.29.454274 ◽

2021 ◽

Author(s):

Kenta Shirasawa ◽

Sogo Nishio ◽

Shingo Terakami ◽

Roberto Botta ◽

Daniela Torello Marinoni ◽

...

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Repetitive Sequences ◽

Genome Structure ◽

Resistance Mechanisms ◽

Nucleotide Polymorphisms ◽

Genome Sequences ◽

Castanea Crenata ◽

Long Read ◽

Chromosome Level

Japanese chestnut (Castanea crenata Sieb. et Zucc.), unlike other Castanea species, is resistant to most diseases and wasps. However, genomic data of Japanese chestnut that could be used to determine its biotic stress resistance mechanisms have not been reported to date. In this study, we employed long-read sequencing and genetic mapping to generate genome sequences of Japanese chestnut at the chromosome level. Long reads (47.7 Gb; 71.6× genome coverage) were assembled into 781 contigs, with a total length of 721.2 Mb and a contig N50 length of 1.6 Mb. Genome sequences were anchored to the chestnut genetic map, comprising 14,973 single nucleotide polymorphisms (SNPs) and covering 1,807.8 cM map distance, to establish a chromosome-level genome assembly (683.8 Mb), with 69,980 potential protein-encoding genes and 425.5 Mb repetitive sequences. Furthermore, comparative genome structure analysis revealed that Japanese chestnut shares conserved chromosomal segments with woody plants, but not with herbaceous plants, of rosids. Overall, the genome sequence data of Japanese chestnut generated in this study is expected to enhance not only its genetics and genomics but also the evolutionary genomics of woody rosids.
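The assembly statistics quoted in this abstract (contig N50, fold coverage) follow standard definitions, which can be sketched as below. This is a minimal illustration only: the function names `n50` and `implied_genome_size_mb` are hypothetical, the toy contig lengths are invented, and nothing here reflects the authors' actual pipeline. The only figures taken from the abstract are 47.7 Gb of reads and 71.6× coverage.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def implied_genome_size_mb(read_bases_gb, fold_coverage):
    """Genome size (Mb) implied by total read bases (Gb) and coverage.

    coverage = read bases / genome size, so genome size = bases / coverage.
    """
    return read_bases_gb * 1000.0 / fold_coverage


# Toy contig lengths (invented for illustration, not from the paper).
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)

# Figures from the abstract: 47.7 Gb of long reads at 71.6x coverage.
size_mb = implied_genome_size_mb(47.7, 71.6)
```

For the toy list the two longest contigs (6 and 5) already cover more than half of the 20 total bases, so the N50 is 5. The coverage arithmetic implies a genome size estimate of roughly 666 Mb, which is the same order as the 721.2 Mb contig total and the 683.8 Mb anchored assembly reported in the abstract.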


Integrating Hi-C links with assembly graphs for chromosome-scale assembly

10.1101/261149 ◽

2018 ◽

Cited By ~ 9

Author(s):

Jay Ghurye ◽

Arang Rhie ◽

Brian P. Walenz ◽

Anthony Schmitt ◽

Siddarth Selvaraj ◽

...

Keyword(s):

Chromosome Number ◽

Open Source ◽

De Novo ◽

State Of The Art ◽

A Priori ◽

Economical Method ◽

A Genome ◽

Long Read ◽

Reference Quality ◽

Eukaryotic Genomes

Abstract: Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Error rates, particularly for inversions and fusions across chromosomes, remain higher than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA. Author summary: Hi-C technology was originally proposed to study the 3D organization of a genome. Recently, it has also been applied to assemble large eukaryotic genomes into chromosome-scale scaffolds. Despite this, there are few open-source methods to generate these assemblies. Existing methods are also prone to small inversion errors due to noise in the Hi-C data. In this work, we address these challenges and develop a method named SALSA2. SALSA2 uses sequence overlap information from an assembly graph to correct inversion errors and provide accurate chromosome-scale assemblies.


Security assurance cases—state of the art of an emerging approach

Empirical Software Engineering ◽

10.1007/s10664-021-09971-7 ◽

2021 ◽

Vol 26 (4) ◽

Author(s):

Mazen Mohamad ◽

Jan-Philipp Steghöfer ◽

Riccardo Scandariato

Keyword(s):

State Of The Art ◽

Tool Support ◽

Security Assurance ◽

The Past ◽

Depth Analysis ◽

Structured Argumentation ◽

Security Properties ◽

Standards And Regulations ◽

Assurance Cases ◽

Security Standards

Abstract: Security Assurance Cases (SAC) are a form of structured argumentation used to reason about the security properties of a system. After the successful adoption of assurance cases for safety, SAC have gained significant traction in recent years, especially in safety-critical industries (e.g., automotive), where there is increasing pressure to comply with several security standards and regulations. Accordingly, research in the field of SAC has flourished in the past decade, with different approaches being investigated. In an effort to systematize this active field of research, we conducted a systematic literature review (SLR) of the existing academic studies on SAC. Our review resulted in an in-depth analysis and comparison of 51 papers. Our results indicate that, while there are numerous papers discussing the importance of SAC and their usage scenarios, the literature is still immature with respect to concrete support for practitioners on how to build and maintain a SAC. More importantly, even though some methodologies are available, their validation and tool support are still lacking.


Persistent memory hash indexes

Proceedings of the VLDB Endowment ◽

10.14778/3446095.3446101 ◽

2021 ◽

Vol 14 (5) ◽

pp. 785-798

Author(s):

Daokun Hu ◽

Zhiwen Chen ◽

Jianbing Wu ◽

Jianhua Sun ◽

Hao Chen

Keyword(s):

Future Development ◽

High Performance ◽

Performance Metrics ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Hash Tables ◽

Trade Offs ◽

Depth Analysis ◽

Persistent Memory ◽

Memory Modules

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of them are evaluated on DRAM-based emulators with unrealistic assumptions, or focus on the evaluation of specific metrics while sidestepping important properties. Thus, it is essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we focus on the evaluation of six state-of-the-art hash tables, namely Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.


Systemically important financial institutions in Latin America - a Primer

Brazilian Journal of Political Economy ◽

10.1590/0101-31572016v36n02a09 ◽

2016 ◽

Vol 36 (2) ◽

pp. 410-429

Author(s):

JACOB KLEINOW ◽

MARIO GARCIA MOLINA ◽

ANDREAS HORSCH

Keyword(s):

Latin America ◽

Latin American ◽

Financial Institutions ◽

Global Financial Crisis ◽

State Of The Art ◽

Risk Exposure ◽

Historical Experience ◽

Systemic Risks ◽

Depth Analysis ◽

The Global Financial Crisis

ABSTRACT Financial institutions show a characteristic risk exposure and vulnerability, making them prone to instability. Financial systems in Latin America, however, were left largely unscathed by the global financial crisis starting in 2008. This state-of-the-art survey provides an in-depth analysis of the identification and regulation of systemically important financial institutions (SIFIs). While Latin America benefits from its rich historical experience in managing systemic risks, we find the problem of SIFIs to be still underestimated. However, there are first efforts to cope with SIFIs in science, and Latin American supervisors and regulators in particular are starting to take the threat posed by SIFIs seriously.


Genotyping structural variants in pangenome graphs using the vg toolkit

10.1101/654566 ◽

2019 ◽

Cited By ~ 7

Author(s):

Glenn Hickey ◽

David Heller ◽

Jean Monlong ◽

Jonas A. Sibbesen ◽

Jouni Sirén ◽

...

Keyword(s):

De Novo ◽

State Of The Art ◽

Effective Means ◽

Point Mutations ◽

Structural Variants ◽

Short Read ◽

Yeast Strains ◽

Sequencing Studies ◽

Long Read

Abstract: Structural variants (SVs) remain challenging to represent and study relative to point mutations despite their demonstrated importance. We show that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments. We benchmarked vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies. In addition, we use assemblies from 12 yeast strains to show that graphs constructed directly from aligned de novo assemblies improve genotyping compared to graphs built from intermediate SV catalogs in the VCF format.


Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

ABSTRACT The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D, creating the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.


De Novo Mutational Signature Discovery in Tumor Genomes using SparseSignatures

10.1101/384834 ◽

2018 ◽

Cited By ~ 5

Author(s):

Avantika Lal ◽

Keli Liu ◽

Robert Tibshirani ◽

Arend Sidow ◽

Daniele Ramazzotti

Keyword(s):

Cross Validation ◽

De Novo ◽

State Of The Art ◽

Point Mutations ◽

Simulated Data ◽

Large Datasets ◽

Genome Sequences ◽

Mutational Signatures ◽

Mutational Signature ◽

Current State

Abstract: Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using standard metrics. We then apply SparseSignatures to whole genome sequences of 147 tumors from pancreatic cancer, discovering 8 signatures in addition to the background.


abPOA: an SIMD-based C library for fast partial order alignment using adaptive band

10.1101/2020.05.07.083196 ◽

2020 ◽

Author(s):

Yan Gao ◽

Yongzhuang Liu ◽

Yanmei Ma ◽

Bo Liu ◽

Yadong Wang ◽

...

Keyword(s):

Error Correction ◽

Partial Order ◽

Directed Acyclic Graph ◽

State Of The Art ◽

Single Instruction Multiple Data ◽

Multiple Sequence ◽

Software Interface ◽

Multiple Data ◽

Long Read ◽

Read Error Correction

Abstract: Summary. Partial order alignment, which aligns a sequence to a directed acyclic graph, is now frequently used as a key component in long-read error correction and assembly. We present abPOA (adaptive banded Partial Order Alignment), a Single Instruction Multiple Data (SIMD) based C library for fast partial order alignment using adaptive banded dynamic programming. It can work as a stand-alone multiple sequence alignment and consensus calling tool or be easily integrated into any long-read error correction and assembly workflow. Compared to a state-of-the-art tool (SPOA), abPOA is up to 15 times faster with a comparable alignment accuracy. Availability and implementation. abPOA is implemented in C. A stand-alone tool and a C/Python software interface are freely available at https://github.com/yangao07/[email protected] or [email protected]
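The idea behind banded dynamic programming can be illustrated with a much simpler case than the one in this abstract. The sketch below is an assumption-laden toy: it computes a plain pairwise edit distance with a fixed band, whereas abPOA aligns a sequence to a graph, adapts the band as it goes, and uses SIMD. The function name `banded_edit_distance` and its `band` parameter are hypothetical and are not part of the abPOA interface.

```python
def banded_edit_distance(a, b, band):
    """Edit distance between a and b, filling only DP cells with |i - j| <= band.

    Returns None when the length difference alone exceeds the band, since
    the final cell then lies outside it. The result is exact whenever the
    true edit distance is at most `band`; otherwise it is an upper bound.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        if lo == 0:
            cur[0] = i  # delete the first i characters of a
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1      # cell above; inf if outside the band
            insert = cur[j - 1] + 1   # cell to the left; inf if outside
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]


distance = banded_edit_distance("kitten", "sitting", band=3)
```

Restricting the computation to a diagonal band reduces the work from roughly n × m cells to about n × (2 × band + 1) cells, which is the kind of saving that makes banding attractive for long reads. The cost is that alignments straying outside the band are missed, which is why an adaptive band, as described in the abstract, is preferable to a fixed one.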
