NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data

Mapping Intimacies
10.1101/277178
2018
Cited By ~ 3

Author(s): Qian Liu, Daniela C. Georgieva, Dieter Egli, Kai Wang

Keyword(s): Neighborhood Effects, Large Scale, Superior Performance, Reference Sequence, Read Length, Nanopore Sequencing, Computational Tool, Data Set, Two Samples, DNA Modifications

Background: Recent advances in single-molecule sequencing techniques, such as Nanopore sequencing, have improved read length, increased sequencing throughput, and enabled direct detection of DNA modifications through the analysis of raw signals. These DNA modifications include naturally occurring modifications such as DNA methylation, as well as modifications introduced by DNA damage or through synthetic modification of one of the four standard nucleotides. Methods: To improve the detection of DNA modifications, especially synthetically introduced ones, we developed a novel computational tool called NanoMod. NanoMod takes raw signal data from a pair of DNA samples with and without modified bases, extracts signal intensities, performs base error correction based on a reference sequence, and then identifies modified bases by comparing the distributions of raw signals between the two samples, while taking into account the effects of neighboring bases on modified bases (“neighborhood effects”). Results: We evaluated NanoMod on simulated data sets based on different types of modifications and different magnitudes of neighborhood effects, and found that NanoMod outperformed other methods in identifying known modified bases. Additionally, we demonstrated superior performance of NanoMod on an E. coli data set with 5mC (5-methylcytosine) modifications. Conclusions: In summary, NanoMod is a flexible tool for detecting DNA modifications at single-base resolution from raw Nanopore sequencing signals, and will greatly facilitate future large-scale functional genomics experiments that use modified nucleotides.
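The two-sample comparison described above works position by position. It contrasts raw-signal distributions at each reference position between a modified and an unmodified sample. The following is a minimal sketch of that idea under stated assumptions, not NanoMod's implementation:

- It assumes signals are already extracted and assigned to reference positions.
- It uses a two-sample Kolmogorov-Smirnov statistic as the distance. The abstract does not name the test, so this is an assumption.
- It handles neighborhood effects crudely, by reporting only the local maximum within a hypothetical `flank` window.

The function names and parameters are illustrative.

```python
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def call_modified(modified: list[list[float]], control: list[list[float]],
                  threshold: float = 0.5, flank: int = 2) -> list[int]:
    """Report positions whose KS score passes `threshold` and is the maximum
    within +/- `flank` positions (hypothetical parameters).

    modified[i] and control[i] hold raw signal values at reference position i
    (hypothetical layout). Requiring a local maximum is a crude stand-in for
    neighborhood effects: bases next to a modification also show shifted
    signal, but only the strongest position in the window is reported.
    """
    scores = [ks_statistic(m, c) for m, c in zip(modified, control)]
    calls = []
    for i, s in enumerate(scores):
        window = scores[max(0, i - flank): i + flank + 1]
        if s >= threshold and s == max(window):
            calls.append(i)
    return calls


if __name__ == "__main__":
    ctrl = [1.0, 1.1, 0.9, 1.05]
    partial = [1.0, 1.1, 1.5, 1.6]    # neighbor with a modest signal shift
    shifted = [2.0, 2.1, 1.9, 2.05]   # position carrying the modification
    modified = [ctrl, partial, shifted, partial, ctrl]
    control = [ctrl] * 5
    print(call_modified(modified, control))  # [2]
```

In this toy input the two flanking positions score 0.5 and pass the threshold. They are still not reported, because the modified position in their window scores higher.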

Introducing ribosomal tandem repeat barcoding for fungi

10.1101/310540
2018
Cited By ~ 2

Author(s): Christian Wurzbacher, Ellen Larsson, Johan Bengtsson-Palme, Silke Van den Wyngaert, Sten Svantesson, ...

Keyword(s): Large Scale, Tandem Repeats, Reference Data, Reference Sequence, Herbarium Specimens, Nanopore Sequencing, Desktop Computer, Third Generation Sequencing, Ribosomal Operon, Sequencing Facility

Sequence analysis of the various ribosomal genetic markers is the dominant molecular method for the identification and description of fungi. However, there is little agreement on which ribosomal markers should be used, and research groups use different markers depending on the fungal groups they target. New environmental fungal lineages known only from DNA data reveal significant gaps in the coverage of the fungal kingdom, both in taxonomy and in marker coverage in reference sequence databases. To integrate references covering all of the ribosomal markers, we present three sets of general primers that allow amplification of the complete ribosomal operon from the ribosomal tandem repeats. The primers cover all ribosomal markers (ETS, SSU, ITS1, 5.8S, ITS2, LSU, and IGS) from the 5’ end of the ribosomal operon all the way to the 3’ end. We coupled these primers successfully with third-generation sequencing (PacBio and Nanopore sequencing) to showcase our approach on authentic fungal herbarium specimens. In particular, we were able to generate high-quality reference data with Nanopore sequencing in a high-throughput manner. This shows that reference data can be generated on a regular desktop computer without the need for a large-scale sequencing facility. The quality of the Nanopore-generated sequences was 99.85%, comparable to the 99.78% accuracy described for Sanger sequencing. With this work, we hope to stimulate the generation of a new comprehensive standard of ribosomal reference data, with the ultimate aim of closing the huge gaps in our reference datasets.
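The accuracy figures quoted above are percentages of matching bases. The following is a rough sketch of how such a figure can be computed and what it means per kilobase, under two assumptions:

- The Nanopore consensus and its reference have already been aligned into equal-length strings, with `-` marking gaps. This layout is chosen for illustration.
- Gap columns count as mismatches. Identity definitions vary between tools.

The function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Gap columns ('-' in either sequence) count as mismatches.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        q == r and q != "-" for q, r in zip(aligned_query, aligned_ref)
    )
    return 100.0 * matches / len(aligned_query)


def errors_per_kb(identity_pct: float) -> float:
    """Expected non-matching columns per 1,000 aligned bases at a given identity."""
    return (100.0 - identity_pct) * 10.0


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
    # 99.85% versus 99.78% is roughly 1.5 versus 2.2 errors per kilobase.
    print(round(errors_per_kb(99.85), 2), round(errors_per_kb(99.78), 2))
```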

Exploring Quantitative Metagenomics Studies using Oxford Nanopore Sequencing: A Computational and Experimental Protocol

10.21203/rs.3.rs-131495/v1
2020

Author(s): Rohia ALILI, Eugeni BELDA, Karine CLEMENT, Phuong Le, Edi PRIFTI, ...

Keyword(s): DNA Extraction, Gut Microbiome, Large Scale, Low Cost, Read Length, Whole Genome, Nanopore Sequencing, Experimental Protocol, Microbial Composition, Oxford Nanopore

Background: The gut microbiome plays a major role in chronic diseases, several of which are characterized by altered diversity and composition of bacterial communities. Large-scale sequencing projects have enabled the characterization of these microbial community perturbations. However, a gap remains in translating these discoveries into clinical applications. To facilitate routine implementation of microbiome profiling in clinical settings, portable, real-time, and low-cost sequencing technologies are needed. Results: Here, we propose a computational and experimental protocol for whole-genome quantitative metagenomics studies of the human gut microbiome with Oxford Nanopore sequencing technology (ONT). We developed a bioinformatic pipeline to process ONT sequences, based on an evaluation of how different alignment parameters affect the estimation of microbial diversity and composition. We also optimized stool collection and DNA extraction methods to maximize read length, a critical parameter for sequence alignment and classification. Our analytical pipeline was evaluated using simulations of metagenomic communities that reflect naturally occurring compositional variations. We then validated our experimental and analytical pipeline with stool samples from a bariatric surgery cohort sequenced with ONT and Illumina, which revealed comparable diversity and microbial composition profiles. These results were compared with those previously obtained with SOLiD sequencing. Differences were observed, possibly explained by variations in library preparation steps. Finally, we found that ONT sequences allowed the assembly of complete genomes for disease-related species. Conclusion: This protocol can be implemented in the clinical or individual setting, bringing rapid, personalized whole-genome profiling of target microbiome species.
Keywords: quantitative metagenomics, microbiome, obesity, gut microbiota, microbial DNA extraction, sequencing, Simulation, Oxford Nanopore Technologies, MinION.
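Estimating "diversity and composition" from classified reads usually comes down to two quantities: the relative abundance of each species and a diversity index. The following is a minimal sketch under stated assumptions, not the authors' pipeline:

- Reads have already been assigned to species.
- Counts are normalised by genome length, a common convention that the abstract does not confirm.
- Diversity is summarised with the Shannon index.

The species names and genome lengths in the usage lines are hypothetical.

```python
import math


def relative_abundance(read_counts: dict[str, int],
                       genome_lengths: dict[str, int]) -> dict[str, float]:
    """Length-normalised abundance: reads / genome length, rescaled to sum to 1."""
    depth = {sp: read_counts[sp] / genome_lengths[sp] for sp in read_counts}
    total = sum(depth.values())
    return {sp: d / total for sp, d in depth.items()}


def shannon_index(abundances: dict[str, float]) -> float:
    """Shannon diversity H = -sum(p * ln p), skipping zero abundances."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)


if __name__ == "__main__":
    counts = {"species_a": 100, "species_b": 200}            # hypothetical
    lengths = {"species_a": 1_000_000, "species_b": 2_000_000}
    ab = relative_abundance(counts, lengths)
    print(ab)                   # equal depth, so 0.5 each
    print(shannon_index(ab))    # ln(2), about 0.693
```

Length normalisation matters for long reads: without it, species with larger genomes would look more abundant simply because they yield more reads per cell.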

Postal recruitment for genetic studies of preterm birth: A feasibility study

Wellcome Open Research
10.12688/wellcomeopenres.15207.1
2020
Vol 5
pp. 26

Author(s): Oonagh E. Keag, Lee Murphy, Aoibheann Bradley, Naomi Deakin, Sonia Whyte, ...

Keyword(s): Preterm Birth, DNA Extraction, Feasibility Study, Large Scale, Randomised Trial, Prospective Trial, Data Set, Genetic Studies, Two Samples, Pregnancy And Birth

Background: Preterm birth (PTB) is the leading cause of neonatal death. Large-scale genetic studies are necessary to determine genetic influences on PTB risk, but prospective cohort studies are expensive and time-consuming. We investigated the feasibility of retrospective recruitment of post-partum women for efficient collection of genetic samples, using self-collected saliva for DNA extraction from themselves and their babies, alongside self-recollection of pregnancy and birth details to phenotype PTB. Methods: 708 women who had participated in the OPPTIMUM trial (a randomised trial of progesterone pessaries to prevent PTB [ISRCTN14568373]) and consented to further contact were invited to provide self-collected saliva from themselves and their babies. DNA was extracted from Oragene OG-500 (adults) and OG-575 (babies) saliva kits, and yield was measured by Qubit. Samples were analysed using a panel of TaqMan single nucleotide polymorphism (SNP) assays. A questionnaire designed to meet the minimum data set required for phenotyping PTB was included. Questionnaire responses were transcribed and analysed for concordance with prospective trial data. Results: The recruitment rate was 162/708 (23%) for self-collected saliva samples and 157/708 (22%) for questionnaire responses. 161 maternal samples provided DNA, with a median yield of 59.0µg (range 0.4-148.9µg), and 156 were successfully genotyped (96.9%). 136 baby samples had a median yield of 11.5µg (range 0.1-102.7µg); two samples failed DNA extraction. 131 baby samples (96.3%) were successfully genotyped. Concordance between self-recalled and prospectively recorded birth details ranged from 55% to 99% (median 86%). The highest rates of concordance were found for mode of birth (154/156 [99%]), smoking status (151/157 [96%]) and ethnicity (149/156 [96%]). Conclusion: This feasibility study demonstrates that self-collected DNA samples from mothers and babies were sufficient for genetic analysis, but yields were variable.
Self-recollection of pregnancy and birth details was inadequate for accurately phenotyping PTB, highlighting the need for alternative strategies for investigating genetic links with PTB.
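The recruitment, genotyping and concordance rates above are simple proportions. The snippet below is a quick arithmetic check that the reported counts reproduce the quoted percentages. The helper name `pct` is illustrative; the counts are taken from the abstract.

```python
def pct(numerator: int, denominator: int, decimals: int = 0) -> float:
    """Proportion expressed as a percentage, rounded to `decimals` places."""
    return round(100 * numerator / denominator, decimals)


reported = {
    "recruitment (saliva)": (162, 708),
    "recruitment (questionnaire)": (157, 708),
    "concordance (mode of birth)": (154, 156),
    "concordance (smoking status)": (151, 157),
    "concordance (ethnicity)": (149, 156),
}

for label, (num, den) in reported.items():
    print(f"{label}: {num}/{den} = {pct(num, den):.0f}%")

# Genotyping success is quoted to one decimal place.
print(f"maternal genotyping: {pct(156, 161, 1)}%")  # 96.9%
print(f"baby genotyping: {pct(131, 136, 1)}%")      # 96.3%
```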

Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation

10.1101/2020.07.14.201947
2020

Author(s): Caroline Charre, Christophe Ginevra, Marina Sabatier, Hadrien Regue, Grégory Destras, ...

Keyword(s): Large Scale, Consensus Sequence, Amplicon Sequencing, Reference Sequence, High Threshold, Whole Genome, Bioinformatics Pipeline, Viral Loads, Two Samples, Complete Genomes

Since the beginning of the COVID-19 outbreak, SARS-CoV-2 whole-genome sequencing (WGS) has been performed at an unprecedented rate worldwide using very diverse Next Generation Sequencing (NGS) methods. Herein, we compare the performance of four NGS-based approaches for SARS-CoV-2 WGS. Twenty-four clinical respiratory samples with a wide range of Ct values (from 10.7 to 33.9) were sequenced with four methods. Three used Illumina sequencing: an in-house metagenomic NGS (mNGS) protocol and two newly commercialized kits. These were a hybridization capture method developed by Illumina (DNA Prep with Enrichment kit and Respiratory Virus Oligo Panel, RVOP) and an amplicon sequencing method developed by Paragon Genomics (CleanPlex SARS-CoV-2 kit). We also evaluated the widely used amplicon sequencing protocol developed by the ARTIC Network, combined with Oxford Nanopore Technologies (ONT) sequencing. All four methods yielded near-complete genomes (>99%) for high-viral-load samples, with mNGS and RVOP producing the most complete genomes. For mid viral loads, 2/8 genomes were incomplete (<99%) with mNGS, and 1/8 with each of CleanPlex and RVOP. For low viral loads (Ct ≥25), amplicon-based enrichment methods were the most sensitive techniques, yielding complete genomes for 7/8 samples. All methods were highly concordant in terms of identity of the complete consensus sequences. Only one mismatch, in two samples, was observed with CleanPlex versus the other methods. It arose because the dedicated CleanPlex bioinformatics pipeline sets a high threshold for calling a SNP against the reference sequence. Importantly, all methods correctly identified a newly observed 34-nt deletion in ORF6, although RVOP required specific bioinformatic validation.
Finally, as a major warning for targeted techniques, a lack of coverage in any given region of the genome should alert users to a potential rearrangement or a SNP in primer-annealing or probe-hybridization regions. Such techniques would require regular updates to keep pace with SARS-CoV-2 evolution.
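The ">99%" completeness figures and the closing warning about coverage drop-outs both rest on per-position sequencing depth. The following is a minimal sketch under stated assumptions:

- A per-position depth array is already available, for example from a pileup.
- `min_depth` is a hypothetical threshold. The abstract does not give the values the authors used.

The function names are illustrative.

```python
def completeness(depths: list[int], min_depth: int = 10) -> float:
    """Fraction of reference positions covered by at least `min_depth` reads."""
    covered = sum(d >= min_depth for d in depths)
    return covered / len(depths)


def low_coverage_regions(depths: list[int],
                         min_depth: int = 10) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals of consecutive positions below `min_depth`.

    In amplicon or capture data, such drop-outs may point to a mutation in a
    primer or probe site, or to a rearrangement, and are worth checking.
    """
    regions = []
    start = None
    for i, d in enumerate(depths):
        if d < min_depth and start is None:
            start = i                      # a low-coverage run begins
        elif d >= min_depth and start is not None:
            regions.append((start, i))     # the run ends before position i
            start = None
    if start is not None:                  # run extends to the genome end
        regions.append((start, len(depths)))
    return regions


if __name__ == "__main__":
    depths = [20, 20, 3, 0, 25, 30, 5]
    print(completeness(depths))            # 4 of 7 positions covered
    print(low_coverage_regions(depths))    # [(2, 4), (6, 7)]
```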

Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly.

F1000Research
10.12688/f1000research.6016.1
2015
Vol 4
pp. 20
Cited By ~ 1

Author(s): Priti Kumari, Raja Mazumder, Vahan Simonyan, Konstantinos Krampis

Keyword(s): Cloud Computing, Genome Assembly, Large Scale, Model Organism, Cost Effective, Read Length, Data Set, Sequencing Technologies, Full Dataset, Computing Platforms

Background: The transition to Next Generation Sequencing (NGS) technologies has had numerous applications in plant, microbial and human genomics during the past decade. However, NGS trades high read throughput for shorter read length, which makes genome assembly more difficult. This research compares traditional and Cloud computing-based genome assembly software, using as examples the Velvet and Contrail assemblers and reads from the genome sequence of the zebrafish (Danio rerio) model organism. Results: The first phase of the analysis used a subset of the zebrafish data set (2X coverage). The best results were obtained with a k-mer size of 65, and Velvet took less time than Contrail to complete the assembly. In the next phase, assembly of the full dataset (192X read coverage) was attempted. Velvet failed to complete on a 256 GB memory compute server, while Contrail completed but required 240 hours of computation. Conclusion: This research concludes that the size of the dataset and the available computing hardware should be considered when deciding which assembler software to use. For a relatively small sequencing dataset, such as a microbial or small eukaryotic genome, the Velvet assembler is a good option. However, for larger datasets Velvet requires large-memory compute servers on the order of 1,000 GB or more. Contrail, on the other hand, is implemented using Hadoop, which performs the assembly in parallel across the nodes of a compute cluster. Furthermore, Hadoop clusters can be rented on demand from Cloud computing providers. Contrail can therefore provide a simple and cost-effective way to assemble genomes from data generated at laboratories that lack the infrastructure or funds to build their own clusters.
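Velvet and Contrail both build de Bruijn graphs. This explains why the k-mer size matters, and why memory grows with dataset size: every distinct k-mer becomes part of the graph. The toy sketch below shows only the graph construction and is not code from either tool. It uses (k-1)-mer nodes and one edge per k-mer. Real assemblers also handle reverse complements, sequencing errors and graph simplification, all of which are omitted. The function names are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it.

    Every k-mer in every read contributes one edge: prefix -> suffix.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def distinct_kmers(reads: list[str], k: int) -> int:
    """Number of distinct k-mers: a rough proxy for graph (and memory) size."""
    return len({read[i:i + k] for read in reads
                for i in range(len(read) - k + 1)})


if __name__ == "__main__":
    print(de_bruijn_graph(["ACGTC"], k=3))
    # {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TC']}
    print(distinct_kmers(["ACGTC", "CGTCA"], k=3))  # 4
```

With k = 65, as in the study, each read of length L contributes L - 64 k-mers. As coverage rises to 192X, the number of k-mers to hold, whether in one machine's memory (Velvet) or spread across cluster nodes (Contrail), grows accordingly.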

Exploring Semi-Quantitative Metagenomic Studies Using Oxford Nanopore Sequencing: A Computational and Experimental Protocol

Genes
10.3390/genes12101496
2021
Vol 12 (10)
pp. 1496

Author(s): Rohia Alili, Eugeni Belda, Phuong Le, Thierry Wirth, Jean-Daniel Zucker, ...

Keyword(s): Gut Microbiome, Large Scale, Low Cost, Extraction Methods, Read Length, Whole Genome, Nanopore Sequencing, Experimental Protocol, Microbial Composition, Oxford Nanopore

The gut microbiome plays a major role in chronic diseases, several of which are characterized by altered composition and diversity of bacterial communities. Large-scale sequencing projects have made it possible to characterize the perturbations of these communities. However, translating these discoveries into clinical applications remains a challenge. To facilitate routine implementation of microbiome profiling in clinical settings, portable, real-time, and low-cost sequencing technologies are needed. Here, we propose a computational and experimental protocol for whole-genome semi-quantitative metagenomic studies of the human gut microbiome with Oxford Nanopore sequencing technology (ONT) that could be applied to other microbial ecosystems. We developed a bioinformatics protocol to analyze ONT sequences taxonomically and functionally. We also optimized preanalytic protocols, including stool collection and DNA extraction methods, to maximize read length, a critical parameter for sequence alignment and classification. Our protocol was evaluated using simulations of metagenomic communities that reflect naturally occurring compositional variations. Next, we validated both protocols using stool samples from a bariatric surgery cohort, sequenced with ONT, Illumina, and SOLiD technologies. Results revealed similar diversity and microbial composition profiles. This protocol can be implemented in a clinical or research setting, bringing rapid, personalized whole-genome profiling of target microbiome species.
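The extraction protocol was optimised for read length, and long-read runs are commonly summarised with the read N50. The read N50 is the largest length L such that reads of length ≥ L together contain at least half of all sequenced bases. The abstract does not say which length summary the authors report; this sketch only illustrates the metric. The function name is illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L hold >= 50% of all bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8 (10 + 9 + 8 = 27 of 54 bases)
```

Unlike the mean, N50 is weighted toward the reads that carry most of the bases. This makes it sensitive to the long fragments that an extraction method preserves or shears.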

Exploring Quantitative Metagenomics Studies Using Oxford Nanopore Sequencing: A Computational and Experimental Protocol

10.20944/preprints202108.0104.v1
2021

Author(s): Rohia Alili, Eugeni Belda, Phuong Le, Thierry Wirth, Jean-Daniel Zucker, ...

Keyword(s): Gut Microbiome, Large Scale, Low Cost, Extraction Methods, Read Length, Whole Genome, Nanopore Sequencing, Experimental Protocol, Microbial Composition, Oxford Nanopore

Background: The gut microbiome plays a major role in chronic diseases, several of which are characterized by altered composition and diversity of bacterial communities. Large-scale sequencing projects have made it possible to characterize the perturbations of these communities. However, translating these discoveries into clinical applications remains a challenge. To facilitate routine implementation of microbiome profiling in clinical settings, portable, real-time, and low-cost sequencing technologies are needed. Results: Here, we propose a computational and experimental protocol for whole-genome quantitative metagenomics studies of the human gut microbiome with Oxford Nanopore sequencing technology (ONT) that could be applied to other microbial ecosystems. We developed a bioinformatic protocol to analyse ONT sequences taxonomically and functionally. We also optimized pre-analytic protocols, including stool collection and DNA extraction methods, to maximize read length, a critical parameter for sequence alignment and classification. Our protocol was evaluated using simulations of metagenomic communities that reflect naturally occurring compositional variations. Next, we validated both protocols using stool samples from a bariatric surgery cohort, sequenced with ONT, Illumina and SOLiD technologies. Results revealed similar diversity and microbial composition profiles. Conclusion: This protocol can be implemented in the clinical or research setting, bringing rapid, personalized whole-genome profiling of target microbiome species.
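Simulated communities that "reflect naturally occurring compositional variations" can be generated in many ways. One simple scheme is shown here; it is assumed for illustration and is not taken from the paper:

1. Draw a composition from a Dirichlet distribution, built from normalised Gamma draws.
2. Assign a fixed number of reads to species by sampling with those abundances as weights.

Because the true abundances are known, a profiler's estimates can be scored against them. The function name, `concentration` and the species labels are hypothetical.

```python
import random
from collections import Counter


def simulate_community(species: list[str], n_reads: int,
                       concentration: float = 1.0,
                       seed: int = 0) -> tuple[dict[str, float], dict[str, int]]:
    """Return (true_abundances, read_counts) for one simulated sample.

    Abundances ~ Dirichlet(concentration, ..., concentration), obtained by
    normalising independent Gamma(concentration, 1) draws; reads are then
    assigned to species with those abundances as weights (a multinomial draw).
    """
    rng = random.Random(seed)
    gammas = [rng.gammavariate(concentration, 1.0) for _ in species]
    total = sum(gammas)
    abundances = {sp: g / total for sp, g in zip(species, gammas)}
    draws = rng.choices(species,
                        weights=[abundances[sp] for sp in species], k=n_reads)
    counts = Counter(draws)
    return abundances, {sp: counts.get(sp, 0) for sp in species}


if __name__ == "__main__":
    truth, reads = simulate_community(["sp1", "sp2", "sp3"], n_reads=10_000)
    for sp in truth:
        print(sp, round(truth[sp], 3), reads[sp] / 10_000)
```

Smaller `concentration` values produce more uneven communities dominated by a few species. Larger values produce more even ones, which is one way to span a range of compositions.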

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications
10.3724/sp.j.1087.2008.02737
2009
Vol 28 (11)
pp. 2737-2740

Author(s): Xiao ZHANG, Shan WANG, Na LIAN

Keyword(s): Large Scale, Data Set, Large Scale Data, Scale Data

Integrative Data Analysis from a Unifying Research Synthesis Perspective

10.1093/oso/9780190676001.003.0020
2018

Author(s): Eun-Young Mun, Anne E. Ray

Keyword(s): Data Analysis, Large Scale, Research Synthesis, Alcohol Intervention, Data Set, Integrative Data Analysis, Level Data, Model Complex, Wide Range, Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

Financial distress determinants among SMEs: empirical evidence from Sweden

Journal of Economic Studies
10.1108/jes-01-2019-0030
2020
Vol 47 (3)
pp. 547-560
Cited By ~ 1

Author(s): Darush Yazdanfar, Peter Öhman

Keyword(s): Financial Crisis, Financial Distress, Large Scale, Global Financial Crisis, Binary Logistic Regression, Data Availability, Cross Sectional, Data Set, Content Type, The Global Financial Crisis

Purpose: The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach: Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings: The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in the previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations: Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating firms operating in other industries and other countries. Originality/value: This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.
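Binary logistic regression models the probability that a firm is in financial distress as p = 1 / (1 + exp(-(b0 + b1·x1 + ... + bp·xp))). The sketch below fits such a model by plain gradient ascent on the average log-likelihood, purely to illustrate the technique. The single predictor (a leverage-like ratio), the toy data and the learning settings are all hypothetical. The abstract does not state the study's exact specification or software.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 5000) -> list[float]:
    """Return [b0, b1, ..., bp] by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p  # d(log-likelihood)/d(linear predictor) for this firm
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: one leverage-like predictor, 1 = distressed.
    X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logistic(X, y)
    print(f"b0={b0:.3f}, b1={b1:.3f}")  # b1 > 0: higher ratio, higher risk
```

In practice such models are fitted by maximum likelihood with Newton-type methods in statistical packages, which also report standard errors. The gradient loop here only makes the likelihood being maximised explicit.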
