fastq format Latest Research Papers

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data

10.1101/2021.09.18.460896 ◽

2021 ◽

Author(s):

Anjana Anilkumar Sithara ◽

Devi Priyanka Maripuri ◽

Keerthika Moorthy ◽

Sai Sruthi Amirtha Ganesh ◽

Philge Philip ◽

...

Keyword(s):

Data Analysis ◽

Workflow Management ◽

Human Monocyte ◽

Complex Data ◽

Omics Data ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Sequencing Technologies ◽

Fastq Format ◽

User Friendly

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input and outputs insightful statistics on the nature of the data. Our iCOMIC toolkit pipeline can analyze whole-genome and transcriptome data and is embedded in the popular Snakemake workflow management system. iCOMIC is characterized by a user-friendly GUI that offers several advantages, including executing analyses with minimal steps and eliminating the need for complex command-line arguments. The toolkit features many independent core workflows for both whole-genome and transcriptome data analysis. Even though all the necessary, well-established tools are integrated into the pipeline to enable "out-of-the-box" analysis, we provide the user with the means to replace modules or alter the pipeline as needed. Notably, we have integrated algorithms developed in-house for predicting driver and passenger mutations based on mutational context, and tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against the Genome in a Bottle (GIAB) benchmark dataset (NA12878) and obtained the highest F1 scores of 0.971 and 0.988 for indels and SNPs, respectively, using the BWA MEM - GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r=0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, with minimal steps, significantly ameliorating complex data analysis pipelines. Availability: https://github.com/RamanLab/iCOMIC
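To make "raw sequencing data (FASTQ format) as input" and "statistics on the nature of the data" concrete, here is a minimal Python sketch of reading FASTQ records and summarising them. It is not taken from iCOMIC: the function names `read_fastq` and `summarize` are hypothetical, and it assumes the common four-line record layout with Phred+33 quality encoding.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of stream (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def summarize(handle, offset=33):
    """Return read count, mean read length and mean per-base Phred quality.

    Assumption: qualities are Phred+33 encoded (offset=33).
    """
    reads = total_len = total_q = 0
    for _, seq, qual in read_fastq(handle):
        reads += 1
        total_len += len(seq)
        total_q += sum(ord(c) - offset for c in qual)
    return {
        "reads": reads,
        "mean_length": total_len / reads if reads else 0.0,
        "mean_quality": total_q / total_len if total_len else 0.0,
    }


if __name__ == "__main__":
    # Two made-up records, for illustration only.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n")
    print(summarize(demo))
```

For real data the same functions can be given a file opened with `open(path)` or, for compressed input, `gzip.open(path, "rt")`.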


CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing

International Journal of Molecular Sciences ◽

10.3390/ijms21113828 ◽

2020 ◽

Vol 21 (11) ◽

pp. 3828

Author(s):

Omer An ◽

Kar-Tong Tan ◽

Ying Li ◽

Jia Li ◽

Chan-Shuo Wu ◽

...

Keyword(s):

Data Analysis ◽

Final Report ◽

Online Platform ◽

Fastq Format ◽

Health And Disease ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Next-generation sequencing (NGS) has been a widely-used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data, which often require bioinformatics skills, tedious work and a significant amount of time. To facilitate data processing steps minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and is flexible to expand with new pipelines. The users can upload raw data in FASTQ format and submit jobs in a few clicks, and the results will be self-accessible via the portal to view/download/share in real-time. The output can be readily used as the final report or as input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.


CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing

10.20944/preprints201910.0146.v2 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ömer An ◽

Kar-Tong Tan ◽

Ying Li ◽

Jia Li ◽

Chan-Shuo Wu ◽

...

Keyword(s):

Data Analysis ◽

Final Report ◽

Online Platform ◽

Fastq Format ◽

Health And Disease ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Next-generation sequencing (NGS) has been a widely-used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data, which often require bioinformatics skills, tedious work and a significant amount of time. To facilitate data processing steps minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and is flexible to expand with new pipelines. The users can upload raw data in FASTQ format and submit jobs in a few clicks, and the results will be self-accessible via the portal to view/download/share in real-time. The output can be readily used as the final report or as input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal


FASTQ Format

10.32388/qtk4zm ◽

2020 ◽

Cited By ~ 1

Author(s):

Keyword(s):

Fastq Format


CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing

10.20944/preprints201910.0146.v1 ◽

2019 ◽

Cited By ~ 4

Author(s):

Ömer An ◽

Kar-Tong Tan ◽

Ying Li ◽

Jia Li ◽

Chan-Shuo Wu ◽

...

Keyword(s):

Data Analysis ◽

Final Report ◽

Raw Data ◽

Data Share ◽

Online Platform ◽

Fastq Format ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Next-generation sequencing (NGS) has been a widely-used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data; however, they often require bioinformatics skills and tedious work to handle. Moreover, processing raw data, such as genome alignment and expression quantification, consumes a significant amount of time before the data are ready for downstream analyses. To facilitate data processing steps minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website, developed in PHP and JavaScript with an integrated MariaDB database running on a dedicated Linux server. The portal currently provides 14 standard pipelines for analysing NGS data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP and Bisulfite sequencing, and is flexible to expand with new and customised pipelines. The users can upload raw data in FASTQ format and submit jobs for the desired analyses in a few clicks, and the results will be self-accessible via the portal to view/download/share in real-time. The output can be readily used as the final report or as an input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data, share it with colleagues and keep it organised without the aid of a bioinformatician. The website is freely available at: https://csibioinfo.nus.edu.sg/csingsportal


GEO2RNAseq: An easy-to-use R pipeline for complete pre-processing of RNA-seq data

10.1101/771063 ◽

2019 ◽

Cited By ~ 2

Author(s):

Bastian Seelbinder ◽

Thomas Wolf ◽

Steffen Priebe ◽

Sylvie McNamara ◽

Silvia Gerber ◽

...

Keyword(s):

Gene Expression ◽

Single Species ◽

Gene Expression Omnibus ◽

Rna Seq ◽

Sequencing Data ◽

Interacting Species ◽

Link Type ◽

Fastq Format ◽

Standard Tool ◽

Processing Steps

In transcriptomics, the study of the total set of RNAs transcribed by the cell, RNA sequencing (RNA-seq) has become the standard tool for analysing gene expression. The primary goal is the detection of genes whose expression changes significantly between two or more conditions, either for a single species or for two or more interacting species at the same time (dual RNA-seq, triple RNA-seq and so forth). The analysis of RNA-seq can be simplified as many steps of the data pre-processing can be standardised in a pipeline. In this publication we present the "GEO2RNAseq" pipeline for complete, quick and concurrent pre-processing of single, dual, and triple RNA-seq data. It covers all pre-processing steps starting from raw sequencing data to the analysis of differentially expressed genes, including various tables and figures to report intermediate and final results. Raw data may be provided in FASTQ format or can be downloaded automatically from the Gene Expression Omnibus repository. GEO2RNAseq strongly incorporates experimental as well as computational metadata. GEO2RNAseq is implemented in R, lightweight, easy to install via Conda and easy to use, but still very flexible through using modular programming and offering many extensions and alternative workflows. GEO2RNAseq is publicly available at https://anaconda.org/xentrics/r-geo2rnaseq and https://bitbucket.org/thomas_wolf/geo2rnaseq/overview, including source code, installation instructions, and comprehensive package documentation.


PyPore: a python toolbox for nanopore sequencing data handling

Bioinformatics ◽

10.1093/bioinformatics/btz269 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4445-4447 ◽

Cited By ~ 1

Author(s):

Roberto Semeraro ◽

Alberto Magi

Keyword(s):

Open Source Software ◽

Reference Genome ◽

State Of The Art ◽

Supplementary Information ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Software Packages ◽

Technological Improvement ◽

Fastq Format ◽

Oxford Nanopore

Motivation: The recent technological improvement of Oxford Nanopore sequencing pushed the throughput of these devices to 10–20 Gb, allowing the generation of millions of reads. For these reasons, the availability of fast software packages for evaluating experimental quality by generating highly informative and interactive summary plots is of fundamental importance. Results: We developed PyPore, a three-module Python toolbox designed to handle raw FAST5 files from quality checking to alignment to a reference genome and to explore their features through the generation of browsable HTML files. The first module provides an interface to explore and evaluate the information contained in FAST5 files and summarize it into informative quality measures. The second module converts raw data into FASTQ format, while the third module makes it easy to use three state-of-the-art aligners and collects mapping statistics. Availability and implementation: PyPore is open-source software written in Python 2.7; the source code is freely available, for all OS platforms, on GitHub at https://github.com/rsemeraro/PyPore. Supplementary information: Supplementary data are available at Bioinformatics online.


Fastq-pair: efficient synchronization of paired-end fastq files

10.1101/552885 ◽

2019 ◽

Cited By ~ 7

Author(s):

John A. Edwards ◽

Robert A. Edwards

Keyword(s):

Dna Sequence ◽

Efficient Solution ◽

Bioinformatics Analysis ◽

Sequence Data ◽

Additional Information ◽

Fastq Format ◽

Separate File ◽

The One ◽

Computational Resources ◽

Memory Efficient

Paired-end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analysis. Paired-end reads are usually provided as two FASTQ-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in each file in the same order, and that reads without a pair in the corresponding file are placed in a separate file of singletons. Although most sequencing instruments capable of generating paired-end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads, so that paired-end sequence files lose synchronicity and contain either unordered sequences or sequences in one or the other file without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory, but they quickly become limited by computational resources for the moderate to large sequence files that are common nowadays. Here, we introduce a fast and memory-efficient solution, written in C for portability, that synchronizes paired-end FASTQ files for subsequent analysis and places unmatched reads into singleton files. Fastq-pair is freely available from https://github.com/linsalrob/fastq-pair and is released under the MIT license.
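The synchronization problem described above can be sketched in Python as follows. This is an illustration, not the Fastq-pair implementation (which is written in C): it keeps memory use modest by indexing only read IDs and byte offsets of the first file, then streaming the second file. The function names, the `/1` and `/2` suffix handling and the four-line record layout are assumptions made for the example.

```python
import io


def read_id(header: bytes) -> bytes:
    """Reduce a header line to a mate-independent read ID."""
    name = header[1:].split()[0]        # drop '@' and any trailing comment
    if name.endswith((b"/1", b"/2")):   # assumed old-style mate suffix
        name = name[:-2]
    return name


def next_record(handle):
    """Return the next four-line record as a list of lines, or None at EOF."""
    lines = [handle.readline() for _ in range(4)]
    if not lines[0]:
        return None
    if not lines[3]:
        raise ValueError("truncated FASTQ record")
    return lines


def pair_fastq(left, right, out_left, out_right, single_left, single_right):
    """Write mates in matching order; send unmatched reads to singleton outputs.

    All arguments are binary file-like objects; `left` must be seekable.
    """
    # Pass 1: index the left file as read ID -> byte offset (not whole records).
    index = {}
    while True:
        pos = left.tell()
        rec = next_record(left)
        if rec is None:
            break
        index[read_id(rec[0])] = pos

    # Pass 2: stream the right file and look up each mate by ID.
    while (rec := next_record(right)) is not None:
        pos = index.pop(read_id(rec[0]), None)
        if pos is None:
            single_right.writelines(rec)
        else:
            left.seek(pos)
            out_left.writelines(next_record(left))
            out_right.writelines(rec)

    # Anything still in the index had no mate in the right file.
    for pos in sorted(index.values()):
        left.seek(pos)
        single_left.writelines(next_record(left))


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    left = io.BytesIO(b"@a/1\nAAAA\n+\nIIII\n@b/1\nCCCC\n+\nIIII\n")
    right = io.BytesIO(b"@a/2\nTTTT\n+\nIIII\n")
    outs = [io.BytesIO() for _ in range(4)]
    pair_fastq(left, right, *outs)
    print([o.getvalue() for o in outs])
```

Paired output follows the order of the second file. With real data, the handles would be files opened in binary mode (`open(path, "rb")` and `open(path, "wb")`).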


BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files

Bioinformatics ◽

10.1093/bioinformatics/bty841 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2193-2198 ◽

Cited By ~ 58

Author(s):

Alexander Payne ◽

Nadine Holmes ◽

Vardhman Rakyan ◽

Matthew Loose

Keyword(s):

Supplementary Information ◽

Sample Type ◽

Single Dna Molecules ◽

Fastq Format ◽

Long Reads ◽

Oxford Nanopore ◽

Sample Extraction ◽

Long Read ◽

Input Sample ◽

Specific Position

Motivation: The Oxford Nanopore Technologies (ONT) MinION is used for sequencing a wide variety of sample types with diverse methods of sample extraction. Nanopore sequencers output FAST5 files containing signal data subsequently base called to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file enabling inspection of signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult to sequence samples. Results: The BulkVis tool can load a bulk FAST5 file and overlays MinKNOW (the software that controls ONT sequencers) classifications on the signal trace and can show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as Nanopore base caller compatible reads. Using BulkVis, we find long reads can be incorrectly divided by MinKNOW resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2 272 580 bases in length and reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and alignment to a reference. We note that incorrect read splitting appears to vary according to input sample type and is more common in 'ultra-long' read preparations. Availability and implementation: The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis. Supplementary information: Supplementary data are available at Bioinformatics online.
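As a rough illustration of how a FASTQ header can identify a channel and time, the Python sketch below pulls those fields out of a header line. It is not BulkVis code. It assumes the header carries space-separated `key=value` fields including `ch=` and `start_time=` (an ISO 8601 UTC timestamp without fractional seconds), as ONT basecallers commonly write them; the example header and the function names are made up.

```python
from datetime import datetime


def parse_header(header):
    """Split a FASTQ header into its read ID and a dict of key=value fields."""
    parts = header.strip().lstrip("@").split()
    if not parts:
        raise ValueError("empty FASTQ header")
    fields = {}
    for token in parts[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return parts[0], fields


def locate(header):
    """Return (read_id, channel, start_time) for a header.

    Assumption: fields named 'ch' and 'start_time' are present, with
    start_time formatted like 2018-05-01T10:15:30Z.
    """
    read_id, fields = parse_header(header)
    channel = int(fields["ch"])
    start = datetime.strptime(fields["start_time"], "%Y-%m-%dT%H:%M:%SZ")
    return read_id, channel, start


if __name__ == "__main__":
    # Hypothetical header, for illustration only.
    example = "@read-0001 runid=abc123 read=12 ch=391 start_time=2018-05-01T10:15:30Z"
    print(locate(example))
```

Converting the timestamp into an offset within a bulk file would additionally need the run's start time, which is not part of this sketch.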


The Ecological Estimation of Sredniy Kaban Lake Based on Molecular Methods

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.7.20521 ◽

2018 ◽

Vol 7 (4.7) ◽

pp. 88

Author(s):

M. Khusainov ◽

Ludmil L. Frolova

Keyword(s):

Water Quality ◽

Marker Gene ◽

Metagenomic Data ◽

Sporting Events ◽

Ecological State ◽

International Database ◽

Surrounding Environment ◽

Fastq Format ◽

Visual Approach

Sredniy Kaban lake is part of the system of Kaban urban lakes; it experiences anthropogenic load and is currently used for rowing events. The reservoir is monitored regularly, with restoration and improvement activities carried out and green beaches landscaped. The ecological state of the reservoir and the surrounding environment is assessed by different methods, one of the main ones being bioindication. This method is based on the study of indicator species, which are identified by obsolete methods based on their morphological features. As an alternative to the visual approach using a microscope, the paper considers a method for identifying hydrobionts by the CO1 marker gene, based on DNA barcoding and modern sequencing methods. The sequenced fragments of the CO1 gene of hydrobionts from the freshwater Sredniy Kaban lake, sampled in autumn 2016 and summer 2017, are deposited in FASTQ format in the international database on the NCBI website under the unique numbers SRR5852708 (2016) and SRR5839796 (2017). The paper presents the results of the analysis and gives an assessment of the water quality of Sredniy Kaban lake (Kazan, Russia). Comparative analysis of metagenomic data shows that most of the animals of Sredniy Kaban lake are grouped near the b-mesosaprobic zone in 2016 and the o-saprobic zone in 2017. In terms of water quality, Sredniy Kaban lake is transitional from b-o-saprobic to b-a-mesosaprobic according to the 2016 results, and from b-o-saprobic to o-saprobic according to the 2017 results, which is due to the restoration activities carried out on the lake during this period.
