fastq format
Recently Published Documents


TOTAL DOCUMENTS

23
(FIVE YEARS 1)

H-INDEX

6
(FIVE YEARS 0)

2021 ◽  
Author(s):  
Anjana Anilkumar Sithara ◽  
Devi Priyanka Maripuri ◽  
Keerthika Moorthy ◽  
Sai Sruthi Amirtha Ganesh ◽  
Philge Philip ◽  
...  

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input and outputs insightful statistics on the nature of the data. Our iCOMIC toolkit pipeline can analyze whole-genome and transcriptome data and is embedded in the popular Snakemake workflow management system. iCOMIC is characterized by a user-friendly GUI that offers several advantages, including executing analyses with minimal steps and eliminating the need for complex command-line arguments. The toolkit features many independent core workflows for both whole-genome and transcriptome data analysis. Even though all the necessary, well-established tools are integrated into the pipeline to enable "out-of-the-box" analysis, we provide the user with the means to replace modules or alter the pipeline as needed. Notably, we have integrated algorithms developed in-house for predicting driver and passenger mutations based on mutational context, and tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against the Genome in a Bottle (GIAB) benchmark dataset (NA12878) and obtained highest F1 scores of 0.971 and 0.988 for indels and SNPs, respectively, using the BWA MEM - GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r=0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, with minimal steps, significantly simplifying complex data analysis pipelines. Availability: https://github.com/RamanLab/iCOMIC
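
The F1 scores quoted in the abstract above combine precision and recall of a variant call set against a truth set. A minimal sketch of that arithmetic follows; the function name and the counts in the example are illustrative assumptions, not values or code from iCOMIC.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were not called (false negatives)
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formula.
    print(f"F1 = {f1_score(tp=970, fp=20, fn=38):.3f}")
```

Because F1 is a harmonic mean, a pipeline cannot reach a high score by trading many missed variants for few false calls, or the reverse.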


2020 ◽  
Vol 21 (11) ◽  
pp. 3828
Author(s):  
Omer An ◽  
Kar-Tong Tan ◽  
Ying Li ◽  
Jia Li ◽  
Chan-Shuo Wu ◽  
...  

Next-generation sequencing (NGS) has been a widely used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data, which often require bioinformatics skills, tedious work and a significant amount of time. To facilitate data processing steps, minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and is flexible to expand with new pipelines. Users can upload raw data in FASTQ format and submit jobs in a few clicks, and the results will be self-accessible via the portal to view/download/share in real time. The output can be readily used as the final report or as input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.


Author(s):  
Ömer An ◽  
Kar-Tong Tan ◽  
Ying Li ◽  
Jia Li ◽  
Chan-Shuo Wu ◽  
...  

Next-generation sequencing (NGS) has been a widely used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data; however, they often require bioinformatics skills and tedious work to handle. Moreover, processing raw data, such as genome alignment and expression quantification, consumes a significant amount of time before the data are ready for downstream analyses. To facilitate data processing steps, minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website, developed in PHP and JavaScript with an integrated MariaDB database running on a dedicated Linux server. The portal currently provides 14 standard pipelines for analysing NGS data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP and Bisulfite sequencing, and is flexible to expand with new and customised pipelines. Users can upload raw data in FASTQ format and submit jobs for the desired analyses in a few clicks, and the results will be self-accessible via the portal to view/download/share in real time. The output can be readily used as the final report or as an input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data, share it with colleagues and keep it organised without the aid of a bioinformatician. The website is freely available at: https://csibioinfo.nus.edu.sg/csingsportal


2019 ◽  
Author(s):  
Bastian Seelbinder ◽  
Thomas Wolf ◽  
Steffen Priebe ◽  
Sylvie McNamara ◽  
Silvia Gerber ◽  
...  

In transcriptomics, the study of the total set of RNAs transcribed by the cell, RNA sequencing (RNA-seq) has become the standard tool for analysing gene expression. The primary goal is the detection of genes whose expression changes significantly between two or more conditions, either for a single species or for two or more interacting species at the same time (dual RNA-seq, triple RNA-seq and so forth). The analysis of RNA-seq can be simplified as many steps of the data pre-processing can be standardised in a pipeline. In this publication we present the “GEO2RNAseq” pipeline for complete, quick and concurrent pre-processing of single, dual, and triple RNA-seq data. It covers all pre-processing steps starting from raw sequencing data to the analysis of differentially expressed genes, including various tables and figures to report intermediate and final results. Raw data may be provided in FASTQ format or can be downloaded automatically from the Gene Expression Omnibus repository. GEO2RNAseq strongly incorporates experimental as well as computational metadata. GEO2RNAseq is implemented in R, lightweight, easy to install via Conda and easy to use, but still very flexible through using modular programming and offering many extensions and alternative workflows. GEO2RNAseq is publicly available at https://anaconda.org/xentrics/r-geo2rnaseq and https://bitbucket.org/thomas_wolf/geo2rnaseq/overview, including source code, installation instructions, and comprehensive package documentation.


2019 ◽  
Vol 35 (21) ◽  
pp. 4445-4447 ◽  
Author(s):  
Roberto Semeraro ◽  
Alberto Magi

Motivation: The recent technological improvement of Oxford Nanopore sequencing pushed the throughput of these devices to 10–20 Gb, allowing the generation of millions of reads. For these reasons, the availability of fast software packages for evaluating experimental quality by generating highly informative and interactive summary plots is of fundamental importance. Results: We developed PyPore, a three-module Python toolbox designed to handle raw FAST5 files from quality checking to alignment to a reference genome and to explore their features through the generation of browsable HTML files. The first module provides an interface to explore and evaluate the information contained in FAST5 files and summarize it into informative quality measures. The second module converts raw data to FASTQ format, while the third module makes it easy to use three state-of-the-art aligners and collects mapping statistics. Availability and implementation: PyPore is open-source software written in Python 2.7; the source code is freely available, for all OS platforms, on GitHub at https://github.com/rsemeraro/PyPore Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
John A. Edwards ◽  
Robert A. Edwards

Paired-end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analysis. Paired-end reads are usually provided as two fastq-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in each file in the same order, and reads that do not have a pair in the corresponding file are placed in a separate file of singletons. Although most sequencing instruments capable of generating paired-end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads, so that paired-end sequence files lose synchronicity and contain either unordered sequences or sequences in one or other file without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory but quickly become limited by computational resources for the moderate to large sequence files that are common nowadays. Here, we introduce a fast and memory-efficient solution, written in C for portability, that synchronizes paired-end fastq files for subsequent analysis and places unmatched reads into singleton files. Fastq-pair is freely available from https://github.com/linsalrob/fastq-pair and is released under the MIT license.
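
The synchronization problem described in the abstract above can be illustrated with a short Python sketch. This is not the fastq-pair implementation (which is written in C); it is one possible approach that keeps only read names and byte offsets of the first file in memory, then streams the second file. The function names, the output file naming, and the assumption of strict four-line FASTQ records with optional `/1` and `/2` mate suffixes are all illustrative.

```python
from typing import BinaryIO, Dict, Optional


def read_id(header: bytes) -> bytes:
    """Read name without '@', trailing comment, or a /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith((b"/1", b"/2")):
        name = name[:-2]
    return name


def read_record(fh: BinaryIO, pos: Optional[int] = None) -> bytes:
    """Read one four-line record, optionally seeking to a byte offset first."""
    if pos is not None:
        fh.seek(pos)
    return b"".join(fh.readline() for _ in range(4))


def index_offsets(path: str) -> Dict[bytes, int]:
    """Map each read name in a FASTQ file to the byte offset of its record."""
    offsets: Dict[bytes, int] = {}
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            record = read_record(fh)
            if not record:
                break
            offsets[read_id(record.splitlines()[0])] = pos
    return offsets


def pair_fastq(r1_path: str, r2_path: str, out_prefix: str) -> int:
    """Write paired and singleton files; return the number of pairs found."""
    offsets = index_offsets(r1_path)
    paired = 0
    with open(r1_path, "rb") as r1, open(r2_path, "rb") as r2, \
            open(out_prefix + "_1.paired.fq", "wb") as p1, \
            open(out_prefix + "_2.paired.fq", "wb") as p2, \
            open(out_prefix + "_1.single.fq", "wb") as s1, \
            open(out_prefix + "_2.single.fq", "wb") as s2:
        while True:
            record = read_record(r2)
            if not record:
                break
            # pop() removes matched reads, so leftovers are file-1 singletons.
            pos = offsets.pop(read_id(record.splitlines()[0]), None)
            if pos is None:
                s2.write(record)
            else:
                p1.write(read_record(r1, pos))
                p2.write(record)
                paired += 1
        # Reads from file 1 that never found a mate, in original file order.
        for pos in sorted(offsets.values()):
            s1.write(read_record(r1, pos))
    return paired
```

Paired reads are written in the order of the second file, so both paired outputs stay in step with each other. For very large inputs even the name-to-offset dictionary can grow large, which is the kind of resource limit the abstract refers to.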


2018 ◽  
Vol 35 (13) ◽  
pp. 2193-2198 ◽  
Author(s):  
Alexander Payne ◽  
Nadine Holmes ◽  
Vardhman Rakyan ◽  
Matthew Loose

Motivation: The Oxford Nanopore Technologies (ONT) MinION is used for sequencing a wide variety of sample types with diverse methods of sample extraction. Nanopore sequencers output FAST5 files containing signal data, which are subsequently base called to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file, enabling inspection of signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult-to-sequence samples. Results: The BulkVis tool can load a bulk FAST5 file, overlay MinKNOW (the software that controls ONT sequencers) classifications on the signal trace and show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as Nanopore base caller compatible reads. Using BulkVis, we find long reads can be incorrectly divided by MinKNOW, resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2 272 580 bases in length and reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and alignment to a reference. We note that incorrect read splitting appears to vary according to input sample type and is more common in 'ultra-long' read preparations. Availability and implementation: The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis. Supplementary information: Supplementary data are available at Bioinformatics online.
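
Locating a read from its FASTQ header, as mentioned in the abstract above, depends on the header carrying the channel and start time. The sketch below is not BulkVis code; it assumes a header made of a read name followed by space-separated `key=value` fields (for example `ch=` and `start_time=`), a layout commonly produced by ONT base callers. The example header and its values are made up.

```python
from typing import Dict, Tuple


def parse_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTQ header line into the read name and its key=value fields."""
    fields = header.strip().lstrip("@").split()
    name, attrs = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            attrs[key] = value
    return name, attrs


if __name__ == "__main__":
    # Hypothetical header; field names follow the assumed key=value layout.
    example = "@read-0001 runid=abc read=12 ch=42 start_time=2018-01-01T00:00:00Z"
    name, attrs = parse_header(example)
    print(name, attrs.get("ch"), attrs.get("start_time"))
```

With the channel and start time extracted, a viewer can seek to the matching stretch of signal in the bulk file.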


2018 ◽  
Vol 7 (4.7) ◽  
pp. 88
Author(s):  
M. Khusainov ◽  
Ludmil L. Frolova

Sredniy Kaban lake is part of the Kaban system of urban lakes; it experiences anthropogenic load and is currently used for rowing events. The reservoir is monitored regularly, alongside restoration and improvement activities and the landscaping of green beaches. The ecological state of the reservoir and the surrounding environment is assessed by different methods, one of the main ones being bioindication. This method is based on the study of indicator species, identified by obsolete methods based on their morphological features. As an alternative to the visual approach using a microscope, the paper considers a method for identifying hydrobionts by the CO1 marker gene, based on DNA barcoding and modern sequencing methods. The sequenced fragments of the CO1 gene of hydrobionts from freshwater Sredniy Kaban lake for the autumn (2016) and summer (2017) sampling periods are included, in FASTQ format, in the international database on the NCBI website under the unique numbers SRR5852708 (2016) and SRR5839796 (2017). The paper presents the results of the analysis and gives an assessment of the water quality of Sredniy Kaban lake (Kazan, Russia). Comparative analysis of metagenomic data shows that most of the animals of Sredniy Kaban lake are grouped near the b-mesosaprobic zone in 2016 and the o-saprobic zone in 2017. In terms of water quality, Sredniy Kaban lake is transitional from b-o-saprobic to b-a-mesosaprobic according to the results of 2016, and from b-o-saprobic to o-saprobic according to the results of 2017, which is due to the restoration activities carried out on Sredniy Kaban lake during this period.

