FASTQ Format

The Ecological Estimation of Sredniy Kaban Lake Based on Molecular Methods

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.7.20521 ◽

2018 ◽

Vol 7 (4.7) ◽

pp. 88

Author(s):

M. Khusainov ◽

Ludmil L. Frolova

Keyword(s):

Water Quality ◽

Marker Gene ◽

Metagenomic Data ◽

Sporting Events ◽

Ecological State ◽

International Database ◽

Surrounding Environment ◽

Fastq Format ◽

Visual Approach

Sredniy Kaban lake is part of the system of Kaban urban lakes; it is under anthropogenic load and is currently used for rowing events. The reservoir is monitored regularly, restoration and improvement activities are carried out, and green beaches have been landscaped. The ecological state of the reservoir and the surrounding environment is assessed by different methods, one of the main ones being bioindication. This method is based on the study of indicator species, which are identified by obsolete methods that rely on their morphological features. As an alternative to the visual approach with a microscope, the paper considers a method for identifying hydrobionts by the CO1 marker gene, based on DNA barcoding and modern sequencing methods. The sequences of the CO1 gene fragment of hydrobionts from freshwater Sredniy Kaban lake, sampled in autumn 2016 and summer 2017, are deposited in fastq format in the international database on the NCBI website under the unique numbers SRR5852708 (2016) and SRR5839796 (2017). The paper presents the results of the analysis and assesses the water quality of Sredniy Kaban lake (Kazan, Russia). Comparative analysis of the metagenomic data shows that most of the animals of Sredniy Kaban lake are grouped near the b-mesosaprobic zone in 2016 and the o-saprobic zone in 2017. In terms of water quality, Sredniy Kaban lake is transitional from b-o-saprobic to b-a-mesosaprobic according to the 2016 results, and from b-o-saprobic to o-saprobic according to the 2017 results, which is attributed to the restoration activities carried out on the lake during this period.

GEO2RNAseq: An easy-to-use R pipeline for complete pre-processing of RNA-seq data

10.1101/771063 ◽

2019 ◽

Cited By ~ 2

Author(s):

Bastian Seelbinder ◽

Thomas Wolf ◽

Steffen Priebe ◽

Sylvie McNamara ◽

Silvia Gerber ◽

...

Keyword(s):

Gene Expression ◽

Single Species ◽

Gene Expression Omnibus ◽

Rna Seq ◽

Sequencing Data ◽

Interacting Species ◽

Link Type ◽

Fastq Format ◽

Standard Tool ◽

Processing Steps

In transcriptomics, the study of the total set of RNAs transcribed by the cell, RNA sequencing (RNA-seq) has become the standard tool for analysing gene expression. The primary goal is the detection of genes whose expression changes significantly between two or more conditions, either for a single species or for two or more interacting species at the same time (dual RNA-seq, triple RNA-seq and so forth). The analysis of RNA-seq can be simplified, as many steps of the data pre-processing can be standardised in a pipeline. In this publication we present the “GEO2RNAseq” pipeline for complete, quick and concurrent pre-processing of single, dual, and triple RNA-seq data. It covers all pre-processing steps starting from raw sequencing data to the analysis of differentially expressed genes, including various tables and figures to report intermediate and final results. Raw data may be provided in FASTQ format or can be downloaded automatically from the Gene Expression Omnibus repository. GEO2RNAseq strongly incorporates experimental as well as computational metadata. GEO2RNAseq is implemented in R, lightweight, easy to install via Conda and easy to use, but still very flexible through using modular programming and offering many extensions and alternative workflows. GEO2RNAseq is publicly available at https://anaconda.org/xentrics/r-geo2rnaseq and https://bitbucket.org/thomas_wolf/geo2rnaseq/overview, including source code, installation instructions, and comprehensive package documentation.

Fastq-pair: efficient synchronization of paired-end fastq files

10.1101/552885 ◽

2019 ◽

Cited By ~ 7

Author(s):

John A. Edwards ◽

Robert A. Edwards

Keyword(s):

Dna Sequence ◽

Efficient Solution ◽

Bioinformatics Analysis ◽

Sequence Data ◽

Additional Information ◽

Fastq Format ◽

Separate File ◽

The One ◽

Computational Resources ◽

Memory Efficient

Paired-end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analysis. Paired-end reads are usually provided as two fastq-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in each file in the same order, and reads that do not have a pair in the corresponding file are placed in a separate file of singletons. Although most sequencing instruments capable of generating paired-end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads, so that paired-end sequence files lose synchronicity and contain either unordered sequences or sequences in one or the other file without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory, but they quickly become limited by computational resources for the moderate to large sequence files that are common nowadays. Here, we introduce a fast and memory-efficient solution, written in C for portability, that synchronizes paired-end fastq files for subsequent analysis and places unmatched reads into singleton files. Fastq-pair is freely available from https://github.com/linsalrob/fastq-pair and is released under the MIT license.
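
The "trivial" in-memory approach that the abstract contrasts with Fastq-pair can be sketched in a few lines of Python. This is only an illustration of the synchronization problem, not the Fastq-pair implementation (which is written in C and designed to avoid holding a whole file in memory). The function names `read_fastq`, `read_id` and `sync_pairs` are hypothetical, and the sketch assumes four-line FASTQ records whose mates share a read name, optionally followed by a `/1` or `/2` suffix.

```python
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

# One FASTQ record: header line, sequence, separator line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records from an open text handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield (header.rstrip("\n"), seq, plus, qual)


def read_id(header: str) -> str:
    """Read name without the leading '@', comment, or /1 and /2 suffix (assumed convention)."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def sync_pairs(
    left: Iterable[Record], right: Iterable[Record]
) -> Tuple[List[Record], List[Record], List[Record], List[Record]]:
    """Return (paired_left, paired_right, single_left, single_right).

    The whole left file is indexed in memory, which is exactly the
    limitation the abstract describes for large files.
    """
    index: Dict[str, Record] = {read_id(rec[0]): rec for rec in left}
    paired_left: List[Record] = []
    paired_right: List[Record] = []
    single_right: List[Record] = []
    for rec in right:
        mate = index.pop(read_id(rec[0]), None)
        if mate is None:
            single_right.append(rec)
        else:
            paired_left.append(mate)
            paired_right.append(rec)
    # Whatever is left in the index never found a mate.
    single_left = list(index.values())
    return paired_left, paired_right, single_left, single_right
```

The paired outputs follow the order of the second file, so both lists can be written out record by record to obtain synchronized files, with the two singleton lists going to separate files.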

CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing

International Journal of Molecular Sciences ◽

10.3390/ijms21113828 ◽

2020 ◽

Vol 21 (11) ◽

pp. 3828

Author(s):

Omer An ◽

Kar-Tong Tan ◽

Ying Li ◽

Jia Li ◽

Chan-Shuo Wu ◽

...

Keyword(s):

Data Analysis ◽

Final Report ◽

Online Platform ◽

Fastq Format ◽

Health And Disease ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Next-generation sequencing (NGS) has been a widely-used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the vastly growing NGS data, which often require bioinformatics skills, tedious work and a significant amount of time. To facilitate data processing steps minding the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform which gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing in a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and is flexible to expand with new pipelines. The users can upload raw data in FASTQ format and submit jobs in a few clicks, and the results will be self-accessible via the portal to view/download/share in real-time. The output can be readily used as the final report or as input for other tools depending on the pipeline. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.
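
For readers unfamiliar with the raw input such platforms accept, a FASTQ record consists of four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string of the same length as the sequence. The following Python sketch reads such records and decodes the qualities. It is not part of CSI NGS Portal; the names `FastqRecord`, `parse_fastq` and `mean_quality` are hypothetical, and the Phred+33 quality offset is an assumption that holds for Sanger-style and recent Illumina files but not for some older encodings.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # decoded Phred scores, one per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ file.

    `offset` is the ASCII offset of the quality encoding (33 assumed here).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header.strip())
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:].strip(), sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)
```

A check of this kind (valid structure, matching lengths, plausible quality values) is a cheap way to catch a truncated or mis-encoded file before uploading it to any pipeline.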

Computational Exome and Genome Analysis ◽

10.1201/9781315154770-5 ◽

2017 ◽

pp. 57-65

Author(s):

Peter N. Robinson ◽

Rosario M. Piro ◽

Marten Jäger

Keyword(s):

Fastq Format

Workflow for Genome-Wide Determination of Pre-mRNA Splicing Efficiency from Yeast RNA-seq Data

BioMed Research International ◽

10.1155/2016/4783841 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Martin Převorovský ◽

Martina Hálová ◽

Kateřina Abrhámová ◽

Jiří Libus ◽

Petr Folk

Keyword(s):

Growth Conditions ◽

Mrna Splicing ◽

Rna Seq ◽

Fastq Format ◽

Genome Wide ◽

Eukaryotic Gene Expression ◽

Eukaryotic Gene ◽

Splicing Efficiency ◽

Splice Junctions

Pre-mRNA splicing represents an important regulatory layer of eukaryotic gene expression. In the simple budding yeast Saccharomyces cerevisiae, about one-third of all mRNA molecules undergo splicing, and splicing efficiency is tightly regulated, for example, during meiotic differentiation. S. cerevisiae features a streamlined, evolutionarily highly conserved splicing machinery and serves as a favourite model for studies of various aspects of splicing. RNA-seq represents a robust, versatile, and affordable technique for transcriptome interrogation, which can also be used to study splicing efficiency. However, convenient bioinformatics tools for the analysis of splicing efficiency from yeast RNA-seq data are lacking. We present a complete workflow for the calculation of genome-wide splicing efficiency in S. cerevisiae using strand-specific RNA-seq data. Our pipeline takes sequencing reads in the FASTQ format and provides splicing efficiency values for the 5′ and 3′ splice junctions of each intron. The pipeline is based on up-to-date open-source software tools and requires very limited input from the user. We provide all relevant scripts in a ready-to-use form. We demonstrate the functionality of the workflow using RNA-seq datasets from three spliceosome mutants. The workflow should prove useful for studies of yeast splicing mutants or of regulated splicing, for example, under specific growth conditions.

Compression of DNA sequence reads in FASTQ format

Bioinformatics ◽

10.1093/bioinformatics/btr014 ◽

2011 ◽

Vol 27 (6) ◽

pp. 860-862 ◽

Cited By ~ 103

Author(s):

Sebastian Deorowicz ◽

Szymon Grabowski

Keyword(s):

Dna Sequence ◽

Fastq Format

PhageTerm: a Fast and User-friendly Software to Determine Bacteriophage Termini and Packaging Mode using randomly fragmented NGS data

10.1101/108100 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julian Garneau ◽

Florence Depardieu ◽

Louis-Charles Fortier ◽

David Bikard ◽

Marc Monot

Keyword(s):

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Link Type ◽

Sequencing Technologies ◽

Statistical Framework ◽

Fastq Format ◽

Viral Particles ◽

User Friendly ◽

Ngs Data

Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data

10.1101/2021.09.18.460896 ◽

2021 ◽

Author(s):

Anjana Anilkumar Sithara ◽

Devi Priyanka Maripuri ◽

Keerthika Moorthy ◽

Sai Sruthi Amirtha Ganesh ◽

Philge Philip ◽

...

Keyword(s):

Data Analysis ◽

Workflow Management ◽

Human Monocyte ◽

Complex Data ◽

Omics Data ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Sequencing Technologies ◽

Fastq Format ◽

User Friendly

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input and outputs insightful statistics on the nature of the data. Our iCOMIC toolkit pipeline can analyze whole-genome and transcriptome data and is embedded in the popular Snakemake workflow management system. iCOMIC is characterized by a user-friendly GUI that offers several advantages, including executing analyses with minimal steps, eliminating the need for complex command-line arguments. The toolkit features many independent core workflows for both whole genomic and transcriptomic data analysis. Even though all the necessary, well-established tools are integrated into the pipeline to enable "out-of-the-box" analysis, we provide the user with the means to replace modules or alter the pipeline as needed. Notably, we have integrated algorithms developed in-house for predicting driver and passenger mutations based on mutational context and tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against Genome In A Bottle (GIAB) benchmark dataset (NA12878) and got the highest F1 score of 0.971 and 0.988 for indels and SNPs, respectively, using the BWA MEM - GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r=0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, with minimal steps, significantly ameliorating complex data analysis pipelines. Availability: https://github.com/RamanLab/iCOMIC

mitoMaker: A Pipeline for Automatic Assembly and Annotation of Animal Mitochondria Using Raw NGS Data

10.20944/preprints201808.0423.v1 ◽

2018 ◽

Cited By ~ 5

Author(s):

Alex Schomaker-Bastos ◽

Francisco Prosdocimi

Keyword(s):

Next Generation Sequencing ◽

Mitochondrial Genome ◽

De Novo Assembly ◽

De Novo ◽

Mitochondrial Genomes ◽

Automatic Assembly ◽

Fastq Format ◽

Ngs Data ◽

Generation Sequencing ◽

Animal Genomes

Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that allows researchers to input raw sequencing reads in fastq format and retrieve a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that implements (i) recursive de novo assembly of mitochondrial genomes using a set of increasing k-mers; (ii) a search for the best matching result to a target mitogenome; and (iii) iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
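
Steps (iv) and (v) can be illustrated with a small Python sketch. It is not mitoMaker's code, and the abstract does not describe how the pipeline performs either step. The sketch assumes that a circular genome shows up as a contig whose end repeats its beginning, and that the start coordinate of tRNA-Phe is already known from some annotation step. The function names and the `min_overlap` parameter are hypothetical.

```python
def circular_overlap(contig: str, min_overlap: int = 10) -> int:
    """Length of the longest suffix of `contig` that equals its prefix.

    Only overlaps of at least `min_overlap` bases and at most half the
    contig length are considered; returns 0 if none is found.
    """
    for length in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:length] == contig[-length:]:
            return length
    return 0


def circularize(contig: str, min_overlap: int = 10) -> str:
    """Trim the repeated end so each base of the circle appears once."""
    length = circular_overlap(contig, min_overlap)
    if length == 0:
        return contig  # no evidence of circularization; leave untouched
    return contig[:-length]


def rotate_to_start(genome: str, start: int) -> str:
    """Rotate a circular genome so that position `start` becomes position 0."""
    if not genome:
        return genome
    start %= len(genome)
    return genome[start:] + genome[:start]


# Toy example: the last 7 bases repeat the first 7, so the contig is circular.
contig = "GATTACAGGCCTT" + "GATTACA"
genome = circularize(contig, min_overlap=5)
# Suppose (hypothetically) the gene of interest starts at index 7.
rotated = rotate_to_start(genome, 7)
```

Rotation does not change the circular molecule itself, only the coordinate at which the linear representation begins, which is why a fixed starting gene makes assemblies from different runs directly comparable.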
