TBtools - an integrative toolkit developed for interactive analyses of big biological data

Mapping Intimacies ◽

10.1101/289660 ◽

2018 ◽

Cited By ~ 226

Author(s):

Chengjie Chen ◽

Hao Chen ◽

Yi Zhang ◽

Hannah R. Thomas ◽

Margaret H. Frank ◽

...

Keyword(s):

Big Data ◽

High Throughput Sequencing ◽

Rapid Development ◽

Biological Data ◽

Data Analyses ◽

Runtime Environment ◽

Wet Lab ◽

Interactive Data ◽

Increasing Demand ◽

User Friendly

AbstractThe rapid development of high-throughput sequencing (HTS) techniques has led biology into the big-data era. Data analyses using various bioinformatics tools rely on programming and command-line environments, which are challenging and time-consuming for most wet-lab biologists. Here, we present TBtools (a Toolkit for Biologists integrating various biological data handling tools), a stand-alone software with a user-friendly interface. The toolkit incorporates over 100 functions, which are designed to meet the increasing demand for big-data analyses, ranging from bulk sequence processing to interactive data visualization. A wide variety of graphs can be prepared in TBtools, with a new plotting engine (“JIGplot”) developed to maximum their interactive ability, which allows quick point-and-click modification to almost every graphic feature.TBtools is a platform-independent software that can be run under all operating systems with Java Runtime Environment 1.6 or newer. It is freely available to non-commercial users at https://github.com/CJ-Chen/TBtools/releases.

Download Full-text

SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses

Bioinformatics ◽

10.1093/bioinformatics/bty071 ◽

2018 ◽

Vol 34 (13) ◽

pp. 2292-2294 ◽

Cited By ~ 59

Author(s):

Tomáš Větrovský ◽

Petr Baldrian ◽

Daniel Morais

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Data Analyses ◽

High Throughput Sequencing Data ◽

User Friendly

Download Full-text

Decoding herbal materials of representative TCM preparations with the multi-barcoding approach

10.1101/2020.06.29.177188 ◽

2020 ◽

Author(s):

Qi Yao ◽

Xue Zhu ◽

Maozhen Han ◽

Chaoyun Chen ◽

Wei Li ◽

...

Keyword(s):

Quality Control ◽

Big Data ◽

Chinese Medicine ◽

High Throughput Sequencing ◽

Rapid Development ◽

Species Level ◽

Materials Analysis ◽

One Step ◽

Drug Quality Control ◽

Higher Sensitivity

AbstractWith the rapid development of high-throughput sequencing (HTS) technology, the techniques for the assessment of biological ingredients in Traditional Chinese Medicine (TCM) preparations have also advanced. By using HTS together with the multi-barcoding approach, all biological ingredients could be identified from TCM preparations in theory, as long as their DNA is present. The biological ingredients of a handful of classical TCM preparations were analyzed successfully based on this approach in previous studies. However, the universality, sensitivity and reliability of this approach used on TCM preparations remain unclear. Here, four representative TCM preparations, namely Bazhen Yimu Wan, Da Huoluo Wan, Niuhuang Jiangya Wan and You Gui Wan, were selected for concrete assessment of this approach. We have successfully detected from 77.8% to 100% prescribed herbal materials based on both ITS2 and trnL biomarkers. The results based on ITS2 have also shown a higher level of reliability than those of trnL at species level, and the integration of both biomarkers could provide higher sensitivity and reliability. In the omics big-data era, this study has undoubtedly made one step forward for the multi-barcoding approach for prescribed herbal materials analysis of TCM preparation, towards better digitization and modernization of drug quality control.

Download Full-text

clubber: removing the bioinformatics bottleneck in big data analyses

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0020 ◽

2017 ◽

Vol 14 (2) ◽

Cited By ~ 5

Author(s):

Maximilian Miller ◽

Chengsheng Zhu ◽

Yana Bromberg

Keyword(s):

Big Data ◽

High Performance ◽

Cluster Computing ◽

Study Data ◽

Heterogeneous Environments ◽

Data Analyses ◽

Speed Up ◽

Biological Discovery ◽

The Cost ◽

User Friendly

AbstractWith the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.

Download Full-text

ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

F1000Research ◽

10.12688/f1000research.20027.1 ◽

2019 ◽

Vol 8 ◽

pp. 1295 ◽

Cited By ~ 1

Author(s):

Ruben Esse

Keyword(s):

Data Analysis ◽

High Throughput Sequencing ◽

Enrichment Analysis ◽

Peak Calling ◽

Read Mapping ◽

Sequencing Technologies ◽

Genome Wide ◽

Wet Lab ◽

User Friendly ◽

Epigenetic Research

In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyzes aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data. Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyzes, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, and 5' and 3' untranslated regions), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example. ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.

Download Full-text

ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

10.1101/220079 ◽

2017 ◽

Cited By ~ 2

Author(s):

Ruben Esse ◽

Alla Grishok

Keyword(s):

Data Analysis ◽

High Throughput Sequencing ◽

Enrichment Analysis ◽

Peak Calling ◽

Read Mapping ◽

Sequencing Technologies ◽

Genome Wide ◽

Wet Lab ◽

User Friendly ◽

Epigenetic Research

AbstractBackgroundIn recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyzes aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data.ResultsHere we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyzes, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, 5′ UTRs and 3′ UTRs), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example.ConclusionsChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.

Download Full-text

Big Data Analyses of Korea's Nation Branding on Google and Facebook

Korea Observer - Institute of Korean Studies ◽

10.29152/koiks.2020.51.1.151 ◽

2020 ◽

Vol 51 (1) ◽

pp. 151-174

Author(s):

Chung Joo Chung ◽

Yunna Rhee ◽

Heewon Cha

Keyword(s):

Big Data ◽

Nation Branding ◽

Data Analyses

Download Full-text

Using Galaxy to Perform Large‐Scale Interactive Data Analyses—An Update

Current Protocols ◽

10.1002/cpz1.31 ◽

2021 ◽

Vol 1 (2) ◽

Cited By ~ 1

Author(s):

Alexander Ostrovsky ◽

Jennifer Hillman‐Jackson ◽

Dave Bouvier ◽

Dave Clements ◽

Enis Afgan ◽

...

Keyword(s):

Large Scale ◽

Data Analyses ◽

Interactive Data

Download Full-text

metaXplor: an interactive viral and microbial metagenomic data manager

GigaScience ◽

10.1093/gigascience/giab001 ◽

2021 ◽

Vol 10 (2) ◽

Author(s):

Guilhem Sempéré ◽

Adrien Pétel ◽

Magsen Abbé ◽

Pierre Lefeuvre ◽

Philippe Roumagnac ◽

...

Keyword(s):

Heterogeneous Data ◽

Metagenomic Data ◽

Online Data ◽

Data Repositories ◽

Ongoing Research ◽

Efficient Management ◽

Public Data ◽

Reference Databases ◽

Interactive Data ◽

User Friendly

Abstract Background Efficiently managing large, heterogeneous data in a structured yet flexible way is a challenge to research laboratories working with genomic data. Specifically regarding both shotgun- and metabarcoding-based metagenomics, while online reference databases and user-friendly tools exist for running various types of analyses (e.g., Qiime, Mothur, Megan, IMG/VR, Anvi'o, Qiita, MetaVir), scientists lack comprehensive software for easily building scalable, searchable, online data repositories on which they can rely during their ongoing research. Results metaXplor is a scalable, distributable, fully web-interfaced application for managing, sharing, and exploring metagenomic data. Being based on a flexible NoSQL data model, it has few constraints regarding dataset contents and thus proves useful for handling outputs from both shotgun and metabarcoding techniques. By supporting incremental data feeding and providing means to combine filters on all imported fields, it allows for exhaustive content browsing, as well as rapid narrowing to find specific records. The application also features various interactive data visualization tools, ways to query contents by BLASTing external sequences, and an integrated pipeline to enrich assignments with phylogenetic placements. The project home page provides the URL of a live instance allowing users to test the system on public data. Conclusion metaXplor allows efficient management and exploration of metagenomic data. Its availability as a set of Docker containers, making it easy to deploy on academic servers, on the cloud, or even on personal computers, will facilitate its adoption.

Download Full-text

Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽

10.1186/s12859-020-03937-0 ◽

2021 ◽

Vol 22 (S2) ◽

Author(s):

Daniele D’Agostino ◽

Pietro Liò ◽

Marco Aldinucci ◽

Ivan Merelli

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Cell Types ◽

Graph Database ◽

Graph Databases ◽

Sources Of Information ◽

Chromosome Conformation ◽

Wide Scale ◽

User Friendly ◽

Different Cell Types

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.

Download Full-text

Bibliometric analysis and critical review of the research on big data in the construction industry

Engineering Construction & Architectural Management ◽

10.1108/ecam-01-2021-0005 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yusheng Lu ◽

Jiantong Zhang

Keyword(s):

Big Data ◽

Construction Industry ◽

Bibliometric Analysis ◽

Critical Review ◽

Rapid Development ◽

Heterogeneous Data ◽

The Body ◽

Future Research ◽

Temporal And Spatial Distribution ◽

Content Type

PurposeThe digital revolution and the use of big data (BD) in particular has important applications in the construction industry. In construction, massive amounts of heterogeneous data need to be analyzed to improve onsite efficiency. This article presents a systematic review and identifies future research directions, presenting valuable conclusions derived from rigorous bibliometric tools. The results of this study may provide guidelines for construction engineering and global policymaking to change the current low-efficiency of construction sites.Design/methodology/approachThis study identifies research trends from 1,253 peer-reviewed papers, using general statistics, keyword co-occurrence analysis, critical review, and qualitative-bibliometric techniques in two rounds of search.FindingsThe number of studies in this area rapidly increased from 2012 to 2020. A significant number of publications originated in the UK, China, the US, and Australia, and the smallest number from one of these countries is more than twice the largest number in the remaining countries. Keyword co-occurrence is divided into three clusters: BD application scenarios, emerging technology in BD, and BD management. Currently developing approaches in BD analytics include machine learning, data mining, and heuristic-optimization algorithms such as graph convolutional, recurrent neural networks and natural language processes (NLP). Studies have focused on safety management, energy reduction, and cost prediction. Blockchain integrated with BD is a promising means of managing construction contracts.Research limitations/implicationsThe study of BD is in a stage of rapid development, and this bibliometric analysis is only a part of the necessary practical analysis.Practical implicationsNational policies, temporal and spatial distribution, BD flow are interpreted, and the results of this may provide guidelines for policymakers. Overall, this work may develop the body of knowledge, producing a reference point and identifying future development.Originality/valueTo our knowledge, this is the first bibliometric review of BD in the construction industry. This study can also benefit construction practitioners by providing them a focused perspective of BD for emerging practices in the construction industry.

Download Full-text