CAncer bioMarker Prediction Pipeline (CAMPP) - A standardised and user-friendly framework for the analysis of quantitative biological data

Mapping Intimacies ◽

10.1101/608422 ◽

2019 ◽

Author(s):

Thilde Terkelsen ◽

Anders Krogh ◽

Elena Papaleo

Keyword(s):

High Throughput ◽

miRNA Gene ◽

Gene Interaction ◽

Cancer Biomarker ◽

Biological Data ◽

Network Analyses ◽

Gene Interaction Networks ◽

Computational Data ◽

Next Generation Sequencing (NGS) ◽

User Friendly

Abstract. Motivation: Recent improvements in -omics and next-generation sequencing (NGS) technologies, and the lowered costs associated with generating these types of data, have made the analysis of high-throughput datasets standard, both for forming and testing biomedical hypotheses. Alongside new wet-lab methodologies, our knowledge of how to normalise bio-data has grown extensively. By removing latent undesirable variances, we obtain standardised datasets, which can be more easily compared between studies. These advancements mean that non-experts in bioinformatics are now faced with the challenge of performing computational data analysis, pre-processing and visualisation. One example could be the analysis of biological data to pinpoint disease-related biomarkers for experimental validation. In this case, bio-researchers will desire an easy and standardised way of analysing high-throughput datasets. Results: Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper intended to aid non-experts in bioinformatics with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. CAMPP performs missing value imputation and normalisation followed by (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist biomedical researchers in the analysis of quantitative biological data, whilst ensuring an appropriate biostatistical framework. Availability and Implementation: CAMPP is available at https://github.com/ELELAB/CAMPP

GEView (Gene Expression View) Tool for Intuitive and High Accessible Visualization of Expression Data for Non-Programmer Biologists

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch032 ◽

2020 ◽

pp. 580-592

Author(s):

Libi Hertzberg ◽

Assif Yitzhaky ◽

Metsada Pasmanik-Chor

Keyword(s):

Gene Expression ◽

Quality Control ◽

User Interface ◽

High Throughput ◽

Graphical User Interface ◽

Differential Expression Analysis ◽

Biological Data ◽

Expression Data ◽

Batch Correction ◽

User Friendly

The last decade has been characterized by the production of huge amounts of different types of biological data, followed by a flood of published bioinformatics tools. However, many of these tools are commercial or require computational skills. In addition, not all tools provide intuitive and highly accessible visualization of the results. The authors have developed GEView (Gene Expression View), a free, user-friendly tool harboring several existing algorithms and statistical methods for the analysis of high-throughput gene, microRNA or protein expression data. It can be used to perform basic analyses such as quality control, outlier detection, batch correction and differential expression analysis through a single intuitive graphical user interface. GEView is unique in its simplicity and in the highly accessible visualization it provides. Together with its basic and intuitive functionality, this allows biomedical scientists with no computational skills to independently analyze and visualize high-throughput data produced in their own labs.

A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce

Genes ◽

10.3390/genes11020166 ◽

2020 ◽

Vol 11 (2) ◽

pp. 166

Author(s):

Muhammad Tahir ◽

Muhammad Sardaraz

Keyword(s):

Memory Management ◽

Biological Data ◽

Genome Sequences ◽

Single Nucleotide ◽

Detection Algorithms ◽

Computational Overhead ◽

Benchmark Datasets ◽

Next Generation Sequencing (NGS) ◽

User Friendly ◽

High Processing

Next generation sequencing (NGS) technologies produce huge amounts of biological data, which demand long processing times and large memory. This research focuses on the detection of single nucleotide polymorphisms (SNPs) in genome sequences. Current SNP detection algorithms face several issues, e.g., computational overhead, accuracy, and memory requirements. In this research, we propose a fast and scalable workflow that integrates the Bowtie aligner with the Hadoop-based Heap SNP caller to improve SNP detection in genome sequences. The proposed workflow is validated on benchmark datasets obtained from publicly available web portals, e.g., NCBI and DDBJ DRA. Extensive experiments have been performed, and the results are compared with the Bowtie and BWA aligners in the alignment phase and with GATK, FaSD, SparkGA, Halvade, and Heap in the SNP calling phase. Analysis of the experimental results shows that the proposed workflow outperforms existing frameworks, e.g., GATK, FaSD, Heap integrated with BWA and Bowtie aligners, SparkGA, and Halvade. The proposed framework achieved a 22.46% improvement in F-score and 99.80% consistent accuracy on average, with a comparatively 0.21% higher mean accuracy. Moreover, SNP mining has also been performed to identify specific regions in genome sequences. All the frameworks are implemented with the default memory management configuration. The observations show that all workflows have approximately the same memory requirement. In the future, we intend to display the mined SNPs graphically for user-friendly interaction and to analyze and optimize the memory requirements.

GEView (Gene Expression View) Tool for Intuitive and High Accessible Visualization of Expression Data for Non-Programmer Biologists

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2018010107 ◽

2018 ◽

Vol 8 (1) ◽

pp. 94-105

Author(s):

Libi Hertzberg ◽

Assif Yitzhaky ◽

Metsada Pasmanik-Chor

Keyword(s):

Gene Expression ◽

High Throughput ◽

Graphical User Interface ◽

Differential Expression Analysis ◽

Biological Data ◽

Expression Data ◽

High Throughput Data ◽

Batch Correction ◽

Different Types ◽

User Friendly

Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering

Current Gene Therapy ◽

10.2174/1566523219666190924113737 ◽

2019 ◽

Vol 19 (4) ◽

pp. 216-223 ◽

Cited By ~ 2

Author(s):

Tianyi Zhao ◽

Donghua Wang ◽

Yang Hu ◽

Ningyi Zhang ◽

Tianyi Zang ◽

...

Keyword(s):

Alzheimer's Disease ◽

Drug Targets ◽

Molecular Mechanisms ◽

Feature Vector ◽

miRNA Gene ◽

Interaction Network ◽

Gene Interaction ◽

Protein-Protein Interaction ◽

Synaptic Structures

Background: More and more scholars are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms. Objective: Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis. Materials and Methods: We used genes as a bridge to connect AD and miRNAs. Firstly, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA gets a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples and then apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers). Results and Conclusion: We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we carried out case studies to show that our results are reliable.
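
The three steps in Materials and Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy networks, the gene and miRNA names, and the helpers `expand_genes` and `feature_vector` are all hypothetical, and the binary encoding of the feature vector is an assumption, since the abstract does not say how the correlation is scored. The final one-class SVM step is left out.

```python
# Toy protein-protein interaction (PPI) network as an adjacency map (hypothetical data).
PPI = {
    "APP": {"PSEN1", "BACE1"},
    "PSEN1": {"APP", "NCSTN"},
    "BACE1": {"APP"},
    "NCSTN": {"PSEN1"},
    "TP53": {"MDM2"},
    "MDM2": {"TP53"},
}

# Toy miRNA-gene interactions (hypothetical data).
MIRNA_TARGETS = {
    "miR-A": {"BACE1", "NCSTN"},
    "miR-B": {"MDM2"},
}


def expand_genes(seed_genes, ppi):
    """Step 1: add the direct PPI neighbours of the known disease genes."""
    expanded = set(seed_genes)
    for gene in seed_genes:
        expanded |= ppi.get(gene, set())
    return expanded


def feature_vector(mirna, gene_order, targets):
    """Steps 2-3: mark each disease-related gene the miRNA targets (1) or not (0)."""
    hits = targets.get(mirna, set())
    return [1 if gene in hits else 0 for gene in gene_order]


seed = {"APP", "PSEN1"}                      # assumed known AD-related genes
ad_genes = sorted(expand_genes(seed, PPI))   # fixed gene order for the vectors
vectors = {m: feature_vector(m, ad_genes, MIRNA_TARGETS) for m in MIRNA_TARGETS}
```

Vectors of this kind could then be passed to a one-class classifier trained only on known AD-related miRNAs, for example scikit-learn's `OneClassSVM`.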

Gene ontology based functional analysis and graph theory for partitioning gene interaction networks

International Journal of Pharma and Bio Sciences ◽

10.22376/ijpbs.2017.8.2.b183-192 ◽

2017 ◽

Vol 8 (2) ◽

Author(s):

SREEJA ASHOK ◽

DR.U.KRISHNA KUMAR

Keyword(s):

Functional Analysis ◽

Gene Ontology ◽

Graph Theory ◽

Gene Interaction ◽

Interaction Networks ◽

Gene Interaction Networks

Comparison of Penalty-based Feature Selection Approach on High Throughput Biological Data

Proceedings of the 2020 10th International Conference on Biomedical Engineering and Technology ◽

10.1145/3397391.3397404 ◽

2020 ◽

Author(s):

Ningya Wang ◽

Wenbin Zhou ◽

Jiamin Wu ◽

Shengjia Chen ◽

Ziling Fan

Keyword(s):

Feature Selection ◽

High Throughput ◽

Biological Data ◽

Selection Approach ◽

Feature Selection Approach

Integrated protein function prediction by mining function associations, sequences, and protein–protein and gene–gene interaction networks

Methods ◽

10.1016/j.ymeth.2015.09.011 ◽

2016 ◽

Vol 93 ◽

pp. 84-91 ◽

Cited By ~ 55

Author(s):

Renzhi Cao ◽

Jianlin Cheng

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Gene Interaction ◽

Function Prediction ◽

Interaction Networks ◽

Gene Interaction Networks

Efficient High-throughput Techniques for the Analysis of Disease-Resistant Plant Varieties and Detection of Food Adulteration

Current Protein and Peptide Science ◽

10.2174/1389203723666211223111238 ◽

2021 ◽

Vol 23 ◽

Author(s):

Romesh Kumar Salgotra ◽

Rafiq Ahmad Bhat ◽

Deyue Yu ◽

Javaid Akhter Bhat

Keyword(s):

Multiplex PCR ◽

High Throughput ◽

High Throughput Sequencing ◽

Low Cost ◽

Crop Improvement ◽

Small Sample ◽

Food Crops ◽

Genomic Resources ◽

Breeding Programmes ◽

Next Generation Sequencing (NGS)

Abstract: Over the past two decades, advances in next generation sequencing (NGS) platforms have led to the identification of numerous genes/QTLs at high resolution for their potential use in crop improvement. The genomic resources generated through these high-throughput sequencing techniques have been efficiently used in screening for particular genes of interest, especially for numerous types of plant stresses and quality traits. Subsequently, the identified markers linked to a particular trait have been used in marker-assisted backcross breeding (MABB) activities. Besides, these markers are also being used to catalogue food crops for the detection of adulteration to improve the quality of food. With the advancement of technologies, genomic resources are yielding new markers; however, to use these markers efficiently in crop breeding, high-throughput techniques (HTT) such as multiplex PCR and capillary electrophoresis (CE) can be exploited. Robustness, ease of operation, good reproducibility and low cost are the main advantages of multiplex PCR and CE. CE is capable of separating and characterizing proteins with simplicity, speed and small sample requirements. Keeping in view the availability of vast data generated through NGS techniques and the development of numerous markers, there is a need to use these resources efficiently in crop improvement programmes. In summary, this review describes the use of molecular markers in the screening of resistance genes in breeding programmes and the detection of adulteration in food crops using high-throughput techniques.

High-Throughput Analysis of Selected Urinary Hydroxy Polycyclic Aromatic Hydrocarbons by an Innovative Automated Solid-Phase Microextraction

Molecules ◽

10.3390/molecules23081869 ◽

2018 ◽

Vol 23 (8) ◽

pp. 1869 ◽

Cited By ~ 9

Author(s):

Stefano Dugheri ◽

Alessandro Bonari ◽

Matteo Gentili ◽

Giovanni Cappelli ◽

Ilenia Pompilio ◽

...

Keyword(s):

Polycyclic Aromatic Hydrocarbons ◽

High Throughput ◽

Aromatic Hydrocarbons ◽

High Throughput Screening ◽

Solid Phase Microextraction ◽

Solid Phase ◽

User Friendliness ◽

High Throughput Analysis ◽

Polycyclic Aromatic ◽

User Friendly

High-throughput screening of samples is the strategy of choice to detect occupational exposure biomarkers, yet it requires a user-friendly apparatus that gives relatively prompt results while ensuring high degrees of selectivity, precision, accuracy and automation, particularly in the preparation process. Miniaturization has attracted much attention in analytical chemistry and has driven solvent and sample savings as well as easier automation, the latter thanks to the introduction on the market of the three-axis autosampler. In light of the above, this contribution describes a novel user-friendly solid-phase microextraction (SPME) off- and on-line platform coupled with gas chromatography and triple quadrupole-mass spectrometry to determine the following urinary metabolites of polycyclic aromatic hydrocarbons: 1- and 2-hydroxy-naphthalene, 9-hydroxy-phenanthrene, 1-hydroxy-pyrene, 3- and 9-hydroxy-benzoanthracene, and 3-hydroxy-benzo[a]pyrene. In this new procedure, chromatography’s sensitivity is combined with the user-friendliness of N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide on-fiber SPME derivatization using direct immersion sampling; moreover, specific isotope-labelled internal standards provide quantitative accuracy. The detection limits for the seven OH-PAHs ranged from 0.25 to 4.52 ng/L. Intra-session (from 2.5 to 3.0%) and inter-session (from 2.4 to 3.9%) repeatability was also evaluated. This method serves to identify suitable risk-control strategies for occupational hygiene conservation programs.

SSFinder: High Throughput CRISPR-Cas Target Sites Prediction Tool

BioMed Research International ◽

10.1155/2014/742482 ◽

2014 ◽

Vol 2014 ◽

pp. 1-4 ◽

Cited By ~ 22

Author(s):

Santosh Kumar Upadhyay ◽

Shailesh Sharma

Keyword(s):

Operating Systems ◽

High Throughput ◽

Genomic Data ◽

Prediction Tool ◽

High Demand ◽

Specific Target ◽

Target Sites ◽

High Throughput Detection ◽

User Friendly

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand for this system, finding a reliable tool for the determination of specific target sites in large genomic data has remained challenging. Here, we report SSFinder, a Python script that performs high-throughput detection of specific target sites in large nucleotide datasets. SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online.
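
As a rough illustration of what scanning a nucleotide sequence for candidate target sites can look like, the sketch below searches the forward strand with a regular expression. It is not SSFinder's code: the abstract does not state the site pattern, so the pattern used here (a 20-nt protospacer followed by an NGG PAM, the usual SpCas9 convention) is an assumption, and `find_target_sites` is a hypothetical helper name.

```python
import re

# Assumed pattern: 20-nt protospacer followed by an NGG PAM (SpCas9 convention).
# Wrapping it in a lookahead lets overlapping candidates be reported.
TARGET = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")


def find_target_sites(sequence):
    """Return (0-based start, 23-nt site) pairs found on the forward strand only."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in TARGET.finditer(seq)]


# Small made-up sequence containing exactly one candidate site.
example = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
sites = find_target_sites(example)
```

A fuller tool would also scan the reverse complement and stream large FASTA files instead of holding one string in memory; both are left out here to keep the sketch short.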
