CAncer bioMarker Prediction Pipeline (CAMPP) - A standardised and user-friendly framework for the analysis of quantitative biological data

2019 ◽  
Author(s):  
Thilde Terkelsen ◽  
Anders Krogh ◽  
Elena Papaleo

Abstract. Motivation: Recent improvements in -omics and next-generation sequencing (NGS) technologies, and the lowered costs associated with generating these types of data, have made the analysis of high-throughput datasets standard, both for forming and testing biomedical hypotheses. Alongside new wet-lab methodologies, our knowledge of how to normalise bio-data has grown extensively. By removing latent undesirable variances, we obtain standardised datasets, which can be more easily compared between studies. These advancements mean that non-experts in bioinformatics are now faced with the challenge of performing computational data analysis, pre-processing and visualisation. One example could be the analysis of biological data to pinpoint disease-related biomarkers for experimental validation. In this case, bio-researchers will desire an easy and standardised way of analysing high-throughput datasets. Results: Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper intended to aid non-experts in bioinformatics with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. CAMPP performs missing value imputation and normalisation followed by (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist biomedical researchers in the analysis of quantitative biological data, whilst ensuring an appropriate biostatistical framework. Availability and Implementation: CAMPP is available at https://github.com/ELELAB/CAMPP

2020 ◽  
pp. 580-592
Author(s):  
Libi Hertzberg ◽  
Assif Yitzhaky ◽  
Metsada Pasmanik-Chor

This article describes how the last decade has been characterized by the production of huge amounts of different types of biological data. Following that, a flood of bioinformatics tools has been published. However, many of these tools are commercial or require computational skills. In addition, not all tools provide intuitive and highly accessible visualization of the results. The authors have developed GEView (Gene Expression View), a free, user-friendly tool harboring several existing algorithms and statistical methods for the analysis of high-throughput gene, microRNA or protein expression data. It can be used to perform basic analyses such as quality control, outlier detection, batch correction and differential expression analysis through a single intuitive graphical user interface. GEView is unique in its simplicity and in the highly accessible visualization it provides. Together with its basic and intuitive functionality, this allows biomedical scientists with no computational skills to independently analyze and visualize high-throughput data produced in their own labs.


Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 166
Author(s):  
Muhammad Tahir ◽  
Muhammad Sardaraz

Next generation sequencing (NGS) technologies produce huge amounts of biological data, which poses challenges such as long processing times and large memory requirements. This research focuses on the detection of single nucleotide polymorphisms (SNPs) in genome sequences. Current SNP detection algorithms face several issues, e.g., computational overhead, accuracy, and memory requirements. In this research, we propose a fast and scalable workflow that integrates the Bowtie aligner with the Hadoop-based Heap SNP caller to improve SNP detection in genome sequences. The proposed workflow is validated on benchmark datasets obtained from publicly available web portals, e.g., NCBI and DDBJ DRA. Extensive experiments were performed, and the results are compared with the Bowtie and BWA aligners in the alignment phase and with GATK, FaSD, SparkGA, Halvade, and Heap in the SNP calling phase. The experimental results show that the proposed workflow outperforms existing frameworks, e.g., GATK, FaSD, Heap integrated with BWA and Bowtie aligners, SparkGA, and Halvade. The proposed framework achieved a 22.46% improvement in F-score and 99.80% consistent accuracy on average; in addition, its mean accuracy was 0.21% higher by comparison. Moreover, SNP mining has also been performed to identify specific regions in genome sequences. All the frameworks are implemented with the default memory management configuration. The observations show that all workflows have approximately the same memory requirement. In the future, we intend to display the mined SNPs graphically for user-friendly interaction and to analyze and optimize the memory requirements.
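The abstract above compares SNP callers by F-score. As a minimal sketch of how such a score can be computed for one caller against a truth set, the Python below uses the standard precision/recall definition; the function name `snp_f_score` and the toy variant tuples are hypothetical and are not taken from the paper.

```python
def snp_f_score(called, truth):
    """Return (precision, recall, F1) for called SNPs against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)  # SNPs present in both sets
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (chromosome, position, alternate allele)
truth = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 90, "C")}
called = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 41, "G")}

precision, recall, f1 = snp_f_score(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here two of the three calls match the truth set, so precision is 2/3, recall is 2/4, and the F-score is their harmonic mean.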


2019 ◽  
Vol 19 (4) ◽  
pp. 216-223 ◽  
Author(s):  
Tianyi Zhao ◽  
Donghua Wang ◽  
Yang Hu ◽  
Ningyi Zhang ◽  
Tianyi Zang ◽  
...  

Background: More and more scholars are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms. Objective: Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis. Materials and Methods: We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes starting from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA receives a feature vector representing its correlation with AD. Unlike other studies, we do not generate negative samples randomly and then apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers). Results and Conclusion: We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we carried out case studies to show that our results are reliable.
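The "genes as a bridge" idea in the methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expansion takes direct neighbours in the interaction network, and each feature is a binary "targets this gene" flag, whereas the abstract speaks of a correlation without giving its form. All gene and miRNA names are placeholders, and the one-class SVM step is not shown.

```python
def expand_genes(seed_genes, ppi_edges):
    """Add direct PPI neighbours of the seed genes (one-step expansion, an assumption)."""
    seeds = set(seed_genes)
    expanded = set(seeds)
    for a, b in ppi_edges:
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


def mirna_feature_vectors(mirna_targets, disease_genes):
    """Build one binary feature per disease-related gene for every miRNA."""
    gene_order = sorted(disease_genes)  # fixed column order for all vectors
    vectors = {
        mirna: [1 if gene in targets else 0 for gene in gene_order]
        for mirna, targets in mirna_targets.items()
    }
    return gene_order, vectors


# Hypothetical toy inputs; none of these names come from the paper.
known_genes = {"geneA", "geneB"}
ppi_edges = [("geneA", "geneC"), ("geneD", "geneB"), ("geneE", "geneF")]
mirna_targets = {"miR-x": {"geneA", "geneC"}, "miR-y": {"geneF"}}

related = expand_genes(known_genes, ppi_edges)
gene_order, vectors = mirna_feature_vectors(mirna_targets, related)
print(gene_order)
print(vectors)
```

The resulting vectors are the kind of input a one-class model could be fitted on, using only the miRNAs already known to be disease-related.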


Author(s):  
Romesh Kumar Salgotra ◽  
Rafiq Ahmad Bhat ◽  
Deyue Yu ◽  
Javaid Akhter Bhat

Abstract: Over the past two decades, advances in next generation sequencing (NGS) platforms have led to the identification of numerous genes/QTLs at high resolution for potential use in crop improvement. The genomic resources generated through these high-throughput sequencing techniques have been used efficiently to screen for genes of interest, particularly for numerous types of plant stresses and quality traits. Subsequently, the identified markers linked to a particular trait have been used in marker-assisted backcross breeding (MABB) activities. These markers are also being used to catalogue food crops for the detection of adulteration, in order to improve food quality. As technologies advance, genomic resources are yielding new markers; however, to use these markers efficiently in crop breeding, high-throughput techniques (HTT) such as multiplex PCR and capillary electrophoresis (CE) can be exploited. Robustness, ease of operation, good reproducibility and low cost are the main advantages of multiplex PCR and CE. CE is capable of separating and characterizing proteins with simplicity, speed and small sample requirements. Given the vast data generated through NGS techniques and the development of numerous markers, there is a need to use these resources efficiently in crop improvement programmes. In summary, this review describes the use of molecular markers in the screening of resistance genes in breeding programmes and the detection of adulteration in food crops using high-throughput techniques.


Molecules ◽  
2018 ◽  
Vol 23 (8) ◽  
pp. 1869 ◽  
Author(s):  
Stefano Dugheri ◽  
Alessandro Bonari ◽  
Matteo Gentili ◽  
Giovanni Cappelli ◽  
Ilenia Pompilio ◽  
...  

High-throughput screening of samples is the strategy of choice to detect occupational exposure biomarkers, yet it requires a user-friendly apparatus that gives relatively prompt results while ensuring high degrees of selectivity, precision, accuracy and automation, particularly in the preparation process. Miniaturization has attracted much attention in analytical chemistry and has driven solvent and sample savings as well as easier automation, the latter thanks to the introduction on the market of the three-axis autosampler. In light of the above, this contribution describes a novel user-friendly solid-phase microextraction (SPME) off- and on-line platform coupled with gas chromatography and triple quadrupole-mass spectrometry to determine urinary metabolites of polycyclic aromatic hydrocarbons: 1- and 2-hydroxy-naphthalene, 9-hydroxy-phenanthrene, 1-hydroxy-pyrene, 3- and 9-hydroxy-benzoanthracene, and 3-hydroxy-benzo[a]pyrene. In this new procedure, chromatography’s sensitivity is combined with the user-friendliness of N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide on-fiber SPME derivatization using direct immersion sampling; moreover, specific isotope-labelled internal standards provide quantitative accuracy. The detection limits for the seven OH-PAHs ranged from 0.25 to 4.52 ng/L. Intra-session (from 2.5 to 3.0%) and inter-session (from 2.4 to 3.9%) repeatability were also evaluated. This method serves to identify suitable risk-control strategies for occupational hygiene conservation programs.


2014 ◽  
Vol 2014 ◽  
pp. 1-4 ◽  
Author(s):  
Santosh Kumar Upadhyay ◽  
Shailesh Sharma

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand for this system, finding a reliable tool for the determination of specific target sites in large genomic datasets has remained challenging. Here, we report SSFinder, a Python script that performs high-throughput detection of specific target sites in large nucleotide datasets. SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online.
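The abstract does not state which sequence pattern SSFinder searches for. As a rough illustration of target-site scanning in general, the sketch below assumes the commonly used convention of a 20-nucleotide protospacer followed by an NGG motif and scans the forward strand only; the function name and the example sequence are hypothetical and do not reproduce SSFinder's code.

```python
import re

# Assumed pattern: 20-nt protospacer followed by an NGG motif.
# The lookahead lets overlapping candidate sites be reported.
TARGET = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_target_sites(sequence):
    """Return (start, protospacer, pam) for each candidate site on the forward strand."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in TARGET.finditer(sequence)]


# Hypothetical toy sequence, not real genomic data.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for start, protospacer, pam in find_target_sites(toy):
    print(start, protospacer, pam)
```

A fuller scan would also search the reverse complement and filter candidates for uniqueness across the dataset, which this sketch leaves out.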

