DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis

2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jun Li ◽  
Hairong Wei ◽  
Patrick Xuechun Zhao

Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, accelerating the discovery of novel knowledge of biological processes, pathways, and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the most accurate methods currently available for inferring GNs. However, MI-based reverse engineering achieves satisfactory performance only when the sample size exceeds one hundred, which limits its application to GN construction from expression data sets with small sample sizes. We developed a high performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
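
The CLR scoring step is straightforward to sketch. Below is a minimal illustration of the background-corrected MI scoring that CLR performs, using NumPy and scikit-learn on discretized expression data; the function name and equal-width binning are our choices for illustration, and DeGNServer's parallel implementation and added correlation measures are not reproduced here.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def clr_scores(expr, n_bins=10):
    """Context likelihood of relatedness (CLR) on an (n_genes, n_samples) matrix.

    A minimal sketch: compute pairwise mutual information on binned data,
    then z-score each gene's MI values against its own background and
    combine the two z-scores per pair.
    """
    n_genes = expr.shape[0]
    # Discretize each gene's profile into equal-width bins for MI estimation.
    binned = np.array([np.digitize(g, np.histogram_bin_edges(g, bins=n_bins))
                       for g in expr])
    mi = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mi[i, j] = mi[j, i] = mutual_info_score(binned[i], binned[j])
    # Background correction: z-score MI_ij against gene i's and gene j's MI
    # distributions, clip at zero, and combine.
    mean, std = mi.mean(axis=1, keepdims=True), mi.std(axis=1, keepdims=True)
    z = np.maximum(0.0, (mi - mean) / (std + 1e-12))
    return np.sqrt(z ** 2 + z.T ** 2)   # symmetric CLR score per gene pair
```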

2014 ◽  
Vol 889-890 ◽  
pp. 1065-1068
Author(s):  
Yu’e Lin ◽  
Xing Zhu Liang ◽  
Hua Ping Zhou

In recent years, feature extraction algorithms based on manifold learning, which attempt to project the original data into a lower-dimensional feature space while preserving the local neighborhood structure, have drawn much attention. Among them, Marginal Fisher Analysis (MFA) has achieved high performance for face recognition. However, MFA suffers from the small sample size problem and is still a linear technique. This paper develops a new nonlinear feature extraction algorithm, called Kernel Null Space Marginal Fisher Analysis (KNSMFA). KNSMFA is based on a new optimization criterion under which all discriminant vectors are calculated in the null space of the within-class scatter. KNSMFA not only exploits nonlinear features but also overcomes the small sample size problem. Experimental results on the ORL database indicate that the proposed method achieves a higher recognition rate than MFA and some existing kernel feature extraction algorithms.
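
The null-space idea can be illustrated compactly. In the sketch below, plain LDA-style scatter matrices stand in for MFA's graph-based ones, and the kernel mapping is omitted; this is our simplification of the null-space criterion, not the paper's KNSMFA implementation.

```python
import numpy as np

def null_space_discriminant(X, y, tol=1e-10):
    """Find discriminant directions inside the null space of the within-class
    scatter Sw (where Sw w = 0), then maximize between-class scatter Sb there.

    X: (n_samples, n_features), y: (n_samples,) integer class labels.
    """
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Null space of Sw: eigenvectors with numerically zero eigenvalue.
    w, V = np.linalg.eigh(Sw)
    N = V[:, w < tol * w.max()]            # basis of the null space
    # Maximize between-class scatter within that subspace.
    wb, Vb = np.linalg.eigh(N.T @ Sb @ N)
    order = np.argsort(wb)[::-1]
    return N @ Vb[:, order]                # columns = discriminant vectors
```

In the small sample size case the within-class scatter is rank-deficient, so this null space is nonempty, which is precisely the situation the criterion exploits.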


2011 ◽  
Vol 6 (2) ◽  
pp. 252-277 ◽  
Author(s):  
Stephen T. Ziliak

Student's exacting theory of errors, both random and real, marked a significant advance over ambiguous reports of plant life and fermentation asserted by chemists from Priestley and Lavoisier down to Pasteur and Johannsen, working at the Carlsberg Laboratory. One reason seems to be that William Sealy Gosset (1876–1937) aka "Student" – he of Student's t-table and test of statistical significance – rejected artificial rules about sample size, experimental design, and the level of significance, and took instead an economic approach to the logic of decisions made under uncertainty. In his job as Apprentice Brewer, Head Experimental Brewer, and finally Head Brewer of Guinness, Student produced small samples of experimental barley, malt, and hops, seeking guidance for industrial quality control and maximum expected profit at the large-scale brewery. In the process Student invented or inspired half of modern statistics. This article draws on original archival evidence, shedding light on several core yet neglected aspects of Student's methods, that is, Guinnessometrics, not discussed by Ronald A. Fisher (1890–1962). The focus is on Student's small sample, economic approach to real error minimization, particularly in field and laboratory experiments he conducted on barley and malt, 1904 to 1937. Balanced designs of experiments, he found, are more efficient than random and have higher power to detect large and real treatment differences in a series of repeated and independent experiments. Student's world-class achievement poses a challenge to every science. Should statistical methods – such as the choice of sample size, experimental design, and level of significance – follow the purpose of the experiment, rather than the other way around? (JEL classification codes: C10, C90, C93, L66)


2020 ◽  
Author(s):  
Markus Wiedemann ◽  
Bernhard S.A. Schuberth ◽  
Lorenzo Colli ◽  
Hans-Peter Bunge ◽  
Dieter Kranzlmüller

Precise knowledge of the forces acting at the base of tectonic plates is of fundamental importance, but models of mantle dynamics are still often qualitative in nature to date. One particular problem is that we cannot access the deep interior of our planet and therefore cannot make direct in situ measurements of the relevant physical parameters. Fortunately, modern software and powerful high-performance computing infrastructures allow us to generate complex three-dimensional models of the time evolution of mantle flow through large-scale numerical simulations.

In this project, we aim to visualize the resulting convective patterns that occur thousands of kilometres below our feet and to make them "accessible" using high-end virtual reality techniques.

Models with several hundred million grid cells are nowadays possible using modern supercomputing facilities, such as those available at the Leibniz Supercomputing Centre. These models provide quantitative estimates of the inaccessible parameters, such as buoyancy and temperature, as well as predictions of the associated gravity field and seismic wavefield that can be tested against Earth observations.

3-D visualizations of the computed physical parameters allow us to inspect the models as if one were actually travelling down into the Earth. This way, convective processes that occur thousands of kilometres below our feet become virtually accessible by combining the simulations with high-end VR techniques.

The large data set used here poses severe challenges for real-time visualization, because it cannot fit into graphics memory while requiring rendering with strict deadlines. This raises the need to balance the amount of displayed data against the time needed to render it.

As a solution, we introduce a rendering framework and describe the workflow that allows us to visualize this geoscientific dataset. Our example exceeds 16 TByte in size, which is beyond the capabilities of most visualization tools. To display this dataset in real time, we reduce and declutter it through isosurfacing and mesh optimization techniques.

Our rendering framework relies on multithreading and data decoupling mechanisms that allow data to be uploaded to graphics memory while maintaining high frame rates. The final visualization application can be executed in a CAVE installation as well as on head-mounted displays such as the HTC Vive or Oculus Rift. The latter devices will allow for viewing our example on-site at the EGU conference.
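
The isosurfacing step of such a reduction pipeline maps naturally onto standard tools. Below is a minimal sketch using scikit-image's marching cubes; the random stand-in volume and the iso-level are illustrative placeholders, and the authors' actual reduction pipeline and rendering framework are not reproduced.

```python
import numpy as np
from skimage import measure

# Illustrative stand-in for one time step of a mantle-convection scalar field.
temperature = np.random.rand(256, 256, 256)          # placeholder volume

# Extract a triangle mesh of the 0.7 isosurface; downstream mesh optimization
# can then decimate it further before uploading to graphics memory.
verts, faces, normals, values = measure.marching_cubes(temperature, level=0.7)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```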


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Maciej Cytowski ◽  
Zuzanna Szymańska ◽  
Piotr Umiński ◽  
Grzegorz Andrejczuk ◽  
Krzysztof Raszkowski

Timothy is a novel large-scale modelling framework that allows the simulation of biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernizing the application to achieve high computational performance on hybrid HPC systems based on the modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Lianxin Zhong ◽  
Qingfang Meng ◽  
Yuehui Chen

The correct classification of cancer subtypes is of great significance for the in-depth study of cancer pathogenesis and the realization of accurate treatment for cancer patients. In recent years, the classification of cancer subtypes using deep neural networks and gene expression data has become a hot topic. However, most classifiers may face the challenges of overfitting and low classification accuracy when dealing with small sample sizes and high-dimensional biological data. In this paper, the Cascade Flexible Neural Forest (CFNForest) model is proposed to accomplish cancer subtype classification. CFNForest extends the traditional flexible neural tree structure to an FNT Group Forest by exploiting a bagging ensemble strategy and can automatically generate the model's structure and parameters. To deepen the FNT Group Forest without introducing new hyperparameters, a multilayer cascade framework was used to design the model, which transforms features between levels and improves performance. The proposed CFNForest model also improves operational efficiency and robustness through a sample selection mechanism between layers and by setting different weights for the output of each layer. To accomplish cancer subtype classification, FNT Group Forests with different feature sets were used to enrich the structural diversity of the model, making it more suitable for processing small sample size datasets. Experiments on RNA-seq gene expression data showed that CFNForest effectively improves the accuracy of cancer subtype classification, and the classification results are robust.
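
The cascade mechanism itself is easy to sketch. Below, ordinary scikit-learn random forests stand in for the paper's FNT Group Forests: each level is trained on the original features concatenated with the previous level's class-probability outputs. This shows the general cascade-forest idea rather than CFNForest's specific design; its flexible neural trees, layer weighting, and sample selection are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_cascade(X, y, n_levels=3, n_forests=2):
    """Train a cascade of forest levels; each level sees the raw features
    augmented with the previous level's class-probability vectors."""
    levels, aug = [], X
    for _ in range(n_levels):
        forests = [RandomForestClassifier(n_estimators=100, random_state=k).fit(aug, y)
                   for k in range(n_forests)]
        levels.append(forests)
        probas = np.hstack([f.predict_proba(aug) for f in forests])
        aug = np.hstack([X, probas])       # feature transformation between levels
    return levels

def predict_cascade(levels, X):
    aug = X
    for forests in levels:
        probas_list = [f.predict_proba(aug) for f in forests]
        aug = np.hstack([X, np.hstack(probas_list)])
    # Average the final level's class probabilities and take the argmax class.
    return np.argmax(np.mean(probas_list, axis=0), axis=1)
```

Note that a production cascade would use out-of-fold probabilities during training to avoid overfitting each level to its own predictions; the in-sample version above is kept short for clarity.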


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Hanjing Jiang ◽  
Yabing Huang

Background: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified by experiments. Previous evidence indicates that combining sources of information is conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a given disease based on drug-disease coupled mechanisms remains a challenging problem.

Results: In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first construct a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. A graph embedding model is then used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features are fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that GRLMN is very accurate, with an area under the ROC curve (AUC) of 87.9%, outperforming all previous works on this benchmark in terms of both accuracy and AUC. To further verify the performance of GRLMN, we carried out case studies on two common diseases. In the resulting rankings of drugs predicted to be related to kidney disease and fever, 15 of the top 20 drugs have been experimentally confirmed.

Conclusions: The experimental results show that our model performs well in DDA prediction. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies on drug repositioning.
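
The embed-then-classify pipeline is simple to sketch. The snippet below uses a spectral embedding of the association graph in place of the paper's graph embedding model and pairs drug and disease vectors for a scikit-learn random forest; the function and variable names and the choice of embedding are our assumptions, not GRLMN's actual components.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.ensemble import RandomForestClassifier

def train_dda_classifier(adjacency, drug_ids, disease_ids, labels, dim=64):
    """Embed every node of the molecular association network, then classify
    (drug, disease) pairs from their concatenated embeddings.

    adjacency: (n_nodes, n_nodes) symmetric association matrix.
    drug_ids, disease_ids: node indices of the training pairs.
    labels: 1 for a known association, 0 for a sampled negative pair.
    """
    emb = SpectralEmbedding(n_components=dim, affinity="precomputed")
    node_vecs = emb.fit_transform(adjacency)
    pair_feats = np.hstack([node_vecs[drug_ids], node_vecs[disease_ids]])
    clf = RandomForestClassifier(n_estimators=500).fit(pair_feats, labels)
    return node_vecs, clf
```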


2017 ◽  
Vol 28 (1) ◽  
pp. 30-31
Author(s):  
Abu Tarek Iqbal ◽  
Jalal Uddin ◽  
Dhiman Banik ◽  
Salehuddin ◽  
Hasan Mamun ◽  
...  

Many studies have been conducted on this topic around the world, but none in Chittagong, Bangladesh. We conducted this study to determine the pattern of coronary artery stenosis in Chittagong, as this is important for effective case management. It was an observational study. A convenient sampling technique was used, and the sample size was fixed at 110 considering resource constraints. All cases were diagnosed on the basis of history, clinical features, and laboratory investigations. Coronary artery angiography was methodically conducted. All relevant data were recorded and managed manually, and the findings were validated statistically. The results were discussed against an updated literature review, and conclusions were drawn. In total, 110 cases were studied. Stenosis was found in 77 (70%) cases, of whom 83% were male and 17% female. The age range was 30-80 years, but 76% of cases were in the 40-60 years age group. Among the stenosed cases, SVD accounted for 29%, DVD 20%, and TVD 20%; only 01% involved the LMCA. The most commonly stenosed vessel was the LAD (71%), followed by the RCA (60%), LCX (58%), and LMCA (6%). Normal ECGs were found in 47% of stenosed cases, and 57% of stenosed cases had an ejection fraction >55%. The results are not significantly apart from studies at home and abroad. The limitation is the small sample size, so a large-scale multicenter study is advocated for a conclusive opinion.

Medicine Today 2016 Vol.28(1): 30-31


2020 ◽  
Vol 32 (1) ◽  
pp. 182-204 ◽  
Author(s):  
Xiping Ju ◽  
Biao Fang ◽  
Rui Yan ◽  
Xiaoliang Xu ◽  
Huajin Tang

A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the nondifferentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped onto one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computational complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. Experimental results on the MNIST data set show that it can achieve an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) at a 150 MHz clock frequency, a 41× speed-up over a CPU implementation, and 22 times lower power than a GPU implementation.
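
For intuition, spike-domain max pooling is often realized by gating: within each pooling window, only the unit with the largest accumulated spike count passes its spike through. The sketch below shows that rate-based gating idea in NumPy; it is a common approach in the CNN-to-SNN conversion literature and is our illustration, not necessarily the specific method proposed in this letter.

```python
import numpy as np

def spiking_max_pool(spikes, counts, k=2):
    """Gate-based max pooling for one time step of a spiking feature map.

    spikes: (H, W) binary spike map at this time step.
    counts: (H, W) running spike counts accumulated over previous steps.
    Returns the pooled (H//k, W//k) spike map; only the neuron with the
    highest accumulated rate in each k-by-k window is allowed to fire.
    """
    H, W = spikes.shape
    out = np.zeros((H // k, W // k), dtype=spikes.dtype)
    for i in range(H // k):
        for j in range(W // k):
            win_counts = counts[i*k:(i+1)*k, j*k:(j+1)*k]
            win_spikes = spikes[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win_counts), win_counts.shape)
            out[i, j] = win_spikes[r, c]   # pass through the winner's spike
    return out
```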


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation, and more consistent results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks, and the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data, with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
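
One common way to realize such smoothing is to shrink each gene's statistic toward a degree-weighted average of its network neighbors. The sketch below implements that generic neighborhood-averaging form with NumPy; the mixing parameter alpha plays the role of the smoothing degree, and this is our illustration rather than the authors' exact estimator.

```python
import numpy as np

def smooth_statistics(t, A, alpha=0.5):
    """Smooth genewise test statistics over a gene network.

    t: (n_genes,) vector of test statistics (e.g., moderated t-statistics).
    A: (n_genes, n_genes) symmetric adjacency matrix of the gene network.
    alpha: degree of smoothing in [0, 1]; 0 returns t unchanged.
    """
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                       # isolated genes keep their own value
    W = A / deg[:, None]                      # row-normalized adjacency
    return (1 - alpha) * t + alpha * (W @ t)  # shrink toward neighbor average
```

The degree of smoothing could then be selected, as the abstract describes, by scanning alpha and scoring the agreement between the smoothed statistics and the network structure.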


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Zhihua Wang ◽  
Yongbo Zhang ◽  
Huimin Fu

Reasonable prediction makes significant practical sense for the analysis of stochastic and unstable time series with small or limited sample sizes. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting, or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can model nonstationary time series, is constructed at each prediction step. Meanwhile, the data window for the next one-step-ahead forecast rolls on by adding the most recent prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is efficient, offering improved forecasting accuracy, applicability to limited and unstable data situations, and little computational effort. The general performance, the influence of sample size, the nonlinear dynamic mechanism, the significance of observed trends, and the innovation variance are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
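
The rolling mechanism can be sketched directly: refit a small AR model on the current window, take the one-step forecast, then slide the window by appending that forecast and dropping the oldest value. The NumPy implementation below fits AR coefficients by ordinary least squares; it follows the general rolling scheme described here rather than the paper's specific AR formulation.

```python
import numpy as np

def fit_ar(window, p):
    """Least-squares fit of an AR(p) model with intercept on a 1-D window."""
    X = np.column_stack([window[p - k - 1:-k - 1] for k in range(p)])
    X = np.hstack([np.ones((len(X), 1)), X])      # intercept column
    coef, *_ = np.linalg.lstsq(X, window[p:], rcond=None)
    return coef

def rolling_ar_forecast(series, p=3, horizon=5):
    """Multi-step forecast via repeated 1-step-ahead rolling prediction."""
    window = list(series)
    preds = []
    for _ in range(horizon):
        coef = fit_ar(np.asarray(window), p)
        lags = np.asarray(window[-p:][::-1])      # most recent value first
        pred = coef[0] + coef[1:] @ lags
        preds.append(pred)
        window.append(pred)    # roll: add the newest prediction...
        window.pop(0)          # ...and drop the oldest observation
    return np.array(preds)
```

Each step refits on the updated window only, which is why the scheme suits short, unstable series and requires little computation per prediction.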

