DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis

2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jun Li ◽  
Hairong Wei ◽  
Patrick Xuechun Zhao

Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, accelerating the discovery of novel knowledge of biological processes, pathways, and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the most accurate methods currently available for inferring GNs. However, MI-based reverse engineering achieves satisfactory performance only when the sample size exceeds one hundred, which limits its application to GN construction from expression data sets with small sample sizes. We developed a high performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
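
The CLR scoring step is straightforward to sketch. Below is a minimal illustration of the background-corrected MI scoring that CLR performs, using NumPy and scikit-learn on discretized expression data; the function name and equal-width binning are our choices for illustration, and DeGNServer's parallel implementation and added correlation measures are not reproduced here.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def clr_scores(expr, n_bins=10):
    """Context likelihood of relatedness (CLR) on an (n_genes, n_samples) matrix.

    A minimal sketch: compute pairwise mutual information on binned data,
    then z-score each gene's MI values against its own background and
    combine the two z-scores per pair.
    """
    n_genes = expr.shape[0]
    # Discretize each gene's profile into equal-width bins for MI estimation.
    binned = np.array([np.digitize(g, np.histogram_bin_edges(g, bins=n_bins))
                       for g in expr])
    mi = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mi[i, j] = mi[j, i] = mutual_info_score(binned[i], binned[j])
    # Background correction: z-score MI_ij against gene i's and gene j's MI
    # distributions, clip at zero, and combine.
    mean, std = mi.mean(axis=1, keepdims=True), mi.std(axis=1, keepdims=True)
    z = np.maximum(0.0, (mi - mean) / (std + 1e-12))
    return np.sqrt(z ** 2 + z.T ** 2)   # symmetric CLR score per gene pair
```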

2014 ◽  
Vol 889-890 ◽  
pp. 1065-1068
Author(s):  
Yu’e Lin ◽  
Xing Zhu Liang ◽  
Hua Ping Zhou

In recent years, feature extraction algorithms based on manifold learning, which attempt to project the original data into a lower-dimensional feature space while preserving the local neighborhood structure, have drawn much attention. Among them, Marginal Fisher Analysis (MFA) has achieved high performance for face recognition. However, MFA suffers from the small sample size problem and is still a linear technique. This paper develops a new nonlinear feature extraction algorithm, called Kernel Null Space Marginal Fisher Analysis (KNSMFA). KNSMFA is based on a new optimization criterion under which all discriminant vectors are calculated in the null space of the within-class scatter. KNSMFA not only exploits nonlinear features but also overcomes the small sample size problem. Experimental results on the ORL database indicate that the proposed method achieves a higher recognition rate than MFA and some existing kernel feature extraction algorithms.
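
The null-space idea can be illustrated compactly. In the sketch below, plain LDA-style scatter matrices stand in for MFA's graph-based ones, and the kernel mapping is omitted; this is our simplification of the null-space criterion, not the paper's KNSMFA implementation.

```python
import numpy as np

def null_space_discriminant(X, y, tol=1e-10):
    """Find discriminant directions inside the null space of the within-class
    scatter Sw (where Sw w = 0), then maximize between-class scatter Sb there.

    X: (n_samples, n_features), y: (n_samples,) integer class labels.
    """
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Null space of Sw: eigenvectors with numerically zero eigenvalue.
    w, V = np.linalg.eigh(Sw)
    N = V[:, w < tol * w.max()]            # basis of the null space
    # Maximize between-class scatter within that subspace.
    wb, Vb = np.linalg.eigh(N.T @ Sb @ N)
    order = np.argsort(wb)[::-1]
    return N @ Vb[:, order]                # columns = discriminant vectors
```

In the small sample size case the within-class scatter is rank-deficient, so this null space is nonempty, which is precisely the situation the criterion exploits.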


2011 ◽  
Vol 6 (2) ◽  
pp. 252-277 ◽  
Author(s):  
Stephen T. Ziliak

Student's exacting theory of errors, both random and real, marked a significant advance over ambiguous reports of plant life and fermentation asserted by chemists from Priestley and Lavoisier down to Pasteur and Johannsen, working at the Carlsberg Laboratory. One reason seems to be that William Sealy Gosset (1876–1937) aka "Student" – he of Student's t-table and test of statistical significance – rejected artificial rules about sample size, experimental design, and the level of significance, and took instead an economic approach to the logic of decisions made under uncertainty. In his job as Apprentice Brewer, Head Experimental Brewer, and finally Head Brewer of Guinness, Student produced small samples of experimental barley, malt, and hops, seeking guidance for industrial quality control and maximum expected profit at the large-scale brewery. In the process Student invented or inspired half of modern statistics. This article draws on original archival evidence, shedding light on several core yet neglected aspects of Student's methods, that is, Guinnessometrics, not discussed by Ronald A. Fisher (1890–1962). The focus is on Student's small sample, economic approach to real error minimization, particularly in field and laboratory experiments he conducted on barley and malt, 1904 to 1937. Balanced designs of experiments, he found, are more efficient than random and have higher power to detect large and real treatment differences in a series of repeated and independent experiments. Student's world-class achievement poses a challenge to every science. Should statistical methods – such as the choice of sample size, experimental design, and level of significance – follow the purpose of the experiment, rather than the other way around? (JEL classification codes: C10, C90, C93, L66)


2020 ◽  
Author(s):  
Markus Wiedemann ◽  
Bernhard S.A. Schuberth ◽  
Lorenzo Colli ◽  
Hans-Peter Bunge ◽  
Dieter Kranzlmüller

Precise knowledge of the forces acting at the base of tectonic plates is of fundamental importance, but models of mantle dynamics are still often qualitative in nature to date. One particular problem is that we cannot access the deep interior of our planet and therefore cannot make direct in situ measurements of the relevant physical parameters. Fortunately, modern software and powerful high-performance computing infrastructures allow us to generate complex three-dimensional models of the time evolution of mantle flow through large-scale numerical simulations.

In this project, we aim to visualize the resulting convective patterns that occur thousands of kilometres below our feet and to make them "accessible" using high-end virtual reality techniques.

Models with several hundred million grid cells are nowadays possible using modern supercomputing facilities, such as those available at the Leibniz Supercomputing Centre. These models provide quantitative estimates of the inaccessible parameters, such as buoyancy and temperature, as well as predictions of the associated gravity field and seismic wavefield that can be tested against Earth observations.

3-D visualizations of the computed physical parameters allow us to inspect the models as if one were actually travelling down into the Earth. This way, convective processes that occur thousands of kilometres below our feet become virtually accessible by combining the simulations with high-end VR techniques.

The large data set used here poses severe challenges for real-time visualization, because it cannot fit into graphics memory while requiring rendering with strict deadlines. This raises the need to balance the amount of displayed data against the time needed to render it.

As a solution, we introduce a rendering framework and describe the workflow that allows us to visualize this geoscientific dataset. Our example exceeds 16 TByte in size, which is beyond the capabilities of most visualization tools. To display this dataset in real time, we reduce and declutter it through isosurfacing and mesh optimization techniques.

Our rendering framework relies on multithreading and data decoupling mechanisms that allow data to be uploaded to graphics memory while maintaining high frame rates. The final visualization application can be executed in a CAVE installation as well as on head-mounted displays such as the HTC Vive or Oculus Rift. The latter devices will allow for viewing our example on-site at the EGU conference.
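
The isosurfacing step of such a reduction pipeline maps naturally onto standard tools. Below is a minimal sketch using scikit-image's marching cubes; the random stand-in volume and the iso-level are illustrative placeholders, and the authors' actual reduction pipeline and rendering framework are not reproduced.

```python
import numpy as np
from skimage import measure

# Illustrative stand-in for one time step of a mantle-convection scalar field.
temperature = np.random.rand(256, 256, 256)          # placeholder volume

# Extract a triangle mesh of the 0.7 isosurface; downstream mesh optimization
# can then decimate it further before uploading to graphics memory.
verts, faces, normals, values = measure.marching_cubes(temperature, level=0.7)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```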


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Maciej Cytowski ◽  
Zuzanna Szymańska ◽  
Piotr Umiński ◽  
Grzegorz Andrejczuk ◽  
Krzysztof Raszkowski

Timothy is a novel large-scale modelling framework that allows the simulation of biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernizing the application to achieve high computational performance on hybrid HPC systems based on the modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Lianxin Zhong ◽  
Qingfang Meng ◽  
Yuehui Chen

The correct classification of cancer subtypes is of great significance for the in-depth study of cancer pathogenesis and the realization of accurate treatment for cancer patients. In recent years, the classification of cancer subtypes using deep neural networks and gene expression data has become a hot topic. However, most classifiers may face the challenges of overfitting and low classification accuracy when dealing with small sample sizes and high-dimensional biological data. In this paper, the Cascade Flexible Neural Forest (CFNForest) model is proposed to accomplish cancer subtype classification. CFNForest extends the traditional flexible neural tree structure to an FNT Group Forest by exploiting a bagging ensemble strategy and can automatically generate the model's structure and parameters. To deepen the FNT Group Forest without introducing new hyperparameters, a multilayer cascade framework was used to design the model, which transforms features between levels and improves performance. The proposed CFNForest model also improves operational efficiency and robustness through a sample selection mechanism between layers and by setting different weights for the output of each layer. To accomplish cancer subtype classification, FNT Group Forests with different feature sets were used to enrich the structural diversity of the model, making it more suitable for processing small sample size datasets. Experiments on RNA-seq gene expression data showed that CFNForest effectively improves the accuracy of cancer subtype classification, and the classification results are robust.
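
The cascade mechanism itself is easy to sketch. Below, ordinary scikit-learn random forests stand in for the paper's FNT Group Forests: each level is trained on the original features concatenated with the previous level's class-probability outputs. This shows the general cascade-forest idea rather than CFNForest's specific design; its flexible neural trees, layer weighting, and sample selection are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_cascade(X, y, n_levels=3, n_forests=2):
    """Train a cascade of forest levels; each level sees the raw features
    augmented with the previous level's class-probability vectors."""
    levels, aug = [], X
    for _ in range(n_levels):
        forests = [RandomForestClassifier(n_estimators=100, random_state=k).fit(aug, y)
                   for k in range(n_forests)]
        levels.append(forests)
        probas = np.hstack([f.predict_proba(aug) for f in forests])
        aug = np.hstack([X, probas])       # feature transformation between levels
    return levels

def predict_cascade(levels, X):
    aug = X
    for forests in levels:
        probas_list = [f.predict_proba(aug) for f in forests]
        aug = np.hstack([X, np.hstack(probas_list)])
    # Average the final level's class probabilities and take the argmax class.
    return np.argmax(np.mean(probas_list, axis=0), axis=1)
```

Note that a production cascade would use out-of-fold probabilities during training to avoid overfitting each level to its own predictions; the in-sample version above is kept short for clarity.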


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Hanjing Jiang ◽  
Yabing Huang

Background: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified by experiments. Previous evidence indicates that combining sources of information is conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a given disease based on drug-disease coupled mechanisms remains a challenging problem.

Results: In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first construct a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. A graph embedding model is then used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features are fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that GRLMN is very accurate, with an area under the ROC curve (AUC) of 87.9%, outperforming all previous works on this benchmark in terms of both accuracy and AUC. To further verify the performance of GRLMN, we carried out case studies on two common diseases. In the resulting rankings of drugs predicted to be related to kidney disease and fever, 15 of the top 20 drugs have been experimentally confirmed.

Conclusions: The experimental results show that our model performs well in DDA prediction. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies on drug repositioning.
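
The embed-then-classify pipeline is simple to sketch. The snippet below uses a spectral embedding of the association graph in place of the paper's graph embedding model and pairs drug and disease vectors for a scikit-learn random forest; the function and variable names and the choice of embedding are our assumptions, not GRLMN's actual components.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.ensemble import RandomForestClassifier

def train_dda_classifier(adjacency, drug_ids, disease_ids, labels, dim=64):
    """Embed every node of the molecular association network, then classify
    (drug, disease) pairs from their concatenated embeddings.

    adjacency: (n_nodes, n_nodes) symmetric association matrix.
    drug_ids, disease_ids: node indices of the training pairs.
    labels: 1 for a known association, 0 for a sampled negative pair.
    """
    emb = SpectralEmbedding(n_components=dim, affinity="precomputed")
    node_vecs = emb.fit_transform(adjacency)
    pair_feats = np.hstack([node_vecs[drug_ids], node_vecs[disease_ids]])
    clf = RandomForestClassifier(n_estimators=500).fit(pair_feats, labels)
    return node_vecs, clf
```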


2017 ◽  
Vol 28 (1) ◽  
pp. 30-31
Author(s):  
Abu Tarek Iqbal ◽  
Jalal Uddin ◽  
Dhiman Banik ◽  
Salehuddin ◽  
Hasan Mamun ◽  
...  

Many studies have been conducted on this topic around the world, but none in Chittagong, Bangladesh. We conducted this study to determine the pattern of coronary artery stenosis in Chittagong, as this is important for effective case management. It was an observational study. A convenient sampling technique was used, and the sample size was fixed at 110 considering resource constraints. All cases were diagnosed on the basis of history, clinical features, and laboratory investigations. Coronary artery angiography was methodically conducted. All relevant data were recorded and managed manually, and the findings were validated statistically. The results were discussed against an updated literature review, and conclusions were drawn. In total, 110 cases were studied. Stenosis was found in 77 (70%) cases, of whom 83% were male and 17% female. The age range was 30-80 years, but 76% of cases were in the 40-60 years age group. Among the stenosed cases, SVD accounted for 29%, DVD 20%, and TVD 20%; only 01% involved the LMCA. The most commonly stenosed vessel was the LAD (71%), followed by the RCA (60%), LCX (58%), and LMCA (6%). Normal ECGs were found in 47% of stenosed cases, and 57% of stenosed cases had an ejection fraction >55%. The results are not significantly apart from studies at home and abroad. The limitation is the small sample size, so a large-scale multicenter study is advocated for a conclusive opinion.

Medicine Today 2016 Vol.28(1): 30-31


2020 ◽  
Vol 32 (1) ◽  
pp. 182-204 ◽  
Author(s):  
Xiping Ju ◽  
Biao Fang ◽  
Rui Yan ◽  
Xiaoliang Xu ◽  
Huajin Tang

A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the nondifferentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped onto one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computational complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. Experimental results on the MNIST data set show that it can achieve an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) at a 150 MHz clock frequency, a 41× speed-up over a CPU implementation, and 22 times lower power than a GPU implementation.
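
For intuition, spike-domain max pooling is often realized by gating: within each pooling window, only the unit with the largest accumulated spike count passes its spike through. The sketch below shows that rate-based gating idea in NumPy; it is a common approach in the CNN-to-SNN conversion literature and is our illustration, not necessarily the specific method proposed in this letter.

```python
import numpy as np

def spiking_max_pool(spikes, counts, k=2):
    """Gate-based max pooling for one time step of a spiking feature map.

    spikes: (H, W) binary spike map at this time step.
    counts: (H, W) running spike counts accumulated over previous steps.
    Returns the pooled (H//k, W//k) spike map; only the neuron with the
    highest accumulated rate in each k-by-k window is allowed to fire.
    """
    H, W = spikes.shape
    out = np.zeros((H // k, W // k), dtype=spikes.dtype)
    for i in range(H // k):
        for j in range(W // k):
            win_counts = counts[i*k:(i+1)*k, j*k:(j+1)*k]
            win_spikes = spikes[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win_counts), win_counts.shape)
            out[i, j] = win_spikes[r, c]   # pass through the winner's spike
    return out
```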


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation, and more consistent results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks, and the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data, with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
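
One common way to realize such smoothing is to shrink each gene's statistic toward a degree-weighted average of its network neighbors. The sketch below implements that generic neighborhood-averaging form with NumPy; the mixing parameter alpha plays the role of the smoothing degree, and this is our illustration rather than the authors' exact estimator.

```python
import numpy as np

def smooth_statistics(t, A, alpha=0.5):
    """Smooth genewise test statistics over a gene network.

    t: (n_genes,) vector of test statistics (e.g., moderated t-statistics).
    A: (n_genes, n_genes) symmetric adjacency matrix of the gene network.
    alpha: degree of smoothing in [0, 1]; 0 returns t unchanged.
    """
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                       # isolated genes keep their own value
    W = A / deg[:, None]                      # row-normalized adjacency
    return (1 - alpha) * t + alpha * (W @ t)  # shrink toward neighbor average
```

The degree of smoothing could then be selected, as the abstract describes, by scanning alpha and scoring the agreement between the smoothed statistics and the network structure.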


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Zhihua Wang ◽  
Yongbo Zhang ◽  
Huimin Fu

Reasonable prediction makes significant practical sense for the analysis of stochastic and unstable time series with small or limited sample sizes. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting, or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can model nonstationary time series, is constructed at each prediction step. Meanwhile, the data window for the next one-step-ahead forecast rolls on by adding the most recent prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is efficient, offering improved forecasting accuracy, applicability to limited and unstable data situations, and little computational effort. The general performance, the influence of sample size, the nonlinear dynamic mechanism, the significance of observed trends, and the innovation variance are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
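
The rolling mechanism can be sketched directly: refit a small AR model on the current window, take the one-step forecast, then slide the window by appending that forecast and dropping the oldest value. The NumPy implementation below fits AR coefficients by ordinary least squares; it follows the general rolling scheme described here rather than the paper's specific AR formulation.

```python
import numpy as np

def fit_ar(window, p):
    """Least-squares fit of an AR(p) model with intercept on a 1-D window."""
    X = np.column_stack([window[p - k - 1:-k - 1] for k in range(p)])
    X = np.hstack([np.ones((len(X), 1)), X])      # intercept column
    coef, *_ = np.linalg.lstsq(X, window[p:], rcond=None)
    return coef

def rolling_ar_forecast(series, p=3, horizon=5):
    """Multi-step forecast via repeated 1-step-ahead rolling prediction."""
    window = list(series)
    preds = []
    for _ in range(horizon):
        coef = fit_ar(np.asarray(window), p)
        lags = np.asarray(window[-p:][::-1])      # most recent value first
        pred = coef[0] + coef[1:] @ lags
        preds.append(pred)
        window.append(pred)    # roll: add the newest prediction...
        window.pop(0)          # ...and drop the oldest observation
    return np.array(preds)
```

Each step refits on the updated window only, which is why the scheme suits short, unstable series and requires little computation per prediction.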

