Detection of statistically significant network changes in complex biological networks

Mapping Intimacies ◽

10.1101/061515 ◽

2016 ◽

Author(s):

Raghvendra Mall ◽

Luigi Cerulo ◽

Halima Bensmail ◽

Antonio Iavarone ◽

Michele Ceccarelli

Keyword(s):

Biological Networks ◽

Regulatory Networks ◽

Hamming Distance ◽

State Of The Art ◽

Statistical Significance ◽

Complex Structure ◽

The State ◽

Computational Time ◽

Interaction Patterns ◽

Driver Genes

Abstract

Motivation: Biological networks help to unveil the complex structure of molecular interactions and to discover driver genes, especially in the context of cancer. Gene mutations, for example as cancer progresses, can cause the gene expression network to undergo some amount of localised re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures.

Results: Several procedures have recently been proposed to detect sub-network differences in pairwise labeled weighted networks. In this paper, we propose an improvement over the state of the art based on the Generalized Hamming Distance, which is adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance and is more efficient in terms of computational time and prediction accuracy than methods in the literature. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation. In the case of dense random geometric networks, the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa value than the state of the art. We also report the application of the method to dissect the difference between the regulatory networks of IDH-mutant versus IDH-wild-type glioma cancer. In this case, our method is able to identify some recently reported master regulators as well as novel important candidates.

Availability: The scripts implementing the proposed algorithms are available in R at https://sites.google.com/site/raghvendramallmlresearcher/[email protected]
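For illustration, the Generalized Hamming Distance between two weighted networks on the same labelled node set is commonly written as the mean squared difference between their mean-centred off-diagonal adjacency entries. The sketch below assumes that definition and pairs it with a plain node-label permutation test. The function names and the permutation scheme are illustrative assumptions; this is not the model-selection procedure proposed in the paper, and the authors' own scripts are in R.

```python
import random


def centred(adj):
    """Subtract the mean off-diagonal weight from every off-diagonal entry."""
    n = len(adj)
    total = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    mean = total / (n * (n - 1))
    return [[adj[i][j] - mean if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def ghd(a, b):
    """Mean squared difference of the centred off-diagonal entries (assumed GHD form)."""
    n = len(a)
    ca, cb = centred(a), centred(b)
    sq = sum((ca[i][j] - cb[i][j]) ** 2
             for i in range(n) for j in range(n) if i != j)
    return sq / (n * (n - 1))


def permutation_p_value(a, b, n_perm=999, seed=0):
    """Share of random node relabellings of b whose GHD to a is <= the observed GHD."""
    rng = random.Random(seed)
    n = len(a)
    observed = ghd(a, b)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel the nodes of b by permuting rows and columns together.
        pb = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if ghd(a, pb) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small value returned by `permutation_p_value` suggests the two labelled networks are closer than expected under random relabelling. The permutation loop is the expensive part, which is one reason closed-form or parallelised alternatives are attractive for large networks.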


Effective Multi-Label Classification Using Data Preprocessing

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-7371-6.ch005 ◽

2021 ◽

pp. 90-109

Author(s):

Vaishali S. Tidake ◽

Shirish S. Sane

Keyword(s):

Hamming Distance ◽

State Of The Art ◽

Nearest Neighbors ◽

Data Preprocessing ◽

The State ◽

Distance Metrics ◽

Feature Similarity ◽

Improved Performance ◽

Using Data

Feature similarity is the usual basis for exploring nearest neighbors. Examples in multi-label datasets are associated with multiple labels; hence, using label dissimilarity together with feature similarity may reveal better neighbors. Information extracted from such neighbors is exploited by the devised MLFLD and MLFLD-MAXP algorithms. Among the three distance metrics used to compute label dissimilarity, Hamming distance showed the greatest improvement in performance and is therefore used for further evaluation. The performance of the implemented algorithms is compared with the state-of-the-art MLkNN algorithm; they showed an improvement for some datasets only. This chapter introduces the parameters MLE and skew, which, along with an outlier parameter, help to analyze the multi-label and imbalanced nature of datasets. Investigation of the datasets with respect to these parameters, together with experimentation, revealed the need for data preprocessing to remove outliers. Such preprocessing improved the performance of the implemented algorithms for all measures, and its effectiveness is empirically validated.
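As a minimal sketch of the label-dissimilarity idea, the snippet below computes a Hamming distance between two binary label vectors. The function name and the normalisation by the number of labels are assumptions made for illustration; the chapter's MLFLD algorithms are not reproduced here.

```python
def hamming_label_dissimilarity(labels_a, labels_b):
    """Fraction of label positions on which two binary label vectors disagree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if not labels_a:
        raise ValueError("label vectors must not be empty")
    mismatches = sum(x != y for x, y in zip(labels_a, labels_b))
    return mismatches / len(labels_a)
```

For example, the vectors `[1, 0, 1, 0]` and `[1, 1, 0, 0]` disagree on two of four labels, giving a dissimilarity of 0.5.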


Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016626 ◽

2019 ◽

Vol 33 ◽

pp. 6626-6633

Author(s):

Xiang Kong ◽

Qizhe Xie ◽

Zihang Dai ◽

Eduard Hovy

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State ◽

Computational Time ◽

Memory Consumption ◽

Image Captioning ◽

Vocabulary Size ◽

Language Generation ◽

Practical Applications ◽

Coding Schemes

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite the known advantage, MoS is practically sealed by its large consumption of memory and computational time due to the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which could effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance losses. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields a BLEU score of 29.6 for English-to-German and 42.1 for English-to-French, outperforming the single-Softmax Transformer by 0.9 and 0.4 BLEU respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
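To make the cost argument concrete, the sketch below shows a Mixture of Softmaxes as it is commonly formulated: a weighted sum of several softmax distributions over the same vocabulary, with mixture weights that are themselves a softmax. The function and argument names are hypothetical, and the logits are taken as given rather than produced by a network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_softmaxes(prior_logits, component_logits):
    """Mix K softmax distributions; component_logits has one logit list per component."""
    weights = softmax(prior_logits)
    vocab_size = len(component_logits[0])
    mixed = [0.0] * vocab_size
    for weight, logits in zip(weights, component_logits):
        # One full softmax over the vocabulary per mixture component.
        for index, prob in enumerate(softmax(logits)):
            mixed[index] += weight * prob
    return mixed
```

Each component needs its own pass over the whole vocabulary, so time and memory grow with both the number of components and the vocabulary size. This is consistent with the abstract's point that a smaller vocabulary relieves the burden.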


Discovering the Past for Yourself

10.1093/oso/9780190611040.003.0012 ◽

2017 ◽

Author(s):

Jukka Tyrkkö

Keyword(s):

Historical Data ◽

State Of The Art ◽

Language Teaching ◽

Statistical Significance ◽

The State ◽

The Past ◽

History Of ◽

Digital Pedagogy ◽

Basic Concepts ◽

Corpus Tools

This chapter outlines the state of the art in corpus-based language teaching and digital pedagogy, focusing on the differences between using corpora with present-day and historical data. The basic concepts of corpus-based research such as representativeness, frequency, and statistical significance can be introduced to students who are new to corpus methods, and the application of these concepts to the history of English can deepen students’ understanding of how historical varieties of the language are researched. This chapter will also address some of the key challenges particular to teaching the history of English using corpora, such as dealing with the seemingly counterintuitive findings, non-standard features, and small datasets. Finally, following an overview of available historical corpora and corpus tools, several practical examples of corpus-driven activities will be discussed in detail, with suggestions and ideas on how a teacher might prepare and run corpus-based lessons.


Differential Community Detection in Paired Biological Networks

10.1101/147538 ◽

2017 ◽

Author(s):

Raghvendra Mall ◽

Ehsan Ullah ◽

Khalid Kunjia ◽

Halima Bensmail

Keyword(s):

Community Detection ◽

Adjacency Matrix ◽

Biological Networks ◽

Regulatory Networks ◽

Superior Performance ◽

Detection Methods ◽

Absolute Difference ◽

Statistical Techniques ◽

Interaction Patterns ◽

Cancer Dataset

Abstract

Motivation: Biological networks unravel the inherent structure of molecular interactions, which can lead to the discovery of driver genes and meaningful pathways, especially in the context of cancer. Often, due to gene mutations, gene expression undergoes changes and the corresponding gene regulatory network sustains some amount of localized re-wiring. The ability to identify significant changes in the interaction patterns caused by the progression of the disease can lead to the revelation of novel relevant signatures.

Methods: The task of identifying differential sub-networks in paired biological networks (A: control, B: case) can be re-phrased as one of finding dense communities in a single noisy differential topological (DT) graph constructed by taking the absolute difference between the topological graphs of A and B. In this paper, we propose a fast two-stage approach, namely Differential Community Detection (DCD), to identify differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph to determine approximate block diagonals present in the DT adjacency matrix, using neighbourhood information of the nodes and Jaccard similarity. In the second stage, the ordered DT adjacency matrix is traversed along the diagonal to remove all the edges associated with a node if that node has no immediate edges within a window. We then apply community detection methods to this de-noised DT graph to discover differential sub-networks as communities.

Results: Our proposed DCD approach can effectively locate differential sub-networks in several simulated paired random-geometric networks and various paired scale-free graphs with different power-law exponents. In simulation studies, the DCD approach easily outperforms both community detection methods applied to the original noisy DT graph and recent statistical techniques. We applied the DCD method to two real datasets: a) an ovarian cancer dataset, to discover differential DNA co-methylation sub-networks in patients and controls; b) a glioma cancer dataset, to discover the difference between the regulatory networks of IDH-mutant and IDH-wild-type. We demonstrate the potential benefits of DCD for finding network-inferred bio-markers/pathways associated with a trait of interest.

Conclusion: The proposed DCD approach overcomes the limitations of previous statistical techniques and the issues associated with identifying differential sub-networks by applying community detection methods to the noisy DT graph. This is reflected in the superior performance of the DCD method with respect to various metrics such as Precision, Accuracy, Kappa and Specificity. The code implementing the proposed DCD method is available at https://sites.google.com/site/raghvendramallmlresearcher/codes.
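The two building blocks named in the Methods paragraph, the absolute-difference graph and Jaccard similarity of node neighbourhoods, can be sketched as follows. The sketch assumes the two "topological graphs" are supplied as equally sized weighted adjacency matrices; the function names and the `tol` threshold are hypothetical, and the re-ordering and windowed de-noising stages of DCD are not implemented here.

```python
def differential_graph(a, b):
    """Entry-wise absolute difference of two equally sized adjacency matrices."""
    n = len(a)
    return [[abs(a[i][j] - b[i][j]) for j in range(n)] for i in range(n)]


def neighbours(adj, node, tol=0.0):
    """Nodes joined to `node` by an edge whose weight exceeds `tol` (self excluded)."""
    return {j for j, w in enumerate(adj[node]) if j != node and w > tol}


def jaccard(set_a, set_b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

Nodes whose neighbourhoods in the differential graph have high Jaccard similarity are natural candidates to sit next to each other when the matrix is re-ordered towards a block-diagonal form.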


Efficient DNA sequence compression with neural networks

GigaScience ◽

10.1093/gigascience/giaa119 ◽

2020 ◽

Vol 9 (11) ◽

Cited By ~ 1

Author(s):

Milton Silva ◽

Diogo Pratas ◽

Armando J Pinho

Keyword(s):

Neural Networks ◽

Data Analysis ◽

Dna Sequence ◽

Dna Sequences ◽

Genomic Sequence ◽

State Of The Art ◽

The State ◽

Computational Time ◽

Dna Sequence Compression ◽

Sequence Compression

Abstract

Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models.

Findings: We benchmark GeCo3 as a reference-free DNA compressor in 5 datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, 2 compilations of archaeal and virus genomes, 4 whole genomes, and 2 collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of 2.4%, 7.1%, 6.1%, 5.8%, and 6.0%, respectively. To test its performance as a reference-based DNA compressor, we benchmark GeCo3 in 4 datasets constituted by the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression by 12.4%, 11.7%, 10.8%, and 10.1% over the state of the art. The cost of this compression improvement is some additional computational time (1.7–3 times slower than GeCo2). The RAM use is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state of the art.

Conclusions: GeCo3 is a genomic sequence compressor with a neural network mixing approach that provides additional gains over top specific genomic compressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. GeCo3 is released under GPLv3 and is available for free download at https://github.com/cobilab/geco3.


Efficient Video Frame Interpolation Using Generative Adversarial Networks

Applied Sciences ◽

10.3390/app10186245 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6245

Author(s):

Quang Nhat Tran ◽

Shih-Hsuan Yang

Keyword(s):

Video Compression ◽

State Of The Art ◽

Motion Blur ◽

The State ◽

Frame Rate ◽

Computational Time ◽

Generative Adversarial Networks ◽

Video Frame ◽

Frame Interpolation ◽

Frame Rate Up Conversion

Frame interpolation, which generates an intermediate frame given adjacent ones, has various applications such as frame rate up-conversion, video compression, and video streaming. Instead of using the complex network models and additional data involved in state-of-the-art frame interpolation methods, this paper proposes an approach based on an end-to-end generative adversarial network. A combined loss function is employed, which jointly considers the adversarial loss (difference between data models), reconstruction loss, and motion blur degradation. The objective image quality metric values reach a PSNR of 29.22 dB and an SSIM of 0.835 on the UCF101 dataset, similar to those of the state-of-the-art approach. Notably, the good visual quality is achieved in approximately one-fifth of the computational time, which makes real-time frame rate up-conversion possible. The interpolated output can be further improved by a GAN-based refinement network that better maintains motion and color by image-to-image translation.
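For readers unfamiliar with the PSNR figure quoted above, the sketch below computes it from the standard definition, 10·log10(MAX² / MSE), over two equally sized sequences of pixel values. The function name and the flat-list input format are assumptions for illustration; the paper's evaluation pipeline is not described in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the interpolated frame is closer to the ground-truth frame; the default `max_value` of 255 assumes 8-bit images.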


A Novel Bio-Inspired Deep Learning Approach for Liver Cancer Diagnosis

Information ◽

10.3390/info11020080 ◽

2020 ◽

Vol 11 (2) ◽

pp. 80 ◽

Cited By ~ 1

Author(s):

Rania M. Ghoniem

Keyword(s):

Deep Learning ◽

Liver Cancer ◽

State Of The Art ◽

The State ◽

Convergence Time ◽

Computational Time ◽

Learning Approach ◽

Liver Lesions ◽

Learning Models ◽

Abc Algorithm

Current research on computer-aided diagnosis (CAD) of liver cancer is based on traditional feature engineering methods, which have several drawbacks including redundant features and high computational cost. Recent deep learning models overcome these problems by implicitly capturing intricate structures from large-scale medical image data. However, they are still affected by network hyperparameters and topology. Hence, the state of the art in this area can be further optimized by integrating bio-inspired concepts into deep learning models. This work proposes a novel bio-inspired deep learning approach for optimizing predictive results of liver cancer. This approach contributes to the literature in two ways. Firstly, a novel hybrid segmentation algorithm is proposed to extract liver lesions from computed tomography (CT) images using SegNet network, UNet network, and artificial bee colony optimization (ABC), namely, SegNet-UNet-ABC. This algorithm uses the SegNet for separating liver from the abdominal CT scan, then the UNet is used to extract lesions from the liver. In parallel, the ABC algorithm is hybridized with each network to tune its hyperparameters, as they highly affect the segmentation performance. Secondly, a hybrid algorithm of the LeNet-5 model and ABC algorithm, namely, LeNet-5/ABC, is proposed as feature extractor and classifier of liver lesions. The LeNet-5/ABC algorithm uses the ABC to select the optimal topology for constructing the LeNet-5 network, as network structure affects learning time and classification accuracy. For assessing performance of the two proposed algorithms, comparisons have been made to the state-of-the-art algorithms on liver lesion segmentation and classification. The results reveal that the SegNet-UNet-ABC is superior to other compared algorithms regarding Jaccard index, Dice index, correlation coefficient, and convergence time. 
Moreover, the LeNet-5/ABC algorithm outperforms other algorithms regarding specificity, F1-score, accuracy, and computational time.
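The Jaccard and Dice indices used to score the segmentation can be computed from two binary masks as sketched below. The function name, the flat-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def jaccard_and_dice(predicted, truth):
    """Return (Jaccard, Dice) overlap scores for two equally sized binary masks."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    pred_size = sum(1 for p in predicted if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    # Two empty masks agree perfectly by convention.
    jaccard = intersection / union if union else 1.0
    dice = 2 * intersection / (pred_size + truth_size) if union else 1.0
    return jaccard, dice
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically even though their numeric values differ.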


Utility metric for unsupervised feature selection

PeerJ Computer Science ◽

10.7717/peerj-cs.477 ◽

2021 ◽

Vol 7 ◽

pp. e477

Author(s):

Amalia Villa ◽

Abhijith Mundanad Narayanan ◽

Sabine Van Huffel ◽

Alexander Bertrand ◽

Carolina Varon

Keyword(s):

Feature Selection ◽

Manifold Learning ◽

State Of The Art ◽

High Dimensional Data ◽

Subset Selection ◽

The State ◽

Computational Time ◽

High Dimensional ◽

Learning Stage ◽

Unsupervised Feature Selection

Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis. Several algorithms for this aim exist in the literature, but despite their large applicability, they can be very inaccessible or cumbersome to use, mainly due to the need for tuning non-intuitive parameters and the high computational demands. In this work, a publicly available ready-to-use unsupervised feature selector is proposed, with comparable results to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the methods known as spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper suggests two contributions to this field, related to each of the stages involved. In the manifold learning stage, the effect of non-linearities in the data is explored, making use of a radial basis function (RBF) kernel, for which an alternative solution for the estimation of the kernel parameter is presented for cases with high-dimensional data. Additionally, the use of a backwards greedy approach based on the least-squares utility metric for the subset selection stage is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection U2FS algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. 
Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters.
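As a small illustration of the RBF kernel mentioned for the manifold learning stage, the sketch below uses one common parameterisation, k(x, y) = exp(-||x - y||² / (2σ²)). The function names are hypothetical and `sigma` is supplied by the caller; the paper's own method for estimating the kernel parameter is not reproduced here.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF similarity between two equal-length feature vectors."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def kernel_matrix(rows, sigma):
    """Pairwise RBF similarities between all rows of a data set."""
    return [[rbf_kernel(r, s, sigma) for s in rows] for r in rows]
```

The resulting matrix is symmetric with ones on the diagonal, and the choice of `sigma` controls how quickly similarity decays with distance.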


Fast Vehicle Identification in Surveillance via Ranked Semantic Sampling Based Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/514 ◽

2018 ◽

Cited By ~ 5

Author(s):

Feng Zheng ◽

Xin Miao ◽

Heng Huang

Keyword(s):

Random Sampling ◽

Large Scale ◽

Hamming Distance ◽

State Of The Art ◽

Semantic Distance ◽

The State ◽

Experimental Results ◽

Traffic Surveillance ◽

Vehicle Identification ◽

Hard Samples

Identifying vehicles across cameras in traffic surveillance is fundamentally important for public safety purposes. However, despite some preliminary work, the rapid vehicle search in large-scale datasets has not been investigated. Moreover, modelling a view-invariant similarity between vehicle images from different views is still highly challenging. To address the problems, in this paper, we propose a Ranked Semantic Sampling (RSS) guided binary embedding method for fast cross-view vehicle Re-IDentification (Re-ID). The search can be conducted by efficiently computing similarities in the projected space. Unlike previous methods using random sampling, we design tree-structured attributes to guide the mini-batch sampling. The ranked pairs of hard samples in the mini-batch can improve the convergence of optimization. By minimizing a novel ranked semantic distance loss defined according to the structure, the learned Hamming distance is view-invariant, which enables cross-view Re-ID. The experimental results demonstrate that RSS outperforms the state-of-the-art approaches and the learned embedding from one dataset can be transferred to achieve the task of vehicle Re-ID on another dataset.
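The speed of search in the projected binary space comes from how cheaply Hamming distance can be computed. The sketch below assumes each embedding is packed into a Python integer and ranks a gallery by distance to a query; the function names and the `(identifier, code)` gallery format are hypothetical, and the RSS training procedure itself is not shown.

```python
def hamming(code_a, code_b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(code_a ^ code_b).count("1")


def rank_gallery(query_code, gallery):
    """Sort (identifier, code) pairs by Hamming distance to the query, nearest first."""
    return sorted(gallery, key=lambda item: hamming(query_code, item[1]))
```

Because the comparison reduces to an XOR followed by a bit count, a large gallery can be scanned far faster than with floating-point distances.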


ParsVNN: parsimony visible neural networks for uncovering cancer-specific and drug-sensitive genes and pathways

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab097 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Xiaoqing Huang ◽

Kun Huang ◽

Travis Johnson ◽

Milan Radovich ◽

Jie Zhang ◽

...

Keyword(s):

Biological Networks ◽

Drug Response ◽

State Of The Art ◽

Explanatory Power ◽

Sparse Learning ◽

Driver Genes ◽

Cancer Driver ◽

Learning Framework ◽

Clinical Challenge ◽

Cancer Types

Abstract: Prediction of cancer-specific drug responses as well as identification of the corresponding drug-sensitive genes and pathways remains a major biological and clinical challenge. Deep learning models hold immense promise for better drug response predictions, but most of them cannot provide biological and clinical interpretability. Visible neural network (VNN) models have emerged to solve the problem by giving neurons biological meanings and directly casting biological networks into the models. However, the biological networks used in VNNs are often redundant and contain components that are irrelevant to the downstream predictions. Therefore, the VNNs using these redundant biological networks are overparameterized, which significantly limits VNNs’ predictive and explanatory power. To overcome the problem, we treat the edges and nodes in biological networks used in VNNs as features and develop a sparse learning framework ParsVNN to learn parsimony VNNs with only edges and nodes that contribute the most to the prediction task. We applied ParsVNN to build cancer-specific VNN models to predict drug response for five different cancer types. We demonstrated that the parsimony VNNs built by ParsVNN are superior to other state-of-the-art methods in terms of prediction performance and identification of cancer driver genes. Furthermore, we found that the pathways selected by ParsVNN have great potential to predict clinical outcomes as well as recommend synergistic drug combinations.
