A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1074 ◽  
Author(s):  
Diogo Pratas ◽  
Morteza Hosseini ◽  
Jorge M. Silva ◽  
Armando J. Pinho

The development of efficient data compressors for DNA sequences is crucial not only for reducing storage requirements and transmission bandwidth, but also for analysis purposes. In particular, improved compression models directly influence the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitution-tolerant context models) and weighted stochastic repeat models. Both classes use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available under the GPLv3 license.
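The competitive-prediction idea can be illustrated with a minimal sketch (not the authors' implementation): two context models of different orders compete, and for each symbol the model with the lower recent code length supplies the probability that an arithmetic coder would use.

```python
# Minimal sketch of competitive prediction between context models.
# Not the paper's tool: real weighted/substitution-tolerant models and
# repeat models are far more elaborate; this only shows the selection idea.
import math
from collections import defaultdict

ALPHABET = "ACGT"

class ContextModel:
    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha          # Laplace smoothing
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, history, symbol):
        ctx = history[-self.order:]
        c = self.counts[ctx]
        total = sum(c.values()) + self.alpha * len(ALPHABET)
        return (c[symbol] + self.alpha) / total

    def update(self, history, symbol):
        self.counts[history[-self.order:]][symbol] += 1

def compress_bits(seq, models, gamma=0.99):
    """Total code length in bits when, per symbol, the model with the
    lowest exponentially decayed past code length is selected."""
    perf = [0.0] * len(models)
    total = 0.0
    history = ""
    for s in seq:
        best = min(range(len(models)), key=lambda i: perf[i])
        total += -math.log2(models[best].prob(history, s))
        for i, m in enumerate(models):
            perf[i] = gamma * perf[i] - math.log2(m.prob(history, s))
            m.update(history, s)
        history += s
    return total

seq = "ACGT" * 200 + "AAAA" * 50
models = [ContextModel(order=1), ContextModel(order=4)]
bits = compress_bits(seq, models)   # well below the 2 bits/base baseline
```

On this highly repetitive toy sequence the selected models quickly learn the structure, so the total code length falls well under the 2 bits-per-base of a uniform coder.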

Computers ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 32 ◽  
Author(s):  
Chiman Kwan ◽  
Jude Larkin ◽  
Bence Budavari ◽  
Bryan Chou ◽  
Eric Shang ◽  
...  

Since lossless compression can only achieve two to four times data compression, it may not be efficient to deploy in bandwidth-constrained applications. Instead, it would be more economical to adopt perceptually lossless compression, which can attain ten times or more compression without loss of important information. Consequently, one can transmit more images over bandwidth-limited channels. In this research, we first aimed to compare and select the best compression algorithm in the literature to achieve a compression ratio of 0.1 and 40 dB or more in terms of a performance metric known as the human visual system model (HVSm) for maritime and sonar images. Our second objective was to demonstrate error concealment algorithms that can handle corrupted pixels due to transmission errors in interference-prone communication channels. Using four state-of-the-art codecs, we demonstrated that perceptually lossless compression can be achieved for realistic maritime and sonar images. At the same time, we selected the best codec for this purpose using four performance metrics. Finally, error concealment was demonstrated to be useful in recovering pixels lost to transmission errors.


Author(s):  
Nannan Li ◽  
Yu Pan ◽  
Yaran Chen ◽  
Zixiang Ding ◽  
Dongbin Zhao ◽  
...  

Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although highly related to the performance of TRNs, rank selection is seldom studied in previous works, and the ranks are usually set equal in experiments. Meanwhile, there is no heuristic method to choose the rank, and an enumerative search for an appropriate rank is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a narrow region, namely an interest region. Therefore, based on the above phenomenon, we propose a novel progressive genetic algorithm named progressively searching tensor ring network (PSTRN), which has the ability to find the optimal rank precisely and efficiently. Through the evolutionary phase and the progressive phase, PSTRN can converge to the interest region quickly and harvest good performance. Experimental results show that PSTRN can significantly reduce the complexity of seeking the rank compared with the enumerating method. Furthermore, our method is validated on public benchmarks like MNIST, CIFAR10/100, UCF11 and HMDB51, achieving state-of-the-art performance.
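The interplay of an evolutionary phase and a progressive range-narrowing phase can be sketched as follows. This is a toy genetic search over integer rank vectors with a hypothetical surrogate fitness; PSTRN itself evaluates trained TRNs, which this sketch does not attempt.

```python
# Toy sketch of progressive genetic rank search (hypothetical fitness,
# not PSTRN): evolve rank vectors, then shrink the search range around
# the best ranks found so far (the "interest region").
import random

def fitness(ranks):
    # Hypothetical surrogate: pretend performance peaks at rank 6 per mode.
    return sum((r - 6) ** 2 for r in ranks)

def evolve(n_modes=4, lo=1, hi=16, pop_size=20, gens=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for _ in range(n_modes)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_modes)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:              # small mutation
                i = rng.randrange(n_modes)
                child[i] = min(hi, max(lo, child[i] + rng.choice([-1, 1])))
            children.append(child)
        pop = parents + children
        best = pop[0]                           # parents are sorted
        lo = max(1, min(best) - 2)              # progressive phase:
        hi = max(best) + 2                      # narrow around the best
    return min(pop, key=fitness)

best = evolve()
```

Elitism guarantees the best individual never worsens, while the shrinking range concentrates the mutation budget near the interest region instead of enumerating the full rank space.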


2021 ◽  
Vol 25 (2) ◽  
pp. 283-303
Author(s):  
Na Liu ◽  
Fei Xie ◽  
Xindong Wu

Approximate multi-pattern matching is an important problem that arises widely and frequently when patterns contain variable-length wildcards. In this paper, two suffix array-based algorithms are proposed to solve this problem. The suffix array is an efficient data structure for exact string matching in existing studies, as well as for approximate pattern matching and multi-pattern matching. An algorithm called MMSA-S handles the short exact substrings in a pattern by dynamic programming, while another algorithm, called MMSA-L, deals with long exact substrings by the edit distance method. Experimental results on the Pizza &amp; Chili corpus demonstrate that these two newly proposed algorithms are, in most cases, more time-efficient than the state-of-the-art comparison algorithms.


2021 ◽  
Vol 7 (6) ◽  
pp. 96
Author(s):  
Alessandro Rossi ◽  
Marco Barbiero ◽  
Paolo Scremin ◽  
Ruggero Carli

Industrial 3D models are usually characterized by a large number of hidden faces, and it is very important to simplify them. Visible-surface determination methods provide one of the most common solutions to the visibility problem. This study presents a robust technique to address the global visibility problem in object space that guarantees theoretical convergence to the optimal result. More specifically, we propose a strategy that, in a finite number of steps, determines whether each face of the mesh is globally visible or not. The proposed method is based on Plücker coordinates, which provide an efficient way to determine the intersection between a ray and a triangle. The algorithm does not require pre-calculations such as estimating the normal at each face, which makes it resilient to inconsistent normal orientation. We compared the performance of the proposed algorithm against a state-of-the-art technique. Results showed that our approach is more robust in terms of convergence to the maximum lossless compression.
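The Plücker-coordinate ray/triangle test referred to above can be sketched in its generic textbook form (not the paper's implementation). Note that it needs no face normal, only the triangle's vertex order, which is why it tolerates inconsistent normal orientation.

```python
# Ray/triangle intersection via Plücker coordinates: the ray crosses
# the triangle's interior iff its side products with the three
# consistently oriented edges all share the same sign.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def sub(a, b):
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def plucker(p, q):
    """Plücker coordinates (direction, moment) of the line p -> q."""
    return sub(q, p), cross(p, q)

def side(l1, l2):
    """Permuted inner product; its sign tells how two lines pass each other."""
    (d1, m1), (d2, m2) = l1, l2
    return dot(d1, m2) + dot(d2, m1)

def ray_hits_triangle(origin, target, tri):
    """True if the line origin -> target passes through triangle tri."""
    ray = plucker(origin, target)
    a, b, c = tri
    s = [side(ray, plucker(a, b)),
         side(ray, plucker(b, c)),
         side(ray, plucker(c, a))]
    return all(v > 0 for v in s) or all(v < 0 for v in s)

tri = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
hit = ray_hits_triangle((0.2, 0.2, 0.0), (0.2, 0.2, 2.0), tri)   # True
```

Accepting either all-positive or all-negative side products is exactly what makes the test independent of the winding (and hence of the normal) of each face.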


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Na Zhao ◽  
Jian Wang ◽  
Yong Yu ◽  
Jun-Yan Zhao ◽  
Duan-Bing Chen

Many state-of-the-art studies focus on predicting the infection scale or threshold of infectious diseases or rumors and propose vaccination strategies accordingly. Most of these works assume that the infection probability and the initially infected individuals are known at the very beginning. In general, however, an infectious disease or rumor has already been spreading for some time when it is noticed. How to predict which individuals will be infected in the future, knowing only the current snapshot, thus becomes a key issue in the control of infectious diseases or rumors. In this report, a snapshot-based prediction model is presented to predict the potentially infected individuals in the future, not just the macroscopic scale of infection. Experimental results on synthetic and real networks demonstrate that the infected individuals predicted by the model are highly consistent with the actual infected ones in simulations.
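What individual-level prediction from a snapshot means can be illustrated with a generic exposure-counting heuristic (not the paper's model): given the set of currently infected nodes, rank the susceptible nodes by how many infected neighbours they have.

```python
# Hedged sketch: score each susceptible node by its exposure to the
# infected set in the current snapshot and return the top-k candidates.
def predict_next(adj, infected, top_k):
    """adj: node -> list of neighbours; infected: set of node ids."""
    scores = {n: sum(nb in infected for nb in adj[n])
              for n in adj if n not in infected}
    # Highest exposure first; ties broken by node id for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]

# A 5-node snapshot in which nodes 1 and 2 are already infected.
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2, 5], 5: [4]}
ranked = predict_next(adj, {1, 2}, top_k=2)   # [3, 4]
```

Node 3 touches two infected neighbours and node 4 one, so they lead the ranking; a real model would additionally weight infection probabilities and multi-step spreading paths.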


2018 ◽  
Vol 8 (9) ◽  
pp. 1471 ◽  
Author(s):  
Seo-Joon Lee ◽  
Gyoun-Yon Cho ◽  
Fumiaki Ikeno ◽  
Tae-Ro Lee

Due to the development of high-throughput DNA sequencing technology, genome-sequencing costs have been significantly reduced, which has led to a number of revolutionary advances in the genetics industry. However, compared to the decrease in the time and cost needed for DNA sequencing, the management of such large volumes of data remains an issue. Moreover, security and reliability issues exist in public sequence databases. Therefore, this research proposes Blockchain Applied FASTQ and FASTA Lossless Compression (BAQALC), a lossless compression algorithm that allows for the efficient transmission and storage of the immense amounts of DNA sequence data generated by Next Generation Sequencing (NGS). Compression ratios were compared for genetic biomarkers corresponding to the five diseases with the highest mortality rates according to the World Health Organization. The results showed an average compression ratio of approximately 12 for all the genetic datasets used. BAQALC performed especially well for lung cancer genetic markers, with a compression ratio of 17.02. BAQALC achieved not only higher compression than widely used algorithms, but also higher than algorithms described in previously published research. The proposed solution is envisioned to provide an efficient and secure transmission and storage platform for next-generation medical informatics based on smart devices, for both researchers and healthcare users.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Li Tang ◽  
Xia Luo ◽  
Yang Cheng ◽  
Fei Yang ◽  
Bin Ran

The stated choice (SC) experiment has been generally regarded as an effective method for behavior analysis. Among all SC experimental design methods, the orthogonal design has been most widely used, since it is easy to understand and construct. In recent years, however, a stream of research has put emphasis on so-called efficient experimental designs rather than on keeping the orthogonality of the experiment, as the former are capable of producing more efficient data in the sense that more reliable parameter estimates can be achieved with an equal or lower sample size. This paper examines two state-of-the-art methods: optimal orthogonal choice (OOC) and D-efficient design. More statistically efficient data are expected to be obtained by either maximizing attribute level differences or minimizing the D-error, a statistic corresponding to the asymptotic variance-covariance (AVC) matrix of the discrete choice model, when using these two methods, respectively. Since comparisons and validations of these methods are rarely seen in the field, an empirical study is presented, with the D-error chosen as the measure of efficiency. The result shows that both OOC and D-efficient design are more efficient. Finally, the strengths and weaknesses of the orthogonal, OOC, and D-efficient designs are summarized.
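The D-error can be computed from the multinomial logit information matrix in a few lines. The formula is standard; the small design matrix and the prior parameter values below are illustrative only.

```python
# D-error of an MNL design: det(AVC)^(1/K), where the AVC matrix is the
# inverse of the Fisher information summed over choice sets.
import numpy as np

def d_error(design, beta):
    """design: array (S, J, K) of attribute levels for S choice sets of
    J alternatives with K attributes; beta: prior parameter vector."""
    S, J, K = design.shape
    info = np.zeros((K, K))
    for X in design:                            # one choice set at a time
        u = X @ beta
        p = np.exp(u - u.max())                 # MNL choice probabilities
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    avc = np.linalg.inv(info)                   # asymptotic var-cov matrix
    return np.linalg.det(avc) ** (1.0 / K)

# Illustrative design: 4 choice sets, 3 alternatives, 2 attributes.
design = np.array([
    [[0, 0], [1, 0], [0, 1]],
    [[1, 1], [2, 0], [0, 2]],
    [[2, 1], [1, 2], [0, 0]],
    [[0, 2], [2, 2], [1, 0]],
], dtype=float)
beta = np.array([0.5, -0.5])                    # assumed priors
err = d_error(design, beta)
```

A useful sanity check of the formula: duplicating every choice set doubles the information matrix, so the D-error halves exactly, which matches the intuition that more (equally efficient) data shrinks the parameter variance.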


2021 ◽  
pp. 1-12
Author(s):  
Á. Martínez Novo ◽  
Liang Lu ◽  
Pascual Campoy

This paper addresses the challenge of building an autonomous exploration system using Micro-Aerial Vehicles (MAVs). MAVs are capable of flying autonomously, generating collision-free paths to navigate in unknown areas, and reconstructing the environment in which they are deployed. One of the contributions of our system is the "3D-Sliced Planner" for exploration, whose main innovation is the low computational resources it requires. This is because the Optimal-Frontier-Points (OFP) to explore are computed in 2D slices of the 3D environment using a global Rapidly-exploring Random Tree (RRT) frontier detector. The MAV can then plan paths to these points to explore the surroundings with our newly proposed local "FAST RRT* Planner", which uses a tree-reconnection algorithm based on cost and a collision-checking algorithm based on the Signed Distance Field (SDF). The results show that the proposed explorer takes 43.95% less time to compute exploration points and paths when compared with the state of the art, represented by the Receding Horizon Next Best View Planner (RH-NBVP), in Gazebo simulations.
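The two ingredients named above, RRT growth and SDF-based collision checking, can be combined in a minimal 2D sketch. This is illustrative only; the paper's FAST RRT* additionally reconnects the tree by cost and plans in 3D slices.

```python
# Minimal 2D RRT with an SDF collision check: grow a tree from the
# start, steering toward samples, rejecting nodes too close to a
# circular obstacle, until a node lands near the goal.
import math, random

def sdf_circle(p, center=(5.0, 5.0), radius=1.5):
    """Signed distance to a circular obstacle (positive = free space)."""
    return math.dist(p, center) - radius

def steer(a, b, step=0.5):
    d = math.dist(a, b)
    if d <= step:
        return b
    t = step / d
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def rrt(start, goal, bounds=10.0, iters=2000, seed=1):
    rng = random.Random(seed)
    parent = {start: None}
    for _ in range(iters):
        sample = goal if rng.random() < 0.1 else (
            rng.uniform(0.0, bounds), rng.uniform(0.0, bounds))
        nearest = min(parent, key=lambda n: math.dist(n, sample))
        new = steer(nearest, sample)
        if sdf_circle(new) > 0.2:               # SDF check, safety margin
            parent[new] = nearest
            if math.dist(new, goal) < 0.5:      # close enough: extract path
                path = [new]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
    return None

path = rrt((1.0, 1.0), (9.0, 9.0))
```

The SDF reduces collision checking to a single distance comparison per candidate node, which is the property that makes it attractive for fast local planning.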

