An Intelligent Optimization Algorithm for Constructing a DNA Storage Code: NOL-HHO

Qiang Yin; Ben Cao; Xue Li; Bin Wang; Qiang Zhang; Xiaopeng Wei

doi:10.3390/ijms21062191

International Journal of Molecular Sciences ◽

10.3390/ijms21062191 ◽

2020 ◽

Vol 21 (6) ◽

pp. 2191 ◽

Cited By ~ 4

Author(s):

Qiang Yin ◽

Ben Cao ◽

Xue Li ◽

Bin Wang ◽

Qiang Zhang ◽

...

Keyword(s):

Optimization Algorithm ◽

DNA Sequences ◽

Learning Strategy ◽

Hamming Distance ◽

Experimental Testing ◽

GC Content ◽

Smooth Transition ◽

Local Optima ◽

DNA Storage

The high density, large capacity, and long-term stability of DNA molecules make them an emerging storage medium that is especially suitable for the long-term storage of large datasets. The DNA sequences used in storage must satisfy certain constraints to avoid nonspecific hybridization reactions, such as the no-runlength constraint, the GC-content constraint, and the Hamming distance constraint. In this work, a new nonlinear control parameter strategy and a random opposition-based learning strategy were used to improve the Harris hawks optimization algorithm (yielding the improved algorithm, NOL-HHO) in order to prevent it from falling into local optima. Experimental testing was performed on 23 widely used benchmark functions, and the proposed algorithm was used to obtain better coding lower bounds for DNA storage. The results show that our algorithm can better maintain a smooth transition between exploration and exploitation and has stronger global exploration capabilities than other algorithms. At the same time, the improvement of the lower bound directly affects the storage capacity and code rate, which promotes the further development of DNA storage technology.
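The three constraints named in this abstract can be checked mechanically. The following is a minimal sketch, assuming the formulations commonly used in the DNA-coding literature (GC content of exactly 50%, no two adjacent identical bases, and a minimum pairwise Hamming distance d); the abstract itself does not spell these out, and all function names are hypothetical.

```python
from itertools import combinations

def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def has_no_runlength(seq: str) -> bool:
    """True if no two adjacent bases are identical (assumed form of the constraint)."""
    return all(a != b for a, b in zip(seq, seq[1:]))

def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))

def is_valid_code(codewords: list[str], d: int, gc: float = 0.5) -> bool:
    """Check every codeword and every pair of codewords against the constraints."""
    # Exact float comparison is fine here for gc = 0.5 and even-length words.
    per_word = all(gc_content(w) == gc and has_no_runlength(w) for w in codewords)
    pairwise = all(hamming(u, v) >= d for u, v in combinations(codewords, 2))
    return per_word and pairwise

if __name__ == "__main__":
    print(is_valid_code(["ACGT", "CATG", "GTAC"], d=3))
```

A lower bound on the code size for a given length and distance is then simply the size of the largest set for which such a check succeeds; the optimizer's job is to find larger sets.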

Evolutionary dynamics of neutral phenotypes under DNA substitution models

10.1101/2020.10.26.355438 ◽

2020 ◽

Author(s):

Shadi Zabad ◽

Alan M Moses

Keyword(s):

DNA Sequences ◽

Evolutionary Dynamics ◽

GC Content ◽

Stabilizing Selection ◽

Restoring Force ◽

DNA Substitution ◽

Long Term Trends ◽

The Mean ◽

Molecular Phenotypes

Abstract: We study the evolution of quantitative molecular traits in the absence of selection. Using a simple theory based on Felsenstein’s 1981 DNA substitution model, we predict a linear restoring force on the mean of an additive phenotype. Remarkably, the mean dynamics are independent of the effect sizes and genotype and are similar to the widely-used OU model for stabilizing selection. We confirm the predictions empirically using additive molecular phenotypes calculated from ancestral reconstructions of putatively unconstrained DNA sequences in primate genomes. We show that the OU model is favoured by inference software even when applied to GC content of unconstrained sequences or simulations of DNA evolution. We predict and confirm empirically that the dynamics of the variance are more complicated than those predicted by the OU model, and show that our results for the restoring force of mutation hold even for non-additive phenotypes, such as number of transcription factor binding sites, longest encoded peptide and folding propensity of the encoded peptide. Our results have implications for efforts to infer selection based on quantitative phenotype dynamics as well as to understand long-term trends in evolution of quantitative molecular traits.
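The linear restoring force can be seen in a short derivation. This is a sketch under assumptions, not the authors' own notation: $\beta$ denotes the substitution rate, $\pi_k$ the equilibrium frequency of base $k$, $x_i(t)$ the base at site $i$, and $a_i(k)$ the (hypothetical) effect of base $k$ at site $i$ on an additive phenotype $z$.

```latex
% Transition probabilities of the Felsenstein (1981) model
P_{jk}(t) = e^{-\beta t}\,\delta_{jk} + \bigl(1 - e^{-\beta t}\bigr)\,\pi_k

% Additive phenotype and its equilibrium value
z(t) = \sum_i a_i\bigl(x_i(t)\bigr), \qquad
z^{*} = \sum_i \sum_k \pi_k\, a_i(k)

% Expected phenotype, obtained by averaging a_i over P_{jk}(t) at each site
\mathbb{E}[z(t)] = e^{-\beta t}\, z(0) + \bigl(1 - e^{-\beta t}\bigr)\, z^{*}

% Differentiating gives a linear restoring force toward z^*
\frac{d}{dt}\,\mathbb{E}[z(t)] = -\beta\,\bigl(\mathbb{E}[z(t)] - z^{*}\bigr)
```

The relaxation rate $\beta$ does not involve the effect sizes $a_i$ or the starting sequence, which is consistent with the abstract's statement that the mean dynamics are independent of effect sizes and genotype; the mean follows the same form as the mean of an OU process.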

Combinatorial constraint coding based on the EORS algorithm in DNA storage

PLoS ONE ◽

10.1371/journal.pone.0255376 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0255376

Author(s):

Li Xiaoru ◽

Guo Ling

Keyword(s):

Data Storage ◽

Hamming Distance ◽

Random Search ◽

GC Content ◽

Information Storage ◽

Storage Media ◽

Specific Hybridization ◽

DNA Storage ◽

Electronic Storage ◽

Increasing Demand

The development of information technology has produced massive amounts of data, which has brought severe challenges to information storage. Traditional electronic storage media cannot keep up with the ever-increasing demand for data storage, but in its place DNA has emerged as a feasible storage medium with high density, large storage capacity and strong durability. In DNA data storage, many different approaches can be used to encode data into codewords. DNA coding is a key step in DNA storage and can directly affect storage performance and data integrity. However, since errors are prone to occur in DNA synthesis and sequencing, and non-specific hybridization is prone to occur in the solution, how to effectively encode DNA has become an urgent problem to be solved. In this article, we propose a DNA storage coding method based on the equilibrium optimization random search (EORS) algorithm, which meets the Hamming distance, GC content and no-runlength constraints and can reduce the error rate in storage. Simulation experiments have shown that the size of the DNA storage code set constructed by the EORS algorithm that meets the combination constraints has increased by an average of 11% compared with previous work. The increase in the code set means that shorter DNA chains can be used to store more data.
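The link between code set size and storage efficiency can be made concrete with a little arithmetic. The sketch below assumes the usual definition of code rate for a quaternary code with M codewords of length n, namely log4(M) / n; the abstract does not state a formula, and the sizes used in the example are hypothetical.

```python
import math

def bits_per_codeword(code_size: int) -> int:
    """Whole bits that one codeword can represent: floor(log2(M))."""
    if code_size < 1:
        raise ValueError("code_size must be positive")
    return code_size.bit_length() - 1

def code_rate(code_size: int, length: int) -> float:
    """Assumed definition: log4(M) / n, computed as log2(M) / (2 n)."""
    return math.log2(code_size) / (2 * length)

if __name__ == "__main__":
    # Hypothetical sizes: a code of 128 words versus one about 11% larger.
    for size in (128, 142):
        print(size, bits_per_codeword(size), round(code_rate(size, 8), 4))
```

Note that a modest increase in M raises the rate but does not necessarily add a whole bit per codeword, so the practical gain depends on how data is mapped onto the code set.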

Sequence determinants, function, and evolution of CpG islands

Biochemical Society Transactions ◽

10.1042/bst20200695 ◽

2021 ◽

Author(s):

Allegra Angeloni ◽

Ozren Bogdanovic

Keyword(s):

DNA Sequences ◽

Transcriptional Activation ◽

CpG Islands ◽

GC Content ◽

Regulatory Function ◽

DNA Hypomethylation ◽

Transcriptional Initiation ◽

CpG Dinucleotides ◽

Repressive Mark

In vertebrates, cytosine-guanine (CpG) dinucleotides are predominantly methylated, with ∼80% of all CpG sites containing 5-methylcytosine (5mC), a repressive mark associated with long-term gene silencing. The exceptions to such a globally hypermethylated state are CpG-rich DNA sequences called CpG islands (CGIs), which are mostly hypomethylated relative to the bulk genome. CGIs overlap promoters from the earliest vertebrates to humans, indicating a concerted evolutionary drive compatible with CGI retention. CGIs are characterised by DNA sequence features that include DNA hypomethylation, elevated CpG and GC content and the presence of transcription factor binding sites. These sequence characteristics are congruous with the recruitment of transcription factors and chromatin modifying enzymes, and transcriptional activation in general. CGIs colocalize with sites of transcriptional initiation in hypermethylated vertebrate genomes; however, a growing body of evidence indicates that CGIs might exert their gene regulatory function in other genomic contexts. In this review, we discuss the diverse regulatory features of CGIs, their functional readout, and the evolutionary implications associated with CGI retention in vertebrates and possibly in invertebrates.

AO-BBO: A Novel Optimization Algorithm and Its Application in Plant Drug Extraction

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666181130140709 ◽

2019 ◽

Vol 19 (2) ◽

pp. 139-145 ◽

Cited By ~ 1

Author(s):

Bote Lv ◽

Juan Chen ◽

Boyan Liu ◽

Cuiying Dong

Keyword(s):

Optimization Algorithm ◽

Learning Strategy ◽

Optimal Solution ◽

Population Diversity ◽

Global Optimal Solution ◽

Plant Medicine ◽

Suitability Index ◽

Sensing Model ◽

Local Optimal Solution ◽

Global Optimal

Introduction: It is well known that the biogeography-based optimization (BBO) algorithm lacks searching power in some circumstances. Material & Methods: To address this issue, an adaptive opposition-based biogeography-based optimization algorithm (AO-BBO) is proposed. Based on the BBO algorithm and an opposite learning strategy, this algorithm chooses a different opposite learning probability for each individual according to its habitat suitability index (HSI), so as to prevent elite individuals from returning to a local optimal solution. The proposed method is tested on 9 benchmark functions. Result: The results show that the improved AO-BBO algorithm better preserves population diversity and enhances the search for the global optimal solution. The global exploration capability, convergence rate and convergence accuracy are significantly improved. Finally, the algorithm is applied to the parameter optimization of a soft-sensing model of the plant medicine extraction rate. Conclusion: The simulation results show that the model obtained by this method has higher prediction accuracy and generalization ability.
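The idea of opposition-based learning with a per-individual probability can be sketched as follows. The opposite point lower + upper - x is the standard definition; the probability rule used here (worse rank gives higher probability) is purely an illustrative assumption, since the abstract does not give the AO-BBO formula, and all names are hypothetical.

```python
import random

def opposite(x, lower, upper):
    """Opposite point of x within per-dimension bounds: lower + upper - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def opposition_step(population, fitness, lower, upper, rng=random):
    """Apply opposition-based learning with a rank-dependent probability.

    ASSUMPTION: the linear rank rule below is illustrative only and is not
    taken from the paper. Lower fitness is treated as better (minimisation).
    """
    ranked = sorted(population, key=fitness)
    n = len(ranked)
    result = []
    for rank, x in enumerate(ranked):
        # Best individual gets probability 0, worst gets probability 1.
        p = rank / (n - 1) if n > 1 else 0.0
        candidate = opposite(x, lower, upper)
        # Keep the opposite point only if selected and strictly better.
        if rng.random() < p and fitness(candidate) < fitness(x):
            result.append(candidate)
        else:
            result.append(x)
    return result

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(opposition_step([[0.5], [4.0]], sphere, [0.0], [5.0]))
```

Giving elite individuals a low probability leaves good solutions undisturbed while still letting poor ones jump to the opposite side of the search space.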

Solving knapsack problems using a binary gaining sharing knowledge-based optimization algorithm

Complex & Intelligent Systems ◽

10.1007/s40747-021-00351-8 ◽

2021 ◽

Author(s):

Prachi Agrawal ◽

Talari Ganesh ◽

Ali Wagdy Mohamed

Keyword(s):

Life Span ◽

Population Size ◽

Linear Function ◽

Optimization Algorithm ◽

Optimization Problems ◽

Search Space ◽

Knapsack Problems ◽

Local Optima ◽

Knowledge Based ◽

Two Stages

Abstract: This article proposes a novel binary version of the recently developed Gaining Sharing knowledge-based optimization algorithm (GSK) to solve binary optimization problems. The GSK algorithm is based on the concept of how humans acquire and share knowledge during their life span. The binary version, named novel binary Gaining Sharing knowledge-based optimization algorithm (NBGSK), depends mainly on two binary stages: a binary junior gaining sharing stage and a binary senior gaining sharing stage, with knowledge factor 1. These two stages enable NBGSK to explore and exploit the search space efficiently and effectively to solve problems in binary space. Moreover, to enhance the performance of NBGSK and prevent the solutions from becoming trapped in local optima, NBGSK with population size reduction (PR-NBGSK) is introduced. It decreases the population size gradually with a linear function. The proposed NBGSK and PR-NBGSK are applied to a set of knapsack instances with small and large dimensions, and the results show that NBGSK and PR-NBGSK are more efficient and effective in terms of convergence, robustness, and accuracy.
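A linear population size reduction schedule of the kind mentioned here might look like the sketch below. It assumes the population shrinks from an initial size to a minimum size in proportion to the function evaluations used so far; the abstract gives no formula, so the parameter names and the rounding rule are assumptions.

```python
def population_size(evaluations: int, max_evaluations: int,
                    initial_size: int, min_size: int) -> int:
    """Linearly interpolate the population size from initial_size to min_size."""
    # Clamp progress to [0, 1] so the size never leaves the allowed range.
    fraction = min(max(evaluations / max_evaluations, 0.0), 1.0)
    return round(initial_size + (min_size - initial_size) * fraction)

if __name__ == "__main__":
    for used in (0, 250, 500, 750, 1000):
        print(used, population_size(used, 1000, 100, 12))
```

In a full algorithm the worst individuals would be dropped whenever the scheduled size falls below the current population size, which concentrates later evaluations on the most promising solutions.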

The Results of Experimental testing of Technical means of Training in the System of long-term Training of athletes specializing in Martial Arts

Scientific Journal of National Pedagogical Dragomanov University. Series 15. Scientific and pedagogical problems of physical culture (physical culture and sports) ◽

10.31392/npu-nc.series15.2021.10(141).13 ◽

2021 ◽

pp. 56-61

Author(s):

Antonina Diachenko ◽

Yilia Palamarchuk ◽

Mykola Maievsky ◽

Serhii Ilchenko ◽

Eduard Syvokhop ◽

...

Keyword(s):

Martial Arts ◽

Experimental Testing ◽

Scientific Research ◽

World Cup ◽

Internet Resources ◽

Teaching Aids ◽

Analytical Work ◽

Analysis And Synthesis ◽

Technical Teaching

Based on monitoring of Internet resources and analysis of scientific-methodical, special, and reference literature, the research group established that too few scientific and methodological works address the implementation (determination of effectiveness) of modern scientific tools and technical means of training in the system of long-term training of athletes specializing in martial arts. This calls for further scientific research and underlines the relevance and practical component of the chosen area of research. The main purpose of the research is to determine the effectiveness of modern scientific tools ("VKS Katsumoto" and "Visual 3D") in the system of long-term training of wrestlers (using the example of athletes who specialize in Sambo wrestling). In the course of research and analytical work, the following research methods were used: abstraction, analysis and synthesis, induction and deduction, modeling, and mathematical and statistical methods, among others. As a result of empirical research, the effectiveness of the modern scientific tools "VKS Katsumoto" and "Visual 3D" in the system of long-term training of athletes specializing in Sambo (sports and combat direction) has been determined. Prospects for further research in this direction include a comparative analysis of the performance of Ukrainian sambo wrestlers at the 2021 World Cup using modern scientific tools (technical teaching aids).

Improved Hybrid Particle Swarm Optimizer With Sine-Cosine Acceleration Coefficients For Transient Electromagnetic Inversion

Current Bioinformatics ◽

10.2174/1574893616666210727164226 ◽

2021 ◽

Vol 16 ◽

Author(s):

Ruiheng Li ◽

Qiong Zhuang ◽

Nian Yu ◽

Ruiyou Li ◽

Huaiqing Zhang

Keyword(s):

Learning Strategy ◽

Particle Swarm ◽

Population Diversity ◽

Nonlinear Inversion ◽

Test Functions ◽

Particle Swarm Optimizer ◽

Local Optima ◽

Hybrid Particle ◽

Transient Electromagnetic ◽

Stability And Accuracy

Background: Recently, particle swarm optimization (PSO) has been increasingly used in geophysics due to its simple operation and fast convergence. Objective: However, PSO lacks population diversity and may fall into local optima. Hence, an improved hybrid particle swarm optimizer with sine-cosine acceleration coefficients (IH-PSO-SCAC) is proposed and successfully applied to test functions and to transient electromagnetic (TEM) nonlinear inversion. Method: A reverse learning strategy is applied to optimize population initialization. The sine-cosine acceleration coefficients are utilized for global convergence. Sine mapping is adopted to enhance population diversity during the search process. In addition, a mutation method is used to reduce the probability of premature convergence. Results: The application of IH-PSO-SCAC to the test functions and several simple layered models is demonstrated, with satisfactory results in terms of data fit. Two inversions have been carried out to test our algorithm. The first model contains an underground low-resistivity anomaly body, and the second model uses measured data from a profile of the Xishan landslide in Sichuan Province. In both cases, resistivity profiles are obtained, and the inverse problem is solved for verification. Conclusion: The results show that the IH-PSO-SCAC algorithm is practical, can be effectively applied in TEM inversion and is superior to other representative algorithms in terms of stability and accuracy.

Cooperative Sequence Clustering and Decoding for DNA Storage System with Fountain Codes

Bioinformatics ◽

10.1093/bioinformatics/btab246 ◽

2021 ◽

Author(s):

Jaeho Jeong ◽

Seong-Joon Park ◽

Jae-Won Kim ◽

Jong-Seon No ◽

Ha Hyeon Jeon ◽

...

Keyword(s):

Hamming Distance ◽

Storage Systems ◽

Storage System ◽

Data Retrieval ◽

Illumina MiSeq ◽

Error Correcting Codes ◽

Read Length ◽

Sequence Coverage ◽

Source Codes ◽

DNA Storage

Motivation: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will need more sequence reads for data retrieval. There is potentially a way to improve sequencing and decoding processes such that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past research, clustering, alignment, and decoding processes were considered as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. Results: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich’s research, the proposed decoding method additionally incorporates sequence reads with minor errors which had been discarded before, and thus was able to make use of 10.6–11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5–8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. Availability: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
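Hamming-distance based clustering of sequence reads can be illustrated with a simple greedy scheme. This sketch is an assumption made for illustration and is not the authors' pipeline (their code is at the repository linked above): each read joins the first cluster whose representative differs in at most `threshold` positions, and otherwise starts a new cluster.

```python
def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    return sum(a != b for a, b in zip(u, v))

def cluster_reads(reads: list[str], threshold: int) -> list[list[str]]:
    """Greedy clustering keyed on the first read seen in each cluster."""
    clusters: list[tuple[str, list[str]]] = []  # (representative, members)
    for read in reads:
        for representative, members in clusters:
            # Hamming distance is only defined for equal-length reads.
            if (len(representative) == len(read)
                    and hamming(representative, read) <= threshold):
                members.append(read)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((read, [read]))
    return [members for _, members in clusters]

if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAA", "TTGGCC", "ACGTAC", "TTGGCA"]
    print(cluster_reads(reads, threshold=1))
```

Reads with a small number of substitution errors end up alongside their error-free copies, which is what allows a decoder to use reads that would otherwise be discarded. Insertions and deletions change read length and are not handled by this sketch.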

Active Object Detection Based on A Novel Deep Q-learning Network and Long-term Learning Strategy for Service Robot

IEEE Transactions on Industrial Electronics ◽

10.1109/tie.2021.3090707 ◽

2021 ◽

pp. 1-1

Author(s):

Shaopeng Liu ◽

Guohui Tian ◽

Ying Zhang ◽

Mengyang Zhang ◽

Shuo Liu

Keyword(s):

Object Detection ◽

Learning Strategy ◽

Service Robot ◽

Active Object ◽

Q Learning ◽

Learning Network

First Complete Genome Sequence of Brucella abortus 2308 isolated from an abortion storm in a dairy farm in India

10.21203/rs.3.rs-420448/v1 ◽

2021 ◽

Author(s):

Amit Kumar ◽

Malyaj R Prajapati ◽

Surendra Upadhyay ◽

Anamika Bhordia ◽

Vinod Kumar Singh ◽

...

Keyword(s):

Genome Sequence ◽

DNA Sequences ◽

Brucella Abortus ◽

Complete Genome Sequence ◽

Complete Genome ◽

Messenger RNA ◽

GC Content ◽

Dairy Farm ◽

rRNA Genes ◽

Sequence Length

Abstract: The present report communicates the first complete genome sequence of the Brucella abortus 2308 strain isolated from an abortion storm in a dairy farm located at Kanpur, Uttar Pradesh, in India. It caused last-trimester abortions in 32 of 100 cows at the dairy over a period of 60 days. The bacteria were isolated in pure culture from the placenta of aborted cows. The genome sequence of the isolated bacteria is 3,285,606 bp long, with a 57.25% GC content, an N50 value of 296,426, and an L50 value of 4; it contains 3,119 coding DNA sequences (CDSs), 49 tRNAs, 1 transfer messenger RNA (tmRNA), and 3 rRNA genes. It is the first report of Brucella abortus 2308 isolation and complete genome sequence from the Indian subcontinent.
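The assembly statistics quoted here follow standard definitions, which the sketch below computes: N50 is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The contig lengths in the example are hypothetical and are not taken from the report.

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("contig_lengths must be non-empty")

def gc_percent(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    return 100 * sum(base in "GC" for base in seq) / len(seq)

if __name__ == "__main__":
    print(n50_l50([80, 70, 50, 40, 30, 20]))
    print(gc_percent("GCAT"))
```

An L50 of 4, as reported above, means that the four longest contigs together cover at least half of the assembled genome.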
